With only a monocular sensor, an extensible simultaneous localization and mapping (SLAM) framework with depth recovery is designed for an autonomous DJI drone and integrated on an onboard computer using the Robot Operating System (ROS). Specifically, building on the ORB-SLAM2 repository, the pose of the onboard monocular camera is estimated, and AprilTag2 detections are fused with it to recover scene depth information. In addition, a pose conversion module and a pose publishing module are designed on top of the monocular sensor so that the drone can directly use the resulting pose information. With this pose feedback, an outer-loop geometric tracking controller for the unmanned aerial vehicle (UAV) computes the required thrust and attitude to execute flight tasks generated by the trajectory planning module. Since the communication interfaces are clearly defined in ROS, the proposed framework can be easily implemented, enabling a UAV to perform trajectory tracking and landing tasks stably and accurately with monocular vision alone. Practical experiments are performed to validate the effectiveness of the proposed framework.