Simultaneous Localization and 3D-Semi Dense Mapping for Micro Drones Using Monocular Camera and Inertial Sensors
Danial, Asher, Klein
Monocular simultaneous localization and mapping (SLAM) algorithms estimate drone poses and build a 3D map using a single camera. Current algorithms include sparse methods that lack detailed geometry, while learning-driven approaches produce dense maps but are computationally intensive. Monocular SLAM also faces scale ambiguities, which affect its accuracy. To address these challenges, we propose an edge-aware lightweight monocular SLAM system combining sparse keypoint-based pose estimation with dense edge reconstruction. Our method employs deep learning-based depth prediction and edge detection, followed by optimization to refine keypoints and edges for geometric consistency, without relying on global loop closure or heavy neural computations. We fuse inertial data with vision by using an extended Kalman filter to resolve scale ambiguity and improve accuracy. The system operates in real time on low-power platforms, as demonstrated on a DJI Tello drone with a monocular camera and inertial sensors. In addition, we demonstrate robust autonomous navigation and obstacle avoidance in indoor corridors and on the TUM RGBD dataset. Our approach offers an effective, practical solution to real-time mapping and navigation in resource-constrained environments.
academic
Simultaneous Localization and 3D-Semi Dense Mapping for Micro Drones Using Monocular Camera and Inertial Sensors
This paper addresses the challenges of simultaneous localization and mapping (SLAM) for micro drones using monocular cameras by proposing an edge-aware lightweight monocular SLAM system. The system combines sparse keypoint pose estimation with dense edge reconstruction, employing deep learning for depth prediction and edge detection, while achieving geometric consistency through optimization without relying on global loop closure or heavy neural network computation. The system fuses inertial data with visual information using an Extended Kalman Filter to address scale ambiguity and improve accuracy. Real-time implementation on the DJI Tello drone demonstrates robust autonomous navigation and obstacle avoidance capabilities on the TUM RGB-D dataset.
Sparse Map Problem: Traditional feature-point-based SLAM systems (e.g., ORB-SLAM) effectively estimate pose but generate overly sparse 3D point clouds lacking structural richness, making them unsuitable for tasks requiring dense 3D understanding.
Computational Resource Constraints: Existing learning-driven dense SLAM methods (e.g., NeRF, NICE-SLAM) are computationally intensive and difficult to run in real-time on resource-constrained embedded platforms.
Scale Ambiguity: The inherent scale uncertainty in monocular SLAM affects localization accuracy.
Global Optimization Overhead: Traditional SLAM relies on loop closure detection and global bundle adjustment, incurring significant computational costs.
Autonomous navigation of micro drones requires real-time, accurate 3D perception capabilities for navigation, obstacle avoidance, and environmental interaction. Achieving this on resource-constrained embedded platforms represents a core challenge in robotics.
Develop a lightweight, real-time SLAM system capable of generating semi-dense maps on resource-constrained platforms while maintaining high-precision pose estimation.
Shape-Aware Structural Constraints: Geometric regularization based on L-shaped structures, enhancing structural consistency in indoor environments.
Local Geometric Optimization: Multi-objective bundle adjustment jointly optimizing camera poses, keypoints, and edge segments without global loop closure or dense voxel fusion.
Estimate essential matrix E from matched ORB features via epipolar geometry:
u_j^T E_ij u_i = 0
Use RANSAC to remove outliers; recover relative rotation R_ij and translation t_ij via SVD decomposition
Extended Kalman Filter Fusion:
State vector:
x = [p, α]^T = [x, y, z, φ, θ, ψ]^T
where p is global position and α is Euler angles (roll, pitch, yaw)
Prediction step:
p_{k|k-1} = p_{k-1} + R_imu(α_{k-1}) · v_imu · Δt
Adaptive Process Noise:
Q_k = β · (1 - b_k + λτ) · I_6
where b_k is battery level and τ is time since last monocular update, accounting for SDK data accuracy degradation with battery depletion and time progression
Measurement update:
Observation 1: Euler angles from SDK z_api = α_api
Observation 2: Global pose estimation from visual odometry (via accumulated relative poses)
Overall Assessment: This is a practical SLAM research work targeting resource-constrained platforms with reasonable technical approach and convincing experimental results. Main contributions lie in system engineering and method integration rather than single algorithmic breakthrough. The 74.7% accuracy improvement and 100× speedup have practical value. However, the paper has room for improvement in experimental comparison, ablation analysis, and theoretical depth. Suitable for publication in robotics application conferences or journals.