The future of autonomous driving and flying lies in giving machines the power of sight. Visual SLAM (vSLAM) is a critical technology toward this goal: it uses visual sensors to perceive the environment, offering a low-cost, lightweight alternative to heavier sensor suites. This tutorial motivates its focus on visual information by highlighting how vSLAM provides rich, pixel-level environmental data, mirroring the way humans navigate the world. We tackle the core challenge of real-time pose estimation and mapping: enabling a vehicle or drone to localize itself and construct a 3D map of its surroundings simultaneously using only visual inputs. This capability is essential for applications where GPS signals are unreliable and pre-existing maps are unavailable. The tutorial traces the evolution from classical geometric vSLAM methods to modern, learning-based techniques that address persistent challenges such as dynamic environments, poor lighting conditions, and the scale ambiguity inherent in monocular camera setups.
The tutorial is divided into three major sections, followed by a final section on open challenges and research directions. In the first section (Introduction to Visual SLAM), we introduce and motivate the concept of SLAM, its categorization, and the various sensor configurations used in autonomous driving and flying. We also discuss in detail the building blocks of a typical visual SLAM system, e.g., pose estimation, 3D mapping, and loop closure. In the second section (Classical SLAM Systems: Feature-based/Sparse to Direct/Dense SLAM), we cover the two major families of classical approaches, feature-based/sparse and direct/dense SLAM, and examine the workings of key systems in each category, namely the ORB-SLAM series and LSD-SLAM. In the third section (Learning-based SLAM Systems), we discuss the building blocks of learning-based visual SLAM systems and key methods such as CNN-SLAM, pseudo-RGBD SLAM, and DROID-SLAM, including methods that represent the 3D world using implicit representations such as NeRFs, Gaussian Splatting, and Signed Distance Fields (SDFs). In the last section, we conclude with open challenges and research directions in the visual SLAM domain.
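To make the pose-estimation building block concrete, the sketch below shows a minimal feature-based front end of the kind used in ORB-SLAM-style systems, written against OpenCV's Python API. This is an illustrative sketch, not code from any system covered in the tutorial; the intrinsics matrix `K` and the `estimate_relative_pose` helper are assumptions introduced here for demonstration.

```python
import numpy as np
import cv2

# Placeholder camera intrinsics for illustration only;
# a real system would use calibrated values.
K = np.array([[700.0, 0.0, 640.0],
              [0.0, 700.0, 360.0],
              [0.0, 0.0, 1.0]])

def estimate_relative_pose(img1, img2, K):
    """Estimate the relative camera pose between two grayscale frames
    using ORB features and epipolar geometry (the sparse, feature-based
    approach discussed in the second section)."""
    orb = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)

    # Brute-force Hamming matching, since ORB descriptors are binary.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # RANSAC rejects outlier correspondences while fitting the essential matrix.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                   prob=0.999, threshold=1.0)

    # Decompose E into rotation R and translation t. For a monocular camera,
    # t is recovered only up to scale -- the scale ambiguity noted above.
    _, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t
```

Chaining such two-view estimates over consecutive frames yields visual odometry; a full vSLAM system then adds the remaining building blocks, 3D mapping and loop closure, to correct the drift that pure frame-to-frame tracking accumulates.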