Li, Huan
(2025)
Monocular depth estimation based on ground geometry, [Dissertation thesis], Alma Mater Studiorum Università di Bologna.
PhD programme in Computer Science and Engineering, 37th cycle. DOI 10.48676/unibo/amsdottorato/12061.
Abstract
Monocular depth estimation is an ill-posed problem: because it relies on a single image, it lacks multi-view consistency cues and suffers from scale ambiguity. Unlike stereo or LiDAR methods, however, it avoids costly sensors and complex calibration, making it suitable for autonomous driving, robotics, and augmented reality. Recent deep learning advances enable end-to-end depth prediction from extensive annotations, but creating high-quality datasets is time-consuming and expensive. Self-supervised methods reduce the reliance on labels by leveraging video sequences, yet they often assume static scenes and therefore fail on dynamic objects. To address these limitations, we integrate ground geometry into depth estimation. In static scenes, ground normal vectors derived from human probes provide accurate scale information, aligning the predicted 3D scene with the real-world environment and enabling metric depth estimation. For dynamic scenes, we assume that an object's depth agrees with that of its ground contact point. We propose a ground propagation module that iteratively propagates ground features to dynamic objects in the decoder's latent space, improving depth calibration. Experimental results show improved accuracy for moving objects and superior generalization. In summary, leveraging ground geometry significantly improves monocular depth estimation in both static and dynamic environments, offering a reliable solution for diverse applications.
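The scale-alignment idea summarized above can be sketched in a few lines: fit a plane to the ground region of the back-projected (up-to-scale) point cloud, measure the camera's predicted height above that plane, and compare it with a known real-world camera height to obtain a metric scale factor. This is a simplified illustration under assumed inputs (a pinhole intrinsic matrix `K`, a ground segmentation mask, and a known camera height), not the thesis's actual module:

```python
import numpy as np

def backproject(depth, K):
    """Back-project a depth map to camera-frame 3D points (pinhole model)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

def metric_scale_from_ground(points, ground_mask, camera_height_m):
    """Fit a plane to the masked ground points (least squares via SVD) and
    return the factor that rescales predicted units to metres, assuming
    the camera's true height above the ground is known."""
    pts = points[ground_mask.reshape(-1)]
    centroid = pts.mean(axis=0)
    # The plane normal is the singular vector with the smallest
    # singular value of the centred point set.
    _, _, vt = np.linalg.svd(pts - centroid)
    normal = vt[-1]
    # The camera sits at the origin, so its (up-to-scale) height is
    # its distance to the fitted plane.
    predicted_height = abs(np.dot(normal, centroid))
    return camera_height_m / predicted_height
```

Multiplying the predicted depth map by the returned factor yields metric depth, provided the ground in view is approximately planar and the camera height is known.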
Document type
Doctoral thesis
Author
Li, Huan
Supervisor
Co-supervisor
PhD programme
Cycle
37
Coordinator
Disciplinary sector
Competition sector
Keywords
Monocular depth estimation; Ground geometry; 3D vision
DOI
10.48676/unibo/amsdottorato/12061
Defence date
9 April 2025
URI