Guided scene perception

Bartolomei, Luca (2026) Guided scene perception, [Dissertation thesis], Alma Mater Studiorum Università di Bologna. PhD programme in Ingegneria e tecnologia dell'informazione per il monitoraggio strutturale e ambientale e la gestione dei rischi - eit4semm, 38th cycle. DOI 10.48676/unibo/amsdottorato/12475.
Full-text documents available:
PDF document (English): Luca_Bartolomei_Tesi_Dottorato (2).pdf
License: Unless the author has granted broader permissions, the thesis may be freely consulted, and one copy may be saved and printed strictly for personal purposes of study, research, and teaching. Any direct or indirect commercial use is expressly prohibited. All other rights to the material are reserved.
Download (62MB)

Abstract

This thesis addresses the fundamental challenge of depth perception through novel multi-sensor fusion strategies. We introduce a general framework based on virtual pattern projection that integrates stereo vision with active sensors, such as LiDARs, mimicking active stereo principles without the limitations of physical pattern projectors. Unlike previous fusion methods, our hallucination approach operates outside the matching algorithm, making it compatible with any stereo method without code modification or retraining, while achieving state-of-the-art performance with as little as 1% of active depth measurements. The framework's inherent redundancy enables graceful degradation: in case of active sensor failure, it reduces to passive stereo without accuracy loss; in case of camera failure, it naturally adapts to domain-generalized depth completion, an unexplored branch of depth completion in which a single model (in our case, a modern stereo network) infers dense depth maps across a wide range of environments and conditions. To further reduce setup costs, we propose replacing expensive active sensors with monocular Vision Foundation Models (VFMs). The resulting framework, dubbed Stereo Anywhere, unifies geometric constraints and learned contextual priors within a dual-branch design. This approach achieves zero-shot generalization and remarkable robustness in challenging scenarios such as textureless regions, occlusions, and non-Lambertian surfaces. Furthermore, we extend these principles to the neuromorphic domain, pioneering the fusion of event-based stereo with LiDAR through event hallucination. Finally, we move toward completing the puzzle by extending the Stereo Anywhere concept to the neuromorphic domain as well. As a first step, we address the absence of event-based Vision Foundation Models by introducing a cross-modal distillation framework that transfers depth priors from RGB-based VFMs to event data, bypassing the scarcity of event data. The next step, integrating monocular cues with the stereo pipeline to build a fully event-based Stereo Anywhere framework, remains a promising direction for future research.
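
The virtual pattern projection idea summarized in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes rectified grayscale images, sparse depth already aligned with the left camera, and a single-pixel random pattern. The function names `depth_to_disparity` and `project_virtual_pattern`, and the convention that 0 marks a missing measurement, are hypothetical choices made for illustration. The actual method may differ in how patterns are chosen, blended, and handled at occlusions.

```python
import numpy as np


def depth_to_disparity(depth, focal_px, baseline_m):
    """Convert metric depth to disparity (pixels) for a rectified stereo rig.

    Pixels with depth <= 0 are treated as "no measurement" (assumed encoding).
    """
    disp = np.zeros_like(depth, dtype=np.float64)
    valid = depth > 0
    disp[valid] = focal_px * baseline_m / depth[valid]
    return disp


def project_virtual_pattern(left, right, sparse_disp, rng=None):
    """Paint the same random intensity at corresponding left/right pixels.

    left, right : (H, W) float arrays, rectified grayscale images.
    sparse_disp : (H, W) float array, disparity where measured, 0 elsewhere.
    Returns modified copies; the stereo matcher itself is left untouched.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    left_out, right_out = left.copy(), right.copy()
    width = left.shape[1]
    ys, xs = np.nonzero(sparse_disp > 0)
    for y, x in zip(ys, xs):
        # Column of the matching pixel in the right image.
        xr = int(round(float(x - sparse_disp[y, x])))
        if 0 <= xr < width:
            value = rng.random()  # identical value makes the match unambiguous
            left_out[y, x] = value
            right_out[y, xr] = value
    return left_out, right_out


# Toy usage: a textureless pair with one depth measurement of 2 m.
left = np.full((4, 8), 0.5)
right = np.full((4, 8), 0.5)
depth = np.zeros((4, 8))
depth[2, 6] = 2.0
disp = depth_to_disparity(depth, focal_px=40.0, baseline_m=0.1)
left_vpp, right_vpp = project_virtual_pattern(left, right, disp)
```

Because the images are altered before matching, any stereo algorithm can consume `left_vpp` and `right_vpp` unchanged, which is the property the abstract attributes to operating outside the matcher. If no depth measurement is available, the images pass through unmodified and the pipeline falls back to passive stereo.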

Document type
Doctoral thesis
Author
Bartolomei, Luca
Supervisor
Co-supervisor
PhD programme
Cycle
38
Coordinator
Disciplinary sector
Competition sector
Keywords
Deep Learning, Machine Learning, Computer Vision, 3D reconstruction, Depth perception, Multi-sensor fusion, Stereo vision, Virtual pattern projection, Active stereo, Depth completion, Event-based stereo, Monocular depth estimation, Sensor fusion
DOI
10.48676/unibo/amsdottorato/12475
Defence date
27 March 2026
URI
