Guided scene perception

Bartolomei, Luca (2026) Guided scene perception, [Dissertation thesis], Alma Mater Studiorum Università di Bologna. PhD programme in Ingegneria e tecnologia dell'informazione per il monitoraggio strutturale e ambientale e la gestione dei rischi - eit4semm, 38th cycle. DOI 10.48676/unibo/amsdottorato/12475.

Full-text documents available:

PDF document (English), 62 MB
Available under licence: Unless the author grants broader permissions, the thesis may be freely consulted, and one copy may be saved and printed strictly for personal study, research, and teaching purposes; any direct or indirect commercial use is expressly prohibited. All other rights to the material are reserved.

Abstract

This thesis addresses the fundamental challenge of depth perception through novel multi-sensor fusion strategies. We introduce a general framework based on virtual pattern projection that integrates stereo vision with active sensors, such as LiDARs, mimicking active stereo principles without the limitations of physical pattern projectors. Unlike previous fusion methods, our hallucination approach operates externally to the matching algorithm, making it compatible with any stereo method without code modification or retraining, while achieving state-of-the-art performance with as little as 1% of active depth measurements. The framework's inherent redundancy enables graceful degradation: if the active sensor fails, it reduces to passive stereo without accuracy loss; if a camera fails, it naturally adapts to domain-generalized depth completion, an unexplored branch of depth completion in which a single model, in our case a modern stereo network, infers dense depth maps across a wide range of environments and conditions. To further reduce setup costs, we propose replacing expensive active sensors with monocular Vision Foundation Models (VFMs). The resulting framework, dubbed Stereo Anywhere, unifies geometric constraints and learned contextual priors within a dual-branch design. This approach achieves zero-shot generalization and remarkable robustness to challenging scenarios such as textureless regions, occlusions, and non-Lambertian surfaces. Furthermore, we extend these principles to the neuromorphic domain, pioneering the fusion of event-based stereo with LiDAR through event hallucination. Finally, we take a first step toward extending the Stereo Anywhere concept to the neuromorphic domain: we address the absence of event-based Vision Foundation Models by introducing a cross-modal distillation framework that transfers depth priors from RGB-based VFMs to event data, bypassing the scarcity of event data. The next step, integrating monocular cues with the stereo pipeline to build a fully event-based Stereo Anywhere framework, remains a promising direction for future research.
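
The virtual pattern projection idea described in the abstract can be illustrated with a toy sketch. This is not the thesis implementation: it assumes a rectified grayscale pair stored as nested lists, integer disparities, and a hypothetical function name `project_virtual_pattern`. For each sparse depth hint, the same random intensity is written at the left pixel and at its match in the right image, so a stereo matcher run afterwards sees consistent, distinctive texture at those points without being modified. A LiDAR depth z would first be converted to a disparity d = fB/z, with focal length f and baseline B.

```python
import random


def project_virtual_pattern(left, right, sparse_disparity, seed=0):
    """Paint the same random intensity at corresponding pixels of a rectified pair.

    left, right: 2D lists of grayscale intensities (rows of equal length).
    sparse_disparity: dict mapping (row, col) in the left image to an integer
        disparity d, so the matching right-image pixel is (row, col - d).
    Returns modified copies; the inputs are left untouched.
    """
    rng = random.Random(seed)
    left_out = [row[:] for row in left]
    right_out = [row[:] for row in right]
    width = len(left[0])
    for (r, c), d in sparse_disparity.items():
        c_right = c - d
        if not (0 <= c_right < width):
            continue  # the match falls outside the right image; skip this hint
        value = rng.randint(0, 255)
        left_out[r][c] = value
        right_out[r][c_right] = value
    return left_out, right_out


# Toy textureless 4x8 pair with two depth hints (assumed values).
left = [[128] * 8 for _ in range(4)]
right = [[128] * 8 for _ in range(4)]
hints = {(1, 5): 2, (2, 1): 3}  # the second hint maps to column -2 and is skipped
left_p, right_p = project_virtual_pattern(left, right, hints)
```

Because the pattern is written into the images rather than into the matcher, the output pair can be passed to any stereo method as-is; with an empty hint dictionary the function returns the unmodified pair, which corresponds to plain passive stereo.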

Document type

Doctoral thesis

Author

Bartolomei, Luca

Supervisor

Mattoccia, Stefano

Co-supervisor

Poggi, Matteo

PhD programme

Ingegneria e tecnologia dell'informazione per il monitoraggio strutturale e ambientale e la gestione dei rischi - eit4semm

Cycle

38

Coordinator

De Marchi, Luca

Scientific disciplinary sector

Area 09 - Industrial and information engineering > ING-INF/05 Information processing systems

Competition sector

Area 09 - Industrial and information engineering > 09/H - Computer engineering > 09/H1 Information processing systems

Keywords

Deep Learning, Machine Learning, Computer Vision, 3D reconstruction, Depth perception, Multi-sensor fusion, Stereo vision, Virtual pattern projection, Active stereo, Depth completion, Event-based stereo, Monocular depth estimation, Sensor fusion

DOI

10.48676/unibo/amsdottorato/12475

Defence date

27 March 2026

URI

https://amsdottorato.unibo.it/id/eprint/12475
