Learning to understand the world in 3D

Spezialetti, Riccardo (2020) Learning to understand the world in 3D, [Dissertation thesis], Alma Mater Studiorum Università di Bologna. PhD programme in Computer science and engineering, 32nd cycle. DOI 10.6092/unibo/amsdottorato/9513.
Available under license: Creative Commons Attribution Non-commercial ShareAlike 4.0 (CC BY-NC-SA 4.0).


3D computer vision is a research topic attracting ever-increasing attention thanks to the growing availability of off-the-shelf depth sensors and large-scale 3D datasets. The main purpose of 3D computer vision is to understand the geometry of objects in order to interact with them. Recently, the success of deep neural networks in image processing has fostered a data-driven approach to 3D vision problems. Inspired by the potential of this field, this thesis addresses two main problems: (a) how to leverage machine and deep learning techniques to build a robust and effective pipeline for establishing correspondences between surfaces, and (b) how to obtain a reliable 3D reconstruction of an object from RGB images sparsely acquired from different points of view by means of deep neural networks. At the heart of many 3D computer vision applications lies surface matching, an effective paradigm aimed at finding correspondences between points belonging to different shapes. To this end, it is essential to first identify the characteristic points of an object and then create an adequate representation of them. We refer to these two steps as keypoint detection and keypoint description, respectively. As a first contribution (a) of this Ph.D. thesis, we propose data-driven solutions to the problems of keypoint detection and description. As a further direction of research, we investigate the problem of 3D object reconstruction from RGB data only (b). While this task was addressed in the past by SLAM and Structure from Motion (SfM) techniques, the picture has changed radically in recent years with the advent of deep learning. Following this trend, we introduce a novel approach that combines traditional computer vision techniques with deep learning to perform viewpoint-variant 3D object reconstruction from non-overlapping RGB views.

Document type
Doctoral thesis
Author
Spezialetti, Riccardo
Keywords
3D computer vision; deep learning; surface matching; 3D keypoint detection; 3D keypoint description; canonical orientation; local reference frame; multiview reconstruction; surface registration; relative pose estimation; point cloud; deformable matching
Defence date
6 November 2020
