Learning to understand the world in 3D

Spezialetti, Riccardo (2020) Learning to understand the world in 3D, [Doctoral thesis], Alma Mater Studiorum Università di Bologna. PhD programme in Computer science and engineering, 32nd cycle. DOI 10.6092/unibo/amsdottorato/9513.

Full-text documents available:

PDF document (English) - Requires a PDF reader such as Xpdf or Adobe Acrobat Reader
Available under License: Creative Commons: Attribution - NonCommercial - ShareAlike 4.0 (CC BY-NC-SA 4.0).
Download (72MB)

Abstract

3D computer vision is a research topic attracting ever-increasing attention thanks to the increasingly widespread availability of off-the-shelf depth sensors and large-scale 3D datasets. The main purpose of 3D computer vision is to understand the geometry of objects in order to interact with them. Recently, the success of deep neural networks in processing images has fostered a data-driven approach to solving 3D vision problems. Inspired by the potential of this field, in this thesis we address two main problems: (a) how to leverage machine/deep learning techniques to build a robust and effective pipeline for establishing correspondences between surfaces, and (b) how to obtain a reliable 3D reconstruction of an object by means of deep neural networks, using RGB images sparsely acquired from different points of view. At the heart of many 3D computer vision applications lies surface matching, an effective paradigm aimed at finding correspondences between points belonging to different shapes. To this end, it is essential first to identify the characteristic points of an object and then to create an adequate representation of them. We refer to these two steps as keypoint detection and keypoint description, respectively. As a first contribution (a) of this Ph.D. thesis, we propose data-driven solutions to the problems of keypoint detection and description. As a further direction of research, we investigate the problem of 3D object reconstruction from RGB data only (b). While in the past this problem was addressed by SLAM and Structure from Motion (SfM) techniques, the picture has changed radically in recent years with the advent of deep learning. Following this trend, we introduce a novel approach that combines traditional computer vision techniques with deep learning to perform viewpoint-variant 3D object reconstruction from non-overlapping RGB views.

Document type

Doctoral thesis

Author

Spezialetti, Riccardo

Supervisor

Di Stefano, Luigi

PhD programme

Computer science and engineering

Cycle

32

Coordinator

Sangiorgi, Davide

Scientific disciplinary sector

Area 09 - Industrial and information engineering > ING-INF/05 Information processing systems

Academic recruitment field

Area 09 - Industrial and information engineering > 09/H - Computer engineering > 09/H1 Information processing systems

Keywords

3D computer vision; deep learning; surface matching; 3D keypoints detection; 3D keypoints description; canonical orientation; local reference frame; multiview reconstruction; surface registration; relative pose estimation; point cloud; deformable matching

URN:NBN

urn:nbn:it:unibo-26969

DOI

10.6092/unibo/amsdottorato/9513

Defence date

6 November 2020

URI

https://amsdottorato.unibo.it/id/eprint/9513