Deep Scene Understanding with Limited Training Data

Zama Ramirez, Pierluigi (2021) Deep Scene Understanding with Limited Training Data, [Dissertation thesis], Alma Mater Studiorum Università di Bologna. PhD programme in Computer Science and Engineering, 33rd cycle. DOI 10.48676/unibo/amsdottorato/9815.
Available under license: Unless broader permissions are granted by the author, the thesis may be freely consulted, and a copy may be saved and printed for strictly personal purposes of study, research, and teaching; any directly or indirectly commercial use is expressly forbidden. All other rights to the material are reserved.

Abstract

Scene understanding by a machine is a challenging task due to the profound variety of the natural world. Nevertheless, deep learning achieves impressive results in several scene understanding tasks such as semantic segmentation, depth estimation, and optical flow. However, these approaches need a large amount of labeled data, which requires massive manual annotation efforts that are tedious and expensive. In this thesis, we focus on understanding a scene through deep learning with limited data availability. First, we tackle the lack of data for semantic segmentation. We show that computer graphics comes in handy for our purpose, both to create a new, efficient annotation tool and to render synthetic annotated datasets quickly. However, a network trained only on synthetic data suffers from the so-called domain-shift problem, i.e., it is unable to generalize to real data. Thus, we show that we can mitigate this problem using a novel deep image-to-image translation technique. In the second part of the thesis, we focus on the relationship between scene understanding tasks. We argue that building a model aware of the connections between tasks is the first building block for creating more robust, efficient, and performant models that need less annotated training data. In particular, we demonstrate that we can decrease the need for labels by exploiting the relationship between visual tasks. Finally, in the last part, we propose a novel unified framework for comprehensive scene understanding, which exploits the synergies between tasks to be more robust, efficient, and performant.

Document type
Doctoral thesis
Author
Zama Ramirez, Pierluigi
Supervisor
PhD programme
Cycle
33
Coordinator
Disciplinary sector
Competition sector
Keywords
Computer Vision, Deep Learning, Scene Understanding, Synthetic Data, Semantic Segmentation, Depth Estimation, Optical Flow, Domain Adaptation, Transfer Learning
URN:NBN
DOI
10.48676/unibo/amsdottorato/9815
Defence date
27 May 2021
URI
