Zama Ramirez, Pierluigi
Deep Scene Understanding with Limited
Training Data, [Dissertation thesis], Alma Mater Studiorum Università di Bologna.
PhD programme in
Computer science and engineering, 33rd cycle. DOI 10.48676/unibo/amsdottorato/9815.
Full-text documents available:
PDF document (English)
- Requires a PDF reader such as Xpdf or Adobe Acrobat Reader
Available under license: Unless the author grants broader permissions, the thesis may be freely consulted, and one copy may be saved and printed strictly for personal purposes of study, research, and teaching; any directly or indirectly commercial use is expressly prohibited. All other rights to the material are reserved.
Download (81MB)
Scene understanding by a machine is a challenging task because of the vast variety of the natural world. Nevertheless, deep learning achieves impressive results in several scene understanding tasks, such as semantic segmentation, depth estimation, and optical flow. However, these approaches need a large amount of labeled data, which requires massive manual annotation that is tedious and expensive to collect. In this thesis, we focus on understanding a scene through deep learning with limited data availability.
First, we tackle the lack of data for semantic segmentation. We show that computer graphics is useful for this purpose, both to create a new, efficient annotation tool and to render synthetic annotated datasets quickly. However, a network trained only on synthetic data suffers from the so-called domain-shift problem, i.e., it is unable to generalize to real data. We therefore show that this problem can be mitigated using a novel deep image-to-image translation technique.
In the second part of the thesis, we focus on the relationship between scene understanding tasks. We argue that building a model aware of the connections between tasks is the first building block for creating more robust, efficient, and performant models that need less annotated training data. In particular, we demonstrate that the need for labels can be decreased by exploiting the relationship between visual tasks. Finally, in the last part, we propose a novel unified framework for comprehensive scene understanding, which exploits the synergies between tasks to be more robust, efficient, and performant.
Document type
Doctoral thesis
Zama Ramirez, Pierluigi
PhD programme
Scientific-disciplinary sector
Competition sector
Keywords
Computer Vision, Deep Learning, Scene Understanding, Synthetic Data, Semantic Segmentation, Depth Estimation, Optical Flow, Domain Adaptation, Transfer Learning
Defence date
27 May 2021