Navarin, Nicolò
(2014)

*Learning with Kernels on Graphs: DAG-based kernels, data streams and RNA function prediction.*, [Dissertation thesis], Alma Mater Studiorum Università di Bologna.
Dottorato di ricerca in

Informatica, 26 Ciclo. DOI 10.6092/unibo/amsdottorato/6578.

Documenti full-text disponibili:

## Abstract

In many application domains data can be naturally represented as graphs.
When the application of analytical solutions for a given problem is unfeasible, machine learning techniques could be a viable way to solve the problem. Classical machine learning techniques are defined for data represented in a vectorial form. Recently some of them have been extended to deal directly with structured data. Among those techniques, kernel methods have shown promising results both from the computational complexity and the predictive performance point of view. Kernel methods allow to avoid an explicit mapping in a vectorial form relying on kernel functions, which informally are functions calculating a similarity measure between two entities. However, the definition of good kernels for graphs is a challenging problem because of the difficulty to find a good tradeoff between computational complexity and expressiveness. Another problem we face is learning on data streams, where a potentially unbounded sequence of data is generated by some sources.
There are three main contributions in this thesis.
The first contribution is the definition of a new family of kernels for graphs based on Directed Acyclic Graphs (DAGs). We analyzed two kernels from this family, achieving state-of-the-art results from both the computational and the classification point of view on real-world datasets.
The second contribution consists in making the application of learning algorithms for streams of graphs feasible. Moreover,we defined a principled way for the memory management.
The third contribution is the application of machine learning techniques for structured data to non-coding RNA function prediction. In this setting, the secondary structure is thought to carry relevant information. However, existing methods considering the secondary structure have prohibitively high computational complexity. We propose to apply kernel methods on this domain, obtaining state-of-the-art results.

Abstract

In many application domains data can be naturally represented as graphs.
When the application of analytical solutions for a given problem is unfeasible, machine learning techniques could be a viable way to solve the problem. Classical machine learning techniques are defined for data represented in a vectorial form. Recently some of them have been extended to deal directly with structured data. Among those techniques, kernel methods have shown promising results both from the computational complexity and the predictive performance point of view. Kernel methods allow to avoid an explicit mapping in a vectorial form relying on kernel functions, which informally are functions calculating a similarity measure between two entities. However, the definition of good kernels for graphs is a challenging problem because of the difficulty to find a good tradeoff between computational complexity and expressiveness. Another problem we face is learning on data streams, where a potentially unbounded sequence of data is generated by some sources.
There are three main contributions in this thesis.
The first contribution is the definition of a new family of kernels for graphs based on Directed Acyclic Graphs (DAGs). We analyzed two kernels from this family, achieving state-of-the-art results from both the computational and the classification point of view on real-world datasets.
The second contribution consists in making the application of learning algorithms for streams of graphs feasible. Moreover,we defined a principled way for the memory management.
The third contribution is the application of machine learning techniques for structured data to non-coding RNA function prediction. In this setting, the secondary structure is thought to carry relevant information. However, existing methods considering the secondary structure have prohibitively high computational complexity. We propose to apply kernel methods on this domain, obtaining state-of-the-art results.

Tipologia del documento

Tesi di dottorato

Autore

Navarin, Nicolò

Supervisore

Dottorato di ricerca

Scuola di dottorato

Scienze e ingegneria dell'informazione

Ciclo

26

Coordinatore

Settore disciplinare

Settore concorsuale

Parole chiave

machine learning, kernel methods, graph kernels, online learning, bioinformatics

URN:NBN

DOI

10.6092/unibo/amsdottorato/6578

Data di discussione

19 Maggio 2014

URI

## Altri metadati

Tipologia del documento

Tesi di dottorato

Autore

Navarin, Nicolò

Supervisore

Dottorato di ricerca

Scuola di dottorato

Scienze e ingegneria dell'informazione

Ciclo

26

Coordinatore

Settore disciplinare

Settore concorsuale

Parole chiave

machine learning, kernel methods, graph kernels, online learning, bioinformatics

URN:NBN

DOI

10.6092/unibo/amsdottorato/6578

Data di discussione

19 Maggio 2014

URI

## Statistica sui download

Gestione del documento: