Pagliarani, Andrea
(2019)
Big Data mining and machine learning techniques applied to real world scenarios, [Dissertation thesis], Alma Mater Studiorum Università di Bologna.
PhD programme in
Computer science and engineering, 31st cycle. DOI 10.6092/unibo/amsdottorato/8904.
Full-text documents available:
PDF document (English)
Available under license: Unless the author grants broader permissions, the thesis may be freely consulted, and one copy may be saved and printed strictly for personal purposes of study, research and teaching; any direct or indirect commercial use is expressly prohibited. All other rights to the material are reserved.
Abstract
Data mining techniques allow the extraction of valuable information from heterogeneous and possibly very large data sources, which can be either structured or unstructured. Unstructured data, such as text files, social media and mobile data, are far more abundant than structured data and grow at a higher rate. Their high volume and the inherent ambiguity of natural language make unstructured data very hard to process and analyze. Appropriate text representations are therefore required to capture word semantics while preserving statistical information, e.g. word counts. In Big Data scenarios, scalability is also a primary requirement. Data mining and machine learning approaches should take advantage of large-scale data, exploiting abundant information and avoiding the curse of dimensionality.
The goal of this thesis is to enhance text understanding in the analysis of big data sets, introducing novel techniques that can be employed to solve real-world problems. The presented Markov methods achieved state-of-the-art results for a time on well-known Amazon review corpora for cross-domain sentiment analysis, before being outperformed by deep learning approaches on large data sets.
A noise detection method for the identification of relevant tweets yields 88.9% accuracy in daily prediction of the Dow Jones Industrial Average, the best result reported in the literature among approaches based on social networks. Dimensionality reduction approaches are used in combination with LinkedIn users' skills to perform job recommendation. A framework based on deep learning and a Markov Decision Process is designed to model job transitions and recommend pathways towards a given career goal. Finally, parallel primitives for vendor-agnostic implementation of Big Data mining algorithms are introduced to foster multi-platform deployment, code reuse and optimization.
Document type
Doctoral thesis
Author
Pagliarani, Andrea
Supervisor
Co-supervisor
PhD programme
Cycle
31
Coordinator
Academic discipline
Academic recruitment field
Keywords
sentiment analysis, opinion mining, stock market prediction, job recommendation, career pathway recommendation, recommender systems, natural language processing, cross-domain, transfer learning, domain adaptation, deep learning, big data platforms, Markov models, neural networks
URN:NBN
DOI
10.6092/unibo/amsdottorato/8904
Defence date
4 April 2019
URI