Poltronieri, Andrea (2025)
Harmonising music information retrieval with semantics: from data integration to multimodality, doctoral dissertation, Alma Mater Studiorum Università di Bologna.
PhD programme in Computer Science and Engineering, Cycle 37. DOI 10.48676/unibo/amsdottorato/11758.
Abstract
In the era of big data and machine learning, the fragmentation of musical datasets and the lack of standardised representations hinder advancements in Music Information Retrieval (MIR). The multifaceted nature of music complicates both the representation of content, with small, task-specific datasets scattered across various formats, and of context (metadata), which lacks consistent terminology. These challenges increase the effort required for data collection and pre-processing, reduce reproducibility, and limit the scalability of MIR models. To address these issues, this thesis proposes a unified semantic model to foster interoperability and advance MIR tasks. A specific instance of this fragmentation can be found in harmonic annotations, where harmony is inconsistently represented across datasets, formats, and notational systems. Taking harmony as a use case, this thesis develops a standardised workflow to harmonise disconnected datasets, enabling the creation of large-scale unified corpora. Building on these harmonised datasets, a key contribution is the exploration of harmonic similarity to reveal connections across diverse tracks, periods, and genres through novel state-of-the-art similarity functions. While integrating symbolic data offers significant advantages, certain limitations persist. Primary challenges include the limited diversity of annotated data, often biased toward a narrow range of musical genres, and the inherent ambiguity and subjectivity of harmonic annotations. Such challenges have led MIR tasks like Audio Chord Estimation (ACE) to hit a "glass ceiling," where neither increasing computational power nor the volume of data has led to improved results. To address these issues, this thesis explores a multimodal approach combining audio and chord annotations. We propose a method for enriching datasets with aligned audio annotations and introduce a new ACE model that embeds music theory concepts such as consonance and dissonance. This model aims to mitigate chord vocabulary imbalance and annotation subjectivity, advancing the state of the art in audio-based harmonic analysis.
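To make the idea of harmonising heterogeneous chord annotations concrete, the sketch below reduces Harte-style chord labels (a common notational system in ACE datasets) to a shared maj/min vocabulary. This is a minimal, hypothetical illustration of the kind of vocabulary-mapping step such a workflow involves, not the thesis's actual pipeline; the mapping table and function names are the author's assumptions.

```python
import re

# Hypothetical mapping from a few Harte chord qualities to a reduced
# maj/min vocabulary; real harmonisation workflows cover far more qualities.
MAJMIN = {"maj": "maj", "maj7": "maj", "7": "maj", "6": "maj",
          "min": "min", "min7": "min", "min6": "min"}

def to_majmin(label: str) -> str:
    """Reduce a Harte-style label (e.g. 'C:maj7/3') to root:maj, root:min, or N."""
    if label in ("N", "X"):  # no-chord / unknown symbols stay out of the vocabulary
        return "N"
    # Root is A-G with optional sharps/flats; quality follows ':' up to any bass slash.
    m = re.match(r"^([A-G][#b]*)(?::([^/]+))?", label)
    if not m:
        return "N"
    root, quality = m.group(1), m.group(2) or "maj"  # bare root implies major
    reduced = MAJMIN.get(quality)
    return f"{root}:{reduced}" if reduced else "N"  # unmapped qualities -> no-chord
```

Collapsing every dataset's labels onto one such shared vocabulary is what allows otherwise disconnected corpora to be merged and compared.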
Document type
Doctoral thesis
Author
Poltronieri, Andrea
Cycle
37
Keywords
music information retrieval; harmony; computational musicology; chord estimation; signal processing; symbolic music processing; music similarity; ontology; semantic web; knowledge graph
DOI
10.48676/unibo/amsdottorato/11758
Date of defence
9 April 2025