The Web as a Historical Corpus: Collecting, Analysing and Selecting Sources on the Recent Past of Academic Institutions

Nanni, Federico (2017) The Web as a Historical Corpus: Collecting, Analysing and Selecting Sources on the Recent Past of Academic Institutions, [Dissertation thesis], Alma Mater Studiorum Università di Bologna. PhD programme in Science, cognition and technology, 29th cycle. DOI 10.6092/unibo/amsdottorato/7848.
Available under licence: Unless the author has granted broader permissions, the thesis may be freely consulted, and a copy may be saved and printed strictly for personal purposes of study, research and teaching; any directly or indirectly commercial use is expressly prohibited. All other rights to the material are reserved.


The goal of this thesis is to understand the impact that the transition from analogue to born-digital sources will have on the way historians collect, analyse and select primary evidence. In particular, it addresses the simultaneous scarcity and abundance of digital materials, and deals with these issues by combining the historical method with methodologies from the fields of internet studies and natural language processing. The case study focuses on gathering sources on the recent past of Italian academic institutions, with specific attention to the University of Bologna. The dissertation is organised in three main parts. Part I offers an extensive overview of the academic background in which this thesis is situated. Next, the so-called scarcity issue is addressed by considering university websites as primary sources for the study of the recent past of academic institutions. Combining traditional sources and methods with solutions from the field of internet studies, Part II presents how the digital past of the University of Bologna has been reconstructed. The collected resources made it possible to address the second issue, namely the abundance of born-digital sources. Part III focuses on collecting, analysing and selecting materials from large collections of academic publications. In particular, it stresses the importance of adopting methods from the field of natural language processing in a highly critical way. This point is illustrated by a case study focused on identifying interdisciplinary collaborations through the analysis of a corpus of Ph.D. dissertations. Based on the case studies presented, the final part of the dissertation describes how this work is intended to contribute to research in both digital humanities and historiography.

Document type
Doctoral thesis
Nanni, Federico
Doctoral programme
Disciplinary sector
Competition sector
Keywords
digital history, historiography, web archives, topic models, tool criticism, source criticism, interdisciplinarity, digital humanities
Defence date
7 June 2017
