Automatic Speech Recognition (ASR) and NMT for Interlingual and Intralingual Communication: Speech to Text Technology for Live Subtitling and Accessibility.

Gregori, Alessandro (2021) Automatic Speech Recognition (ASR) and NMT for Interlingual and Intralingual Communication: Speech to Text Technology for Live Subtitling and Accessibility., [Dissertation thesis], Alma Mater Studiorum Università di Bologna. Dottorato di ricerca in Traduzione, interpretazione e interculturalità, 33 Ciclo. DOI 10.48676/unibo/amsdottorato/9931.
Documenti full-text disponibili:
[img] Documento PDF (English) - Richiede un lettore di PDF come Xpdf o Adobe Acrobat Reader
Disponibile con Licenza: Salvo eventuali più ampie autorizzazioni dell'autore, la tesi può essere liberamente consultata e può essere effettuato il salvataggio e la stampa di una copia per fini strettamente personali di studio, di ricerca e di insegnamento, con espresso divieto di qualunque utilizzo direttamente o indirettamente commerciale. Ogni altro diritto sul materiale è riservato.
Download (3MB)

Abstract

Considered the increasing demand for institutional translation and the multilingualism of international organizations, the application of Artificial Intelligence (AI) technologies in multilingual communications and for the purposes of accessibility has become an important element in the production of translation and interpreting services (Zetzsche, 2019). In particular, the widespread use of Automatic Speech Recognition (ASR) and Neural Machine Translation (NMT) technology represents a recent development in the attempt of satisfying the increasing demand for interinstitutional, multilingual communications at inter-governmental level (Maslias, 2017). Recently, researchers have been calling for a universalistic view of media and conference accessibility (Greco, 2016). The application of ASR, combined with NMT, may allow for the breaking down of communication barriers at European institutional conferences where multilingualism represents a fundamental pillar (Jopek Bosiacka, 2013). In addition to representing a so-called disruptive technology (Accipio Consulting, 2006), ASR technology may facilitate the communication with non-hearing users (Lewis, 2015). Thanks to ASR, it is possible to guarantee content accessibility for non-hearing audience via subtitles at institutionally-held conferences or speeches. Hence the need for analysing and evaluating ASR output: a quantitative approach is adopted to try to make an evaluation of subtitles, with the objective of assessing its accuracy (Romero-Fresco, 2011). A database of F.A.O.’s and other international institutions’ English-language speeches and conferences on climate change is taken into consideration. The statistical approach is based on WER and NER models (Romero-Fresco, 2016) and on an adapted version. The ASR software solution implemented into the study will be VoxSigma by Vocapia Research and Google Speech Recognition engine. After having defined a taxonomic scheme, Native and Non-Native subtitles are compared to gold standard transcriptions. The intralingual and interlingual output generated by NMT is specifically analysed and evaluated via the NTR model to evaluate accuracy and accessibility.

Abstract
Tipologia del documento
Tesi di dottorato
Autore
Gregori, Alessandro
Supervisore
Co-supervisore
Dottorato di ricerca
Ciclo
33
Coordinatore
Settore disciplinare
Settore concorsuale
Parole chiave
automatic speech recognition, neural machine translation, artificial intelligence, accessibility, non-hearing users, deaf, institutional translation, institutional interpreting, climate change, multimedia accessibility
URN:NBN
DOI
10.48676/unibo/amsdottorato/9931
Data di discussione
26 Ottobre 2021
URI

Altri metadati

Statistica sui download

Gestione del documento: Visualizza la tesi

^