Design, Creation and Evaluation of PodIT: A Corpus of Spoken Italian in the Media

Giacobbe, Joanna (2026) Design, Creation and Evaluation of PodIT: A Corpus of Spoken Italian in the Media. [Master's thesis (Laurea magistrale)], Università di Bologna, Degree Programme in Specialized translation [LM-DM270] - Forlì
Full-text documents available:
PDF document (Thesis)
Available under license: Creative Commons Attribution-ShareAlike 4.0 (CC BY-SA 4.0)

Download (2MB)

Abstract

A corpus is a large collection of samples of authentic language use, selected to be representative of a whole language or language variety. It is the fundamental tool of corpus linguistics, a branch of linguistics that studies language through real-life examples of use. This dissertation addresses the design and collection of PodIT, a corpus of spoken Italian in the media. PodIT was designed to be representative and balanced, and it was built by combining manual and automated processes to achieve the best possible result within time and resource constraints. The corpus contains 100,000 words, is POS-tagged and lemmatized, and its texts carry metadata that can be leveraged for comparative analyses within the corpus. PodIT was also analyzed to explore its potential. The analysis also confirmed the validity of the corpus as a resource for studying both spoken Italian in the media and general spoken Italian.

Document type
Thesis (Master's degree, Laurea magistrale)
Thesis author
Giacobbe, Joanna
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Curriculum
CURRICULUM TRANSLATION AND TECHNOLOGY
Degree programme regulation
DM270
Keywords
corpus, corpus linguistics, spoken language, spoken Italian, corpus building
Thesis defense date
20 March 2026
URI
