Design, Creation and Evaluation of PodIT: A Corpus of Spoken Italian in the Media

Giacobbe, Joanna (2026) Design, Creation and Evaluation of PodIT: A Corpus of Spoken Italian in the Media. [Laurea magistrale], Università di Bologna, Corso di Studio in Specialized translation [LM-DM270] - Forlì

Full-text documents available:

PDF document (Thesis)
Available under license: Creative Commons Attribution-ShareAlike 4.0 (CC BY-SA 4.0)
Download (2MB)

Abstract

A corpus is a large collection of samples of authentic language use, selected to be representative of a whole language or language variety. It is the fundamental tool of corpus linguistics, a branch of linguistics that studies language through real-life examples of language use. The present dissertation addresses the design and collection of PodIT, a corpus of spoken Italian in the media. The corpus was designed to be representative and balanced; it was created by combining manual and automated processes, in order to achieve the best possible result within time and resource constraints. The corpus contains 100,000 words and is POS-tagged and lemmatized, and its texts carry metadata that can be leveraged to conduct comparative analyses within the corpus. PodIT was also analyzed in order to explore its potential. The analysis also demonstrated the validity of the corpus as a resource for studying both spoken Italian in the media and general spoken Italian.

Document type

Degree thesis (Laurea magistrale)

Thesis author

Giacobbe, Joanna

Thesis supervisor

Bernardini, Silvia

Thesis co-supervisors

Milicevic Petrovic, Maja ; Polizzi, Daniele

School

Lingue e Letterature, Traduzione e Interpretazione

Degree programme