Abstract
A corpus is a large collection of samples of authentic language use, selected to be representative of a whole language or language variety. A corpus is the fundamental tool of corpus linguistics, a branch of linguistics that studies language through real-life examples of language use. The present dissertation addresses the design and collection of PodIT, a corpus of spoken Italian in the media. The corpus was designed to be representative and balanced, and it was built by combining manual and automated processes in order to achieve the best possible result within the available time and resources. The corpus contains 100,000 words; it is POS-tagged and lemmatized, and its texts carry relevant metadata that can be leveraged to conduct comparative analyses within the corpus. PodIT was also analyzed in order to explore its potential, and the analysis has shown that the corpus is a valid resource for studying both spoken Italian in the media and spoken Italian in general.
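The abstract notes that PodIT is POS-tagged and lemmatized and that its text-level metadata supports comparative analyses within the corpus. As an illustration only, the following minimal sketch assumes a hypothetical in-memory representation (each token carries a word form, a POS tag and a lemma; each text carries a `genre` metadata field) and compares the frequency of a lemma per million tokens across subcorpora defined by that metadata. The classes `Token` and `Text`, the function `per_million`, the `genre` field, the tag values and the toy data are all invented for the example and do not reflect PodIT's actual format, annotation scheme or content.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    form: str   # surface word form as transcribed
    pos: str    # POS tag (UPOS-style tags are an assumption)
    lemma: str  # lemma assigned during lemmatization


@dataclass
class Text:
    genre: str  # hypothetical text-level metadata field used to split subcorpora
    tokens: list[Token] = field(default_factory=list)


def per_million(texts: list[Text], genre: str, lemma: str) -> float:
    """Frequency of `lemma` per million tokens in the subcorpus of texts with `genre`."""
    subset = [t for t in texts if t.genre == genre]
    total = sum(len(t.tokens) for t in subset)
    if total == 0:
        return 0.0  # empty subcorpus: no basis for a frequency
    hits = sum(1 for t in subset for tok in t.tokens if tok.lemma == lemma)
    # Normalize to a common base so subcorpora of different sizes are comparable.
    return hits * 1_000_000 / total


# Toy data invented for illustration; not taken from PodIT.
texts = [
    Text("interview", [
        Token("Allora", "ADV", "allora"),
        Token("è", "AUX", "essere"),
        Token("vero", "ADJ", "vero"),
        Token("allora", "ADV", "allora"),
    ]),
    Text("news", [
        Token("Il", "DET", "il"),
        Token("governo", "NOUN", "governo"),
        Token("è", "AUX", "essere"),
        Token("pronto", "ADJ", "pronto"),
        Token("oggi", "ADV", "oggi"),
    ]),
]

for genre in ("interview", "news"):
    print(genre, per_million(texts, genre, "allora"), per_million(texts, genre, "essere"))
```

Counting lemmas rather than word forms groups "Allora" and "allora" (and, in a real corpus, all inflected forms of a verb) under one entry, and normalizing per million tokens is what makes the counts comparable even though the two toy subcorpora have four and five tokens respectively.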
Document type
Degree thesis
(Master's degree)
Thesis author
Giacobbe, Joanna
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Curriculum
CURRICULUM TRANSLATION AND TECHNOLOGY
Degree programme regulations
DM270
Keywords
corpus, corpus linguistics, spoken language, spoken Italian, corpus building
Thesis defence date
20 March 2026
URI