Full-text documents available:
PDF document (Thesis)
Available under licence: Unless the author grants broader permissions, the thesis may be freely consulted, and one copy may be saved and printed strictly for personal study, research and teaching purposes; any directly or indirectly commercial use is expressly forbidden. All other rights to the material are reserved.
Abstract
This dissertation addresses definitional context extraction and automatic definitions linking in Italian and English. Definitional context extraction is not limited to glossaries and encyclopaedias; it has also been addressed in the field of Natural Language Processing. In this research, the objective is to identify definitional contexts in food-related Wikipedia articles. As a basis for the work, we built two ad hoc corpora from the Italian and English Wikipedia dumps. We trained two BERT models in a supervised fashion on a manually annotated dataset. The F1 scores of 96.08 and 97.66 indicate high performance. We then fed each model 30 Wikipedia articles randomly drawn from the corresponding corpus, Italian articles for one model and English articles for the other. We obtained the best results by restricting the selection to the first sentence of the article whose BERT positive score is above 0.6.
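The selection rule in the last sentence can be read in two ways: only the opening sentence of the article is considered and kept if its score exceeds 0.6, or the earliest sentence scoring above 0.6 is kept. The sketch below covers both readings through a flag. It is an illustration only, not the thesis code: the function name `select_definition`, the `opening_only` flag and the example scores are assumptions made up here, and the scores stand in for the positive-class output of a trained classifier.

```python
from typing import List, Optional, Tuple

THRESHOLD = 0.6  # positive-score cut-off quoted in the abstract


def select_definition(
    scored: List[Tuple[str, float]],
    threshold: float = THRESHOLD,
    opening_only: bool = True,
) -> Optional[str]:
    """Return the selected definitional sentence, or None.

    `scored` holds (sentence, positive_score) pairs in article order.
    With opening_only=True only the article's first sentence is a candidate;
    with opening_only=False the earliest sentence above the threshold wins.
    """
    candidates = scored[:1] if opening_only else scored
    for sentence, score in candidates:
        if score > threshold:
            return sentence
    return None


# Invented example scores, for illustration only.
article = [
    ("Pesto is a sauce originating in Genoa.", 0.93),
    ("It is traditionally prepared in a mortar.", 0.21),
]
late_definition = [
    ("See also the list of sauces.", 0.10),
    ("Pesto is a sauce originating in Genoa.", 0.88),
]

print(select_definition(article))
print(select_definition(late_definition))
print(select_definition(late_definition, opening_only=False))
```

Under the first reading, an article whose opening sentence scores below the threshold yields no definition at all; under the second, a later sentence can still be selected.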
The task of automatic definitions linking is loosely based on the wikification process. Rather than linking a term to its corresponding Wikipedia article, we aim to link a term to its corresponding definition in a Wikipedia article. As a foundation for the task, we built two ad hoc corpora from the Italian and English versions of a cooking website. We created a pipeline for automatic definitions linking and carried out a successful experiment using the title of a recipe as input text; the output is a minimal HTML version of the input in which terms are linked to their corresponding Wikipedia articles. Definitions linking itself is one of the two steps still missing from the pipeline and is discussed in the conclusions.
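The output format of the recipe-title experiment (a minimal HTML rendering whose terms point to Wikipedia articles) can be pictured with a small sketch. This is not the thesis pipeline: the function `link_terms`, the term-to-title lookup table and the whitespace tokenisation are simplifying assumptions introduced here. A real system would need term detection and disambiguation.

```python
from html import escape
from typing import Dict
from urllib.parse import quote


def link_terms(text: str, term_to_title: Dict[str, str], lang: str = "en") -> str:
    """Wrap known single-word terms in links to their Wikipedia articles.

    `term_to_title` is a hypothetical lookup from lower-cased term to
    Wikipedia article title; `lang` selects the Wikipedia language edition.
    """
    pieces = []
    for token in text.split():
        title = term_to_title.get(token.lower())
        if title is None:
            # Unknown token: emit it as escaped plain text.
            pieces.append(escape(token))
        else:
            url = f"https://{lang}.wikipedia.org/wiki/{quote(title.replace(' ', '_'))}"
            pieces.append(f'<a href="{escape(url, quote=True)}">{escape(token)}</a>')
    return " ".join(pieces)


# Hypothetical recipe title and lookup table.
print(link_terms("Spaghetti with pesto", {"pesto": "Pesto"}))
```

Linking to a definition rather than to the whole article, which the abstract names as a missing step, would require pointing at a specific sentence inside the article instead of the article URL.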
Document type
Degree thesis
(Master's degree)
Thesis author
Martinelli, Margherita
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Degree programme regulations
DM270
Keywords
computational linguistics, definitional context, definitional context extraction, definitions linking, python, corpora, wikipedia
Thesis defence date
15 March 2022
URI