Ferri, Veronica
(2015)
Estrazione terminologica automatica: sistemi a confronto.
[Laurea magistrale], Università di Bologna, Corso di Studio in
Traduzione specializzata [LM-DM270] - Forli', Documento ad accesso riservato.
Documenti full-text disponibili:
Abstract
In any terminological study, candidate term extraction is a very time-consuming task. Corpus analysis tools have automatized some processes allowing the detection of relevant data within the texts, facilitating term candidate selection as well. Nevertheless, these tools are (normally) not specific for terminology research; therefore, the units which are automatically extracted need manual evaluation. Over the last few years some software products have been specifically developed for automatic term extraction. They are based on corpus analysis, but use linguistic and statistical information to filter data more precisely. As a result, the time needed for manual evaluation is reduced. In this framework, we tried to understand if and how these new tools can really be an advantage. In order to develop our project, we simulated a terminology study: we chose a domain (i.e. legal framework for medicinal products for human use) and compiled a corpus from which we extracted terms and phraseologisms using AntConc, a corpus analysis tool. Afterwards, we compared our list with the lists extracted automatically from three different tools (TermoStat Web, TaaS e Sketch Engine) in order to evaluate their performance.
In the first chapter we describe some principles relating to terminology and phraseology in language for special purposes and show the advantages offered by corpus linguistics. In the second chapter we illustrate some of the main concepts of the domain selected, as well as some of the main features of legal texts. In the third chapter we describe automatic term extraction and the main criteria to evaluate it; moreover, we introduce the term-extraction tools used for this project. In the fourth chapter we describe our research method and, in the fifth chapter, we show our results and draw some preliminary conclusions on the performance and usefulness of term-extraction tools.
Abstract
In any terminological study, candidate term extraction is a very time-consuming task. Corpus analysis tools have automatized some processes allowing the detection of relevant data within the texts, facilitating term candidate selection as well. Nevertheless, these tools are (normally) not specific for terminology research; therefore, the units which are automatically extracted need manual evaluation. Over the last few years some software products have been specifically developed for automatic term extraction. They are based on corpus analysis, but use linguistic and statistical information to filter data more precisely. As a result, the time needed for manual evaluation is reduced. In this framework, we tried to understand if and how these new tools can really be an advantage. In order to develop our project, we simulated a terminology study: we chose a domain (i.e. legal framework for medicinal products for human use) and compiled a corpus from which we extracted terms and phraseologisms using AntConc, a corpus analysis tool. Afterwards, we compared our list with the lists extracted automatically from three different tools (TermoStat Web, TaaS e Sketch Engine) in order to evaluate their performance.
In the first chapter we describe some principles relating to terminology and phraseology in language for special purposes and show the advantages offered by corpus linguistics. In the second chapter we illustrate some of the main concepts of the domain selected, as well as some of the main features of legal texts. In the third chapter we describe automatic term extraction and the main criteria to evaluate it; moreover, we introduce the term-extraction tools used for this project. In the fourth chapter we describe our research method and, in the fifth chapter, we show our results and draw some preliminary conclusions on the performance and usefulness of term-extraction tools.
Tipologia del documento
Tesi di laurea
(Laurea magistrale)
Autore della tesi
Ferri, Veronica
Relatore della tesi
Scuola
Corso di studio
Ordinamento Cds
DM270
Parole chiave
corpora, terminologia, termini, estrazione automatica
Data di discussione della Tesi
12 Marzo 2015
URI
Altri metadati
Tipologia del documento
Tesi di laurea
(NON SPECIFICATO)
Autore della tesi
Ferri, Veronica
Relatore della tesi
Scuola
Corso di studio
Ordinamento Cds
DM270
Parole chiave
corpora, terminologia, termini, estrazione automatica
Data di discussione della Tesi
12 Marzo 2015
URI
Statistica sui download
Gestione del documento: