Bitext alignment: building and evaluating a bilingual corpus and translation memory of academic course descriptions

Cocozza, Daniele (2015) Bitext alignment: building and evaluating a bilingual corpus and translation memory of academic course descriptions. [Laurea magistrale (Master's thesis)], Università di Bologna, Corso di Studio in Traduzione specializzata [LM-DM270] - Forlì

Following the internationalization of contemporary higher education, academic institutions based in non-English-speaking countries are increasingly urged to produce content in English, both to address prospective international students and staff and to increase their attractiveness. The demand for English translations in the institutional academic domain is consequently growing faster than the translation profession can absorb. Resources that assist non-native authors and translators in producing appropriate L2 texts are therefore needed to help academic institutions and professionals streamline their translation workload. These resources include: (i) parallel corpora to train machine translation systems and multilingual authoring tools; and (ii) translation memories for computer-aided translation tools. The purpose of this study is to create and evaluate reference resources of the kinds mentioned in (i) and (ii) through the automatic sentence alignment of a large set of Italian and English as a Lingua Franca (ELF) institutional academic texts that are presented as equivalent but are not necessarily parallel (i.e. translated). Within this framework, a set of alignment algorithms and alignment tools is examined in order to identify the most suitable one(s) in terms of accuracy and time- and cost-effectiveness. To determine which text pairs to align, a sample is selected according to document length similarity (measured in characters) and then evaluated in terms of degree of noise/parallelism, alignment accuracy and content leverageability. The results of these analyses serve as the basis for the creation of an aligned bilingual corpus of academic course descriptions, which is ultimately used to create a translation memory in TMX format.
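
As a rough illustration of two steps named in the abstract (selecting text pairs by character-length similarity and exporting aligned segments to TMX), the following Python sketch may help. It is not the procedure or tooling used in the thesis: the function names, the `min_ratio` cut-off of 0.8 and the `creationtool` value are hypothetical, and the sentence alignment itself is assumed to have been done already by a separate tool.

```python
import xml.etree.ElementTree as ET

# Qualified name of the xml:lang attribute used on TMX <tuv> elements.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def length_ratio(text_a: str, text_b: str) -> float:
    """Return the shorter/longer character-length ratio, in [0, 1]."""
    longer = max(len(text_a), len(text_b))
    if longer == 0:
        return 1.0  # two empty texts are trivially the same length
    return min(len(text_a), len(text_b)) / longer


def select_pairs(doc_pairs, min_ratio=0.8):
    """Keep (Italian, English) document pairs of similar length.

    min_ratio is a hypothetical cut-off chosen for illustration only.
    """
    return [(it, en) for it, en in doc_pairs
            if length_ratio(it, en) >= min_ratio]


def build_tmx(segment_pairs, src_lang="it", tgt_lang="en"):
    """Build a minimal TMX 1.4 tree from already aligned segment pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "none",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in segment_pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per pair
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.ElementTree(tmx)


if __name__ == "__main__":
    docs = [
        ("Il corso introduce la statistica.",
         "The course introduces statistics."),
        ("Breve.",
         "A much longer English description with no real counterpart."),
    ]
    kept = select_pairs(docs)  # the second pair is filtered out
    # Here each kept document happens to be a single segment.
    tree = build_tmx(kept)
    tree.write("course_descriptions.tmx", encoding="utf-8",
               xml_declaration=True)
```

A simple length ratio is only a coarse pre-filter: it can flag pairs that are unlikely to be translations of each other, but it says nothing about whether the content actually corresponds, which is why the abstract describes a separate evaluation of noise/parallelism and alignment accuracy.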

Document type
Degree thesis (Laurea magistrale)
Thesis author
Cocozza, Daniele
Thesis supervisor
Degree programme
Degree programme regulations
Keywords
alignment, corpora, translation technology, English as a Lingua Franca, academic course descriptions
Thesis defence date
12 March 2015
