Design and Implementation of a Neural Machine Translation Engine for Computer-Assisted Translations

Saleem Butt, Rooshan (2023) Design and Implementation of a Neural Machine Translation Engine for Computer-Assisted Translations. [Master's thesis], Università di Bologna, Degree Programme in Artificial Intelligence [LM-DM270]
Full-text documents available:
PDF document (Thesis)
Available under license: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)

Abstract

This research investigates the development of a Neural Machine Translation (NMT) engine that integrates with Computer-Assisted Translation (CAT) software through an Application Programming Interface (API). The study reviews state-of-the-art NMT techniques and relevant pre-trained Language Models (LMs), including mT5, mBart, MarianMT, and SMaLL100. It extracts data from Trados Studio .tmx files, preprocesses it into datasets spanning 22 languages, and fine-tunes the pre-trained LMs on these datasets. The engine's translation quality is evaluated with several automatic metrics: BLEU, ROUGE, and semantic similarity (measured as cosine similarity). Integration with CAT software is provided by an API built with Flask, and a user-friendly web frontend gives browser-based access to the engine. The findings report a significant improvement in translation performance from integrating the NMT engine into CAT software, opening the way to practical use in real-world translation scenarios and giving human translators an efficient and powerful tool.
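As an illustration of the data-extraction step, the sketch below reads aligned segment pairs from a TMX file using only Python's standard library. It is not the thesis's code: it assumes TMX 1.4 files in which each tuv element carries an xml:lang attribute (falling back to the older lang attribute), and the names read_tmx_pairs, seg_text and primary_subtag are hypothetical. Inline elements such as bpt, ept and ph hold native formatting codes, so the sketch discards their content and keeps only the surrounding text.

```python
import io
import xml.etree.ElementTree as ET

# ElementTree's name for the xml:lang attribute used by TMX 1.4.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def primary_subtag(code):
    """Hypothetical helper: 'en-US' -> 'en', 'IT' -> 'it'."""
    return code.split("-")[0].lower()


def seg_text(seg):
    """Plain text of a <seg>, skipping the content of inline elements
    (<bpt>, <ept>, <ph>, ...) but keeping the text that follows them."""
    parts = [seg.text or ""]
    for child in seg:
        parts.append(child.tail or "")
    return "".join(parts).strip()


def read_tmx_pairs(source, src_lang, tgt_lang):
    """Yield (source, target) text pairs from a TMX file path or file object.

    Languages are matched on the primary subtag only (an assumption),
    so 'en' and 'en-GB' both match a tuv marked 'en-US'.
    """
    want_src, want_tgt = primary_subtag(src_lang), primary_subtag(tgt_lang)
    for tu in ET.parse(source).getroot().iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                code = tuv.get(XML_LANG) or tuv.get("lang") or ""
                segs[primary_subtag(code)] = seg_text(seg)
        src, tgt = segs.get(want_src), segs.get(want_tgt)
        if src and tgt:  # skip units missing a side or with empty text
            yield src, tgt


# Hypothetical one-unit TMX document for demonstration.
SAMPLE_TMX = b"""<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header srclang="en-US" datatype="plaintext" segtype="sentence" adminlang="en-US"
          creationtool="example" creationtoolversion="1" o-tmf="example"/>
  <body>
    <tu>
      <tuv xml:lang="en-US"><seg>Save the <bpt i="1">&lt;b&gt;</bpt>file<ept i="1">&lt;/b&gt;</ept>.</seg></tuv>
      <tuv xml:lang="it-IT"><seg>Salva il <bpt i="1">&lt;b&gt;</bpt>file<ept i="1">&lt;/b&gt;</ept>.</seg></tuv>
    </tu>
  </body>
</tmx>"""

pairs = list(read_tmx_pairs(io.BytesIO(SAMPLE_TMX), "en", "it"))
print(pairs)  # [('Save the file.', 'Salva il file.')]
```

Matching on the primary language subtag lets en-US and en-GB units feed the same English side; a real pipeline might prefer to keep regional variants apart.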
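The semantic-similarity score mentioned in the abstract reduces to the cosine of the angle between two embedding vectors, cos(u, v) = (u · v) / (‖u‖ ‖v‖). The sketch below computes it in plain Python on precomputed vectors. Which encoder produces the embeddings, and averaging per-sentence scores over a test set, are assumptions rather than details taken from the thesis; the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Return (u . v) / (|u| * |v|) for two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def mean_similarity(hyp_vectors, ref_vectors):
    """Average cosine similarity over aligned (hypothesis, reference) embeddings.

    Averaging per-sentence scores is an assumed aggregation.
    """
    if len(hyp_vectors) != len(ref_vectors) or not hyp_vectors:
        raise ValueError("need the same, non-zero number of hypotheses and references")
    scores = [cosine_similarity(h, r) for h, r in zip(hyp_vectors, ref_vectors)]
    return sum(scores) / len(scores)


# Toy 3-dimensional vectors standing in for real sentence embeddings.
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # 0.0
print(mean_similarity([[1.0, 2.0, 3.0], [1.0, 0.0, 0.0]],
                      [[2.0, 4.0, 6.0], [0.0, 1.0, 0.0]]))
```

Because the score depends only on direction, a translation whose embedding is a scaled copy of the reference's scores 1.0, and orthogonal embeddings score 0.0.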

Document type
Thesis (Master's degree)
Thesis author
Saleem Butt, Rooshan
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Degree programme regulations
DM270
Keywords
NMT, CAT, mT5, mBart, MarianMT, SMaLL100, Flask, NLTK, NLP, LMs
Thesis defense date
21 October 2023
URI