Advanced techniques for cross-language annotation projection in legal texts

Antici, Francesco (2021) Advanced techniques for cross-language annotation projection in legal texts. [Laurea magistrale], Università di Bologna, Corso di Studio in Artificial intelligence [LM-DM270]
Documenti full-text disponibili:
[img] Documento PDF (Thesis)
Disponibile con Licenza: Salvo eventuali più ampie autorizzazioni dell'autore, la tesi può essere liberamente consultata e può essere effettuato il salvataggio e la stampa di una copia per fini strettamente personali di studio, di ricerca e di insegnamento, con espresso divieto di qualunque utilizzo direttamente o indirettamente commerciale. Ogni altro diritto sul materiale è riservato

Download (602kB)


Nowadays, the majority of the services we benefit from, are provided online and their use is regulated by the acceptance to the terms of service by the users. All our data are handled accordingly with the clauses of such document and all our behaviours must comply with it. Given so, it would be very useful to find automated techniques to ensure fairness of the document or inform the users about possible threats. The focus of this work, is to create resources aimed to the development of such tools in languages other than English, which may lack in linguistic resources and annotated corpus. The enormous breakthroughs of the last years in Natural Language Processing techniques made it possible the creation of such tools through automated and unsupervised process. One of the means to achieve that is through the annotation projection between two parallel corpora. The difficulties and costs of creating ad hoc resource for every language has brought the need to find another way for achieving the goal.\\ This work investigates the cross language annotation projection technique based on sentence embedding and similarity metrics to find matches between sentences. Several combination of methods and algorithms are compared, among which there are monolingual and multilingual embedding neural models. The experiments are conducted on two datasets, where the reference language is always English and the projection are evaluated on Italian, German and Polish. The results obtained provide a robust and reliable technique for the task and a good starting point to build multilingual tools.

Tipologia del documento
Tesi di laurea (Laurea magistrale)
Autore della tesi
Antici, Francesco
Relatore della tesi
Correlatore della tesi
Corso di studio
Ordinamento Cds
Parole chiave
Natural Language Processing,Artificial Intelligence,Annotation Projection,Cross Language Annotation,Word Embedding
Data di discussione della Tesi
21 Luglio 2021

Altri metadati

Statistica sui download

Gestione del documento: Visualizza il documento