Code stylometry, a metric learning approach

Gurioli, Andrea (2023) Code stylometry, a metric learning approach. [Master's degree thesis], Università di Bologna, Degree Programme in Computer Science [LM-DM270]
Full-text documents available:
PDF document (Thesis)
Available under license: Creative Commons Attribution - NonCommercial - NoDerivatives 4.0 (CC BY-NC-ND 4.0)


Abstract

Authorship attribution of source code, also known as code stylometry, has long been an important source of information for plagiarism detection and de-anonymization tasks, and many different approaches to identifying the author have been proposed over the years. This work addresses the problem end to end. It starts with the mining of a new dataset that tackles the data scarcity and domain bias problems that affected earlier works. It then presents a new machine learning model design, derived from former state-of-the-art techniques, that tries to take advantage of natural language processing practices adopted by the newest language models. Finally, the problem is approached through a metric learning technique, treating stylometry for the first time as a snippet querying mechanism that allows zero-shot inference over authors not present in the training set.
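The querying idea in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's model: the `embed` function, the `STYLE_MARKERS` list, the `attribute` helper, and the author names are all hypothetical stand-ins, with hand-picked style markers used in place of a learned encoder so that the example runs on the standard library alone. What it shows is the inference step only: a query snippet is embedded and matched against reference snippets by similarity, so an author unseen during training can be handled by adding a reference snippet rather than retraining.

```python
import math

# Hypothetical stand-in for a trained encoder. A metric-learning model would
# map a snippet to a learned embedding; these hand-picked style markers exist
# only to make the sketch runnable.
STYLE_MARKERS = ["\t", "    ", "_", "{\n", "\n{", "++", "+= 1", "//", "/*"]


def embed(snippet):
    """Count each style marker and scale the vector to unit length."""
    counts = [float(snippet.count(marker)) for marker in STYLE_MARKERS]
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0  # avoid dividing by 0
    return [c / norm for c in counts]


def cosine(a, b):
    """Dot product; equals cosine similarity because embed() normalizes."""
    return sum(x * y for x, y in zip(a, b))


def attribute(query, gallery):
    """Return the author of the reference snippet most similar to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(code)), author) for author, code in gallery]
    return max(scored)[1]


# Reference snippets for two made-up authors, neither seen in any training.
gallery = [
    ("alice", "int count_items(int *a, int n)\n{\n\tint total_sum = 0;\n"
              "\tfor (int i = 0; i < n; i += 1)\n\t\ttotal_sum += a[i];\n"
              "\treturn total_sum;\n}\n"),
    ("bob", "int countItems(int[] a) {\n    int totalSum = 0; // running sum\n"
            "    for (int i = 0; i < a.length; i++) {\n"
            "        totalSum += a[i];\n    }\n    return totalSum;\n}\n"),
]

query = ("void reset_all(int *a, int n)\n{\n"
         "\tfor (int i = 0; i < n; i += 1)\n\t\ta[i] = 0;\n}\n")

print(attribute(query, gallery))
```

In a metric learning setting the encoder would instead be trained so that snippets by the same author land close together, while the lookup step would stay essentially the same as in `attribute`.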

Document type
Degree thesis (Master's degree)
Thesis author
Gurioli, Andrea
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Curriculum
Curriculum A: Software Techniques
Degree programme regulations
DM270
Keywords
Code stylometry, Machine learning, Data mining, Metric learning
Thesis defence date
16 March 2023
URI
