Gurioli, Andrea
(2023)
Code stylometry, a metric learning approach.
[Master's degree thesis (Laurea magistrale)], Università di Bologna, Degree programme in
Informatica [LM-DM270]
Abstract
Authorship attribution, also known as code stylometry, has long been an
important source of information for plagiarism detection and
de-anonymization tasks, and over the years the author of a piece of code has
been assessed in several different ways. This work addresses the problem as
a whole. It starts by mining a new dataset to counter the data scarcity and
domain bias problems that afflicted earlier works. It then designs a new
machine learning model, derived from earlier state-of-the-art techniques,
that tries to benefit from the natural language processing practices adopted
by the newest language models. Finally, the problem is tackled with a metric
learning technique that, for the first time, treats stylometry as a
snippet-querying mechanism, allowing zero-shot inference over authors not
present in the training set.
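To make the idea of querying by snippet more concrete, the following is a minimal sketch of zero-shot attribution in an embedding space. It is not the thesis's model: it assumes that some encoder, trained with a metric learning objective, has already mapped each snippet to a vector, and it replaces those vectors with made-up toy values. The names `rank_authors`, `gallery`, and the author labels are hypothetical and used only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for degenerate (all-zero) vectors
    return dot / (norm_a * norm_b)


def rank_authors(query, gallery):
    """Rank authors by their most similar reference snippet to the query.

    `gallery` maps an author label to a list of reference embeddings.
    Authors in the gallery need not have been seen while training the
    encoder, which is what makes the lookup zero-shot.
    """
    scores = {
        author: max(cosine_similarity(query, ref) for ref in refs)
        for author, refs in gallery.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy embeddings (assumption: invented values standing in for encoder output).
gallery = {
    "author_a": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "author_b": [[0.1, 0.9, 0.2]],
    "author_c": [[0.0, 0.2, 0.9]],
}
query = [0.85, 0.15, 0.05]

ranking = rank_authors(query, gallery)
best_author = ranking[0][0]
print(best_author)
```

In this sketch, adding a new author only requires adding reference embeddings to `gallery`; no retraining step is involved, which is the property the abstract refers to as zero-shot inference.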
Document type
Degree thesis
(Master's degree)
Thesis author
Gurioli, Andrea
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Curriculum
CURRICULUM A: TECNICHE DEL SOFTWARE
Degree programme regulations
DM270
Keywords
Code stylometry, Machine learning, Data mining, Metric learning
Thesis defence date
16 March 2023
URI