Paradisi, Arianna
(2025)
Learner Corpora and Artificial Intelligence: Towards Error Annotation of a Corpus of Italian EFL Students' Interactions with Chatbots.
[Master's thesis], Università di Bologna, Degree Programme in
Specialized translation [LM-DM270] - Forlì
Abstract
This thesis is part of the UNITE (Universally inclusive technologies to practice English) project, which aims to create and analyse a learner corpus of interactions between Italian students of English as a Foreign Language (EFL) and chatbots. The thesis presents two case studies: one on the error annotation of a sample of texts from the corpus, and one on the possibility of using ChatGPT to automate the error annotation process. In the first case study, students’ conversational turns from 23 texts were annotated using the Louvain Error Tagging Manual Version 2.0; this led to a refinement of the error taxonomy so that it would fit the conversational nature of the UNITE corpus. Among other results, the distribution of errors annotated with the refined tagset showed that the corpus presents several features commonly associated with digitally mediated communication, with orthographic and morphological errors being the most frequent types of linguistic error. The second case study was a proof-of-concept experiment in which a custom GPT powered by the ChatGPT-4o model was created and used to error-annotate four texts from the manually annotated sample. A comparison of the GPT’s output with the human annotations showed that the chatbot reached an acceptable level of accuracy. This suggests that, with due caution, it may be used as a preliminary tool for error annotation, provided that its output is then carefully revised and post-edited.
Document type
Degree thesis
(Master's degree)
Thesis author
Paradisi, Arianna
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Curriculum
CURRICULUM SPECIALIZED TRANSLATION
Degree programme regulations
DM270
Keywords
artificial intelligence, large language models, chatbots, language learning, dialogue-based Computer-Assisted Language Learning, learner corpora, corpus annotation, error annotation
Thesis defence date
18 March 2025
URI