Next Pictogram Prediction via Vision-Language Modeling: Enhancing Communication of Autistic Children

Ianaro, Martina (2023) Next Pictogram Prediction via Vision-Language Modeling: Enhancing Communication of Autistic Children. [Master's thesis (Laurea magistrale)], Università di Bologna, Degree programme in Artificial intelligence [LM-DM270]


Full-text documents available:

PDF document (Thesis)
Available under license: Creative Commons: Attribution - NonCommercial - NoDerivatives 4.0 (CC BY-NC-ND 4.0)

Abstract

Individuals with complex communication needs often face speech barriers that hinder social inclusion. As I. Estrada put it, "If a child cannot learn the way we teach, maybe we should teach the way they learn." This philosophy guides our research, which aims to help children with autism and other communication disorders use Augmentative and Alternative Communication (AAC) systems effectively by better predicting their needs. Recent breakthroughs in multimodal AI have opened new avenues for improving the lives of these individuals. In this context, we introduce PictoViLT, a Vision-and-Language Transformer encoder fine-tuned on text datasets and ARASAAC pictograms. Our approach employs several self-supervised masking techniques, progressing from masking a single text token to masking multiple tokens, to address the task of predicting the next pictogram. PictoViLT removes the strict dependence on WordNet concept sequences as input and can directly handle both natural language and pictogram sequences. In-depth experiments on several datasets grounded in commonsense resources and verbalized knowledge graphs show significant improvements over prior state-of-the-art models: compared with PictoBERT and statistical n-gram models, PictoViLT achieves up to +0.60 Top-1 accuracy points. Finally, token-patch alignments and attention regions make its predictions interpretable.
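
The abstract frames next-pictogram prediction as a masking task evaluated with Top-1 accuracy. The snippet below is a minimal, hypothetical sketch of that idea, not PictoViLT's actual training code. It assumes that "progressing from a single to multiple tokens" means masking the last one or the last k tokens of a sequence. The names `mask_trailing`, `top1_accuracy`, and the `[MASK]` string are illustrative only. A real ViLT pipeline would work on subword IDs produced by a tokenizer and pair the text with pictogram image patches; both are omitted here, and the thesis's exact masking schedule is not specified in the abstract.

```python
from typing import List, Tuple

# Hypothetical mask symbol; a real tokenizer would use its own mask token ID.
MASK = "[MASK]"


def mask_trailing(tokens: List[str], k: int = 1) -> Tuple[List[str], List[str]]:
    """Replace the last k tokens with MASK (assumed scheme).

    Returns the masked input sequence and the hidden target tokens,
    which a masked language model would be trained to recover.
    k = 1 corresponds to single-token masking; k > 1 to multi-token masking.
    """
    if not 1 <= k <= len(tokens):
        raise ValueError("k must be between 1 and len(tokens)")
    masked = tokens[:-k] + [MASK] * k
    targets = tokens[-k:]
    return masked, targets


def top1_accuracy(predictions: List[str], gold: List[str]) -> float:
    """Fraction of positions where the single highest-ranked prediction is correct."""
    if not gold or len(predictions) != len(gold):
        raise ValueError("predictions and gold must be non-empty and of equal length")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


if __name__ == "__main__":
    sentence = ["i", "want", "to", "eat", "an", "apple"]
    print(mask_trailing(sentence, k=1))  # last token hidden
    print(mask_trailing(sentence, k=2))  # last two tokens hidden
    print(top1_accuracy(["apple", "drink", "play"], ["apple", "water", "play"]))
```

Masking only trailing positions (rather than random ones, as in standard BERT-style pretraining) is a design choice made here so that the training objective matches the inference task of completing a sequence with its next item.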


Document type

Thesis (Master's degree, Laurea magistrale)

Thesis author

Ianaro, Martina

Thesis supervisor

Moro, Gianluca

Thesis co-supervisor

Frisoni, Giacomo

School

Engineering and Architecture

Degree programme

Artificial intelligence [LM-DM270]

Degree programme regulations

DM270

Keywords

Large Language Model, Multimodal AI, Language modeling, Natural Language Processing, Text Mining, Computer Vision, Transformer, Vision-and-Language Transformers, Deep Learning, Augmentative and alternative communication, Pictogram Prediction, Autism

Thesis defense date

21 October 2023

URI

https://amslaurea.unibo.it/id/eprint/30081
