Ianaro, Martina
(2023)
Next Pictogram Prediction via Vision-Language Modeling: Enhancing Communication of Autistic Children.
[Master's degree thesis], Università di Bologna, Degree Programme in
Artificial intelligence [LM-DM270], restricted-access document.
Abstract
Individuals with complex communication needs often face speech barriers that hinder social inclusion. As I. Estrada put it, "If a child cannot learn the way we teach, maybe we should teach the way they learn." This philosophy guides our research, which focuses on empowering children with autism and other communication disorders to use Augmentative and Alternative Communication systems effectively, by better predicting their needs.
Recent breakthroughs in multi-modal AI have opened new avenues for improving the lives of these individuals. In this context, we introduce PictoViLT, a Vision-and-Language Transformer encoder fine-tuned on text datasets and ARASAAC pictograms. Our approach employs self-supervised masking techniques that progress from a single masked text token to multiple masked tokens, directly addressing the challenge of predicting the next pictogram.
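To make the masking curriculum concrete, the following is a minimal sketch, not taken from the thesis code: it masks the tail of a tokenized sentence with a growing number of [MASK] placeholders using a BERT-style tokenizer (the tokenizer family ViLT builds on). The example sentence and the mask sizes are assumptions for illustration only.

```python
# Minimal sketch (assumption, not the PictoViLT training code): progressively
# mask the tail of a sentence, moving from a single masked token to a masked
# span, mirroring the single-to-multiple token masking described above.
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")  # ViLT's text side uses a BERT tokenizer

def mask_tail(sentence: str, n_masked: int = 1) -> str:
    """Replace the last n_masked word-piece tokens with [MASK] placeholders."""
    tokens = tokenizer.tokenize(sentence)
    n_masked = min(n_masked, len(tokens))
    masked = tokens[:-n_masked] + [tokenizer.mask_token] * n_masked
    return tokenizer.convert_tokens_to_string(masked)

# Curriculum from single-token to multi-token masking (example sentence is illustrative)
for n in (1, 2, 3):
    print(mask_tail("i want to drink water", n_masked=n))
```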
PictoViLT removes the strict dependence on WordNet concept sequences as input and can handle natural-language and pictogram sequences directly. In-depth experiments on several datasets, grounded in commonsense resources and verbalized knowledge graphs, show significant improvements over prior state-of-the-art models: compared with PictoBERT and statistical n-gram models, PictoViLT gains up to +0.60 Top-1 accuracy points. Finally, token–patch alignments and attention areas make the predictions interpretable.
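As a hedged illustration of how a vision-and-language masked-language model can rank candidate words for the next pictogram slot, the sketch below uses the publicly available dandelin/vilt-b32-mlm checkpoint and a placeholder pictogram file as stand-ins; the actual PictoViLT weights, vocabulary, and pictogram inputs are not reproduced here.

```python
# Illustrative inference sketch (assumptions: the checkpoint and the image path
# are placeholders, not the PictoViLT model). It scores vocabulary tokens for a
# masked slot given a pictogram image and a partial sentence.
import torch
from PIL import Image
from transformers import ViltProcessor, ViltForMaskedLM

processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-mlm")
model = ViltForMaskedLM.from_pretrained("dandelin/vilt-b32-mlm")

image = Image.open("arasaac_pictogram.png").convert("RGB")  # hypothetical ARASAAC pictogram file
text = "i want to drink [MASK]"

inputs = processor(image, text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # (batch, sequence_length, vocab_size)

# Locate the masked position and read off the top-5 candidate tokens
mask_pos = (inputs.input_ids == processor.tokenizer.mask_token_id).nonzero()[0, 1]
top5 = logits[0, mask_pos].topk(5).indices
print(processor.tokenizer.convert_ids_to_tokens(top5.tolist()))
```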
Document type
Degree thesis
(Master's degree)
Thesis author
Ianaro, Martina
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Degree programme regulations (Ordinamento CdS)
DM270
Keywords
Large Language Model, Multimodal AI, Language modeling, Natural Language Processing, Text Mining, Computer Vision, Transformer, Vision-and-Language Transformers, Deep Learning, Augmentative and alternative communication, Pictogram Prediction, Autism
Thesis defence date
21 October 2023
URI