Multimodal Retrieval-Enhanced Large Language Models for Pictogram Interaction in Augmentative and Alternative Communication

Lucchiari, Laura (2026) Multimodal Retrieval-Enhanced Large Language Models for Pictogram Interaction in Augmentative and Alternative Communication. [Master's thesis (Laurea magistrale)], Università di Bologna, Corso di Studio in Artificial intelligence [LM-DM270], restricted-access document.
Full-text documents available:
PDF document (Thesis)
Full text not accessible until 31 January 2027.
Available under license: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)


Abstract

Pictogram-based Augmentative and Alternative Communication (AAC) systems are essential for individuals with complex communication needs, yet navigating large symbol vocabularies remains slow and cognitively demanding. This thesis investigates multimodal, retrieval-enhanced methods to streamline pictogram selection by aligning natural-language intent with AAC repositories, specifically the ARASAAC library. Analysis of ARASAAC metadata reveals significant heterogeneity. To address this, a description-generation pipeline was developed using a vision-language model to assign each pictogram a consistent, keyword-grounded English phrase. Quantitative validation via embedding-based metrics demonstrates that these descriptions provide superior alignment and coverage compared to the original metadata. A structured sentence-concept-pictogram dataset was constructed using CommonGen as a grammatical backbone, with concept-to-pictogram mappings reconstructed through a multi-stage pipeline combining dense retrieval, multimodal filtering, and LLM-based selection. Using this resource, a CLIP-style multimodal retriever was fine-tuned for prefix-based pictogram completion. Evaluations show the fine-tuned model significantly outperforms frozen baselines, with Recall@1 increasing from 4.23% to 10.66% and Recall@5 from 13.07% to 27.79%, demonstrating the effectiveness of domain-adapted retrieval for predictive AAC. Finally, an original, profile-aware AAC sentence corpus is introduced, designed to reflect clinically grounded user profiles and pragmatic intent. By modeling variability across age groups and goals, this resource enables ecologically valid evaluation. The corpus is paired with a retrieval pipeline to stress-test reranking and selection under realistic distributions, facilitating the study of generalization and profile-dependent semantic alignment.
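The abstract reports Recall@1 and Recall@5 for the fine-tuned retriever. As a minimal sketch of how such a metric is typically computed for a CLIP-style retriever (toy similarity scores over a handful of pictograms; the `recall_at_k` helper and the data here are illustrative assumptions, not the thesis's actual evaluation code):

```python
import numpy as np

def recall_at_k(similarity: np.ndarray, gold: np.ndarray, k: int) -> float:
    """Fraction of queries whose gold pictogram appears among the
    k highest-scoring candidates.

    similarity: (n_queries, n_pictograms) query-pictogram score matrix
    gold:       (n_queries,) index of the correct pictogram per query
    """
    # Rank candidates per query by descending similarity, keep top k.
    topk = np.argsort(-similarity, axis=1)[:, :k]
    # A query is a hit if its gold index occurs anywhere in its top k.
    hits = (topk == gold[:, None]).any(axis=1)
    return float(hits.mean())

# Toy example: 3 sentence-prefix queries scored against 4 pictograms.
sim = np.array([
    [0.9, 0.1, 0.3, 0.2],   # gold 0 is ranked 1st
    [0.2, 0.4, 0.8, 0.1],   # gold 1 is ranked 2nd
    [0.5, 0.6, 0.1, 0.7],   # gold 2 is ranked 4th
])
gold = np.array([0, 1, 2])

print(recall_at_k(sim, gold, 1))  # 1 of 3 queries hit at rank 1
print(recall_at_k(sim, gold, 2))  # 2 of 3 queries hit within rank 2
```

In the thesis's setting the scores would come from cosine similarity between a fine-tuned text encoder's embedding of the sentence prefix and image/text embeddings of the candidate pictograms; the metric itself is agnostic to how the score matrix is produced.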

Document type: Degree thesis (Laurea magistrale)
Thesis author: Lucchiari, Laura
Degree programme regulation (Ordinamento CdS): DM270
Keywords: Augmentative and Alternative Communication, Pictogram Prediction, Large Language Model, Multimodal Retrieval Augmented Generation, Natural Language Processing
Thesis defense date: 6 February 2026