Abstract
Automated image captioning with artificial neural networks allows for applications that go beyond producing a natural-language description of the visual information contained in an image. This work explores the use of image captioning to generate the instructions for the gastronomic procedure depicted in an input picture. To do this, the model must learn to focus on the appropriate visual elements of the image and to mimic the required style of the captions. A multilingual dataset of recipes is used to fine-tune an English vision encoder-decoder model and to prefix-tune an Italian model built with CLIP as the image encoder, mGPT as the linguistic decoder, and a lightweight network bridging the two modalities. The lack of context beyond the individual image causes the most issues, but both resulting models perform well overall, the English fine-tuned model especially so. However, most of the automated metrics used struggle to evaluate the quality of the results reliably. BERTScore fares best among them, both with the baseline BERT model and with a version adapted to the domain. Noisy references probably contribute to the problems encountered during evaluation, but they are certainly not the only factor. In short, while this kind of non-standard application of image captioning can be modeled successfully, selecting appropriate evaluation metrics is non-trivial, and a time-consuming manual evaluation may be necessary for a fully informed assessment.
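As a concrete illustration of the first approach, the sketch below fine-tunes an off-the-shelf vision encoder-decoder captioner on image/recipe-step pairs. This is a minimal sketch, assuming the HuggingFace VisionEncoderDecoderModel API and a commonly used ViT+GPT-2 checkpoint as the English base model; the abstract does not name the actual model, learning rate, or training loop.

```python
# Minimal fine-tuning sketch; checkpoint and hyperparameters are
# assumptions for illustration, not details taken from the thesis.
import torch
from transformers import (AutoImageProcessor, AutoTokenizer,
                          VisionEncoderDecoderModel)

ckpt = "nlpconnect/vit-gpt2-image-captioning"  # assumed base checkpoint
model = VisionEncoderDecoderModel.from_pretrained(ckpt)
processor = AutoImageProcessor.from_pretrained(ckpt)
tokenizer = AutoTokenizer.from_pretrained(ckpt)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

def training_step(image, step_text: str) -> float:
    """One update: the dish photo is the input, the recipe step the target."""
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    labels = tokenizer(step_text, return_tensors="pt").input_ids
    loss = model(pixel_values=pixel_values, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```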
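The Italian pipeline can be sketched in a ClipCap-style fashion: CLIP and mGPT stay frozen, and only a lightweight mapping network that projects the CLIP image embedding into a sequence of prefix embeddings for the decoder is trained. The checkpoint names, prefix length, and MLP shape below are assumptions, not details taken from the thesis.

```python
# Hedged sketch of the prefix-tuning bridge between CLIP and mGPT.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, CLIPModel

class PrefixMapper(nn.Module):
    """Maps one CLIP image embedding to `prefix_len` decoder embeddings."""
    def __init__(self, clip_dim: int, gpt_dim: int, prefix_len: int = 10):
        super().__init__()
        self.prefix_len, self.gpt_dim = prefix_len, gpt_dim
        self.mlp = nn.Sequential(
            nn.Linear(clip_dim, gpt_dim * prefix_len // 2),
            nn.Tanh(),
            nn.Linear(gpt_dim * prefix_len // 2, gpt_dim * prefix_len),
        )

    def forward(self, image_embed: torch.Tensor) -> torch.Tensor:
        # (batch, clip_dim) -> (batch, prefix_len, gpt_dim)
        return self.mlp(image_embed).view(-1, self.prefix_len, self.gpt_dim)

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
gpt = AutoModelForCausalLM.from_pretrained("ai-forever/mGPT")
clip.requires_grad_(False)  # frozen image encoder
gpt.requires_grad_(False)   # frozen language decoder

mapper = PrefixMapper(clip.config.projection_dim, gpt.config.hidden_size)

def caption_prefix(pixel_values: torch.Tensor) -> torch.Tensor:
    """Build the prefix prepended to the caption's token embeddings
    before the sequence is fed to mGPT."""
    with torch.no_grad():
        image_embed = clip.get_image_features(pixel_values=pixel_values)
    return mapper(image_embed)
```

In such a setup, training would concatenate this prefix with the embeddings of the target recipe step and compute the language-modeling loss on the text tokens only, so that the mapper alone learns to steer the frozen decoder.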
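Finally, since BERTScore is singled out as the most reliable metric, the snippet below shows how generated steps might be scored with the `bert-score` package, first with the default model and then with a hypothetical domain-adapted checkpoint. The sentences and the checkpoint path are illustrative only; just the use of BERTScore itself comes from the abstract.

```python
# Hedged BERTScore example; sentences and model path are invented.
from bert_score import score

candidates = ["Saute the chopped onions until golden."]
references = ["Fry the onions in a pan until they turn golden brown."]

# Baseline scoring with the default English model
P, R, F1 = score(candidates, references, lang="en")

# Scoring with a (hypothetical) domain-adapted encoder; a custom
# model_type requires choosing which layer embeddings are taken from
P_d, R_d, F1_d = score(candidates, references,
                       model_type="path/to/recipe-adapted-bert",
                       num_layers=9)
print(F1.mean().item(), F1_d.mean().item())
```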
Document type: Degree thesis (Master's degree, Laurea magistrale)
Thesis author: Benini, Elena
Thesis supervisor:
Thesis co-supervisor:
School:
Degree programme:
Curriculum: CURRICULUM TRANSLATION AND TECHNOLOGY
Degree programme regulations: DM270
Keywords: NLP, image captioning, artificial neural networks, Transformer, fine-tuning, prefix-tuning, encoder-decoder model
Thesis defence date: 20 March 2026
URI: