A Comparison between LLMs and SLMs for Document Processing in the Insurance Sector

Turrini, Alice (2025) A Comparison between LLMs and SLMs for Document Processing in the Insurance Sector. [Laurea magistrale], Università di Bologna, Degree Programme in Artificial Intelligence [LM-DM270]
Full-text document available:
PDF document (Thesis), 6 MB. Available under licence: Creative Commons Attribution-ShareAlike 4.0 (CC BY-SA 4.0)

Abstract

This thesis compares the feasibility and performance of state-of-the-art large language models (LLMs) and smaller language models (SLMs) for document classification and data extraction in a real-world scenario. The research focuses on the development of a robust document processing pipeline that starts from the raw PDF and encompasses all the steps needed to obtain a structured format suitable for classification and subsequent metadata extraction. Modern techniques are integrated throughout the pipeline to ensure efficiency and scalability. The project leverages a dataset of over 8,000 documents, including both labeled and pseudo-labeled data, in the medical and administrative domains. Specifically, the study compares an advanced LLM, GPT-4o, against smaller language models, BERT and LLaMA 3.2, for document classification and key metadata extraction. Key challenges addressed include the efficient extraction of meaningful information from complex domain documents, the optimization of model performance for both classification and extraction tasks, and the scalability of the proposed methods. A central focus of this research is identifying the optimal balance between model size and performance, explored by fine-tuning smaller models, applying techniques such as knowledge distillation and model quantization, and comparing their results with those of larger models. The results suggest that fine-tuning small language models for specific tasks can achieve performance comparable to, and in some cases surpassing, that of LLMs, especially when model size and computational efficiency are taken into account. These findings offer valuable insights for the increasingly relevant choice between LLM-based and SLM-based solutions, taking into consideration aspects such as performance, deployment, privacy, personalization, and cost.
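As an illustration of the SLM side of the comparison described above, the following minimal sketch shows how a small model such as BERT could be fine-tuned for document-type classification with the Hugging Face transformers library. This is not the author's code: the checkpoint, label set, and dataset files are assumptions made purely for the example.

    # Illustrative sketch only: fine-tuning a small model (BERT) for
    # document classification. Checkpoint, labels, and CSV files are
    # hypothetical placeholders, not artefacts from the thesis.
    from datasets import load_dataset
    from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                              Trainer, TrainingArguments)

    labels = ["medical_report", "invoice", "claim_form"]  # hypothetical classes
    checkpoint = "bert-base-multilingual-cased"           # assumed checkpoint
    tok = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=len(labels))

    # Hypothetical dataset: "text" holds text extracted from the PDFs,
    # "label" holds an integer class index.
    ds = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})
    ds = ds.map(lambda b: tok(b["text"], truncation=True,
                              padding="max_length", max_length=512),
                batched=True)

    args = TrainingArguments(output_dir="doc-clf", num_train_epochs=3,
                             per_device_train_batch_size=16)
    Trainer(model=model, args=args, train_dataset=ds["train"],
            eval_dataset=ds["test"]).train()

The same fine-tuned classifier could then be compared against a prompted LLM on the held-out split, which is the kind of size-versus-performance trade-off the thesis investigates.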

Document type: Thesis (Laurea magistrale)
Thesis author: Turrini, Alice
Thesis supervisor:
Thesis co-supervisor:
School:
Degree programme: Artificial Intelligence [LM-DM270]
Degree programme regulations: DM270
Keywords: Document Classification, Metadata Extraction, Large Language Models (LLMs), Small Language Models (SLMs), GPT-4o, Document Processing Pipeline, Model Fine-Tuning, Knowledge Distillation, Model Quantization, Computational Efficiency, Scalability, Medical and Administrative Documents, Real-World Case Scenario, Insurance Sector
Thesis defence date: 25 March 2025