Valida: AI and NLP Techniques in Document Forgery Detection

Sasdelli, Anthea Silvia (2024) Valida: AI and NLP Techniques in Document Forgery Detection. [Master's thesis (Laurea magistrale)], Università di Bologna, Degree programme in Artificial intelligence [LM-DM270], full-text document not available.

The full text is not available at the author's request.

Abstract

This thesis develops a model for Valida, a system designed by Gradiant that detects and highlights modifications and alterations in documents. The objective is to locate and identify the fields considered most sensitive and most at risk of tampering: codes, addresses, dates, and customer names. Using natural language processing (NLP) techniques, the proposed system combines OCR-based methods with multimodal large language models (MLLMs) to analyze document structure and extract the relevant information. The thesis explores how integrating these models with parsers and disambiguators can improve accuracy and address the challenges posed by noisy data, diverse layouts, and multilingual content. A key focus of the work is evaluating OCR-free models, such as Qwen2-VL, as potential alternatives to traditional OCR-based approaches. The study highlights the strengths and limitations of each method, including the adaptability of Qwen2-VL to unstructured inputs and the precision of DistilBERT on structured fields.
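The abstract does not describe how the parsers and disambiguators work, and the full text is not available. As a purely illustrative sketch, not the method used in the thesis, the Python function below shows one way a disambiguator for the date field might normalize raw strings extracted by OCR or an MLLM into a canonical `datetime.date`, handling English and Italian month names and ambiguous day/month order. The function name `parse_date`, the `day_first` locale hint, the accepted formats, and the month table are all hypothetical assumptions.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical month-name lookup (English and Italian only, for illustration).
MONTHS = {
    "january": 1, "february": 2, "march": 3, "april": 4, "may": 5, "june": 6,
    "july": 7, "august": 8, "september": 9, "october": 10, "november": 11,
    "december": 12,
    "gennaio": 1, "febbraio": 2, "marzo": 3, "aprile": 4, "maggio": 5,
    "giugno": 6, "luglio": 7, "agosto": 8, "settembre": 9, "ottobre": 10,
    "novembre": 11, "dicembre": 12,
}

# Numeric dates such as 05/12/2024, 05.12.2024 or 05-12-2024.
NUMERIC = re.compile(r"^\s*(\d{1,2})[/.\-](\d{1,2})[/.\-](\d{4})\s*$")
# Textual dates such as "5 Dicembre 2024"; [^\W\d_] matches Unicode letters.
TEXTUAL = re.compile(r"^\s*(\d{1,2})\s+([^\W\d_]+)\s+(\d{4})\s*$")


def parse_date(raw: str, day_first: bool = True) -> Optional[date]:
    """Normalize a raw date string to a datetime.date, or return None.

    day_first is a hypothetical locale hint: it decides which reading of an
    ambiguous numeric date (e.g. 05/12/2024) is tried first.
    """
    m = TEXTUAL.match(raw)
    if m:
        month = MONTHS.get(m.group(2).lower())
        if month is None:
            return None  # unknown month name (or OCR-garbled word)
        try:
            return date(int(m.group(3)), month, int(m.group(1)))
        except ValueError:
            return None  # e.g. "31 febbraio 2024"
    m = NUMERIC.match(raw)
    if not m:
        return None
    a, b, year = int(m.group(1)), int(m.group(2)), int(m.group(3))
    # Try the preferred (day, month) reading first, then the swapped one.
    orders = [(a, b), (b, a)] if day_first else [(b, a), (a, b)]
    for day, month in orders:
        try:
            return date(year, month, day)
        except ValueError:
            continue  # this reading is not a valid calendar date
    return None
```

Trying the preferred order first and falling back to the swapped reading lets impossible combinations, such as a 25th month, resolve themselves; a production system would likely need more formats, languages, and a way to flag dates that remain genuinely ambiguous.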

Document type

Thesis (Master's degree, Laurea magistrale)

Thesis author

Sasdelli, Anthea Silvia

Thesis supervisor

Torroni, Paolo

Thesis co-supervisors

Cerezo Costas, Héctor; Alonso Doval, Pedro

School

Engineering and Architecture

Degree programme

Artificial intelligence [LM-DM270]

Degree programme regulations (CdS)

DM270

Keywords

AI, NLP, Machine Learning, OCR, MLLMs, DistilBERT, Qwen2-VL, parsing, document classification, document parsing, fine-tuning, information extraction, OCR-free models, data extraction

Thesis defence date

5 December 2024

URI

https://amslaurea.unibo.it/id/eprint/33915
