Valida: AI and NLP Techniques in Document Forgery Detection

Sasdelli, Anthea Silvia (2024) Valida: AI and NLP Techniques in Document Forgery Detection. [Master's thesis (Laurea magistrale)], Università di Bologna, Degree Programme in Artificial Intelligence [LM-DM270]. Full-text document not available.
The full text is not available by the author's choice.

Abstract

This thesis develops a model for Valida, a system designed by Gradiant that detects and highlights modifications and alterations in documents. The objective is to locate and identify the document fields considered sensitive and most at risk of modification: codes, addresses, dates, and customer names. Using natural language processing (NLP) techniques, the proposed system combines OCR-based methods with multimodal large language models (MLLMs) to analyze document structure and extract the relevant information. The thesis explores integrating these models with parsers and disambiguators to improve accuracy and to cope with noisy data, diverse layouts, and multilingual content. A key focus is evaluating OCR-free models, such as Qwen2-VL, as potential alternatives to traditional OCR-based approaches. The study highlights the strengths and limitations of each method, including the adaptability of Qwen2-VL to unstructured inputs and the precision of DistilBERT on structured fields.

Document type
Master's thesis (Laurea magistrale)
Thesis author
Sasdelli, Anthea Silvia
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Degree programme regulation
DM270
Keywords
AI, NLP, Machine Learning, OCR, MLLMs, DistilBERT, Qwen2-VL, parsing, document classification, document parsing, fine-tuning, information extraction, OCR-free models, data extraction
Thesis defence date
5 December 2024
URI
