Investigating the Potential of Language Models for Automated Static Code Analysis

Armillotta, Michele (2026) Investigating the Potential of Language Models for Automated Static Code Analysis. [Master's thesis (Laurea magistrale)], Università di Bologna, Degree Programme in Computer Engineering (Ingegneria informatica) [LM-DM270], restricted-access document.
Full-text documents available:
PDF document (Thesis)
Full text accessible only to institutional users of the University
Available under license: Unless the author has granted broader permissions, the thesis may be freely consulted, and one copy may be saved and printed strictly for personal purposes of study, research, and teaching; any direct or indirect commercial use is expressly prohibited. All other rights to the material are reserved.

Abstract

Large Language Models (LLMs) have recently emerged as promising tools for software vulnerability analysis. However, most existing approaches still operate at the function level and treat vulnerability detection as a direct classification task over isolated code fragments. This formulation is often inadequate for logical vulnerabilities, whose identification requires reasoning over repository-level context, interprocedural interactions, and implicit security assumptions, rather than relying solely on local syntactic patterns or explicit data-flow relations. This thesis investigates repository-level vulnerability detection with LLMs through a multi-stage analysis framework centered on the notion of semantic flow, i.e., the propagation of security-relevant conditions across the codebase. Instead of reasoning only about whether a function is vulnerable, the proposed approach identifies sensitive operations, infers the conditions required for their safe execution, and analyzes whether those conditions are validated across the relevant execution contexts. In this way, vulnerability detection is reformulated as a constrained reasoning problem over semantically meaningful repository context. The study focuses on logical weaknesses related to improper access control and exposure of sensitive information, namely CWE-284 and CWE-200, including relevant subcategories. Experimental observations indicate that structured repository-level reasoning provides a more suitable basis for analyzing these weaknesses than plain LLM prompting or black-box function-level classification. Overall, this work suggests that LLM-based vulnerability analysis becomes more effective when supported by explicit contextual decomposition and condition-oriented reasoning, rather than being framed as a monolithic prediction task.
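
To make the idea of checking conditions along execution contexts concrete, the following is a minimal illustrative sketch, not the thesis's implementation. It assumes a hand-written toy call graph in place of a real repository and of any LLM stage; the names `Function`, `unguarded_paths`, `validates`, `requires`, and the example routes are all hypothetical. Each function records which security conditions it checks and which conditions a sensitive operation inside it needs, and the walk reports every call path that reaches a sensitive operation without the needed condition having been validated.

```python
from dataclasses import dataclass, field


@dataclass
class Function:
    """Toy stand-in for a repository function (hypothetical model)."""
    name: str
    calls: list[str] = field(default_factory=list)    # names of callees
    validates: set[str] = field(default_factory=set)  # conditions checked here
    requires: set[str] = field(default_factory=set)   # conditions a sensitive op here needs


def unguarded_paths(repo: dict[str, Function], entry: str):
    """Return (call path, missing conditions) for each sensitive operation
    reachable from `entry` whose required conditions were not validated
    anywhere earlier on that path."""
    findings = []

    def walk(name: str, path: list[str], validated: set[str]) -> None:
        fn = repo[name]
        path = path + [name]
        validated = validated | fn.validates  # conditions established so far
        missing = fn.requires - validated
        if missing:
            findings.append((path, missing))
        for callee in fn.calls:
            if callee not in path:  # avoid looping on recursive calls
                walk(callee, path, validated)

    walk(entry, [], set())
    return findings


# Assumed example: one guarded and one unguarded route to the same operation.
repo = {
    "admin_route": Function("admin_route", calls=["delete_user"], validates={"is_admin"}),
    "public_route": Function("public_route", calls=["delete_user"]),
    "delete_user": Function("delete_user", requires={"is_admin"}),
}

guarded = unguarded_paths(repo, "admin_route")
exposed = unguarded_paths(repo, "public_route")
print(guarded)
print(exposed)
```

In this toy setting `delete_user` looks identical in both cases when inspected in isolation; only the path through `public_route` is flagged, which is the kind of distinction a function-level classifier cannot draw.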

Document type
Degree thesis (Laurea magistrale)
Thesis author
Armillotta, Michele
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Curriculum
Computer Engineering (CURRICULUM INGEGNERIA INFORMATICA)
Degree programme regulation
DM270
Keywords
Static Code Analysis, Large Language Models
Thesis defense date
26 March 2026
URI
