Investigating the Potential of Language Models for Automated Static Code Analysis

Armillotta, Michele (2026) Investigating the Potential of Language Models for Automated Static Code Analysis. [Master's degree thesis], Università di Bologna, Degree Programme in Ingegneria informatica [LM-DM270], Restricted-access document.

Full-text documents available:

PDF document (Thesis)
Full text accessible only to institutional users of the University
Available under license: Unless the author has granted broader permissions, the thesis may be freely consulted, and one copy may be saved and printed strictly for personal purposes of study, research, and teaching. Any directly or indirectly commercial use is expressly prohibited. All other rights to the material are reserved.

Abstract

Large Language Models (LLMs) have recently emerged as promising tools for software vulnerability analysis. However, most existing approaches still operate at the function level and treat vulnerability detection as a direct classification task over isolated code fragments. This formulation is often inadequate for logical vulnerabilities, whose identification requires reasoning over repository-level context, interprocedural interactions, and implicit security assumptions, rather than relying solely on local syntactic patterns or explicit data-flow relations. This thesis investigates repository-level vulnerability detection with LLMs through a multi-stage analysis framework centered on the notion of "semantic flow", i.e., the propagation of security-relevant conditions across the codebase. Instead of reasoning only about whether a function is vulnerable, the proposed approach identifies sensitive operations, infers the conditions required for their safe execution, and analyzes whether those conditions are validated across the relevant execution contexts. In this way, vulnerability detection is reformulated as a constrained reasoning problem over semantically meaningful repository context. The study focuses on logical weaknesses related to improper access control and exposure of sensitive information, namely CWE-284 and CWE-200, including relevant subcategories. Experimental observations indicate that structured repository-level reasoning provides a more suitable basis for analyzing these weaknesses than plain LLM prompting or black-box function-level classification. Overall, this work suggests that LLM-based vulnerability analysis becomes more effective when supported by explicit contextual decomposition and condition-oriented reasoning, rather than being framed as a monolithic prediction task.
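As an illustration only, the three steps named in the abstract (identify sensitive operations, infer the conditions they require, check whether those conditions are validated along each execution context) can be sketched on a toy call graph. This is not the thesis's implementation: the `Function` record, the `unvalidated_paths` helper, the route names, and the `is_admin` condition are all hypothetical, and the annotations that an LLM stage would presumably infer are written by hand here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Function:
    """Hypothetical per-function summary (hand-written here, not LLM-inferred)."""
    name: str
    calls: list[str] = field(default_factory=list)   # names of callees
    checks: set[str] = field(default_factory=set)    # conditions validated in this function
    requires: set[str] = field(default_factory=set)  # conditions its sensitive operations need


def unvalidated_paths(repo: dict[str, Function], entry_points: list[str]):
    """Return (call_path, missing_conditions) for every path from an entry
    point to a sensitive operation whose required conditions were not
    validated anywhere along that path.

    Simplifying assumption: a check in a function counts as covering
    sensitive operations in the same function and in all of its callees.
    """
    findings = []

    def walk(name: str, path: list[str], validated: set[str]) -> None:
        fn = repo[name]
        path = path + [name]
        validated = validated | fn.checks
        missing = fn.requires - validated
        if missing:
            findings.append((path, missing))
        for callee in fn.calls:
            if callee not in path:  # skip recursive cycles
                walk(callee, path, validated)

    for entry in entry_points:
        walk(entry, [], set())
    return findings


# Toy repository: two routes reach the same sensitive operation,
# but only one of them validates the required condition.
repo = {
    "delete_user_route": Function("delete_user_route", calls=["delete_user"]),
    "admin_delete_route": Function(
        "admin_delete_route", calls=["delete_user"], checks={"is_admin"}
    ),
    "delete_user": Function("delete_user", requires={"is_admin"}),
}

findings = unvalidated_paths(repo, ["delete_user_route", "admin_delete_route"])
for path, missing in findings:
    print(" -> ".join(path), "is missing", sorted(missing))
```

In this toy case `delete_user` is not flagged in isolation: it is only reported on the path where no caller validates `is_admin`, which is the kind of repository-level, condition-oriented judgment the abstract contrasts with function-level classification.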

Document type

Degree thesis (Master's degree)

Thesis author

Armillotta, Michele

Thesis supervisor

Montanari, Rebecca

Thesis co-supervisor

Cavallaro, Lorenzo ; Romandini, Nicolò

School

Engineering and Architecture (Ingegneria e Architettura)

Degree programme