A Critical Analysis of LLM Creativity Evaluation

Tutone, Alessandro (2026) A Critical Analysis of LLM Creativity Evaluation. [Laurea magistrale], Università di Bologna, Degree Programme in Artificial Intelligence [LM-DM270], Restricted-access document.


Full-text documents available:

PDF document (Thesis)
Full text not accessible until 26 March 2028.
Available under license: Creative Commons: Attribution-NonCommercial-NoDerivatives 4.0 (CC BY-NC-ND 4.0)

Abstract

Large Language Models (LLMs) have fundamentally reshaped fields ranging from academic research to industrial applications. As computational resources scale, these models achieve remarkable proficiency in natural language generation, increasingly rivaling human performance in abstract domains such as creative writing. However, measuring creativity accurately, and understanding how LLMs relate to this complex construct, remains a major academic challenge. This thesis investigates the relationship between LLMs and creativity, asking not only whether these models can convincingly simulate creative works, but also whether they can evaluate them objectively. Using the WritingPrompts dataset, we conducted a multi-dimensional analysis that combines automated objective metrics with subjective evaluations across 11 dimensions of creativity, assessed both by human judges and by an LLM-as-a-Judge framework. Our findings reveal a sharp dichotomy between how LLMs generate narratives and how they evaluate creativity. While the models produced highly sophisticated texts, a critical failure emerged during evaluation. We observe a clear misalignment with human aesthetic standards and a severe systematic bias: the LLM judge consistently favors AI-generated texts over the unpredictability of human writing. Furthermore, Principal Component Analysis and correlation analyses show that current automated metrics are fundamentally inadequate, exhibiting near-zero alignment with human perception. This research concludes that, while LLMs have brought unprecedented efficiency to countless domains, the nuances and beautiful imperfections of human creative products remain profoundly complex and difficult to understand, so encoding them mathematically is not yet feasible. Until an artificial system can truly experience the world it is attempting to describe, creativity will remain a uniquely human milestone: a deeply complex aspect of the mind that algorithms have yet to “learn”.
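The abstract reports that automated metrics show near-zero alignment with human perception according to correlation analyses. As an illustration only, one common way to quantify that kind of alignment is a rank correlation such as Spearman's ρ between per-story metric scores and per-story human ratings. This record does not state which coefficient the thesis used, so the choice of Spearman and the helper names below (`average_ranks`, `pearson`, `spearman`) are assumptions made for this sketch, not the author's method. A value close to 0 means the metric's ordering of stories says little about how the human judges ordered them.

```python
from collections.abc import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 items")
    return pearson(average_ranks(x), average_ranks(y))
```

For example, `spearman(metric_scores, human_scores)`, where both lists are hypothetical per-story scores in the same story order, returns a value in [-1, 1]. Average ranks are used for ties because ordinal human ratings often contain many identical scores.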


Document type

Thesis (Laurea magistrale)

Thesis author

Tutone, Alessandro

Thesis supervisor

Musolesi, Mirco

Thesis co-supervisor

Franceschelli, Giorgio

School

Engineering and Architecture

Degree programme