A Critical Analysis of LLM Creativity Evaluation

Tutone, Alessandro (2026) A Critical Analysis of LLM Creativity Evaluation. [Master's thesis (Laurea magistrale)], Università di Bologna, Degree programme in Artificial intelligence [LM-DM270], restricted-access document.
Available full-text documents:
PDF document (Thesis)
Full text not accessible until 26 March 2028.
Available under license: Creative Commons Attribution - NonCommercial - NoDerivatives 4.0 (CC BY-NC-ND 4.0)

Abstract

Large Language Models (LLMs) have fundamentally reshaped fields ranging from academic research to industrial application. As computational resources scale, these models achieve remarkable proficiency in natural language generation, increasingly challenging human benchmarks in abstract domains such as creative writing. However, accurately measuring creativity and understanding how LLMs relate to this complex construct remains a profound academic challenge. This thesis investigates the relationship between LLMs and creativity, asking not only whether these models can successfully simulate creative works, but also whether they can evaluate them objectively. Using the WritingPrompts dataset, we conducted a multi-dimensional analysis combining automated objective metrics with subjective evaluations across 11 dimensions of creativity, assessed by both human judges and an LLM-as-a-Judge framework. Our findings highlight a sharp dichotomy between how LLMs generate narratives and how they evaluate creativity. While the models produced highly sophisticated texts, a critical failure emerged during evaluation. We reveal a clear misalignment with human aesthetic standards and a severe systemic bias: the LLM judge consistently favors AI-generated texts over the unpredictability of human writing. Furthermore, Principal Component Analysis and correlation analyses demonstrate that current automated metrics are fundamentally inadequate, exhibiting near-zero alignment with human perception. This research concludes that while LLMs have brought unprecedented efficiency to countless domains, the nuances and beautiful imperfections of human creative products remain profoundly complex and difficult to understand; encoding them mathematically is therefore still infeasible. Until an artificial system can truly experience the world it is attempting to describe, creativity will remain a uniquely human milestone: a deeply complex aspect of the mind that algorithms have yet to "learn".
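
As an illustration of the kind of correlation analysis the abstract refers to, the following is a minimal sketch of a rank correlation between an automated metric and human ratings. It is not the thesis's method: the abstract does not state which correlation coefficient, metrics, or scores were used, so the choice of Spearman's coefficient, the function names, and the toy numbers are all assumptions made for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_x * spread_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical toy values, NOT data from the thesis.
    metric_scores = [0.62, 0.48, 0.71, 0.55, 0.66, 0.50]  # an automated metric
    human_ratings = [3, 4, 2, 5, 3, 1]                    # mean human score per story
    print(f"Spearman rho = {spearman(metric_scores, human_ratings):.3f}")
```

A coefficient close to zero would indicate that the metric's ranking of the stories carries little information about how human judges ranked them, which is the sense of "near-zero alignment" used above.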

Document type
Master's thesis (Laurea magistrale)
Thesis author
Tutone, Alessandro
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Degree programme regulations
DM270
Keywords
Large Language Models, Creativity, Artificial Intelligence, Creativity Evaluation, Natural Language Processing, LLM-as-a-Judge, Text Generation
Thesis defence date
26 March 2026
URI
