A Comprehensive Benchmark of Efficient Text-Driven 3D Generative Models

Conti, Matteo (2024) A Comprehensive Benchmark of Efficient Text-Driven 3D Generative Models. [Master's thesis], Università di Bologna, Degree Programme in Artificial Intelligence [LM-DM270]

Full-text documents available:

PDF document (Thesis)
Available under licence: Unless the author grants broader permissions, the thesis may be freely consulted, and a single copy may be saved and printed strictly for personal study, research, and teaching purposes; any directly or indirectly commercial use is expressly prohibited. All other rights to the material are reserved.
Download (9MB)

Abstract

This work presents an in-depth analysis of recent text-driven 3D generation models, focusing on their performance under strict efficiency constraints, including hardware requirements and generation time. The study compares the performance of these models under efficiency constraints against their original, unconstrained outcomes, and incorporates a new benchmark that evaluates text-driven 3D generation models on the quality and coherence of the generated 3D shapes. The research involved selecting promising models and testing the impact of efficiency constraints on their performance. The results show that the quality of the generated 3D shapes depends heavily on the input prompt and on the specific model used, with some models performing significantly better on certain prompts because of their particular 3D shape parametrization techniques. Efficiency constraints generally resulted in coarser 3D shapes with more artifacts, especially for models that require substantial computational resources. However, some models still managed to generate detailed and semantically accurate 3D shapes within these constraints. The study also evaluated model performance using the T3Bench benchmark, observing lower Quality scores, attributed to the coarser nature of the 3D shapes, but higher Alignment scores. This raises an interesting question about the relationship between efficient 3D generation and prompt coherence when the latter is evaluated on less detailed 3D shapes. In conclusion, the research highlights significant potential for future work on efficient text-driven 3D generation models, particularly in improving the precision of details and the capture of prompt semantics within efficiency constraints. It points to the importance of developing a suitable prompt dataset for evaluating constrained models, and identifies the challenge of balancing 3D quality with text-prompt coherence as a key area for future research.

Document type

Degree thesis (Master's degree)

Thesis author

Conti, Matteo

Thesis supervisor

Salti, Samuele

Thesis co-supervisor

Amaduzzi, Andrea

School

Engineering and Architecture

Degree programme