Turekhassim, Abylay
(2025)
Design and Implementation of a High-Accuracy Text-to-SQL Pipeline with LLMs.
[Master's degree thesis (Laurea magistrale)], Università di Bologna, Degree Programme in
Artificial intelligence [LM-DM270], full-text document not available
The full text is not available, by the author's choice.
Abstract
Large Language Models (LLMs) are increasingly being explored for the Text-to-SQL task, with numerous techniques proposed by researchers to enhance their performance in SQL generation. In this thesis, we aim to design a high-accuracy Text-to-SQL pipeline suitable for real-world industrial applications, grounded in standard research practices.
We conduct a comparative study of foundational approaches, evaluating alternative methods for Text-to-SQL such as LLM tool calling and prompting with linearized tables against conventional LLM prompting. Based on these findings, we systematically design an optimized pipeline that integrates state-of-the-art techniques, including M-Schema representation, advanced prompting strategies, schema linking, and validation.
Furthermore, we investigate the use of reasoning-based models by comparing several frontier reasoning LLMs to identify the most effective one for the Text-to-SQL task. We then evaluate this model against non-reasoning models under identical conditions to determine the most effective overall approach for our use case. After studying, comparing, and evaluating multiple existing approaches, we obtain an 'at least one column match' accuracy of 80% on a 200-pair subset of the BIRD dataset. This work demonstrates the practical value of applying Text-to-SQL techniques in real industrial settings and contributes to a deeper understanding of how such solutions can be effectively engineered and deployed.
Document type
Degree thesis
(Master's degree)
Thesis author
Turekhassim, Abylay
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Degree programme regulations (Ordinamento Cds)
DM270
Keywords
text-to-sql, Large Language Models, Natural Language Processing, SQL
Thesis defence date
4 December 2025
URI