Generation of proprietary code: from the data extraction to the model finetuning and integration in multi agent system

Periti, Alex (2024) Generation of proprietary code: from the data extraction to the model finetuning and integration in multi agent system. [Master's thesis (Laurea magistrale)], Università di Bologna, Corso di Studio in Artificial intelligence [LM-DM270], restricted-access document.
Full-text available: PDF document (Thesis), accessible only to institutional users of the University.
License: Unless the author has granted broader permissions, the thesis may be freely consulted, and a copy may be saved and printed strictly for personal study, research, and teaching purposes; any direct or indirect commercial use is expressly forbidden. All other rights to the material are reserved.

Abstract

With the proliferation of natural language processing (NLP) models, the potential for automating code generation has garnered significant attention. However, tailoring these models to domain-specific requirements, especially in the context of proprietary software development, remains a complex and relatively unexplored area. This thesis begins with the motivations behind the project, highlighting the challenges posed and the potential benefits of an efficient code generation process. The core of the thesis focuses on the methodology employed for fine-tuning a state-of-the-art large language model, such as GPT-3.5, for the task at hand. The process involves the creation of a specialized dataset made up of Java code. The methodology also addresses the problems and limitations encountered during development, as well as the evaluation metrics employed to measure the performance of the fine-tuned model. To validate the effectiveness of the fine-tuned model, a series of experiments is conducted, comparing its performance against baseline models and traditional code generation techniques. Tailored evaluation metrics are employed to assess the model's efficacy in generating high-quality proprietary Java code. The findings of the research contribute valuable insights at the intersection of NLP and software development. The thesis concludes with a discussion of the practical implications of the research, potential applications in real-world scenarios, and future directions for refining and extending the capabilities of LLMs for proprietary code generation.
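The fine-tuning workflow summarized above can be sketched roughly as follows. This is a minimal illustration, not the thesis's actual pipeline: it assumes the OpenAI fine-tuning API's chat-formatted JSONL training data, and the Java prompt/completion pairs are hypothetical placeholders since the proprietary dataset is not reproduced here.

```python
import json

# Hypothetical (instruction, Java completion) pairs standing in for the
# thesis's proprietary dataset.
examples = [
    ("Write a Java method that returns the maximum of two ints.",
     "public static int max(int a, int b) { return a > b ? a : b; }"),
    ("Write a Java method that checks whether a string is null or empty.",
     "public static boolean isBlank(String s) { return s == null || s.isEmpty(); }"),
]

# GPT-3.5-turbo fine-tuning expects one chat-formatted JSON object per line.
records = [
    {"messages": [
        {"role": "system",
         "content": "You are an assistant that generates proprietary Java code."},
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": completion},
    ]}
    for prompt, completion in examples
]

# Write the training file in JSONL format.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# The file would then be uploaded and a fine-tuning job created, e.g.:
#   client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
#   client.fine_tuning.jobs.create(training_file=file_id, model="gpt-3.5-turbo")
```

The resulting fine-tuned model checkpoint could then be queried like any chat model, which is what makes it straightforward to plug into a multi-agent system as a code-generation agent.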

Document type: Master's thesis (Laurea magistrale)
Thesis author: Periti, Alex
Thesis supervisor:
Thesis co-supervisor:
School:
Degree programme:
Degree ordinance: DM270
Keywords: Large Language Model, Code Generation, GPT 3.5 turbo finetuning, Multi-Agent System, Llama2
Thesis defence date: 19 March 2024
URI:
