Generation of proprietary code: from the data extraction to the model finetuning and integration in multi agent system

Periti, Alex (2024) Generation of proprietary code: from the data extraction to the model finetuning and integration in multi agent system.
With the proliferation of natural language processing (NLP) models, automated code generation has attracted significant attention. However, tailoring these models to domain-specific requirements, especially in proprietary software development, remains complex and relatively unexplored. This thesis begins with the motivations behind the project, highlighting the challenges involved and the potential benefits of an efficient code generation process. The core of the thesis describes the methodology used to fine-tune a state-of-the-art large language model (LLM), such as GPT-3.5, for this task. The process involves creating a specialized dataset of Java code. The methodology also covers the problems and limitations encountered during development and the evaluation metrics used to measure the performance of the fine-tuned model. To validate its effectiveness, a series of experiments compares the fine-tuned model against baseline models and traditional code generation techniques, using tailored evaluation metrics to assess how well the model generates high-quality proprietary Java code. The findings contribute insights at the intersection of NLP and software development. The thesis concludes with a discussion of the practical implications of the research, potential applications in real-world scenarios, and future work on refining and extending the capabilities of LLMs for proprietary code generation.

Periti, Alex
Keywords
Large Language Model, Code Generation, GPT 3.5 turbo finetuning, Multi-Agent System, Llama2
19 March 2024

