Abstract
Large language models (LLMs) are machine learning models that have attracted attention for their ability to understand and generate natural language. These models learn statistical patterns in language by analyzing large amounts of text data, relying mainly on self-supervised and semi-supervised learning methods. Most are based on the Transformer architecture, a neural network structure for efficient processing of sequence data, although implementations based on recurrent neural network variants and Mamba (a state space model) have also appeared more recently. LLMs generate text by predicting which word or token is most likely to appear next, which makes them well suited to automatic text generation. Until 2020, fine-tuning a model to adapt it to a specific task was the dominant approach, but larger models such as GPT-3 popularized prompt engineering, in which models adapt to new tasks through instructions and examples given in the input, without fine-tuning. These models learn not only the syntax and semantics of human language, but also the knowledge and biases present in their training data.
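As an illustration of the next-token prediction described above, the minimal sketch below assumes the Hugging Face transformers and PyTorch libraries and uses the publicly available "gpt2" checkpoint purely as a small example model, not as a system discussed in the text. It inspects the model's probability distribution over the next token and then generates a short continuation.

```python
# Minimal sketch of next-token prediction with a causal language model.
# "gpt2" is used only because it is small and freely available.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

# Probability distribution over the next token given the prompt.
with torch.no_grad():
    logits = model(**inputs).logits           # shape: (1, seq_len, vocab_size)
probs = torch.softmax(logits[0, -1], dim=-1)  # distribution for the next position
top = torch.topk(probs, k=5)
for token_id, p in zip(top.indices, top.values):
    print(f"{tokenizer.decode([token_id.item()])!r}: {p.item():.3f}")

# Autoregressive generation repeats this prediction step,
# appending one token at a time to the running sequence.
output_ids = model.generate(**inputs, max_new_tokens=10, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

The same mechanism underlies prompt-based use: instructions and examples placed in the prompt change the conditioning context, and the model's next-token predictions follow accordingly, without any update to its weights.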