Survey of Large Language Models in Natural Language Processing

Li, Lesi (2024) Survey of Large Language Models in Natural Language Processing. [Laurea magistrale (Master's thesis)], Università di Bologna, Corso di Studio in Artificial Intelligence [LM-DM270]. Full text not available.
The full text is not available at the author's request. (Contact the author)

Abstract

Large language models (LLMs) are large-scale language models that have attracted attention for their ability to generate and understand complex natural language. These models learn statistical patterns in language by analyzing large amounts of text data, relying mainly on self-supervised and semi-supervised learning. Most are based on the Transformer architecture, a neural network designed for efficient processing of sequential data, although implementations based on recurrent neural network variants and Mamba (a state space model) have also appeared more recently. LLMs generate text by predicting which words or tokens are likely to appear next, which makes them well suited to automatic text generation. Until 2020, fine-tuning a model to adapt it to a specific task was the dominant approach, but larger models such as GPT-3 popularized prompting and in-context learning, allowing models to adapt to new tasks without fine-tuning. These models learn not only the syntax and semantics of human language, but also the knowledge and biases present in their training text.
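For illustration only (not part of the thesis), the sketch below shows the next-token prediction step described above, using the Hugging Face transformers library; the choice of the small pretrained "gpt2" model and the example prompt are assumptions made here as a stand-in for a larger LLM.

```python
# Minimal sketch of next-token prediction with a pretrained Transformer LM.
# Assumptions: the "gpt2" checkpoint and the prompt text are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "Large language models generate text by"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    # The model returns one logit per vocabulary token for the next position;
    # softmax turns these into a probability distribution over possible continuations.
    logits = model(**inputs).logits[:, -1, :]
    probs = torch.softmax(logits, dim=-1)

# Print the five most likely next tokens and their probabilities.
top = torch.topk(probs, k=5)
for token_id, p in zip(top.indices[0], top.values[0]):
    print(f"{tokenizer.decode(token_id):>12}  {p.item():.3f}")

# Repeating this step autoregressively (e.g. via
# model.generate(**inputs, max_new_tokens=20)) produces generated text.
```

The same mechanism underlies prompting: instead of updating the model's weights, task instructions and examples are placed in the prompt and the model continues the text accordingly.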

Document type: Thesis (Laurea magistrale / Master's degree)
Thesis author: Li, Lesi
Thesis supervisor:
School:
Degree programme: Artificial Intelligence [LM-DM270]
Programme regulation (Ordinamento CdS): DM270
Keywords: The importance of language model research, seedtime, Characteristics of LLMs, Capabilities of LLMs, Background, Technical Evolution, Commonly Used Corpora for Pre-training, Commonly Used Datasets for Fine-tuning, Data Preparation and Preprocessing, Encoder-decoder Architecture, Decoder-only Architecture, Parallel Training, Mixed Precision Training, Offloading, Instruction Tuning, Alignment Tuning, Efficient Tuning, Quantization, In-Context Learning, Chain of Thought Prompting, Planning, Future Directions and Implications, Conclusion
Thesis defence date: 19 March 2024
URI:
