Abstract
Large language models (LLMs) are machine learning models that have attracted attention for their ability to understand and generate natural language. These models learn statistical patterns in language by analyzing large amounts of text data, relying mainly on self-supervised and semi-supervised learning methods. Most are based on the Transformer architecture, a neural network structure for efficient processing of sequence data, although implementations based on recurrent neural network variants and Mamba (a state space model) have also appeared more recently. LLMs generate text by predicting which word or token is most likely to appear next, which makes them well suited to automatic text generation. Until 2020, fine-tuning a model to adapt it to a specific task was the dominant approach, but larger models such as GPT-3 popularized prompt engineering, in which models adapt to new tasks through instructions and examples given in the input, without fine-tuning. These models learn not only the syntax and semantics of human language, but also the knowledge and biases present in their training data.
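As an illustration of the next-token prediction described above, the minimal sketch below assumes the Hugging Face transformers and PyTorch libraries and uses the publicly available "gpt2" checkpoint purely as a small example model, not as a system discussed in the text. It inspects the model's probability distribution over the next token and then generates a short continuation.

```python
# Minimal sketch of next-token prediction with a causal language model.
# "gpt2" is used only because it is small and freely available.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

# Probability distribution over the next token given the prompt.
with torch.no_grad():
    logits = model(**inputs).logits           # shape: (1, seq_len, vocab_size)
probs = torch.softmax(logits[0, -1], dim=-1)  # distribution for the next position
top = torch.topk(probs, k=5)
for token_id, p in zip(top.indices, top.values):
    print(f"{tokenizer.decode([token_id.item()])!r}: {p.item():.3f}")

# Autoregressive generation repeats this prediction step,
# appending one token at a time to the running sequence.
output_ids = model.generate(**inputs, max_new_tokens=10, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

The same mechanism underlies prompt-based use: instructions and examples placed in the prompt change the conditioning context, and the model's next-token predictions follow accordingly, without any update to its weights.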