Abstract
In recent years, abstractive summarization has advanced significantly, driven by new neural language models, transformer-based architectures, high-dimensional representation spaces, large datasets, and innovative pre-training tasks. Within this evolving landscape, decoding strategies play a crucial role: following an autoregressive approach, they turn the probability distributions generated by these models into coherent text.
Summarization systems make numerous decisions about summary properties during inference, such as the degree of copying, specificity, and output length. However, these decisions are implicitly encoded in the model parameters, so specific styles cannot be enforced. In 2021, Goyal et al. introduced HydraSum, a summarization model with multiple decoders. Its distinguishing feature is an architecture with multiple decoding channels: HydraSum provides a simple mechanism to obtain stylistically diverse summaries by sampling from either individual decoders or their mixtures through a gating mechanism.
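As a rough illustration of sampling from individual decoders or their mixtures, the sketch below combines per-decoder next-token distributions with gate weights and samples from the result. It is a minimal sketch under stated assumptions, not HydraSum's implementation: the names `mix_distributions` and `sample_token`, the three-token vocabulary, and the fixed gate values are all hypothetical.

```python
import random


def mix_distributions(decoder_probs, gate):
    """Combine per-decoder next-token distributions using gate weights.

    decoder_probs: one probability list per decoder (hypothetical toy data).
    gate: one non-negative weight per decoder; normalized here to sum to 1.
    """
    if len(decoder_probs) != len(gate):
        raise ValueError("need exactly one gate weight per decoder")
    total = sum(gate)
    weights = [g / total for g in gate]
    vocab_size = len(decoder_probs[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, decoder_probs))
        for i in range(vocab_size)
    ]


def sample_token(probs, rng):
    """Draw one token index according to the given distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Two toy decoders over a three-token vocabulary (assumed values).
decoder_a = [0.7, 0.2, 0.1]
decoder_b = [0.1, 0.3, 0.6]

# A gate of [1, 0] uses decoder A alone; [0.5, 0.5] blends both decoders.
only_a = mix_distributions([decoder_a, decoder_b], gate=[1.0, 0.0])
blend = mix_distributions([decoder_a, decoder_b], gate=[0.5, 0.5])
token = sample_token(blend, random.Random(0))
```

Changing the gate weights shifts the output between the two decoders' styles without retraining either one, which is the property the paragraph above attributes to the gating mechanism.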
This thesis presents an architecture of two large language models designed to replicate distinct decoding strategies. Because decoding strategies are inherently non-differentiable, each strategy is emulated by a second language model rather than by a dedicated differentiable layer. In this configuration, the first language model receives the text to be summarized, but its primary objective is not to generate the summary itself. Instead, it computes the probability distribution of each token for the given input text. These probability distributions are then fed to the second language model, which is responsible for producing the token that a specific decoding strategy would select.
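To make the setup concrete, the sketch below shows two common decoding strategies acting on a single next-token distribution, and how a (distribution, selected token) pair could serve as a training example for the second model. It is only an illustration: the abstract does not name the strategies or the data format used, so greedy selection, top-k sampling, the helper `make_training_pair`, and the toy probabilities are assumptions.

```python
import random


def greedy(probs):
    """Greedy decoding: return the index of the most probable token.

    The argmax yields a discrete index, which is why the step is
    non-differentiable.
    """
    return max(range(len(probs)), key=probs.__getitem__)


def top_k_sample(probs, k, rng):
    """Top-k sampling: sample among the k most probable tokens."""
    top = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:k]
    return rng.choices(top, weights=[probs[i] for i in top], k=1)[0]


def make_training_pair(probs, strategy):
    """Hypothetical helper: pair a distribution with the strategy's choice.

    The distribution plays the role of the second model's input and the
    chosen token index plays the role of its target.
    """
    return probs, strategy(probs)


# Toy distribution standing in for the first model's output (assumed values).
step_probs = [0.05, 0.60, 0.25, 0.10]

features, target = make_training_pair(step_probs, greedy)
sampled = top_k_sample(step_probs, k=2, rng=random.Random(0))
```

Here `target` is the token a greedy decoder would pick, so a model trained on many such pairs would learn to imitate that strategy from the distribution alone.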
Document type
Degree thesis
(Laurea)
Thesis author
Pacilli, Benedetta
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Degree programme regulations
DM270
Keywords
Natural Language Processing, Language Models, Decoding Strategies, Natural Language Generation, Transformer-based Models
Thesis defence date
5 October 2023
URI