The Summary Imitation Game: Improving Abstractive Summarizers via Neural-Approximated Decoding Strategies

Pacilli, Benedetta (2023) The Summary Imitation Game: Improving Abstractive Summarizers via Neural-Approximated Decoding Strategies. [Laurea], Università di Bologna, Corso di Studio in Ingegneria e scienze informatiche [L-DM270] - Cesena. Restricted access document.
Full-text documents available:
PDF document (Thesis) - Full text not accessible until 31 December 2024.
Available under license: Creative Commons: Attribution - Non-commercial - No Derivative Works 4.0 (CC BY-NC-ND 4.0)

Abstract

In recent years, abstractive summarization has advanced significantly, driven by the emergence of novel neural language models, transformer-based architectures, high-dimensional representation spaces, the availability of extensive datasets, and innovative pre-training tasks. Within this evolving landscape, decoding strategies play a crucial role in transforming the probability distributions generated by these models into coherent text in an autoregressive fashion. Summarization systems make numerous decisions about summary properties during inference, e.g., the degree of copying, specificity, and output length; however, these decisions are implicitly encoded in the model parameters, and specific styles cannot be enforced.

Goyal et al. introduced HydraSum in 2021, a summarization model whose distinguishing feature is an architecture with multiple decoders. HydraSum provides a simple mechanism for obtaining stylistically diverse summaries by sampling either from individual decoders or from their mixtures through a gating mechanism.

This thesis presents an architecture comprising two Large Language Models that together replicate distinct decoding strategies. Since decoding strategies are inherently non-differentiable, they are emulated by integrating a second language model rather than by implementing a differentiable layer for each strategy. In this configuration, the first language model receives the input text to be summarized, but its primary objective is not to generate the summary itself: it computes the token probability distributions for the given input text. These probability distributions are then fed to the second language model, which is responsible for producing the token that the chosen decoding strategy would select.
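To make the two-model setup more concrete, below is a minimal, hypothetical PyTorch sketch, not the thesis code: a first language model yields next-token probability distributions, and a small second network is trained to imitate the non-differentiable choice that a target decoding strategy would make on those distributions (greedy decoding is used here as the simplest stand-in). All class and function names are illustrative assumptions.

import torch
import torch.nn as nn


class DecodingStrategyImitator(nn.Module):
    """Maps a next-token probability distribution to logits over the token
    that a target decoding strategy would select (hypothetical design)."""

    def __init__(self, vocab_size: int, hidden_size: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(vocab_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, vocab_size),
        )

    def forward(self, next_token_probs: torch.Tensor) -> torch.Tensor:
        # Input: (batch, vocab) probabilities from the first language model.
        # Output: logits over the vocabulary; the argmax is the imitated choice.
        return self.net(next_token_probs)


def greedy_teacher(next_token_probs: torch.Tensor) -> torch.Tensor:
    """The non-differentiable strategy being imitated (greedy decoding here)."""
    return next_token_probs.argmax(dim=-1)


if __name__ == "__main__":
    vocab_size, batch = 1000, 8
    imitator = DecodingStrategyImitator(vocab_size)
    optimizer = torch.optim.Adam(imitator.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    # In the real setting these distributions would come from the first LLM
    # run over the text to be summarized; here they are random placeholders.
    probs = torch.softmax(torch.randn(batch, vocab_size), dim=-1)
    target = greedy_teacher(probs)   # token the strategy would actually pick
    logits = imitator(probs)         # differentiable imitation of that choice
    loss = loss_fn(logits, target)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Because the imitator is an ordinary neural network, its selection step stays differentiable and can be trained end-to-end, whereas the strategy it mimics (greedy, top-k, nucleus sampling, and so on) cannot be backpropagated through directly.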

Document type: Tesi di laurea (Laurea)
Thesis author: Pacilli, Benedetta
Thesis supervisor:
Thesis co-supervisor:
School:
Degree programme:
Degree programme regulations: DM270
Keywords: Natural Language Processing, Language Models, Decoding Strategies, Natural Language Generation, Transformer-based Models
Thesis defence date: 5 October 2023
URI: