Semantically-aware text generation: an approach based on deep learning, word sense disambiguation and linguistic frames

Montanari, Davide (2020) Semantically-aware text generation: an approach based on deep learning, word sense disambiguation and linguistic frames. [Laurea magistrale], Università di Bologna, Corso di Studio in Informatica [LM-DM270], Documento full-text non disponibile

Salva citazione

Il full-text non è disponibile per scelta dell'autore. (Contatta l'autore)

Abstract

The objective of this thesis is to blend state-of-the-art neural architectures with the still scarcely exploited potential of symbolic knowledge bases, to contribute tackling one of the biggest open problems in artificial intelligence in a specific subfield of natural language processing (NLP): commonsense reasoning in text generation. Given a set of concepts (expressed by nouns and verbs), the goal is to generate a short sentence that acts as a description of a scene, plausible according to human commonsense knowledge. This problem can be seen as a special case of constrained text generation, with two major challenges: the generation of sentences given an unordered set of keywords with potential morphological changes, and the comprehension of commonsense relations between sets of concepts, finding an appropriate composition. Since most recent approaches to the problem show no interest in the use of symbolic knowledge resources, this work intends to take the best of both worlds (neural/symbolic), lying at their intersection. The recently proven capabilities of transformer models are leveraged in combination with word sense disambiguation and frame embeddings extracted from FrameNet, an English language knowledge base built upon the theory of Frame Semantics. The evaluation has been conducted on Commongen, a dataset suitably built for this purpose. A long pre-processing phase, including disambiguation of nominal entities, has been conducted, before training the model in three main scenarios: raw text (as baseline); disambiguated text; disambiguated text with frame embeddings. The approach has been evaluated on a manually built test-set from the post-processed initial dataset, by means of BLEU and ROUGE metrics. The proposed approach leads to an increase in performance with respect to the baseline, achieving promising results on both automatic metrics, and suggesting further steps to refine the methodology.

Abstract

Tipologia del documento

Tesi di laurea (Laurea magistrale)

Autore della tesi

Montanari, Davide

Relatore della tesi

Presutti, Valentina

Scuola

Scienze

Corso di studio

Informatica [LM-DM270]

Indirizzo