Graph-Based Keyword Extraction from Scientific Paper Abstracts using Word Embeddings

Koluh, Dinno (2023) Graph-Based Keyword Extraction from Scientific Paper Abstracts using Word Embeddings. [Master's thesis], Università di Bologna, Degree Programme in Artificial Intelligence [LM-DM270]
Full-text documents available:
PDF document (Thesis)
Available under licence: Creative Commons Attribution - NonCommercial - NoDerivatives 4.0 (CC BY-NC-ND 4.0)

Abstract

In an era of information overload, it is essential to extract concise, precise, high-quality information from large texts efficiently. One form of information extraction is keyword extraction, in which a large text is represented as a set of keywords. Keyword extraction is especially valuable to researchers, who deal with huge numbers of scientific papers and need a good, concise representation of each. This thesis addresses that problem within Natural Language Processing (NLP). Using core NLP concepts and modeling texts as graphs, we build a model for the automatic extraction of keywords. The approach is unsupervised: the importance of a word is calculated from its position in the graph and the weights associated with it. Two sources are used to calculate the graph weights: co-occurrence matrices and word embeddings, which have become a crucial way of representing the semantic information of words as dense vectors. The results of this thesis were compared with the keywords provided by the authors of scientific papers in computer science. These author keywords act as the ground truth; crucially, they play no part in constructing the model and serve only to verify its accuracy.
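
The pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not name the ranking algorithm or say how co-occurrence and embedding information are combined, so everything specific here is an assumption: the PageRank-style power iteration (as in TextRank), the sliding-window co-occurrence count, the `1 + cosine` scaling of edge weights, the tiny stopword list, and all function names. The `embeddings` argument is a hypothetical word-to-vector dictionary; in practice it would come from a pretrained embedding model.

```python
import math
import re
from collections import defaultdict

# Tiny illustrative stopword list (assumption; a real system would use a fuller one).
STOPWORDS = {"a", "an", "and", "are", "as", "for", "in", "is", "of", "the", "to"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_graph(tokens, window=3, embeddings=None):
    """Build an undirected weighted word graph.

    Edge weight is the number of co-occurrences within `window` tokens
    (the window includes the word itself). If `embeddings` is given, each
    weight is scaled by 1 + cosine similarity (an assumed combination).
    """
    graph = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(tokens):
        for v in tokens[i + 1 : i + window]:
            if v != w:
                graph[w][v] += 1.0
                graph[v][w] += 1.0
    if embeddings:
        for w in graph:
            for v in graph[w]:
                if w in embeddings and v in embeddings:
                    sim = max(cosine(embeddings[w], embeddings[v]), 0.0)
                    graph[w][v] *= 1.0 + sim
    return graph


def rank(graph, damping=0.85, iterations=50):
    """Score nodes with weighted PageRank computed by power iteration."""
    nodes = list(graph)
    score = {n: 1.0 / len(nodes) for n in nodes}
    out_weight = {n: sum(graph[n].values()) for n in nodes}
    for _ in range(iterations):
        new = {}
        for n in nodes:
            incoming = sum(score[m] * graph[m][n] / out_weight[m] for m in graph[n])
            new[n] = (1 - damping) / len(nodes) + damping * incoming
        score = new
    return score


def top_keywords(text, k=5, embeddings=None):
    """Return the k highest-scoring words (ties broken alphabetically)."""
    scores = rank(build_graph(tokenize(text), embeddings=embeddings))
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


def precision_recall_f1(predicted, gold):
    """Compare predicted keywords with author-provided ones (exact match)."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    p = hits / len(predicted) if predicted else 0.0
    r = hits / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


if __name__ == "__main__":
    text = "Keyword extraction represents large texts as sets of keywords."
    predicted = top_keywords(text, k=3)
    print(predicted, precision_recall_f1(predicted, ["keyword", "extraction"]))
```

As in the abstract, the author keywords appear only in `precision_recall_f1`; they never influence graph construction or ranking, which keeps the extraction step unsupervised.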

Document type
Degree thesis (Master's degree)
Thesis author
Koluh, Dinno
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Degree programme regulation
DM270
Keywords
NLP, keyword extraction, scientific papers, graphs, word-embeddings
Thesis defence date
16 December 2023
URI
