Subtopic-oriented biomedical summarization using pretrained language models

Xia, Tian Cheng (2023) Subtopic-oriented biomedical summarization using pretrained language models. [Bachelor's thesis (Laurea)], Università di Bologna, Degree Programme in Computer Science [L-DM270]
Full text available: PDF (Thesis), 672 kB. License: Creative Commons Attribution-ShareAlike 4.0 (CC BY-SA 4.0)

Abstract

The ever-growing number of publications in the biomedical field makes it difficult to locate insightful knowledge. In this work, we propose a subtopic-oriented summarization framework that aims to provide an overview of the state of the art on a given subject. The proposed method clusters the papers retrieved from a query and then, for each cluster, extracts the subtopics and summarizes the abstracts. We conducted various experiments to select the most appropriate clustering approach and concluded that the best choices are MiniLM for text embedding, UMAP for dimensionality reduction, and OPTICS as the clustering algorithm. For summarization, we fine-tuned both general-domain and biomedical pretrained language models for the task of extractive summarization and selected Longformer as the most suitable model. Experimental results on multi-document summarization datasets show that the proposed framework improves the overall recall of the generated summary with a small decrease in precision, which corresponds to slightly longer summaries that are closer to the ground truth.
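The clustering stage described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis code: the MiniLM embedding step is replaced by a generic pre-computed embedding matrix, and scikit-learn's PCA stands in for UMAP to keep the sketch dependency-free (the actual pipeline would use a sentence-transformers MiniLM checkpoint and umap-learn).

```python
import numpy as np
from sklearn.cluster import OPTICS
from sklearn.decomposition import PCA  # stand-in for UMAP in this sketch


def cluster_papers(embeddings: np.ndarray,
                   n_components: int = 5,
                   min_samples: int = 3) -> np.ndarray:
    """Reduce embedding dimensionality, then cluster with OPTICS.

    `embeddings` is a (n_papers, dim) matrix of abstract embeddings
    (MiniLM in the thesis; any sentence embedder works for this sketch).
    The reduction step is UMAP in the thesis pipeline; PCA is used here
    only to avoid an extra dependency. OPTICS labels noise points as -1,
    so a paper may end up in no cluster.
    """
    reduced = PCA(n_components=n_components).fit_transform(embeddings)
    return OPTICS(min_samples=min_samples).fit_predict(reduced)
```

Each non-negative label then identifies a cluster whose abstracts are summarized together; the parameter values above are illustrative defaults, not the ones selected in the thesis experiments.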

Document type: Bachelor's thesis (Laurea)
Thesis author: Xia, Tian Cheng
Degree programme regulations (Ordinamento CdS): DM270
Keywords: extractive summarization, biomedical summarization, large language models, pretrained language models, text clustering
Thesis defence date: 11 October 2023