PDF document (Thesis, 12MB)
Full text accessible only to institutional users of the University. Available under license: unless the author has granted broader permissions, the thesis may be freely consulted, and one copy may be saved and printed strictly for personal purposes of study, research and teaching; any direct or indirect commercial use is expressly forbidden. All other rights to the material are reserved.
PDF document (Supplementary file, 30kB)
Full text accessible only to institutional users of the University, under the same license terms as the thesis.
Abstract
The purpose of this thesis project is to extend Cognitive Discovery to informal knowledge. Cognitive Discovery is a term coined by IBM Research for a series of Information Extraction (IE) processes that build a knowledge graph capable of representing knowledge from highly unstructured data such as text. It is typically applied to formal knowledge, i.e. documented text such as academic papers, business reports and patents. Informal knowledge, by contrast, is knowledge that is not formally defined, such as that captured in the recording of a conversation during a meeting or in a PowerPoint presentation. The idea behind the project is the same as that of the original Cognitive Discovery project: processing natural language in order to build a knowledge graph that can be queried in different ways. The architecture of this graph depends on the use case, but it tends to be a network of entity nodes connected to one another through semantic relationships and to nodes containing structural data such as a paragraph, an image or a slide from a presentation. Creating the graph requires a data processing pipeline that starts from the raw data (in the prototype, the audio file of a conversation) and extracts and processes a series of features, such as entities, semantic relationships between entities and main concepts. Once the graph has been created, an engine is needed for querying it and/or generating insights from it. The graph database infrastructure generally provides its own query language; however, to make the application usable by those who lack the technical knowledge needed to learn that language, a component has been defined that processes natural language queries and uses them to query the graph.
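The graph architecture described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the class `KnowledgeGraph`, the helpers `build_example` and `answer`, the node kinds `"entity"` and `"structure"`, and the relation labels are all hypothetical names chosen for illustration. The natural language component is replaced here by naive substring matching, purely to show where such a component would sit.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    """Toy in-memory graph; a real system would use a graph database."""
    # node id -> node kind ("entity" or "structure"); kinds are an assumption
    kinds: dict = field(default_factory=dict)
    # source node id -> list of (relation label, target node id)
    edges: dict = field(default_factory=dict)

    def add_node(self, node_id, kind):
        self.kinds[node_id] = kind

    def add_edge(self, source, relation, target):
        self.edges.setdefault(source, []).append((relation, target))

    def neighbors(self, source, relation=None):
        """Targets reachable from source, optionally filtered by relation."""
        return [t for r, t in self.edges.get(source, [])
                if relation is None or r == relation]


def build_example():
    """Build a tiny graph by hand; in the pipeline these facts would be
    extracted from a transcript rather than written out manually."""
    g = KnowledgeGraph()
    g.add_node("utterance-1", "structure")        # structural node
    g.add_node("IBM Research", "entity")          # entity nodes
    g.add_node("Cognitive Discovery", "entity")
    g.add_edge("IBM Research", "coined", "Cognitive Discovery")
    g.add_edge("IBM Research", "mentioned_in", "utterance-1")
    g.add_edge("Cognitive Discovery", "mentioned_in", "utterance-1")
    return g


def answer(graph, question):
    """Stand-in for the natural language query component: find entities
    named in the question and return their entity-to-entity relations."""
    text = question.lower()
    results = []
    for node_id, kind in graph.kinds.items():
        if kind == "entity" and node_id.lower() in text:
            for relation, target in graph.edges.get(node_id, []):
                if graph.kinds.get(target) == "entity":
                    results.append((node_id, relation, target))
    return results


if __name__ == "__main__":
    graph = build_example()
    print(answer(graph, "What is IBM Research related to?"))
    print(graph.neighbors("Cognitive Discovery", "mentioned_in"))
```

Keeping structural nodes separate from entity nodes means a query can return both the fact and the place it was stated, for example the utterance or slide in which an entity was mentioned.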