Cucè, Marco
(2022)
Work environments implementation for genomic reporting and analytics.
[Master's degree thesis], Università di Bologna, Degree Programme in
Artificial intelligence [LM-DM270], Full-text document not available
The full text is not available by the author's choice.
Abstract
A global Italian pharmaceutical company has to provide two work environments
that serve different needs. The environments will allow solutions to be developed
in a controlled, secure and, at the same time, independent manner
on a state-of-the-art enterprise cloud platform. The need for two
different environments is dictated by the requirements of the working units.
The first environment is designed to facilitate the creation of applications related
to genomics and is therefore aimed primarily at data scientists. This environment
is capable of consuming, producing, retrieving and incorporating data, and it
will support the programming languages most used for genomic
applications (e.g., Python, R). The proposal was to obtain a pool of ready-to-go
virtual machines with different architectures, to provide the best performance
for the job that needs to be carried out.
The second environment is more traditional in character: it uses an ETL
(Extract-Transform-Load) process to obtain a global data model resembling a classical
relational structure. It will provide the major BI operations (e.g., analytics, performance
measurement, reports) that can be leveraged both for application
analysis and for internal use. Since both architectures will hold large
amounts of data covering not only pharmaceutical information but also internal
company information, it will be possible to process the data with reporting and
analytics tools and also to apply data-mining and machine-learning technologies
to exploit the information it contains. The thesis presents the proposals,
implementations, descriptions of the technologies and platforms used, and future
work for the environments discussed above.
Document type
Degree thesis
(Master's degree)
Thesis author
Cucè, Marco
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Degree programme regulations
DM270
Keywords
AI, Datawarehouse, Datawarehousing, Cloud
Thesis defence date
6 December 2022
URI