Synthetic data generation for the assessment of antimicrobial resistance through machine learning

Zaghi, Adriano (2022) Synthetic data generation for the assessment of antimicrobial resistance through machine learning. [Master's thesis (Laurea magistrale)], Università di Bologna, Degree Programme in Physics [LM-DM270]
Full-text documents available:
PDF document (Thesis)
Available under license: Creative Commons: Attribution - NonCommercial - ShareAlike 4.0 (CC BY-NC-SA 4.0)

Abstract

As a consequence of the spread of next-generation sequencing (NGS) techniques, metagenomics databases have become one of the most promising repositories of information about the features and behavior of microorganisms. Bacterial populations are among the subjects that can be studied from these data: NGS makes it possible to study the bacterial population of an environment by sampling genetic material directly from it, without having to culture a similar population in vitro and observe its behavior. The drawback is that extracting information from these data is complex, and there is usually more than one way to do it; antimicrobial resistance (AMR) is no exception. In this study we discuss how AMR as quantified from sequencing data, which reflects the genotype of the bacteria, can be related to the bacterial phenotype, that is, the actual level of resistance against a specific substance. To obtain quantitative information about the bacterial genotype, we evaluate the resistome from the read libraries by aligning them against the CARD database (Comprehensive Antibiotic Resistance Database). With these data, we test various machine learning algorithms for predicting the bacterial phenotype. The samples we use are meant to resemble those that could be obtained from a natural environment, but are actually produced by a read-library simulation tool. In this way we can design populations of bacteria with known genotype, so that we can rely on a solid ground truth for training and testing our algorithms.
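The pipeline outlined above (per-gene read counts from the alignment against CARD used as genotype features, then a classifier predicting the phenotype) can be illustrated with a minimal sketch. It is not the method used in the thesis, and it rests on several assumptions. The gene names, gene lengths, read counts and labels are all hypothetical. The RPKM-style normalization (reads per kilobase of gene per million reads in the library) followed by a log transform is one common choice, assumed here. Logistic regression, one of the models listed in the keywords, is fitted by plain gradient descent rather than with any particular library.

```python
import math

# Hypothetical reference genes and their lengths in base pairs
# (illustrative only, not real CARD entries).
GENE_LENGTHS_BP = {"geneA": 1200, "geneB": 800, "geneC": 1500}


def resistome_features(read_counts, total_reads, gene_lengths=GENE_LENGTHS_BP):
    """Per-gene abundance normalized by gene length (kb) and library size
    (millions of reads), then log1p-transformed to compress the range.
    Genes are visited in sorted order so every sample has the same layout."""
    millions = total_reads / 1_000_000
    features = []
    for gene in sorted(gene_lengths):
        kb = gene_lengths[gene] / 1_000
        rpkm = read_counts.get(gene, 0) / (kb * millions)
        features.append(math.log1p(rpkm))
    return features


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Unregularized logistic regression fitted by batch gradient descent
    on the mean log-loss. Returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """1 = predicted resistant, 0 = predicted susceptible."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


# Toy simulated samples: (aligned reads per gene, library size, phenotype).
# Counts and labels are invented for illustration.
SAMPLES = [
    ({"geneA": 120, "geneB": 5}, 1_000_000, 1),
    ({"geneA": 90, "geneC": 3}, 1_000_000, 1),
    ({"geneB": 4}, 1_000_000, 0),
    ({"geneC": 6}, 1_000_000, 0),
]

X = [resistome_features(counts, total) for counts, total, _ in SAMPLES]
y = [label for _, _, label in SAMPLES]
weights, bias = fit_logistic_regression(X, y)
predictions = [predict(weights, bias, x) for x in X]
```

With real data, the counts would come from aligning each simulated read library against CARD. The designed composition of each simulated population would supply the ground-truth labels. The sketch only scores the training samples; a fair comparison of models would use held-out samples.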

Document type
Thesis (Laurea magistrale, Master's degree)
Thesis author
Zaghi, Adriano
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Curriculum
Applied Physics
Degree programme regulations (Ordinamento CdS)
DM270
Keywords
Metagenomics, DNA, Bacteria, AMR, Antibiotics, Simulated data, Machine learning, Antimicrobial Resistance, PCA, Principal Component Analysis, Random Forest, AdaBoost Classifier, Elastic Net, Logistic Regression
Thesis defense date
23 September 2022
URI
