Bruno, Daniele
(2023)
Hi-C data spectral analysis: SynHi-C maps for a case study with ShRec3D algorithm and VR.
[Master's thesis], Università di Bologna, Degree programme in Physics [LM-DM270]
Full-text documents available:
PDF document (Thesis)
Available under licence: Unless the author grants broader permissions, the thesis may be freely consulted, and one copy may be saved and printed strictly for personal purposes of study, research and teaching; any direct or indirect commercial use is expressly prohibited. All other rights to the material are reserved.
Download (12MB)
Abstract
Hi-C matrices are a cornerstone of both the qualitative and the quantitative study of genome folding and of its organization into chromosomal territories, compartments and topological domains. Here we introduce and discuss synHi-C, a method for producing synthetic Hi-C data. The method builds on the possibility of characterizing the signal-to-noise ratio through a spectral analysis of different types of Hi-C data at different resolutions (1 Mb and 100 kb). The spectral analysis identifies two components: the signal, consisting of isolated and scattered eigenvalues, some lying far from the origin, and the noise, which follows Wigner's semicircle law centered at zero and was identified by simulating random symmetric matrices. Adding the essential matrix (essHi-C), which is the sum of the projectors associated with the signal component, to the matrix reconstructed from the projectors of the random component after reshuffling its eigenvalues yields a potentially vast number of synthetic matrices. After the spectral analysis was tested on the gold-standard cell line GM12878, the method was applied to a real case study comprising two cases (235 and 295) of a rare prion disease and two controls (LM and MB). The results show that the intrinsic biological properties of the Hi-C maps, given by the essHi-C component, are enhanced, and that the statistical properties of the introduced fluctuations are unbiased, reflecting the non-specific component. These results were validated in several ways: scatter plots of synHi-C against the original matrices to assess their correlation; the ShRec3D algorithm, followed by a Procrustes analysis, to verify the consistency of the reconstructed spatial folding of the chromatin; and visualization in Blender and inspection in a simulated 3D Virtual Reality (VR) environment.
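The construction described in the abstract can be illustrated with a minimal NumPy sketch. It is not the thesis code. The function name `synthetic_map`, the parameter `n_signal`, the toy input matrix, the choice of the signal as the eigenvalues of largest magnitude, and the weighting of each projector by its eigenvalue are all assumptions made here for illustration; the thesis itself should be consulted for the actual criteria.

```python
import numpy as np


def synthetic_map(contact_map, n_signal, rng):
    """Split a symmetric map into signal and noise parts and return
    (essential, synthetic). Hypothetical helper, not the thesis code."""
    # Symmetric matrix: real eigenvalues and orthonormal eigenvectors.
    eigvals, eigvecs = np.linalg.eigh(contact_map)

    # Assumption: the signal is the n_signal eigenvalues of largest magnitude,
    # i.e. the ones isolated from the bulk of the spectrum.
    order = np.argsort(np.abs(eigvals))[::-1]
    signal_idx, noise_idx = order[:n_signal], order[n_signal:]

    # Essential matrix: sum of signal projectors v_i v_i^T.
    # Assumption: each projector is weighted by its eigenvalue.
    v_s = eigvecs[:, signal_idx]
    essential = (v_s * eigvals[signal_idx]) @ v_s.T

    # Noise part: same projectors, eigenvalues randomly reassigned among them.
    v_n = eigvecs[:, noise_idx]
    shuffled = rng.permutation(eigvals[noise_idx])
    noise = (v_n * shuffled) @ v_n.T

    return essential, essential + noise


# Toy input (an assumption, not real Hi-C data): a rank-one two-compartment
# pattern plus a random symmetric matrix whose spectrum follows the
# semicircle law with radius of about 2 * sqrt(n).
rng = np.random.default_rng(0)
n = 200
labels = np.repeat([1.0, -1.0], n // 2)
gauss = rng.normal(size=(n, n))
toy = np.outer(labels, labels) + (gauss + gauss.T) / np.sqrt(2)

essential, synthetic = synthetic_map(toy, n_signal=1, rng=rng)
```

Each call with a fresh permutation gives a different synthetic matrix with the same eigenvalue spectrum as the input, which is one way to read the abstract's remark that a potentially vast number of synthetic matrices can be generated.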
Document type
Degree thesis
(Master's degree)
Thesis author
Bruno, Daniele
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Curriculum
Applied Physics
Degree programme regulations
DM270
Keywords
Hi-C, Spectral Analysis, ShRec3D Algorithm, Virtual Reality, Contact Maps, Synthetic Data, 3D Genome Folding, 3D Reconstruction
Thesis defence date
14 July 2023
URI