Multimodal Deep Learning for Medical Imaging: A Survey and A New Approach to Brain Tumor Segmentation with Incomplete Data

Ge, Enze (2025) Multimodal Deep Learning for Medical Imaging: A Survey and A New Approach to Brain Tumor Segmentation with Incomplete Data. [Master's thesis], Università di Bologna, Degree Programme in Artificial intelligence [LM-DM270]
Full-text documents available:
PDF document (Thesis)
Available under license: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 (CC BY-NC-ND 4.0)

Download (11MB)

Abstract

Multimodal MRI is crucial for brain tumor segmentation, but its clinical use is hampered by the "missing modality problem," where incomplete data degrades model performance and deployment. This thesis introduces the Grouped Modality Distillation Transformer (GMD-Trans), a novel, fully supervised framework designed to be inherently robust to this challenge. The GMD-Trans architecture uses a pure 3D Vision Transformer (ViT) backbone for global context modeling and a dual-stream encoder for synergistic modality groups. Features are integrated via a cross-attention mixer (IG-CAM). Robustness is achieved through a teacher-student knowledge distillation (KD) scheme guided by the mathematically stable Hölder Divergence to ensure performance even when key modalities are absent. Evaluated on the BraTS 2021 benchmark with randomly missing modalities, GMD-Trans achieves a state-of-the-art Dice score of 82.1% for the Tumor Core (TC), surpassing strong baselines. Ablation studies confirm the efficacy of the proposed methods. This specialized success, however, reveals a performance trade-off, with lower accuracy on the Enhancing Tumor (ET) region. GMD-Trans provides a powerful and efficient solution for robustly segmenting the main tumor body from incomplete data using a fully supervised paradigm, without needing complex pre-training or data synthesis. This work advances the development of dependable AI tools for real-world neuro-oncology.
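As a rough illustration of the evaluation setting the abstract describes, the sketch below enumerates the modality-availability patterns that arise when some MRI sequences are missing and computes a Dice score for one binary region mask. It is not the thesis code. The function names, the flat-list mask representation, the four modality labels, and the choice to score two empty masks as 1.0 are all assumptions made for this example; the abstract does not specify how the missing-modality subsets were sampled.

```python
from itertools import combinations

# Assumption: the four MRI sequences distributed with BraTS, used here as labels only.
MODALITIES = ("T1", "T1ce", "T2", "FLAIR")


def availability_patterns(modalities=MODALITIES):
    """Return every non-empty subset of the given modalities.

    With four modalities this yields 2**4 - 1 = 15 patterns, from a single
    available sequence up to the complete set.
    """
    patterns = []
    for size in range(1, len(modalities) + 1):
        patterns.extend(combinations(modalities, size))
    return patterns


def dice_score(pred, target):
    """Dice coefficient 2|P ∩ T| / (|P| + |T|) for flat binary masks (0/1 values)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        # Assumption: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    print(len(availability_patterns()), "availability patterns")
    # Hypothetical flattened Tumor Core masks for a tiny four-voxel volume.
    print(dice_score([1, 1, 0, 0], [1, 0, 0, 0]))
```

In practice a segmentation model would be evaluated once per availability pattern, and the per-region Dice scores (for example for the Tumor Core) would then be averaged across patterns and cases.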

Document type
Degree thesis (Master's degree)
Thesis author
Ge, Enze
Thesis supervisor
School
Degree programme
Degree programme regulation
DM270
Keywords
multimodal deep learning, brain tumor segmentation, missing modalities, Vision Transformer (ViT), Knowledge Distillation, multimodal fusion
Thesis defense date
22 July 2025
URI
