A FEDERATED MULTI-SCALE HYBRID TRANSFORMER-CNN FRAMEWORK FOR BRAIN TUMOR CLASSIFICATION

Muhammad Akmal; Urooj Fatima; Abdullah Soomro; Sajid Ahmed; Wajahat Akbar

Keywords:

Brain tumor classification; Federated learning; Vision Transformer; Convolutional neural network; Evidential deep learning; Explainable AI; MRI analysis; Uncertainty quantification

Abstract

Brain tumors, particularly gliomas, meningiomas, and metastatic tumors, are among the most difficult targets in neuroimaging. Current deep learning frameworks have three major drawbacks: (i) data are scarce and cannot be shared across institutions because of patient privacy regulations, (ii) the models lack transparency, and (iii) predictions are made with high confidence but without a reliable uncertainty signal that radiologists can act on. To address all three limitations, this paper presents a Federated Multi-Scale Hybrid Transformer-CNN Network, named FMHTNet. Inspired by the success of Vision Transformers, the proposed hybrid encoder combines a Vision Transformer (ViT) branch with a multi-scale CNN branch. The two branches are connected by a learnable attention-gating module and together capture global spatial dependencies across MRI volumes and local features of tumor texture and boundaries. The model is trained under a federated learning (FL) paradigm on four simulated institutional nodes partitioned from the BraTS 2021 and Figshare brain tumor MRI datasets, so that raw patient images never leave their originating node. Predictions are generated by an Evidential Deep Learning (EDL) classifier that returns a class label together with a calibrated uncertainty score derived from a Dirichlet distribution; this score can be used to flag predictions that fall in an "ambiguity zone" for expert review. Finally, Grad-CAM++ saliency maps provide per-prediction visual explanations on the image, consistent with radiological practice.
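To illustrate how an evidential classifier can return both a label and an uncertainty score, the following is a minimal sketch and not the authors' implementation. It assumes the commonly used subjective-logic formulation of EDL, in which non-negative per-class evidence e_k gives Dirichlet parameters alpha_k = e_k + 1 and the uncertainty is u = K / sum(alpha). The abstract does not state the exact formulation or threshold used in FMHTNet; the function name `edl_predict`, the three-class input, and the `ambiguity_threshold` value of 0.5 are hypothetical.

```python
from typing import List, Tuple


def edl_predict(
    evidence: List[float],
    ambiguity_threshold: float = 0.5,  # hypothetical cut-off, not from the paper
) -> Tuple[int, List[float], float, bool]:
    """Turn non-negative class evidence into a label and an uncertainty score.

    Assumed formulation (subjective-logic EDL):
        alpha_k = evidence_k + 1
        S       = sum_k alpha_k          (Dirichlet strength)
        p_k     = alpha_k / S            (expected class probability)
        u       = K / S                  (uncertainty mass, in (0, 1])
    """
    if not evidence or any(e < 0 for e in evidence):
        raise ValueError("evidence must be a non-empty list of non-negative values")

    num_classes = len(evidence)
    alpha = [e + 1.0 for e in evidence]
    strength = sum(alpha)

    probs = [a / strength for a in alpha]
    uncertainty = num_classes / strength

    # Predicted label is the class with the highest expected probability.
    label = max(range(num_classes), key=probs.__getitem__)

    # Flag the case for expert review when uncertainty is high.
    needs_review = uncertainty >= ambiguity_threshold
    return label, probs, uncertainty, needs_review


if __name__ == "__main__":
    # Strong evidence for class 0: low uncertainty, not flagged.
    print(edl_predict([27.0, 0.0, 0.0]))
    # No evidence at all: maximal uncertainty, flagged for review.
    print(edl_predict([0.0, 0.0, 0.0]))
```

With no evidence the Dirichlet is uniform and the uncertainty equals 1, so the case is flagged for review. As evidence accumulates for one class the uncertainty shrinks toward 0.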
On the combined test set, FMHTNet outperforms all baselines, including the previous state-of-the-art ResNet50+GAN framework (96.25% accuracy), achieving 98.12% accuracy, a macro F1-score of 0.975, and an Expected Calibration Error (ECE) of 0.047. The proposed framework shows that privacy preservation, high accuracy, and clinical interpretability can be achieved together.
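For readers unfamiliar with the calibration metric reported above, the sketch below shows one common way to compute ECE: predictions are grouped into equal-width confidence bins, and the gap between mean confidence and accuracy in each bin is averaged, weighted by bin size. The abstract does not specify the binning scheme used for the reported value, so the ten-bin default and the function name `expected_calibration_error` are assumptions for illustration only.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[int],
    n_bins: int = 10,  # assumed bin count; the paper's choice is not stated
) -> float:
    """Binned ECE: sum over bins of (n_bin / N) * |accuracy_bin - confidence_bin|.

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct:     1 if the prediction was right, 0 otherwise.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        raise ValueError("at least one prediction is required")

    conf_sum = [0.0] * n_bins
    hit_sum = [0.0] * n_bins
    count = [0] * n_bins

    for conf, hit in zip(confidences, correct):
        # Bins are [lo, hi); a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        conf_sum[idx] += conf
        hit_sum[idx] += hit
        count[idx] += 1

    ece = 0.0
    for b in range(n_bins):
        if count[b] == 0:
            continue  # empty bins contribute nothing
        mean_conf = conf_sum[b] / count[b]
        accuracy = hit_sum[b] / count[b]
        ece += (count[b] / total) * abs(accuracy - mean_conf)
    return ece


if __name__ == "__main__":
    # Four predictions at 0.9 confidence, three of them correct.
    print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0]))
```

A lower value indicates that the model's stated confidence tracks its actual accuracy more closely.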