INTEGRATING MULTIMODAL DATA FOR INTELLIGENT CLINICAL DECISION-MAKING IN CARDIOVASCULAR DISEASE
Keywords:
Multimodal learning, cardiovascular disease, clinical decision support, cross-modal transformer, ECG, electronic health records, echocardiography, attention mechanism, explainable AI.
Abstract
Cardiovascular disease (CVD) remains the leading cause of mortality worldwide, accounting for approximately 17.9 million deaths annually. Accurate and timely diagnosis requires synthesizing heterogeneous clinical data streams, including electrocardiograms (ECG), electronic health records (EHR), echocardiographic imaging, laboratory biomarkers, and unstructured clinical notes. In this paper, we propose MMCardio, a multimodal deep learning framework that integrates five distinct data modalities through a cross-modal transformer-based fusion mechanism augmented with a dynamic attention gating (DAG) module. The architecture employs modality-specific encoders, including a 1-D residual convolutional network for ECG signals, a clinical language model fine-tuned on MIMIC-IV for EHR text, and a 3-D convolutional encoder for echocardiographic video, whose representations are fused via a hierarchical cross-attention mechanism. Evaluated on a combined cohort of 87,243 patients drawn from four public and institutional datasets, MMCardio achieves an AUC-ROC of 0.971, an accuracy of 94.7%, and an F1-score of 0.943, outperforming the best unimodal baselines by 9.8% in AUC and state-of-the-art multimodal methods by 3.8% in AUC. An extensive ablation study confirms that each modality makes an additive contribution to performance. Explainability analysis using SHAP and attention visualization reveals clinically meaningful feature attributions that align with established cardiology guidelines. The framework shows strong potential for real-time deployment in clinical decision support systems.
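The abstract does not specify how the dynamic attention gating (DAG) module combines modalities, so the Python sketch below is only an illustration of one generic gated cross-attention step, not the MMCardio implementation. It assumes one pre-computed embedding per modality (e.g. from the ECG, EHR, and echo encoders), identity query/key/value projections for brevity, a hypothetical scalar gate logit per modality passed through a sigmoid, and masking of missing modalities by setting them to `None`. The function name `gated_cross_attention` and the `gate_logits` parameter are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = List[float]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def _softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_cross_attention(
    query: Vector,
    modalities: Dict[str, Optional[Vector]],
    gate_logits: Dict[str, float],
) -> Tuple[Vector, Dict[str, float]]:
    """Hypothetical gated cross-attention over per-modality embeddings.

    Assumptions (not from the paper): identity projections, one embedding
    per modality, and a scalar sigmoid gate per modality. Modalities whose
    embedding is None are treated as missing and excluded from attention.
    """
    present = [name for name, vec in modalities.items() if vec is not None]
    if not present:
        raise ValueError("at least one modality embedding is required")

    d = len(query)
    # Scaled dot-product scores between the query and each available modality.
    scores = [_dot(query, modalities[name]) / math.sqrt(d) for name in present]
    weights = _softmax(scores)

    fused = [0.0] * d
    attention: Dict[str, float] = {}
    for name, w in zip(present, weights):
        gate = _sigmoid(gate_logits.get(name, 0.0))  # in (0, 1)
        attention[name] = w
        # Each modality's value vector is scaled by its attention weight and gate.
        for i, x in enumerate(modalities[name]):
            fused[i] += gate * w * x
    return fused, attention
```

For example, calling `gated_cross_attention([1.0, 0.0], {"ecg": [1.0, 0.0], "ehr": [0.0, 1.0], "echo": None}, {"ecg": 0.0, "ehr": 0.0})` ignores the missing echo embedding and gives the ECG embedding the larger attention weight because it aligns with the query. Separating the softmax attention (which modality is relevant) from the sigmoid gate (how much of it to pass through) is one common way to let a model down-weight an unreliable modality without renormalizing the others; the paper's actual DAG design may differ.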