EFFICIENT CROSS-MODALITY IMAGE RETRIEVAL LEVERAGING USING MULTIMODAL OPTIMIZED FEATURE ENGINEERING AND DEEP LEARNING INTELLIGENCE

Jacob Katende; Muhammad Kashaf; Salahuddin; Hafiz Muhammad Ijaz; Nasir Hussain

Abstract

Content-Based Image Retrieval (CBIR) has become an important research area in computer vision, driven by the rapid growth of visual data and the need for retrieval techniques that go beyond traditional text-based approaches. Although many existing systems use multimedia content to search large image collections, they still struggle with continuously growing datasets, especially in specialized domains such as medical imaging. Medical images are captured through different modalities, such as MRI, CT, and X-ray, and their modality must be identified accurately to support diagnosis and improve retrieval precision. To address this challenge, this study presents a framework for classifying and retrieving medical images by modality, using multiple visual feature descriptors and machine learning. The proposed approach combines seven visual features that capture texture, edge, and color content: Scale-Invariant Feature Transform (SIFT), Local Binary Patterns (LBP), Local Ternary Patterns (LTP), Edge Histogram Descriptor (EHD), Color and Edge Directivity Descriptor (CEDD), wavelet-based color edge features, and color histograms. All extracted features are merged into a single feature vector, giving a more complete description of each image. The system was evaluated on the ImageCLEF2012 modality classification dataset, which contains 31 medical imaging modalities. For classification, a Support Vector Machine (SVM) with a chi-square kernel was used, as it is well suited to complex, high-dimensional data. The proposed method achieved an overall accuracy of 72.2%, outperforming the best visual feature-based result from ImageCLEF2012 by 2.6 percentage points. This improvement indicates that combining multiple features helps distinguish between image modalities.
The study’s key contribution is the integration of wavelet-based edge information with texture features, together with the use of a chi-square kernel to improve classification performance. Overall, this work shows that carefully designed feature fusion, paired with an appropriate machine-learning model, can improve modality classification for CBIR systems in medical imaging. Future work may incorporate deep learning methods and extend the framework to images that belong to multiple categories simultaneously.
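
The feature fusion and chi-square kernel described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`l1_normalize`, `fuse_features`, `chi2_kernel`, `gram_matrix`), the per-descriptor L1 normalization, the exponential form of the kernel, and the `gamma` value are all assumptions made for illustration, since the abstract does not specify them. The toy histograms stand in for real descriptor outputs such as LBP or color histograms.

```python
import math


def l1_normalize(hist):
    """Scale a non-negative histogram so its bins sum to 1 (assumed normalization)."""
    total = sum(hist)
    return [v / total for v in hist] if total > 0 else list(hist)


def fuse_features(descriptors):
    """Concatenate per-descriptor histograms into a single feature vector."""
    fused = []
    for hist in descriptors:
        fused.extend(l1_normalize(hist))
    return fused


def chi2_kernel(x, y, gamma=1.0):
    """Exponential chi-square kernel: exp(-gamma * sum((x_i - y_i)^2 / (x_i + y_i))).

    This is one common variant; the paper's exact form and gamma are not given.
    """
    dist = 0.0
    for a, b in zip(x, y):
        if a + b > 0:  # skip bins that are empty in both vectors to avoid 0/0
            dist += (a - b) ** 2 / (a + b)
    return math.exp(-gamma * dist)


def gram_matrix(vectors, gamma=1.0):
    """Pairwise kernel values between all fused feature vectors."""
    return [[chi2_kernel(u, v, gamma) for v in vectors] for u in vectors]


# Toy data: two images, each described by two hypothetical histograms.
image_a = fuse_features([[4, 0, 4], [1, 3]])
image_b = fuse_features([[0, 8, 0], [2, 2]])
K = gram_matrix([image_a, image_b])
```

A Gram matrix of this kind could then be passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. Normalizing each descriptor before concatenation is a design choice here that keeps descriptors with many bins from dominating the chi-square distance.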