AN EXPLAINABLE MACHINE LEARNING FRAMEWORK FOR PHISHING DETECTION USING URL STRUCTURAL AND BEHAVIORAL FEATURES

Authors

  • Nayab Imtiaz
  • Muazzam Ali
  • M U Hashmi
  • Zarqa Zafar
  • Asifa Ittfaq

Keywords

Phishing Detection; URL Analysis; Machine Learning; Explainable Artificial Intelligence; Random Forest; SHAP

Abstract

Phishing is one of the most prevalent cybersecurity threats: attackers use misleading URLs and increasingly advanced techniques to steal sensitive user data. Conventional detection systems, such as blacklists and rule-based systems, cannot keep up with fast-changing, short-lived phishing campaigns. This paper presents an explainable, data-driven phishing detection model that uses structural, lexical, behavioral, and protocol-based URL characteristics to detect threats in real time. A dataset of 11,430 labeled URLs was analyzed, and 28 discriminative features were selected from an initial set of 89 attributes. Four machine learning classifiers were evaluated: Logistic Regression, Linear SVM, Gradient Boosting, and Random Forest. The experimental results indicate that the Random Forest model performs best, with an accuracy of 96.27%, a precision of 96.37%, a recall of 96.15%, and the lowest overall misclassification rate. To address the interpretability gap often associated with high-performing models, SHAP (SHapley Additive exPlanations) was used to quantify each feature's contribution to the predictions. The analysis shows that the most significant indicators of phishing behavior are URL length, hostname length, domain age, and dot count. The proposed framework balances detection accuracy with model transparency, providing a robust, interpretable, and scalable approach that can be deployed in real-world cybersecurity settings.
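As an illustration of the lexical indicators named in the abstract, the following minimal sketch computes URL length, hostname length, and dot count using only the Python standard library. The function name `lexical_features`, the dictionary keys, and the exact feature definitions (for example, counting dots over the whole URL rather than the hostname alone) are assumptions made for this example; the abstract does not specify how the authors define or extract these features. Domain age is omitted because it requires an external registration lookup.

```python
from urllib.parse import urlsplit


def lexical_features(url: str) -> dict:
    """Compute three simple lexical features of a URL string.

    Hypothetical helper: the feature definitions here are assumptions,
    not the definitions used in the paper.
    """
    parts = urlsplit(url)
    # hostname is None when the string has no network location
    hostname = parts.hostname or ""
    return {
        "url_length": len(url),            # total characters in the URL
        "hostname_length": len(hostname),  # characters in the host part only
        "dot_count": url.count("."),       # dots anywhere in the URL
    }


# Example input invented for illustration
example = "http://secure-login.example.com.verify-account.info/update.php?id=1"
print(lexical_features(example))
```

Feature vectors of this kind could then be passed to a tree-based classifier such as a Random Forest and inspected with SHAP, which is the general pipeline the abstract describes.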

Published

2026-05-20

How to Cite

Nayab Imtiaz, Muazzam Ali, M U Hashmi, Zarqa Zafar, & Asifa Ittfaq. (2026). AN EXPLAINABLE MACHINE LEARNING FRAMEWORK FOR PHISHING DETECTION USING URL STRUCTURAL AND BEHAVIORAL FEATURES. Spectrum of Engineering Sciences, 4(5), 1639–1653. Retrieved from https://www.thesesjournal.com/index.php/1/article/view/2881