CONTENT-AWARE, LOW-LATENCY SPAM CALL DETECTION USING EDGE MACHINE LEARNING
Keywords:
CONTENT-AWARE, LOW-LATENCY SPAM CALL DETECTION, USING EDGE MACHINE LEARNING
Abstract
This paper presents a handset-first, content-aware approach to real-time detection of fraudulent telephone calls. Rather than relying on caller identity or reputation, it analyzes the content of the conversation. An Android app passes automatic speech-recognition (ASR) transcripts, along with the current, still-changing text, through a lightweight TF-IDF plus Multinomial Naive Bayes model that runs on-device via ONNX Runtime. A deterministic preprocessing pipeline (normalization, tokenization, conservative placeholder insertion, and lemmatization) preserves intent cues and ensures train-serve parity. Segment posteriors are combined with decay and hysteresis to produce calibrated, explainable in-call alerts, each backed by the most heavily weighted tokens or phrases. The system fits a mobile resource budget of approximately 120 ms of inference time and less than 40 MB of memory, and achieves approximately 95% accuracy, 92% precision, 94% recall, a 93% F1 score, and a ROC-AUC above 0.90 on a labeled corpus. Its modular architecture combines cloud-based ASR with on-device classification and is designed to be privacy-preserving: storage is opt-in, and classification requires no personally identifiable information.
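
As an illustration of how segment posteriors might be combined with decay and hysteresis, the following is a minimal Python sketch, not the authors' implementation. The class name AlertAggregator, the helper run_call, the exponential form of the decay, and the values 0.6, 0.7, and 0.4 are all assumptions chosen for the example; the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass
class AlertAggregator:
    """Combine per-segment spam posteriors into a stable in-call alert state.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    decay: float = 0.6          # share of the previous score that is retained
    on_threshold: float = 0.7   # score at or above which the alert switches on
    off_threshold: float = 0.4  # score at or below which the alert switches off
    score: float = 0.0          # running, decayed spam score
    alerting: bool = False      # current alert state

    def update(self, posterior: float) -> bool:
        """Fold one segment posterior into the score and return the alert state."""
        if not 0.0 <= posterior <= 1.0:
            raise ValueError("posterior must be in [0, 1]")
        # Decay: older segments fade geometrically; the newest segment
        # contributes with weight (1 - decay).
        self.score = self.decay * self.score + (1.0 - self.decay) * posterior
        # Hysteresis: separate on/off thresholds stop the alert from
        # flickering when the score hovers near a single cut-off.
        if not self.alerting and self.score >= self.on_threshold:
            self.alerting = True
        elif self.alerting and self.score <= self.off_threshold:
            self.alerting = False
        return self.alerting


def run_call(posteriors):
    """Return the alert state after each segment of a single call."""
    aggregator = AlertAggregator()
    return [aggregator.update(p) for p in posteriors]


if __name__ == "__main__":
    # Sustained high posteriors raise the alert; it then persists through
    # ambiguous segments and clears only once the score falls far enough.
    print(run_call([0.9, 0.9, 0.9, 0.9, 0.5, 0.5, 0.0, 0.0, 0.0]))
```

Under these assumed settings a single high-scoring segment surrounded by benign ones does not raise an alert, while several consecutive suspicious segments do. Once raised, the alert stays on even after the score drops below the on-threshold, and clears only when the score reaches the off-threshold.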
