AUTOMATED SUICIDE RISK DETECTION FROM REDDIT POSTS USING A DEEP LEARNING FRAMEWORK

*Bilal Ajmal; Muhammad Munwar Iqbal; Anees Tariq; Maria Noor Hussain; Samra Batool

Abstract

Suicide is a major public health problem worldwide: according to the World Health Organization, about 700,000 people die by suicide each year. Many people with suicidal thoughts discuss them online without seeking professional help, so automated text analysis may help identify those at risk. This work proposes a hybrid deep learning system that classifies Reddit posts as suicidal or non-suicidal. It combines RoBERTa, a pre-trained contextual transformer model that produces embeddings of the post text, with parallel CNN layers. Experiments used the large-scale PHR dataset (231,968 Reddit posts, with 185,366 posts for training and 46,390 for testing). The proposed model achieved an accuracy of 96.38%, higher than the traditional machine learning baseline and the recent deep learning architecture it was compared against, along with reported scores of 0.97 for accuracy, 0.97 for recall, and 0.96 for macro F1. The results demonstrate the effectiveness of combining contextual language understanding with multi-scale convolutional feature extraction for large-scale mental health text classification.
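The pairing of contextual embeddings with parallel convolutions described in the abstract can be illustrated with a minimal PyTorch sketch. It is not the authors' implementation. The class name `ParallelCNNHead`, the kernel sizes (2, 3, 4), the filter count, the dropout rate, and the hidden size of 768 (that of `roberta-base`) are all assumptions, since the abstract does not specify them. Random tensors stand in for the RoBERTa token embeddings so that the example runs without downloading a model.

```python
import torch
import torch.nn as nn


class ParallelCNNHead(nn.Module):
    """Hypothetical head: parallel 1-D convolutions over token embeddings."""

    def __init__(self, hidden_size=768, num_filters=64,
                 kernel_sizes=(2, 3, 4), dropout=0.1):
        super().__init__()
        # One convolution per kernel size; each sees a different n-gram width.
        self.convs = nn.ModuleList(
            [nn.Conv1d(hidden_size, num_filters, kernel_size=k)
             for k in kernel_sizes]
        )
        self.dropout = nn.Dropout(dropout)
        # Two output logits: non-suicidal vs. suicidal.
        self.classifier = nn.Linear(num_filters * len(kernel_sizes), 2)

    def forward(self, token_embeddings):
        # (batch, seq_len, hidden) -> (batch, hidden, seq_len) for Conv1d.
        x = token_embeddings.transpose(1, 2)
        # ReLU, then max-over-time pooling for each parallel branch.
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        # Concatenate the multi-scale features and classify.
        features = torch.cat(pooled, dim=1)
        return self.classifier(self.dropout(features))


torch.manual_seed(0)
head = ParallelCNNHead()
head.eval()

# Stand-in for RoBERTa output: 4 posts, 32 tokens each, 768-dim embeddings.
fake_embeddings = torch.randn(4, 32, 768)
with torch.no_grad():
    logits = head(fake_embeddings)
print(logits.shape)  # torch.Size([4, 2])
```

In a full pipeline, the random tensor would presumably be replaced by the token-level output of a pre-trained RoBERTa encoder, for example the `last_hidden_state` returned by the Hugging Face `RobertaModel`. Whether the encoder is fine-tuned or kept frozen is not stated in the abstract.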

Keywords: Suicide detection, mental health NLP, RoBERTa, convolutional neural network (CNN), Reddit, deep learning, transformer, text classification, PHR dataset, social media monitoring.