A HYBRID DEEP LEARNING FRAMEWORK INTEGRATING LSTM AND LIGHTGBM FOR SENTIMENT ANALYSIS OF ROMAN URDU TEXT

Authors

  • Kanwal Mehmood
  • Muhammad Ahsan Naeem
  • Muhammad Imran

Keywords

Sentiment analysis; Roman Urdu; low-resource languages; hybrid deep learning; LSTM; LightGBM; natural language processing; text classification

Abstract

Sentiment analysis is central to extracting opinions and emotional context from user-generated text, yet its application to Roman Urdu remains constrained by the language's informal usage, non-standardised orthography, and scarcity of annotated resources. This study proposes a hybrid classification framework that couples a Long Short-Term Memory (LSTM) network with a Light Gradient Boosting Machine (LightGBM) classifier to improve sentiment prediction for Roman Urdu. The LSTM branch models sequential and contextual dependencies in the text, while the LightGBM branch captures non-linear interactions among engineered features; the two branches are combined through a weighted Softmax fusion layer. A publicly available Roman Urdu corpus of 98,984 samples obtained from Kaggle was preprocessed using a custom tokenizer, transliteration-aware normalisation, and language-specific stop-word removal. The framework was trained and evaluated using stratified ten-fold cross-validation. The hybrid model achieved a classification accuracy of 97.74%, exceeding the standalone LSTM (93.72%) and standalone LightGBM (69.51%) models, and also outperforming conventional classifiers including Random Forest, Support Vector Machine, and k-Nearest Neighbour. The results indicate that integrating sequential representation learning with gradient-boosted feature modelling is an effective strategy for sentiment analysis in low-resource, non-standardised languages, and provide a basis for future work on multilingual and code-mixed sentiment systems.
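
The abstract does not spell out the form of the weighted Softmax fusion layer, so the following is only a minimal sketch of one plausible reading: each branch emits class probabilities, and two branch weights are normalised with a softmax before the probabilities are averaged. The function names, the three-class setup, the example probabilities, and the branch logits are all illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - np.max(z, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def fuse_probabilities(p_lstm, p_gbm, branch_logits):
    """Blend two (n_samples, n_classes) probability arrays.

    branch_logits holds two unnormalised scores (hypothetical parameters);
    a softmax turns them into weights that sum to 1, so the fused rows
    remain valid probability distributions.
    """
    w = softmax(np.asarray(branch_logits, dtype=float))
    fused = w[0] * p_lstm + w[1] * p_gbm
    return fused, fused.argmax(axis=1)

# Made-up probabilities for two samples over three assumed classes.
p_lstm = np.array([[0.7, 0.2, 0.1],
                   [0.2, 0.3, 0.5]])
p_gbm = np.array([[0.4, 0.5, 0.1],
                  [0.1, 0.2, 0.7]])

# Logits [1.0, 0.0] give the LSTM branch a weight of roughly 0.73.
fused, labels = fuse_probabilities(p_lstm, p_gbm, branch_logits=[1.0, 0.0])
print(labels)
```

In a trained system the branch logits would presumably be learned or tuned on validation folds; here they are fixed by hand purely to show the mechanics.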

Published

2026-06-15

How to Cite

Kanwal Mehmood, Muhammad Ahsan Naeem, & Muhammad Imran. (2026). A HYBRID DEEP LEARNING FRAMEWORK INTEGRATING LSTM AND LIGHTGBM FOR SENTIMENT ANALYSIS OF ROMAN URDU TEXT. Spectrum of Engineering Sciences, 4(6), 1501–1515. Retrieved from https://www.thesesjournal.com/index.php/1/article/view/3230