DYNAMIC URDU DISCOURSE-AWARE PROMPT TUNING (DUDAPT) FOR CONTEXT-ADAPTIVE IMAGE CAPTIONING
Abstract
We propose Dynamic Urdu Discourse-Aware Prompt Tuning (DUDAPT), a framework for context-adaptive image captioning that addresses the challenges of generating captions in Urdu. Traditional captioning systems rely on static word embeddings, which often fail to capture Urdu’s rich discourse features, such as syntactic complexity and anaphora. The proposed method introduces a dynamic embedding layer that adapts to linguistic context through three components: a Discourse Complexity Analyzer (DCA) that scores sentence complexity in real time, a Dynamic Prompt Pool (DPP) that selectively activates context-aware soft prompts, and an Urdu-Aware Embedding Projector that aligns Urdu tokens with the visual-semantic space. The DCA uses a lightweight transformer to compute complexity scores, which guide the DPP in expanding or pruning the set of active prompts. The projector then combines frozen Urdu embeddings with the adaptive prompts, so the result can be passed to conventional language decoders without modification. The framework is implemented with a distilled Urdu-BERT model for efficiency and meta-learned multilingual prompts for robustness. Experiments show that DUDAPT outperforms fixed-embedding approaches in capturing discourse nuances while remaining compatible with existing captioning pipelines. This work addresses a gap in low-resource language processing and offers a scalable solution for Urdu-centric multimodal applications.
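As an illustration of the score-guided prompt selection described in the abstract, the following minimal sketch shows one way a complexity score could expand or prune the active prompts before they are combined with frozen token embeddings. It is a sketch under stated assumptions, not the DUDAPT implementation: the function names (`num_active_prompts`, `select_prompts`, `build_decoder_input`), the linear mapping from score to prompt count, the choice to keep the first k prompts, and the use of prepending to combine prompts with embeddings are all hypothetical. The complexity score is taken as a given input in [0, 1]; the DCA that would produce it is not modeled here.

```python
from typing import List

Vector = List[float]


def num_active_prompts(score: float, pool_size: int, min_active: int = 1) -> int:
    """Map a complexity score in [0, 1] to a prompt count in [min_active, pool_size].

    Assumption: a linear mapping. The abstract does not specify the actual rule.
    """
    score = min(max(score, 0.0), 1.0)  # clamp out-of-range scores
    return min_active + round(score * (pool_size - min_active))


def select_prompts(pool: List[Vector], score: float, min_active: int = 1) -> List[Vector]:
    """Keep the first k prompts: higher scores expand the set, lower scores prune it."""
    k = num_active_prompts(score, len(pool), min_active)
    return pool[:k]


def build_decoder_input(prompts: List[Vector], token_embeddings: List[Vector]) -> List[Vector]:
    """Prepend the active soft prompts to the token embeddings.

    The token embeddings are copied, never modified, which mirrors the idea of
    keeping them frozen. Prepending is an assumption made for illustration.
    """
    return [list(p) for p in prompts] + [list(t) for t in token_embeddings]


if __name__ == "__main__":
    # Toy data: a pool of four 2-dimensional prompts and three token embeddings.
    pool = [[0.1, 0.1], [0.2, 0.2], [0.3, 0.3], [0.4, 0.4]]
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

    for score in (0.0, 1.0):
        active = select_prompts(pool, score)
        sequence = build_decoder_input(active, tokens)
        print(f"score={score}: {len(active)} prompts, sequence length {len(sequence)}")
```

In this sketch a simple sentence (score near 0) keeps only the minimum number of prompts, while a highly complex one (score near 1) activates the whole pool, so the decoder input length varies with the discourse complexity of the context.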
