SPARSE-HIERARCHICAL ATTENTION FOR SELF-SUPERVISED INDOOR SCENE CLASSIFICATION: A MASKED PATCH CONTRASTIVE APPROACH

Authors

  • Mubasher Hussain Malik
  • Ammad Hussain

Abstract

We propose a sparse-hierarchical attention mechanism for self-supervised indoor scene classification. It addresses the quadratic cost of standard Transformer attention while preserving the structural dependencies characteristic of indoor environments. The method combines focal attention, which computes interactions only for semantically significant regions, with hierarchical pyramid attention, which performs multi-scale spatial reasoning over downsampled feature maps. Both components are embedded in a contrastive pretext task: masked patch contrastive learning optimizes feature representations by minimizing the distance between the representations of masked and unmasked regions. The sparse-hierarchical attention reduces computational complexity from quadratic to linear in input size, enabling efficient training without sacrificing performance. The hierarchical design also yields robust feature extraction across scales, which is critical for modeling the complex layouts and object arrangements typical of indoor scenes. We implement the approach in a modified Vision Transformer (ViT) backbone and validate it empirically on standard indoor scene datasets. The results show that our method achieves competitive accuracy while significantly reducing memory and computational overhead compared to full self-attention baselines. This work provides a practical way to scale self-supervised learning to high-resolution indoor imagery, with potential applications in robotics, augmented reality, and smart environment systems.
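The masked patch contrastive objective described in the abstract can be illustrated with a minimal sketch. The abstract does not give the loss, so everything here is an assumption made for illustration: the InfoNCE form, the cosine similarity, the temperature value, and the hypothetical names `l2_normalize` and `info_nce`. Each patch embedding from the masked view is pulled toward the embedding of the same patch in the unmasked view and pushed away from the other patches.

```python
import math

def l2_normalize(v):
    # Scale a nonzero vector to unit length so dot products are cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def info_nce(masked, unmasked, temperature=0.1):
    """Mean InfoNCE loss over patches (assumed form, not the paper's stated loss).

    masked[i] and unmasked[i] are embeddings of the same patch and form the
    positive pair; every other unmasked[j] serves as a negative.
    """
    q = [l2_normalize(v) for v in masked]
    k = [l2_normalize(v) for v in unmasked]
    total = 0.0
    for i, qi in enumerate(q):
        # Temperature-scaled cosine similarity of patch i to every key.
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        # Numerically stable log-sum-exp over all keys.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability assigned to the positive key.
        total += log_denom - logits[i]
    return total / len(q)

# Toy embeddings: three patches in a 2-D feature space.
aligned = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
shuffled = [aligned[1], aligned[2], aligned[0]]  # positives deliberately mismatched

loss_aligned = info_nce(aligned, aligned)
loss_shuffled = info_nce(aligned, shuffled)
print(loss_aligned < loss_shuffled)
```

In this toy setting the loss is close to zero when the two views agree patch by patch and grows when the positives are mismatched, which is the behavior a contrastive pretext task relies on.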

Published

2026-06-10

How to Cite

Mubasher Hussain Malik, & Ammad Hussain. (2026). SPARSE-HIERARCHICAL ATTENTION FOR SELF-SUPERVISED INDOOR SCENE CLASSIFICATION: A MASKED PATCH CONTRASTIVE APPROACH. Spectrum of Engineering Sciences, 4(6), 1121–1129. Retrieved from https://www.thesesjournal.com/index.php/1/article/view/3175