PARSIMONIOUS GESTURE BENCHMARKING FOR DUPLICATE-CONTAMINATED TOUCHLESS DOCUMENT INTERACTION
Keywords:
Touchless Document Interaction; Static Hand Gesture Recognition; Leakage-Aware Benchmarking; Hash-Deduplicated Evaluation; Frugal Vision Models; Deployment-Oriented Gesture Interfaces

Abstract
Touchless document control is attractive for low-contact settings such as document browsing, command selection, and OCR triggering, yet small-vocabulary gesture interfaces are often reported without sufficient attention to benchmark hygiene or deployment cost. This study presents a leakage-aware, deployment-oriented benchmark analysis of a four-command static hand-gesture interface for touchless document interaction. We first audit the official split of a public benchmark and identify a serious evaluation issue: 996 exact-duplicate samples appear across the validation and test sets. To obtain a fairer assessment, we construct a hash-deduplicated clean split and compare two lightweight recognition routes: a LandmarkMLP built on MediaPipe hand landmarks with normalized geometric features, and an image-based MobileNetV3-Small baseline trained on hand crops. On the clean split, MobileNetV3-Small achieves 99.90% accuracy and 0.9990 macro-F1 on the full test set, while LandmarkMLP reaches 99.48% accuracy and 0.9948 macro-F1 on samples with successful hand detection. Despite slightly lower recognition performance, LandmarkMLP is markedly more efficient, requiring only 0.505 ms average inference time and 0.289 MB of model storage, compared with 5.15 ms and 5.93 MB for the image baseline. Corruption experiments show strong performance under low light, blur, and JPEG compression, but also reveal that the landmark route's end-to-end robustness deteriorates under severe Gaussian noise, because detector failures increase sharply. Overall, the results support the feasibility of low-cost touchless document interaction in controlled static-gesture settings, while emphasizing that fair evaluation and end-to-end reliability are as important as raw classification accuracy.
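The hash-deduplicated clean split described in the abstract can be sketched as follows. This is a minimal illustration of the general technique (exact-duplicate detection via content hashing across splits), not the authors' actual audit code; the helper names and the in-memory (id, bytes) sample representation are assumptions for the sketch.

```python
import hashlib

def content_hash(data: bytes) -> str:
    """SHA-256 digest of raw sample bytes; identical bytes give identical hashes."""
    return hashlib.sha256(data).hexdigest()

def build_clean_test(val_samples, test_samples):
    """Drop test samples whose bytes exactly duplicate any validation sample.

    val_samples / test_samples: lists of (sample_id, raw_bytes) pairs.
    Returns (clean_test, n_duplicates).
    """
    val_hashes = {content_hash(data) for _, data in val_samples}
    clean_test, n_duplicates = [], 0
    for sample_id, data in test_samples:
        if content_hash(data) in val_hashes:
            n_duplicates += 1  # leaked duplicate: already present in validation
        else:
            clean_test.append((sample_id, data))
    return clean_test, n_duplicates

# Toy demonstration with synthetic byte blobs standing in for image files:
val = [("v0", b"gesture-A"), ("v1", b"gesture-B")]
test = [("t0", b"gesture-A"), ("t1", b"gesture-C")]
clean_test, n_duplicates = build_clean_test(val, test)
# "t0" duplicates "v0" byte-for-byte, so it is removed from the clean test set.
```

In practice the same idea is applied to image files on disk (hashing each file's bytes), so that any sample appearing in both validation and test is counted and excluded before reporting clean-split accuracy.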













