PARSIMONIOUS GESTURE BENCHMARKING FOR DUPLICATE-CONTAMINATED TOUCHLESS DOCUMENT INTERACTION
Keywords:
Touchless Document Interaction; Static Hand Gesture Recognition; Leakage-Aware Benchmarking; Hash-Deduplicated Evaluation; Frugal Vision Models; Deployment-Oriented Gesture Interfaces

Abstract
Touchless document control is attractive for low-contact settings such as document browsing, command selection, and OCR triggering, yet small-vocabulary gesture interfaces are often reported without sufficient attention to benchmark hygiene or deployment cost. This study presents a leakage-aware, deployment-oriented benchmark analysis of a four-command static hand-gesture interface for touchless document interaction. We first audit the official split of a public benchmark and identify a serious evaluation issue: 996 exact-duplicate samples appear across the validation and test sets. To obtain a fairer assessment, we construct a hash-deduplicated clean split and compare two lightweight recognition routes: a LandmarkMLP built on MediaPipe hand landmarks with normalized geometric features, and an image-based MobileNetV3-Small baseline trained on hand crops. On the clean split, MobileNetV3-Small achieves 99.90% accuracy and 0.9990 macro-F1 on the full test set, while LandmarkMLP reaches 99.48% accuracy and 0.9948 macro-F1 on samples with successful hand detection. Despite slightly lower recognition performance, LandmarkMLP is markedly more efficient, requiring only 0.505 ms average inference time and 0.289 MB of model storage, compared with 5.15 ms and 5.93 MB for the image baseline. Corruption experiments show strong performance under low light, blur, and JPEG compression, but also reveal that the landmark route's end-to-end robustness deteriorates under severe Gaussian noise, because detector failures increase sharply. Overall, the results support the feasibility of low-cost touchless document interaction in controlled static-gesture settings, while emphasizing that fair evaluation and end-to-end reliability are as important as raw classification accuracy.
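The hash-deduplicated clean split described in the abstract can be sketched as follows. This is a minimal illustration of the general technique (exact-duplicate detection via content hashing across splits), not the authors' actual audit code; the helper names and the in-memory (id, bytes) sample representation are assumptions for the sketch.

```python
import hashlib

def content_hash(data: bytes) -> str:
    """SHA-256 digest of raw sample bytes; identical bytes give identical hashes."""
    return hashlib.sha256(data).hexdigest()

def build_clean_test(val_samples, test_samples):
    """Drop test samples whose bytes exactly duplicate any validation sample.

    val_samples / test_samples: lists of (sample_id, raw_bytes) pairs.
    Returns (clean_test, n_duplicates).
    """
    val_hashes = {content_hash(data) for _, data in val_samples}
    clean_test, n_duplicates = [], 0
    for sample_id, data in test_samples:
        if content_hash(data) in val_hashes:
            n_duplicates += 1  # leaked duplicate: already present in validation
        else:
            clean_test.append((sample_id, data))
    return clean_test, n_duplicates

# Toy demonstration with synthetic byte blobs standing in for image files:
val = [("v0", b"gesture-A"), ("v1", b"gesture-B")]
test = [("t0", b"gesture-A"), ("t1", b"gesture-C")]
clean_test, n_duplicates = build_clean_test(val, test)
# "t0" duplicates "v0" byte-for-byte, so it is removed from the clean test set.
```

In practice the same idea is applied to image files on disk (hashing each file's bytes), so that any sample appearing in both validation and test is counted and excluded before reporting clean-split accuracy.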













