TY - GEN
T1 - Siamese mViT
T2 - 2025 Cyber Awareness and Research Symposium, CARS 2025
AU - Gharami, Kanchon
AU - Moni, Shafika Showkat
AU - Kandel, Laxima Niure
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
AB - Facial recognition has become a widely adopted biometric identification method, revolutionizing the authentication process across platforms such as smartphones and edge IoT devices like smartwatches, drones, autonomous vehicles, and smart home security systems. While existing sophisticated models such as Vision Transformers (ViT) deliver impressive accuracy, their reliance on large datasets and heavy computational requirements creates challenges for resource-constrained edge devices, which struggle to handle their complexity and the large volume of training data. To overcome these limitations, we propose Siamese mViT, a lightweight network that integrates a mobile-optimized Vision Transformer (MobileViT) block into a Siamese architecture for facial authentication. Analysis on the Cross-modal Face-Periocular dataset demonstrates that the proposed Siamese mViT achieves 93.12% accuracy with a remarkably small dataset of just 190,876 pairs of face and periocular images (1.24 GB). Our results demonstrate that the Siamese mViT model is both lightweight and accurate, making it well suited for deployment on edge devices.
KW - Biometric Authentication
KW - Face Recognition
KW - Face Verification
KW - Lightweight Models
KW - Limited Data
KW - MobileViT
KW - Siamese Networks
UR - https://www.scopus.com/pages/publications/105033561991
U2 - 10.1109/CARS67163.2025.11337623
DO - 10.1109/CARS67163.2025.11337623
M3 - Conference contribution
AN - SCOPUS:105033561991
T3 - 2025 Cyber Awareness and Research Symposium, CARS 2025
BT - 2025 Cyber Awareness and Research Symposium, CARS 2025
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 27 October 2025 through 30 October 2025
ER -