Stacking of Distilled Squeeze-and-Excitation Vision Transformers for Hand Gesture Recognition

Citation

Tan, Chun Keat (2024) Stacking of Distilled Squeeze-and-Excitation Vision Transformers for Hand Gesture Recognition. Masters thesis, Multimedia University.

Full text not available from this repository.
Official URL: http://erep.mmu.edu.my/

Abstract

Hand Gesture Recognition (HGR) has emerged as a pivotal technology for enhancing Human-Computer Interaction (HCI), and is particularly valuable for aiding communication within the deaf and mute community. Despite its potential, vision-based HGR systems face significant challenges, including sensitivity to variable lighting, occlusions, background clutter, and variations in hand sizes and shapes. Furthermore, the large and complex models typically used require substantial time and memory resources, while often suffering from overfitting and poor generalisation on unseen data. Over the years, researchers have proposed various methods to tackle the HGR problem, broadly categorised into hand-crafted and deep learning approaches. However, the potential of Transformer-based methods for HGR remains largely unexplored, making it a promising research direction. To address these challenges, this thesis introduces three Transformer-based methods: (1) the Squeeze-and-Excitation Vision Transformer (SEViT), which incorporates squeeze-excitation blocks to improve feature extraction from small datasets; (2) the Distilled Squeeze-and-Excitation Vision Transformer (KD-SEViT), which leverages knowledge distillation to reduce the number of trainable parameters and the computational demands of the model; and (3) the Stacking of Distilled Squeeze-and-Excitation Vision Transformers (SKD-SEViT), which stacks multiple distilled models into an ensemble to enhance generalisation and robustness. The effectiveness of these methods is validated on three publicly available benchmark datasets: American Sign Language (ASL), ASL with Digits, and NUS Hand Gesture. The experimental results show that the proposed methods outperform existing models, with SKD-SEViT achieving accuracies of 100.00% on the ASL dataset, 99.60% on the ASL with Digits dataset, and 100.00% on the NUS Hand Gesture dataset.
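
For readers without access to the full text, the following sketch illustrates the two building blocks the abstract names: a squeeze-excitation block applied to the token embeddings of a Vision Transformer encoder block (the idea behind SEViT), and a soft-target knowledge distillation loss (one plausible reading of KD-SEViT's teacher-student training). This is a minimal PyTorch sketch under stated assumptions, not the thesis's implementation; every dimension, placement, and hyperparameter below is illustrative.

```python
# Minimal sketch; PyTorch is assumed, and all dimensions, layer placements,
# and hyperparameters are illustrative. They are NOT taken from the thesis,
# whose full text is unavailable from this repository.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SqueezeExcitation(nn.Module):
    """Squeeze (global average over tokens) then excite (gated MLP over channels)."""
    def __init__(self, dim: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(dim, dim // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim); re-weight the embedding channels per sample
        scale = self.gate(x.mean(dim=1))     # (batch, dim)
        return x * scale.unsqueeze(1)        # broadcast across tokens

class SEViTBlock(nn.Module):
    """A pre-norm ViT encoder block with an SE block appended to the MLP path."""
    def __init__(self, dim: int = 192, heads: int = 3, mlp_ratio: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio),
            nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim),
        )
        self.se = SqueezeExcitation(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # self-attention + residual
        x = x + self.se(self.mlp(self.norm2(x)))           # SE-gated MLP + residual
        return x

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Hinton-style soft-target distillation: hard-label cross-entropy blended
    with temperature-scaled KL divergence against the teacher's outputs."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * hard + (1.0 - alpha) * soft

# Shape check: a batch of 8 images, 196 patch tokens + 1 class token, 192-d embeddings
tokens = torch.randn(8, 197, 192)
assert SEViTBlock()(tokens).shape == tokens.shape
```

For the stacking stage (SKD-SEViT), the abstract states only that multiple distilled models are combined; a common realisation of stacking is to feed the students' class probabilities into a lightweight meta-classifier, but the thesis's exact ensemble design cannot be confirmed from this record.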

Item Type: Thesis (Masters)
Additional Information: Call No.: Q325.73 .T36 2024
Uncontrolled Keywords: Deep learning (Machine learning)
Subjects: Q Science > Q Science (General) > Q300-390 Cybernetics
Divisions: Faculty of Information Science and Technology (FIST)
Depositing User: Ms Nurul Iqtiani Ahmad
Date Deposited: 03 Feb 2025 06:27
Last Modified: 03 Feb 2025 06:27
URI: http://shdl.mmu.edu.my/id/eprint/13347
