Citation
Tan, Yong Soon and Lim, Kian Ming and Lee, Chin Poo (2021) Hand gesture recognition via enhanced densely connected convolutional neural network. Expert Systems with Applications, 175. p. 114797. ISSN 0957-4174
Text
Hand gesture recognition via enhanced densely connected....pdf Restricted to Repository staff only Download (5MB) |
Abstract
Hand gesture recognition (HGR) serves as a fundamental way of communication and interaction for human being. While HGR can be applied in human computer interaction (HCI) to facilitate user interaction, it can also be utilized for bridging the language barrier. For instance, HGR can be utilized to recognize sign language, which is a visual language represented by hand gestures and used by the deaf and mute all over the world as a primary way of communication. Hand-crafted approach for vision-based HGR typically involves multiple stages of specialized processing, such as hand-crafted feature extraction methods, which are usually designed to deal with particular challenges specifically. Hence, the effectiveness of the system and its ability to deal with varied challenges across multiple datasets are heavily reliant on the methods being utilized. In contrast, deep learning approach such as convolutional neural network (CNN), adapts to varied challenges via supervised learning. However, attaining satisfactory generalization on unseen data is not only dependent on the architecture of the CNN, but also dependent on the quantity and variety of the training data. Therefore, a customized network architecture dubbed as enhanced densely connected convolutional neural network (EDenseNet) is proposed for vision-based hand gesture recognition. The modified transition layer in EDenseNet further strengthens feature propagation, by utilizing bottleneck layer to propagate the features being reused to all the feature maps in a bottleneck manner, and the following Conv layer smooths out the unwanted features. Differences between EDenseNet and DenseNet are discerned, and its performance gains are scrutinized in the ablation study. Furthermore, numerous data augmentation techniques are utilized to attenuate the effect of data scarcity, by increasing the quantity of training data, and enriching its variety to further improve generalization. Experiments have been carried out on multiple datasets, namely one NUS hand gesture dataset and two American Sign Language (ASL) datasets. The proposed EDenseNet obtains 98.50% average accuracy without augmented data, and 99.64% average accuracy with augmented data, outperforming other deep learning driven instances in both settings, with and without augmented data.
Item Type: | Article |
---|---|
Uncontrolled Keywords: | Neural networks (Computer science) |
Subjects: | Q Science > QA Mathematics > QA71-90 Instruments and machines > QA75.5-76.95 Electronic computers. Computer science |
Divisions: | Faculty of Information Science and Technology (FIST) |
Depositing User: | Ms Nurul Iqtiani Ahmad |
Date Deposited: | 09 May 2021 15:03 |
Last Modified: | 09 May 2021 15:03 |
URII: | http://shdl.mmu.edu.my/id/eprint/8706 |
Downloads
Downloads per month over past year
Edit (login required) |