Bidirectional Long Short-Term Memory with Temporal Dense Sampling for human action recognition

Citation

Tan, Kok Seang and Lim, Kian Ming and Lee, Chin Poo and Kwek, Lee Chung (2022) Bidirectional Long Short-Term Memory with Temporal Dense Sampling for human action recognition. Expert Systems with Applications, 210. p. 118484. ISSN 0957-4174

Abstract

Long Short-Term Memory networks are making significant inroads into improving time series applications, including human action recognition. In a human action video, the spatial and temporal streams carry distinctive yet prominent information, hence many researchers turn to spatio-temporal models for human action recognition. A spatio-temporal model integrates a temporal network (e.g. Long Short-Term Memory) and a spatial network (e.g. Convolutional Neural Networks). Existing human action recognition methods face a few challenges: (1) the uni-directional modeling of Long Short-Term Memory makes it unable to preserve information from future frames, (2) the sparse sampling strategy tends to lose prominent information when reducing the dimension of the input to the Long Short-Term Memory, and (3) the fusion strategy for consolidating the temporal network and the spatial network typically relies on fixed weights. In view of this, we propose a Bidirectional Long Short-Term Memory with Temporal Dense Sampling and Fusion Network method to address these challenges. The Temporal Dense Sampling partitions the human action video into segments and then performs a max-pooling operation along the temporal axis in each segment. A multi-stream bidirectional Long Short-Term Memory network is adopted to encode the long-term spatial and temporal dependencies in both forward and backward directions. Instead of assigning fixed weights to the spatial network and the temporal network, we propose a fusion network in which a fully-connected layer is trained to assign the weights for the networks adaptively. The empirical results demonstrate that the proposed Bidirectional Long Short-Term Memory with Temporal Dense Sampling and Fusion Network method outperforms the state-of-the-art methods, with an accuracy of 94.78% on the UCF101 dataset and 70.72% on the HMDB51 dataset.
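
The Temporal Dense Sampling step described in the abstract (partition the video into segments, then max-pool along the temporal axis within each segment) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `temporal_dense_sampling`, the representation of each frame as a flat feature vector, and the near-equal segment boundaries are all assumptions made for illustration.

```python
from typing import List, Sequence


def temporal_dense_sampling(
    frames: Sequence[Sequence[float]], num_segments: int
) -> List[List[float]]:
    """Max-pool per-frame feature vectors over contiguous temporal segments.

    Hypothetical helper: `frames` holds one feature vector per video frame,
    in temporal order. The frames are split into `num_segments` contiguous,
    nearly equal segments (an assumed partitioning rule), and each segment
    is reduced to one vector by an element-wise maximum over time.
    """
    n = len(frames)
    if not 1 <= num_segments <= n:
        raise ValueError("num_segments must be between 1 and the number of frames")

    pooled = []
    for s in range(num_segments):
        # Integer boundaries keep every segment non-empty when num_segments <= n.
        start = s * n // num_segments
        end = (s + 1) * n // num_segments
        segment = frames[start:end]
        # zip(*segment) groups the values of each feature across the segment's frames.
        pooled.append([max(values) for values in zip(*segment)])
    return pooled


# Six frames with two features each, reduced to three time steps.
frames = [[0.1, 0.9], [0.4, 0.2], [0.3, 0.8], [0.7, 0.1], [0.5, 0.5], [0.2, 0.6]]
pooled = temporal_dense_sampling(frames, num_segments=3)
print(pooled)
```

Unlike sparse sampling, which would keep only one frame per segment, every frame contributes to the pooled output here, so the sequence passed to the recurrent network is shorter but still reflects the strongest response of each feature in each segment.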

Item Type: Article
Uncontrolled Keywords: Human action recognition, Bidirectional LSTM, Temporal Dense Sampling
Subjects: Q Science > QA Mathematics > QA71-90 Instruments and machines > QA75.5-76.95 Electronic computers. Computer science
Divisions: Faculty of Information Science and Technology (FIST)
Depositing User: Ms Nurul Iqtiani Ahmad
Date Deposited: 01 Sep 2022 06:23
Last Modified: 01 Sep 2022 06:23
URI: http://shdl.mmu.edu.my/id/eprint/10421
