Citation
Ong, Kah Liang and Lee, Chin Poo and Lim, Heng Siong and Lim, Kian Ming and Alqahtani, Ali (2024) MaxMViT-MLP: Multiaxis and Multiscale Vision Transformers Fusion Network for Speech Emotion Recognition. IEEE Access, 12. pp. 18237-18250. ISSN 2169-3536
Abstract
Vision Transformers, known for their innovative architectural design and modeling capabilities, have gained significant attention in computer vision. This paper presents a dual-path approach that leverages the strengths of the Multi-Axis Vision Transformer (MaxViT) and the Improved Multiscale Vision Transformer (MViTv2). It starts by encoding speech signals into Constant-Q Transform (CQT) spectrograms and Mel Spectrograms with Short-Time Fourier Transform (Mel-STFT). The CQT spectrogram is then fed into the MaxViT model, while the Mel-STFT is input to the MViTv2 model to extract informative features from the spectrograms. These features are integrated and passed into a Multilayer Perceptron (MLP) model for final classification. This hybrid model is named the "MaxViT and MViTv2 Fusion Network with Multilayer Perceptron (MaxMViT-MLP)." The MaxMViT-MLP model achieves an accuracy of 95.28% on the Emo-DB dataset, 89.12% on the RAVDESS dataset, and 68.39% on the IEMOCAP dataset, substantiating the advantages of integrating multiple audio feature representations and Vision Transformers in speech emotion recognition.
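The fusion step described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the abstract says only that the two feature sets are "integrated", so concatenation is an assumption here, and the function names (`fuse`, `dense`, `mlp_head`, `init_layer`), the feature sizes, the hidden width, and the class count of 7 are all hypothetical. Random placeholder vectors stand in for the MaxViT and MViTv2 outputs, and the weights are untrained.

```python
import math
import random


def fuse(features_a, features_b):
    """Integrate two feature vectors. Concatenation is an assumption, not from the abstract."""
    return list(features_a) + list(features_b)


def dense(x, weights, bias):
    """Fully connected layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(x):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases (untrained, for illustration only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def mlp_head(fused, layers):
    """Hidden layers with ReLU, then a softmax over emotion classes."""
    x = fused
    for weights, bias in layers[:-1]:
        x = relu(dense(x, weights, bias))
    weights, bias = layers[-1]
    return softmax(dense(x, weights, bias))


rng = random.Random(0)

# Placeholders for the two paths: CQT -> MaxViT and Mel-STFT -> MViTv2.
# Real feature vectors would come from the pretrained backbones.
cqt_features = [rng.random() for _ in range(8)]   # hypothetical size
mel_features = [rng.random() for _ in range(6)]   # hypothetical size

fused = fuse(cqt_features, mel_features)
layers = [init_layer(rng, len(fused), 16), init_layer(rng, 16, 7)]  # 7 classes is illustrative
probs = mlp_head(fused, layers)
predicted = max(range(len(probs)), key=probs.__getitem__)
```

In this sketch the classifier sees both representations at once, which is the general idea behind combining the two spectrogram paths; how the published model actually integrates and trains them is described in the paper itself.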
Item Type: | Article |
---|---|
Uncontrolled Keywords: | Speech emotion recognition, ensemble learning, spectrogram, vision transformer, Emo-DB, RAVDESS, IEMOCAP. |
Subjects: | T Technology > TK Electrical engineering. Electronics. Nuclear engineering > TK5101-6720 Telecommunication. Including telegraphy, telephone, radio, radar, television |
Divisions: | Faculty of Information Science and Technology (FIST) |
Depositing User: | Ms Nurul Iqtiani Ahmad |
Date Deposited: | 04 Mar 2024 03:07 |
Last Modified: | 04 Mar 2024 03:07 |
URI: | http://shdl.mmu.edu.my/id/eprint/12165 |