MaxMViT-MLP: Multiaxis and Multiscale Vision Transformers Fusion Network for Speech Emotion Recognition

Citation

Ong, Kah Liang and Lee, Chin Poo and Lim, Heng Siong and Lim, Kian Ming and Alqahtani, Ali (2024) MaxMViT-MLP: Multiaxis and Multiscale Vision Transformers Fusion Network for Speech Emotion Recognition. IEEE Access, 12. pp. 18237-18250. ISSN 2169-3536

Full text: 5.pdf (Published Version, 1MB), restricted to repository staff only.

Abstract

Vision Transformers, known for their innovative architectural design and modeling capabilities, have gained significant attention in computer vision. This paper presents a dual-path approach that combines the strengths of the Multi-Axis Vision Transformer (MaxViT) and the Improved Multiscale Vision Transformer (MViTv2). Speech signals are first encoded as Constant-Q Transform (CQT) spectrograms and as Mel spectrograms computed with the Short-Time Fourier Transform (Mel-STFT). The CQT spectrogram is fed into the MaxViT model and the Mel-STFT spectrogram into the MViTv2 model, each of which extracts informative features from its input. These features are then fused and passed to a Multilayer Perceptron (MLP) for final classification. The resulting hybrid model is named the “MaxViT and MViTv2 Fusion Network with Multilayer Perceptron (MaxMViT-MLP).” MaxMViT-MLP achieves accuracies of 95.28% on Emo-DB, 89.12% on RAVDESS, and 68.39% on IEMOCAP, substantiating the advantages of combining multiple audio feature representations with Vision Transformers for speech emotion recognition.
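The abstract says only that the features from the two backbones are "integrated" before the MLP. As a minimal illustration of that fusion stage, the NumPy sketch below assumes simple concatenation (late fusion) followed by a two-layer MLP with a softmax output. The feature sizes, hidden width, and the `FusionMLP` class are hypothetical, and random vectors stand in for the MaxViT (CQT) and MViTv2 (Mel-STFT) outputs; this is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (assumptions, not taken from the paper).
MAXVIT_DIM = 768   # assumed size of the MaxViT feature vector (CQT path)
MVIT_DIM = 768     # assumed size of the MViTv2 feature vector (Mel-STFT path)
HIDDEN = 256       # assumed MLP hidden width
NUM_CLASSES = 7    # Emo-DB has seven emotion classes


def relu(x):
    return np.maximum(x, 0.0)


def softmax(z):
    # Subtract the row maximum for numerical stability.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class FusionMLP:
    """Hypothetical late-fusion head: concatenate two feature vectors, then a 2-layer MLP."""

    def __init__(self, in_dim, hidden, num_classes, rng):
        self.w1 = rng.normal(0.0, 0.02, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.02, (hidden, num_classes))
        self.b2 = np.zeros(num_classes)

    def __call__(self, f_cqt, f_mel):
        # Assumed fusion step: feature-wise concatenation of both paths.
        fused = np.concatenate([f_cqt, f_mel], axis=-1)  # (batch, MAXVIT_DIM + MVIT_DIM)
        h = relu(fused @ self.w1 + self.b1)
        return softmax(h @ self.w2 + self.b2)            # per-class probabilities


# Stand-ins for backbone outputs; a real system would run MaxViT on CQT
# spectrograms and MViTv2 on Mel-STFT spectrograms to obtain these vectors.
batch = 4
f_cqt = rng.normal(size=(batch, MAXVIT_DIM))
f_mel = rng.normal(size=(batch, MVIT_DIM))

head = FusionMLP(MAXVIT_DIM + MVIT_DIM, HIDDEN, NUM_CLASSES, rng)
probs = head(f_cqt, f_mel)
pred = probs.argmax(axis=-1)  # predicted emotion index per utterance
print(probs.shape, pred.shape)  # (4, 7) (4,)
```

Concatenation keeps both spectrogram representations intact and leaves it to the MLP to learn how to weight them; other fusion choices (for example averaging or attention-based weighting) are equally plausible, so the paper itself should be consulted for the exact design.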

Item Type: Article
Uncontrolled Keywords: Speech emotion recognition, ensemble learning, spectrogram, vision transformer, Emo-DB, RAVDESS, IEMOCAP.
Subjects: T Technology > TK Electrical engineering. Electronics. Nuclear engineering > TK5101-6720 Telecommunication. Including telegraphy, telephone, radio, radar, television
Divisions: Faculty of Information Science and Technology (FIST)
Depositing User: Ms Nurul Iqtiani Ahmad
Date Deposited: 04 Mar 2024 03:07
Last Modified: 04 Mar 2024 03:07
URI: http://shdl.mmu.edu.my/id/eprint/12165
