Citation
Ong, Kah Liang and Lee, Chin Poo and Lim, Heng Siong and Lim, Kian Ming (2023) Speech emotion recognition with light gradient boosting decision trees machine. International Journal of Electrical and Computer Engineering (IJECE), 13 (4). p. 4020. ISSN 2088-8708
Text
28940-62448-1-PB.pdf - Published Version (732kB), restricted to repository staff only
Abstract
Speech emotion recognition aims to identify the emotion expressed in speech by analyzing the audio signal. In this work, data augmentation is first applied to the audio samples to increase the number of samples for better model learning. The audio samples are then encoded as frequency-domain and temporal-domain features. For classification, a light gradient boosting machine is used, and its hyperparameters are tuned to determine the optimal settings. Because speech emotion recognition datasets are imbalanced, the class weights are set inversely proportional to the sample distribution, so that minority classes are assigned higher weights. The experimental results show that the proposed method outperforms state-of-the-art methods, with 84.91% accuracy on the Emo-DB dataset, 67.72% on the Ryerson audio-visual database of emotional speech and song (RAVDESS) dataset, and 62.94% on the interactive emotional dyadic motion capture (IEMOCAP) dataset.
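The class weighting described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact formula, so the normalisation below (total samples divided by the number of classes times the class count) is an assumption, and the function name `inverse_frequency_weights` and the toy labels are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {class: weight}, with each weight proportional to 1 / class count.

    Assumed normalisation (not taken from the paper):
        weight = n_samples / (n_classes * class_count)
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical toy labels, not drawn from Emo-DB, RAVDESS, or IEMOCAP.
labels = ["angry"] * 6 + ["happy"] * 3 + ["sad"] * 1
weights = inverse_frequency_weights(labels)

# Minority classes end up with the larger weights.
for cls, w in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{cls}: {w:.3f}")
```

A dictionary of this shape could plausibly be passed to a classifier that accepts per-class weights, for example through the `class_weight` argument of LightGBM's scikit-learn style `LGBMClassifier`; whether the authors did it this way is not stated in the abstract.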
Item Type: | Article |
---|---|
Uncontrolled Keywords: | light gradient boosting machine; machine learning; speech; speech emotion; speech emotion recognition |
Subjects: | Q Science > Q Science (General) > Q300-390 Cybernetics |
Divisions: | Faculty of Information Science and Technology (FIST) |
Depositing User: | Ms Nurul Iqtiani Ahmad |
Date Deposited: | 02 May 2023 08:04 |
Last Modified: | 02 May 2023 08:04 |
URI: | http://shdl.mmu.edu.my/id/eprint/11395 |