Pre-trained DenseNet-121 with Multilayer Perceptron for Acoustic Event Classification

Citation

Tan, Pooi Shiang and Lim, Kian Ming and Tan, Cheah Heng and Lee, Chin Poo (2023) Pre-trained DenseNet-121 with Multilayer Perceptron for Acoustic Event Classification. IAENG International Journal of Computer Science, 50 (1). pp. 1-12. ISSN 1819-9224

Abstract

Acoustic event classification assigns acoustic events to their correct classes, which benefits applications such as surveillance, multimedia information retrieval, and smart cities. Its main challenges are insufficient data for learning a good model and the varying lengths of acoustic input signals. This paper proposes a deep learning architecture, Pre-trained DenseNet-121 with Multilayer Perceptron, for classifying acoustic events. To mitigate data scarcity, two data augmentation techniques, time stretching and pitch shifting, are applied to the training data to increase the number of training samples. A frequency spectrogram technique then converts each augmented acoustic signal into a fixed-size image representation. The spectrogram images capture information about the acoustic signal such as energy levels over time, frequency changes, signal strength, and amplitude. Subsequently, a pre-trained DenseNet-121 model is adopted as a transfer learning technique to extract significant features from the spectrogram images. Doing so greatly reduces the computational resources required and improves the performance of the deep learning-based model. Three benchmark datasets, (1) Soundscapes1, (2) Soundscapes2, and (3) UrbanSound8K, are used to assess the proposed method. In the experiments, the proposed Pre-trained DenseNet-121 with Multilayer Perceptron outperforms existing works on the Soundscapes1, Soundscapes2, and UrbanSound8K datasets, with F1-scores of 80.7%, 87.3%, and 69.6%, respectively.
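
The step that turns variable-length signals into a fixed-size spectrogram can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the frame length, the hop size, or how the fixed size is obtained, so `frame_len`, `hop`, `n_frames`, and the truncate-or-zero-pad strategy are all assumptions, and both function names are hypothetical. A naive DFT from the standard library is used only to keep the sketch dependency-free.

```python
import cmath
import math


def magnitude_spectrogram(signal, frame_len=64, hop=32):
    """Return a list of time frames, each holding frame_len // 2 + 1 magnitudes."""
    # Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, max(len(signal) - frame_len, 0) + 1, hop):
        chunk = list(signal[start:start + frame_len])
        chunk += [0.0] * (frame_len - len(chunk))  # zero-pad a too-short input
        windowed = [x * w for x, w in zip(chunk, window)]
        # Naive DFT per frame (assumption: an FFT would be used in practice).
        frames.append([
            abs(sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n, x in enumerate(windowed)))
            for k in range(n_bins)
        ])
    return frames


def to_fixed_width(frames, n_frames):
    """Truncate or zero-pad along time so every clip has exactly n_frames frames."""
    n_bins = len(frames[0])
    clipped = frames[:n_frames]
    padding = [[0.0] * n_bins for _ in range(n_frames - len(clipped))]
    return clipped + padding
```

With these assumed settings, a 200-sample clip and a 1000-sample clip both come out of `to_fixed_width(magnitude_spectrogram(clip), 16)` as 16 frames of 33 frequency bins, which is the kind of uniform input an image model expects.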

Item Type: Article
Uncontrolled Keywords: Acoustic event classification, DenseNet, multilayer perceptron, time stretching, pitch shifting, frequency spectrogram
Subjects: Q Science > QC Physics > QC170-197 Atomic physics. Constitution and properties of matter Including molecular physics, relativity, quantum theory, and solid state physics
Divisions: Faculty of Information Science and Technology (FIST)
Depositing User: Ms Nurul Iqtiani Ahmad
Date Deposited: 11 Apr 2023 01:59
Last Modified: 11 Apr 2023 01:59
URI: http://shdl.mmu.edu.my/id/eprint/11328
