Citation
Renukadevi, M. N. and Rajesh, T. M. and Shaila, S. G. and Kulkarni, Praveen and Babu, Tina and Nair, Rekha R. and Yogarayan, Sumendra (2026) Modeling audio dynamics using hierarchical assisted K-means model for structured speaker profiling in TED talks. Scientific Reports, 16 (1). ISSN 2045-2322
Text
s41598-026-47033-4.pdf - Published Version (3MB). Restricted to repository staff only.
Abstract
Speaker vocal delivery significantly impacts audience engagement on knowledge-sharing platforms like TED Talks. However, existing classification approaches face limitations in noise handling, feature extraction, and clustering robustness. The study investigates vocal behavior using a curated set of 5,000 TED Talk audio clips from 500 diverse speakers, analyzing voiced and unvoiced segments based on acoustic features including Zero Crossing Rate (ZCR), Short-Term Energy (STE), spectral measures, and autocorrelation-based periodicity. Consistent patterns were observed across all speakers: voiced segments exhibited higher energy and strong periodicity, whereas unvoiced frames showed elevated zero-crossing activity and wide spectral spread. These features served as input for extensive clustering experiments aimed at profiling speaker vocal characteristics. The proposed research aimed to address these gaps by developing a robust and interpretable hybrid clustering framework that combines K-means initialization (k = 6) with hierarchical agglomerative refinement, ultimately profiling TED speakers into six interpretable vocal types: Energetic, Balanced, Rhythmic, Flat, Noisy, and Muffled. Four baseline clustering methods (K-means, Spectral, Agglomerative, DBSCAN) demonstrated only moderate performance, with silhouette scores of 0.52-0.60. In contrast, the proposed hybrid method achieved markedly superior results: a silhouette score of 0.90 ± 0.02, a Davies-Bouldin (DB) index of 0.92 ± 0.04, and a Calinski-Harabasz (CH) score of 1,020.5 ± 25.0. Feature-ablation tests further validated the importance of Mean Power and Magnitude, as removing both features degraded clustering quality by 5.6% and increased the DB index by 30.4%, confirming that the full-feature hybrid approach delivers the most robust structure.
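The frame-level features named in the abstract (ZCR, STE, and autocorrelation-based periodicity) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the 8 kHz sample rate, the 50 ms frame length, and the 50-400 Hz pitch search range are all assumptions chosen for illustration. It only shows the qualitative pattern the abstract reports, namely that a periodic (voiced-like) frame has a low ZCR and strong periodicity, while a noise-like (unvoiced-like) frame has a high ZCR and weak periodicity.

```python
import math
import random


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def short_term_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def autocorr_periodicity(frame, min_lag, max_lag):
    """Largest autocorrelation, normalized by frame energy, over the lag range."""
    energy = sum(x * x for x in frame)
    if energy == 0:
        return 0.0
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        best = max(best, r / energy)
    return best


def frame_features(frame, sample_rate, f_min=50, f_max=400):
    """Collect the three features; f_min and f_max are assumed pitch bounds in Hz."""
    min_lag = sample_rate // f_max
    max_lag = min(sample_rate // f_min, len(frame) - 1)
    return {
        "zcr": zero_crossing_rate(frame),
        "ste": short_term_energy(frame),
        "periodicity": autocorr_periodicity(frame, min_lag, max_lag),
    }


SAMPLE_RATE = 8000  # assumed sample rate in Hz
FRAME_LEN = 400     # assumed 50 ms frame

# Synthetic stand-ins for a voiced-like and an unvoiced-like frame.
tone = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(FRAME_LEN)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(FRAME_LEN)]

for name, frame in (("tone", tone), ("noise", noise)):
    print(name, frame_features(frame, SAMPLE_RATE))
```

In a clustering pipeline such as the one described, per-frame values like these would typically be aggregated per clip (for example, as means and variances) before being passed to the clustering stage; how the study aggregates them is not stated in this record.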
| Item Type: | Article |
|---|---|
| Uncontrolled Keywords: | Zero Crossing Rate (ZCR), Short-Term Energy (STE) |
| Subjects: | Q Science > QA Mathematics > QA71-90 Instruments and machines |
| Divisions: | Faculty of Computing and Informatics (FCI) |
| Depositing User: | Ms Rosnani Abd Wahab |
| Date Deposited: | 01 Jul 2026 02:03 |
| Last Modified: | 01 Jul 2026 02:03 |
| URI: | http://shdl.mmu.edu.my/id/eprint/16173 |
