Abstract
Emotions are essential for human communication as they reflect our inner states and influence our actions. Today, emotions provide crucial information to many applications, from virtual assistants to security systems, mood-tracking wearable devices, and robots that assist people with autism. Speech emotion recognition (SER) models must be lightweight enough to run on a variety of devices with limited computational power. Motivated by auditory and neuropsychological evidence of a connection between emotional speech and music in human perception, this research investigates the performance of music-related features for SER. Unlike prior work on low-level descriptors (LLDs), which primarily focuses on describing human speech production, our method uses features extracted directly from raw speech signals via the Discrete Fourier Transform and the Constant-Q Transform. These features represent the perceived pitch and timbre of the human voice. Results of 10-fold cross-validation show that our method improves the accuracy of audio feature-based SER on the RAVDESS, CREMA-D and IEMOCAP datasets. Findings from the ablation study indicate the importance of perceptual pitch, perceptual loudness, and the combination of pitch and timbre features in building a robust SER model. Compared with pretrained deep learning embeddings, our method generalizes well and is highly efficient despite a much smaller model size.
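The abstract says only that the features come from the raw signal via the DFT and the Constant-Q Transform (CQT); it does not give the exact feature set or its parameters. As a rough illustration of how a CQT exposes perceptual pitch, the following sketch evaluates a naive CQT on one frame and folds its semitone-spaced bins into a 12-class chroma vector, a common music-related pitch feature. This is a sketch under stated assumptions, not the authors' pipeline: the function names `constant_q_transform` and `chroma_from_cqt`, the Hamming window, and the defaults (`f_min=32.70` Hz, roughly C1; 12 bins per octave; 84 bins) are hypothetical choices. Practical code would normally use an FFT-based implementation such as `librosa.cqt` instead of these direct per-bin sums.

```python
import cmath
import math


def constant_q_transform(frame, sample_rate, f_min=32.70, bins_per_octave=12, n_bins=84):
    """Naive constant-Q transform of one frame (direct per-bin sums, no FFT).

    Hypothetical helper for illustration. Bin k has centre frequency
    f_k = f_min * 2**(k / bins_per_octave) and a Hamming-windowed kernel of
    length N_k = ceil(Q * sample_rate / f_k), so the ratio of centre frequency
    to bandwidth (Q) is the same for every bin. Samples beyond the end of
    `frame` are treated as zeros.
    """
    f_max = f_min * 2 ** ((n_bins - 1) / bins_per_octave)
    if f_max >= sample_rate / 2:
        raise ValueError("highest CQT bin must lie below the Nyquist frequency")
    q = 1.0 / (2 ** (1.0 / bins_per_octave) - 1)
    magnitudes = []
    for k in range(n_bins):
        f_k = f_min * 2 ** (k / bins_per_octave)
        n_k = math.ceil(q * sample_rate / f_k)  # longer windows for lower pitches
        acc = 0j
        for n in range(min(n_k, len(frame))):
            window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (n_k - 1))  # symmetric Hamming
            acc += window * frame[n] * cmath.exp(-2j * math.pi * f_k * n / sample_rate)
        magnitudes.append(abs(acc) / n_k)  # normalise by kernel length
    return magnitudes


def chroma_from_cqt(magnitudes):
    """Fold semitone-spaced CQT bins (12 per octave) into 12 pitch classes.

    Hypothetical helper. Pitch class 0 is f_min and its octaves; output sums to 1.
    """
    chroma = [0.0] * 12
    for k, m in enumerate(magnitudes):
        chroma[k % 12] += m
    total = sum(chroma) or 1.0
    return [c / total for c in chroma]


# Usage: a 440 Hz tone (A4) with f_min = 110 Hz (A2), so A falls in pitch class 0.
sr = 8000
tone = [math.sin(2 * math.pi * 440.0 * n / sr) for n in range(2048)]
cqt = constant_q_transform(tone, sr, f_min=110.0, n_bins=36)
chroma = chroma_from_cqt(cqt)
print(cqt.index(max(cqt)), chroma.index(max(chroma)))
```

Because every bin keeps the same frequency-to-bandwidth ratio, low bins use long windows (1,224 samples at 110 Hz with an 8 kHz sampling rate) and high bins use short ones, which mirrors the logarithmic pitch spacing of musical scales rather than the uniform spacing of a plain DFT.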
Original language | English |
---|---|
Title of host publication | Proceedings of the 30th European Signal Processing Conference (EUSIPCO 2022) |
Editors | Tirza Routtenberg, Predrag Tadic |
Publisher | European Association for Signal Processing |
Pages | 120-124 |
Number of pages | 5 |
ISBN (Electronic) | 9781665467971 |
Publication status | Published - 2022 |
Event | European Signal Processing Conference 2022, Belgrade, Serbia; duration: 29 Aug 2022 → 2 Sept 2022; conference number: 30th; https://2022.eusipco.org/ (website); https://eurasip.org/Proceedings/Eusipco/Eusipco2022/HTML/session-index.html (proceedings) |
Publication series
Name | European Signal Processing Conference |
---|---|
Volume | 2022-August |
ISSN (Print) | 2219-5491 |
Conference
Conference | European Signal Processing Conference 2022 |
---|---|
Abbreviated title | EUSIPCO 2022 |
Country/Territory | Serbia |
City | Belgrade |
Period | 29/08/22 → 2/09/22 |
Internet address | https://2022.eusipco.org/ |
Keywords
- speech emotion recognition
- audio features
- music features
- LLD
- MFCC
- CQT
- Mel spectrogram