Improved speech emotion recognition based on music-related audio features

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

4 Citations (Scopus)

Abstract

Emotions are essential for human communication as they reflect our inner states and influence our actions. Today, emotions provide crucial information to many applications, from virtual assistants to security systems, mood-tracking wearable devices, and autism robots. A speech emotion recognition (SER) model must be lightweight to run on a variety of devices with limited computational power. This research investigates the performance of music-related features for SER, motivated by auditory and neuropsychological evidence for the connection between emotional speech and music in human perception. Unlike prior works on low-level descriptors that primarily focus on differentiating human speech production, our method employs features extracted directly from raw speech signals through the Discrete Fourier Transform and the Constant-Q Transform. These features represent the perceptual pitch and timbre characteristics of the human voice. The 10-fold cross-validation results show that our method improves the accuracy of the audio feature-based approach on the RAVDESS, CREMA-D and IEMOCAP datasets. Findings from the ablation study imply the significance of perceptual pitch, perceptual loudness, and the combination of pitch and timbre features in building a robust SER model. Compared to pretrained deep learning embeddings, our method demonstrates its generalizability and high efficiency despite a much smaller model size.
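
The abstract does not say how the pitch-related features are computed, so the following is only a minimal sketch of one music-style feature, assuming a chroma-like pitch-class profile built from a plain DFT. The function names (`dft_magnitudes`, `pitch_class_profile`), the 440 Hz reference, and the 55 Hz lower cutoff are illustrative assumptions, not the authors' implementation. A Constant-Q Transform would instead use logarithmically spaced bins and is not shown here.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Naive O(N^2) DFT; returns magnitudes for bins 0..N//2."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(frame):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(acc))
    return mags


def pitch_class_profile(frame, sample_rate, ref_hz=440.0, fmin=55.0):
    """Fold DFT energy into 12 pitch classes (index 0 = A, because ref_hz is A4).

    ref_hz and fmin are assumed values chosen for illustration.
    """
    n = len(frame)
    profile = [0.0] * 12
    for k, mag in enumerate(dft_magnitudes(frame)):
        freq = k * sample_rate / n
        if freq < fmin:
            continue  # skip DC and very low bins
        # Nearest semitone relative to the reference pitch
        semitone = round(12 * math.log2(freq / ref_hz))
        profile[semitone % 12] += mag ** 2
    total = sum(profile)
    # Normalise so the profile sums to 1 (left as zeros for a silent frame)
    return [p / total for p in profile] if total > 0 else profile


# Usage: a pure 440 Hz tone sampled at 8 kHz, 400 samples (exactly 22 cycles)
frame = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(400)]
profile = pitch_class_profile(frame, 8000)
```

For the test tone, nearly all energy falls in pitch class 0 (A). Real speech frames would spread energy across several classes, and a practical pipeline would use an FFT rather than this quadratic-time loop.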

Original language: English
Title of host publication: Proceedings of the 30th European Signal Processing Conference (EUSIPCO 2022)
Editors: Tirza Routtenberg, Predrag Tadic
Publisher: European Association for Signal Processing
Pages: 120-124
Number of pages: 5
ISBN (Electronic): 9781665467971
Publication status: Published - 2022
Event: European Signal Processing Conference 2022 - Belgrade, Serbia
Duration: 29 Aug 2022 to 2 Sept 2022
Conference number: 30th
https://2022.eusipco.org/ (Website)
https://eurasip.org/Proceedings/Eusipco/Eusipco2022/HTML/session-index.html (Proceedings)

Publication series

Name: European Signal Processing Conference
Volume: 2022-August
ISSN (Print): 2219-5491

Conference

Conference: European Signal Processing Conference 2022
Abbreviated title: EUSIPCO 2022
Country/Territory: Serbia
City: Belgrade
Period: 29/08/22 to 2/09/22

Keywords

  • speech emotion recognition
  • audio features
  • music features
  • LLD
  • MFCC
  • CQT
  • Mel spectrogram
