Speech emotion classification using combined neurogram and INTERSPEECH 2010 paralinguistic challenge features

Wissam A. Jassim, Raveendran Paramesran, Naomi Harte

Research output: Contribution to journal › Article › Research › peer-review

28 Citations (Scopus)


Increasing attention has recently been directed to studying and identifying the emotional content of spoken utterances. This study introduces a method to improve emotion classification performance in clean and noisy environments by combining two types of features: the proposed neural-responses-based features and the traditional INTERSPEECH 2010 paralinguistic emotion challenge features. The neural-responses-based features are derived from the responses of a computational model of the auditory system for listeners with normal hearing. The model simulates the response of an auditory-nerve fibre with a given characteristic frequency to a speech signal. The simulated responses are represented as a 2D neurogram (a time-frequency representation). The neurogram image is divided into non-overlapping blocks, and the average value of each block is computed. The neurogram features and the traditional emotion features are combined to form the feature vector for each speech signal. Support vector machines are trained on these feature vectors to predict the emotion of the speech. The performance of the proposed method is evaluated on two well-known databases: the eNTERFACE and Berlin emotional speech data sets. The results show that the proposed method performed better than classification using the neurogram or INTERSPEECH features alone.
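The block-averaging and feature-combination steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the block size, the row-major ordering of blocks, the handling of trailing rows and columns, and the names `block_average` and `combine_features` are all assumptions made for illustration, and the toy values are not real neurogram data. The auditory-nerve model and the support vector machine training step are not shown.

```python
from typing import List, Sequence


def block_average(image: Sequence[Sequence[float]],
                  block_rows: int,
                  block_cols: int) -> List[float]:
    """Average each non-overlapping block of a 2D array (row-major block order).

    Assumption: trailing rows or columns that do not fill a whole block
    are dropped; the abstract does not say how edges are handled.
    """
    n_rows = len(image)
    n_cols = len(image[0]) if n_rows else 0
    features: List[float] = []
    for r in range(0, n_rows - block_rows + 1, block_rows):
        for c in range(0, n_cols - block_cols + 1, block_cols):
            total = sum(image[i][j]
                        for i in range(r, r + block_rows)
                        for j in range(c, c + block_cols))
            features.append(total / (block_rows * block_cols))
    return features


def combine_features(neurogram_features: Sequence[float],
                     paralinguistic_features: Sequence[float]) -> List[float]:
    """Concatenate the two feature sets into one vector per utterance."""
    return list(neurogram_features) + list(paralinguistic_features)


# Toy 4x4 "neurogram" (rows: characteristic frequencies, columns: time bins).
toy_neurogram = [
    [1.0, 3.0, 5.0, 7.0],
    [1.0, 3.0, 5.0, 7.0],
    [2.0, 2.0, 4.0, 4.0],
    [2.0, 2.0, 4.0, 4.0],
]
# Stand-in for the INTERSPEECH 2010 feature set (hypothetical values).
toy_paralinguistic = [0.5, -0.5]

feature_vector = combine_features(block_average(toy_neurogram, 2, 2),
                                  toy_paralinguistic)
print(feature_vector)
```

In this sketch each utterance yields one fixed-length vector, which is the form a support vector machine classifier would take as input.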

Original language: English
Pages (from-to): 587-595
Number of pages: 9
Journal: IET Signal Processing
Issue number: 5
Publication status: Published - 1 Jul 2017
Externally published: Yes
