Towards AST-LLDs for the analysis of depression in speech signals

Research output: Chapter in Book/Report/Conference proceeding > Conference Paper > Research > peer-review

Abstract

Recent advancements in deep learning have allowed the deployment of large, multimodal models for depression analysis, but the increasing complexity of such models has resulted in slow deployment times. This work proposes multi-stream, audio-only models for depression analysis that use transformer weights attended to by low-level descriptors (LLDs) through an attention-weighted sum. The approach rests on the hypothesis that handcrafted feature sets will lessen the need for extensive transformer pre-training. Extensive experimentation on the DAIC-WOZ test dataset shows that a combination of an audio spectrogram transformer (AST) and a Mel-frequency cepstral coefficient (MFCC) based convolutional neural network (AST-MFCC) produces the highest accuracy in our suite of models, but reports marginally lower macro F1 scores than both a naive AST and pure LLD-based models. This suggests that injecting extra feature streams adds an element of sensibility to the models and limits false positives. The naive transformer-based and LLD-based models, however, are surprisingly more effective at flagging depressed patients, although at the cost of an acceptable number of false positives. Taken together, our results suggest that the addition of extra feature streams adds a distinct and controllable discriminating power to existing models and can assist lightweight models in low-data, audio-only settings.
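The trade-off described in the abstract, where one model has the highest accuracy yet a lower macro F1 than a model that flags more patients, can be illustrated with a small sketch. The confusion-matrix counts below are hypothetical and chosen only for illustration; they are not results from the paper, and the helper names `f1` and `summarize` are assumptions of this example.

```python
def f1(tp, fp, fn):
    """F1 score for one class from its true positives, false positives, false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def summarize(tp, fn, fp, tn):
    """Accuracy, macro F1, and recall for a binary task (positive = depressed)."""
    total = tp + fn + fp + tn
    f1_pos = f1(tp, fp, fn)
    # For the negative class the roles swap: its hits are tn, its false
    # positives are the missed depressed cases (fn), its misses are fp.
    f1_neg = f1(tn, fn, fp)
    return {
        "accuracy": (tp + tn) / total,
        "macro_f1": (f1_pos + f1_neg) / 2,
        "recall": tp / (tp + fn),
        "false_positives": fp,
    }


# Hypothetical counts on an imbalanced set of 30 depressed / 70 non-depressed.
conservative = summarize(tp=12, fn=18, fp=2, tn=68)   # few false positives
sensitive = summarize(tp=24, fn=6, fp=16, tn=54)      # flags more patients

for name, scores in (("conservative", conservative), ("sensitive", sensitive)):
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

With these assumed counts the conservative model scores higher on accuracy, while the sensitive model scores higher on macro F1 and recall at the cost of more false positives, because macro F1 weights the minority (depressed) class equally with the majority class.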

Original language: English
Title of host publication: 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
Editors: Ting-Lan Lin, Yoshinobu Kajikawa, Zhaoxia Yin
Place of Publication: Piscataway NJ USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 1323-1328
Number of pages: 6
ISBN (Electronic): 9798350300673
ISBN (Print): 9798350300680
Publication status: Published - 2023
Event: Annual Summit and Conference of the Asia-Pacific-Signal-and-Information-Processing-Association (APSIPA) 2023 - Taipei, Taiwan
Duration: 31 Oct 2023 to 3 Nov 2023
Conference number: 15th
https://www.apsipa2023.org/ (Website)
https://ieeexplore.ieee.org/xpl/conhome/10317071/proceeding (Proceedings)

Conference

Conference: Annual Summit and Conference of the Asia-Pacific-Signal-and-Information-Processing-Association (APSIPA) 2023
Abbreviated title: APSIPA 2023
Country/Territory: Taiwan
City: Taipei
Period: 31/10/23 to 3/11/23