Towards Probing Speech-Specific Risks in Large Multimodal Models: A Taxonomy, Benchmark, and Insights

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

Abstract

Large Multimodal Models (LMMs) have achieved great success recently, demonstrating a strong capability to understand multimodal information and to interact with human users. Despite this progress, the challenge of detecting high-risk interactions in multimodal settings, and in particular in the speech modality, remains largely unexplored. Conventional research on risk in the speech modality primarily emphasises the content (e.g., what is captured in a transcription). However, in speech-based interactions, paralinguistic cues in audio can significantly alter the intended meaning behind utterances. In this work, we propose a speech-specific risk taxonomy covering 8 risk categories under hostility (malicious sarcasm and threats), malicious imitation (age, gender, ethnicity), and stereotypical biases (age, gender, ethnicity). Based on the taxonomy, we create a small-scale dataset for evaluating the capability of current LMMs to detect these categories of risk. We observe that even the latest models remain ineffective at detecting various paralinguistic-specific risks in speech (e.g., Gemini 1.5 Pro performs only slightly above a random baseline). Warning: this paper contains biased and offensive examples.
Original language: English
Title of host publication: EMNLP 2024, The 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung (Vivian) Chen
Place of Publication: Kerrville, TX, USA
Publisher: Association for Computational Linguistics (ACL)
Pages: 10957–10973
Number of pages: 17
ISBN (Electronic): 9798891761643
Publication status: Published - 2024
Event: Empirical Methods in Natural Language Processing 2024, Hyatt Regency Miami Hotel, Miami, United States of America
Duration: 12 Nov 2024 to 16 Nov 2024

Conference

Conference: Empirical Methods in Natural Language Processing 2024
Abbreviated title: EMNLP 2024
Country/Territory: United States of America
City: Miami
Period: 12/11/24 to 16/11/24
Internet address:
https://aclanthology.org/volumes/2024.emnlp-main/
https://2024.emnlp.org/
https://aclanthology.org/events/emnlp-2024/#2024emnlp-main
