TY - JOUR
T1 - Plug-and-Play Microphones for Recording Speech and Voice with Smart Devices
AU - Noffs, Gustavo
AU - Cobler-Lichter, Matthew
AU - Perera, Thushara
AU - Kolbe, Scott C.
AU - Butzkueven, Helmut
AU - Boonstra, Frederique M.C.
AU - van der Walt, Anneke
AU - Vogel, Adam P.
N1 - Funding Information:
Adam Vogel is an employee at Redenlab Inc. He receives grant and fellowship funding from the National Health and Medical Research Council of Australia. Andrew Evans, Frederique M.C. Boonstra, Helmut Butzkueven, Matthew Cobler-Lichter, and Thushara Perera have nothing to disclose. Anneke van der Walt receives grant support from the National Health and Medical Research Council of Australia and Multiple Sclerosis Research Australia. Gustavo Noffs is an employee at Redenlab Inc. Scott Kolbe receives grant income from the National Health and Medical Research Council of Australia.
Funding Information:
This work was funded by National Health and Medical Research Council of Australia, Grant 1085461, which paid for data collection costs (equipment, room, storage) as well as research stipends for Frederique M.C. Boonstra and Gustavo Noffs.
Publisher Copyright:
© 2023 The Author(s).
PY - 2024/8
Y1 - 2024/8
N2 - Introduction: Smart devices are widely available and capable of quickly recording and uploading speech segments for health-related analysis. The switch from laboratory recordings with professional-grade microphone setups to remote, smart device-based recordings offers immense potential for the scalability of voice assessment. Yet, a growing body of literature points to a wide heterogeneity among acoustic metrics for their robustness to variation in recording devices. The addition of consumer-grade plug-and-play microphones has been proposed as a possible solution. The aim of our study was to assess if the addition of consumer-grade plug-and-play microphones increases the acoustic measurement agreement between ultra-portable devices and a reference microphone. Methods: Speech was simultaneously recorded by a reference high-quality microphone commonly used in research and by two configurations with plug-and-play microphones. Twelve speech-acoustic features were calculated using recordings from each microphone to determine the agreement intervals in measurements between microphones. Agreement intervals were then compared to expected deviations in speech in various neurological conditions. Each microphone's response to speech and to silence was characterized through acoustic analysis to explore possible reasons for differences in acoustic measurements between microphones. The statistical differentiation of two groups, neurotypical and people with multiple sclerosis, using metrics from each tested microphone was compared to that of the reference microphone. Results: The two consumer-grade plug-and-play microphones favored high frequencies (mean center of gravity difference ≥ +175.3 Hz) and recorded more noise (mean difference in signal to noise ≤ −4.2 dB) when compared to the reference microphone. Between consumer-grade microphones, differences in relative noise were closely related to distance between the microphone and the speaker's mouth. Agreement intervals between the reference and consumer-grade microphones remained under disease-expected deviations only for fundamental frequency (f0, agreement interval ≤0.06 Hz), f0 instability (f0 CoV, agreement interval ≤0.05%), and tracking of second formant movement (agreement interval ≤1.4 Hz/ms). Agreement between microphones was poor for other metrics, particularly for fine timing metrics (mean pause length and pause length variability for various tasks). The statistical difference between the two groups of speakers was smaller with the plug-and-play than with the reference microphone. Conclusion: Measurement of f0 and F2 slope was robust to variation in recording equipment, while other acoustic metrics were not. Thus, the tested plug-and-play microphones should not be used interchangeably with professional-grade microphones for speech analysis. Plug-and-play microphones may assist in equipment standardization within speech studies, including remote or self-recording, possibly with small loss in accuracy and statistical power as observed in the current study.
AB - Introduction: Smart devices are widely available and capable of quickly recording and uploading speech segments for health-related analysis. The switch from laboratory recordings with professional-grade microphone setups to remote, smart device-based recordings offers immense potential for the scalability of voice assessment. Yet, a growing body of literature points to a wide heterogeneity among acoustic metrics for their robustness to variation in recording devices. The addition of consumer-grade plug-and-play microphones has been proposed as a possible solution. The aim of our study was to assess if the addition of consumer-grade plug-and-play microphones increases the acoustic measurement agreement between ultra-portable devices and a reference microphone. Methods: Speech was simultaneously recorded by a reference high-quality microphone commonly used in research and by two configurations with plug-and-play microphones. Twelve speech-acoustic features were calculated using recordings from each microphone to determine the agreement intervals in measurements between microphones. Agreement intervals were then compared to expected deviations in speech in various neurological conditions. Each microphone's response to speech and to silence was characterized through acoustic analysis to explore possible reasons for differences in acoustic measurements between microphones. The statistical differentiation of two groups, neurotypical and people with multiple sclerosis, using metrics from each tested microphone was compared to that of the reference microphone. Results: The two consumer-grade plug-and-play microphones favored high frequencies (mean center of gravity difference ≥ +175.3 Hz) and recorded more noise (mean difference in signal to noise ≤ −4.2 dB) when compared to the reference microphone. Between consumer-grade microphones, differences in relative noise were closely related to distance between the microphone and the speaker's mouth. Agreement intervals between the reference and consumer-grade microphones remained under disease-expected deviations only for fundamental frequency (f0, agreement interval ≤0.06 Hz), f0 instability (f0 CoV, agreement interval ≤0.05%), and tracking of second formant movement (agreement interval ≤1.4 Hz/ms). Agreement between microphones was poor for other metrics, particularly for fine timing metrics (mean pause length and pause length variability for various tasks). The statistical difference between the two groups of speakers was smaller with the plug-and-play than with the reference microphone. Conclusion: Measurement of f0 and F2 slope was robust to variation in recording equipment, while other acoustic metrics were not. Thus, the tested plug-and-play microphones should not be used interchangeably with professional-grade microphones for speech analysis. Plug-and-play microphones may assist in equipment standardization within speech studies, including remote or self-recording, possibly with small loss in accuracy and statistical power as observed in the current study.
KW - Acoustic analysis
KW - MeSH descriptors
KW - Microphone
KW - Remote assessment
KW - Speech
KW - Voice
UR - https://www.scopus.com/pages/publications/85195061955
U2 - 10.1159/000535152
DO - 10.1159/000535152
M3 - Article
C2 - 37972580
AN - SCOPUS:85195061955
SN - 1021-7762
VL - 76
SP - 372
EP - 385
JO - Folia Phoniatrica et Logopaedica
JF - Folia Phoniatrica et Logopaedica
IS - 4
ER -