Effects of automotive microphone frequency response characteristics and noise conditions on speech and ASR quality -- an experimental evaluation
Buccoli, Du, Soendergaard et al.
When choosing microphones for automotive hands-free communication or Automatic Speech Recognition (ASR) applications, OEMs typically specify wideband, super-wideband or even fullband requirements following established standard recommendations (e.g., ITU-T P.1110, ITU-T P.1120). In practice, it is often challenging to achieve the preferred bandwidth for an automotive microphone given the constraints on microphone placement inside the cabin and the automotive-grade environmental robustness requirements. At the same time, there seems to be no consensus or sufficient data on how each microphone characteristic affects actual performance. To address this gap, we used noise signals recorded in real vehicles under various driving conditions to experimentally study the relationship between microphone characteristics and both the final audio quality of speech communication and the performance of ASR engines. We focus on how variations in microphone bandwidth and amplitude frequency response shape affect perceptual speech quality. The speech quality results are compared using ETSI TS 103 281 metrics (S-MOS, N-MOS, G-MOS) and ancillary metrics such as SNR. The ASR results are evaluated with standard metrics such as Word Error Rate (WER). The findings clarify which microphone frequency response characteristics matter most for audio quality and inform the choice of microphone specifications, particularly for automotive applications.
Title: Effects of automotive microphone frequency response characteristics and noise conditions on speech and ASR quality -- an experimental evaluation
Authors: Michele Buccoli, Yu Du, Jacob Soendergaard, Simone Shawn Cazzaniga
Classification: eess.AS (Electrical Engineering and Systems Science - Audio and Speech Processing), cs.SD (Computer Science - Sound)
Publication Time/Conference: AES 159th Convention, Oct 23-25, Long Beach, CA, USA (Express Paper)
This study addresses microphone selection for automotive hands-free communication and automatic speech recognition (ASR) by experimentally investigating how microphone frequency response characteristics relate to speech quality and ASR performance. It uses noise signals recorded in real vehicles to evaluate how variations in microphone bandwidth and amplitude frequency response shape affect perceived speech quality. Speech quality is assessed with the S-MOS, N-MOS, and G-MOS metrics of ETSI TS 103 281, together with ancillary indicators such as SNR, while ASR performance is evaluated through word error rate (WER). The results indicate which frequency response characteristics matter for audio quality and offer guidance for specifying microphones in automotive applications.
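As an illustration of the ASR metric named above, the following is a minimal sketch of a word error rate computation based on word-level edit distance. It is not the scoring tool used in the paper; the function name `word_error_rate`, the lowercase and whitespace tokenization, and the example sentences are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Assumption: text is normalized by lowercasing and splitting on whitespace.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcript pair: one deletion and one substitution.
    print(word_error_rate("navigate to the nearest charging station",
                          "navigate to nearest charging stations"))
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many insertions.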
Automotive OEMs typically follow recommendations from standards such as ITU-T P.1110 and ITU-T P.1120 when selecting microphones for hands-free communication or ASR applications, requiring wideband, super-wideband, or even fullband specifications. In practice, however, the ideal bandwidth is difficult to achieve given the constraints on microphone installation positions within vehicles and the stringent automotive-grade environmental robustness requirements.
Prior work has mostly examined specific automotive microphones, so its scope is limited to the inherent characteristics of those devices and it does not reveal general trends in how variations in microphone characteristics affect speech and ASR quality.
Established a systematic evaluation framework: Constructed an experimental evaluation platform for assessing the effects of microphone frequency response characteristics on speech quality and ASR performance
Comprehensive characteristic analysis: Systematically investigated the effects of microphone bandwidth, frequency response peaks, and other characteristics on performance
Multi-dimensional assessment: Simultaneously evaluated speech quality for human-to-human (H2H) communication and ASR performance for human-to-machine (H2M) interaction
Real environment verification: Employed noise recordings from real vehicle environments for validation
Standardized evaluation metrics: Adopted ETSI standard MOS scores and standard ASR evaluation metrics
The experiments investigate the effects of microphone frequency response characteristics (bandwidth, peak frequency, quality factor) on speech quality (S-MOS, N-MOS) and ASR performance (WER) under different vehicle types and noise conditions.
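To make the peak frequency and quality factor parameters concrete, the sketch below builds a second-order peaking filter and evaluates its magnitude response. It uses the widely known Audio EQ Cookbook (Bristow-Johnson) formulas as a stand-in; the paper's actual filter design is not described in this summary, and the 3 kHz peak, Q of 2, 6 dB gain, and 48 kHz sample rate are hypothetical values.

```python
import cmath
import math


def peaking_biquad(f0_hz, q, gain_db, fs_hz):
    """Return normalized (b, a) coefficients of an Audio EQ Cookbook peaking filter."""
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0_hz / fs_hz
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha / amp  # normalization term so that a[0] == 1
    b = [(1.0 + alpha * amp) / a0, -2.0 * math.cos(w0) / a0, (1.0 - alpha * amp) / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha / amp) / a0]
    return b, a


def magnitude_db(b, a, f_hz, fs_hz):
    """Magnitude of H(z) in dB, evaluated on the unit circle at frequency f_hz."""
    z_inv = cmath.exp(-2j * math.pi * f_hz / fs_hz)
    num = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    den = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return 20.0 * math.log10(abs(num / den))


if __name__ == "__main__":
    fs = 48000.0  # hypothetical sample rate
    b, a = peaking_biquad(f0_hz=3000.0, q=2.0, gain_db=6.0, fs_hz=fs)
    for f in (100.0, 1000.0, 3000.0, 8000.0):
        print(f, round(magnitude_db(b, a, f, fs), 2))
```

A higher Q narrows the peak around `f0_hz` while leaving the gain at the peak frequency unchanged, which is why the two parameters can be varied independently in this kind of study.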
Low cutoff frequency effect: S-MOS values at 20 Hz and 100 Hz cutoff frequencies are similar, and both are higher than at 350 Hz
High cutoff frequency effect is weak: At the same low cutoff frequency, high-end bandwidth limitations have minimal impact on S-MOS
Statistical significance: Varying the low cutoff frequency gives a p-value close to 0 (F-statistic = 1174), whereas varying the high cutoff frequency gives a p-value of 0.755 (F-statistic = 0.47)
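For readers unfamiliar with the F-statistic quoted above, the following sketch computes a one-way ANOVA F-statistic by hand. This summary does not state which test the paper used, so the one-way design is an assumption, and the numbers in `toy_groups` are placeholder values, not data from the study.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more samples than groups")

    grand_mean = fmean([x for g in groups for x in g])
    group_means = [fmean(g) for g in groups]

    # Variation explained by the factor (e.g. which cutoff frequency was applied).
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Residual variation inside each group.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    if ss_within == 0:
        raise ValueError("within-group variance is zero; F is undefined")

    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


if __name__ == "__main__":
    # Placeholder values, one list per factor level; not measurements from the paper.
    toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
    print(one_way_anova_f(toy_groups))
```

A large F means the group means differ by much more than the scatter within each group, which is consistent with a p-value near 0; an F below 1 means the factor explains less variation than the residual noise.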
Du et al. (2019): First study investigating the association between three automotive microphones and user experience, using SII and subjective listening tests
Du (2023): Extended research including objective and subjective speech intelligibility and quality assessment
Maver et al. (2024): Investigated acoustic front-end performance across four different automotive microphone types and installation positions
Extended vehicle range: Include more vehicle types to analyze effects of objective vehicle characteristics (size, class, RT60)
Decoupling noise and vehicle type: Create combinations of all vehicle types and driving noises to effectively decouple influencing factors
Speaker characteristic research: Investigate interaction effects between speaker characteristics such as pitch frequency and microphone characteristics
Diversified filter design: Explore effects of different filter orders and different peak magnitudes
Dedicated ASR engines: Evaluate performance of automotive-specific ASR engines
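The decoupling idea in the list above (pairing every vehicle type with every driving noise) amounts to a full factorial design. A minimal sketch follows; the vehicle and noise labels are hypothetical and are not the conditions used in the paper.

```python
from itertools import product


def full_factorial(vehicles, noises):
    """Pair every vehicle with every noise condition so the two factors are crossed."""
    return [{"vehicle": v, "noise": n} for v, n in product(vehicles, noises)]


if __name__ == "__main__":
    vehicles = ["vehicle_A", "vehicle_B", "vehicle_C"]  # hypothetical labels
    noises = ["idle", "city", "highway"]                # hypothetical labels
    for condition in full_factorial(vehicles, noises):
        print(condition)
```

With every combination present, the effect of the vehicle can be estimated separately from the effect of the noise, which is not possible when each noise recording is tied to a single vehicle.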
The paper cites several international standards, including ITU-T P.501, ETSI TS 103 281, and ITU-T P.1100, as well as earlier work by Du et al. on automotive microphone performance assessment. These references provide the theoretical foundation and methodological guidance for the study.