Effects of automotive microphone frequency response characteristics and noise conditions on speech and ASR quality -- an experimental evaluation

Buccoli, Du, Soendergaard et al.
When choosing microphones for automotive hands-free communication or Automatic Speech Recognition (ASR) applications, OEMs typically specify wideband, super-wideband or even fullband requirements following established standard recommendations (e.g., ITU-T P.1110, ITU-T P.1120). In practice, it is often challenging to achieve the preferred bandwidth for an automotive microphone given the constraints on microphone placement inside the cabin and the automotive-grade environmental robustness requirements. At the same time, there seems to be no consensus or sufficient data on the effect of each microphone characteristic on actual performance. To address this gap, we used noise signals recorded in real vehicles under various driving conditions to experimentally study the relationship between the microphones' characteristics and the final audio quality of speech communication and the performance of ASR engines. We focus on how variations in microphone bandwidth and amplitude frequency response shape affect the perceptual speech quality. The speech quality results are compared using ETSI TS 103 281 metrics (S-MOS, N-MOS, G-MOS) and ancillary metrics such as SNR. The ASR results are evaluated with standard metrics such as Word Error Rate (WER). The findings clarify which microphone frequency response characteristics matter most for audio quality and inform the choice of microphone specifications, particularly for automotive applications.

Basic Information

  • Paper ID: 2510.09236
  • Title: Effects of automotive microphone frequency response characteristics and noise conditions on speech and ASR quality -- an experimental evaluation
  • Authors: Michele Buccoli, Yu Du, Jacob Soendergaard, Simone Shawn Cazzaniga
  • Classification: eess.AS (Electrical Engineering and Systems Science - Audio and Speech Processing), cs.SD (Computer Science - Sound)
  • Publication Time/Conference: AES 159th Convention, Oct 23-25, Long Beach, CA, USA (Express Paper)
  • Paper Link: https://arxiv.org/abs/2510.09236

Abstract

This study addresses microphone selection for automotive hands-free communication and automatic speech recognition (ASR) applications by experimentally investigating how microphone frequency response characteristics relate to speech quality and ASR performance. The research employs noise signals recorded in real vehicle environments to evaluate how variations in microphone bandwidth and amplitude-frequency response shape affect perceived speech quality. Speech quality is assessed with the S-MOS, N-MOS, and G-MOS metrics of ETSI TS 103 281 and auxiliary indicators such as SNR, while ASR performance is evaluated through word error rate (WER). The results show which microphone frequency response characteristics matter for audio quality and offer guidance for microphone specification selection in automotive applications.

Research Background and Motivation

Problem Definition

Automotive OEMs typically follow recommendations from standards such as ITU-T P.1110 and ITU-T P.1120 when selecting microphones for hands-free communication or ASR applications, requiring wideband, super-wideband, or even fullband specifications. In practice, however, the preferred bandwidth is difficult to achieve given the constraints on microphone installation positions within the vehicle and the stringent automotive-grade environmental robustness requirements.

Research Significance

  1. Lack of Consensus: The industry lacks consensus and sufficient data regarding the effects of various microphone characteristics on actual performance
  2. Practical Constraints: In-vehicle microphone installation positions are limited, and environmental requirements are stringent
  3. Performance Optimization: There is a need to understand which microphone characteristics are more critical for audio quality and ASR performance

Limitations of Existing Research

Existing related research is primarily based on specific types of automotive microphones, with research scope limited to the inherent characteristics of these microphones, failing to demonstrate general trends regarding the effects of microphone characteristic variations on speech and ASR quality.

Core Contributions

  1. Established a systematic evaluation framework: Constructed an experimental evaluation platform for assessing the effects of microphone frequency response characteristics on speech quality and ASR performance
  2. Comprehensive characteristic analysis: Systematically investigated the effects of microphone bandwidth, frequency response peaks, and other characteristics on performance
  3. Multi-dimensional assessment: Simultaneously evaluated speech quality for human-to-human (H2H) communication and ASR performance for human-to-machine (H2M) interaction
  4. Real environment verification: Employed noise recordings from real vehicle environments for validation
  5. Standardized evaluation metrics: Adopted ETSI standard MOS scores and standard ASR evaluation metrics

Methodology Details

Task Definition

To investigate the effects of microphone frequency response characteristics (bandwidth, peak frequency, quality factor) on speech quality (S-MOS, N-MOS) and ASR performance (WER) under different vehicle types and noise conditions.

Experimental Design Architecture

Signal Generation Model

Simulated recording signals are generated through the following formula:

x(n) = f(s(n) ⋆ h(n) + v(n))

Where ⋆ denotes discrete-time convolution and:

  • s(n): Clean speech signal according to ITU-T P.501 standard
  • h(n): Vehicle impulse response
  • v(n): Real vehicle background noise
  • f(·): Cascade of digital filters simulating microphone spectral characteristics
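The signal model above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names (`convolve`, `simulate_recording`), the noise-length handling, and the toy sample values are all assumptions made here, and the microphone filter f(·) is passed in as a plain callable.

```python
def convolve(s, h):
    """Full linear convolution of two sequences (direct form)."""
    out = [0.0] * (len(s) + len(h) - 1)
    for i, s_i in enumerate(s):
        for j, h_j in enumerate(h):
            out[i + j] += s_i * h_j
    return out


def simulate_recording(s, h, v, mic_filter):
    """x(n) = f(s(n) * h(n) + v(n)).

    Assumption: the noise is truncated or zero-padded to the length of
    the reverberant speech; the source does not say how lengths are aligned.
    """
    reverberant = convolve(s, h)
    noise = (list(v) + [0.0] * len(reverberant))[:len(reverberant)]
    mixed = [r + n for r, n in zip(reverberant, noise)]
    return mic_filter(mixed)


# Toy values for illustration only (not data from the paper).
s = [1.0, 0.5, -0.25]              # "clean speech"
h = [1.0, 0.0, 0.3]                # "vehicle impulse response"
v = [0.01, -0.01, 0.02, 0.0, 0.0]  # "background noise"

# An identity filter stands in for the microphone filter cascade f.
x = simulate_recording(s, h, v, mic_filter=lambda sig: list(sig))
```

In a fuller pipeline, `mic_filter` would apply the cascade of high-pass, low-pass, and peak filters to the mixed signal.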

Microphone Characteristic Simulation

Microphone characteristics are simulated using cascaded second-order digital filters designed with the bilinear transform:

  1. Bandwidth Definition:
    • High-pass filter (HP2): 20, 100, 350 Hz
    • Low-pass filter (LP2): 4, 8, 12, 16, 20 kHz
    • Q factor: 0.707
  2. Resonance Peak Simulation:
    • Peak filter (PK2): 4, 6, 8, 13, 16 kHz
    • Fixed magnitude: 20 dB
    • Q factor: 1.414, 2, 4
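As a rough sketch of how such second-order sections could be built, the code below uses the widely known "Audio EQ Cookbook" biquad formulas, which are derived from the bilinear transform. The source does not state which design formulas or sampling rate were used, so the cookbook formulas, the 48 kHz rate, and the names `biquad` and `cascade_gain_db` are assumptions for illustration.

```python
import cmath
import math

FS = 48000.0  # assumed sampling rate; must exceed twice the 20 kHz cutoff


def biquad(kind, f0, q, peak_db=0.0, fs=FS):
    """Return (b, a) coefficients of one second-order section."""
    w0 = 2.0 * math.pi * f0 / fs
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2.0 * q)
    if kind == "hp":
        b = [(1 + cos_w0) / 2, -(1 + cos_w0), (1 + cos_w0) / 2]
        a = [1 + alpha, -2 * cos_w0, 1 - alpha]
    elif kind == "lp":
        b = [(1 - cos_w0) / 2, 1 - cos_w0, (1 - cos_w0) / 2]
        a = [1 + alpha, -2 * cos_w0, 1 - alpha]
    elif kind == "peak":
        amp = 10 ** (peak_db / 40)
        b = [1 + alpha * amp, -2 * cos_w0, 1 - alpha * amp]
        a = [1 + alpha / amp, -2 * cos_w0, 1 - alpha / amp]
    else:
        raise ValueError(f"unknown filter kind: {kind}")
    return b, a


def cascade_gain_db(sections, f, fs=FS):
    """Magnitude response in dB of cascaded sections at frequency f."""
    z1 = cmath.exp(-1j * 2 * math.pi * f / fs)  # z^-1 on the unit circle
    h = 1.0 + 0j
    for b, a in sections:
        h *= (b[0] + b[1] * z1 + b[2] * z1 * z1) / (a[0] + a[1] * z1 + a[2] * z1 * z1)
    return 20 * math.log10(abs(h))


# One example configuration drawn from the parameter values listed above.
mic = [
    biquad("hp", 100, 0.707),
    biquad("lp", 8000, 0.707),
    biquad("peak", 6000, 2.0, peak_db=20.0),
]
peak_only_gain = cascade_gain_db([biquad("peak", 6000, 2.0, peak_db=20.0)], 6000)
hp_gain_at_cutoff = cascade_gain_db([biquad("hp", 100, 0.707)], 100)
```

With these formulas a peak section has exactly its nominal gain at the center frequency, and a high-pass or low-pass section with Q = 0.707 sits about 3 dB down at its cutoff.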

Experimental Conditions

  • Vehicle Types: Mid-size sedan, compact SUV, small SUV
  • Noise Conditions: Idle (low fan), urban (60 km/h, medium fan), highway (120 km/h, low fan)
  • Microphone Configurations: 113 practical configurations selected from 225 possible combinations
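The count of 225 is consistent with a full grid over the listed parameter values (3 high-pass cutoffs × 5 low-pass cutoffs × 5 peak frequencies × 3 Q factors). The sketch below enumerates that grid; reading the 225 this way is an inference from the numbers above, and the rule used to pick the 113 practical configurations is not given in this summary, so it is not reproduced here.

```python
from itertools import product

HP_CUTOFFS_HZ = (20, 100, 350)
LP_CUTOFFS_HZ = (4000, 8000, 12000, 16000, 20000)
PEAK_FREQS_HZ = (4000, 6000, 8000, 13000, 16000)
PEAK_Q = (1.414, 2.0, 4.0)

# Every (high-pass, low-pass, peak frequency, peak Q) combination.
grid = list(product(HP_CUTOFFS_HZ, LP_CUTOFFS_HZ, PEAK_FREQS_HZ, PEAK_Q))
n_possible = len(grid)

# Figures quoted in the summary: 113 selected configurations,
# each rendered for 3 vehicle types and 3 noise conditions.
N_SELECTED = 113
n_files = N_SELECTED * 3 * 3
```

Here `n_possible` evaluates to 225 and `n_files` to 1017, matching the totals reported in this summary.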

Technical Innovations

  1. Systematic parametric study: First systematic parametric investigation of microphone characteristic effects on automotive application performance
  2. Real environment data: Employed impulse responses and noise data recorded in real vehicle environments
  3. Dual evaluation system: Simultaneously assessed speech quality and ASR performance, providing comprehensive performance profiles
  4. Standardized methodology: Strictly adhered to ITU and ETSI standards for evaluation

Experimental Setup

Dataset

  • Speech Stimuli: 20 American English Harvard sentences as specified in ETSI TS 103 281 Annex E
  • Speakers: Multiple male and female speakers
  • Total Duration: 80 seconds (4 seconds per sentence, including 1 second leading and 1 second trailing silence)
  • Vehicle Impulse Response: Recorded using HATS (Head and Torso Simulator) at driver position
  • Background Noise: Recorded following the guidelines of ITU-T P.1100 Annex D

Evaluation Metrics

  1. Speech Quality Metrics:
    • S-MOS: Speech component quality assessment (1-5 scale)
    • N-MOS: Noise component intrusiveness assessment (1-5 scale)
    • G-MOS: Overall quality impression (1-5 scale)
    • Listening effort indicator (ETSI TS 103 558)
    • A-weighted SNR
  2. ASR Performance Metrics:
    • Word Error Rate (WER)
    • Evaluated using the Whisper tiny model

Implementation Details

  • Total of 1017 speech files generated (113 microphone configurations × 3 vehicle types × 3 noise types)
  • 20 data points generated per scenario for statistical analysis
  • ANOVA test employed to assess statistical significance
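The ANOVA test compares variation between groups against variation within groups. The following sketch computes the one-way F-statistic from scratch; the grouping shown (one list of scores per factor level) and the toy numbers are assumptions for illustration and are not data from the paper. Converting F to a p-value additionally requires the F distribution, which is omitted here.

```python
def one_way_anova_f(groups):
    """F-statistic of a one-way ANOVA over a list of sample groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up scores for three levels of a single factor.
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat = one_way_anova_f(toy_groups)
# f_stat == 13.0 (up to floating-point rounding)
```

A large F-statistic means the group means differ by much more than the scatter inside each group, while a value near or below 1 means the factor explains little of the variation.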

Experimental Results

Main Results

1. Effects of Vehicle Type and Noise Type

  • Noise type effect is significant: S-MOS and N-MOS values decrease significantly with increasing background noise level (p-value approaching 0)
  • Vehicle type effect is limited: S-MOS values are very similar across different vehicle types, with some variation in N-MOS but no clear trend
  • Small SUV performs worst: Lowest SNR under highway noise conditions

2. Effects of Microphone Bandwidth

  • Low cutoff frequency effect: S-MOS values at 20 Hz and 100 Hz cutoff frequencies are similar, and both are higher than at 350 Hz
  • High cutoff frequency effect is weak: At the same low cutoff frequency, high-end bandwidth limitations have minimal impact on S-MOS
  • Statistical significance: The p-value for low cutoff frequency variation approaches 0 (F-statistic = 1174), while the p-value for high cutoff frequency variation is 0.755 (F-statistic = 0.47)

3. Effects of Microphone Frequency Response Peaks

  • Peak frequency effect: Lower peak frequencies result in lower S-MOS values
  • Optimal peak location: Resonance peaks should be pushed to 10 kHz or above for optimal performance
  • Quality factor effect: Higher quality factors (narrower peak bandwidth) yield better S-MOS performance

4. ASR Performance Results

  • Weak microphone characteristic effect: Microphone frequency response characteristics have no significant effect on ASR performance
  • Noise type dominance: Noise type is the primary factor affecting WER
  • Possible reasons: ASR engines are robust to frequency response variations in speech signals, or the test speech may have been present in the engine's training data

Ablation Studies

Single-factor effects investigated by fixing certain parameters:

  1. Pure bandwidth effect: Excluded peak filters, studied only HP2 and LP2 combinations
  2. Peak effect: Investigated peak frequency and quality factor effects under different bandwidth settings
  3. Interaction effects: Studied synergistic effects of different parameter combinations

Experimental Findings

  1. Noise level is the decisive factor: Has the greatest impact on speech quality and ASR performance
  2. High-frequency bandwidth requirements can be relaxed: The high cutoff frequency has limited effect on speech quality
  3. Low-frequency response is important: Low cutoff frequency should not exceed 100 Hz
  4. High-frequency peak optimization: Unavoidable resonance peaks should be pushed to high frequencies with narrow bandwidth
  5. ASR robustness: Modern ASR engines demonstrate good robustness to microphone characteristic variations

Overview of Existing Research

  1. Du et al. (2019): First study investigating the association between three automotive microphones and user experience, using SII and subjective listening tests
  2. Du (2023): Extended research including objective and subjective speech intelligibility and quality assessment
  3. Maver et al. (2024): Investigated acoustic front-end performance across four different automotive microphone types and installation positions

Advantages of This Work

  1. Systematic parametrization: Not limited to specific microphone types, systematically investigates effects of parameter variations
  2. Standardized evaluation: Employs ETSI and ITU standardized evaluation methods
  3. Dual perspective: Considers both H2H communication quality and H2M interaction performance
  4. Real environment: Employs impulse responses and noise recorded in real vehicles rather than synthetic noise

Conclusions and Discussion

Main Conclusions

  1. Noise type and level are the most relevant factors affecting speech quality and recognition
  2. Microphone bandwidth, in particular the high cutoff frequency, has minimal effect on speech quality
  3. S-MOS performance degrades when low cutoff frequency exceeds 100 Hz
  4. Microphone resonance peaks should be pushed as high as possible with narrow peak width (high Q factor)
  5. ASR performance is virtually unaffected by microphone factors

Limitations

  1. Limited vehicle samples: Only three specific vehicle types tested
  2. Simplified filter design: Only second-order filters employed to simulate microphone characteristics
  3. Single ASR engine: Only the general-purpose Whisper ASR engine was used
  4. Speaker characteristics: Limited investigation of individual speaker characteristic effects
  5. Fixed peak magnitude: Peak filter magnitude fixed at 20 dB

Future Directions

  1. Extended vehicle range: Include more vehicle types to analyze effects of objective vehicle characteristics (size, class, RT60)
  2. Decoupling noise and vehicle type: Create combinations of all vehicle types and driving noises to effectively decouple influencing factors
  3. Speaker characteristic research: Investigate interaction effects between speaker characteristics such as pitch frequency and microphone characteristics
  4. Diversified filter design: Explore effects of different filter orders and different peak magnitudes
  5. Dedicated ASR engines: Evaluate performance of automotive-specific ASR engines
  6. Acoustic front-end processing: Conduct comprehensive assessment incorporating commercial acoustic front-end processing systems

In-Depth Evaluation

Strengths

  1. Strong methodological innovation: First systematic parametric investigation of automotive microphone characteristic effects, filling a research gap
  2. Rigorous experimental design: Adheres to international standards, employs real environment data, scientifically sound experimental design
  3. Comprehensive evaluation system: Considers both speech quality and ASR performance, providing complete performance profiles
  4. High practical value: Research results directly guide automotive industry microphone selection and specification development
  5. Sufficient statistical analysis: Employs ANOVA and other statistical methods to verify result significance

Shortcomings

  1. Limited sample representativeness: Three vehicle types have limited representativeness, potentially affecting conclusion generalizability
  2. Limited ASR evaluation: Only one general-purpose ASR engine employed, which may not reflect the behavior of dedicated automotive ASR systems
  3. Restricted parameter space: The filter parameter combinations cover common cases, but the grid is coarse (second-order filters only, a single 20 dB peak magnitude)
  4. Lack of subjective assessment: Only objective metrics employed, lacking verification through subjective evaluation by real users
  5. Simplified environmental factors: Does not consider effects of temperature, humidity, and other environmental factors on microphone performance

Impact

  1. Academic contribution: Provides important foundational research data and methodological framework for automotive audio field
  2. Industrial application: Directly guides automotive OEMs' microphone selection strategies with significant commercial value
  3. Standard development: Provides experimental evidence for revision and refinement of relevant international standards
  4. Technology advancement: Promotes optimization of automotive audio technology and ASR technology in in-vehicle environments

Applicable Scenarios

  1. Automotive OEMs: Microphone specification development and supplier selection
  2. Microphone manufacturers: Product design optimization and performance verification
  3. ASR service providers: In-vehicle ASR system optimization and robustness enhancement
  4. Standards development organizations: Reference for development and revision of relevant standards
  5. Academic research: Foundation for subsequent research in automotive audio and speech processing fields

References

This research cites several international standards and prior research works, including ITU-T P.501, ETSI TS 103 281, and ITU-T P.1100, as well as earlier work by Du et al. on automotive microphone performance assessment. These references provide the theoretical foundation and methodological guidance for this research.