Effects of automotive microphone frequency response characteristics and noise conditions on speech and ASR quality -- an experimental evaluation

Buccoli, Du, Soendergaard et al.

When choosing microphones for automotive hands-free communication or Automatic Speech Recognition (ASR) applications, OEMs typically specify wideband, super-wideband, or even fullband requirements following established standard recommendations (e.g., ITU-T P.1110, ITU-T P.1120). In practice, it is often challenging to achieve the preferred bandwidth for an automotive microphone given the constraints on microphone placement inside the cabin and the automotive-grade environmental robustness requirements. At the same time, there seems to be no consensus or sufficient data on the effect of each microphone characteristic on actual performance. To address this gap, we used noise signals recorded in real vehicles under various driving conditions to experimentally study the relationship between the microphones' characteristics and both the final audio quality of speech communication and the performance of ASR engines. We focus on how variations in microphone bandwidth and amplitude frequency response shape affect the perceptual speech quality. The speech quality results are compared using ETSI TS 103 281 metrics (S-MOS, N-MOS, G-MOS) and ancillary metrics such as SNR. The ASR results are evaluated with standard metrics such as Word Error Rate (WER). The findings help clarify which microphone frequency response characteristics matter most for audio quality and inform the choice of microphone specifications, particularly for automotive applications.

Basic Information

Paper ID: 2510.09236
Title: Effects of automotive microphone frequency response characteristics and noise conditions on speech and ASR quality -- an experimental evaluation
Authors: Michele Buccoli, Yu Du, Jacob Soendergaard, Simone Shawn Cazzaniga
Categories: eess.AS (Electrical Engineering and Systems Science - Audio and Speech Processing), cs.SD (Computer Science - Sound)
Publication date/venue: AES 159th Convention, Oct 23-25, Long Beach, CA, USA (Express Paper)
Paper link: https://arxiv.org/abs/2510.09236

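Abstract

This study addresses a key question in selecting microphones for automotive hands-free communication and automatic speech recognition (ASR): how a microphone's frequency response characteristics relate to speech quality and ASR performance. The question is investigated experimentally. Using noise signals recorded in real vehicles, the study evaluates how variations in microphone bandwidth and magnitude frequency response shape affect perceived speech quality. Speech quality is assessed with the S-MOS, N-MOS, and G-MOS metrics of ETSI TS 103 281 plus ancillary metrics such as SNR, and ASR performance is assessed by word error rate (WER). The results clarify how microphone frequency response characteristics affect audio quality and offer guidance for choosing microphone specifications in automotive applications.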

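Research Background and Motivation

Problem Definition

When selecting microphones for hands-free communication or ASR applications, automotive OEMs typically require wideband, super-wideband, or even fullband specifications, following standard recommendations such as ITU-T P.1110 and ITU-T P.1120. In practice, however, constraints on microphone placement inside the cabin and automotive-grade environmental robustness requirements make the preferred bandwidth hard to achieve.

Research Significance

Lack of consensus: The industry lacks both consensus and sufficient data on how individual microphone characteristics affect actual performance
Practical constraints: Microphone placement inside the cabin is restricted, and environmental requirements are stringent
Performance optimization: It is necessary to understand which microphone characteristics matter most for audio quality and ASR performance

Limitations of Existing Research

Existing related studies are mostly based on specific types of automotive microphones, so the space they explore is confined to the inherent characteristics of those microphones. They therefore do not show general trends in how changes in microphone characteristics affect speech and ASR quality.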

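Core Contributions

Systematic evaluation framework: Builds an experimental platform for evaluating how microphone frequency response characteristics affect speech quality and ASR performance
Comprehensive characteristic analysis: Systematically studies how microphone bandwidth, frequency response peaks, and related characteristics affect performance
Multi-dimensional evaluation: Evaluates both speech quality for human-to-human (H2H) communication and ASR performance for human-to-machine (H2M) interaction
Real-environment validation: Uses noise recordings made in real vehicles
Standardized evaluation metrics: Uses the MOS scores defined by ETSI standards and standard ASR evaluation metrics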

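Method Details

Task Definition

Study how microphone frequency response characteristics (bandwidth, peak frequency, quality factor) affect speech quality (S-MOS, N-MOS) and ASR performance (WER) across vehicle types and noise conditions.

Experimental Design

Signal Generation Model

The simulated recorded signal is generated as:

x(n) = f(s(n) ⋆ h(n) + v(n))

where:

s(n): clean speech signal from ITU-T P.501
h(n): vehicle impulse response
v(n): real vehicle background noise
f(·): cascade of digital filters simulating the microphone's spectral characteristics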
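
The signal model above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the authors' code: the names `convolve` and `simulate_recording` are hypothetical, signals are plain lists of floats, and the microphone model f is passed in as any callable (identity by default). Truncating the reverberant speech to the length of s and assuming the noise is already level-aligned are simplifications made here.

```python
from typing import Callable, List, Sequence

Signal = List[float]


def convolve(s: Sequence[float], h: Sequence[float]) -> Signal:
    """Full linear convolution of s and h (length len(s) + len(h) - 1)."""
    out = [0.0] * (len(s) + len(h) - 1)
    for i, s_i in enumerate(s):
        for j, h_j in enumerate(h):
            out[i + j] += s_i * h_j
    return out


def simulate_recording(
    s: Sequence[float],
    h: Sequence[float],
    v: Sequence[float],
    f: Callable[[Signal], Signal] = lambda x: list(x),
) -> Signal:
    """x(n) = f(s(n) convolved with h(n), plus v(n)), truncated to len(s).

    Assumption: the noise v is at least as long as s and already level-aligned.
    """
    if len(v) < len(s):
        raise ValueError("noise must be at least as long as the speech signal")
    reverberant = convolve(s, h)[: len(s)]
    noisy = [r + n for r, n in zip(reverberant, v)]
    return f(noisy)


# Toy example: a unit impulse through a two-tap "cabin" response plus constant noise.
x = simulate_recording(s=[1.0, 0.0, 0.0], h=[1.0, 0.5], v=[0.1, 0.1, 0.1])
print(x)
```

Passing f as a callable keeps the acoustic path (speech, cabin response, noise) separate from the microphone model, so the same noisy mixture can be reused across many simulated microphones.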

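Microphone Characteristic Simulation

Microphone characteristics are simulated with a cascade of second-order filters designed via the bilinear transform:

Bandwidth definition:
- High-pass filter (HP2): 20, 100, 350 Hz
- Low-pass filter (LP2): 4, 8, 12, 16, 20 kHz
- Q factor: 0.707
Resonance peak simulation:
- Peaking filter (PK2): 4, 6, 8, 13, 16 kHz
- Fixed gain: 20 dB
- Q factor: 1.414, 2, 4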
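
As a rough illustration of such a cascade, the sketch below builds second-order high-pass, low-pass, and peaking sections and evaluates their combined magnitude response. It uses the widely known Audio EQ Cookbook biquad formulas, which are derived with the bilinear transform; whether the authors used these exact formulas is an assumption, as are the 48 kHz sample rate and all function names.

```python
import cmath
import math
from typing import List, Sequence, Tuple

Biquad = Tuple[float, float, float, float, float]  # b0, b1, b2, a1, a2 (a0 = 1)

FS = 48_000.0  # assumed sample rate in Hz; not stated in the summary


def _normalize(b0, b1, b2, a0, a1, a2) -> Biquad:
    return (b0 / a0, b1 / a0, b2 / a0, a1 / a0, a2 / a0)


def highpass(f0: float, q: float = 0.707, fs: float = FS) -> Biquad:
    w0 = 2 * math.pi * f0 / fs
    c, alpha = math.cos(w0), math.sin(w0) / (2 * q)
    return _normalize((1 + c) / 2, -(1 + c), (1 + c) / 2, 1 + alpha, -2 * c, 1 - alpha)


def lowpass(f0: float, q: float = 0.707, fs: float = FS) -> Biquad:
    w0 = 2 * math.pi * f0 / fs
    c, alpha = math.cos(w0), math.sin(w0) / (2 * q)
    return _normalize((1 - c) / 2, 1 - c, (1 - c) / 2, 1 + alpha, -2 * c, 1 - alpha)


def peaking(f0: float, q: float, gain_db: float = 20.0, fs: float = FS) -> Biquad:
    w0 = 2 * math.pi * f0 / fs
    c, alpha = math.cos(w0), math.sin(w0) / (2 * q)
    a = 10 ** (gain_db / 40)  # square root of the linear peak gain
    return _normalize(1 + alpha * a, -2 * c, 1 - alpha * a,
                      1 + alpha / a, -2 * c, 1 - alpha / a)


def gain_db(cascade: Sequence[Biquad], f: float, fs: float = FS) -> float:
    """Magnitude response of the cascade at frequency f (0 < f < fs/2), in dB."""
    z1 = cmath.exp(-2j * math.pi * f / fs)  # z^-1 on the unit circle
    h = 1.0 + 0j
    for b0, b1, b2, a1, a2 in cascade:
        h *= (b0 + b1 * z1 + b2 * z1 * z1) / (1 + a1 * z1 + a2 * z1 * z1)
    return 20 * math.log10(abs(h))


def run_cascade(cascade: Sequence[Biquad], x: Sequence[float]) -> List[float]:
    """Filter x through each biquad in turn (direct form I)."""
    y = list(x)
    for b0, b1, b2, a1, a2 in cascade:
        x1 = x2 = y1 = y2 = 0.0
        out = []
        for xn in y:
            yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
            x2, x1 = x1, xn
            y2, y1 = y1, yn
            out.append(yn)
        y = out
    return y


# One configuration from the grid: 100 Hz high-pass, 8 kHz low-pass, 4 kHz peak with Q = 2.
mic = [highpass(100), lowpass(8_000), peaking(4_000, q=2)]
for f in (50, 1_000, 4_000, 12_000):
    print(f"{f:>6} Hz: {gain_db(mic, f):6.1f} dB")
```

With Q = 0.707 each band-edge filter is about 3 dB down at its cutoff, and the peaking section reaches its full 20 dB gain at its center frequency, which matches the parameters listed above.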

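Experimental Conditions

Vehicle types: midsize sedan, compact SUV, small SUV
Noise conditions: idle (low fan), city (60 km/h, medium fan), highway (120 km/h, low fan)
Microphone configurations: 113 practical configurations selected from 225 possible combinations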
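
The figure of 225 combinations is consistent with a full grid over the filter parameters listed earlier (3 high-pass cutoffs, 5 low-pass cutoffs, 5 peak frequencies, 3 peak Q values). The sketch below enumerates that grid; reading the count this way is an interpretation rather than something the summary states, the dictionary keys are hypothetical, and the rule used to pick the 113 practical configurations is not given here, so it is not reproduced.

```python
from itertools import product

HP_HZ = (20, 100, 350)
LP_HZ = (4_000, 8_000, 12_000, 16_000, 20_000)
PK_HZ = (4_000, 6_000, 8_000, 13_000, 16_000)
PK_Q = (1.414, 2.0, 4.0)

# Every combination of band edges and resonance peak (fixed 20 dB peak gain).
grid = [
    {"hp_hz": hp, "lp_hz": lp, "pk_hz": pk, "pk_q": q}
    for hp, lp, pk, q in product(HP_HZ, LP_HZ, PK_HZ, PK_Q)
]
print(len(grid))  # 3 * 5 * 5 * 3 = 225

# Each microphone configuration is evaluated per vehicle and noise condition.
VEHICLES = ("midsize sedan", "compact SUV", "small SUV")
NOISE = ("idle", "city 60 km/h", "highway 120 km/h")
scenarios = list(product(VEHICLES, NOISE))
print(len(scenarios))  # 3 * 3 = 9
```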

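Technical Innovations

Systematic parametric study: The first systematic parametric study of how microphone characteristics affect performance in automotive applications
Real-environment data: Uses impulse responses and noise recorded in real vehicles
Dual evaluation: Evaluates both speech quality and ASR performance, giving a fuller picture of performance
Standardized methods: Follows ITU and ETSI standards strictly for evaluation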

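Experimental Setup

Dataset

Speech stimuli: 20 American English Harvard sentences specified in ETSI TS 103 281 Annex E
Speakers: multiple male and female speakers
Total duration: 80 seconds (4 seconds per sentence, including 1 second of leading silence and 1 second of trailing silence)
Vehicle impulse responses: recorded at the driver's position using a HATS (head and torso simulator)
Background noise: recorded following the guidelines in Annex D of ITU-T P.1100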

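Evaluation Metrics

Speech quality metrics:
- S-MOS: quality of the speech component (scale of 1 to 5)
- N-MOS: intrusiveness of the noise component (scale of 1 to 5)
- G-MOS: overall quality impression
- Listening effort metric (ETSI TS 103 558)
- A-weighted SNR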
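
To make the A-weighted SNR entry concrete, the sketch below applies the standard analytic A-weighting curve (IEC 61672) to per-band speech and noise powers before taking their ratio. How the paper actually computes this metric (band resolution, signal segmentation, level alignment) is not described in this summary, so the function `a_weighted_snr_db` and its band-power inputs are assumptions for illustration only.

```python
import math
from typing import Sequence


def a_weighting_db(f: float) -> float:
    """A-weighting in dB at frequency f > 0 Hz (IEC 61672 analytic form)."""
    f2 = f * f
    ra = (12194.0**2 * f2 * f2) / (
        (f2 + 20.6**2)
        * math.sqrt((f2 + 107.7**2) * (f2 + 737.9**2))
        * (f2 + 12194.0**2)
    )
    return 20 * math.log10(ra) + 2.00


def a_weighted_snr_db(
    freqs: Sequence[float],
    speech_power: Sequence[float],
    noise_power: Sequence[float],
) -> float:
    """SNR in dB from per-band powers, with each band A-weighted first."""
    weights = [10 ** (a_weighting_db(f) / 10) for f in freqs]
    s = sum(w * p for w, p in zip(weights, speech_power))
    n = sum(w * p for w, p in zip(weights, noise_power))
    return 10 * math.log10(s / n)


# Toy example: equal unweighted power, noise at 100 Hz and speech at 1 kHz.
freqs = [100.0, 1000.0]
print(round(a_weighted_snr_db(freqs, [0.0, 1.0], [1.0, 0.0]), 1))
```

In this toy case the unweighted SNR is 0 dB, but the A-weighted SNR is roughly 19 dB because A-weighting strongly attenuates the 100 Hz band. This is one reason low-frequency road and engine noise can weigh less in an A-weighted figure than in an unweighted one.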
ASR performance metrics:
- Word error rate (WER)
- Evaluated with the Whisper tiny model
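
For reference, WER is the word-level edit distance (substitutions, deletions, insertions) between the ASR transcript and the reference text, divided by the number of reference words. The sketch below is a minimal standard-library implementation; the function name is hypothetical, and the text normalization (lowercasing and whitespace splitting only) is an assumption that may differ from the paper's scoring pipeline.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level Levenshtein distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference prefix
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# A Harvard sentence with one word dropped by a hypothetical recognizer.
ref = "the birch canoe slid on the smooth planks"
hyp = "the birch canoe slid on smooth planks"
print(word_error_rate(ref, hyp))  # 1 deletion / 8 reference words = 0.125
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many inserted words.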