[FS_ULBC] ULBC Codec Testing in Background Noise
This contribution proposes a testing framework for the Ultra-Low Bitrate Codec (ULBC) in noisy conditions, drawing from EVS codec testing methodologies. The document is a revision of S4-251786 from SA4#134 and proposes updates to TR 26.940 Clause 9.
The document argues against mandating noise suppression (NS) algorithms within the codec specification, based on four considerations:
Device-Specific Optimization: NS algorithms are typically optimized for specific device microphone array configurations. A generic NS algorithm applied uniformly could result in suboptimal performance across different device types.
Codec Robustness vs. NS Artifacts: Testing ULBC with clean, noisy, and optionally NS-processed speech provides a better understanding of the codec's inherent robustness. NS algorithms may introduce speech distortions that could bias codec testing results.
Emergency Call Requirements: For emergency calls, preserving background noise is critical as it may contain important contextual information (alarms, traffic, voices) that helps identify the caller's environment or ongoing danger.
Complexity and Latency Concerns: ML-based NS algorithms can be computationally complex, increasing power consumption and end-to-end latency. Mandating a complex NS algorithm could burden some devices unnecessarily.
The document advocates for flexibility in NS implementation to enable manufacturers to develop device-specific solutions.
Following EVS codec testing principles (TR 26.952), the proposal includes the following test conditions:
| Source Material | Noise Type | SNR | Test Methodology |
|----------------|------------|-----|------------------|
| Clean speech | - | - | ITU-T P.800 ACR and/or DCR |
| Speech + Noise | Stationary (car, etc.) | 15 dB | ITU-T P.800 DCR |
| Speech + Noise | Non-stationary (street, babble, etc.) | 20-25 dB | ITU-T P.800 DCR |
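The SNR column above implies that each noise recording is scaled relative to the speech before mixing. A minimal sketch of that step follows, assuming SNR is defined as the ratio of mean signal powers over the whole file. This is a simplification: formal test preparation would normally rely on an active speech level measurement and the designated processing tools, which the contribution does not specify. The function names `mean_power`, `mix_at_snr`, and `measured_snr_db` are hypothetical.

```python
import math


def mean_power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_snr(speech, noise, target_snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals target_snr_db,
    then add it to `speech`. Returns (mixed, scaled_noise).

    Assumption: power is averaged over the full signal, not over
    active-speech segments only.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must both have non-zero power")
    # Target noise power is p_speech / 10^(SNR/10); gain is an amplitude factor.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (target_snr_db / 10)))
    scaled_noise = [gain * n for n in noise]
    mixed = [s + n for s, n in zip(speech, scaled_noise)]
    return mixed, scaled_noise


def measured_snr_db(speech, noise):
    """SNR in dB computed from mean powers."""
    return 10 * math.log10(mean_power(speech) / mean_power(noise))


# Synthetic stand-ins for real recordings (illustrative only).
speech = [math.sin(0.1 * i) for i in range(1000)]
noise = [((i * 7919) % 13 - 6) / 6 for i in range(1200)]

mixed_15, noise_15 = mix_at_snr(speech, noise, 15)  # e.g. the car noise condition
mixed_20, noise_20 = mix_at_snr(speech, noise, 20)  # e.g. a street or babble condition
```

Keeping the scaled noise alongside the mixture makes it easy to check afterwards that each condition actually hits its nominal SNR.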
This framework aligns with EVS testing, which used:
- Car noise at 15 dB
- Street noise at 20 dB
- Office/babble noise at 20 dB
- ITU-T P.800 DCR methodology ("Degradation of Speech in Noise" DMOS test)
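For readers less familiar with DCR scoring: listeners rate the degradation of each processed sample against its reference on a five-point scale (5 = inaudible, 1 = very annoying), and the DMOS for a condition is the mean of those votes. The sketch below illustrates the arithmetic only. The 95% confidence interval uses a normal approximation, which is an assumption made here rather than something the contribution prescribes, and the function name `dmos_with_ci` and the vote data are hypothetical.

```python
import math
import statistics


def dmos_with_ci(votes):
    """Return (dmos, ci_half_width) for a list of DCR votes.

    Votes must be integers from 1 to 5. The half-width uses the normal
    approximation 1.96 * s / sqrt(n) (an assumption for illustration);
    at least two votes are required to estimate the spread.
    """
    if len(votes) < 2:
        raise ValueError("need at least two votes")
    for v in votes:
        if v not in (1, 2, 3, 4, 5):
            raise ValueError(f"invalid DCR vote: {v!r}")
    mean = statistics.fmean(votes)
    spread = statistics.stdev(votes)  # sample standard deviation
    half_width = 1.96 * spread / math.sqrt(len(votes))
    return mean, half_width


# Hypothetical votes for one condition (not real test data).
example_votes = [5, 4, 4, 3, 5, 4]
score, half_width = dmos_with_ci(example_votes)
print(f"DMOS = {score:.2f} +/- {half_width:.2f}")
```

In practice, comparisons between codec conditions would be made with the statistical procedure defined in the agreed test plan rather than with this simplified interval.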
To characterize ULBC robustness in challenging low-SNR conditions, the proposal adds:
| Source Material | Noise Type | SNR | Test Methodology |
|----------------|------------|-----|------------------|
| Speech + Noise | Stationary (car, etc.) | 5-10 dB | ITU-T P.800 DCR |
| Speech + Noise | Non-stationary (street, babble, etc.) | 10-15 dB | ITU-T P.800 DCR |
| NS-processed speech + Noise | Stationary (car, etc.) | 5-10 dB | ITU-T P.800 DCR |
| NS-processed speech + Noise | Non-stationary (street, babble, etc.) | 10-15 dB | ITU-T P.800 DCR |
Key Notes:
- To avoid bias, a common NS processing tool should be used to generate the NS-processed speech
- Selection of the specific noise types and of the NS processing tool is for further study (FFS)
- Reference is made to TR 26.989 v19.0.0 (MCPTT work), in which EVS was evaluated in siren noise at 5 dB SNR
The document proposes adding a new Clause 9.1.4, with two subclauses, to TR 26.940.
The document seeks Discussion and Agreement on:
1. The proposed testing framework for ULBC in noisy conditions
2. Updates to TR 26.940 Clause 9 as specified in the text proposal