[FS_ULBC] Discussion of FS_ULBC Objective Speech Quality Assessment Method
This contribution addresses speech quality assessment for ultra-low bitrate codecs (ULBC). Subjective testing remains the benchmark for ULBC selection, but objective speech evaluation methods can serve as predictive tools during intermediate testing and parameter adjustment, making quality verification more convenient and efficient.
The document compares the available objective assessment tools and analyzes each method's suitability for ultra-low bitrate scenarios.
After excluding unsuitable methods, the contribution recommends considering P.863, ViSQOL, and ESTOI as potential objective quality assessment methods for ULBC.
The document proposes a pCR to TR 26.940 Section 9 (Test methodologies) that does the following:
Identifies ULBC-specific impairment categories:
- Loss of listening-only audio quality
- Audio bandwidth loss
- Impaired intelligibility
- Impaired speaker identifiability
- Prosodic impairments
- Hallucination (word and phone confusions)
- Sensitivity to non-speech input (background noise, music, reverberant speech)
Addresses testing challenges specific to ULBC:
Traditional 3GPP Practice: Testing of AMR, AMR-WB, and EVS used P.800 ACR for clean speech and DCR for noisy or mixed content, but did not focus on intelligibility, speaker identifiability, or prosodic impairments
ULBC-Specific Challenges: ML-based codecs introduce new impairment types (e.g., hallucination) requiring alternative test methods
Additional Test Methodologies (non-exhaustive list):
Automatic speech recognition tests
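One common way to score an automatic speech recognition test is the word error rate (WER) between a reference transcript and the ASR transcript of the decoded speech; a rising WER can point to impaired intelligibility or hallucinated words. The sketch below is an illustration only and is not specified by the contribution: the function name `word_error_rate` and the sample transcripts are hypothetical, and the ASR system that produces the hypothesis text is assumed to exist outside this snippet.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Row 0: distance between an empty reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        # curr[j] will hold the distance between ref[:i] and hyp[:j].
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: the second one stands for ASR output on decoded speech.
reference = "please call stella and ask her to bring these things"
asr_of_decoded = "please call stella and ask her to bring this thing"
wer = word_error_rate(reference, asr_of_decoded)
print(f"WER = {wer:.2f}")
```

Comparing the WER of the decoded speech against the WER of the uncoded input, using the same ASR system, separates codec-induced errors from errors the recognizer would make anyway.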
Objective Methods as Optional Tools: Proposes documenting that objective methods (P.863, ViSQOL, ESTOI, etc.) can be considered as optional tools for predicting speech quality during ULBC simulation testing and parameter optimization, while acknowledging that subjective listening remains the most important evaluation method despite being time- and resource-intensive
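As a rough illustration of how an objective method could support parameter optimization, the sketch below ranks candidate codec configurations by an objective score. Everything named here is an assumption: `rank_configurations`, the configuration labels, and the toy signals are hypothetical, and `snr_db` is only a stand-in metric so that the example runs without external dependencies. In practice the metric callable would wrap an implementation of P.863, ViSQOL, or ESTOI, and any ranking would still need to be confirmed by subjective listening.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Signal = Sequence[float]
Metric = Callable[[Signal, Signal], float]


def snr_db(reference: Signal, degraded: Signal) -> float:
    """Stand-in metric (NOT P.863, ViSQOL, or ESTOI): plain SNR in dB."""
    if len(reference) != len(degraded):
        raise ValueError("signals must have the same length")
    signal_power = sum(r * r for r in reference)
    noise_power = sum((r - d) ** 2 for r, d in zip(reference, degraded))
    if noise_power == 0.0:
        return math.inf
    return 10.0 * math.log10(signal_power / noise_power)


def rank_configurations(
    reference: Signal,
    decoded_by_config: Dict[str, Signal],
    metric: Metric,
) -> List[Tuple[str, float]]:
    """Score each decoded signal against the reference, best score first."""
    scores = [
        (name, metric(reference, decoded))
        for name, decoded in decoded_by_config.items()
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy data: a sine wave and two artificially attenuated copies that stand in
# for the output of two hypothetical codec configurations.
reference = [math.sin(2.0 * math.pi * 5.0 * n / 100.0) for n in range(100)]
decoded_by_config = {
    "config_a": [0.5 * r for r in reference],
    "config_b": [0.9 * r for r in reference],
}

ranking = rank_configurations(reference, decoded_by_config, snr_db)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Passing the metric as a callable keeps the sweep independent of any particular tool, so the same loop can be reused when a different objective method is substituted.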
Speech Enhancement Evaluation: Notes that P.835 multi-dimensional rating scales can be used for speech enhancement tools that may be part of ULBC
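P.835 asks listeners to rate the speech signal, the background noise, and the overall quality on separate scales. A minimal sketch of aggregating such votes per scale follows. The ratings, the dictionary keys `SIG`, `BAK`, and `OVRL`, and the helper `mos_with_ci` are illustrative assumptions, and the interval uses a simple normal approximation rather than any procedure prescribed by the contribution.

```python
import math
import statistics
from typing import Dict, List, Tuple

# Hypothetical listener votes on 1-5 scales for one test condition.
ratings: Dict[str, List[int]] = {
    "SIG": [4, 4, 3, 5, 4, 4],   # speech signal
    "BAK": [3, 4, 3, 3, 4, 3],   # background noise
    "OVRL": [3, 4, 3, 4, 4, 3],  # overall quality
}


def mos_with_ci(votes: List[int]) -> Tuple[float, float]:
    """Return (mean score, half-width of an approximate 95% interval)."""
    if len(votes) < 2:
        raise ValueError("need at least two votes to estimate the spread")
    mean = statistics.mean(votes)
    # Normal approximation: 1.96 standard errors of the mean.
    half_width = 1.96 * statistics.stdev(votes) / math.sqrt(len(votes))
    return float(mean), half_width


summary = {scale: mos_with_ci(votes) for scale, votes in ratings.items()}
for scale, (mean, half_width) in summary.items():
    print(f"{scale}: {mean:.2f} +/- {half_width:.2f}")
```

Reporting the three scales side by side shows whether an enhancement stage improves the background at the cost of the speech signal, which a single overall score can hide.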
The main technical contribution is establishing a framework for objective quality assessment in ULBC standardization that:
1. Recognizes the unique challenges of ML-based codecs
2. Identifies suitable objective methods as predictive tools
3. Proposes their documentation as optional assessment methods in TR 26.940
4. Maintains subjective testing as the primary benchmark while enabling more efficient intermediate evaluation