S4-260132 - AI Summary

[FS_ULBC] Discussion of FS_ULBC Objective Speech Quality Assessment Method

Background

This contribution addresses speech quality assessment challenges for ultra-low bitrate codecs (ULBC). While subjective testing remains the benchmark for ULBC codec selection, objective speech evaluation methods can serve as predictive tools during intermediate testing and parameter adjustment processes, enabling more convenient and efficient quality verification.

Overview of Existing Speech Objective Quality Evaluation Methods

The document provides a comprehensive comparison of available objective assessment tools:

Standardized ITU-T Methods

- P.863 (POLQA): Full-reference method, widely adopted in ITU/3GPP; supports NB/WB/SWB and maintains performance below 4 kbps in SWB mode
- P.563: No-reference method suitable for real-time applications, but less accurate than full-reference methods under extreme noise or complex distortions

Open Source Methods

- ViSQOL: Full-reference method; performs well at low bitrates (good MOS correlation below 8 kbps), but is not formally standardized
- STOI/ESTOI: Full-reference methods focused on speech intelligibility; computationally efficient, with high correlation to subjective tests in noisy conditions. ESTOI is more robust to nonlinear distortions (e.g., those introduced by neural codecs)
- SCOREQ: No-reference model with strong cross-domain robustness and improved correlation with human judgments

Capabilities and Limitations for ULBC

The document analyzes each method's suitability for ultra-low bitrate scenarios:

- P.863: Most widely adopted, broad bandwidth support, proven performance at low bitrates
- P.563: Limited adaptability to non-linear distortions from neural codecs
- ViSQOL: Good consistency with MOS at low bitrates but lacks formal standardization
- STOI/ESTOI: Effective for intelligibility assessment, robust to nonlinear distortions, but not ITU-T/3GPP standardized
- SCOREQ: Addresses domain-generalization shortcomings with improved out-of-domain robustness
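
Statements such as "good consistency with MOS" are commonly backed by comparing a method's predicted scores with subjective MOS over a set of test conditions. The following is a minimal sketch of that check using Pearson correlation and RMSE. The function names and all score values are hypothetical illustrations, not data from the contribution, and a real evaluation would typically add steps omitted here (for example per-condition averaging or a mapping of objective scores onto the MOS scale).

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def rmse(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Root-mean-square error between predicted and subjective scores."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two equal-length, non-empty sequences")
    return sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


if __name__ == "__main__":
    # Hypothetical per-condition scores, invented for illustration only.
    subjective_mos = [1.8, 2.4, 3.1, 3.6, 4.2]
    objective_pred = [2.0, 2.3, 3.0, 3.9, 4.1]
    print(f"Pearson r: {pearson(objective_pred, subjective_mos):.3f}")
    print(f"RMSE:      {rmse(objective_pred, subjective_mos):.3f}")
```

A high correlation together with a low RMSE on conditions representative of ULBC would support using a method as a predictive tool; neither figure replaces the subjective test itself.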

Proposal

Recommended Objective Assessment Methods

After excluding unsuitable methods, the contribution recommends considering P.863, ViSQOL, and ESTOI as potential objective quality assessment methods for ULBC.
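
The summary does not state the exclusion criteria. One observation, offered here as an assumption rather than as the contribution's stated reasoning, is that the recommended methods coincide with the full-reference entries in the comparison above. The sketch below encodes the comparison as a small table and applies that filter; the `Method` class and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Method:
    name: str
    full_reference: bool    # needs the clean reference signal
    itu_standardized: bool  # listed under "Standardized ITU-T Methods"


# Attributes transcribed from the comparison in this summary.
METHODS = [
    Method("P.863", full_reference=True, itu_standardized=True),
    Method("P.563", full_reference=False, itu_standardized=True),
    Method("ViSQOL", full_reference=True, itu_standardized=False),
    Method("STOI/ESTOI", full_reference=True, itu_standardized=False),
    Method("SCOREQ", full_reference=False, itu_standardized=False),
]


def full_reference_methods(methods: List[Method]) -> List[str]:
    """Keep only methods that compare the decoded signal against a reference.

    Assumption: this filter is just one criterion that happens to reproduce
    the recommended set; the contribution may have used other reasoning.
    """
    return [m.name for m in methods if m.full_reference]


if __name__ == "__main__":
    print(full_reference_methods(METHODS))
```

The filter returns P.863, ViSQOL, and the STOI/ESTOI entry, of which the contribution names ESTOI specifically.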

Text Proposal for TR 26.940

The document proposes a pCR to TR 26.940 Section 9 (Test methodologies) that includes:

New Section 9.1.1: Typical Quality Impairments

Identifies ULBC-specific impairment categories:
- Loss of listening-only audio quality
- Audio bandwidth loss
- Impaired intelligibility
- Impaired speaker identifiability
- Prosodic impairments
- Hallucination (word and phone confusions)
- Sensitivity to non-speech input (background noise, music, reverberant speech)

New Section 9.1.2: Challenges of Quality Assessment

Addresses testing challenges specific to ULBC:

- Traditional 3GPP Practice: Testing of AMR, AMR-WB, and EVS used P.800 ACR for clean speech and DCR for noisy/mixed content, but did not focus on intelligibility, speaker identifiability, or prosodic impairments
- ULBC-Specific Challenges: ML-based codecs introduce new impairment types (e.g., hallucination) that require alternative test methods
- Additional Test Methodologies (non-exhaustive list):
  - Diagnostic Rhyme Tests (DRT)
  - Modified Rhyme Tests (MRT)
  - MOS testing for speaker similarity
  - Speaker verification/identification tests
  - Prosodic naturalness MOS tests
  - Intonation recognition tests
  - Transcription tests for word/semantic equivalence
  - Phoneme recognition tests
  - Automatic speech recognition tests
- Objective Methods as Optional Tools: Proposes documenting that objective methods (P.863, ViSQOL, ESTOI, etc.) can be considered as optional tools for predicting speech quality during ULBC simulation testing and parameter optimization. Subjective listening remains the most important evaluation method, even though it is time- and resource-intensive
- Speech Enhancement Evaluation: Notes that P.835 multi-dimensional rating scales can be used for speech enhancement tools that may be part of ULBC
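
Transcription and automatic speech recognition tests are commonly scored with word error rate (WER), which also gives a rough handle on word-level hallucination. The sketch below is a minimal WER computation based on word-level edit distance. It assumes the reference and decoded-speech transcripts already exist as text (from human transcribers or an ASR system, neither of which is shown), and the function name and example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Uses a word-level Levenshtein distance with lowercasing and whitespace
    tokenization only; real test plans may normalize text further.
    """
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref_words)


if __name__ == "__main__":
    # Hypothetical transcripts of the original and the decoded speech.
    original = "send the report today"
    decoded = "send the record today"
    print(f"WER: {word_error_rate(original, decoded):.2f}")
```

WER counts word confusions but cannot tell whether the meaning survived, so it complements rather than replaces the semantic-equivalence and phoneme-level tests listed above.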

Technical Contribution

The main technical contribution is establishing a framework for objective quality assessment in ULBC standardization that:
1. Recognizes the unique challenges of ML-based codecs
2. Identifies suitable objective methods as predictive tools
3. Proposes their documentation as optional assessment methods in TR 26.940
4. Maintains subjective testing as the primary benchmark while enabling more efficient intermediate evaluation

Document Information

TDoc: S4-260132

Source: China Mobile Com. Corporation

Type: pCR

Agenda item: 7.8

Agenda item description: FS_ULBC (Study on Ultra Low Bitrate Speech Codec)

Release: Rel-20

Specification: 26.940

Version: 0.4.0

Related WIs: FS_ULBC

Contact: Fei Gao

Uploaded: 2026-02-03T09:17:16.873000

Contact ID: 90252

TDoc Status: noted

Is revision of: S4-251814

Reservation date: 03/02/2026 09:04:41

Agenda item sort order: 20