# Summary of S4-260132: Discussion of FS_ULBC Objective Speech Quality Assessment Method

## Background

This contribution addresses speech quality assessment challenges for ultra-low bitrate codecs (ULBC). While subjective testing remains the benchmark for ULBC codec selection, objective speech evaluation methods can serve as predictive tools during intermediate testing and parameter adjustment processes, enabling more convenient and efficient quality verification.

## Overview of Existing Speech Objective Quality Evaluation Methods

The document provides a comprehensive comparison of available objective assessment tools:

### Standardized ITU-T Methods

- **P.863 (POLQA)**: Full-reference method, widely adopted in ITU/3GPP, supports NB/WB/SWB, and is described as maintaining performance below 4 kbps in SWB mode
- **P.563**: No-reference method suitable for real-time applications, but less accurate than full-reference methods under extreme noise or complex distortions

### Open Source Methods

- **ViSQOL**: Full-reference, correlates well with MOS at low bitrates (under 8 kbps), but is not formally standardized
- **STOI/ESTOI**: Full-reference, focuses on speech intelligibility, computationally efficient, with high correlation to subjective tests in noisy conditions. ESTOI improves robustness to nonlinear distortions (e.g., those introduced by neural codecs)
- **SCOREQ**: No-reference model with strong cross-domain robustness and improved correlation with human judgments
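
Several entries above are characterized by how well they correlate with subjective MOS. As a minimal sketch of what that comparison involves, the snippet below computes a Pearson correlation coefficient between per-condition subjective MOS and the scores from an objective tool. The function name `pearson` and the score lists are assumptions made for illustration; the numbers are placeholders, not results from the contribution or from any listening test.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values for illustration only; NOT measurements from any test.
subjective_mos = [1.8, 2.4, 3.1, 3.6, 4.2]   # one value per test condition
objective_score = [1.5, 2.6, 2.9, 3.8, 4.0]  # hypothetical tool output

r = pearson(subjective_mos, objective_score)
print(f"Pearson r = {r:.3f}")
```

In practice such a comparison would typically be made per condition rather than per file, and a rank correlation may be reported alongside it; both choices are outside what this summary specifies.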

## Capabilities and Limitations for ULBC

The document analyzes each method's suitability for ultra-low bitrate scenarios:

- **P.863**: Most widely adopted, broad bandwidth support, proven performance at low bitrates
- **P.563**: Limited adaptability to non-linear distortions from neural codecs
- **ViSQOL**: Good consistency with MOS at low bitrates but lacks formal standardization
- **STOI/ESTOI**: Effective for intelligibility assessment, robust to nonlinear distortions, but not ITU-T/3GPP standardized
- **SCOREQ**: Addresses domain-generalization shortcomings with improved out-of-domain robustness
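
The properties listed in this summary can be tabulated in a small data structure, which makes it easier to filter methods by a given property. The sketch below is only an illustration: the `Method` class, its field names, and the `full_reference` filter are hypothetical, the attribute values reflect how the methods are described in this summary, and the filter is not presented as the selection criterion used by the contribution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    name: str
    needs_reference: bool      # True: full-reference, False: no-reference
    itu_t_standardized: bool   # True if listed under standardized ITU-T methods


# Attribute values follow the descriptions in this summary.
METHODS = [
    Method("P.863", needs_reference=True, itu_t_standardized=True),
    Method("P.563", needs_reference=False, itu_t_standardized=True),
    Method("ViSQOL", needs_reference=True, itu_t_standardized=False),
    Method("STOI/ESTOI", needs_reference=True, itu_t_standardized=False),
    Method("SCOREQ", needs_reference=False, itu_t_standardized=False),
]


def full_reference(methods):
    """Return the names of methods that require a clean reference signal."""
    return [m.name for m in methods if m.needs_reference]


print(full_reference(METHODS))
```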

## Proposal

### Recommended Objective Assessment Methods

After excluding unsuitable methods, the contribution recommends considering **P.863, ViSQOL, and ESTOI** as potential objective quality assessment methods for ULBC.

### Text Proposal for TR 26.940

The document proposes a pCR to TR 26.940 Section 9 (Test methodologies) that includes:

#### New Section 9.1.1: Typical Quality Impairments

Identifies ULBC-specific impairment categories:
- Loss of listening-only audio quality
- Audio bandwidth loss
- Impaired intelligibility
- Impaired speaker identifiability
- Prosodic impairments
- Hallucination (word and phone confusions)
- Sensitivity to non-speech input (background noise, music, reverberant speech)

#### New Section 9.1.2: Challenges of Quality Assessment

Addresses testing challenges specific to ULBC:

- **Traditional 3GPP Practice**: Testing of AMR, AMR-WB, and EVS used P.800 ACR for clean speech and DCR for noisy/mixed content, but did not focus on intelligibility, speaker identifiability, or prosodic impairments

- **ULBC-Specific Challenges**: ML-based codecs introduce new impairment types (e.g., hallucination) requiring alternative test methods

- **Additional Test Methodologies** (non-exhaustive list):
  - Diagnostic Rhyme Tests (DRT)
  - Modified Rhyme Tests (MRT)
  - MOS testing for speaker similarity
  - Speaker verification/identification tests
  - Prosodic naturalness MOS tests
  - Intonation recognition tests
  - Transcription tests for word/semantic equivalence
  - Phoneme recognition tests
  - Automatic speech recognition tests

- **Objective Methods as Optional Tools**: Proposes documenting that objective methods (P.863, ViSQOL, ESTOI, etc.) can be considered **optional** tools for predicting speech quality during ULBC simulation testing and parameter optimization. Subjective listening remains the most important evaluation method, even though it is time- and resource-intensive

- **Speech Enhancement Evaluation**: Notes that P.835 multi-dimensional rating scales can be used for speech enhancement tools that may be part of ULBC
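
The transcription and automatic speech recognition tests listed above are commonly scored with a word error rate (WER), which counts the word substitutions, deletions, and insertions needed to turn the transcript of the decoded speech into the reference transcript. The following is a minimal sketch of that metric using a standard edit-distance recurrence. The function name `word_error_rate` and the example sentences are assumptions made for illustration; the contribution does not prescribe a scoring formula.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences only: one word confusion out of six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sat on the hat")
print(f"WER = {wer:.3f}")
```

A word confusion of the kind described under hallucination shows up here as a substitution. WER alone does not capture semantic equivalence, so it would complement rather than replace the listening-based tests in the list.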

## Technical Contribution

The main technical contribution is establishing a framework for objective quality assessment in ULBC standardization that:
1. Recognizes the unique challenges of ML-based codecs
2. Identifies suitable objective methods as predictive tools
3. Proposes their documentation as optional assessment methods in TR 26.940
4. Maintains subjective testing as the primary benchmark while enabling more efficient intermediate evaluation