# Technical Summary: Audio Bandwidth Requirements for ULBC

## 1. Introduction and Scope

This contribution addresses audio bandwidth design constraints for the Ultra-Low Bitrate Codec (ULBC), targeting primarily voice over geostationary (GEO) satellite links. It argues against a Wideband-only (WB-only) design and against mandatory Super-Wideband (SWB) support, proposing instead that Narrowband (NB) be supported natively, with WB as an enhanced-quality mode and SWB/Fullband (FB) left for further study.

## 2. Key Technical Arguments

### 2.1 Global NB Usage and System Efficiency

**Current Network Reality:**
- 2G/3G connections (primarily AMR-NB) still account for **20% of the global technology mix** (end of 2023)
- Regional variations: 81% in Sub-Saharan Africa, 46% in Middle East and North Africa
- NB serves as universal fallback for interoperability (CS fallback scenarios)

**System Inefficiency Without NB Mode:**
- In a call between a WB ULBC user and an NB user, the upper frequency band (4-8 kHz) is transmitted but cannot be delivered to the NB endpoint
- A significant share of the bitrate is spent on content the recipient cannot hear
- Over an expensive, scarce satellite link, this inefficiency is unacceptable
- A native NB mode is the most efficient solution for connectivity to legacy networks

### 2.2 User Expectations in "Last Resort" Scenarios

**Baseline Expectation Setting:**
- A GEO call is the final option after terrestrial network (TN) failure
- Users typically experience AMR-NB fallback before resorting to GEO
- ULBC must be at least as reliable as NB fallback to meet user expectations
- If a WB-only ULBC fails in conditions where an NB mode would still work, users perceive it as a service failure

### 2.3 Primary Use Case: Emergency Communications

**Typical Deployment Scenario:**
- Rescue teams in remote areas (e.g., Himalayan mountains)
- Mixed-connectivity environment:
  - Squad A: GEO-only (outside TN coverage)
  - Squad B: GSM fallback at coverage fringe
  - Base Camp: PSTN connection (NB service)

**Technical Implications:**
- Terminating endpoints predominantly NB
- Emergency systems use traditional NB codecs (Codec2, MELP) for robustness
- Transmitting WB over satellite to NB endpoint wastes critical resources in life-or-death situations
- The contribution cites a real-world deployment example (rescue missions in China)

**Evaluation Priority:**
- ULBC candidates should prioritize intelligibility and robustness testing in NB mode

### 2.4 Performance at Very Low Bitrates

**Quality vs. Bandwidth Trade-off:**
- Forcing wider bandwidth at very low bitrates spreads available data too thinly
- Research shows lower sampling rates can achieve higher perceptual quality at very low bitrates
- WB codec at ~1 kbps may compromise intelligibility, especially with packet loss
- NB signal more robustly reconstructed under constrained conditions

**Analogy:** "Spreading butter" - concentrating bits on narrower bandwidth preserves speech richness and intelligibility
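
The analogy can be put in rough numbers. The sketch below is an illustration only: it assumes the available bits are spread uniformly across the audio bandwidth, which real codecs do not do, and the helper name `bits_per_khz` is hypothetical. The band edges are the NB and WB ranges proposed in this document (50-4000 Hz and 50-8000 Hz), and 1 kbps is the example bitrate mentioned above.

```python
# Rough "bit density" comparison, assuming a uniform spread of bits over the band.
# This is a simplification for illustration, not a model of any real codec.

def bits_per_khz(bitrate_bps: float, low_hz: float, high_hz: float) -> float:
    """Average bits per second available per kHz of audio bandwidth."""
    if high_hz <= low_hz:
        raise ValueError("high_hz must be greater than low_hz")
    return bitrate_bps / ((high_hz - low_hz) / 1000.0)

BITRATE_BPS = 1000.0  # ~1 kbps operating point

nb_density = bits_per_khz(BITRATE_BPS, 50.0, 4000.0)   # NB: 50-4000 Hz
wb_density = bits_per_khz(BITRATE_BPS, 50.0, 8000.0)   # WB: 50-8000 Hz

print(f"NB: {nb_density:.1f} bit/s per kHz")
print(f"WB: {wb_density:.1f} bit/s per kHz")
print(f"NB/WB density ratio: {nb_density / wb_density:.2f}")
```

Under this simplification, NB has roughly twice the bits per kHz that WB has at the same bitrate, which is the intuition behind the analogy.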

### 2.5 Complexity and Power Consumption

**Computational Scaling Issues:**
- AI-based codec architectures do not scale gracefully with sampling rate
- Doubling sampling rate (NB to WB): **2x to 4x complexity increase** for CNN/Transformer models
- WB-only mandate imposes unnecessary computational burden
- Critical issue for power-constrained mobile devices
- Native NB mode offers high-quality voice at significantly lower complexity/power budget
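
One plausible reading of the 2x to 4x range is that layers with cost linear in sequence length (convolutions, feed-forward blocks) double in cost when the sampling rate doubles, while self-attention, with cost quadratic in sequence length, quadruples. The sketch below encodes that reading. It assumes the architecture and hop size are unchanged, so the number of samples or frames per second scales with the sampling rate; `relative_cost` and `linear_share` are hypothetical names, and the example shares are not measurements.

```python
# Back-of-the-envelope compute scaling when the sampling rate changes.
# Assumption: linear-cost layers scale with rate_ratio, self-attention with rate_ratio**2.

def relative_cost(rate_ratio: float, linear_share: float) -> float:
    """Compute cost relative to the baseline model.

    rate_ratio:   new sampling rate / baseline sampling rate (2.0 for NB -> WB)
    linear_share: fraction of baseline compute spent in linear-cost layers;
                  the remainder is assumed to be quadratic-cost attention
    """
    if not 0.0 <= linear_share <= 1.0:
        raise ValueError("linear_share must be in [0, 1]")
    linear_part = linear_share * rate_ratio
    quadratic_part = (1.0 - linear_share) * rate_ratio ** 2
    return linear_part + quadratic_part

# Illustrative shares only: fully convolutional, mixed, attention-dominated.
for share in (1.0, 0.5, 0.0):
    print(f"linear share {share:.1f}: {relative_cost(2.0, share):.1f}x")
```

A fully convolutional model sits at the lower end of the range and an attention-dominated model at the upper end; mixed architectures fall in between.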

## 3. Experimental Analysis: Higher Bandwidth Inefficiency

### 3.1 Experiment Setup

**Test Configuration:**
- Codec: Descript Audio Codec (DAC) with pre-trained models
- Sampling rates tested: 44.1 kHz, 24 kHz (SWB), 16 kHz (WB)
- Test corpus: 100 clean speech samples from MS-SNSD dataset
- Bitrate variation: 1-9 active quantization codebooks
- Quality metric: ViSQOL algorithm (speech mode, MOS estimate)

**Model Specifications:**

| Model | Compression (strides) | Frame Rate | Codebooks (bits each) | Bitrate per Codebook |
|-------|-------------|------------|-----------|------------------|
| 16 kHz (WB) | 320x [2,4,5,8] | 50 Hz | 12 (10-bit) | 0.50 kbps |
| 24 kHz (SWB) | 320x [2,4,5,8] | 75 Hz | 32 (10-bit) | 0.75 kbps |
| 44.1 kHz | 512x [2,4,8,8] | ~86.1 Hz | 9 (10-bit) | ~0.86 kbps |
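
The frame rate and per-codebook bitrate columns follow from the sampling rate, the stride product, and the 10-bit codebook size. The sketch below reproduces that arithmetic from the table values; the names `MODELS`, `frame_rate_hz`, and `bitrate_kbps` are hypothetical helpers and are not part of the DAC code base.

```python
from math import prod

# (sampling rate in Hz, encoder strides, bits per codebook), copied from the table.
MODELS = {
    "16 kHz (WB)": (16_000, (2, 4, 5, 8), 10),
    "24 kHz (SWB)": (24_000, (2, 4, 5, 8), 10),
    "44.1 kHz": (44_100, (2, 4, 8, 8), 10),
}

def frame_rate_hz(sample_rate_hz: int, strides: tuple) -> float:
    """Latent frames per second: sampling rate divided by the total compression factor."""
    return sample_rate_hz / prod(strides)

def bitrate_kbps(sample_rate_hz: int, strides: tuple, bits: int, active_codebooks: int = 1) -> float:
    """Bitrate when `active_codebooks` codebooks of `bits` bits are sent per frame."""
    return frame_rate_hz(sample_rate_hz, strides) * bits * active_codebooks / 1000.0

for name, (sr, strides, bits) in MODELS.items():
    print(
        f"{name}: {prod(strides)}x compression, "
        f"{frame_rate_hz(sr, strides):.1f} Hz, "
        f"{bitrate_kbps(sr, strides, bits):.2f} kbps per codebook"
    )
```

With these helpers, the swept operating points are multiples of the per-codebook rate; for instance, five active codebooks in the 16 kHz model give `bitrate_kbps(16_000, (2, 4, 5, 8), 10, 5)`, i.e. 2.5 kbps.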

### 3.2 Key Experimental Findings

**Quality vs. Bitrate Results:**
- **WB (16 kHz):** Achieves excellent quality (ViSQOL MOS > 4.0) at ~2.5 kbps
- **24 kHz SWB:** Requires higher bitrate to match WB quality
- **44.1 kHz:** Provides minimal perceptible improvement over 24 kHz SWB
- **Conclusion:** Bitrate cost of SWB not justified by quality improvement for voice content

**Efficiency Analysis:**
- Clear trend: diminishing quality returns for audio bandwidth beyond WB
- SWB and FB are an inefficient use of the available bitrate for the ULBC service

## 4. Proposed Design Constraints

### 4.1 Bandwidth Requirements

**Mandatory Support:**
1. **8 kHz sampling rate (NB):** 50-4000 Hz audio bandwidth
2. **16 kHz sampling rate (WB):** 50-8000 Hz audio bandwidth
   - Enhanced quality where channel conditions and device capabilities permit
   - WB support can be limited to higher bitrates than NB operation
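
The two constraints above, together with the NB-endpoint argument in Section 2.1, suggest a simple mode-selection rule. The sketch below is one possible reading and not part of the proposal: the function `select_bandwidth`, its parameters, and in particular the WB bitrate threshold are hypothetical, and the threshold value used in the example call is an arbitrary placeholder.

```python
from enum import Enum

class Bandwidth(Enum):
    """Sampling rates of the two proposed modes, in Hz."""
    NB = 8_000
    WB = 16_000

def select_bandwidth(
    peer_supports_wb: bool,
    device_supports_wb: bool,
    bitrate_kbps: float,
    wb_min_bitrate_kbps: float,
) -> Bandwidth:
    """Pick WB only when both ends can use it and the link bitrate is high enough.

    wb_min_bitrate_kbps is a hypothetical threshold; the source only states that
    WB support can be limited to higher bitrates than NB operation.
    """
    if peer_supports_wb and device_supports_wb and bitrate_kbps >= wb_min_bitrate_kbps:
        return Bandwidth.WB
    return Bandwidth.NB

# Example: the far end is an NB (e.g. PSTN) endpoint, so NB is chosen
# regardless of the bitrate. The 2.0 kbps threshold is a placeholder only.
mode = select_bandwidth(
    peer_supports_wb=False,
    device_supports_wb=True,
    bitrate_kbps=3.0,
    wb_min_bitrate_kbps=2.0,
)
print(mode.name, mode.value)
```

Falling back to NB by default reflects the document's position that NB is the universal interoperability baseline.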

**Further Study:**
- The necessity and feasibility of SWB and FB support remain for further study (FFS)

### 4.2 Text Proposal for TR 26.940

**Change to Table 6.2-1 (Design Constraint Parameters):**

**Sample rate and audio bandwidth:**
- The ultra low bitrate codec shall support sampling rates of 8 kHz (NB) and 16 kHz (WB)
- Supported audio bandwidth:
  - NB: 50-4000 Hz
  - WB: 50-8000 Hz

## 5. Supporting Evidence Summary

**Quantitative Data:**
- 2G/3G connections: 20% of the global technology mix at end of 2023 (hundreds of millions of users)
- Regional 2G/3G share: up to 81% (Sub-Saharan Africa)
- WB (DAC 16 kHz model) achieves ViSQOL MOS > 4.0 at ~2.5 kbps
- 2x-4x complexity increase for WB vs. NB in AI codecs

**Qualitative Arguments:**
- System efficiency (no wasted bandwidth to NB endpoints)
- User expectation alignment (last resort reliability)
- Emergency use case requirements
- Computational/power constraints for mobile devices
- Diminishing returns for SWB/FB at target bitrates