[FS_ULBC] Discussion on Audio Bandwidth for ULBC
This contribution addresses audio bandwidth design constraints for the Ultra-Low Bitrate Codec (ULBC), which primarily targets voice over GEO satellite communications. The document argues against a WB-only mandate and against mandatory Super-Wideband (SWB) support, proposing instead that Narrowband (NB) be the mandatory baseline, with Wideband (WB) supported as an enhancement where conditions permit.
Current Network Reality:
- 2G/3G connections (primarily AMR-NB) still represent 20% of global technology mix (end of 2023)
- Regional variations: 81% in Sub-Saharan Africa, 46% in Middle East and North Africa
- NB serves as universal fallback for interoperability (CS fallback scenarios)
System Inefficiency Without NB Mode:
- Calls from a WB ULBC user to an NB user waste the upper frequency band (4-8 kHz)
- Significant bitrate is spent transmitting content that the recipient cannot hear
- Over an expensive, scarce satellite link, this inefficiency is unacceptable
- Native NB mode provides most efficient solution for legacy network connectivity
Baseline Expectation Setting:
- GEO call is final option after terrestrial network failure
- Users typically experience AMR-NB fallback before resorting to GEO
- ULBC must be at least as reliable as NB fallback to meet user expectations
- WB-only ULBC failure in conditions where NB would work represents service failure
Typical Deployment Scenario:
- Rescue teams in remote areas (e.g., Himalayan mountains)
- Mixed-connectivity environment:
  - Squad A: GEO-only (outside TN coverage)
  - Squad B: GSM fallback at coverage fringe
  - Base Camp: PSTN connection (NB service)
Technical Implications:
- Terminating endpoints predominantly NB
- Emergency systems use traditional NB codecs (Codec2, MELP) for robustness
- Transmitting WB over satellite to NB endpoint wastes critical resources in life-or-death situations
- Real-world deployment example provided (China rescue missions)
Evaluation Priority:
- ULBC candidates should prioritize intelligibility and robustness testing in NB mode
Quality vs. Bandwidth Trade-off:
- Forcing wider bandwidth at very low bitrates spreads available data too thinly
- Research shows lower sampling rates can achieve higher perceptual quality at very low bitrates
- WB codec at ~1 kbps may compromise intelligibility, especially with packet loss
- NB signal more robustly reconstructed under constrained conditions
Analogy: "Spreading butter" - concentrating bits on narrower bandwidth preserves speech richness and intelligibility
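The "spreading butter" argument can be made concrete with a rough calculation. The sketch below divides a fixed bitrate evenly over the NB and WB audio bands listed under Mandatory Support. The 1 kbps figure and the uniform-allocation model are illustrative assumptions: real codecs, and neural codecs in particular, do not distribute bits uniformly over frequency, so this is only a crude proxy for bit density.

```python
# Audio bands in Hz, as listed under Mandatory Support in this document.
BANDS_HZ = {"NB": (50, 4000), "WB": (50, 8000)}


def bits_per_hz(bitrate_bps: float, band_hz: tuple) -> float:
    """Bitrate divided by audio bandwidth (assumes a uniform spread of bits)."""
    low_hz, high_hz = band_hz
    return bitrate_bps / (high_hz - low_hz)


if __name__ == "__main__":
    bitrate_bps = 1000  # illustrative ~1 kbps operating point
    for name, band in BANDS_HZ.items():
        density = bits_per_hz(bitrate_bps, band)
        print(f"{name}: {density:.3f} bit/s per Hz of audio bandwidth")
```

Under this simplification, the same bitrate gives NB roughly twice the bit density of WB.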
Computational Scaling Issues:
- AI-based codec architectures don't scale gracefully
- Doubling sampling rate (NB to WB): 2x to 4x complexity increase for CNN/Transformer models
- WB-only mandate imposes unnecessary computational burden
- Critical issue for power-constrained mobile devices
- Native NB mode offers high-quality voice at significantly lower complexity/power budget
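One plausible reading of the 2x to 4x range is sketched below. It assumes, and this is not stated in the contribution, that the model processes a sequence whose length grows in proportion to the sampling rate. Convolutional layers would then cost roughly linearly more and full self-attention roughly quadratically more. The exponents are textbook scaling orders, not measurements of any ULBC candidate.

```python
def relative_cost(rate_ratio: float, exponent: float) -> float:
    """Cost multiplier when sequence length grows by rate_ratio.

    exponent = 1 models layers whose cost is linear in sequence length
    (e.g. convolutions); exponent = 2 models full self-attention.
    """
    return rate_ratio ** exponent


if __name__ == "__main__":
    nb_rate_hz, wb_rate_hz = 8_000, 16_000
    ratio = wb_rate_hz / nb_rate_hz  # doubling the sampling rate

    print(f"linear (conv-like) cost:         x{relative_cost(ratio, 1):.0f}")
    print(f"quadratic (attention-like) cost: x{relative_cost(ratio, 2):.0f}")
```

A codec that keeps its frame rate fixed across sampling rates would scale differently, so the range is indicative only.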
Test Configuration:
- Codec: Descript Audio Codec (DAC) with pre-trained models
- Sampling rates tested: 44.1 kHz, 24 kHz (SWB), 16 kHz (WB)
- Test corpus: 100 clean speech samples from MS-SNSD dataset
- Bitrate variation: 1-9 active quantization codebooks
- Quality metric: ViSQOL algorithm (speech mode, MOS estimate)
Model Specifications:
| Model (sampling rate) | Compression factor [encoder strides] | Frame rate | Codebooks (bits per code) | Bitrate per codebook |
|-------|-------------|------------|-----------|------------------|
| 16 kHz (WB) | 320x [2,4,5,8] | 50 Hz | 12 (10-bit) | 0.50 kbps |
| 24 kHz (SWB) | 320x [2,4,5,8] | 75 Hz | 32 (10-bit) | 0.75 kbps |
| 44.1 kHz | 512x [2,4,8,8] | ~86.1 Hz | 9 (10-bit) | ~0.86 kbps |
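The frame rate and per-codebook bitrate columns follow directly from the sampling rate, the stride product, and the 10-bit codebook size. The short check below reproduces them from the table values. The function names are illustrative and not part of any codec API.

```python
from math import prod

# Sampling rates and encoder strides copied from the table above.
MODELS = {
    "16 kHz (WB)": {"sample_rate_hz": 16_000, "strides": (2, 4, 5, 8)},
    "24 kHz (SWB)": {"sample_rate_hz": 24_000, "strides": (2, 4, 5, 8)},
    "44.1 kHz": {"sample_rate_hz": 44_100, "strides": (2, 4, 8, 8)},
}
BITS_PER_CODE = 10  # each codebook index is 10 bits


def frame_rate_hz(sample_rate_hz: int, strides: tuple) -> float:
    """Latent frames per second = sampling rate / total downsampling factor."""
    return sample_rate_hz / prod(strides)


def kbps_per_codebook(sample_rate_hz: int, strides: tuple,
                      bits: int = BITS_PER_CODE) -> float:
    """Each active codebook emits one `bits`-bit index per latent frame."""
    return frame_rate_hz(sample_rate_hz, strides) * bits / 1000.0


if __name__ == "__main__":
    for name, m in MODELS.items():
        fr = frame_rate_hz(m["sample_rate_hz"], m["strides"])
        br = kbps_per_codebook(m["sample_rate_hz"], m["strides"])
        print(f"{name}: {prod(m['strides'])}x, {fr:.1f} Hz, {br:.2f} kbps/codebook")
```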
Quality vs. Bitrate Results:
- WB (16 kHz): Achieves excellent quality (ViSQOL MOS > 4.0) at ~2.5 kbps
- 24 kHz SWB: Requires higher bitrate to match WB quality
- 44.1 kHz: Provides minimal perceptible improvement over 24 kHz SWB
- Conclusion: Bitrate cost of SWB not justified by quality improvement for voice content
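To relate these results to the test configuration, the sketch below lists the bitrates reachable with 1 to 9 active codebooks for each model, using the per-codebook bitrates from the Model Specifications table. It only reconstructs the bitrate axis of the comparison and contains no quality scores. The helper names are illustrative.

```python
import math

# Per-codebook bitrates (kbps) from the Model Specifications table.
KBPS_PER_CODEBOOK = {"16 kHz (WB)": 0.50, "24 kHz (SWB)": 0.75, "44.1 kHz": 0.86}


def bitrate_ladder(kbps_per_codebook: float, max_codebooks: int = 9) -> list:
    """Total bitrate (kbps) for 1..max_codebooks active codebooks."""
    return [round(n * kbps_per_codebook, 2) for n in range(1, max_codebooks + 1)]


def codebooks_for(target_kbps: float, kbps_per_codebook: float) -> int:
    """Smallest number of active codebooks whose bitrate reaches the target."""
    return math.ceil(target_kbps / kbps_per_codebook - 1e-9)


if __name__ == "__main__":
    for name, step in KBPS_PER_CODEBOOK.items():
        print(f"{name}: {bitrate_ladder(step)}")
    n = codebooks_for(2.5, KBPS_PER_CODEBOOK["16 kHz (WB)"])
    print(f"~2.5 kbps at 16 kHz corresponds to {n} active codebooks")
```

At the same number of active codebooks, the 24 kHz and 44.1 kHz models spend 1.5x and about 1.7x the bitrate of the 16 kHz model, which is the cost the conclusion above weighs against the quality gain.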
Efficiency Analysis:
- Clear trend: diminishing returns for bandwidth beyond WB
- SWB and Fullband (FB) represent an inefficient use of bandwidth for the ULBC service
Mandatory Support:
1. 8 kHz sampling rate (NB): 50-4000 Hz audio bandwidth
2. 16 kHz sampling rate (WB): 50-8000 Hz audio bandwidth
   - Enhanced quality where channel conditions and device capabilities permit
   - WB support can be limited to higher bitrates than NB operation
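A minimal sketch of how a sender might choose between the two modes is given below, assuming it knows whether the far end can render WB and what bitrate the link offers. The `select_mode` function, the `Mode` type, and the `wb_min_kbps` threshold are hypothetical. The contribution does not define a selection procedure or a threshold value, and the 2.5 kbps used in the example is only the operating point quoted for the DAC experiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    name: str
    sample_rate_hz: int
    band_hz: tuple


# The two proposed modes, with the audio bandwidths listed above.
NB = Mode("NB", 8_000, (50, 4000))
WB = Mode("WB", 16_000, (50, 8000))


def select_mode(far_end_supports_wb: bool, available_kbps: float,
                wb_min_kbps: float) -> Mode:
    """Pick WB only when the far end can use it and the link bitrate allows it.

    wb_min_kbps is a hypothetical threshold reflecting the idea that WB may
    be limited to higher bitrates than NB; it is not a value from the proposal.
    """
    if far_end_supports_wb and available_kbps >= wb_min_kbps:
        return WB
    return NB


if __name__ == "__main__":
    # NB endpoint (e.g. PSTN): stay in NB regardless of available bitrate.
    print(select_mode(False, 5.0, wb_min_kbps=2.5).name)
    # WB-capable endpoint with enough bitrate: use WB.
    print(select_mode(True, 3.0, wb_min_kbps=2.5).name)
```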
Further Study:
- The necessity and feasibility of SWB and FB support remain for further study (FFS)
Change to Table 6.2-1 (Design Constraint Parameters):
Sample rate and audio bandwidth:
- The ultra low bitrate codec shall support sampling rates of 8 kHz (NB) and 16 kHz (WB)
- Supported audio bandwidth:
  - NB: 50-4000 Hz
  - WB: 50-8000 Hz
Quantitative Data:
- 2G/3G connections: 20% of the global technology mix at end of 2023 (hundreds of millions of users)
- Regional 2G/3G share: up to 81% (Sub-Saharan Africa)
- WB (16 kHz) achieves ViSQOL MOS > 4.0 at ~2.5 kbps
- 2x-4x complexity increase for WB vs. NB in AI codecs
Qualitative Arguments:
- System efficiency (no wasted bandwidth to NB endpoints)
- User expectation alignment (last resort reliability)
- Emergency use case requirements
- Computational/power constraints for mobile devices
- Diminishing returns for SWB/FB at target bitrates