# Summary of S4-260165: ULBC Complexity and RTF Analysis

## Background and Motivation

This contribution addresses the need to finalize complexity and memory design constraints for the ULBC (Ultra-Low Bitrate Codec) study. Previous discussions at SA4 #133-e and the ULBC ADHOC meeting explored various complexity metrics and RTF (real-time factor) performance data for existing AI codecs (DAC, Lyra v2, HIL), but the available data is still insufficient to draw definitive conclusions on complexity constraints for ULBC.

The document builds upon previous contribution S4-251844 with the following modifications:
- Added CPU core information for experiments
- Aligned RTF definition with TR 26.940 clause 7.5.3
- Focused on model sizes 3-20M parameters (more relevant to ULBC use cases)
- Provided pCR for TR 26.940
- Removed large chunk-based processing experiments (not relevant for real-time voice communication)

## Experimental Setup and Methodology

### Model Configuration
DAC architecture modified to reduce the parameter count while keeping its general structure:
- Model sizes: 20M, 15M, 9M, and 3M parameters (float32 precision)
- Training: Optimized for ~1 kbps bitrate at 32 kHz sampling rate
- Encoder rates: 4,4,8,10 for all models
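
The configuration above can be turned into a few derived quantities. The sketch below assumes that the "encoder rates" are the per-stage downsampling strides of the encoder, as in the public DAC code base; the summary does not state this explicitly. Under that assumption, the product of the rates gives the hop size in samples, and the nominal ~1 kbps target gives a rough bit budget per latent frame. All constant names are illustrative.

```python
from math import prod

SAMPLE_RATE_HZ = 32_000        # sampling rate stated in the summary
ENCODER_RATES = (4, 4, 8, 10)  # assumed to be downsampling strides
TARGET_BITRATE_BPS = 1_000     # nominal "~1 kbps" target

# Total downsampling factor: input samples consumed per latent frame.
hop_samples = prod(ENCODER_RATES)

# Duration of one latent frame and the resulting latent frame rate.
hop_ms = 1000.0 * hop_samples / SAMPLE_RATE_HZ
latent_rate_hz = SAMPLE_RATE_HZ / hop_samples

# Rough bit budget per latent frame at the nominal bitrate.
bits_per_latent_frame = TARGET_BITRATE_BPS / latent_rate_hz

print(f"hop: {hop_samples} samples ({hop_ms:.1f} ms)")
print(f"latent frame rate: {latent_rate_hz:.1f} Hz")
print(f"bit budget: {bits_per_latent_frame:.1f} bits per latent frame")
```

If the rates are not strides, these derived figures do not apply.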

### Complexity Analysis
**Theoretical Complexity (GMACS):**
- Computed using ptflops library
- Results show an approximately linear relationship between model size and GMACS (roughly 0.26 GMACS per million parameters):
  - 20M: 5.14 GMACS
  - 15M: 4.03 GMACS
  - 9M: 2.39 GMACS
  - 3M: 0.79 GMACS
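
The linearity claim can be checked directly from the four reported figures. The sketch below computes the GMACS-per-million-parameters ratio for each model and a least-squares slope for a line through the origin. The through-origin fit is a choice made for this illustration, not a method described in the contribution, and the function names are hypothetical.

```python
# GMACS figures reported above, keyed by model size in millions of parameters.
gmacs_by_size = {20: 5.14, 15: 4.03, 9: 2.39, 3: 0.79}


def gmacs_per_million_params(table):
    """Return the GMACS / (million parameters) ratio for each model size."""
    return {size: gmacs / size for size, gmacs in table.items()}


def slope_through_origin(table):
    """Least-squares slope of gmacs = slope * size (no intercept term)."""
    numerator = sum(size * gmacs for size, gmacs in table.items())
    denominator = sum(size * size for size in table)
    return numerator / denominator


ratios = gmacs_per_million_params(gmacs_by_size)
slope = slope_through_origin(gmacs_by_size)

for size, ratio in ratios.items():
    print(f"{size}M: {ratio:.3f} GMACS per million parameters")
print(f"fitted slope: {slope:.3f} GMACS per million parameters")
```

The per-model ratios stay within a narrow band, which is what "approximately linear" means here.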

### RTF Testing Methodology
- PyTorch models converted to ONNX format
- ONNX runtime with XNNPACK execution provider
- Frame-by-frame processing (80 ms frames)
- Test duration: 2 minutes (1500 inferences per session)
- 5 repetitions per experiment
- Single-threaded execution
- RTF calculation: max(inference time / frame length) across all frames
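
A minimal sketch of the RTF calculation described above, using only the Python standard library. The `process_frame` callable stands in for one codec inference on an 80 ms frame. It is a placeholder, not the ONNX Runtime call used in the contribution, and the helper names are hypothetical. How the five repetitions are aggregated is not stated in the summary, so the sketch covers a single session only.

```python
import time

FRAME_MS = 80.0          # frame length used in the experiments
TEST_DURATION_S = 120.0  # 2-minute session


def frames_per_session(duration_s=TEST_DURATION_S, frame_ms=FRAME_MS):
    """Number of frame-by-frame inferences in one session."""
    return round(duration_s * 1000.0 / frame_ms)


def time_frames(process_frame, frames):
    """Return the wall-clock inference time of each frame in milliseconds."""
    times_ms = []
    for frame in frames:
        start = time.perf_counter()
        process_frame(frame)
        times_ms.append((time.perf_counter() - start) * 1000.0)
    return times_ms


def max_rtf(inference_times_ms, frame_ms=FRAME_MS):
    """Worst-case RTF: the maximum of inference time / frame length."""
    return max(t / frame_ms for t in inference_times_ms)


print(frames_per_session())                      # inferences per session
print(f"{max_rtf([20.0, 31.2, 50.4]):.2f}")      # RTF for made-up timings
```

Taking the maximum rather than the mean makes the metric sensitive to single slow frames, for example those caused by core or frequency switching.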

## Experimental Results

### Test Devices
**Device 1 (2023):**
- Hexa-core CPU: 2×3.46 GHz (P core) + 4×2.02 GHz (E core)
- Dynamic core switching observed between P and E cores

**Device 2 (2022):**
- Octa-core CPU: 1×3.00 GHz Cortex-X2 + 3×2.50 GHz Cortex-A710 + 4×1.80 GHz Cortex-A510
- Processing on Cortex-X2 with frequency switching between 2.4 GHz and 1.8 GHz

### RTF Performance Results

| Model Size | Max RTF (High Performance) | Max RTF (Power Efficient) |
|------------|---------------------------|---------------------------|
| 20M | 0.39-0.63 | 0.81-0.9 |
| 15M | 0.29-0.43 | 0.66-0.74 |
| 9M | 0.19-0.29 | 0.44-0.57 |
| 3M | 0.09-0.13 | 0.18-0.31 |

Max RTF increases roughly linearly with model size in both performance modes.

## Key Observations

1. All tested models achieve RTF < 1.0, indicating real-time capability
2. Significant RTF variation between high-performance and power-efficient modes
3. Dynamic CPU core/frequency switching impacts performance
4. 20M model shows max RTF=0.63 (high performance) and RTF=0.9 (power efficient)
5. Smaller models (3M-9M) provide substantial RTF headroom for real-time operation
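
The headroom mentioned in the observations can be expressed as spare time per frame. The sketch below takes the upper end of each max-RTF range from the results table and the 80 ms frame length, and computes how many milliseconds remain before a frame would miss its real-time deadline. The "slack" framing and the names used are illustrative and do not come from the contribution.

```python
FRAME_MS = 80.0  # frame length used in the experiments

# Upper end of the reported max-RTF ranges:
# (high performance, power efficient).
worst_case_rtf = {
    "20M": (0.63, 0.90),
    "15M": (0.43, 0.74),
    "9M": (0.29, 0.57),
    "3M": (0.13, 0.31),
}


def slack_ms(rtf, frame_ms=FRAME_MS):
    """Time left in a frame period after inference, in milliseconds."""
    return (1.0 - rtf) * frame_ms


for model, (high_perf, power_eff) in worst_case_rtf.items():
    print(
        f"{model}: {slack_ms(high_perf):.1f} ms slack (high performance), "
        f"{slack_ms(power_eff):.1f} ms slack (power efficient)"
    )
```

This counts inference time only. Any other per-frame processing on the device would reduce the slack further.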

## Proposed Text for TR 26.940

The contribution includes a pCR that adds a new clause 6.2.1.7, "RTF and MACS analysis for AI based codecs", containing the detailed experimental results. Key additions to TR 26.940 include:

### Complexity Considerations (Clause 6.2.1)
- Real-time processing requirements for voice communication
- Model size considerations (5-10M parameters for efficient operation)
- Memory access and power consumption challenges with larger models

### Complexity Metrics (Clause 6.2.1.4)
- Discussion of NPU/TPU capabilities measured in TOPS
- TOPS/W as power efficiency metric (2-15 TOPS/W range for smartphones)
- MAC operations and MACS as practical complexity metrics
- RTF as reliable complexity assessment metric
- Comparison with traditional WMOPS metric

### Target Devices (Clause 6.2.1.5)
- NPUs present in most modern smartphones
- Theoretical max TOPS: 8-59 TOPS (varying precision)
- TOPS/W range: 2-15 TOPS/W
- DAC codec estimate: ~150 Giga MAC/sec (~0.3 TOPS)
- Note on DRAM operations significantly impacting power consumption
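
The DAC estimate above equates ~150 Giga MAC/sec with ~0.3 TOPS. That matches the common convention of counting one multiply-accumulate as two operations (one multiply and one add). The sketch below makes the conversion explicit. The factor of two is an assumption inferred from the quoted numbers, not a definition taken from the contribution.

```python
OPS_PER_MAC = 2  # assumption: one multiply plus one add per MAC


def gmacs_per_second_to_tops(gmacs_per_s, ops_per_mac=OPS_PER_MAC):
    """Convert a MAC rate in Giga MAC/sec to tera-operations per second."""
    ops_per_s = gmacs_per_s * 1e9 * ops_per_mac
    return ops_per_s / 1e12


dac_tops = gmacs_per_second_to_tops(150.0)
print(f"DAC estimate: {dac_tops:.2f} TOPS")
```

Peak TOPS figures quoted for NPUs depend on numeric precision, so comparing this value against them is only a rough guide.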

### Key Conclusions (Clause 6.2.1.6)
- ML codecs require careful model size and complexity optimization
- NPUs offer 5-20× power efficiency vs CPUs for AI tasks
- ULBC complexity constraints should not reference existing 3GPP speech codecs
- Million MACS together with model size provide a first-order complexity indication
- RTF useful but requires standardized test platforms
- WMOPS not directly suitable for NPU-based AI solutions

### Experimental Data (Clause 6.2.1.7)
- Complete documentation of DAC-like architecture experiments
- Detailed RTF and GMACS results for 3M-20M parameter models
- Device specifications and performance characteristics

## Proposal

Document the experimental methodology, results, and observations in clause 6.2.1 of TR 26.940 as shown in the provided pCR.