# Analysis of AI Codec Real-Time Performance (RTF) and Complexity Scaling

## 1. Introduction and Motivation

This contribution addresses a gap in the Ultra Low Bitrate Speech Codec (ULBC) study by moving beyond theoretical complexity metrics (FLOPs, WMOPS) to evaluate measured performance on mobile devices. The key observation is that static metrics fail to capture system-level bottlenecks, including memory bandwidth pressure and thermal constraints on mobile SoCs. The document presents a Real-Time Factor (RTF) analysis of a neural audio codec (based on the Descript Audio Codec architecture) across multiple model sizes and sample rates on representative mid-range mobile hardware. Throughout, a model is treated as real-time capable while its RTF stays below 1.0.

## 2. Experimental Setup

### 2.1 Model Configuration

Eight model variants were evaluated, ranging from **enc8dec144** to **enc64dec1536**, with parameter counts spanning **1M to 74M**:

- Architecture: Fully convolutional encoder-decoder with Residual Vector Quantization (RVQ)
- Frame length: 40ms (fixed across all variants)
- Total up/down-sampling factor: 320 (consistent across variants)
- Sample rates tested: 8 kHz (320 samples), 16 kHz (640 samples), 32 kHz (1280 samples)
- Export format: ONNX with Float32 precision
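
The frame sizes quoted for each sample rate follow directly from the fixed 40 ms frame length. A minimal sketch of that arithmetic (the function and constant names are illustrative, not taken from the contribution):

```python
# Samples per codec frame = frame duration x sample rate.
FRAME_MS = 40  # fixed frame length from the model configuration


def samples_per_frame(sample_rate_hz: int, frame_ms: int = FRAME_MS) -> int:
    """Return the number of PCM samples in one codec frame."""
    return sample_rate_hz * frame_ms // 1000


for rate_hz in (8_000, 16_000, 32_000):
    print(f"{rate_hz // 1000} kHz -> {samples_per_frame(rate_hz)} samples")
```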

**Key complexity observations from Table 1:**
- Parameter counts range from 1.09M (enc8dec144) to 74.50M (enc64dec1536)
- Model sizes range from 4.3 MB to 283.6 MB
- Computational complexity scales proportionally with sample rate (e.g., enc32dec768: 4955.9 MFlops/s @ 8kHz, 9972.6 MFlops/s @ 16kHz, 20006.1 MFlops/s @ 32kHz)

### 2.2 Device Under Test (DUT) Environment

- **Platform:** MediaTek Dimensity 1200 (6nm) - representative mid-range SoC
- **Inference engine:** ONNX Runtime v1.14+ with CPU execution provider (single-threaded)
- **CPU clusters tested:**
  - Efficiency cluster: Cortex-A55
  - Performance cluster: Cortex-A78
  - Prime core: Cortex-A78+
- **Methodology:** Frequency-locked operation with disabled thermal services and power HALs to eliminate dynamic frequency scaling noise
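
As a rough illustration of how an RTF figure can be obtained, the sketch below times a per-frame processing call and divides by the frame duration. It assumes the common definition of RTF as processing time divided by audio duration. The `dummy_codec` function is a hypothetical stand-in for the real encode/decode call, and the warm-up and run counts are arbitrary choices; none of this is the contribution's actual harness, which used ONNX Runtime on the device.

```python
import time
from statistics import median
from typing import Callable, Sequence


def measure_rtf(process_frame: Callable[[Sequence[float]], object],
                frame: Sequence[float],
                frame_ms: float = 40.0,
                warmup: int = 5,
                runs: int = 50) -> float:
    """Return median per-frame wall-clock time divided by frame duration."""
    # Warm-up iterations keep one-off start-up costs out of the timings.
    for _ in range(warmup):
        process_frame(frame)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        process_frame(frame)
        timings.append(time.perf_counter() - start)
    return median(timings) / (frame_ms / 1000.0)


def dummy_codec(frame: Sequence[float]) -> list:
    """Hypothetical placeholder for encoding and decoding one frame."""
    return [0.5 * x for x in frame]


# One 40 ms frame at 16 kHz is 640 samples.
rtf = measure_rtf(dummy_codec, [0.0] * 640)
print(f"RTF = {rtf:.4f} ({'real-time' if rtf < 1.0 else 'too slow'})")
```

Using the median rather than the mean limits the influence of occasional scheduling spikes, which matters less on a frequency-locked device but still helps on a general-purpose OS.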

## 3. Results and Analysis

### 3.1 Complexity Scaling vs. Bandwidth

**Critical finding:** For a given model variant, computational complexity scales linearly with sample rate:
- **enc32dec768 example:**
  - 8 kHz: ~0.20 GFLOP per 40 ms frame (4955.9 MFlops/s)
  - 16 kHz: ~0.40 GFLOP per 40 ms frame (9972.6 MFlops/s) - **2x the 8 kHz cost**
  - 32 kHz: ~0.80 GFLOP per 40 ms frame (20006.1 MFlops/s) - **4x the 8 kHz cost**

**Implication:** Higher sampling rates incur a proportional computational penalty. For resource-constrained devices (IoT, wearables), narrowband (NB) mode at 8 kHz is recommended.
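
Multiplying each throughput figure by the 40 ms frame duration reproduces the GFLOP values listed above, which suggests they are per-frame operation counts. The sketch below checks that arithmetic and the scaling ratios for enc32dec768; the variable names are illustrative.

```python
FRAME_S = 0.040  # 40 ms frame duration

# Reported throughput for enc32dec768, keyed by sample rate in kHz.
MFLOPS_ENC32DEC768 = {8: 4955.9, 16: 9972.6, 32: 20006.1}


def gflop_per_frame(mflops_per_s: float, frame_s: float = FRAME_S) -> float:
    """Convert a MFLOP/s rate into GFLOP spent on one frame."""
    return mflops_per_s * frame_s / 1000.0


base = MFLOPS_ENC32DEC768[8]
for khz, rate in MFLOPS_ENC32DEC768.items():
    print(f"{khz:>2} kHz: {gflop_per_frame(rate):.2f} GFLOP/frame, "
          f"{rate / base:.2f}x the 8 kHz cost")
```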

### 3.2 Real-Time Factor (RTF) Analysis Across Three Frequency Tiers

#### 3.2.1 Tier 1: Low Frequency (A55@750MHz, A78@902MHz, A78+@1.1GHz)

**Energy-conserving state with severe constraints:**

- **Cortex-A55 @ 750 MHz:** Only the smallest model (enc8dec144) maintains real-time operation at 8 kHz; 16 and 32 kHz are infeasible
- **Cortex-A78 @ 902 MHz:** 
  - 32 kHz: Limited to <3M parameters
  - 16 kHz: Supports up to ~8M parameters
  - 8 kHz: Supports up to ~10M parameters
- **Cortex-A78+ @ 1.108 GHz:** Similar to A78 but extends 16 kHz limit closer to 10M parameters

#### 3.2.2 Tier 2: Mid Frequency (A55@1.0GHz, A78@1.16GHz, A78+@1.37GHz)

**Typical sustained workload state:**

- **Cortex-A55 @ 1.0 GHz:** 8 kHz supports up to ~2M parameters; 16 and 32 kHz remain largely infeasible
- **Cortex-A78 @ 1.162 GHz:**
  - 32 kHz: ~5M parameter limit
  - 16 kHz: ~10M parameters (covers "Low Complexity" profile)
  - 8 kHz: Robust up to ~20M parameters
- **Cortex-A78+ @ 1.37 GHz:** Performance parity with A78 (clock speed is primary differentiator)

#### 3.2.3 Tier 3: High Frequency (A55@1.73GHz, A78@1.45GHz, A78+@1.63GHz)

**High-performance state approaching sustained limits:**

- **Cortex-A55 @ 1.73 GHz:**
  - 8 kHz: ~3M parameters
  - 16 kHz: ~2M parameters
  - 32 kHz: ~1M parameters
- **Cortex-A78 @ 1.451 GHz:**
  - 32 kHz: ~7M parameters
  - 16 kHz: ~10M parameters
  - 8 kHz: ~20M parameters
- **Cortex-A78+ @ 1.632 GHz:** Highest headroom
  - 32 kHz: ~8M parameters
  - 16 kHz: Comfortably supports 10M parameters
  - 8 kHz: ~20M parameters

**Key observation:** Across all three clusters and frequency tiers, the largest model that still runs in real time shrinks as the sample rate rises.

### 3.3 Maximum Performance Envelope

Analysis at peak locked frequencies establishes absolute upper bounds:

#### 3.3.1 Efficiency Core (Cortex-A55 @ 2.0 GHz)

Even at peak frequency, A55 remains highly constrained. Models exceeding ~5M parameters (enc16dec384) fail real-time constraints at 8 kHz and above. **Unsuitable for large weight matrices.**

#### 3.3.2 Performance Core (Cortex-A78 @ 2.6 GHz)

**Most relevant benchmark for ULBC** - represents sustained compute capability of modern mobile devices.

**Critical "Complexity vs. Bandwidth" trade-off identified:**

- **32 kHz:** RTF crosses 1.0 near **10M parameters** (enc24dec576 variant)
  - **Hard limit for High-Fidelity ULBC candidates**
- **16 kHz:** Feasible model size effectively **doubles to ~20M parameters** (enc32dec768 variant)
  - enc40dec960 fails real-time constraints
  - **Feasible parameter count scales roughly inversely with sample rate**
- **8 kHz:** Extends to **~39M parameters**
  - enc40dec960 (29M) is safe
  - Trend suggests failure before enc64dec1536
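
The thresholds above can be read as a simple lookup. The sketch below encodes the approximate A78 @ 2.6 GHz limits and checks the cases named in the text. The limits are rounded figures from this section, and the function name is illustrative; a model close to a threshold would need an actual RTF measurement rather than this check.

```python
# Approximate real-time limits (millions of parameters) reported for the
# Cortex-A78 @ 2.6 GHz, keyed by sample rate in kHz.
A78_LIMIT_M = {32: 10.0, 16: 20.0, 8: 39.0}


def is_real_time(params_m: float, sample_rate_khz: int) -> bool:
    """Return True if the parameter count is within the reported limit."""
    return params_m <= A78_LIMIT_M[sample_rate_khz]


print(is_real_time(29.0, 8))    # enc40dec960 (29M) at 8 kHz
print(is_real_time(29.0, 16))   # enc40dec960 (29M) at 16 kHz
print(is_real_time(74.5, 8))    # enc64dec1536 (74.50M) at 8 kHz
```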

#### 3.3.3 Prime Core (Cortex-A78+ @ 3.0 GHz)

Results mirror the A78 trends, with slight improvements due to the higher clock frequency. The **bandwidth bottleneck remains dominant**: the higher clock speed provides a safety margin for borderline models (e.g., enc24dec576 @ 32 kHz) but does not fundamentally shift the feasible model-size category.

## 4. Key Technical Contributions

### 4.1 Quantified Complexity-Bandwidth Trade-off

Established an approximately inverse relationship between sample rate and feasible model size: **halving the sample rate roughly doubles the feasible parameter count** on the performance core (Cortex-A78 @ 2.6 GHz):
- 32 kHz → ~10M parameters
- 16 kHz → ~20M parameters
- 8 kHz → ~39M parameters
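
One way to read these figures is as an inverse-proportional model in which the product of sample rate and parameter limit stays constant. The sketch below anchors that model at the 32 kHz point and compares its predictions with the reported limits. The model form and the choice of anchor are assumptions made for illustration, not a fit performed in the contribution.

```python
# Reported limits (millions of parameters), keyed by sample rate in kHz.
REPORTED_LIMIT_M = {32: 10.0, 16: 20.0, 8: 39.0}


def predicted_limit_m(sample_rate_khz: float,
                      ref_rate_khz: float = 32.0,
                      ref_limit_m: float = 10.0) -> float:
    """Inverse-proportional model: limit x sample rate is constant."""
    return ref_limit_m * ref_rate_khz / sample_rate_khz


for khz, reported in REPORTED_LIMIT_M.items():
    print(f"{khz:>2} kHz: predicted {predicted_limit_m(khz):.0f}M, "
          f"reported ~{reported:.0f}M")
```

Under this model the 8 kHz prediction is 40M, slightly above the reported ~39M, which is consistent with the relationship being approximate rather than exact.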

### 4.2 Real-World Performance Benchmarks

Provided concrete RTF measurements across representative mobile hardware configurations, revealing that:
- Theoretical complexity metrics (FLOPs) don't capture real-world bottlenecks
- Memory bandwidth and thermal constraints significantly impact feasibility
- Efficiency cores (A55) are unsuitable for neural codec workloads beyond minimal complexity

### 4.3 Practical Complexity Constraints for ULBC

Identified **10M parameter hard limit for 32 kHz operation** on mid-range mobile devices (A78 @ 2.6 GHz), providing concrete guidance for ULBC candidate selection.

## 5. Proposal

The contribution proposes including these RTF analysis findings in TR 26.940 to inform complexity constraint selection for ULBC candidates, moving the standardization process toward real-world deployability considerations rather than purely theoretical metrics.