S4-260155 - AI Summary

[FS_ULBC] Analysis of AI Codec Real-Time Performance (RTF) and Complexity Scaling

1. Introduction and Motivation

This contribution addresses a critical gap in the Ultra Low Bitrate Speech Codec (ULBC) study by moving beyond theoretical complexity metrics (FLOPs, WMOPS) to evaluate real-world performance on mobile devices. The key observation is that static metrics fail to capture system-level bottlenecks including memory bandwidth pressure and thermal constraints on mobile SoCs. The document presents a comprehensive RTF analysis of a neural audio codec (based on Descript Audio Codec architecture) across multiple model sizes and sample rates on representative mid-range mobile hardware.

2. Experimental Setup

2.1 Model Configuration

Eight model variants were evaluated, ranging from enc8dec144 to enc64dec1536, with parameter counts spanning 1M to 74M:

  • Architecture: Fully convolutional encoder-decoder with Residual Vector Quantization (RVQ)
  • Frame length: 40ms (fixed across all variants)
  • Total up/down-sampling factor: 320 (consistent across variants)
  • Sample rates tested: 8 kHz (320 samples), 16 kHz (640 samples), 32 kHz (1280 samples)
  • Export format: ONNX with Float32 precision
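The frame sizes listed above follow directly from the 40 ms frame length. The sketch below reproduces them and, under the assumption that one latent step covers 320 input samples (our reading of the total up/down-sampling factor, not something the summary states explicitly), also shows how many latent steps each frame would contain. The names `FRAME_MS`, `HOP`, and `frame_layout` are illustrative only.

```python
FRAME_MS = 40   # frame length from the model configuration
HOP = 320       # total up/down-sampling factor (assumed: samples per latent step)


def frame_layout(sample_rate_hz: int) -> tuple[int, int]:
    """Return (samples per frame, latent steps per frame) for one 40 ms frame."""
    samples = sample_rate_hz * FRAME_MS // 1000
    return samples, samples // HOP


for rate in (8000, 16000, 32000):
    samples, steps = frame_layout(rate)
    print(f"{rate // 1000} kHz: {samples} samples/frame, {steps} latent step(s)/frame")
```

If that reading of the factor is right, the number of latent steps per frame doubles with each doubling of the sample rate, which is consistent with the proportional complexity scaling reported below.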

Key complexity observations from Table 1:
- Parameter counts range from 1.09M (enc8dec144) to 74.50M (enc64dec1536)
- Model sizes range from 4.3 MB to 283.6 MB
- Computational complexity scales proportionally with sample rate (e.g., enc32dec768: 4955.9 MFlops/s @ 8kHz, 9972.6 MFlops/s @ 16kHz, 20006.1 MFlops/s @ 32kHz)

2.2 Device Under Test (DUT) Environment

  • Platform: MediaTek Dimensity 1200 (6nm) - representative mid-range SoC
  • Inference engine: ONNX Runtime v1.14+ with CPU execution provider (single-threaded)
  • CPU clusters tested:
    - Efficiency cluster: Cortex-A55
    - Performance cluster: Cortex-A78
    - Prime core: Cortex-A78+
  • Methodology: Frequency-locked operation with disabled thermal services and power HALs to eliminate dynamic frequency scaling noise
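For orientation, a real-time factor measurement of this kind could look roughly like the sketch below. It assumes the common definition RTF = processing time / audio duration, so that values below 1.0 mean real-time operation; the summary does not spell out the definition used. The function `measure_rtf` and its parameters are hypothetical, and the sketch does not cover core pinning or frequency locking, which the contribution handles at the system level.

```python
import time
from typing import Callable


def measure_rtf(process_frame: Callable[[], None],
                frame_ms: float = 40.0,
                n_frames: int = 200,
                warmup: int = 20) -> float:
    """Return processing time divided by audio duration (RTF < 1.0 is real time)."""
    for _ in range(warmup):          # discard cold-start effects
        process_frame()
    start = time.perf_counter()
    for _ in range(n_frames):
        process_frame()
    elapsed_s = time.perf_counter() - start
    audio_s = n_frames * frame_ms / 1000.0
    return elapsed_s / audio_s


# Hypothetical wiring to ONNX Runtime (model path and input name are placeholders):
#   import numpy as np, onnxruntime as ort
#   opts = ort.SessionOptions()
#   opts.intra_op_num_threads = 1            # single-threaded, as in the setup above
#   sess = ort.InferenceSession("codec.onnx", sess_options=opts,
#                               providers=["CPUExecutionProvider"])
#   frame = np.zeros((1, 1, 640), dtype=np.float32)   # one 40 ms frame at 16 kHz
#   rtf = measure_rtf(lambda: sess.run(None, {"audio": frame}))
```

Taking the frame processor as a plain callable keeps the timing logic independent of the inference engine, so the same harness could be reused for encoder-only or decoder-only runs.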

3. Results and Analysis

3.1 Complexity Scaling vs. Bandwidth

Critical finding: For a given model variant, computational complexity scales linearly with sample rate. For enc32dec768:
- 8 kHz: ~0.20 GFLOP per 40 ms frame (4955.9 MFlops/s)
- 16 kHz: ~0.40 GFLOP per 40 ms frame (9972.6 MFlops/s), 2x the 8 kHz cost
- 32 kHz: ~0.80 GFLOP per 40 ms frame (20006.1 MFlops/s), 4x the 8 kHz cost
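The two figures quoted for each sample rate are related through the 40 ms frame length: multiplying the per-second rate by the frame duration gives the per-frame count. A minimal sketch of that arithmetic, using only the enc32dec768 numbers above (the helper name `gflop_per_frame` is illustrative):

```python
FRAME_S = 0.040  # 40 ms frame length

# enc32dec768 complexity in MFlops/s, keyed by sample rate in kHz
rates_mflops = {8: 4955.9, 16: 9972.6, 32: 20006.1}


def gflop_per_frame(mflops_per_s: float, frame_s: float = FRAME_S) -> float:
    """Convert a per-second rate in MFlops/s to GFLOP per frame."""
    return mflops_per_s * 1e6 * frame_s / 1e9


base = rates_mflops[8]
for khz, mflops in rates_mflops.items():
    print(f"{khz} kHz: {gflop_per_frame(mflops):.2f} GFLOP/frame, "
          f"{mflops / base:.2f}x the 8 kHz rate")
```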

Implication: Higher sampling rates incur proportional computational penalty. For resource-constrained devices (IoT, wearables), NB mode at 8 kHz is recommended.

3.2 Real-Time Factor (RTF) Analysis Across Three Frequency Tiers

3.2.1 Tier 1: Low Frequency (A55@750MHz, A78@902MHz, A78+@1.1GHz)

Energy-conserving state with severe constraints:

  • Cortex-A55 @ 750 MHz: Only the smallest models (enc8dec144) maintain real-time operation at 8 kHz; 16 and 32 kHz are infeasible
  • Cortex-A78 @ 902 MHz:
    - 32 kHz: Limited to <3M parameters
    - 16 kHz: Supports up to ~8M parameters
    - 8 kHz: Supports up to ~10M parameters
  • Cortex-A78+ @ 1.108 GHz: Similar to the A78, but extends the 16 kHz limit closer to 10M parameters

3.2.2 Tier 2: Mid Frequency (A55@1.0GHz, A78@1.16GHz, A78+@1.37GHz)

Typical sustained workload state:

  • Cortex-A55 @ 1.0 GHz: 8 kHz supports up to ~2M parameters; 16 and 32 kHz remain largely infeasible
  • Cortex-A78 @ 1.162 GHz:
    - 32 kHz: ~5M parameter limit
    - 16 kHz: ~10M parameters (covers "Low Complexity" profile)
    - 8 kHz: Robust up to ~20M parameters
  • Cortex-A78+ @ 1.37 GHz: Performance on par with the A78 (clock speed is the primary differentiator)

3.2.3 Tier 3: High Frequency (A55@1.73GHz, A78@1.45GHz, A78+@1.63GHz)

High-performance state approaching sustained limits:

  • Cortex-A55 @ 1.73 GHz:
    - 8 kHz: ~3M parameters
    - 16 kHz: ~2M parameters
    - 32 kHz: ~1M parameters
  • Cortex-A78 @ 1.451 GHz:
    - 32 kHz: ~7M parameters
    - 16 kHz: ~10M parameters
    - 8 kHz: ~20M parameters
  • Cortex-A78+ @ 1.632 GHz: Highest headroom
    - 32 kHz: ~8M parameters
    - 16 kHz: Comfortably supports 10M parameters
    - 8 kHz: ~20M parameters

Key observation: Inverse relationship between sample rate and model size capacity is consistently demonstrated.
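As an illustration of that observation, the Tier 3 limits can be written down as a small table and checked for monotonicity. The values below are the approximate parameter limits (in millions) quoted for Tier 3; the dictionary layout and the helper `is_inverse` are assumptions made for this sketch.

```python
# Approximate feasible parameter limits in millions, keyed by sample rate in kHz
TIER3_LIMITS_M = {
    "Cortex-A55 @ 1.73 GHz": {8: 3, 16: 2, 32: 1},
    "Cortex-A78 @ 1.451 GHz": {8: 20, 16: 10, 32: 7},
    "Cortex-A78+ @ 1.632 GHz": {8: 20, 16: 10, 32: 8},
}


def is_inverse(limits: dict[int, float]) -> bool:
    """True if the parameter limit never increases as the sample rate goes up."""
    ordered = [limits[rate] for rate in sorted(limits)]
    return all(a >= b for a, b in zip(ordered, ordered[1:]))


for core, limits in TIER3_LIMITS_M.items():
    print(f"{core}: inverse relationship holds = {is_inverse(limits)}")
```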

3.3 Maximum Performance Envelope

Analysis at peak locked frequencies establishes absolute upper bounds:

3.3.1 Efficiency Core (Cortex-A55 @ 2.0 GHz)

Even at peak frequency, the A55 remains highly constrained. Models exceeding ~5M parameters (enc16dec384) fail real-time constraints at 8 kHz and above, which makes the efficiency core unsuitable for models with large weight matrices.

3.3.2 Performance Core (Cortex-A78 @ 2.6 GHz)

Most relevant benchmark for ULBC - represents sustained compute capability of modern mobile devices.

Critical "Complexity vs. Bandwidth" trade-off identified:

  • 32 kHz: RTF crosses 1.0 near 10M parameters (enc24dec576 variant)
    - Hard limit for High-Fidelity ULBC candidates
  • 16 kHz: Feasible model size effectively doubles to ~20M parameters (enc32dec768 variant)
    - enc40dec960 fails real-time constraints
    - Linear relationship between bandwidth reduction and parameter capacity
  • 8 kHz: Extends to ~39M parameters
    - enc40dec960 (29M) is safe
    - Trend suggests failure before enc64dec1536
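Read as a rule of thumb, the Cortex-A78 @ 2.6 GHz thresholds above amount to a simple lookup. The sketch below encodes them as approximate limits; the names `A78_PEAK_LIMIT_M` and `fits_real_time` are hypothetical, and models near a threshold (such as enc24dec576 at 32 kHz) should be treated as borderline rather than as a clear pass.

```python
# Approximate real-time parameter limits (millions) on Cortex-A78 @ 2.6 GHz,
# keyed by sample rate in kHz, taken from the figures quoted above
A78_PEAK_LIMIT_M = {32: 10.0, 16: 20.0, 8: 39.0}


def fits_real_time(params_m: float, rate_khz: int) -> bool:
    """Rough check: is the parameter count within the reported limit for this rate?"""
    return params_m <= A78_PEAK_LIMIT_M[rate_khz]


# enc40dec960 has about 29M parameters; enc64dec1536 has 74.50M
for name, params_m in (("enc40dec960", 29.0), ("enc64dec1536", 74.5)):
    for rate in (8, 16, 32):
        print(f"{name} @ {rate} kHz: {fits_real_time(params_m, rate)}")
```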

3.3.3 Prime Core (Cortex-A78+ @ 3.0 GHz)

Results mirror A78 trends with slight improvements due to higher clock frequency. The bandwidth bottleneck remains dominant - higher clock speed provides safety margin for borderline models (e.g., enc24dec576 @ 32kHz) but doesn't fundamentally shift feasible model size category.

4. Key Technical Contributions

4.1 Quantified Complexity-Bandwidth Trade-off

Established an approximate inverse relationship: on the performance core (Cortex-A78 @ 2.6 GHz), halving the sample rate roughly doubles the feasible parameter count:
- 32 kHz → 10M parameters
- 16 kHz → 20M parameters
- 8 kHz → 39M parameters
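One way to see how closely these three points follow an inverse law is to multiply each sample rate by its parameter limit: under a perfect inverse relationship the product would be constant. A minimal sketch using only the figures above (variable names are illustrative):

```python
# Feasible parameter count in millions, keyed by sample rate in kHz
limits_m = {32: 10, 16: 20, 8: 39}

# Under an exact inverse law, rate x parameters would be the same for all rows
products = {khz: khz * params for khz, params in limits_m.items()}
for khz, product in products.items():
    print(f"{khz} kHz x {limits_m[khz]}M = {product}")

spread = max(products.values()) / min(products.values())
print(f"max/min ratio of the products: {spread:.3f}")
```

The products are 320, 320, and 312, so the 8 kHz point sits slightly below an exact doubling, in line with the "approximately" in the claim.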

4.2 Real-World Performance Benchmarks

Provided concrete RTF measurements across representative mobile hardware configurations, revealing that:
- Theoretical complexity metrics (FLOPs) don't capture real-world bottlenecks
- Memory bandwidth and thermal constraints significantly impact feasibility
- Efficiency cores (A55) are unsuitable for neural codec workloads beyond minimal complexity

4.3 Practical Complexity Constraints for ULBC

Identified 10M parameter hard limit for 32 kHz operation on mid-range mobile devices (A78 @ 2.6 GHz), providing concrete guidance for ULBC candidate selection.

5. Proposal

The contribution proposes including these RTF analysis findings in TR 26.940 to inform complexity constraint selection for ULBC candidates, moving the standardization process toward real-world deployability considerations rather than purely theoretical metrics.

Document Information
Source: vivo Mobile Communication Co., Xiaomi Technology, Spreadtrum, Bytedance
Type: pCR
For: Agreement
Title: [FS_ULBC] Analysis of AI Codec Real-Time Performance (RTF) and Complexity Scaling
Agenda item: 7.8
Agenda item description: FS_ULBC (Study on Ultra Low Bitrate Speech Codec)
Abstract: As part of the study on the new Ultra Low Bitrate Speech Codec (ULBC) [1], it is necessary to establish complexity constraints that reflect real-world device capabilities. Previous contributions have analyzed theoretical complexity using static metrics such as FLOPs and WMOPS [2] [5]. However, static metrics often fail to capture system-level bottlenecks, such as memory bandwidth pressure and thermal constraints on mobile System-on-Chips (SoCs). This contribution presents a comprehensive performance analysis of a neural audio codec (based on the Descript Audio Codec architecture) running on a representative mid-range mobile platform. By sweeping across model sizes (1M to 74M parameters) and sample rates (8, 16, 32 kHz), we evaluate the correlation between theoretical complexity and the Real-Time Factor (RTF).
Release: Rel-20
Specification: 26.940
Version: 0.4.0
Related WIs: FS_ULBC
Contact: Wang Dong
Uploaded: 2026-02-03T13:43:09.937000
Contact ID: 107237
Revised to: S4-260445
TDoc Status: revised
Reservation date: 03/02/2026 12:35:47
Agenda item sort order: 20