S4-260158 - AI Summary

[FS_ULBC] Analysis of AI Codec Complexity Scaling


Complexity Analysis of AI Codec Scaling for ULBC

1. Introduction

This contribution addresses the need to establish a relevant complexity evaluation method for the standardization of the new ULBC codec. Previous contributions (e.g., S4aA250264) highlighted a potential gap between theoretical complexity metrics (FLOPs) and practical on-device performance (Real-Time Factor).

This document provides a complementary analysis of how complexity metrics scale with the AI model architecture itself, not only with frame size. Using the publicly available DAC codec as a test case, it investigates the relationship between model architecture, theoretical complexity (FLOPs), and the traditional WMOPS metric.

2. Analysis of AI Codec Complexity Scaling

2.1. Methodology

The analysis created seven "dummy" model variants from the open-source DAC codec's 16kHz configuration by scaling the model's internal latent dimensions. The setup was as follows:

  • Base Configuration:
      • Sample rate: 16kHz
      • Encoder dimension: 64
      • Encoder rates: [2, 4, 5, 8]
      • Decoder dimension: 1536
      • Decoder rates: [8, 5, 4, 2]

  • Scaling Approach:
      • Only encoder_dim and decoder_dim were modified
      • Encoder/decoder rates kept constant across all variants
      • Total up/down-sampling factor maintained at 320 (2×4×5×8 = 8×5×4×2)
      • Frame size: 20ms (320 samples at 16kHz)

  • Variant Configurations:
      • enc8dec144
      • enc12dec288
      • enc16dec384
      • enc24dec576
      • enc32dec768
      • enc40dec960
      • enc64dec1536
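
The arithmetic behind the fixed frame size can be checked with a short sketch. The constant and function names below are illustrative and are not taken from the DAC code base; only the numeric values come from the configuration listed above.

```python
from math import prod

SAMPLE_RATE_HZ = 16_000
ENCODER_RATES = [2, 4, 5, 8]   # down-sampling strides, fixed for all variants
DECODER_RATES = [8, 5, 4, 2]   # up-sampling strides, fixed for all variants

# (encoder_dim, decoder_dim) pairs of the seven variants; only these change.
VARIANT_DIMS = [(8, 144), (12, 288), (16, 384), (24, 576),
                (32, 768), (40, 960), (64, 1536)]


def variant_name(encoder_dim: int, decoder_dim: int) -> str:
    """Build the variant label used in this document, e.g. 'enc8dec144'."""
    return f"enc{encoder_dim}dec{decoder_dim}"


# Total down-sampling factor = number of input samples per latent frame.
hop_samples = prod(ENCODER_RATES)
# The decoder has to undo exactly the encoder's total down-sampling.
assert hop_samples == prod(DECODER_RATES)

frame_ms = 1000 * hop_samples / SAMPLE_RATE_HZ
frames_per_second = SAMPLE_RATE_HZ / hop_samples

print(f"hop = {hop_samples} samples, frame = {frame_ms} ms, "
      f"rate = {frames_per_second} frames/s")
for enc_dim, dec_dim in VARIANT_DIMS:
    print(variant_name(enc_dim, dec_dim))
```

Because the rates are identical for every variant, the frame size and frame rate stay constant and any change in complexity is attributable to the two dimension parameters.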

Complexity Metrics Measured:

  1. Model Parameters (Millions): Total trainable parameters
  2. Theoretical Complexity (MFLOP/s): Calculated using the thop profiling library (aligned with S4aA250264 and S4aA250231)
  3. WMOPS: Traditional methodology using ITU-T STL wmc_tool, measured separately for encoder and decoder

Implementation Notes:

  • Each AI operation was implemented in pure C
  • Source files were annotated and compiled using wmc_tool
  • WMOPS is highly sensitive to the efficiency of the C implementation
  • Naive implementations can yield significantly higher counts than optimized versions

2.2. Complexity vs. Model Dimensions

Key Findings:

  • Clear non-linear relationship between latent dimensions and resulting parameters/computational load
  • Model parameters and MFLOP/s scale quadratically (or faster), not linearly, as encoder_dim and decoder_dim increase
  • Results visualized in Figure 1 (Parameters vs. Dimension) and Figure 2 (MFLOP/s vs. Dimension)
  • Encoder and decoder points in the figures are linked pairs: each pair corresponds to one variant, in which encoder_dim and decoder_dim are set together
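
Why widening a model produces roughly quadratic growth can be illustrated with a single 1-D convolution whose input and output channel counts are both equal to the model width. This is a simplified sketch, not DAC's actual layer stack; the kernel size of 7 is an assumption chosen only for illustration.

```python
def conv1d_params(c_in: int, c_out: int, kernel: int, bias: bool = True) -> int:
    """Trainable parameters of one 1-D convolution layer."""
    weights = c_in * c_out * kernel
    return weights + (c_out if bias else 0)


def conv1d_macs_per_output_step(c_in: int, c_out: int, kernel: int) -> int:
    """Multiply-accumulates needed to produce one output time step."""
    return c_in * c_out * kernel


KERNEL = 7  # assumption for illustration only

for width in (8, 16, 32, 64):
    params = conv1d_params(width, width, KERNEL)
    macs = conv1d_macs_per_output_step(width, width, KERNEL)
    print(f"width={width:3d}  params={params:6d}  MACs/step={macs:6d}")

# Doubling the width multiplies the weight count by four (the bias adds a
# small linear term), so both parameters and MACs grow about quadratically.
growth = conv1d_params(64, 64, KERNEL) / conv1d_params(32, 32, KERNEL)
print(f"params growth when width doubles from 32 to 64: {growth:.2f}x")
```

Both the parameter count and the per-step multiply-accumulate count contain the product `c_in * c_out`, which is the source of the quadratic term when both channel counts scale with the same dimension.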

2.3. WMOPS vs. Model Parameters

Key Finding: Clear relationship between AI model size (in millions of parameters) and traditional WMOPS complexity.

Observations on DAC Model:

  1. Clear correlation between number of model parameters and resulting WMOPS when using same architecture with same C optimization level
  2. Decoder complexity scales significantly faster than encoder complexity and is substantially higher for all variants (DAC allocates more parameters and complexity to the decoder to achieve better reconstructed audio quality)
  3. Growth in WMOPS appears linear relative to increase in parameters for both encoder and decoder
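
One way to check the linearity observation is an ordinary least-squares fit of WMOPS against parameter count. The sketch below uses the values reported in the Section 2.4 table; the helper `linear_fit` is illustrative and not part of the original contribution.

```python
# Values copied from the summary table of the seven DAC variants.
PARAMS_M = [1.09, 2.89, 4.94, 10.76, 18.90, 29.34, 74.50]
WMOPS_ENC = [333.92, 648.23, 1060.79, 2228.92, 3693.56, 5599.48, 13675.30]
WMOPS_DEC = [760.53, 2732.96, 4724.38, 10399.00, 18093.30, 28019.70, 70766.69]


def linear_fit(xs, ys):
    """Least-squares line y = slope * x + intercept, plus Pearson r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


slope_enc, icpt_enc, r_enc = linear_fit(PARAMS_M, WMOPS_ENC)
slope_dec, icpt_dec, r_dec = linear_fit(PARAMS_M, WMOPS_DEC)

print(f"encoder: {slope_enc:.1f} WMOPS per M params, r = {r_enc:.4f}")
print(f"decoder: {slope_dec:.1f} WMOPS per M params, r = {r_dec:.4f}")
```

A correlation coefficient close to 1 supports the linear reading, and the ratio of the two slopes gives a rough measure of how much faster decoder WMOPS grows than encoder WMOPS for this C implementation.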

2.4. Summary of Scaled Variants

Complete complexity metrics for all seven DAC variants (16kHz, 20ms frame):

| Variant | Enc Dim | Dec Dim | Params (M) | GFLOP counts | MFLOP/s | WMOPS Enc | WMOPS Dec |
|---------|---------|---------|------------|--------------|---------|-----------|-----------|
| enc8dec144 | 8 | 144 | 1.09 | 0.009 | 437.09 | 333.92 | 760.53 |
| enc12dec288 | 12 | 288 | 2.89 | 0.028 | 1397.63 | 648.23 | 2732.96 |
| enc16dec384 | 16 | 384 | 4.94 | 0.050 | 2481.98 | 1060.79 | 4724.38 |
| enc24dec576 | 24 | 576 | 10.76 | 0.112 | 5578.38 | 2228.92 | 10399.00 |
| enc32dec768 | 32 | 768 | 18.90 | 0.198 | 9911.72 | 3693.56 | 18093.30 |
| enc40dec960 | 40 | 960 | 29.34 | 0.310 | 15482.00 | 5599.48 | 28019.70 |
| enc64dec1536 | 64 | 1536 | 74.50 | 0.792 | 39614.50 | 13675.30 | 70766.69 |

The data show that all metrics grow rapidly as the encoder and decoder dimensions increase.
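
The table does not state the time base of the "GFLOP counts" column. The sketch below tests the assumption that it is a per-frame count, so that multiplying by the frame rate (50 frames/s for 20ms frames) should reproduce the MFLOP/s column up to the rounding of the three-decimal GFLOP values. This interpretation is an assumption, not something stated in the source.

```python
FRAME_MS = 20.0

# (variant, GFLOP count, MFLOP/s) copied from the table above.
ROWS = [
    ("enc8dec144", 0.009, 437.09),
    ("enc12dec288", 0.028, 1397.63),
    ("enc16dec384", 0.050, 2481.98),
    ("enc24dec576", 0.112, 5578.38),
    ("enc32dec768", 0.198, 9911.72),
    ("enc40dec960", 0.310, 15482.00),
    ("enc64dec1536", 0.792, 39614.50),
]

frames_per_second = 1000.0 / FRAME_MS


def mflops_from_frame_count(gflop_per_frame: float) -> float:
    """Convert an assumed per-frame GFLOP count to MFLOP/s."""
    return gflop_per_frame * 1000.0 * frames_per_second


rel_errors = {
    name: abs(mflops_from_frame_count(gflop) - mflops) / mflops
    for name, gflop, mflops in ROWS
}
max_rel_error = max(rel_errors.values())

for name, err in rel_errors.items():
    print(f"{name:13s} relative difference = {err:.2%}")
```

Small relative differences across all rows would be consistent with the per-frame reading; the smallest variant is the most affected by rounding because its GFLOP value has only one significant digit.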

3. Observations and Conclusions

Based on the DAC model variant analysis:

  1. Linear Relationship: For the DAC model, there is a clear linear relationship between Theoretical Complexity (MFLOP/s), Model Parameters, and measured WMOPS. As MFLOP/s or parameter count increases, WMOPS increases linearly, provided C coding style remains consistent.

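  2. Quadratic Growth: Increasing the model's internal dimensions causes complexity to grow approximately quadratically, so dimension increases lead to disproportionately large jumps in MFLOP/s and WMOPS. For example, in the Section 2.4 table, doubling both dimensions from enc32dec768 to enc64dec1536 raises the parameter count from 18.90 M to 74.50 M and MFLOP/s from 9911.72 to 39614.50, roughly a fourfold increase in each.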

  3. Implementation Dependency: The WMOPS score depends heavily on the efficiency of the C source code.
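
The quadratic-growth observation can be quantified by fitting a power law, metric ≈ c · dim^k, as a straight line in log-log space; an exponent k near 2 would be consistent with quadratic growth. The sketch below uses the Section 2.4 table values. Fitting against decoder_dim rather than encoder_dim is a choice made here for illustration, and `scaling_exponent` is a hypothetical helper.

```python
import math

# Decoder dimensions and metrics copied from the Section 2.4 table.
DEC_DIM = [144, 288, 384, 576, 768, 960, 1536]
PARAMS_M = [1.09, 2.89, 4.94, 10.76, 18.90, 29.34, 74.50]
MFLOPS = [437.09, 1397.63, 2481.98, 5578.38, 9911.72, 15482.00, 39614.50]


def scaling_exponent(dims, values):
    """Least-squares slope of log(value) against log(dim)."""
    xs = [math.log(d) for d in dims]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


k_params = scaling_exponent(DEC_DIM, PARAMS_M)
k_mflops = scaling_exponent(DEC_DIM, MFLOPS)

print(f"parameters ~ dim^{k_params:.2f}")
print(f"MFLOP/s    ~ dim^{k_mflops:.2f}")
```

The same helper can be applied to the WMOPS columns to compare how encoder and decoder complexity scale with dimension under a given C implementation.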

4. Proposal

It is proposed to capture the above analysis into 3GPP TR 26.940.

Document Information
Source: vivo Mobile Communication Co.,
Type: pCR
For: Agreement
Title: [FS_ULBC] Analysis of AI Codec Complexity Scaling
Agenda item: 7.8
Agenda item description: FS_ULBC (Study on Ultra Low Bitrate Speech Codec)
Abstract: For the standardization of the new ULBC codec [1], establishing a relevant method for evaluating complexity is essential. Previous contributions (e.g., S4aA250264 [2]) have highlighted the potential gap between theoretical complexity metrics (e.g., FLOPs) and practical, on-device performance (e.g., Real-Time Factor). A complementary aspect to this discussion is understanding how these complexity metrics scale, not just with frame size, but with the AI model's architecture itself. As AI-based codecs may be proposed with different model sizes or "operating points" (e.g., trading off quality for complexity), it is crucial to understand the relationship between model architecture, theoretical complexity, and traditional metrics. To investigate this, this contribution provides a complexity analysis of a publicly available AI codec (DAC [3]), where different "dummy" variants of the model were created by scaling the model's internal latent dimensions (DAC.encoder_dim and DAC.decoder_dim). The analysis maps the relationship between model parameters, theoretical FLOPs, and traditional WMOPS, providing data to help inform the setting of a reasonable complexity constraint framework.
Release: Rel-20
Specification: 26.940
Version: 0.4.0
Related WIs: FS_ULBC
Contact: Wang Dong
Uploaded: 2026-02-03T13:43:09.967000
Contact ID: 107237
Revised to: S4-260444
TDoc Status: revised
Is revision of: S4-251793
Reservation date: 03/02/2026 12:42:27
Agenda item sort order: 20