S4-260158 - AI Summary

[FS_ULBC] Analysis of AI Codec Complexity Scaling

AI-Generated Summary

Complexity Analysis of AI Codec Scaling for ULBC

1. Introduction

This contribution addresses the need to establish a relevant complexity evaluation method for the standardization of the new ULBC codec. Previous contributions (e.g., S4aA250264) highlighted a potential gap between theoretical complexity metrics (FLOPs) and practical on-device performance (Real-Time Factor, RTF).

This document provides a complementary analysis of how complexity metrics scale not only with frame size but also with the AI model architecture itself. It uses the publicly available DAC codec as a test case to investigate the relationship between model architecture, theoretical complexity (FLOPs), and traditional complexity metrics (WMOPS).

2. Analysis of AI Codec Complexity Scaling

2.1. Methodology

The analysis created seven "dummy" model variants based on the 16 kHz configuration of the open-source DAC codec, as follows:

Base Configuration:
- Sample rate: 16 kHz
- Encoder dimension (encoder_dim): 64
- Encoder rates: [2, 4, 5, 8]
- Decoder dimension (decoder_dim): 1536
- Decoder rates: [8, 5, 4, 2]

Scaling Approach:
- Only encoder_dim and decoder_dim were modified.
- Encoder and decoder rates were kept constant across all variants.
- The total down-/up-sampling factor is therefore 320 for every variant (2×4×5×8 = 8×5×4×2 = 320).
- Frame size: 20 ms (320 samples at 16 kHz).

Variant Configurations:
- enc8dec144
- enc12dec288
- enc16dec384
- enc24dec576
- enc32dec768
- enc40dec960
- enc64dec1536
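
The following short Python sketch checks the frame arithmetic behind the scaling approach. The hop size is the product of the encoder rates, and it must match the product of the decoder rates. The hop then fixes both the frame duration and the frame rate for every variant. `frame_timing` is a hypothetical helper for illustration and is not part of the DAC code base.

```python
from math import prod


def frame_timing(sample_rate_hz: int, encoder_rates: list[int],
                 decoder_rates: list[int]) -> tuple[int, float, float]:
    """Return (hop_samples, frame_ms, frames_per_second) for a strided codec.

    Hypothetical helper: the hop is the product of the encoder down-sampling
    rates, which must equal the product of the decoder up-sampling rates.
    """
    hop = prod(encoder_rates)
    if hop != prod(decoder_rates):
        raise ValueError("encoder and decoder must resample by the same total factor")
    frame_ms = 1000.0 * hop / sample_rate_hz
    frames_per_second = sample_rate_hz / hop
    return hop, frame_ms, frames_per_second


# Base DAC 16 kHz rates from the configuration above; every variant shares them.
hop, frame_ms, fps = frame_timing(16000, [2, 4, 5, 8], [8, 5, 4, 2])
print(hop, frame_ms, fps)  # 320 20.0 50.0
```

Because only encoder_dim and decoder_dim change between variants, this timing (and hence the 50 frames/s used to convert per-frame counts into per-second rates) is identical for all seven variants.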

Complexity Metrics Measured:

- Model Parameters (Millions): total trainable parameters
- Theoretical Complexity (MFLOP/s): calculated using the thop profiling library (aligned with S4aA250264 and S4aA250231)
- WMOPS (weighted million operations per second): traditional methodology using the ITU-T STL wmc_tool, measured separately for the encoder and decoder

Implementation Notes:
- Each AI operation was implemented in pure C.
- The source files were annotated and compiled using wmc_tool.
- WMOPS is highly sensitive to the efficiency of the C implementation.
- Naive implementations can yield significantly higher counts than optimized versions.
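
A toy model can illustrate why WMOPS is sensitive to coding style. The Python sketch below stands in for instrumented C. It counts weighted operations for two 1-D convolutions that compute identical results. The first counts a separate multiply and add per tap, and the second counts a single multiply-accumulate per tap. The `OpCounter` class, its operation names and its unit weights are hypothetical placeholders. They are not the ITU-T STL weight tables or the actual wmc_tool instrumentation.

```python
class OpCounter:
    """Toy weighted-operation counter.

    The operation names and weights are hypothetical placeholders; they are
    not the ITU-T STL weight tables used by wmc_tool.
    """

    WEIGHTS = {"add": 1, "mult": 1, "mac": 1}

    def __init__(self) -> None:
        self.total = 0

    def count(self, op: str, n: int = 1) -> None:
        self.total += self.WEIGHTS[op] * n


def conv1d_separate_ops(x, w, ops):
    """'Valid' 1-D correlation where each tap is counted as a multiply plus an add."""
    out = []
    for i in range(len(x) - len(w) + 1):
        acc = 0.0
        for j in range(len(w)):
            p = x[i + j] * w[j]
            ops.count("mult")
            acc = acc + p
            ops.count("add")
        out.append(acc)
    return out


def conv1d_fused_mac(x, w, ops):
    """Same arithmetic, but each tap is counted as one multiply-accumulate."""
    out = []
    for i in range(len(x) - len(w) + 1):
        acc = 0.0
        for j in range(len(w)):
            acc += x[i + j] * w[j]
            ops.count("mac")
        out.append(acc)
    return out


x = [float(n) for n in range(10)]
w = [0.25, 0.5, 0.25]
ops_separate, ops_mac = OpCounter(), OpCounter()
y_separate = conv1d_separate_ops(x, w, ops_separate)
y_mac = conv1d_fused_mac(x, w, ops_mac)
print(y_separate == y_mac, ops_separate.total, ops_mac.total)  # True 48 24
```

The outputs are identical, yet the weighted count differs by a factor of two. Real C code offers many more such choices (loop structure, memory moves, function calls). This is why WMOPS values are only comparable under a consistent coding style.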

2.2. Complexity vs. Model Dimensions

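Key Findings:

- There is a clear non-linear relationship between the latent dimensions and the resulting parameter count and computational load.
- Model parameters and MFLOP/s scale approximately quadratically, not linearly, as encoder_dim and decoder_dim increase. For example, doubling both dimensions from enc32dec768 to enc64dec1536 multiplies the parameter count by about 3.9 (18.90 M → 74.50 M) and MFLOP/s by about 4.0 (9911.72 → 39614.50).
- The results are visualized in Figure 1 (parameters vs. dimension) and Figure 2 (MFLOP/s vs. dimension).
- In both figures, each encoder point is paired with the decoder point of the same variant, because the encoder and decoder dimensions are scaled together as a bundled setup.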
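
The near-quadratic growth follows from how convolution layers are sized. A 1-D convolution with C_in input channels, C_out output channels and kernel length k has k·C_in·C_out weights plus C_out biases. Scaling every channel count by s therefore scales the dominant term by s². The Python sketch below applies this to a simplified, DAC-like encoder in which the channel count doubles after each strided block. The layer layout (kernel sizes, three residual units per block) is an assumption for illustration, not the exact DAC layer inventory. Its absolute counts will therefore not match the table.

```python
def conv1d_params(c_in: int, c_out: int, kernel: int, bias: bool = True) -> int:
    """Weights (and optional biases) of a single 1-D convolution layer."""
    return kernel * c_in * c_out + (c_out if bias else 0)


def toy_encoder_params(dim: int, rates=(2, 4, 5, 8)) -> int:
    """Parameter count of a simplified DAC-like encoder (hypothetical layer layout).

    Assumed layout: a k=7 input conv to `dim` channels, then for each stride s
    three residual units (a k=7 conv and a k=1 conv at constant width)
    followed by a strided conv with kernel 2*s that doubles the channel count.
    """
    total = conv1d_params(1, dim, 7)
    ch = dim
    for s in rates:
        for _ in range(3):
            total += conv1d_params(ch, ch, 7) + conv1d_params(ch, ch, 1)
        total += conv1d_params(ch, 2 * ch, 2 * s)
        ch *= 2
    return total


# Doubling the base dimension multiplies the parameter count by just under 4.
for d in (8, 16, 32, 64):
    ratio = toy_encoder_params(2 * d) / toy_encoder_params(d)
    print(f"dim={d}: {toy_encoder_params(d)} params, x{ratio:.4f} when doubled")
```

In this toy layout the parameter count is exactly 4480·d² + 128·d. The small linear term (biases and the single-channel input conv) is why the doubling ratio stays slightly below 4. The measured DAC counts in the table in Section 2.4 behave similarly: from enc32dec768 to enc64dec1536 the count grows from 18.90 M to 74.50 M, about 3.9×.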

2.3. WMOPS vs. Model Parameters

Key Finding: Clear relationship between AI model size (in millions of parameters) and traditional WMOPS complexity.

Observations on DAC Model:

- There is a clear correlation between the number of model parameters and the resulting WMOPS when the same architecture is implemented with the same C optimization level.
- Decoder complexity is substantially higher than encoder complexity for all variants and grows faster with model size. The decoder-to-encoder WMOPS ratio rises from about 2.3 (enc8dec144) to about 5.2 (enc64dec1536). DAC allocates more parameters and complexity to the decoder to achieve better reconstructed audio quality.
- WMOPS appears to grow linearly with the parameter count for both the encoder and the decoder.
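
An ordinary least-squares fit can quantify the near-linear relationship between WMOPS and parameter count. The Python sketch below fits WMOPS against the total parameter count, separately for the encoder and the decoder, using the values from the table in Section 2.4. It also prints the decoder-to-encoder WMOPS ratio for each variant. The fit only describes these seven measurements under one C implementation and is not a general complexity model. `linear_fit` is a hypothetical helper.

```python
# Parameter counts (M) and WMOPS copied from the table in Section 2.4.
PARAMS_M = [1.09, 2.89, 4.94, 10.76, 18.90, 29.34, 74.50]
WMOPS_ENC = [333.92, 648.23, 1060.79, 2228.92, 3693.56, 5599.48, 13675.30]
WMOPS_DEC = [760.53, 2732.96, 4724.38, 10399.00, 18093.30, 28019.70, 70766.69]


def linear_fit(x, y):
    """Least-squares fit y ~ slope * x + intercept; also returns R^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy * sxy / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r2


for name, wmops in (("encoder", WMOPS_ENC), ("decoder", WMOPS_DEC)):
    slope, intercept, r2 = linear_fit(PARAMS_M, wmops)
    print(f"{name}: WMOPS ~= {slope:.1f} * params_M + {intercept:.1f} (R^2 = {r2:.4f})")

# Decoder-to-encoder WMOPS ratio per variant, smallest to largest model.
print([round(d / e, 2) for e, d in zip(WMOPS_ENC, WMOPS_DEC)])
```

On these data both fits give R² above 0.99, and the decoder slope is several times the encoder slope. The ratio list increases monotonically, which matches the observation that decoder complexity grows faster.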

2.4. Summary of Scaled Variants

Complete complexity metrics for all seven DAC variants (16kHz, 20ms frame):

| Variant | Enc Dim | Dec Dim | Params (M) | GFLOP/frame | MFLOP/s | WMOPS Enc | WMOPS Dec |
|---------|---------|---------|------------|--------------|---------|-----------|-----------|
| enc8dec144 | 8 | 144 | 1.09 | 0.009 | 437.09 | 333.92 | 760.53 |
| enc12dec288 | 12 | 288 | 2.89 | 0.028 | 1397.63 | 648.23 | 2732.96 |
| enc16dec384 | 16 | 384 | 4.94 | 0.050 | 2481.98 | 1060.79 | 4724.38 |
| enc24dec576 | 24 | 576 | 10.76 | 0.112 | 5578.38 | 2228.92 | 10399.00 |
| enc32dec768 | 32 | 768 | 18.90 | 0.198 | 9911.72 | 3693.56 | 18093.30 |
| enc40dec960 | 40 | 960 | 29.34 | 0.310 | 15482.00 | 5599.48 | 28019.70 |
| enc64dec1536 | 64 | 1536 | 74.50 | 0.792 | 39614.50 | 13675.30 | 70766.69 |

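The data show rapid growth of all metrics as the encoder and decoder dimensions increase. From the smallest to the largest variant, the parameter count grows by about 68× (1.09 M → 74.50 M), MFLOP/s by about 91× (437.09 → 39614.50), decoder WMOPS by about 93× and encoder WMOPS by about 41×.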
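
The GFLOP and MFLOP/s columns of the table above are consistent with a per-frame GFLOP count. With 20 ms frames there are 50 frames per second, so MFLOP/s = GFLOP per frame × 1000 × 50. For enc64dec1536, for example, 0.792 GFLOP/frame × 50,000 ≈ 39,600 MFLOP/s. The Python check below recomputes the per-frame value from each MFLOP/s entry and compares it with the three-decimal table value. The function names are illustrative only.

```python
FRAME_MS = 20.0
FRAMES_PER_SECOND = 1000.0 / FRAME_MS  # 50 frames per second

# (GFLOP per frame, MFLOP/s) pairs copied from the table above.
TABLE = {
    "enc8dec144": (0.009, 437.09),
    "enc12dec288": (0.028, 1397.63),
    "enc16dec384": (0.050, 2481.98),
    "enc24dec576": (0.112, 5578.38),
    "enc32dec768": (0.198, 9911.72),
    "enc40dec960": (0.310, 15482.00),
    "enc64dec1536": (0.792, 39614.50),
}


def mflops_per_second(gflop_per_frame_value: float) -> float:
    """GFLOP per frame -> MFLOP/s (1 GFLOP = 1000 MFLOP, 50 frames/s)."""
    return gflop_per_frame_value * 1000.0 * FRAMES_PER_SECOND


def gflop_per_frame(mflops: float) -> float:
    """MFLOP/s -> GFLOP per frame (inverse of mflops_per_second)."""
    return mflops / (1000.0 * FRAMES_PER_SECOND)


for name, (gflop, mflops) in TABLE.items():
    print(f"{name}: {gflop_per_frame(mflops):.4f} GFLOP/frame recomputed, {gflop:.3f} in table")
```

All seven rows agree after rounding to three decimals. The small differences in the forward direction (e.g., 0.009 × 50,000 = 450 vs. 437.09) come from the rounding of the GFLOP column.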

3. Observations and Conclusions

Based on the DAC model variant analysis:

- Linear Relationship: For the DAC model, there is a clear linear relationship between theoretical complexity (MFLOP/s), model parameters, and measured WMOPS. As MFLOP/s or the parameter count increases, WMOPS increases linearly, provided the C coding style remains consistent.
- Quadratic Growth: Increasing the model's internal dimensions makes complexity grow approximately quadratically, so even small dimension increases lead to disproportionately large jumps in MFLOP/s and WMOPS.
- Implementation Dependency: The WMOPS score depends heavily on the efficiency of the source C code.
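
For this particular C implementation, the linear MFLOP/s-to-WMOPS relationship can be summarized by the ratio of total (encoder + decoder) WMOPS to MFLOP/s. The Python sketch below computes this ratio from the table in Section 2.4. The ratio describes these measurements only and would change with a different C coding style, which is the implementation dependency noted above. `wmops_per_mflops` is a hypothetical helper.

```python
# (variant, MFLOP/s, WMOPS encoder, WMOPS decoder) copied from the table in Section 2.4.
ROWS = [
    ("enc8dec144", 437.09, 333.92, 760.53),
    ("enc12dec288", 1397.63, 648.23, 2732.96),
    ("enc16dec384", 2481.98, 1060.79, 4724.38),
    ("enc24dec576", 5578.38, 2228.92, 10399.00),
    ("enc32dec768", 9911.72, 3693.56, 18093.30),
    ("enc40dec960", 15482.00, 5599.48, 28019.70),
    ("enc64dec1536", 39614.50, 13675.30, 70766.69),
]


def wmops_per_mflops(mflops: float, wmops_enc: float, wmops_dec: float) -> float:
    """Total (encoder + decoder) WMOPS divided by theoretical MFLOP/s."""
    return (wmops_enc + wmops_dec) / mflops


ratios = {name: wmops_per_mflops(m, e, d) for name, m, e, d in ROWS}
for name, r in ratios.items():
    print(f"{name}: {r:.2f} WMOPS per MFLOP/s")
```

For these seven variants the ratio stays between roughly 2.1 and 2.5 and decreases slightly as the model grows.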

4. Proposal

It is proposed to capture the above analysis in 3GPP TR 26.940.

Document Information

TDoc: S4-260158

Source: vivo Mobile Communication Co., Ltd.

Type: pCR

For: Agreement

Title: [FS_ULBC] Analysis of AI Codec Complexity Scaling

Agenda item: 7.8

Agenda item description: FS_ULBC (Study on Ultra Low Bitrate Speech Codec)

Abstract: For the standardization of the new ULBC codec [1], establishing a relevant method for evaluating complexity is essential. Previous contributions (e.g., S4aA250264 [2]) have highlighted the potential gap between theoretical complexity metrics (e.g., FLOPs) and practical, on-device performance (e.g., Real-Time Factor). A complementary aspect to this discussion is understanding how these complexity metrics scale, not just with frame size, but with the AI model's architecture itself. As AI-based codecs may be proposed with different model sizes or "operating points" (e.g., trading off quality for complexity), it is crucial to understand the relationship between model architecture, theoretical complexity, and traditional metrics. To investigate this, this contribution provides a complexity analysis of a publicly available AI codec (DAC [3]), where different "dummy" variants of the model were created by scaling the model's internal latent dimensions (DAC.encoder_dim and DAC.decoder_dim). The analysis maps the relationship between model parameters, theoretical FLOPs, and traditional WMOPS, providing data to help inform the setting of a reasonable complexity constraint framework.

Release: Rel-20

Specification: 26.940

Version: 0.4.0

Related WIs: FS_ULBC

Contact: Wang Dong

Uploaded: 2026-02-03T13:43:09.967000

Contact ID: 107237

Revised to: S4-260444

TDoc Status: revised

Is revision of: S4-251793

Reservation date: 03/02/2026 12:42:27

Agenda item sort order: 20