# Complexity Analysis of AI Codec Scaling for ULBC

## 1. Introduction

This contribution addresses the need to establish relevant complexity evaluation methods for the standardization of the new ULBC codec. Previous contributions (e.g., S4aA250264) highlighted potential gaps between theoretical complexity metrics (FLOPs) and practical on-device performance (Real-Time Factor).

This document provides a complementary analysis of how complexity metrics scale with the AI model architecture itself. Using the publicly available DAC codec as a test case, it investigates the relationship between model dimensions, theoretical complexity (MFLOP/s), and the traditional WMOPS metric.

## 2. Analysis of AI Codec Complexity Scaling

### 2.1. Methodology

Seven "dummy" model variants were derived from the 16 kHz configuration of the open-source DAC codec, as follows:

- **Base Configuration:**
  - Sample rate: 16kHz
  - Encoder dimension: 64
  - Encoder rates: [2, 4, 5, 8]
  - Decoder dimension: 1536
  - Decoder rates: [8, 5, 4, 2]

- **Scaling Approach:**
  - Only `encoder_dim` and `decoder_dim` were modified
  - Encoder/decoder rates kept constant across all variants
  - Total up/down-sampling factor maintained at 320 (2×4×5×8 = 8×5×4×2)
  - Frame size: 20ms (320 samples at 16kHz)

- **Variant Configurations:**
  - enc8dec144
  - enc12dec288
  - enc16dec384
  - enc24dec576
  - enc32dec768
  - enc40dec960
  - enc64dec1536
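
The configuration above can be sketched in a few lines. This is a minimal illustration, not the script used for the analysis: the dictionary keys simply mirror the parameter names listed in this section, and `parse_variant` and `build_config` are hypothetical helpers introduced here.

```python
import math
import re

SAMPLE_RATE_HZ = 16_000
ENCODER_RATES = [2, 4, 5, 8]   # kept constant across all variants
DECODER_RATES = [8, 5, 4, 2]   # kept constant across all variants

VARIANT_NAMES = [
    "enc8dec144", "enc12dec288", "enc16dec384", "enc24dec576",
    "enc32dec768", "enc40dec960", "enc64dec1536",
]


def parse_variant(name):
    """Split a name such as 'enc16dec384' into (encoder_dim, decoder_dim)."""
    match = re.fullmatch(r"enc(\d+)dec(\d+)", name)
    if match is None:
        raise ValueError(f"unexpected variant name: {name}")
    return int(match.group(1)), int(match.group(2))


def build_config(name):
    """Return a hypothetical config dict; only the two dims change per variant."""
    encoder_dim, decoder_dim = parse_variant(name)
    return {
        "sample_rate": SAMPLE_RATE_HZ,
        "encoder_dim": encoder_dim,
        "encoder_rates": ENCODER_RATES,
        "decoder_dim": decoder_dim,
        "decoder_rates": DECODER_RATES,
    }


# Total down-sampling factor equals the number of samples per frame.
hop_length = math.prod(ENCODER_RATES)
assert hop_length == math.prod(DECODER_RATES)

frame_ms = 1000 * hop_length / SAMPLE_RATE_HZ
configs = [build_config(name) for name in VARIANT_NAMES]

print(hop_length, frame_ms)  # 320 20.0
```

Because the rates are fixed, every variant keeps the same 320-sample hop and therefore the same 20 ms frame, so differences in the measured metrics come from the two dimensions alone.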

**Complexity Metrics Measured:**

1. **Model Parameters (Millions):** Total number of trainable parameters
2. **Theoretical Complexity (MFLOP/s):** Calculated with the thop profiling library (aligned with S4aA250264 and S4aA250231)
3. **WMOPS (Weighted Million Operations Per Second):** Traditional methodology using the ITU-T STL wmc_tool, measured separately for encoder and decoder

**Implementation Notes:**
- Each AI operation implemented in pure C
- Source files annotated and compiled using wmc_tool
- WMOPS highly sensitive to C implementation efficiency
- Naive implementations can yield significantly higher counts than optimized versions

### 2.2. Complexity vs. Model Dimensions

**Key Findings:**

- Clear **non-linear relationship** between the model dimensions and the resulting parameter count and computational load
- Model parameters and MFLOP/s scale **approximately quadratically**, not linearly, as encoder_dim and decoder_dim increase: doubling both dimensions from enc32dec768 to enc64dec1536 multiplies the parameter count by about 3.9 and MFLOP/s by about 4.0
- Results visualized in Figure 1 (Parameters vs. Dimension) and Figure 2 (MFLOP/s vs. Dimension)
- Encoder and decoder points appear as linked pairs, because each variant bundles one encoder dimension with one decoder dimension
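
The scaling exponent can be estimated from the values in the Section 2.4 table with a least-squares fit in log-log space. The sketch below is an illustration, not part of the original analysis. It uses the encoder dimension as the size variable and leaves out enc8dec144, because its decoder-to-encoder ratio is 18 while the other six variants share a ratio of 24. `loglog_slope` is a helper name introduced here.

```python
import math

# (encoder_dim, params in millions, MFLOP/s), copied from the Section 2.4 table.
# enc8dec144 is omitted because its dec/enc ratio (18) differs from the rest (24).
ROWS = [
    (12, 2.89, 1397.63),
    (16, 4.94, 2481.98),
    (24, 10.76, 5578.38),
    (32, 18.90, 9911.72),
    (40, 29.34, 15482.00),
    (64, 74.50, 39614.50),
]


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x), i.e. p in y ~ x**p."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    mean_x = sum(log_x) / len(log_x)
    mean_y = sum(log_y) / len(log_y)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    den = sum((a - mean_x) ** 2 for a in log_x)
    return num / den


dims = [row[0] for row in ROWS]
params_exponent = loglog_slope(dims, [row[1] for row in ROWS])
mflops_exponent = loglog_slope(dims, [row[2] for row in ROWS])

print(f"params ~ dim^{params_exponent:.2f}")
print(f"MFLOP/s ~ dim^{mflops_exponent:.2f}")
```

An exponent of 1 would indicate linear scaling and 2 quadratic scaling. For this data both fitted exponents should come out close to 2.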

### 2.3. WMOPS vs. Model Parameters

**Key Finding:** Clear relationship between AI model size (in millions of parameters) and traditional WMOPS complexity.

**Observations on DAC Model:**

1. **Clear correlation** between the number of model parameters and the resulting WMOPS when the same architecture and the same C optimization level are used
2. **Decoder complexity scales significantly faster** and is substantially higher than encoder complexity for all variants: decoder WMOPS is about 2.3 times the encoder WMOPS for enc8dec144 and about 5.2 times for enc64dec1536 (DAC allocates more parameters and complexity to the decoder to improve reconstructed audio quality)
3. **Growth in WMOPS appears linear** relative to the increase in parameters for both encoder and decoder
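
The linearity claim can be checked against the Section 2.4 table with an ordinary least-squares fit and a Pearson correlation coefficient. The sketch below is an illustrative check, not the method used in the contribution. It regresses both WMOPS columns against the table's single "Params (M)" column, and `linear_fit` is a helper introduced here.

```python
import math

# Columns copied from the Section 2.4 table, smallest to largest variant.
PARAMS_M = [1.09, 2.89, 4.94, 10.76, 18.90, 29.34, 74.50]
WMOPS_ENC = [333.92, 648.23, 1060.79, 2228.92, 3693.56, 5599.48, 13675.30]
WMOPS_DEC = [760.53, 2732.96, 4724.38, 10399.00, 18093.30, 28019.70, 70766.69]


def linear_fit(xs, ys):
    """Return (slope, intercept, pearson_r) for the least-squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    pearson_r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, pearson_r


enc_fit = linear_fit(PARAMS_M, WMOPS_ENC)
dec_fit = linear_fit(PARAMS_M, WMOPS_DEC)

for label, (slope, intercept, r) in (("encoder", enc_fit), ("decoder", dec_fit)):
    print(f"{label}: {slope:.1f} WMOPS per million params, r = {r:.4f}")
```

A correlation coefficient near 1 supports the linear reading. The fitted slopes give the WMOPS cost of each additional million parameters, and the decoder slope is several times the encoder slope.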

### 2.4. Summary of Scaled Variants

Complete complexity metrics for all seven DAC variants (16kHz, 20ms frame):

| Variant | Enc Dim | Dec Dim | Params (M) | GFLOPs per frame | MFLOP/s | WMOPS Enc | WMOPS Dec |
|---------|---------|---------|------------|--------------|---------|-----------|-----------|
| enc8dec144 | 8 | 144 | 1.09 | 0.009 | 437.09 | 333.92 | 760.53 |
| enc12dec288 | 12 | 288 | 2.89 | 0.028 | 1397.63 | 648.23 | 2732.96 |
| enc16dec384 | 16 | 384 | 4.94 | 0.050 | 2481.98 | 1060.79 | 4724.38 |
| enc24dec576 | 24 | 576 | 10.76 | 0.112 | 5578.38 | 2228.92 | 10399.00 |
| enc32dec768 | 32 | 768 | 18.90 | 0.198 | 9911.72 | 3693.56 | 18093.30 |
| enc40dec960 | 40 | 960 | 29.34 | 0.310 | 15482.00 | 5599.48 | 28019.70 |
| enc64dec1536 | 64 | 1536 | 74.50 | 0.792 | 39614.50 | 13675.30 | 70766.69 |

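The data show rapid scaling of all metrics as the encoder and decoder dimensions increase: from enc8dec144 to enc64dec1536 the encoder dimension grows by a factor of 8, while the parameter count grows by a factor of about 68 and MFLOP/s by a factor of about 91.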
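
The two FLOP columns of the table can be cross-checked against each other. The sketch below assumes that the GFLOP column is the operation count for one 20 ms frame, so that the per-second figure is that count multiplied by 50 frames per second. This reading is an assumption inferred from the numbers and is not stated explicitly in the table.

```python
FRAME_MS = 20
FRAMES_PER_SECOND = 1000 / FRAME_MS  # 50 frames per second

# (variant, GFLOP column, MFLOP/s column), copied from the table above.
ROWS = [
    ("enc8dec144", 0.009, 437.09),
    ("enc12dec288", 0.028, 1397.63),
    ("enc16dec384", 0.050, 2481.98),
    ("enc24dec576", 0.112, 5578.38),
    ("enc32dec768", 0.198, 9911.72),
    ("enc40dec960", 0.310, 15482.00),
    ("enc64dec1536", 0.792, 39614.50),
]


def gflop_per_frame(mflop_per_second):
    """Convert a per-second rate in MFLOP/s to a per-frame count in GFLOP."""
    return mflop_per_second / FRAMES_PER_SECOND / 1000


# The GFLOP column is printed with three decimals, so allow half a unit
# in the last place (plus a small float margin) when comparing.
TOLERANCE = 0.0005 + 1e-9
consistent = all(
    abs(gflop_per_frame(mflops) - gflop) <= TOLERANCE
    for _, gflop, mflops in ROWS
)

for name, gflop, mflops in ROWS:
    print(f"{name}: derived {gflop_per_frame(mflops):.4f} GFLOP, table {gflop:.3f}")
print("columns consistent:", consistent)
```

Under this assumption the derived per-frame counts agree with the tabulated GFLOP values to within the rounding of the three-decimal column.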

## 3. Observations and Conclusions

Based on the DAC model variant analysis:

1. **Linear Relationship:** For the DAC model, there is a clear linear relationship between Theoretical Complexity (MFLOP/s), Model Parameters, and measured WMOPS. As MFLOP/s or the parameter count increases, WMOPS increases linearly, provided the C coding style remains consistent.

2. **Quadratic Growth:** Increasing the model's internal dimensions causes complexity to grow approximately quadratically. For example, doubling both dimensions from enc32dec768 to enc64dec1536 raises MFLOP/s by a factor of about 4.0 and decoder WMOPS by a factor of about 3.9.

3. **Implementation Dependency:** The WMOPS score depends heavily on the efficiency of the C source code.
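
One way to see where the quadratic growth comes from is to count the weights and multiply-accumulate operations of a single 1-D convolution whose input and output channel counts both follow the model dimension. The sketch below is a generic illustration under that assumption. The kernel size and output length are arbitrary example values, not taken from the DAC configuration.

```python
def conv1d_params(c_in, c_out, kernel_size, bias=True):
    """Trainable parameters of a 1-D convolution layer."""
    return c_in * c_out * kernel_size + (c_out if bias else 0)


def conv1d_macs(c_in, c_out, kernel_size, out_len):
    """Multiply-accumulate operations for one forward pass."""
    return c_in * c_out * kernel_size * out_len


KERNEL_SIZE = 7   # example value, hypothetical
OUT_LEN = 320     # example output length in samples, hypothetical

for dim in (32, 64):
    # Input and output channels both scale with the model dimension.
    params = conv1d_params(dim, dim, KERNEL_SIZE)
    macs = conv1d_macs(dim, dim, KERNEL_SIZE, OUT_LEN)
    print(f"dim={dim}: params={params}, MACs={macs}")

mac_ratio = (
    conv1d_macs(64, 64, KERNEL_SIZE, OUT_LEN)
    / conv1d_macs(32, 32, KERNEL_SIZE, OUT_LEN)
)
print("MAC ratio when the dimension doubles:", mac_ratio)  # 4.0
```

Because both channel counts appear as factors, doubling the dimension quadruples the multiply-accumulate count of such a layer. The parameter count is slightly less than quadrupled because the bias term grows only linearly.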

## 4. Proposal

**It is proposed to capture the above analysis into 3GPP TR 26.940.**