# Alignment Analysis on Complexity of DAC Model

## 1. Introduction

This contribution addresses a significant discrepancy in complexity reporting for AI-based codecs in the ULBC study. Two contributions (S4-260165 from Dolby et al. and S4-260155 from vivo et al.) both reported models with approximately 3M parameters but showed substantially different complexity metrics:

- **S4-260165**: ~3M parameter model (32 kHz) requires 0.79 GMACS
- **S4-260155**: ~3M parameter model (32 kHz) requires approximately 1.41 GMACS (derived from 2821 MFlops/s)

Notably, the S4-260165 model's complexity (0.79 GMACS) aligns more closely with the S4-260155 model operating at 16 kHz (~0.70 GMACS), despite the difference in sampling rate.

This contribution shows that **Model Size (parameter count) is an insufficient metric** for constraining complexity across different neural architectures, and proposes **GMACS** as a robust, architecture-agnostic metric that correlates linearly with the real-time factor (RTF).

## 2. Architectural Analysis and Discrepancy Resolution

### 2.1 The "Model Size" Trap

The two architectures were compared component by component to understand why models with similar parameter counts exhibit different computational footprints:

| Metric | S4-260155 [2] (16 kHz, ~3M) | S4-260165 [1] (32 kHz, ~3M) |
|--------|----------------|----------------|
| Input Rate | 16,000 Hz | 32,000 Hz |
| Total Stride | 320 (2×4×5×8) | 1280 (4×4×8×10) |
| Latent Rate | 50.0 Hz | 25.0 Hz |
| Encoder (MFlops/s) | 436.30 | 461.92 |
| Quantizer (MFlops/s) | 2.25 | 0.50 |
| Decoder (MFlops/s) | 984.50 | 1037.12 |
| Total MFlops/s | 1423.05 | 1499.54 |
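
The stride and latent-rate rows of the table can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function name `latent_rate_hz` and the `configs` dictionary are illustrative and do not come from either contribution.

```python
from math import prod


def latent_rate_hz(sample_rate_hz: int, strides: list[int]) -> float:
    """Latent frames per second after the encoder's strided downsampling."""
    total_stride = prod(strides)
    return sample_rate_hz / total_stride


# Input rates and stride factors as listed in the table.
configs = {
    "S4-260155 (16 kHz)": (16_000, [2, 4, 5, 8]),
    "S4-260165 (32 kHz)": (32_000, [4, 4, 8, 10]),
}

for name, (rate, strides) in configs.items():
    print(f"{name}: total stride {prod(strides)}, "
          f"latent rate {latent_rate_hz(rate, strides):.1f} Hz")
```

Doubling the input rate while quadrupling the total stride halves the latent rate, which is the trade-off examined in the analysis that follows.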

**Key Analysis:**

- The S4-260165 (32 kHz, ~3M) model runs at **twice the input rate** (32 kHz vs. 16 kHz), which increases encoder computational cost
- The S4-260165 model uses **4× the total stride** (1280 vs. 320), reducing the latent rate to 25 Hz (compared with the standard 50 Hz)
- The reduced latent rate significantly lowers decoder cost (fewer frames to upsample)
- The higher input-rate cost is offset by the lower latent-rate cost, resulting in comparable total MFlops/s (1499.54 vs. 1423.05)

**Conclusion:** Two models with nearly identical parameter counts can have vastly different runtimes depending on where the parameters sit (shallow vs. deep layers) and on the stride configuration.
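
The effect of parameter location can be illustrated with a single convolution layer. The sketch below assumes a plain 1-D convolution (no bias, no grouping), for which every output frame costs one MAC per weight. The layer dimensions are hypothetical and are not taken from either contribution.

```python
def conv1d_params(c_in: int, c_out: int, kernel: int) -> int:
    """Weight count of a plain 1-D convolution (bias ignored)."""
    return c_in * c_out * kernel


def conv1d_macs_per_second(c_in: int, c_out: int, kernel: int,
                           output_rate_hz: int) -> int:
    """Each output frame costs one MAC per weight."""
    return conv1d_params(c_in, c_out, kernel) * output_rate_hz


# Hypothetical layer: identical weights, placed at two different frame rates.
layer = dict(c_in=64, c_out=64, kernel=7)

params = conv1d_params(**layer)
near_input = conv1d_macs_per_second(**layer, output_rate_hz=16_000)
near_latent = conv1d_macs_per_second(**layer, output_rate_hz=50)

print(f"parameters: {params}")
print(f"MACs/s at 16 kHz frame rate: {near_input}")
print(f"MACs/s at 50 Hz frame rate:  {near_latent}")
print(f"cost ratio for the same parameters: {near_input / near_latent:.0f}x")
```

Under these assumptions the same weights cost 320 times more when they run at the input sample rate than at the latent rate, so parameter count alone cannot bound the operation count.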

### 2.2 Verification of Complexity Metrics

Theoretical complexity (GMACS) was recalculated to validate the analysis:

- Using the standard conversion, which counts one multiply-accumulate as two floating-point operations: **GMACS ≈ MFlops/s / 1000 × 0.5**
- The S4-260165 (~3M) model at 32 kHz yields ~1,499.5 MFlops/s
- **Calculated GMACS: 1499.5 / 1000 × 0.5 ≈ 0.75 GMACS**
- This is within about 5% of the reference value of **0.79 GMACS** reported in S4-260165
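
The conversion can be checked against the figures quoted in this document with a short sketch. The helper name `mflops_to_gmacs` is illustrative, and the inputs are the MFlops/s values stated in Sections 1 and 2.1.

```python
def mflops_to_gmacs(mflops_per_s: float) -> float:
    """Convert MFlops/s to GMACS, counting one MAC as two floating-point operations."""
    return mflops_per_s / 1000 * 0.5


# MFlops/s values quoted in this document.
quoted = {
    "S4-260165, 32 kHz (recalculated)": 1499.54,
    "S4-260155, 32 kHz (reported)": 2821.0,
    "S4-260155, 16 kHz (breakdown total)": 1423.05,
}

for label, mflops in quoted.items():
    print(f"{label}: {mflops_to_gmacs(mflops):.2f} GMACS")

# Relative gap between the recalculated value and the reported 0.79 GMACS.
reported = 0.79
recalculated = mflops_to_gmacs(1499.54)
print(f"gap vs. reported: {(reported - recalculated) / reported:.1%}")
```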

## 3. GMACS as the Metric

When RTF data from S4-260155 is plotted against **GMACS** (rather than Model Size), the data aligns consistently across architectures.

**Key Findings:**

- **RTF scales linearly with GMACS** across different CPU tiers (Efficiency, Performance, Prime cores)
- A specific GMACS budget (e.g., 2.0 GMACS) yields predictable RTF on a target CPU core and frequency, **regardless of architectural choices** (high-sample-rate input vs. large parameter count in decoder)
- This metric **decouples complexity constraint from specific architectural choices** (stride, latent rate), allowing codec designers flexibility in optimization
- High-complexity validation: the 20M model in S4-260155 (~5.14 GMACS) achieves an RTF of 0.9 in power-efficient execution mode on a high-end 2023 device. This is consistent with the mid-range Prime core (3.0 GHz) trend, where ~5.3 GMACS corresponds to RTF ≈ 1.0
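
To show how a GMACS budget could be turned into an RTF estimate, the sketch below assumes a purely proportional relation (a line through the origin) calibrated on the single point quoted above, ~5.3 GMACS at RTF ≈ 1.0 for the mid-range Prime core. The zero intercept, the function names and the default calibration argument are assumptions made for illustration. The actual fit in S4-260155 may include an offset and differs per CPU tier.

```python
def rtf_from_gmacs(gmacs: float, gmacs_at_rtf_one: float = 5.3) -> float:
    """Estimate RTF, assuming RTF is proportional to GMACS (zero intercept)."""
    return gmacs / gmacs_at_rtf_one


def gmacs_budget(target_rtf: float, gmacs_at_rtf_one: float = 5.3) -> float:
    """Largest GMACS value that meets a target RTF under the same assumption."""
    return target_rtf * gmacs_at_rtf_one


# Example GMACS values taken from this document.
for gmacs in (0.79, 2.0, 5.14):
    print(f"{gmacs:.2f} GMACS -> estimated RTF {rtf_from_gmacs(gmacs):.2f}")

print(f"budget for RTF 0.5: {gmacs_budget(0.5):.2f} GMACS")
```

The estimate for 5.14 GMACS lands slightly below 1.0, in line with the measured RTF of 0.9. That measurement comes from a different device and execution mode, so the comparison is only indicative.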

## 4. Conclusion

By adopting **GMACS as the primary complexity metric**, the apparent discrepancies between the data reported in different contributions are resolved. This enables a unified set of requirements that accurately reflects the real-time capability of mobile devices.

## 5. Proposal

**Propose to include this analysis in 3GPP TR 26.940**, specifically capturing:

- **Model Size is not a consistent proxy for complexity** across varying architectures (e.g., high-stride vs. low-stride configurations)
- **GMACS/GFLOPs demonstrates a strong linear correlation** with real-time performance on mobile devices
- This analysis provides a solid basis for defining complexity constraints for ULBC candidates

## References

[1] S4-260165, "[FS_ULBC] On ULBC complexity and RTF analysis"

[2] S4-260155, "[FS_ULBC] Analysis of AI Codec Real-Time Performance (RTF) and Complexity Scaling"