[FS_ULBC] Alignment Analysis on Complexity of DAC model
This contribution addresses a significant discrepancy in complexity reporting for AI-based codecs in the ULBC study. Two contributions (S4-260165 from Dolby et al. and S4-260155 from vivo et al.) both reported models with approximately 3M parameters but showed substantially different complexity metrics:
Notably, the S4-260165 model's complexity (0.79 GMACS) aligns more closely with the S4-260155 model operating at 16 kHz (~0.70 GMACS), despite the difference in sampling rate.
The contribution demonstrates that model size (parameter count) is an insufficient metric for constraining complexity across different neural architectures, and proposes GMACS as a robust, architecture-agnostic metric that correlates linearly with the real-time factor (RTF).
A detailed breakdown comparison was performed between the two architectures to understand why models with similar parameter counts exhibit different computational footprints:
| Metric | [2] (16k, ~3M) | [1] (32k, ~3M) |
|--------|----------------|----------------|
| Input Rate | 16,000 Hz | 32,000 Hz |
| Total Stride | 320 (2×4×5×8) | 1280 (4×4×8×10) |
| Latent Rate | 50.0 Hz | 25.0 Hz |
| Encoder MACs (M) | 436.30 | 461.92 |
| Quantizer MACs (M) | 2.25 | 0.50 |
| Decoder MACs (M) | 984.50 | 1037.12 |
| Total MFlops/s | 1423.05 | 1499.54 |
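The stride, latent-rate, and total rows of the table can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of either contribution: the dictionary layout and the `summarize` helper are illustrative assumptions, and the numbers are copied from the table as listed, without interpreting their units.

```python
from math import prod

# Values copied from the comparison table; the structure and names are illustrative.
models = {
    "[2] 16k": {
        "rate_hz": 16_000,
        "strides": (2, 4, 5, 8),
        "cost_m": {"encoder": 436.30, "quantizer": 2.25, "decoder": 984.50},
    },
    "[1] 32k": {
        "rate_hz": 32_000,
        "strides": (4, 4, 8, 10),
        "cost_m": {"encoder": 461.92, "quantizer": 0.50, "decoder": 1037.12},
    },
}

def summarize(model):
    """Return (total stride, latent rate in Hz, summed component cost in millions)."""
    total_stride = prod(model["strides"])              # product of per-stage strides
    latent_rate_hz = model["rate_hz"] / total_stride   # latent frames per second
    total_m = round(sum(model["cost_m"].values()), 2)  # encoder + quantizer + decoder
    return total_stride, latent_rate_hz, total_m

for name, model in models.items():
    print(name, summarize(model))
```

The check makes the structural difference explicit: the 32 kHz model doubles the input rate but quadruples the total stride, so its latent representation runs at half the frame rate of the 16 kHz model.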
Key Analysis:
Conclusion: Two models with the same parameter count can have vastly different runtimes, depending on where the parameters sit (shallow layers that run at a high sample rate vs. deep layers that run at the low latent rate) and on the stride configuration.
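The conclusion above can be illustrated with a single dense 1-D convolution, whose weights are applied once per output frame. The sketch below is hypothetical: the layer dimensions (64 channels, kernel size 7) and the `conv1d_cost` helper are assumptions chosen for illustration and are not taken from either model. Bias terms, dilation, and grouping are ignored.

```python
def conv1d_cost(in_ch, out_ch, kernel, output_rate_hz):
    """Return (parameter count, MACs per second) for a dense 1-D convolution.

    Assumes a plain (non-grouped) convolution without bias: each output frame
    needs in_ch * out_ch * kernel multiply-accumulates.
    """
    params = in_ch * out_ch * kernel
    macs_per_s = params * output_rate_hz
    return params, macs_per_s

# The same hypothetical layer placed at two depths of a codec network.
shallow = conv1d_cost(64, 64, 7, output_rate_hz=16_000)  # near the waveform
deep = conv1d_cost(64, 64, 7, output_rate_hz=50)         # at a 50 Hz latent rate

print("params:", shallow[0], deep[0])            # identical parameter counts
print("MACs/s:", shallow[1], deep[1])            # very different compute
print("ratio:", shallow[1] // deep[1])           # equals the rate ratio, 320
```

Under these assumptions the parameter count is unchanged while the compute differs by the ratio of the two frame rates, which is why parameter count alone cannot bound complexity.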
Theoretical complexity (GMACS) was recalculated to validate the analysis:
When RTF data from S4-260155 is plotted against GMACS (rather than Model Size), the data aligns consistently across architectures.
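One way to quantify how consistently RTF aligns with GMACS is an ordinary least-squares line fit together with the Pearson correlation coefficient. The helper below is a sketch under that assumption, not the method used in S4-260155; the function name `fit_rtf_vs_gmacs` is hypothetical, and no measurement data is included.

```python
def fit_rtf_vs_gmacs(gmacs, rtf):
    """Least-squares fit rtf ~ slope * gmacs + intercept.

    Returns (slope, intercept, r), where r is the Pearson correlation.
    Both inputs are equal-length sequences of measured values.
    """
    n = len(gmacs)
    if n < 2 or n != len(rtf):
        raise ValueError("need at least two paired measurements")
    mean_x = sum(gmacs) / n
    mean_y = sum(rtf) / n
    sxx = sum((x - mean_x) ** 2 for x in gmacs)
    syy = sum((y - mean_y) ** 2 for y in rtf)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gmacs, rtf))
    if sxx == 0 or syy == 0:
        raise ValueError("inputs must not be constant")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r
```

Called with the measured (GMACS, RTF) pairs pooled across architectures, a value of `r` close to 1 would support the linear relationship described above, whereas the same fit against parameter count would be expected to scatter.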
Key Findings:
Adopting GMACS as the primary complexity metric resolves the apparent discrepancies between the data reported in different contributions. This enables a unified set of requirements that accurately reflects the real-time capability of mobile devices.
It is proposed to include this analysis in 3GPP TR 26.940, specifically capturing:
[1] S4-260165, "[FS_ULBC] On ULBC complexity and RTF analysis"
[2] S4-260155, "[FS_ULBC] Analysis of AI Codec Real-Time Performance (RTF) and Complexity Scaling"