TDoc: S4-260209

Meeting: TSGS4_135_India | Agenda Item: 7.8

Document Information
Title: [FS_ULBC] Alignment Analysis on Complexity of DAC model
Source: vivo Mobile Communication Co.,
Type: pCR
For: Agreement
Release: Rel-20
Specification: 26.94

Abstract

In this meeting, several companies have presented complexity analyses for AI-based codecs suitable for ULBC, notably S4-260165 [1] (Dolby et al.) and S4-260155 [2] (vivo et al.). Both contributions agree that AI codecs are generally feasible on modern smartphones, but they appear to disagree when "Model Size" (parameter count) is compared against Real-Time Factor (RTF): for models of similar parameter count (e.g., ~3M parameters), the reported complexity and RTF differ significantly between the architectures tested in [1] and [2]. For instance, the ~3M parameter model in [1] (32 kHz) requires 0.79 GMACS, whereas the equivalent ~3M parameter model in [2] (32 kHz) requires approximately 1.41 GMACS (derived from 2821 MFlops/s). The complexity of the model in [1] (0.79 GMACS) is in fact closer to that of the model in [2] operating at 16 kHz (~0.70 GMACS), despite the difference in sampling rate. This contribution analyses the discrepancy in detail. It demonstrates that "Model Size" is an insufficient metric for constraining complexity across different neural architectures, or even within the same architecture (e.g., DAC). Instead, we show that GMACS (Giga Multiply-Accumulate Operations per Second) correlates robustly and linearly with RTF across different architectures, sampling rates, and frame rates.
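The comparison in the abstract can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: it assumes the common convention that one multiply-accumulate counts as two floating-point operations, which is consistent with the ~1.41 GMACS value the abstract derives from 2821 MFlops/s. The function and variable names are hypothetical and are not taken from [1] or [2].

```python
# Assumption: 1 MAC = 2 FLOPs (one multiply plus one add). This is consistent
# with the ~1.41 GMACS value the abstract derives from 2821 MFlops/s.
FLOPS_PER_MAC = 2.0


def mflops_to_gmacs(mflops: float) -> float:
    """Convert a rate in MFLOPS to GMACS under the 2-FLOPs-per-MAC assumption."""
    return mflops / FLOPS_PER_MAC / 1000.0


# Figures quoted in the abstract; both models have ~3M parameters.
gmacs_ref1_32k = 0.79                     # model in [1], 32 kHz
gmacs_ref2_32k = mflops_to_gmacs(2821.0)  # model in [2], 32 kHz
gmacs_ref2_16k = 0.70                     # model in [2], 16 kHz (approximate)

# Same parameter count and sampling rate, yet very different compute load.
ratio_32k = gmacs_ref2_32k / gmacs_ref1_32k

# Distance of the [1] model from each of the [2] operating points.
gap_to_16k = abs(gmacs_ref1_32k - gmacs_ref2_16k)
gap_to_32k = abs(gmacs_ref1_32k - gmacs_ref2_32k)

print(f"[2] at 32 kHz: {gmacs_ref2_32k:.2f} GMACS")
print(f"[2] / [1] at 32 kHz: {ratio_32k:.2f}x")
print(f"gap of [1] to [2] at 16 kHz: {gap_to_16k:.2f} GMACS")
print(f"gap of [1] to [2] at 32 kHz: {gap_to_32k:.2f} GMACS")
```

Under these assumptions the two ~3M parameter models differ by a factor of roughly 1.8 in GMACS at the same sampling rate, which is why parameter count alone does not constrain complexity.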

Agenda item description: FS_ULBC (Study on Ultra Low Bitrate Speech Codec)
Version: 0.5.1
Related WIs: FS_ULBC
Download URL: https://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_135_India/Docs/S4-260209.zip
Contact: Wang Dong
Uploaded: 2026-02-03T17:47:01.727000
Contact ID: 107237
Revised to: S4-260443
TDoc Status: revised
Reservation date: 03/02/2026 17:43:14
Agenda item sort order: 20