Meeting: TSGS4_135_India | Agenda Item: 7.8
[FS_ULBC] Alignment Analysis on Complexity of DAC model
vivo Mobile Communication Co.,
pCR
Agreement
In this meeting, several companies presented complexity analyses for AI-based codecs suitable for ULBC, notably S4-260165 [1] (Dolby et al.) and S4-260155 [2] (vivo et al.). While both contributions agree on the general feasibility of AI codecs on modern smartphones, a discrepancy appears when "Model Size" (parameter count) is compared with Real-Time Factor (RTF). Specifically, for models of similar parameter count (e.g., ~3M parameters), the reported complexity and RTF vary significantly between the architectures tested in [1] and [2]. For instance, the ~3M-parameter model in [1] (32 kHz) requires 0.79 GMACS, whereas the comparable ~3M-parameter model in [2] (32 kHz) requires approximately 1.41 GMACS (derived from 2821 MFlops/s, counting one multiply-accumulate as two floating-point operations). In fact, the complexity of the model in [1] (0.79 GMACS) is closer to that of the model in [2] operating at 16 kHz (~0.70 GMACS), despite the difference in sampling rate.

This contribution provides a detailed analysis of this discrepancy. It demonstrates that "Model Size" is an insufficient metric for constraining complexity across different neural architectures, or even within the same architecture (e.g., DAC). Instead, we show that GMACS (Giga Multiply-Accumulate Operations per Second) correlates linearly and robustly with RTF across different architectures, sampling rates, and frame rates.
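The GMACS figure quoted for [2] can be reproduced with a short calculation. The sketch below assumes the common convention that one multiply-accumulate counts as two floating-point operations; the function and variable names are illustrative and do not come from either contribution.

```python
# Assumption: 1 MAC = 2 FLOPs (one multiply plus one add).
FLOPS_PER_MAC = 2


def mflops_to_gmacs(mflops_per_s: float) -> float:
    """Convert a rate in MFLOP/s to GMAC/s under the 1 MAC = 2 FLOPs assumption."""
    return mflops_per_s / FLOPS_PER_MAC / 1000.0


# Figures quoted in the abstract (variable names are hypothetical).
gmacs_ref2_32k = mflops_to_gmacs(2821.0)  # ~3M-parameter model in [2], 32 kHz
gmacs_ref1_32k = 0.79                     # ~3M-parameter model in [1], 32 kHz
gmacs_ref2_16k = 0.70                     # model in [2] at 16 kHz (approximate)

# Two models of similar parameter count, same sampling rate, different compute.
ratio_32k = gmacs_ref2_32k / gmacs_ref1_32k

# Gap between [1] at 32 kHz and [2] at 16 kHz.
gap_16k = abs(gmacs_ref1_32k - gmacs_ref2_16k)

print(f"[2] at 32 kHz: {gmacs_ref2_32k:.2f} GMACS")
print(f"[2] / [1] at 32 kHz: {ratio_32k:.2f}x")
print(f"|[1] at 32 kHz - [2] at 16 kHz|: {gap_16k:.2f} GMACS")
```

Under this assumption, the two ~3M-parameter models differ in compute by a factor of roughly 1.8 at the same sampling rate, which illustrates why parameter count alone does not bound complexity.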
| Field | Value |
|---|---|
| TDoc | S4-260443 |
| Title | [FS_ULBC] Alignment Analysis on Complexity of DAC model |
| Source | vivo Mobile Communication Co., |
| Agenda item | 7.8 |
| Agenda item description | FS_ULBC (Study on Ultra Low Bitrate Speech Codec) |
| Doc type | pCR |
| For action | Agreement |
| Release | Rel-20 |
| Specification | 26.94 |
| Version | 0.5.1 |
| Related WIs | FS_ULBC |
| download_url | https://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_135_India/Docs/S4-260443.zip |
| For | Agreement |
| Spec | 26.94 |
| Type | pCR |
| Contact | Wang Dong |
| Uploaded | 2026-02-12T13:53:11.590000 |
| Contact ID | 107237 |
| TDoc Status | agreed |
| Is revision of | S4-260209 |
| Reservation date | 2026-02-12 08:41:23 |
| Agenda item sort order | 20 |