TDoc: S4-260443

Meeting: TSGS4_135_India | Agenda Item: 7.8

Document Information
Title

[FS_ULBC] Alignment Analysis on Complexity of DAC model

Source

vivo Mobile Communication Co.,

Type

pCR

For

Agreement

Release

Rel-20

Specification

26.94

Abstract

In this meeting, several companies have presented complexity analyses for AI-based codecs suitable for ULBC, notably S4-260165 [1] (Dolby et al.) and S4-260155 [2] (vivo et al.). While both contributions agree on the general feasibility of AI codecs on modern smartphones, they appear to disagree when "Model Size" (parameter count) is compared against Real-Time Factor (RTF). Specifically, for models of similar parameter count (e.g., ~3M parameters), the reported complexity and RTF vary significantly between the architectures tested in [1] and [2]. For instance, the ~3M-parameter model in [1] (32 kHz) requires 0.79 GMACS, whereas the comparable ~3M-parameter model in [2] (32 kHz) requires approximately 1.41 GMACS (derived from 2821 MFlops/s at two floating-point operations per multiply-accumulate). In fact, the complexity of the model in [1] (0.79 GMACS) aligns more closely with that of the model in [2] operating at 16 kHz (~0.70 GMACS), despite the difference in sampling rate. This contribution analyses the discrepancy in detail. It demonstrates that "Model Size" is an insufficient metric for constraining complexity across different neural architectures, or even within the same architecture (e.g., DAC). Instead, we show that GMACS (Giga Multiply-Accumulate Operations per Second) correlates robustly and linearly with RTF across different architectures, sampling rates, and frame rates.
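The arithmetic behind the abstract can be sketched as follows. This is a minimal illustration, not the method used in [1] or [2]: it assumes one multiply-accumulate is counted as two floating-point operations (consistent with 2821 MFlops/s mapping to about 1.41 GMACS), and the helper names `mflops_to_gmacs` and `conv1d_gmacs`, as well as the layer shape and output rates, are hypothetical values chosen only to show why parameter count alone does not fix complexity.

```python
def mflops_to_gmacs(mflops_per_s: float, flops_per_mac: float = 2.0) -> float:
    """Convert MFLOPs/s to GMACs/s.

    Assumption: one MAC is counted as two FLOPs (one multiply, one add).
    """
    return mflops_per_s / flops_per_mac / 1000.0


def conv1d_gmacs(in_ch: int, out_ch: int, kernel: int, out_rate_hz: float) -> float:
    """GMACs/s of a dense 1-D convolution (bias ignored).

    Each output frame costs one MAC per weight, so the per-second cost
    is the weight count multiplied by the layer's output frame rate.
    """
    weights = in_ch * out_ch * kernel
    return weights * out_rate_hz / 1e9


# Figure quoted in the abstract for the ~3M-parameter, 32 kHz model in [2].
gmacs_ref2_32k = mflops_to_gmacs(2821.0)

# Figure quoted in the abstract for the ~3M-parameter, 32 kHz model in [1].
gmacs_ref1_32k = 0.79

# Same nominal model size, yet the complexity differs by this factor.
complexity_ratio = gmacs_ref2_32k / gmacs_ref1_32k

# Hypothetical layer (NOT taken from [1] or [2]): identical weight count,
# evaluated at two different output frame rates.
layer = dict(in_ch=256, out_ch=256, kernel=7)
slow = conv1d_gmacs(**layer, out_rate_hz=50.0)    # e.g. near the bottleneck
fast = conv1d_gmacs(**layer, out_rate_hz=4000.0)  # e.g. near the waveform

if __name__ == "__main__":
    print(f"[2] at 32 kHz: {gmacs_ref2_32k:.2f} GMACS")
    print(f"[2] / [1] complexity ratio: {complexity_ratio:.2f}")
    print(f"same weights, 50 Hz vs 4 kHz: {slow:.4f} vs {fast:.4f} GMACS")
```

Under these assumptions, two layers with identical parameter counts differ in cost by exactly the ratio of their output rates, which is one plausible reason why models of similar size can report very different GMACS and RTF figures.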

TDoc S4-260443
Title [FS_ULBC] Alignment Analysis on Complexity of DAC model
Source vivo Mobile Communication Co.,
Agenda item 7.8
Agenda item description FS_ULBC (Study on Ultra Low Bitrate Speech Codec)
Doc type pCR
For action Agreement
Release Rel-20
Specification 26.94
Version 0.5.1
Related WIs FS_ULBC
download_url https://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_135_India/Docs/S4-260443.zip
Contact Wang Dong
Uploaded 2026-02-12T13:53:11.590000
Contact ID 107237
TDoc Status agreed
Is revision of S4-260209
Reservation date 12/02/2026 08:41:23
Agenda item sort order 20