S4-260209 - AI Proposals

[FS_ULBC] Alignment Analysis on Complexity of DAC model

AI-Generated Proposals

Proposal

We propose to include the analysis presented in this contribution in 3GPP TR 26.940. Specifically, the text should capture two findings: that model size (parameter count) is not a consistent proxy for complexity across varying architectures (e.g., high-stride vs. low-stride), and that GMACS/GFLOPS shows a strong linear correlation with real-time performance on mobile devices. Documenting this analysis will provide a solid basis for defining the complexity constraints for the ULBC candidate.

Document Information

TDoc:
S4-260209

Source:
vivo Mobile Communication Co.,

Type:
pCR

For:
Agreement

Title: [FS_ULBC] Alignment Analysis on Complexity of DAC model

Agenda item: 7.8

Agenda item description: FS_ULBC (Study on Ultra Low Bitrate Speech Codec)

Doc type: pCR

For action: Agreement

Abstract: In this meeting, several companies have presented complexity analyses for AI-based codecs suitable for ULBC, notably S4-260165 [1] (Dolby et al.) and S4-260155 [2] (vivo et al.). While both contributions agree on the general feasibility of AI codecs on modern smartphones, there appears to be a discrepancy when "Model Size" (parameter count) is compared with Real-Time Factor (RTF). Specifically, for models of similar parameter count (e.g., ~3M parameters), the reported complexity and RTF vary significantly between the architectures tested in [1] and [2]. For instance, the ~3M parameter model in [1] (32 kHz) requires 0.79 GMACS, whereas the equivalent ~3M parameter model in [2] (32 kHz) requires approximately 1.41 GMACS (derived from 2821 MFLOPS). In fact, the complexity of the model in [1] (0.79 GMACS) aligns more closely with that of the model in [2] operating at 16 kHz (~0.70 GMACS), despite the difference in sampling rate. This contribution provides a detailed analysis of this discrepancy. It demonstrates that "Model Size" is an insufficient metric for constraining complexity across different neural architectures, or even within the same architecture (e.g., DAC). Instead, we show that GMACS (Giga Multiply-Accumulate Operations per Second) correlates robustly and linearly with RTF across different architectures, sampling rates, and frame rates.
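
The unit conversion behind the figures quoted in the abstract can be checked with a minimal sketch. It assumes the common convention that one multiply-accumulate counts as two floating-point operations, which is consistent with 2821 MFLOPS mapping to roughly 1.41 GMACS but is not stated explicitly in the abstract. The function and variable names are hypothetical and chosen only for illustration.

```python
"""Unit conversion behind the GMACS figures quoted in the abstract."""

# Assumption (not stated in the abstract): 1 multiply-accumulate = 2 FLOPs.
FLOPS_PER_MAC = 2.0


def mflops_to_gmacs(mflops_per_s, flops_per_mac=FLOPS_PER_MAC):
    """Convert a MFLOPS figure to GMACS (giga multiply-accumulates per second)."""
    return mflops_per_s / flops_per_mac / 1000.0


# Figures taken from the abstract, in GMACS.
model_1_32k = 0.79                     # [1], ~3M parameters, 32 kHz
model_2_32k = mflops_to_gmacs(2821.0)  # [2], ~3M parameters, 32 kHz
model_2_16k = 0.70                     # [2], 16 kHz (approximate)

# Same parameter count and sampling rate, different architecture.
ratio_same_size = model_2_32k / model_1_32k
# Model [1] at 32 kHz against model [2] at 16 kHz.
ratio_cross_rate = model_1_32k / model_2_16k

print(f"[2] at 32 kHz: {model_2_32k:.2f} GMACS")
print(f"[2] vs [1], ~3M parameters, 32 kHz: {ratio_same_size:.2f}x")
print(f"[1] at 32 kHz vs [2] at 16 kHz: {ratio_cross_rate:.2f}x")
```

Under that assumption, the two ~3M parameter models differ by roughly 1.8x in GMACS at the same sampling rate, while the model in [1] at 32 kHz is within about 13% of the model in [2] at 16 kHz, which is the mismatch the abstract attributes to parameter count being a poor complexity proxy.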

Release: Rel-20

Specification: 26.940

Version: 0.5.1

Related WIs: FS_ULBC

Spec: 26.940

Contact: Wang Dong

Uploaded: 2026-02-03T17:47:01.727000

Contact ID: 107237

Revised to: S4-260443

TDoc Status: revised

Reservation date: 03/02/2026 17:43:14

Agenda item sort order: 20