S4-260209 - AI Proposals

[FS_ULBC] Alignment Analysis on Complexity of DAC model

Proposal

We propose to include the analysis presented in this contribution in 3GPP TR 26.940. Specifically, the text should capture two findings: Model Size (parameter count) is not a consistent proxy for complexity across different architectures (e.g., high-stride vs. low-stride), and GMACS (or GFLOPs) shows a strong linear correlation with real-time performance on mobile devices. Documenting this analysis will provide a solid basis for defining the complexity constraints for the ULBC candidate.
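One way to see why parameter count is not a consistent complexity proxy: in a dense 1D convolution every weight is applied once per output time step, so MACs per second are roughly the layer's weight count multiplied by its output rate. A layer with the same parameters therefore costs less when it sits after more aggressive downsampling (higher stride). The minimal sketch below assumes a dense, ungrouped convolution and ignores bias additions and activations; the function names and layer sizes are hypothetical and are not taken from the contributions discussed.

```python
def conv1d_params(c_in: int, c_out: int, kernel: int, bias: bool = True) -> int:
    """Number of trainable parameters in a dense (ungrouped) 1D convolution."""
    return c_in * c_out * kernel + (c_out if bias else 0)


def conv1d_macs_per_second(c_in: int, c_out: int, kernel: int,
                           input_rate_hz: float, stride: int) -> float:
    """MACs per second: each output step costs c_in * c_out * kernel MACs.

    Bias additions and activation functions are ignored (assumption).
    """
    output_rate_hz = input_rate_hz / stride
    return c_in * c_out * kernel * output_rate_hz


if __name__ == "__main__":
    # Hypothetical layer: 256 -> 256 channels, kernel 7, fed at 500 frames/s.
    c_in, c_out, kernel, input_rate_hz = 256, 256, 7, 500.0
    params = conv1d_params(c_in, c_out, kernel)
    for stride in (1, 2, 4):
        gmacs = conv1d_macs_per_second(c_in, c_out, kernel, input_rate_hz, stride) / 1e9
        # Same parameter count on every line, but complexity drops with stride.
        print(f"stride={stride}: params={params}, GMACS={gmacs:.4f}")
```

The same reasoning extends to a whole network: two models with ~3M parameters can differ substantially in GMACS if their layers run at different temporal resolutions.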

Document Information
Source:
vivo Mobile Communication Co.,
Type:
pCR
For:
Agreement
Agenda item: 7.8
Agenda item description: FS_ULBC (Study on Ultra Low Bitrate Speech Codec)
Abstract: In this meeting, several companies have presented complexity analyses for AI-based codecs suitable for ULBC, notably S4-260165 [1] (Dolby et al.) and S4-260155 [2] (vivo et al.). While both contributions agree on the general feasibility of AI codecs on modern smartphones, there appeared to be a discrepancy when comparing "Model Size" (parameter count) with Real-Time Factor (RTF). Specifically, for models of similar parameter count (e.g., ~3M parameters), the reported complexity and RTF varied significantly between the architectures tested in [1] and [2]. For instance, the ~3M-parameter model in [1] (32 kHz) requires 0.79 GMACS, whereas the equivalent ~3M-parameter model in [2] (32 kHz) requires approximately 1.41 GMACS (derived from 2821 MFLOP/s, counting one MAC as two FLOPs). In fact, the complexity of the model in [1] (0.79 GMACS) is closer to that of the model in [2] operating at 16 kHz (~0.70 GMACS), despite the difference in sampling rate. This contribution analyses this discrepancy in detail. It demonstrates that "Model Size" is an insufficient metric for constraining complexity across different neural architectures, or even within the same architecture (e.g., DAC). Instead, we show that GMACS (Giga Multiply-Accumulate Operations per Second) provides a robust, linear correlation with RTF across different architectures, sampling rates, and frame rates.
Release: Rel-20
Specification: 26.940
Version: 0.5.1
Related WIs: FS_ULBC
Contact: Wang Dong
Uploaded: 2026-02-03T17:47:01.727000
Contact ID: 107237
Revised to: S4-260443
TDoc Status: revised
Reservation date: 03/02/2026 17:43:14
Agenda item sort order: 20
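The arithmetic behind the GMACS figures in the abstract, and the linear-correlation check it argues for, can be sketched in Python. This is a minimal illustration, not the analysis method of the contribution: the helper names `mflops_to_gmacs` and `fit_rtf_vs_gmacs` are hypothetical, the conversion assumes one MAC counts as two FLOPs (the convention under which 2821 MFLOP/s gives ~1.41 GMACS), and the fit is a plain ordinary least-squares regression with Pearson correlation. No RTF measurements from [1] or [2] are reproduced.

```python
import math
from typing import Sequence, Tuple


def mflops_to_gmacs(mflop_per_s: float, flops_per_mac: float = 2.0) -> float:
    """Convert MFLOP/s to GMAC/s, assuming each MAC counts as `flops_per_mac` FLOPs."""
    return mflop_per_s / flops_per_mac / 1000.0


def fit_rtf_vs_gmacs(gmacs: Sequence[float], rtf: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit RTF ~= slope * GMACS + intercept.

    Returns (slope, intercept, pearson_r). Inputs are paired measurements,
    one per model configuration (architecture, sampling rate, frame rate).
    """
    n = len(gmacs)
    if n != len(rtf) or n < 2:
        raise ValueError("need at least two paired (GMACS, RTF) measurements")
    mean_x = sum(gmacs) / n
    mean_y = sum(rtf) / n
    sxx = sum((x - mean_x) ** 2 for x in gmacs)
    syy = sum((y - mean_y) ** 2 for y in rtf)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gmacs, rtf))
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("GMACS and RTF values must each vary across measurements")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    pearson_r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, pearson_r


if __name__ == "__main__":
    # Figures quoted in the abstract (~3M-parameter models at 32 kHz).
    gmacs_2 = mflops_to_gmacs(2821.0)  # model in [2], reported as MFLOP/s
    gmacs_1 = 0.79                     # model in [1], reported directly in GMACS
    print(f"[2] at 32 kHz: {gmacs_2:.2f} GMACS")
    print(f"[2]/[1] complexity ratio: {gmacs_2 / gmacs_1:.2f}")
```

Feeding `fit_rtf_vs_gmacs` with paired (GMACS, RTF) measurements from several architectures, sampling rates and frame rates would quantify the claimed linearity; repeating the fit with parameter count in place of GMACS would show whether Model Size tracks RTF equally well.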