S4-260209 Metadata - 3GPP Contribution Reviewer

Document Information

Title

[FS_ULBC] Alignment Analysis on Complexity of DAC model

Source

vivo Mobile Communication Co.,

Type

pCR

For

Agreement

Release

Rel-20

Specification

26.94

Abstract

In this meeting, several companies have presented complexity analyses for AI-based codecs suitable for ULBC, notably S4-260165 [1] (Dolby et al.) and S4-260155 [2] (vivo et al.). While both contributions agree on the general feasibility of AI codecs on modern smartphones, there appeared to be a discrepancy when relating "Model Size" (parameter count) to Real-Time Factor (RTF). Specifically, for models of similar parameter count (e.g., ~3M parameters), the reported complexity and RTF varied significantly between the architectures tested in [1] and [2]. For instance, the ~3M-parameter model in [1] (32 kHz) requires 0.79 GMACS, whereas the comparable ~3M-parameter model in [2] (32 kHz) requires approximately 1.41 GMACS (derived from 2821 MFlops/s). In fact, the complexity of the model in [1] (0.79 GMACS) aligns more closely with that of the model in [2] operating at 16 kHz (~0.70 GMACS), despite the difference in sampling rate. This contribution provides a detailed analysis of this discrepancy. It demonstrates that "Model Size" is an insufficient metric for constraining complexity across different neural architectures, or even within the same architecture (e.g., DAC). Instead, we show that GMACS (Giga Multiply-Accumulate Operations per Second) correlates robustly and linearly with RTF across different architectures, sampling rates, and frame rates.
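The arithmetic behind the figures quoted in the abstract can be sketched as follows. This is an illustration only, not taken from the contribution: it assumes the common convention that one multiply-accumulate counts as two floating-point operations, which is consistent with 2821 MFlops/s mapping to roughly 1.41 GMACS. The function and variable names are hypothetical.

```python
# Assumption (not stated in the abstract): 1 MAC = 2 floating-point operations.
FLOPS_PER_MAC = 2


def mflops_to_gmacs(mflops_per_s: float, flops_per_mac: int = FLOPS_PER_MAC) -> float:
    """Convert a complexity figure in MFlops/s to GMACS (giga-MACs per second)."""
    macs_in_millions = mflops_per_s / flops_per_mac
    return macs_in_millions / 1000.0


# Figures quoted in the abstract for models of about 3M parameters.
gmacs_ref1_32k = 0.79                    # model in [1], 32 kHz
gmacs_ref2_32k = mflops_to_gmacs(2821)   # model in [2], 32 kHz
gmacs_ref2_16k = 0.70                    # model in [2], 16 kHz (approximate)

# Same parameter count and sampling rate, very different compute.
ratio_same_rate = gmacs_ref2_32k / gmacs_ref1_32k
# Different sampling rate, yet much closer compute.
ratio_cross_rate = gmacs_ref1_32k / gmacs_ref2_16k

print(round(gmacs_ref2_32k, 2))    # 1.41
print(round(ratio_same_rate, 2))   # 1.79
print(round(ratio_cross_rate, 2))  # 1.13
```

Under that assumption, two models of similar parameter count differ by a factor of about 1.8 in compute at the same sampling rate, which illustrates why parameter count alone does not constrain complexity.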

TDoc	S4-260209
Title	[FS_ULBC] Alignment Analysis on Complexity of DAC model
Source	vivo Mobile Communication Co.,
Agenda item	7.8
Agenda item description	FS_ULBC (Study on Ultra Low Bitrate Speech Codec)
Doc type	pCR
For action	Agreement
Release	Rel-20
Specification	26.94
Version	0.5.1
Related WIs	FS_ULBC
download_url	https://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_135_India/Docs/S4-260209.zip
Contact	Wang Dong
Uploaded	2026-02-03T17:47:01.727000
Contact ID	107237
Revised to	S4-260443
TDoc Status	revised
Reservation date	03/02/2026 17:43:14
Agenda item sort order	20
