Proposal: It is proposed to include the findings of this RTF analysis in TR 26.940 to inform the selection of complexity constraints for the ULBC candidate.
Abstract: As part of the study on the new Ultra Low Bitrate Speech Codec (ULBC) [1], it is necessary to establish complexity constraints that reflect real-world device capabilities. Previous contributions have analyzed theoretical complexity using static metrics such as FLOPs and WMOPS [2] [5]. However, static metrics often fail to capture system-level bottlenecks, such as memory bandwidth pressure and thermal constraints on mobile Systems-on-Chip (SoCs).
This contribution presents a comprehensive performance analysis of a neural audio codec (based on the Descript Audio Codec architecture) running on a representative mid-range mobile platform. By sweeping across model sizes (1M to 74M parameters) and sample rates (8, 16, 32 kHz), we evaluate the correlation between theoretical complexity and the Real-Time Factor (RTF).
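For orientation, the sketch below shows one way an RTF value can be obtained from wall-clock timing. It assumes the common convention that RTF is processing time divided by the duration of the audio processed, so that values below 1 indicate faster-than-real-time operation; this abstract does not state which convention the contribution uses. The names real_time_factor, measure_rtf and codec_fn are hypothetical and are not taken from the contribution or from any codec implementation.

```python
import time
from typing import Callable, Sequence


def real_time_factor(processing_seconds: float,
                     num_samples: int,
                     sample_rate_hz: int) -> float:
    """Return processing time divided by audio duration.

    Assumed convention: RTF < 1 means faster than real time.
    """
    if num_samples <= 0 or sample_rate_hz <= 0:
        raise ValueError("num_samples and sample_rate_hz must be positive")
    audio_seconds = num_samples / sample_rate_hz
    return processing_seconds / audio_seconds


def measure_rtf(codec_fn: Callable[[Sequence[float]], object],
                samples: Sequence[float],
                sample_rate_hz: int,
                repeats: int = 5) -> float:
    """Time codec_fn (a hypothetical encode+decode callable) on samples.

    The median of several runs is used to reduce the influence of outliers
    such as scheduler jitter; this is an illustrative choice only.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        codec_fn(samples)
        timings.append(time.perf_counter() - start)
    timings.sort()
    median_seconds = timings[len(timings) // 2]
    return real_time_factor(median_seconds, len(samples), sample_rate_hz)


if __name__ == "__main__":
    # One second of silence at 16 kHz, passed through a placeholder "codec".
    one_second = [0.0] * 16000
    rtf = measure_rtf(lambda x: [v * 0.5 for v in x], one_second, 16000)
    print(f"RTF = {rtf:.6f}")
```

Unlike a FLOPs or WMOPS count, a timing-based figure of this kind reflects whatever the platform actually does during the run, including memory traffic and any thermal throttling, which is why it can diverge from static metrics.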