Meeting: TSGS4_135_India | Agenda Item: 11.1
| TDoc | S4-260098 |
| Title | demonstration of real-time ai codec transmission in WebRTC |
| Source | Huawei Tech.(UK) Co.. Ltd |
| Agenda item | 11.1 |
| Agenda item description | FS_6G_MED (Study on Media aspects for 6G System) |
| Doc type | discussion |
| For action | Discussion |
| Release | Rel-20 |
| download_url | https://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_135_India/Docs/S4-260098.zip |
| Contact | Rufail Mekuria |
| Uploaded | 2026-02-03T08:50:06.013000 |
| Contact ID | 104180 |
| TDoc Status | noted |
| Reservation date | 02/02/2026 14:00:19 |
| Agenda item sort order | 60 |
[Technical] The contribution does not map the demo to any concrete Rel-20 normative work item deliverable (e.g., RTP payload format specification, SDP signaling, WebRTC integration requirements), so it is unclear what SA4 action is requested beyond a general “take into account.”
[Technical] The “Custom RTP Payload Format Design” is underspecified and not aligned with 3GPP/RTC practice: no encoding name (media subtype), no clock rate, no timestamping rules, no marker-bit semantics, no fragmentation/reassembly rules, and no handling of packet loss/out-of-order beyond “aiortc buffers,” which is not a spec.
[Technical] The proposed payload header fields (“Latent Shape | Hyperprior Byte Length | Latent Byte Length”) lack bit-level definition (field sizes, endianness, allowed ranges) and do not address how “Latent Shape” is encoded or negotiated, making interoperability impossible.
[Technical] Fragmentation is described as “large payloads fragmented due to MTU limitations” with aiortc appending RTP headers, but there is no defined fragmentation unit, no FU indicator/header, and no recovery behavior; relying on library behavior is not acceptable for a payload format intended for standardization.
[Technical] The document claims “RTP retransmission enabled” but does not specify whether this is RTX (RFC 4588), NACK (RFC 4585), or a WebRTC-specific mechanism, nor how the AI codec payload interacts with retransmission, FEC, or congestion control, all of which are key to real-time media feasibility.
[Technical] Congestion control is mentioned generically (“RTP packets transmitted with congestion control”) without stating which algorithm (e.g., GCC/SCReAM), what bitrate adaptation hooks exist for the AI codec, or how encoder rate control reacts to loss/jitter—critical for “real-time” claims.
[Technical] The demo’s “error resilient codec compensates for potential packet loss” contradicts the later statement “error recovery not yet implemented”; this inconsistency undermines conclusions about robustness and should be clarified with exact mechanisms implemented (if any).
[Technical] SDP negotiation support is asserted (“Enabled codec recognition during SDP negotiation”) but no SDP offer/answer examples, fmtp parameters, or MIME subtype registration approach are provided; without this, WebRTC interoperability and signaling feasibility cannot be evaluated.
[Technical] There is no discussion of packetization timing and RTP timestamp derivation for frame-by-frame neural codec output (e.g., variable frame sizes, variable encode time), which impacts jitter buffering, playout, and marker-bit usage.
[Technical] The trace analysis focuses on RTP header fields only, but does not provide quantitative results (loss rate vs. quality, latency/jitter distributions, bitrate, frame rate, retransmission overhead), so the “feasibility proven” claim is not substantiated for SA4 evaluation.
[Technical] Using bmshj2018_factorized (an image compression model) as a stand-in for a video AI codec raises questions about temporal prediction, inter-frame dependencies, and real-time constraints; the document should explain how video is handled (intra-only vs inter) and implications for packet loss and bitrate.
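To make the fragmentation, timestamp, and marker-bit comments above concrete, the following is a minimal sketch of the kind of rules a payload format would need to state. Everything in it is an assumption for illustration and is not taken from the contribution: the 90 kHz clock rate, the 3-byte fragmentation-unit header (`FU_HEADER`), the `FLAG_START`/`FLAG_END` bits, and the `Packet` type are all hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import Dict, List, Optional

CLOCK_RATE_HZ = 90_000            # assumption: 90 kHz, as commonly used for RTP video
FU_HEADER = struct.Struct("!BH")  # hypothetical: flags (8 bits) + fragment index (16 bits)
FLAG_START = 0x80                 # hypothetical: first fragment of a coded frame
FLAG_END = 0x40                   # hypothetical: last fragment of a coded frame


@dataclass
class Packet:
    timestamp: int  # RTP timestamp, identical for all fragments of one frame
    marker: bool    # set only on the last packet of a frame
    payload: bytes  # FU header followed by a slice of the coded frame


def rtp_timestamp(frame_index: int, fps: int) -> int:
    """Derive the timestamp from the frame's sampling instant, not from encode time."""
    return (frame_index * CLOCK_RATE_HZ // fps) & 0xFFFFFFFF


def fragment(frame: bytes, frame_index: int, fps: int, max_payload: int) -> List[Packet]:
    """Split one coded frame into packets whose payload fits in max_payload bytes."""
    chunk = max_payload - FU_HEADER.size
    if chunk <= 0:
        raise ValueError("max_payload too small for the FU header")
    pieces = [frame[i:i + chunk] for i in range(0, len(frame), chunk)] or [b""]
    ts = rtp_timestamp(frame_index, fps)
    packets = []
    for idx, piece in enumerate(pieces):
        last = idx == len(pieces) - 1
        flags = (FLAG_START if idx == 0 else 0) | (FLAG_END if last else 0)
        packets.append(Packet(ts, last, FU_HEADER.pack(flags, idx) + piece))
    return packets


def reassemble(packets: List[Packet]) -> Optional[bytes]:
    """Rebuild a frame from fragments in any arrival order; None if one is missing."""
    parts: Dict[int, bytes] = {}
    end_index = None
    for p in packets:
        flags, idx = FU_HEADER.unpack_from(p.payload)
        parts[idx] = p.payload[FU_HEADER.size:]
        if flags & FLAG_END:
            end_index = idx
    if end_index is None or any(i not in parts for i in range(end_index + 1)):
        return None  # the receiver must decide: wait, request retransmission, or conceal
    return b"".join(parts[i] for i in range(end_index + 1))
```

The point of the sketch is that loss and reordering behaviour follows from the defined header fields rather than from what a particular library happens to buffer; the `None` return marks the place where a specification would have to define recovery behaviour.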
[Editorial] The contribution reads like an implementation report rather than a standards contribution: it lacks section references to any 3GPP spec, does not identify gaps in current specs, and does not propose specific normative text or study conclusions.
[Editorial] Terminology is inconsistent/vague (“AI codec,” “AI traffic,” “AI media delivery,” “real-time AI codec-based traffic”) and should be aligned with FS_6G_MED definitions to avoid ambiguity about whether this targets conversational video, XR, or a new media type.
[Editorial] The payload format diagram is informal and missing a figure number, field definitions, and alignment with RTP payload format conventions (e.g., “payload header,” “payload data,” optional extensions), making it hard to review or compare with existing SA4 payload formats.
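As an illustration of the bit-level definition the comments on the payload header and its diagram ask for, the sketch below fixes field sizes and byte order for the three fields named in the contribution. The sizes chosen here (three 16-bit shape dimensions, two 32-bit lengths, network byte order) are assumptions made for the example, not the contribution's actual layout, and the names `HEADER`, `PayloadHeader`, `pack_payload`, and `unpack_payload` are hypothetical.

```python
import struct
from typing import NamedTuple, Tuple

# Hypothetical layout; all fields unsigned, network byte order, 14 bytes in total:
#   channels (16 bits) | height (16 bits) | width (16 bits)   <- "Latent Shape"
#   hyperprior_len (32 bits)                                   <- "Hyperprior Byte Length"
#   latent_len (32 bits)                                       <- "Latent Byte Length"
HEADER = struct.Struct("!HHHII")


class PayloadHeader(NamedTuple):
    channels: int
    height: int
    width: int
    hyperprior_len: int
    latent_len: int


def pack_payload(shape: Tuple[int, int, int], hyperprior: bytes, latent: bytes) -> bytes:
    """Serialize the payload header followed by the two coded byte strings."""
    channels, height, width = shape
    header = HEADER.pack(channels, height, width, len(hyperprior), len(latent))
    return header + hyperprior + latent


def unpack_payload(payload: bytes) -> Tuple[PayloadHeader, bytes, bytes]:
    """Parse a payload and reject it if the length fields disagree with its size."""
    if len(payload) < HEADER.size:
        raise ValueError("truncated payload header")
    hdr = PayloadHeader(*HEADER.unpack_from(payload))
    body = payload[HEADER.size:]
    if len(body) != hdr.hyperprior_len + hdr.latent_len:
        raise ValueError("length fields do not match payload size")
    return hdr, body[:hdr.hyperprior_len], body[hdr.hyperprior_len:]
```

With a definition at this level, two independent implementations can agree on what a receiver does with a malformed payload, which the informal diagram does not allow.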