S4-260098 Metadata - 3GPP Contribution Reviewer

Document Information

Title

Demonstration of real-time AI codec transmission in WebRTC

Source

Huawei Tech.(UK) Co., Ltd

Type

discussion

For

Discussion

Release

Rel-20

TDoc	S4-260098
Title	Demonstration of real-time AI codec transmission in WebRTC
Source	Huawei Tech.(UK) Co., Ltd
Agenda item	11.1
Agenda item description	FS_6G_MED (Study on Media aspects for 6G System)
Doc type	discussion
For action	Discussion
Release	Rel-20
download_url	https://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_135_India/Docs/S4-260098.zip
Contact	Rufail Mekuria
Uploaded	2026-02-03T08:50:06.013000
Contact ID	104180
TDoc Status	noted
Reservation date	02/02/2026 14:00:19
Agenda item sort order	60

Comments

Previous Comments:

manager

2026-02-09 04:27:45

[Technical] The contribution does not map the demo to any concrete Rel-20 normative work item deliverable (e.g., RTP payload format specification, SDP signaling, WebRTC integration requirements), so it is unclear what SA4 action is requested beyond a general “take into account.”

[Technical] The “Custom RTP Payload Format Design” is underspecified and not aligned with 3GPP/RTC practice: no RTP payload type name, no clock rate, no timestamping rules, no marker-bit semantics, no fragmentation/reassembly rules, and no handling of packet loss/out-of-order beyond “aiortc buffers,” which is not a spec.

[Technical] The proposed payload header fields (“Latent Shape | Hyperprior Byte Length | Latent Byte Length”) lack bit-level definition (field sizes, endianness, allowed ranges) and do not address how “Latent Shape” is encoded or negotiated, making interoperability impossible.
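
To illustrate the level of detail being asked for, here is a minimal sketch of a bit-level header definition. The layout is an assumption made for this example only (three 16-bit shape fields followed by two 32-bit length fields, all in network byte order); it is not the layout used in the contribution, and the names `pack_header` and `unpack_header` are hypothetical.

```python
import struct

# Hypothetical layout (NOT from the contribution), network byte order, 14 bytes:
#   channels (uint16) | height (uint16) | width (uint16)
#   hyperprior_len (uint32) | latent_len (uint32)
HEADER = struct.Struct("!HHHII")


def pack_header(shape, hyperprior_len, latent_len):
    """Serialize the latent shape and the two byte lengths into 14 bytes."""
    channels, height, width = shape
    return HEADER.pack(channels, height, width, hyperprior_len, latent_len)


def unpack_header(payload):
    """Parse the header and check that the length fields match the payload."""
    if len(payload) < HEADER.size:
        raise ValueError("payload shorter than header")
    channels, height, width, hyper_len, latent_len = HEADER.unpack_from(payload)
    if len(payload) - HEADER.size != hyper_len + latent_len:
        raise ValueError("length fields do not match payload size")
    return (channels, height, width), hyper_len, latent_len
```

Fixed field widths and an explicit byte order let a receiver reject malformed payloads before they reach the decoder. Allowed ranges and any negotiation of the shape would still need to be specified separately.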

[Technical] Fragmentation is described as “large payloads fragmented due to MTU limitations” with aiortc appending RTP headers, but there is no defined fragmentation unit, no FU indicator/header, and no recovery behavior; relying on library behavior is not acceptable for a payload format intended for standardization.
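
As a sketch of what a defined fragmentation unit could look like, the example below prefixes each fragment with a one-byte header carrying start and end flags, loosely modeled on the S and E bits of H.264 FU-A (RFC 6184). The flag values, the header size, and the function names are assumptions for illustration and do not describe what aiortc or the demo actually does.

```python
FU_START = 0x80  # hypothetical flag: first fragment of a coded frame
FU_END = 0x40    # hypothetical flag: last fragment of a coded frame


def fragment(frame: bytes, max_payload: int) -> list[bytes]:
    """Split one coded frame into fragments of at most max_payload bytes,
    each starting with a 1-byte fragmentation header."""
    if max_payload < 2:
        raise ValueError("max_payload must leave room for header and data")
    chunk = max_payload - 1  # one byte is reserved for the header
    pieces = [frame[i:i + chunk] for i in range(0, len(frame), chunk)] or [b""]
    fragments = []
    for index, piece in enumerate(pieces):
        flags = 0
        if index == 0:
            flags |= FU_START
        if index == len(pieces) - 1:
            flags |= FU_END
        fragments.append(bytes([flags]) + piece)
    return fragments


def reassemble(fragments: list[bytes]):
    """Rebuild a frame from fragments that the caller has already placed in
    RTP sequence order with no gaps. Returns None when the start or end
    fragment is missing, so the caller can drop or conceal the frame."""
    if not fragments or any(len(f) == 0 for f in fragments):
        return None
    if not (fragments[0][0] & FU_START) or not (fragments[-1][0] & FU_END):
        return None
    return b"".join(f[1:] for f in fragments)
```

With explicit start and end flags, a receiver can detect an incomplete frame from the payload itself instead of relying on library buffering behavior.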

[Technical] The document claims “RTP retransmission enabled” but does not specify whether this is RTX (RFC 4588), NACK (RFC 4585), or WebRTC-specific mechanisms, nor how the AI codec payload interacts with retransmission, FEC, or congestion control—key for real-time media feasibility.

[Technical] Congestion control is mentioned generically (“RTP packets transmitted with congestion control”) without stating which algorithm (e.g., GCC/SCReAM), what bitrate adaptation hooks exist for the AI codec, or how encoder rate control reacts to loss/jitter—critical for “real-time” claims.

[Technical] The demo’s “error resilient codec compensates for potential packet loss” contradicts the later statement “error recovery not yet implemented”; this inconsistency undermines conclusions about robustness and should be clarified with exact mechanisms implemented (if any).

[Technical] SDP negotiation support is asserted (“Enabled codec recognition during SDP negotiation”) but no SDP offer/answer examples, fmtp parameters, or MIME subtype registration approach are provided; without this, WebRTC interoperability and signaling feasibility cannot be evaluated.
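
For reference, the kind of SDP detail that would make this reviewable can be sketched as follows. The encoding name `x-ai-latent` and the `model` and `quality` parameters are hypothetical placeholders, not values taken from the contribution; only the general `a=rtpmap` and `a=fmtp` attribute syntax is standard SDP.

```python
def sdp_codec_lines(payload_type, encoding_name, clock_rate, params):
    """Build the rtpmap and fmtp attribute lines for one dynamic payload type."""
    if not 96 <= payload_type <= 127:
        raise ValueError("dynamic RTP payload types are in the range 96-127")
    lines = [f"a=rtpmap:{payload_type} {encoding_name}/{clock_rate}"]
    if params:
        fmtp = ";".join(f"{key}={value}" for key, value in params.items())
        lines.append(f"a=fmtp:{payload_type} {fmtp}")
    return lines


# Hypothetical encoding name and parameters, for illustration only.
example = sdp_codec_lines(
    96, "x-ai-latent", 90000, {"model": "bmshj2018-factorized", "quality": 3}
)
```

An offer/answer example of this form, together with the semantics of each fmtp parameter, would show how two endpoints agree on the model and operating point before media flows.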

[Technical] There is no discussion of packetization timing and RTP timestamp derivation for frame-by-frame neural codec output (e.g., variable frame sizes, variable encode time), which impacts jitter buffering, playout, and marker-bit usage.
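
One possible timestamp rule, sketched under the assumption of the 90 kHz clock conventionally used for video, derives the RTP timestamp from the frame capture time rather than from the time encoding finishes, so variable encode time does not leak into the media timeline. The function name and parameters are hypothetical.

```python
VIDEO_CLOCK_RATE = 90_000  # assumed clock rate; the contribution does not state one


def rtp_timestamp(capture_time_s: float, first_capture_time_s: float,
                  initial_timestamp: int) -> int:
    """Map a frame capture time to a 32-bit RTP timestamp.

    The timestamp advances with capture time, independent of how long the
    neural encoder takes, and wraps modulo 2**32.
    """
    elapsed_ticks = round((capture_time_s - first_capture_time_s) * VIDEO_CLOCK_RATE)
    return (initial_timestamp + elapsed_ticks) & 0xFFFFFFFF
```

All packets carrying fragments of the same frame would share this timestamp, which is what allows a jitter buffer to group them for playout.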

[Technical] The trace analysis focuses on RTP header fields only, but does not provide quantitative results (loss rate vs. quality, latency/jitter distributions, bitrate, frame rate, retransmission overhead), so the “feasibility proven” claim is not substantiated for SA4 evaluation.
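
As an example of a quantitative result that could be extracted from the same traces, the sketch below computes the interarrival jitter estimate defined in RFC 3550, J = J + (|D| - J)/16, from arrival times and RTP timestamps. The input format and the 90 kHz clock rate are assumptions, and timestamp wraparound is ignored for brevity.

```python
def interarrival_jitter(packets, clock_rate=90_000):
    """Return the RFC 3550 interarrival jitter estimate in RTP timestamp units.

    packets: iterable of (arrival_time_seconds, rtp_timestamp) tuples in
    arrival order. Timestamp wraparound is not handled in this sketch.
    """
    jitter = 0.0
    previous_transit = None
    for arrival_s, rtp_ts in packets:
        # Relative transit time: arrival time in clock units minus RTP timestamp.
        transit = arrival_s * clock_rate - rtp_ts
        if previous_transit is not None:
            difference = abs(transit - previous_transit)
            jitter += (difference - jitter) / 16
        previous_transit = transit
    return jitter
```

Reporting this value alongside loss rate, bitrate, and end-to-end latency for each test condition would give SA4 something measurable to evaluate.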

[Technical] Using bmshj2018_factorized (an image compression model) as a stand-in for a video AI codec raises questions about temporal prediction, inter-frame dependencies, and real-time constraints; the document should explain how video is handled (intra-only vs inter) and implications for packet loss and bitrate.

[Editorial] The contribution reads like an implementation report rather than a standards contribution: it lacks section references to any 3GPP spec, does not identify gaps in current specs, and does not propose specific normative text or study conclusions.

[Editorial] Terminology is inconsistent/vague (“AI codec,” “AI traffic,” “AI media delivery,” “real-time AI codec-based traffic”) and should be aligned with FS_6G_MED definitions to avoid ambiguity about whether this targets conversational video, XR, or a new media type.

[Editorial] The payload format diagram is informal and missing a figure number, field definitions, and alignment with RTP payload format conventions (e.g., “payload header,” “payload data,” optional extensions), making it hard to review or compare with existing SA4 payload formats.

