S4-260094

Media related real-time AI traffic Characteristics

Source: Huawei Tech.(UK) Co., Ltd
Meeting: TSGS4_135_India
Agenda Item: 11.1

All Metadata

Agenda item description	FS_6G_MED (Study on Media aspects for 6G System)
Doc type	pCR
For action	Agreement
Release	Rel-20
Specification	26.87
Version	0.0.1
Related WIs	FS_6G_MED
Contact	Rufail Mekuria
Uploaded	2026-02-03T08:50:05.967000
Contact ID	104180
Revised to	S4aP260016
TDoc Status	noted
Reservation date	02/02/2026 13:39:34
Agenda item sort order	60

Review Comments

manager - 2026-02-09 04:24

[Technical] The proposal introduces “native AI data units” as a new media format but does not define their syntax/semantics, timing model, or decoder interoperability requirements, making the subsequent packetization and KPI claims non-actionable and hard to align with existing 3GPP media frameworks.

[Technical] The end-to-end architecture (UE AI encoder, AS AI decoder) implicitly assumes application-layer processing but does not map to any 3GPP service-based architecture elements (e.g., AF/NEF, edge hosting, QoS flows) or clarify whether this is OTT-only; this weakens consistency with a “media-related TR” and limits how network implications can be derived.

[Technical] The “compatibility handling” statement (“AI decoder at AS may be needed if UE’s AI encoder is not compatible with AS’s AI model”) is conceptually inverted/unclear: if the AS model cannot consume the UE representation, adding a decoder alone may not resolve feature-space/model mismatch without a defined common representation or negotiated model/versioning.

[Technical] The basic procedure step “UE provides supported AI encoder information” lacks a defined signaling mechanism (SIP/SDP, HTTP APIs, 5G NAS, application protocol), negotiation parameters (model ID, version, quantization, modality set), and fallback behavior, so the call flow is incomplete for reproducible traffic characterization.

[Technical] The content delivery model reuses “NALU” terminology and H.26x-like aggregation/fragmentation for latent chunks, but does not specify an RTP payload format, header fields, fragmentation rules, or congestion control behavior; without a defined payload format, the traffic model cannot be consistently implemented or measured.

[Technical] The KPI table is underspecified: e.g., for “Image GenAI”, a 15 KB burst delivered within the 15 ms “max latency” already requires 8 Mbps, which equals the stated “service bit rate”, so any part of the latency budget spent outside transmission implies a higher instantaneous rate than 8 Mbps; “Video GenAI” (1.5 MB burst, 120 Mbps, 100 ms) shows the same pattern. The table needs to state the averaging window, the burst periodicity, and whether uplink or downlink is meant.
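
A quick check of the figures quoted in the KPI comment above can be sketched as follows. This is a minimal sketch, assuming decimal units (1 KB = 1000 bytes), one burst per window, and a delivery window equal to the quoted max latency; the function name and the halved-window case are illustrative assumptions, not values from the contribution.

```python
def required_rate_mbps(burst_bytes: int, window_ms: float) -> float:
    """Rate in Mbit/s needed to deliver one burst within window_ms."""
    bits = burst_bytes * 8
    bits_per_second = bits * 1000 / window_ms
    return bits_per_second / 1e6


# Figures as quoted in the review comment (decimal units assumed).
image_full = required_rate_mbps(15_000, 15)       # whole 15 ms used for transfer
video_full = required_rate_mbps(1_500_000, 100)   # whole 100 ms used for transfer

# Hypothetical split: only half of the budget is left for transmission,
# e.g. because the other half is consumed by AS inference.
image_half = required_rate_mbps(15_000, 7.5)

print(image_full, video_full, image_half)
```

Under these assumptions the results are 8 Mbps and 120 Mbps when the full latency budget is available for transmission, and 16 Mbps for the image case when only half of it is, which is why the averaging window and the budget split matter.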

[Technical] The latency discussion mixes “max latency” and “delay” columns (15 ms vs 20 ms, etc.) without defining one-way vs RTT, E2E vs network-only budget, or inclusion of AS inference time; this undermines the stated conclusion that network latency is “constrained by AS processing time.”

[Technical] The claim that ≤20% payload error rate is tolerable for “GenAI applications” is overly broad and not tied to a specific loss model (random vs burst), concealment method, modality, or task metric; for many token/feature-streaming systems, loss can be catastrophic without retransmission/FEC, so the tolerance needs qualification and evidence.
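
To illustrate why the loss model matters for the tolerance claim above, the sketch below compares independent (random) loss with a simplified two-state Gilbert burst model at the same 20% average loss. The transition probabilities, the seed, and the "mean run length" metric are assumptions chosen for illustration; none of them come from the contribution.

```python
import random


def iid_losses(n, loss_rate, rng):
    """Independent loss: each packet is lost with probability loss_rate."""
    return [rng.random() < loss_rate for _ in range(n)]


def gilbert_losses(n, p_good_to_bad, p_bad_to_good, rng):
    """Simplified Gilbert model: every packet sent in the bad state is lost."""
    lost, bad = [], False
    for _ in range(n):
        if bad:
            bad = rng.random() >= p_bad_to_good  # stay bad unless we recover
        else:
            bad = rng.random() < p_good_to_bad
        lost.append(bad)
    return lost


def mean_run_length(lost):
    """Average number of consecutive lost packets per loss event."""
    runs, current = [], 0
    for is_lost in lost:
        if is_lost:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return sum(runs) / len(runs) if runs else 0.0


rng = random.Random(1)
n = 200_000
random_loss = iid_losses(n, 0.20, rng)
# Stationary loss = 0.05 / (0.05 + 0.20) = 20%, expected run length 1 / 0.20 = 5.
burst_loss = gilbert_losses(n, 0.05, 0.20, rng)

for name, trace in (("random", random_loss), ("burst", burst_loss)):
    print(name, round(sum(trace) / n, 3), round(mean_run_length(trace), 2))
```

Both traces have roughly the same average loss, but the burst trace loses several consecutive data units per event, so a single "≤20%" figure does not determine how many adjacent chunks a decoder has to conceal.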

[Technical] The “differentiated importance” assertions (e.g., “preceding image data units more critical”) are plausible for some autoregressive tokenizations but not generally true for VQ/VAE-style codebooks or spatial token layouts; the document should specify which encoder families exhibit this property and how importance is signaled for scheduling.

[Technical] The evaluation methodology relies on deriving P-traces from RTP header fields, but for non-media AI payloads the timestamp/marker semantics are undefined; without a defined clock rate, frame boundary indication, and packetization rules, the trace extraction method is not robust.
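
The dependency on timestamp semantics can be seen in a minimal sketch of trace extraction. It parses the 12-byte RTP fixed header defined in RFC 3550 and sums payload bytes per timestamp. The rule "one timestamp = one burst" is exactly the kind of assumption the contribution would need to define; header extensions and padding are ignored here, and the record layout is hypothetical, not the P-trace format of TR 26.926.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    marker: bool
    payload_type: int
    seq: int
    timestamp: int
    ssrc: int
    header_len: int  # fixed header plus CSRC list; extensions are ignored


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the RTP fixed header (RFC 3550, section 5.1)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than RTP fixed header")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    if b0 >> 6 != 2:
        raise ValueError("not RTP version 2")
    csrc_count = b0 & 0x0F
    header_len = 12 + 4 * csrc_count
    if len(packet) < header_len:
        raise ValueError("packet shorter than declared CSRC list")
    return RtpHeader(bool(b1 & 0x80), b1 & 0x7F, seq, timestamp, ssrc, header_len)


def burst_sizes(packets):
    """Sum payload bytes per RTP timestamp (assumed: one timestamp per burst)."""
    sizes = {}
    for packet in packets:
        header = parse_rtp_header(packet)
        payload_len = len(packet) - header.header_len
        sizes[header.timestamp] = sizes.get(header.timestamp, 0) + payload_len
    return sizes


def make_packet(seq, timestamp, payload_len, marker=False, payload_type=96):
    """Build a synthetic RTP packet for the example (dynamic payload type 96)."""
    b1 = (0x80 if marker else 0) | payload_type
    return struct.pack("!BBHII", 0x80, b1, seq, timestamp, 0x1234) + bytes(payload_len)


packets = [
    make_packet(1, 1000, 100),
    make_packet(2, 1000, 50, marker=True),
    make_packet(3, 4000, 10, marker=True),
]
print(burst_sizes(packets))
```

Converting the timestamp differences between bursts into seconds additionally requires the clock rate of the payload format, which is not defined for the AI payloads discussed here.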

[Technical] The proposal recommends RTP/UDP universally, but does not address real-time congestion control (e.g., RTP over QUIC, WebRTC congestion control, or application-layer rate adaptation) which materially affects burstiness, jitter, and loss—key characteristics the clause aims to model.

[Technical] The GRACE resilience description (“lost chunks set to zeros, graceful degradation”) is codec-specific and may not generalize; presenting it as a representative mechanism risks misleading conclusions about error propagation and HARQ/FEC needs across AI encoders.

[Editorial] Clause numbering placeholders (6.2.6.X.1 … X.7) suggest an insertion but the contribution does not indicate exact placement, dependencies, or whether it modifies existing clauses; this makes it hard to assess consistency with surrounding text and avoid duplication with TR 26.926 methodology already referenced.

[Editorial] Several terms are used without definition or with overloaded meaning (“MLM” vs common “multimodal LLM,” “AI data unit,” “native/customized packet format,” “service bit rate”), and the document would benefit from a short terminology subclause to prevent ambiguity.

[Editorial] The added references include academic papers and RP material, but it is unclear which are intended as normative vs informative and whether they meet 3GPP referencing rules; the contribution should justify why each reference is required for the TR text rather than background reading.
