Media related real-time AI traffic Characteristics
Source: Huawei Tech.(UK) Co.. Ltd
Meeting: TSGS4_135_India
Agenda Item: 11.1
| Agenda item description | FS_6G_MED (Study on Media aspects for 6G System) |
|---|---|
| Doc type | pCR |
| For action | Agreement |
| Release | Rel-20 |
| Specification | 26.87 |
| Version | 0.0.1 |
| Related WIs | FS_6G_MED |
| Contact | Rufail Mekuria |
| Uploaded | 2026-02-03T08:50:05.967000 |
| Contact ID | 104180 |
| Revised to | S4aP260016 |
| TDoc Status | noted |
| Reservation date | 02/02/2026 13:39:34 |
| Agenda item sort order | 60 |
Review Comments
[Technical] The proposal introduces “native AI data units” as a new media format but does not define their syntax/semantics, timing model, or decoder interoperability requirements, making the subsequent packetization and KPI claims non-actionable and hard to align with existing 3GPP media frameworks.
[Technical] The end-to-end architecture (UE AI encoder, AS AI decoder) implicitly assumes application-layer processing but does not map to any 3GPP service-based architecture elements (e.g., AF/NEF, edge hosting, QoS flows) or clarify whether this is OTT-only; this weakens consistency with a “media-related TR” and limits how network implications can be derived.
[Technical] The “compatibility handling” statement (“AI decoder at AS may be needed if UE’s AI encoder is not compatible with AS’s AI model”) is unclear: if the AS model cannot consume the UE representation, adding a decoder alone may not resolve the feature-space or model mismatch unless a common representation is defined or the model and its version are negotiated.
[Technical] The basic procedure step “UE provides supported AI encoder information” lacks a defined signaling mechanism (SIP/SDP, HTTP APIs, 5G NAS, application protocol), negotiation parameters (model ID, version, quantization, modality set), and fallback behavior, so the call flow is incomplete for reproducible traffic characterization.
[Technical] The content delivery model reuses “NALU” terminology and H.26x-like aggregation/fragmentation for latent chunks, but does not specify an RTP payload format, header fields, fragmentation rules, or congestion control behavior; without a defined payload format, the traffic model cannot be consistently implemented or measured.
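[Technical] The rate figures in the KPI table are underspecified: “Image GenAI” lists burst size 15 KB, “service bit rate 8 Mbps” and “max latency 15 ms”, and “Video GenAI” lists a 1.5 MB burst with 120 Mbps and 100 ms. In both rows the service bit rate equals the burst size divided by the max latency (assuming decimal units), so it reads as the rate needed to deliver one burst within the latency budget rather than a sustained average. The table needs to state the averaging window, the burst periodicity, and whether uplink or downlink is meant.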
[Technical] The latency discussion mixes “max latency” and “delay” columns (15 ms vs 20 ms, etc.) without defining one-way vs RTT, E2E vs network-only budget, or inclusion of AS inference time; this undermines the stated conclusion that network latency is “constrained by AS processing time.”
[Technical] The claim that ≤20% payload error rate is tolerable for “GenAI applications” is overly broad and not tied to a specific loss model (random vs burst), concealment method, modality, or task metric; for many token/feature-streaming systems, loss can be catastrophic without retransmission/FEC, so the tolerance needs qualification and evidence.
[Technical] The “differentiated importance” assertions (e.g., “preceding image data units more critical”) are plausible for some autoregressive tokenizations but not generally true for VQ/VAE-style codebooks or spatial token layouts; the document should specify which encoder families exhibit this property and how importance is signaled for scheduling.
[Technical] The evaluation methodology relies on deriving P-traces from RTP header fields, but for non-media AI payloads the timestamp/marker semantics are undefined; without a defined clock rate, frame boundary indication, and packetization rules, the trace extraction method is not robust.
[Technical] The proposal recommends RTP/UDP universally, but does not address real-time congestion control (e.g., RTP over QUIC, WebRTC congestion control, or application-layer rate adaptation) which materially affects burstiness, jitter, and loss—key characteristics the clause aims to model.
[Technical] The GRACE resilience description (“lost chunks set to zeros, graceful degradation”) is codec-specific and may not generalize; presenting it as a representative mechanism risks misleading conclusions about error propagation and HARQ/FEC needs across AI encoders.
[Editorial] Clause numbering placeholders (6.2.6.X.1 … X.7) suggest an insertion but the contribution does not indicate exact placement, dependencies, or whether it modifies existing clauses; this makes it hard to assess consistency with surrounding text and avoid duplication with TR 26.926 methodology already referenced.
[Editorial] Several terms are used without definition or with overloaded meaning (“MLM” vs common “multimodal LLM,” “AI data unit,” “native/customized packet format,” “service bit rate”), and the document would benefit from a short terminology subclause to prevent ambiguity.
[Editorial] The added references include academic papers and RP material, but it is unclear which are intended as normative vs informative and whether they meet 3GPP referencing rules; the contribution should justify why each reference is required for the TR text rather than background reading.
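The KPI figures quoted in the comment on the “Image GenAI” and “Video GenAI” rows can be checked with a short calculation. The sketch below assumes decimal units (1 KB = 1000 bytes) and treats max latency as the time available to deliver one burst; the function name and the `rows` dictionary are illustrative and do not come from the contribution.

```python
# Worked check of the KPI rows quoted in the review comment.
# Assumption: decimal units (1 KB = 1000 bytes, 1 MB = 1,000,000 bytes).
# Assumption: "max latency" is the time budget for delivering one burst.

def burst_delivery_rate_mbps(burst_bytes: int, budget_ms: float) -> float:
    """Rate in Mbit/s needed to deliver one burst within the budget."""
    bits = burst_bytes * 8
    # bits / (ms / 1000) / 1e6 simplifies to bits / (ms * 1000)
    return bits / (budget_ms * 1000.0)

# (burst size in bytes, max latency in ms, stated service bit rate in Mbit/s)
rows = {
    "Image GenAI": (15_000, 15.0, 8.0),
    "Video GenAI": (1_500_000, 100.0, 120.0),
}

for name, (burst_bytes, budget_ms, stated_mbps) in rows.items():
    needed = burst_delivery_rate_mbps(burst_bytes, budget_ms)
    print(f"{name}: needed {needed:.1f} Mbit/s, stated {stated_mbps:.1f} Mbit/s")
```

Under these assumptions both rows come out at the stated service bit rate (8 and 120 Mbit/s), which suggests the column is a per-burst delivery rate. Whether it is also meant as a sustained average depends on the burst periodicity, which the table would need to state.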
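The comment on deriving P-traces from RTP header fields can be made concrete with a minimal parser for the 12-byte RTP fixed header defined in RFC 3550. The sketch shows that the header alone yields only a raw timestamp and a marker bit: converting the timestamp to seconds needs a clock rate, and interpreting the marker as a data-unit boundary needs a rule from the payload format. The 90 kHz clock rate, payload type 96, and all names below are assumptions chosen for illustration, not values from the contribution.

```python
import struct
from typing import NamedTuple


class RtpFields(NamedTuple):
    version: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_fixed_header(packet: bytes) -> RtpFields:
    """Extract the RFC 3550 fixed-header fields used for trace generation."""
    if len(packet) < 12:
        raise ValueError("RTP fixed header is 12 bytes")
    # Network byte order: two single bytes, 16-bit seq, 32-bit ts, 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpFields(
        version=b0 >> 6,          # top two bits of the first byte
        marker=bool(b1 & 0x80),   # meaning is defined by the profile/payload format
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


def media_time_seconds(timestamp: int, clock_rate_hz: int) -> float:
    """Convert an RTP timestamp to seconds; the clock rate must be known."""
    return timestamp / clock_rate_hz


# Hypothetical packet: version 2, marker set, dynamic payload type 96.
sample = struct.pack("!BBHII", 0x80, 0x80 | 96, 1000, 90_000, 0x1234ABCD)
fields = parse_rtp_fixed_header(sample)
print(fields)
print(media_time_seconds(fields.timestamp, clock_rate_hz=90_000))
```

If the AI payload format does not fix a clock rate or marker rule, two implementations could parse the same capture into different traces, which is the robustness concern the comment raises.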