Meeting: TSGS4_135_India | Agenda Item: 11.1
[FS_6G_MED]pCR on Embodied Video for 6G Media
China Mobile Com. Corporation
pCR
Agreement
| TDoc | S4-260161 |
| Title | [FS_6G_MED]pCR on Embodied Video for 6G Media |
| Source | China Mobile Com. Corporation |
| Agenda item | 11.1 |
| Agenda item description | FS_6G_MED (Study on Media aspects for 6G System) |
| Doc type | pCR |
| For action | Agreement |
| Release | Rel-20 |
| Specification | 26.870 |
| Version | 0.0.1 |
| Related WIs | FS_6G_MED |
| download_url | https://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_135_India/Docs/S4-260161.zip |
| For | Agreement |
| Spec | 26.870 |
| Type | pCR |
| Contact | Jiayi Xu |
| Uploaded | 2026-02-03T12:59:10.947000 |
| Contact ID | 89460 |
| TDoc Status | noted |
| Reservation date | 03/02/2026 12:50:08 |
| Agenda item sort order | 60 |
[Technical] The proposed new use case “Embodied Video Internet (EVI)” is not clearly mapped to the existing TR 26.870 study objectives and terminology. It reads as a new umbrella concept rather than a media-centric use case. The pCR should explicitly justify why it belongs in TR 26.870 (the SA4 media study) rather than remaining in the SA1/SA2 domain.
[Technical] Several KPI values are insufficiently specified for media work and may be internally inconsistent. For example, “6x 1080p @ 15Hz → 20 Mbps” and “compression ratio 240:1” are asserted without stating the codec, chroma format, bit depth, or target quality, or whether “Hz” means frames per second. The derived bitrates are therefore not reproducible and could mislead SA4 conclusions.
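The arithmetic behind this comment can be checked with a minimal sketch. It assumes that “1080p” means 1920x1080, that “15Hz” means 15 fps, that 20 Mbps is the aggregate rate for all six cameras, and that Mbps is decimal. None of these assumptions is stated in the contribution, and the function and table names are illustrative only.

```python
# Uncompressed bits per pixel for common sampling formats.
# Which of these (if any) the contribution intends is an assumption.
BITS_PER_PIXEL = {
    "4:2:0 8-bit": 12,
    "4:2:0 10-bit": 15,
    "4:2:2 8-bit": 16,
    "4:4:4 8-bit": 24,
}


def raw_bitrate_bps(width, height, fps, bits_per_pixel, cameras=1):
    """Uncompressed bitrate in bit/s for a set of identical cameras."""
    return width * height * fps * bits_per_pixel * cameras


def compression_ratios(target_bps, width=1920, height=1080, fps=15, cameras=6):
    """Implied compression ratio per sampling format for a target bitrate."""
    return {
        fmt: raw_bitrate_bps(width, height, fps, bpp, cameras) / target_bps
        for fmt, bpp in BITS_PER_PIXEL.items()
    }


ratios = compression_ratios(20_000_000)  # 20 Mbps aggregate (assumed)
for fmt, ratio in ratios.items():
    print(f"{fmt}: {ratio:.1f}:1")
```

Under these assumptions the implied ratio ranges from roughly 112:1 (4:2:0 8-bit) to roughly 224:1 (4:4:4 8-bit), and none of the listed formats yields 240:1. The quoted figures cannot be reproduced until the source format is stated.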
[Technical] The latency requirement “E2E RTT 100–300 ms” for “real-time” embodied control/offloading is not reconciled with the tighter control-loop needs implied elsewhere (e.g., 10 ms sensor intervals, motion control). The text should clearly distinguish (a) media transport latency for perception streams from (b) closed-loop control latency and reliability requirements.
[Technical] The contribution mixes “video” requirements with non-media payloads (LiDAR, point clouds, sensor data) but does not define the scope boundary for TR 26.870 (media codecs, protocols, QoE). Without that scoping, the clause risks driving requirements that belong in generic data transport or edge computing studies.
[Technical] “AI codec with error-tolerant capabilities (Grace method)” is introduced as a requirement but is neither defined nor referenced, and it is not aligned with ongoing 3GPP/MPEG terminology (e.g., neural codecs, feature/latent compression, ROI coding). As written it is not actionable and could conflict with existing codec evaluation frameworks.
[Technical] “AI-native Video Protocol” is proposed as a key requirement without identifying what is missing in existing protocol stacks (RTP/RTCP, QUIC, DASH/CMAF, WebRTC, 5G Media Streaming) or which protocol functions are uniquely required (e.g., semantic prioritization, multi-stream synchronization, in-network adaptation). It therefore reads as a vague solution statement rather than a requirement.
[Technical] Reliability targets such as “>99.99%” are stated for UAV inspection and robot sensor/LiDAR traffic without defining the reliability metric: packet success probability, frame delivery, or application-level inference success, and within what time bound. That definition is critical for translating the targets into media-layer mechanisms.
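To illustrate why the choice of metric matters, the sketch below converts a per-packet success probability into a per-frame delivery probability. It assumes independent packet losses, no FEC or retransmission, and that a frame is usable only if every packet arrives. The 25 Mbps rate is taken from the UAV figures quoted in these comments; the 30 fps frame rate and 1200-byte payload are hypothetical values chosen for illustration.

```python
import math


def packets_per_frame(bitrate_bps, fps, payload_bytes):
    """Average number of packets needed to carry one frame."""
    bytes_per_frame = bitrate_bps / (8 * fps)
    return math.ceil(bytes_per_frame / payload_bytes)


def frame_delivery_probability(packet_success, n_packets):
    """Probability that all packets of a frame arrive (independent losses)."""
    return packet_success ** n_packets


# Hypothetical stream: 25 Mbps at 30 fps with 1200-byte payloads.
n = packets_per_frame(25_000_000, 30, 1200)
p_frame = frame_delivery_probability(0.9999, n)
print(f"{n} packets/frame, frame delivery = {p_frame:.4%}")
```

With these numbers a 99.99% packet-level target corresponds to roughly 99.13% at frame level, so the same “>99.99%” figure implies very different mechanisms depending on the layer at which it is defined.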
[Technical] The UAV “event security” latency of “≤10 ms” for 1K/4K video at ≥5/≥25 Mbps is extremely stringent and likely infeasible end-to-end for a typical video pipeline (capture, encode, packetize, decode, render) unless it refers to one-way transport only. The clause should state the latency definition and either include the processing components or explicitly exclude them.
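A rough budget check makes the point concrete. The sketch below computes the time needed to put one average-sized frame on the wire and adds it to a set of processing delays. The 100 Mbps link rate, the 30 fps frame rate, and every per-stage delay are hypothetical placeholders, not values from the contribution; only the 25 Mbps rate and the 10 ms target come from the quoted text.

```python
def serialization_delay_ms(stream_bps, fps, link_bps):
    """Time to transmit one average-sized frame over a link, in ms."""
    bits_per_frame = stream_bps / fps
    return 1000.0 * bits_per_frame / link_bps


def total_latency_ms(components):
    """Sum of per-stage delays in ms."""
    return sum(components.values())


# Hypothetical pipeline for a 25 Mbps, 30 fps stream on a 100 Mbps link.
budget_ms = 10.0
components = {
    "encode": 5.0,               # placeholder
    "serialization": serialization_delay_ms(25_000_000, 30, 100_000_000),
    "network one-way": 4.0,      # placeholder
    "decode": 3.0,               # placeholder
}
total = total_latency_ms(components)
print(f"total = {total:.1f} ms, within budget: {total <= budget_ms}")
```

In this example serialization alone takes about 8.3 ms, which nearly exhausts a 10 ms budget before any encoding or decoding is counted. Whether the target is reachable therefore depends entirely on which stages the contribution intends to include.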
[Technical] Multi-camera scenarios (6–8 cameras, mixed 1080p/4K, 15/30/60 fps) are listed, but the requirements do not address synchronization (inter-camera time alignment), multi-stream correlation, or joint encoding/transport, all of which are central media issues for embodied perception and 3D reconstruction.
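The scale of the synchronization issue can be estimated with a short sketch. It assumes two free-running cameras at the same frame rate whose frames are paired by nearest capture timestamp, so the worst-case capture offset is half a frame period. The 10 m/s relative speed is a hypothetical value for illustration.

```python
def worst_case_skew_ms(fps):
    """Worst-case capture offset between nearest frames of two
    unsynchronized cameras running at the same frame rate, in ms."""
    return 1000.0 / (2.0 * fps)


def displacement_m(relative_speed_mps, skew_ms):
    """Distance an object moves relative to the cameras during the skew."""
    return relative_speed_mps * skew_ms / 1000.0


for fps in (15, 30, 60):
    skew = worst_case_skew_ms(fps)
    moved = displacement_m(10.0, skew)  # 10 m/s is a hypothetical speed
    print(f"{fps} fps: skew up to {skew:.1f} ms, object moves {moved:.2f} m")
```

At 15 fps the offset can reach about 33 ms, during which an object moving at 10 m/s travels about 0.33 m. An error of that size is significant for 3D reconstruction, which is why an explicit alignment requirement is worth stating.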
[Technical] The “QoE model” section is too generic and user-centric for a machine-consumer scenario. Embodied AI typically optimizes task success (e.g., mAP, tracking stability, control error) rather than human QoE, so the clause should introduce task-oriented QoS/QoE (“QoTask”) metrics and explain how they relate to media impairments.
[Technical] The contribution cites SA1 TR 22.870 use cases (e.g., “Use Case 6.28/6.19/6.48/6.11”) but does not trace them to the exact clauses and tables in TR 22.870. Without precise references, the extracted KPIs risk being challenged as non-authoritative.
[Editorial] Clause and table numbering appears inconsistent in the summary (“Table 2.1.3-1/2.1.3-2” under “Clause 6.1.3”), which suggests the pCR may introduce numbering conflicts or incorrect cross-references in TR 26.870. Numbering should follow the target document’s clause structure.
[Editorial] Terms are used inconsistently or in non-standard ways (“15Hz” for frame rate, “E2E RTT” vs “E2E latency”, “1K” for resolution) and should be normalized to 3GPP style: fps for frame rate, an explicit choice between one-way latency and RTT, and explicit pixel dimensions.
[Editorial] The text frequently shifts from requirements to solution proposals (“new protocol design”, “AI codec technology”) without using the hedged, informative language appropriate for a TR study clause (e.g., “may need”, “is expected to”). This could be seen as over-prescriptive for a study item.