Meeting: TSGS4_135_India | Agenda Item: 11.1
Embodied AI use case and related requirements
Huawei Tech.(UK) Co., Ltd
pCR
Agreement
| TDoc | S4-260097 |
| Title | Embodied AI use case and related requirements |
| Source | Huawei Tech.(UK) Co., Ltd |
| Agenda item | 11.1 |
| Agenda item description | FS_6G_MED (Study on Media aspects for 6G System) |
| Doc type | pCR |
| For action | Agreement |
| Release | Rel-20 |
| Specification | 26.87 |
| Version | 0.0.1 |
| Related WIs | FS_6G_MED |
| download_url | https://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_135_India/Docs/S4-260097.zip |
| Contact | Rufail Mekuria |
| Uploaded | 2026-02-03T08:50:06.013000 |
| Contact ID | 104180 |
| TDoc Status | noted |
| Reservation date | 02/02/2026 13:48:07 |
| Agenda item sort order | 60 |
[Technical] The claimed uplink “peak data rates: 20–100 Mbit for 6–8 cameras using 3GPP codecs (e.g., HEVC)” is not backed by any camera resolution, frame rate, or bitrate assumptions, and the unit should be Mbit/s. Its consistency with TR 22.870 also cannot be verified unless the referenced clause text is quoted. Add explicit parameterization (e.g., 1080p/4K, 30/60 fps, number of concurrent streams) and clarify whether the figure is a per-UE peak, a per-robot peak, or a per-session aggregate.
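To illustrate the kind of parameterization being requested, the sketch below derives a per-robot and a per-session uplink rate from resolution, frame rate, and a compressed bits-per-pixel figure. All values are illustrative placeholders, not numbers taken from TR 22.870 or from the contribution. The names `CameraStream`, `robot_uplink_mbps` and `session_uplink_mbps` are hypothetical, and the sketch assumes one UE per robot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CameraStream:
    """One camera stream. Values are placeholders, not figures from TR 22.870."""
    width: int
    height: int
    fps: float
    bits_per_pixel: float  # compressed bits per pixel; codec- and quality-dependent assumption

    def bitrate_mbps(self) -> float:
        # pixels per frame * frames per second * compressed bits per pixel, in Mbit/s
        return self.width * self.height * self.fps * self.bits_per_pixel / 1e6


def robot_uplink_mbps(cameras: list[CameraStream]) -> float:
    """Aggregate uplink for one robot (assumed here to be a single UE)."""
    return sum(c.bitrate_mbps() for c in cameras)


def session_uplink_mbps(robots: list[list[CameraStream]]) -> float:
    """Aggregate uplink over all robots served in one session."""
    return sum(robot_uplink_mbps(cams) for cams in robots)


if __name__ == "__main__":
    cam = CameraStream(1920, 1080, 30.0, 0.1)  # hypothetical 1080p30 stream
    print(f"6 cameras: {robot_uplink_mbps([cam] * 6):.2f} Mbit/s")
    print(f"8 cameras: {robot_uplink_mbps([cam] * 8):.2f} Mbit/s")
```

With these placeholder values, six cameras give about 37.3 Mbit/s and eight give about 49.8 Mbit/s. The point is only that the claimed range becomes checkable once each parameter, and the aggregation level (per UE, per robot, or per session), is stated explicitly.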
[Technical] “Ultra-low latency” and “error resilience” are repeatedly asserted for real-time navigation, but no concrete latency/jitter/reliability targets (e.g., E2E, UL one-way, packet loss tolerance) are proposed, making the requirement non-actionable for FS_6G_MED and inconsistent with how SA requirements are normally expressed.
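A minimal sketch of how these assertions could be turned into checkable targets follows. It splits the control-loop latency into per-stage components and compares the sum against an E2E target. It also shows how a per-packet loss rate translates into the probability that a whole frame arrives intact. The stage names, the millisecond values, and the function names are hypothetical placeholders, not proposed requirements. The loss model also assumes independent packet losses with no retransmission or FEC.

```python
# Hypothetical per-stage budget for one sense -> infer -> act loop, in milliseconds.
# These are placeholders for illustration, not proposed 3GPP values.
BUDGET_MS: dict[str, float] = {
    "capture": 5.0,
    "encode": 10.0,
    "uplink": 10.0,     # UL one-way air interface + transport
    "inference": 30.0,  # cloud/edge processing
    "downlink": 5.0,    # command/trajectory delivery
    "actuation": 5.0,
}


def total_latency_ms(budget: dict[str, float]) -> float:
    """E2E latency as the sum of all stage budgets."""
    return sum(budget.values())


def meets_target(budget: dict[str, float], target_ms: float) -> bool:
    """True if the summed budget fits within the E2E target."""
    return total_latency_ms(budget) <= target_ms


def frame_delivery_probability(packet_loss_rate: float, packets_per_frame: int) -> float:
    """Probability that every packet of a frame arrives.

    Assumes independent losses and no retransmission or FEC.
    """
    return (1.0 - packet_loss_rate) ** packets_per_frame


if __name__ == "__main__":
    print(total_latency_ms(BUDGET_MS), meets_target(BUDGET_MS, 100.0))
    print(frame_delivery_probability(0.01, 10))
```

The design point is that an E2E target and a UL one-way target are different requirements. Stating them separately, together with a loss tolerance expressed per frame rather than per packet, is what lets RAN and CT check whether a given configuration fits.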
[Technical] The proposed clause mixes AI research task descriptions/metrics (surprisal, IoU+/IoU-, SPL, etc.) with 3GPP service requirements, but it never maps these metrics to communication KPIs (latency, throughput, reliability, synchronization), so the added text risks being non-normative narrative rather than requirements usable by RAN/CT.
[Technical] The document assumes cloud/server inference as the dominant architecture (“AI processing may occur at cloud/server”), but does not address split inference options (on-device, edge, hybrid) and the resulting different UL/DL traffic patterns (e.g., downlink control/trajectory updates), which is critical for embodied control loops.
[Technical] Traffic characterization is incomplete: it states “bursty” uplink but omits key properties needed for system design (packet sizes, periodicity, concurrency across sensors, synchronization between multi-modal streams, and whether traffic is constant bitrate video vs event-driven keyframes/point clouds).
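One way to make the missing properties explicit is a per-flow profile that records the traffic pattern, burst size, and interval, plus a capture timestamp for checking multi-modal synchronization. The sketch below is a hypothetical structure. The field names, example sizes, and intervals are assumptions for illustration only and do not come from the contribution.

```python
from dataclasses import dataclass
from enum import Enum


class Pattern(Enum):
    PERIODIC = "periodic"          # e.g. constant-frame-rate video
    EVENT_DRIVEN = "event_driven"  # e.g. keyframes or point clouds sent on change


@dataclass(frozen=True)
class FlowProfile:
    """Hypothetical description of one uplink sensor flow."""
    name: str
    pattern: Pattern
    payload_bytes: int     # typical burst size per transmission
    period_ms: float       # nominal interval (mean inter-arrival for event-driven flows)
    capture_ts_ms: float   # capture time of the latest sample, used for sync checks

    def mean_rate_mbps(self) -> float:
        # bits per burst * bursts per second, expressed in Mbit/s
        return self.payload_bytes * 8 * 1000 / self.period_ms / 1e6


def aggregate_mean_rate_mbps(flows: list[FlowProfile]) -> float:
    return sum(f.mean_rate_mbps() for f in flows)


def max_sync_skew_ms(flows: list[FlowProfile]) -> float:
    """Largest capture-time spread across concurrent multi-modal flows."""
    ts = [f.capture_ts_ms for f in flows]
    return max(ts) - min(ts)


if __name__ == "__main__":
    flows = [
        FlowProfile("front_camera", Pattern.PERIODIC, 25_000, 40.0, 1000.0),
        FlowProfile("point_cloud", Pattern.EVENT_DRIVEN, 100_000, 100.0, 1012.5),
    ]
    print(aggregate_mean_rate_mbps(flows), max_sync_skew_ms(flows))
```

A mean rate alone understates the problem. When bursts from several sensors coincide, the instantaneous uplink demand is much higher than the average, which is why concurrency and synchronization need to be stated alongside packet size and periodicity.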
[Technical] Multi-modal data is mentioned (video, point clouds, embeddings), but the proposal does not specify whether point clouds are LiDAR-like, depth maps, or reconstructed meshes, nor their typical rates; this omission can materially change the bandwidth/latency conclusions.
[Technical] The “Transmission format” table introduces MPEG VCM/FCM and JPEG AI, but it does not explain how these would be carried in 3GPP (e.g., media framework, application layer, QoS handling) or what network feature is actually required (e.g., generic support for opaque payloads vs specific codec awareness).
[Technical] The statement “efficient transmission support needed” for proprietary embeddings/tokenizers is vague and risks implying new 3GPP standardization of AI feature codecs without scoping; it should instead identify concrete network enablers (e.g., QoS, prioritization, segmentation/reassembly, loss protection) independent of payload semantics.
[Technical] The rationale for cloud offloading (“keep robots simple/light”, “centralize AI for multiple robots”) is plausible but one-sided; it ignores privacy/safety/regulatory constraints and local fallback requirements, which are often decisive for medical/industrial deployments and should be reflected as requirements (e.g., local autonomy under connectivity loss).
[Editorial] The proposed new clause numbering is inconsistent in the summary (“new clause 4.2.2.X” vs “based on the proposed text in clause 8”); the contribution should clearly identify the target TR, exact clause location, and provide the actual proposed text with proper numbering.
[Editorial] References to prior work are too loose (“TR 22.870 clause 6.28”, “SA4#134 (S4-251826)”) without quoting the baseline text being extended; for a change proposal, the delta versus existing TR wording should be explicit to avoid duplication or contradiction.
[Editorial] Several terms are undefined or used inconsistently for 3GPP context (“mobile embodied sensors”, “ultra-low latency”, “error resilience”, “cloud/server”, “gateway”), and the clause should add definitions or align to existing 3GPP terminology (UE, edge, DN, application server, URLLC-like requirements).
[Editorial] The document includes vendor/industry examples (e.g., NVIDIA Isaac GR00T, ITU-T SG21 workshop) that are not necessary for TR requirements text and may be inappropriate in 3GPP specifications; keep background in the contribution but avoid embedding such references in proposed TR clause text.
[Technical] The proposal focuses almost exclusively on uplink, but embodied AI control typically requires timely downlink (commands, maps, model updates) and possibly sidelink/robot-to-robot coordination; omitting DL/bi-directional requirements may lead to an incomplete requirement set for FS_6G_MED.