S4-260097

Embodied AI use case and related requirements

Source: Huawei Tech. (UK) Co., Ltd.
Meeting: TSGS4_135_India
Agenda Item: 11.1

All Metadata

Agenda item description	FS_6G_MED (Study on Media aspects for 6G System)
Doc type	pCR
For action	Agreement
Release	Rel-20
Specification	26.87
Version	0.0.1
Related WIs	FS_6G_MED
Contact	Rufail Mekuria
Uploaded	2026-02-03T08:50:06.013000
Contact ID	104180
TDoc Status	noted
Reservation date	02/02/2026 13:48:07
Agenda item sort order	60

Review Comments

manager - 2026-02-09 04:27

[Technical] The claimed uplink “peak data rates: 20–100 Mbit for 6–8 cameras using 3GPP codecs (e.g., HEVC)” is not substantiated with camera resolution/FPS/bitrate assumptions and may be inconsistent with TR 22.870 unless the exact referenced clause/text is quoted; add explicit parameterization (e.g., 1080p/4K, 30/60 fps, number of concurrent streams) and clarify whether this is per-UE peak, per-robot, or per-session aggregate.
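
To illustrate the requested parameterization, the sketch below separates per-stream, per-robot (per-UE) and per-session aggregate uplink rates. The camera counts, resolutions, frame rates and per-stream bitrates are hypothetical placeholders. They are not taken from TR 22.870 or from the contribution, and the function names are likewise illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CameraStream:
    """One encoded camera stream. All values are illustrative assumptions."""
    resolution: str      # label only, e.g. "1080p"
    fps: int             # frames per second
    bitrate_mbps: float  # assumed average encoder output, Mbit/s


def per_robot_uplink_mbps(streams: List[CameraStream]) -> float:
    """Aggregate uplink rate of one robot (one UE) carrying all of its cameras."""
    return sum(s.bitrate_mbps for s in streams)


def per_session_uplink_mbps(robots: List[List[CameraStream]]) -> float:
    """Aggregate uplink rate of all robots served by one application session."""
    return sum(per_robot_uplink_mbps(r) for r in robots)


# Hypothetical low end: 6 cameras, 1080p at 30 fps, 4 Mbit/s each (HEVC-class assumption).
low = [CameraStream("1080p", 30, 4.0)] * 6
# Hypothetical high end: 8 cameras, 2160p at 60 fps, 12 Mbit/s each.
high = [CameraStream("2160p", 60, 12.0)] * 8

print(per_robot_uplink_mbps(low))            # 24.0
print(per_robot_uplink_mbps(high))           # 96.0
print(per_session_uplink_mbps([low, high]))  # 120.0
```

Stating the assumptions this way shows that the same 6–8 camera count can land almost anywhere in the quoted range, depending on resolution and frame rate. It also shows that a session serving two robots already exceeds either per-robot figure.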

[Technical] “Ultra-low latency” and “error resilience” are repeatedly asserted for real-time navigation, but no concrete latency/jitter/reliability targets (e.g., E2E, UL one-way, packet loss tolerance) are proposed, making the requirement non-actionable for FS_6G_MED and inconsistent with how SA requirements are normally expressed.
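
One way to make the requirement actionable is to derive an uplink one-way target from an assumed end-to-end loop budget. The sketch below uses a simple additive model (UL + server processing + DL) with hypothetical values. It ignores jitter and retransmissions, which a real budget would express as percentiles rather than means.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopBudget:
    """One sense-infer-act cycle split into components (ms). Values are assumptions."""
    ul_one_way_ms: float  # sensor data, UE -> application server
    processing_ms: float  # inference at the server
    dl_one_way_ms: float  # command, application server -> UE

    def e2e_ms(self) -> float:
        """End-to-end loop latency as the sum of its components."""
        return self.ul_one_way_ms + self.processing_ms + self.dl_one_way_ms


def meets_target(budget: LoopBudget, e2e_target_ms: float) -> bool:
    return budget.e2e_ms() <= e2e_target_ms


def ul_allowance_ms(e2e_target_ms: float, processing_ms: float, dl_one_way_ms: float) -> float:
    """Largest UL one-way latency that still fits the E2E target (negative means infeasible)."""
    return e2e_target_ms - processing_ms - dl_one_way_ms


# Hypothetical numbers, not proposed requirement values.
b = LoopBudget(ul_one_way_ms=10.0, processing_ms=25.0, dl_one_way_ms=5.0)
print(b.e2e_ms())                        # 40.0
print(meets_target(b, 50.0))             # True
print(ul_allowance_ms(50.0, 25.0, 5.0))  # 20.0
```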

[Technical] The proposed clause mixes AI research task descriptions/metrics (surprisal, IoU+/IoU-, SPL, etc.) with 3GPP service requirements, but it never maps these metrics to communication KPIs (latency, throughput, reliability, synchronization), so the added text risks being non-normative narrative rather than requirements usable by RAN/CT.

[Technical] The document assumes cloud/server inference as the dominant architecture (“AI processing may occur at cloud/server”), but it does not address split inference options (on-device, edge, hybrid) or the resulting differences in UL/DL traffic patterns (e.g., downlink control/trajectory updates), both of which are critical for embodied control loops.
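
The effect of the split point on UL/DL traffic can be sketched with per-cycle payload sizes. The sketch assumes one message in each direction per control cycle. The payload sizes assigned to each option, and the option labels, are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical per-cycle payloads in bytes: (uplink, downlink). Illustrative only.
SPLITS: Dict[str, Tuple[int, int]] = {
    "on-device": (500, 100),       # UL: status/telemetry; DL: mission-level goals
    "hybrid": (20_000, 200),       # UL: compressed features/embeddings; DL: trajectory update
    "edge/cloud": (100_000, 200),  # UL: encoded sensor frames; DL: trajectory update
}


def stream_mbps(msg_bytes: int, rate_hz: float) -> float:
    """Average rate of a periodic message stream, Mbit/s."""
    return msg_bytes * 8 * rate_hz / 1e6


def ul_dl_mbps(split: str, rate_hz: float) -> Tuple[float, float]:
    """UL and DL average rates, assuming one message each way per control cycle."""
    ul_bytes, dl_bytes = SPLITS[split]
    return stream_mbps(ul_bytes, rate_hz), stream_mbps(dl_bytes, rate_hz)


for name in SPLITS:
    ul, dl = ul_dl_mbps(name, rate_hz=30)
    print(f"{name:>10}: UL {ul:.3f} Mbit/s, DL {dl:.3f} Mbit/s")
```

Under these assumptions, the UL:DL ratio ranges from 5:1 (on-device) to 500:1 (edge/cloud). The DL stream stays small throughout, but it is latency-critical because it closes the control loop.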

[Technical] Traffic characterization is incomplete: it states “bursty” uplink but omits key properties needed for system design (packet sizes, periodicity, concurrency across sensors, synchronization between multi-modal streams, and whether traffic is constant-bitrate video or event-driven keyframes/point clouds).
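
The missing properties can be captured per flow in a small descriptor. The sketch shows why “bursty” alone is insufficient: the same average rate implies very different instantaneous rates depending on how quickly each burst must be delivered. It also includes a multi-modal synchronization check. The field names, the 50 kB frame size and the timing values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class FlowDescriptor:
    """Per-sensor traffic properties. Field names are illustrative assumptions."""
    name: str
    period_ms: float        # interval between bursts (frame or sweep period)
    burst_bytes: int        # bytes generated per burst (e.g. one encoded frame)
    burst_window_ms: float  # time within which a burst should be delivered

    def avg_mbps(self) -> float:
        # bits per millisecond divided by 1e3 gives Mbit/s
        return self.burst_bytes * 8 / (self.period_ms * 1e3)

    def burst_mbps(self) -> float:
        # instantaneous rate needed to deliver one burst inside its window
        return self.burst_bytes * 8 / (self.burst_window_ms * 1e3)


def max_skew_ms(capture_ts_ms: Dict[str, float]) -> float:
    """Largest capture-time difference across streams belonging to one fused sample."""
    return max(capture_ts_ms.values()) - min(capture_ts_ms.values())


# Hypothetical camera flow: 50 kB frame every 40 ms (25 fps), delivered within 10 ms.
cam = FlowDescriptor("camera", period_ms=40.0, burst_bytes=50_000, burst_window_ms=10.0)
print(cam.avg_mbps(), cam.burst_mbps())  # 10.0 40.0
skew = max_skew_ms({"camera": 100.0, "lidar": 104.5, "imu": 101.0})
print(skew)  # 4.5
```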

[Technical] Multi-modal data is mentioned (video, point clouds, embeddings), but the proposal does not specify whether point clouds are LiDAR-like, depth maps, or reconstructed meshes, nor their typical rates; this omission can materially change the bandwidth/latency conclusions.
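
A back-of-the-envelope calculation shows how much the representation matters even before compression. The sketch assumes fixed-size elements and no compression. The point count, element sizes and frame rates are hypothetical examples. Reconstructed meshes, whose size varies from frame to frame, are not covered by this simple formula.

```python
def raw_rate_mbps(elements_per_frame: int, bytes_per_element: int, fps: float) -> float:
    """Uncompressed rate in Mbit/s for a frame of fixed-size elements (points or pixels)."""
    return elements_per_frame * bytes_per_element * 8 * fps / 1e6


# Hypothetical LiDAR-like sweep: 100k points, x/y/z/intensity as float32 (16 B), 10 Hz.
lidar = raw_rate_mbps(100_000, 16, 10)
# Hypothetical depth map: 640x480 pixels, 16-bit depth (2 B), 30 fps.
depth = raw_rate_mbps(640 * 480, 2, 30)
print(f"LiDAR-like: {lidar:.1f} Mbit/s, depth map: {depth:.1f} Mbit/s")
```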

[Technical] The “Transmission format” table introduces MPEG VCM/FCM and JPEG AI, but it does not explain how these would be carried in 3GPP (e.g., media framework, application layer, QoS handling) or what network feature is actually required (e.g., generic support for opaque payloads vs specific codec awareness).

[Technical] The statement “efficient transmission support needed” for proprietary embeddings/tokenizers is vague and risks implying new 3GPP standardization of AI feature codecs without scoping; it should instead identify concrete network enablers (e.g., QoS, prioritization, segmentation/reassembly, loss protection) independent of payload semantics.
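
To illustrate enablers that are independent of payload semantics, the sketch treats an embedding as opaque bytes. It adds only framing (payload identifier, sequence number, segment count), so that a receiver can reassemble the payload and detect loss. This is a hypothetical application-layer framing, not an existing 3GPP or MPEG format. The segment size and field names are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Segment:
    payload_id: int  # identifies the opaque payload (e.g. one embedding tensor)
    seq: int         # position of this segment within the payload
    total: int       # number of segments in the payload
    data: bytes


def segment(payload_id: int, payload: bytes, max_size: int) -> List[Segment]:
    """Split an opaque payload into segments of at most max_size bytes."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    chunks = [payload[i:i + max_size] for i in range(0, len(payload), max_size)] or [b""]
    return [Segment(payload_id, i, len(chunks), c) for i, c in enumerate(chunks)]


def reassemble(segments: List[Segment]) -> Optional[bytes]:
    """Return the payload if every segment of one payload arrived (any order), else None."""
    if not segments:
        return None
    total = segments[0].total
    by_seq = {s.seq: s.data for s in segments}
    if set(by_seq) != set(range(total)):
        return None  # loss detected; a loss-protection scheme would act here
    return b"".join(by_seq[i] for i in range(total))


blob = bytes(range(256)) * 10          # stand-in for a proprietary embedding
segs = segment(7, blob, max_size=1000)  # 3 segments: 1000 + 1000 + 560 bytes
assert reassemble(list(reversed(segs))) == blob
assert reassemble(segs[:-1]) is None
```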

[Technical] The rationale for cloud offloading (“keep robots simple/light”, “centralize AI for multiple robots”) is plausible but one-sided; it ignores privacy/safety/regulatory constraints and local fallback requirements, which are often decisive for medical/industrial deployments and should be reflected as requirements (e.g., local autonomy under connectivity loss).
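
A requirement such as “local autonomy under connectivity loss” becomes testable once it states a command timeout after which the robot must stop relying on the network. The sketch below is a hypothetical watchdog driven by caller-supplied timestamps. The 200 ms timeout and the class names are assumptions, and what “local fallback” means in practice (safe stop, degraded local inference) is deployment-specific.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    REMOTE = "remote"         # executing commands from the application server
    LOCAL_FALLBACK = "local"  # on-board autonomy or safe stop


class ConnectivityWatchdog:
    """Switches to local fallback when no downlink command arrives within timeout_s."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_cmd_s: Optional[float] = None
        self.mode = Mode.LOCAL_FALLBACK  # stay local until the first command arrives

    def on_command(self, now_s: float) -> None:
        self.last_cmd_s = now_s
        self.mode = Mode.REMOTE

    def tick(self, now_s: float) -> Mode:
        if self.last_cmd_s is None or now_s - self.last_cmd_s > self.timeout_s:
            self.mode = Mode.LOCAL_FALLBACK
        return self.mode


w = ConnectivityWatchdog(timeout_s=0.2)
print(w.tick(0.0))  # Mode.LOCAL_FALLBACK
w.on_command(1.0)
print(w.tick(1.1))  # Mode.REMOTE
print(w.tick(1.5))  # Mode.LOCAL_FALLBACK
```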

[Editorial] The proposed new clause numbering is inconsistent in the summary (“new clause 4.2.2.X” vs “based on the proposed text in clause 8”); the contribution should clearly identify the target TR, exact clause location, and provide the actual proposed text with proper numbering.

[Editorial] References to prior work are too loose (“TR 22.870 clause 6.28”, “SA4#134 (S4-251826)”) without quoting the baseline text being extended; for a change proposal, the delta versus existing TR wording should be explicit to avoid duplication or contradiction.

[Editorial] Several terms are undefined or used inconsistently for 3GPP context (“mobile embodied sensors”, “ultra-low latency”, “error resilience”, “cloud/server”, “gateway”), and the clause should add definitions or align to existing 3GPP terminology (UE, edge, DN, application server, URLLC-like requirements).

[Editorial] The document includes vendor/industry examples (e.g., NVIDIA Isaac GR00T, ITU-T SG21 workshop) that are not necessary for TR requirements text and may be inappropriate in 3GPP specifications; keep background in the contribution but avoid embedding such references in proposed TR clause text.

[Technical] The proposal focuses almost exclusively on uplink, but embodied AI control typically requires timely downlink (commands, maps, model updates) and possibly sidelink/robot-to-robot coordination; omitting DL/bi-directional requirements may lead to an incomplete requirement set for FS_6G_MED.
