[FS_6G_MED] pCR on Embodied Video for 6G Media
This is a pCR (pseudo Change Request) to 3GPP TR 26.870 that introduces Embodied Video Internet (EVI) as a new use case for 6G Media studies. It proposes adding a new clause 6.1 to the technical report, focusing on media requirements for embodied AI systems (robots, UAVs) that actively capture and process video in dynamic environments.
Core Concept:
- Defines Embodied AI as the integration of AI into physical systems, enabling real-world interaction
- Introduces a paradigm shift from static, passive recording to dynamic, mobile, embodied sensing
- Distinguishes between:
  - Old Paradigm: Fixed cameras with limited FOV and constrained coverage
  - New Paradigm: Mobile devices (robots, UAVs) acting as "mobile eyes and limbs" that actively explore environments
Definition:
- Embodied Video: The use of 6G networks to enable intelligent agents to capture, process, and react to visual information in real time within dynamic environments
The pCR extracts and summarizes four relevant use cases from TR 22.870:
Technical Requirements:
- Multi-camera systems (6-8 cameras) with concurrent multi-modal data streams (video, point clouds)
- Three operational scenarios defined:
  - Scenario I: 6x 1080p @ 15 Hz → 20 Mbps
  - Scenario II: 4x 1080p + 2x 4K @ 15/30 Hz → 60 Mbps
  - Scenario III: 2x 1080p + 4x 4K @ 15/30 Hz → 100 Mbps
  - Alternative: 4x 1080p + 2x 4K @ 60 Hz
- E2E RTT: 100-300 ms
- Compression ratio: 240:1 assumed
- Distributed AI inference tasks: multi-modal perception, 3D digital twin modeling, trajectory planning
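The scenario bitrates can be approximately reproduced from the stream counts and the 240:1 compression ratio. The sketch below is illustrative only: it assumes 24 bits per raw pixel, 1080p streams at 15 Hz, and 4K streams at 30 Hz, none of which is stated explicitly in the summary. Under those assumptions the three scenarios come out at roughly 19, 62, and 106 Mbps, in line with the rounded 20, 60, and 100 Mbps figures.

```python
# Assumptions (not from the source): 24 bits per raw pixel,
# 1080p streams run at 15 Hz and 4K streams at 30 Hz.
BITS_PER_PIXEL = 24
COMPRESSION_RATIO = 240  # stated in the requirements list

RESOLUTIONS = {"1080p": (1920, 1080), "4K": (3840, 2160)}


def compressed_mbps(streams):
    """Aggregate compressed bitrate in Mbps.

    streams: list of (camera_count, resolution_name, frame_rate_hz).
    """
    raw_bps = 0
    for count, name, fps in streams:
        width, height = RESOLUTIONS[name]
        raw_bps += count * width * height * BITS_PER_PIXEL * fps
    return raw_bps / COMPRESSION_RATIO / 1e6


SCENARIOS = {
    "I": [(6, "1080p", 15)],
    "II": [(4, "1080p", 15), (2, "4K", 30)],
    "III": [(2, "1080p", 15), (4, "4K", 30)],
}

if __name__ == "__main__":
    for label, streams in SCENARIOS.items():
        print(f"Scenario {label}: {compressed_mbps(streams):.1f} Mbps")
```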
Media Requirements:
- AI codec with error-tolerant capabilities (Grace method)
- Real-time processing of high-resolution video and multi-modality data
- High uplink data rate and low latency
Application Context:
- Real-time infrastructure inspection (utility poles, guardrails)
- Security surveillance
- Network offloading for resource-intensive video analysis
Media Requirements:
- Native integration of video analysis algorithms (object recognition, anomaly detection)
- Low latency communication
System Architecture:
- Embedded controllers handle motion control (walking, grasping), which needs a fast response
- Network offloading for computing-intensive tasks (large AI models, control command generation)
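The split between embedded control and network offloading can be sketched as a simple placement rule. Everything here is hypothetical: the `RobotTask` fields, the `placement` function, and the rule that latency-critical work always stays on the device are illustrative assumptions, not part of the pCR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RobotTask:
    """Hypothetical task descriptor for illustration."""
    name: str
    latency_critical: bool   # e.g. walking, grasping
    compute_intensive: bool  # e.g. large AI model inference


def placement(task: RobotTask) -> str:
    """Return 'embedded' or 'network' for a task.

    Assumed rule: latency-critical tasks stay on the embedded
    controller; other compute-intensive tasks are offloaded.
    """
    if task.latency_critical:
        return "embedded"
    if task.compute_intensive:
        return "network"
    return "embedded"


if __name__ == "__main__":
    for t in (
        RobotTask("grasping", latency_critical=True, compute_intensive=False),
        RobotTask("command_generation", latency_critical=False, compute_intensive=True),
    ):
        print(t.name, "->", placement(t))
```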
KPI Requirements:
| Traffic Type | Message Size | Transfer Interval | Data Rate | E2E Latency | Reliability |
|--------------|--------------|-------------------|-----------|-------------|-------------|
| UL sensor data | 1250-12500 Bytes | 10 ms | 1-10 Mbps | 100-150 ms | 99.99% |
| UL LiDAR | 345600 Bytes | 100 ms | 27.6 Mbps | 100-150 ms | 99.99% |
| DL Control command | 625-12500 Bytes | 50 ms | 0.1-2 Mbps | - | - |
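The data rates in the table follow directly from message size and transfer interval. As a quick consistency check, the sketch below recomputes them; the function name `data_rate_mbps` is illustrative.

```python
def data_rate_mbps(message_bytes: int, interval_ms: float) -> float:
    """Data rate in Mbps for one message of message_bytes every interval_ms."""
    bits_per_message = message_bytes * 8
    # bits per millisecond * 1000 = bits per second; / 1e6 = Mbps
    return bits_per_message / (interval_ms * 1000)


ROWS = [
    ("UL sensor data (min)", 1250, 10),
    ("UL sensor data (max)", 12500, 10),
    ("UL LiDAR", 345600, 100),
    ("DL control command (min)", 625, 50),
    ("DL control command (max)", 12500, 50),
]

if __name__ == "__main__":
    for name, size, interval in ROWS:
        print(f"{name}: {data_rate_mbps(size, interval):.3f} Mbps")
```

The results (1 to 10 Mbps, about 27.6 Mbps, and 0.1 to 2 Mbps) agree with the Data Rate column.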
Technical Notes:
- LiDAR: 10 Hz frame rate, 28800 points/frame, 12 bytes/point
- E2E latency breakdown: ~40 ms communication + ~100 ms AI inference
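These notes explain two entries in the KPI table: the LiDAR message size and the E2E latency range. A minimal check, using only the figures quoted above (the constant names are illustrative):

```python
# LiDAR frame size from the point-cloud parameters in the notes.
POINTS_PER_FRAME = 28_800
BYTES_PER_POINT = 12
lidar_frame_bytes = POINTS_PER_FRAME * BYTES_PER_POINT

# Approximate latency budget from the notes.
COMMUNICATION_MS = 40
AI_INFERENCE_MS = 100
total_latency_ms = COMMUNICATION_MS + AI_INFERENCE_MS

# E2E latency range from the KPI table.
LATENCY_RANGE_MS = (100, 150)
within_budget = LATENCY_RANGE_MS[0] <= total_latency_ms <= LATENCY_RANGE_MS[1]

if __name__ == "__main__":
    print(f"LiDAR frame: {lidar_frame_bytes} bytes")
    print(f"E2E latency: {total_latency_ms} ms, within range: {within_budget}")
```

The frame size matches the 345600-byte message size, and the combined latency of about 140 ms falls inside the 100-150 ms range.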
Media Requirements:
- Real-time processing of multi-modality data (video, audio, point clouds, LiDAR)
Operational Concept:
- UAVs with built-in AI capabilities for enhanced perception, decision-making, and control
- Swarm deployment for full area coverage and complex task execution
- Network offloading during local computing overload (e.g., HD 3D map generation)
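Offloading under local overload can be illustrated with a greedy capacity check. This is a sketch under stated assumptions: the abstract "compute units", the task names, and the `split_tasks` helper are hypothetical and do not come from the pCR.

```python
def split_tasks(task_costs, local_capacity):
    """Split tasks into (local, offloaded) lists.

    task_costs: list of (task_name, cost) in abstract compute units,
    in priority order. A task runs locally if it still fits within
    local_capacity; otherwise it is offloaded to the network.
    """
    local, offloaded = [], []
    used = 0.0
    for name, cost in task_costs:
        if used + cost <= local_capacity:
            local.append(name)
            used += cost
        else:
            offloaded.append(name)
    return local, offloaded


if __name__ == "__main__":
    tasks = [
        ("obstacle_avoidance", 2.0),
        ("hd_3d_map_generation", 8.0),  # too heavy for the assumed capacity
        ("object_tracking", 1.0),
    ]
    print(split_tasks(tasks, local_capacity=4.0))
```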
Media Requirements:
- Real-time processing of multi-modality data from multiple UAVs
| Use Case | Video Resolution | Data Rate | E2E Latency | Reliability |
|----------|------------------|-----------|-------------|-------------|
| Traffic surveillance | 1080p | ≥5 Mbps | <100 ms | >99.99% |
| Traffic surveillance | 4K | >25 Mbps | <100 ms | >99.99% |
| Urban management | 1080p | ≥5 Mbps | 20-100 ms | - |
| Event security | 1K | ≥5 Mbps | ≤10 ms | - |
| Event security | 4K | ≥25 Mbps | ≤10 ms | - |
| Rural inspections | 4K | ≥25 Mbps | <100 ms | - |
| Use Case | Data Type | Data Rate | E2E Latency |
|----------|-----------|-----------|-------------|
| Topographic surveying | High-res video, LiDAR | ≥30 Mbps | 20-100 ms |
| Reconstruction | 4K video | ≥50 Mbps | 20-100 ms |
| Mine monitoring | Video, LiDAR, sensor | ≥30 Mbps | 20-100 ms |
| Rural governance | High-res video, LiDAR | ≥30 Mbps | 20-100 ms |
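A KPI table like the one above can be encoded for automated checks of a measured link. The sketch below is illustrative: the `Kpi` class and `meets_kpi` function are hypothetical, and it assumes the upper end of the 20-100 ms range is the maximum acceptable latency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kpi:
    min_rate_mbps: float
    max_latency_ms: float  # assumption: upper end of the 20-100 ms range


# Values copied from the multi-modal table.
MULTIMODAL_KPIS = {
    "Topographic surveying": Kpi(30, 100),
    "Reconstruction": Kpi(50, 100),
    "Mine monitoring": Kpi(30, 100),
    "Rural governance": Kpi(30, 100),
}


def meets_kpi(use_case: str, rate_mbps: float, latency_ms: float) -> bool:
    """True if the measured rate and latency satisfy the use case's KPI."""
    kpi = MULTIMODAL_KPIS[use_case]
    return rate_mbps >= kpi.min_rate_mbps and latency_ms <= kpi.max_latency_ms


if __name__ == "__main__":
    print(meets_kpi("Reconstruction", rate_mbps=45, latency_ms=60))
    print(meets_kpi("Mine monitoring", rate_mbps=45, latency_ms=60))
```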
Four Key Requirements Identified:
- AI Codec: Error-tolerant coding (Grace method) for a better user experience than traditional codecs
- AI-native Video Protocol: New protocol design for AI-driven video systems
- Low-latency Video Transmission: Critical for real-time embodied AI operations
- QoE Model for Performance Measurement
This pCR establishes foundational requirements for supporting embodied AI systems in 6G media, addressing:
- Multi-modal concurrent data streaming
- Real-time AI inference offloading
- High-reliability, low-latency video transmission
- Novel QoE metrics for embodied video applications
- AI-native codec and protocol requirements
The document bridges SA1 service requirements with SA4 media specifications, providing concrete KPIs and use case evidence for the FS_6G_MED study.