# Summary of 3GPP Technical Document S4-260161

## Document Overview
This is a pCR (pseudo Change Request) to 3GPP TR 26.870 that introduces **Embodied Video Internet (EVI)** as a new use case for 6G Media studies. It proposes adding a new clause 6.1 to the technical report, covering media requirements for embodied AI systems (robots, UAVs) that actively capture and process video in dynamic environments.

## Main Technical Contributions

### 1. Introduction to Embodied Video Internet (Clause 6.1.1)

**Core Concept:**
- Defines **Embodied AI** as integration of AI into physical systems enabling real-world interaction
- Introduces paradigm shift from **static/passive recording** to **dynamic/mobile/embodied sensing**
- Distinguishes between:
  - **Old Paradigm**: Fixed cameras with limited FOV and constrained coverage
  - **New Paradigm**: Mobile devices (robots, UAVs) as "mobile eyes and limbs" actively exploring environments

**Definition:**
- **Embodied Video**: Use of 6G networks enabling intelligent agents to capture, process, and react to visual information in real-time within dynamic environments

### 2. SA1 Use Case Analysis (Clause 6.1.2)

The pCR extracts and summarizes four relevant SA1 use cases from TR 22.870:

#### Use Case 6.28: Network-assisted Video-based AI Inference Task Offloading

**Technical Requirements:**
- Multi-camera systems (6-8 cameras) with concurrent multi-modal data streams (video, point clouds)
- Three operational scenarios defined:
  - **Scenario I**: 6x 1080p @ 15Hz → 20 Mbps
  - **Scenario II**: 4x 1080p + 2x 4K @ 15/30Hz → 60 Mbps
  - **Scenario III**: 2x 1080p + 4x 4K @ 15/30Hz → 100 Mbps
  - **Alternative**: 4x 1080p + 2x 4K @ 60Hz
- **E2E RTT**: 100-300 ms
- Compression ratio: 240:1 assumed
- Distributed AI inference tasks: multi-modal perception, 3D digital twin modeling, trajectory planning
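
The scenario data rates are roughly consistent with the assumed 240:1 compression ratio. The sketch below reproduces that arithmetic. The 24 bits per raw pixel and the frame-rate split (1080p cameras at 15 Hz, 4K cameras at 30 Hz) are assumptions made here for illustration; the summary only states "15/30Hz" and does not give a raw pixel format.

```python
# Assumptions (not from the source): 24 bits per raw pixel, 1080p cameras
# at 15 Hz, 4K cameras at 30 Hz. The 240:1 compression ratio is from the text.
RESOLUTIONS = {"1080p": (1920, 1080), "4K": (3840, 2160)}
BITS_PER_PIXEL = 24
COMPRESSION_RATIO = 240


def compressed_mbps(cameras):
    """Return the aggregate compressed rate in Mbps.

    cameras: list of (count, resolution_name, frame_rate_hz) tuples.
    """
    raw_bps = 0
    for count, name, fps in cameras:
        width, height = RESOLUTIONS[name]
        raw_bps += count * width * height * BITS_PER_PIXEL * fps
    return raw_bps / COMPRESSION_RATIO / 1e6


scenarios = {
    "I": [(6, "1080p", 15)],
    "II": [(4, "1080p", 15), (2, "4K", 30)],
    "III": [(2, "1080p", 15), (4, "4K", 30)],
}

for label, cameras in scenarios.items():
    print(f"Scenario {label}: {compressed_mbps(cameras):.1f} Mbps")
```

Under these assumptions the three scenarios come out near 19, 62, and 106 Mbps, close to the 20, 60, and 100 Mbps figures listed above.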

**Media Requirements:**
- AI codec with error-tolerant capabilities (Grace method)
- Real-time processing of high-resolution video and multi-modality data
- High uplink data rate and low latency

#### Use Case 6.19: AI-based Video Analysis

**Application Context:**
- Real-time infrastructure inspection (utility poles, guardrails)
- Security surveillance
- Network offloading for resource-intensive video analysis

**Media Requirements:**
- Native integration of video analysis algorithms (object recognition, anomaly detection)
- Low latency communication

#### Use Case 6.48: Service Robot for Power Grid

**System Architecture:**
- Embedded controllers handle motion control (walking, grasping), which requires fast response
- Network offloading for computing-intensive tasks (large AI models, control command generation)

**KPI Requirements:**

| Traffic Type | Message Size | Transfer Interval | Data Rate | E2E Latency | Reliability |
|--------------|--------------|-------------------|-----------|-------------|-------------|
| UL sensor data | 1250-12500 Bytes | 10 ms | 1-10 Mbps | 100-150 ms | 99.99% |
| UL LiDAR | 345600 Bytes | 100 ms | 27.6 Mbps | 100-150 ms | 99.99% |
| DL Control command | 625-12500 Bytes | 50 ms | 0.1-2 Mbps | - | - |

**Technical Notes:**
- LiDAR: 10 Hz frame rate, 28800 points/frame, 12 bytes/point
- E2E latency breakdown: ~40 ms communication + ~100 ms AI inference
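
The KPI table and the technical notes can be cross-checked with simple arithmetic: message size divided by transfer interval gives the data rate, and the LiDAR message size follows from points per frame times bytes per point. The sketch below is only a sanity check; the helper name `data_rate_mbps` is introduced here for illustration.

```python
def data_rate_mbps(message_bytes, interval_ms):
    """Average data rate in Mbps for one message sent every interval_ms."""
    return message_bytes * 8 * 1000 / interval_ms / 1e6


# LiDAR frame size: 28800 points per frame, 12 bytes per point.
lidar_frame_bytes = 28800 * 12
print(f"LiDAR frame: {lidar_frame_bytes} bytes")
print(f"UL LiDAR: {data_rate_mbps(lidar_frame_bytes, 100):.1f} Mbps")

# UL sensor data: 1250-12500 bytes every 10 ms.
print(f"UL sensor: {data_rate_mbps(1250, 10):.1f}-{data_rate_mbps(12500, 10):.1f} Mbps")

# DL control command: 625-12500 bytes every 50 ms.
print(f"DL control: {data_rate_mbps(625, 50):.1f}-{data_rate_mbps(12500, 50):.1f} Mbps")

# Latency budget: communication plus AI inference, against the 100-150 ms target.
total_latency_ms = 40 + 100
print(f"E2E latency: {total_latency_ms} ms, within target: {100 <= total_latency_ms <= 150}")
```

The computed values match the table: 345600 bytes and about 27.6 Mbps for LiDAR, 1-10 Mbps for sensor data, and 0.1-2 Mbps for control commands.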

**Media Requirements:**
- Real-time processing of multi-modality data (video, audio, point clouds, LiDAR)

#### Use Case 6.11: Intelligent UAV Swarms

**Operational Concept:**
- UAVs with built-in AI capabilities for enhanced perception, decision-making, control
- Swarm deployment for full area coverage and complex task execution
- Network offloading during local computing overload (e.g., HD 3D map generation)

**Media Requirements:**
- Real-time processing of multi-modality data from multiple UAVs

### 3. External Evidence - UAV Inspection Use Cases (Clause 6.1.3)

#### Table 2.1.3-1: UAV Inspection Requirements

| Use Case | Video Resolution | Data Rate | E2E Latency | Reliability |
|----------|------------------|-----------|-------------|-------------|
| Traffic surveillance | 1080p | ≥5 Mbps | <100 ms | >99.99% |
| Traffic surveillance | 4K | >25 Mbps | <100 ms | >99.99% |
| Urban management | 1080p | ≥5 Mbps | 20-100 ms | - |
| Event security | 1K | ≥5 Mbps | ≤10 ms | - |
| Event security | 4K | ≥25 Mbps | ≤10 ms | - |
| Rural inspections | 4K | ≥25 Mbps | <100 ms | - |

#### Table 2.1.3-2: UAV 3D Mapping Requirements

| Use Case | Data Type | Data Rate | E2E Latency |
|----------|-----------|-----------|-------------|
| Topographic surveying | High-res video, LiDAR | ≥30 Mbps | 20-100 ms |
| Reconstruction | 4K video | ≥50 Mbps | 20-100 ms |
| Mine monitoring | Video, LiDAR, sensor | ≥30 Mbps | 20-100 ms |
| Rural governance | High-res video, LiDAR | ≥30 Mbps | 20-100 ms |
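
One way to read the 3D mapping table is as a lookup for checking whether a measured uplink would satisfy a given use case. The sketch below encodes its four rows. The `Requirement` class, the `meets` function, and the decision to check only the upper end of the 20-100 ms latency range are illustrative assumptions, not part of the pCR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    min_rate_mbps: float    # lower bound on data rate from the table
    max_latency_ms: float   # upper end of the 20-100 ms range (assumed to be the limit)


# Rows of the UAV 3D mapping requirements table.
MAPPING_REQUIREMENTS = {
    "Topographic surveying": Requirement(30, 100),
    "Reconstruction": Requirement(50, 100),
    "Mine monitoring": Requirement(30, 100),
    "Rural governance": Requirement(30, 100),
}


def meets(use_case, rate_mbps, latency_ms):
    """Return True if the measured link satisfies the use case's bounds."""
    req = MAPPING_REQUIREMENTS[use_case]
    return rate_mbps >= req.min_rate_mbps and latency_ms <= req.max_latency_ms


# A hypothetical 40 Mbps, 60 ms uplink:
print(meets("Topographic surveying", 40, 60))  # True
print(meets("Reconstruction", 40, 60))         # False, 40 Mbps is below 50 Mbps
```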

### 4. Consolidated Requirements for 6G Media (Clause 6.1.4)

**Four Key Requirements Identified:**

1. **AI Codec Technology**
   - Error-tolerant capabilities within frames
   - Grace method cited as giving a better user experience than traditional codecs

2. **AI-native Video Protocol**
   - New protocol design for AI-driven video systems

3. **Low-latency Video Transmission**
   - Critical for real-time embodied AI operations

4. **QoE Model for Performance Measurement**
   - **User-centric parameters**:
     - Multi-stream data types
     - Data quality
     - Accuracy and reliability of feedback results
   - **Network-centric parameters**:
     - Network delivery speed
     - Latency
     - End-to-end packet loss
     - Network usage
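
The QoE parameter groups listed above could be carried in a simple record type when collecting measurements. The sketch below only groups the parameters; the field names, units, value ranges, and sample values are all hypothetical, and no scoring formula is shown because the summary does not give one.

```python
from dataclasses import dataclass, asdict


@dataclass
class UserCentricParams:
    stream_types: list        # multi-stream data types, e.g. ["video", "point_cloud"]
    data_quality: float       # hypothetical score in 0..1
    feedback_accuracy: float  # accuracy/reliability of feedback results, hypothetical 0..1


@dataclass
class NetworkCentricParams:
    delivery_speed_mbps: float  # network delivery speed
    latency_ms: float
    e2e_packet_loss: float      # end-to-end loss as a fraction in 0..1
    network_usage: float        # hypothetical utilisation fraction in 0..1


@dataclass
class QoEReport:
    user: UserCentricParams
    network: NetworkCentricParams


# Made-up sample values, for illustration only.
report = QoEReport(
    user=UserCentricParams(["video", "point_cloud"], 0.9, 0.95),
    network=NetworkCentricParams(60.0, 120.0, 0.001, 0.4),
)
print(asdict(report))
```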

## Technical Significance

This pCR establishes foundational requirements for supporting embodied AI systems in 6G media, addressing:
- Multi-modal concurrent data streaming
- Real-time AI inference offloading
- High-reliability, low-latency video transmission
- Novel QoE metrics for embodied video applications
- AI-native codec and protocol requirements

The document bridges SA1 service requirements with SA4 media specifications, providing concrete KPIs and use case evidence for the FS_6G_MED study.