# Summary of S4-260094: Media Related Real-Time AI Traffic Characteristics

## Document Overview

This is a pseudo Change Request (pCR) from Huawei/HiSilicon to TSG-SA WG4 Meeting #135, proposing to add a new clause on end-to-end real-time multi-modal AI traffic characteristics to a 6G media-related TR. The document follows the methodology established in TR 26.926 for traffic modeling and quality evaluation.

## Main Objective

The document aims to characterize AI traffic for 6G real-time video conferencing and robotics use cases by defining the end-to-end architecture, procedures, content coding models, and delivery mechanisms for real-time AI inference applications.

## Technical Contributions

### End-to-End Architecture (Clause 6.2.6.X.1)

- **Core Concept**: Multimodal Large Language Models (MLLMs) incorporating different AI encoders/decoders for the various modalities (text, image, video, audio)
- **Architecture Components**:
  - UE/client implements AI encoding and packetization
  - Application Server (AS) implements AI decoding
  - Media-related AI service request/response model
- **Key Innovation**: Introduction of "native AI data units" - a new media format generated by AI encoders that can be used for media reconstruction, generation, and comprehension
- **Compatibility Handling**: AI decoder at AS may be needed if UE's AI encoder is not compatible with AS's AI model; otherwise, encoded data can be processed directly

### Basic Procedures (Clause 6.2.6.X.2)

The document defines a 10-step call flow:
1. UE connects and provides supported AI encoder information
2. AS configures AI model and corresponding decoder
3. Operational flow includes:
   - Media data collection and AI encoding at UE
   - Packetization using native or customized packet format
   - Transmission to AS
   - Optional AI decoding at AS (only if the UE's encoder output is not directly compatible with the AS's AI model)
   - Media-related response generation and transmission back to UE
   - Response decoding and presentation at UE
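The call flow above can be sketched as a minimal request/response loop. All class and method names here are illustrative stand-ins, not taken from the pCR:

```python
# Hedged sketch of the 10-step call flow; names are hypothetical.

class UE:
    def report_encoder_capabilities(self):
        return {"encoders": ["grace", "vqgan"]}   # step 1
    def capture_and_encode(self):
        return [0.1, 0.2, 0.3]                    # stand-in "AI data units"
    def packetize(self, units):
        return [("pkt", u) for u in units]        # native/customized format
    def present(self, response):
        self.last_response = response             # step 10

class AppServer:
    def configure(self, caps):
        # Step 2: AS picks an AI model whose decoder matches a UE encoder
        self.compatible = "grace" in caps["encoders"]
    def process(self, packets):
        units = [u for _, u in packets]
        if not self.compatible:
            units = self.ai_decode(units)         # only on format mismatch
        return {"reply": sum(units)}              # stand-in inference result
    def ai_decode(self, units):
        return units

def run_call_flow():
    ue, server = UE(), AppServer()
    server.configure(ue.report_encoder_capabilities())  # steps 1-2
    packets = ue.packetize(ue.capture_and_encode())     # steps 3-5
    response = server.process(packets)                  # steps 6-8
    ue.present(response)                                # steps 9-10
    return ue.last_response
```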

### Content Coding Model (Clause 6.2.6.X.3)

Two types of AI encoders are defined:

#### Type 1: Reconstruction-Oriented AI Encoders
- **Examples**: DVC, GRACE codec
- **GRACE Codec Details**:
  - Input: 2×H×W×3 tensors (two consecutive frames)
  - Encoder: Analyzes inter-frame differences (motion vectors and residuals), maps to compact latent representation
  - Resilience mechanism: Latent randomly split into multiple chunks, individually entropy coded to prevent error propagation
  - Decoder: Entropy decoding, latent reorganization, lost chunks set to zeros, graceful quality degradation without cliff effect
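The chunk-splitting resilience mechanism can be illustrated with a small sketch. Only the random split and the zero-fill of lost chunks are modeled; the actual entropy coding is omitted, and the chunk count is an arbitrary choice:

```python
import random

def split_latent(latent, n_chunks, seed=0):
    """Randomly assign latent positions to n_chunks independent chunks,
    so a single lost chunk does not corrupt a contiguous region."""
    rng = random.Random(seed)
    idx = list(range(len(latent)))
    rng.shuffle(idx)
    return [idx[i::n_chunks] for i in range(n_chunks)]

def reassemble(latent, chunks, received):
    """Rebuild the latent; positions from lost chunks are set to zero,
    giving graceful degradation instead of a cliff effect."""
    out = [0.0] * len(latent)
    for chunk_id, positions in enumerate(chunks):
        if chunk_id in received:
            for p in positions:
                out[p] = latent[p]
    return out
```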

#### Type 2: AI Model Processing-Oriented Encoders
- **Examples**: VILA-U, Liquid, Chameleon, Emu3, VQGAN
- **Processing Flow**:
  - Pre-processing to predefined sizes (256×256 or 512×512 pixels, RGB)
  - Feature extraction via CNN or Transformer layers
  - Quantization to AI data units
  - Joint optimization with associated AI decoder
- **Benefits**:
  - Distributed AI workload with privacy-sensitive offloading
  - Direct AI model processing without decompression/re-encoding
  - Reduced data size, latency, and bandwidth
  - Unified format for multiple modalities
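A toy sketch of this Type-2 pipeline (pre-process, extract features, quantize against a codebook), in the spirit of VQGAN-style tokenization. The stand-in stages and the tiny codebook are assumptions for illustration only:

```python
def preprocess(pixels, size=4):
    """Stand-in for resizing to a predefined input size (e.g. 256x256 RGB)."""
    return pixels[:size]

def extract_features(pixels):
    """Stand-in for CNN/Transformer feature extraction."""
    return [p / 255.0 for p in pixels]

def quantize(features, codebook):
    """Map each feature to the index of its nearest codebook entry;
    the resulting integer IDs play the role of 'AI data units'."""
    return [min(range(len(codebook)), key=lambda k: abs(codebook[k] - f))
            for f in features]

codebook = [0.0, 0.25, 0.5, 0.75, 1.0]   # illustrative 5-entry codebook
```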

### Content Delivery Model (Clause 6.2.6.X.4)

- **Protocol Selection**: RTP over UDP for real-time delivery
- **Packetization Approaches**:

#### For Reconstruction-Oriented Encoders:
  - Latent chunks treated as NALUs
  - NALU aggregation or fragmentation to fit the MTU (typically 1500 bytes)
  - Customized NALU headers for AI codec characteristics
  - Standard RTP/UDP/IP header structure

#### For AI Model Processing Encoders:
  - AI data units grouped as payload with customized payload header
  - Group size determined by protocol overhead and integration efficiency
  - AI data unit group limited to single IP packet size
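The MTU-constrained packetization above (aggregate small units, fragment oversized ones) can be sketched as follows. The header sizes are typical RTP/UDP/IPv4 values, and the greedy aggregation policy is an illustrative assumption:

```python
MTU = 1500
HEADERS = 12 + 8 + 20          # RTP + UDP + IPv4 header bytes
MAX_PAYLOAD = MTU - HEADERS    # 1460 bytes of payload per IP packet

def packetize(nalu_sizes):
    """Greedily aggregate small NALU-like chunks into one payload and
    fragment any chunk larger than the payload limit."""
    packets, current = [], 0
    for size in nalu_sizes:
        while size > MAX_PAYLOAD:          # fragmentation
            packets.append(MAX_PAYLOAD)
            size -= MAX_PAYLOAD
        if current + size > MAX_PAYLOAD:   # flush and start a new packet
            packets.append(current)
            current = 0
        current += size
    if current:
        packets.append(current)
    return packets
```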

### AI Transmission Characteristics (Clause 6.2.6.X.5)

Three key characteristics identified:

#### 1. Data Bursts and Periodicity
- Burst pattern linked to the intrinsic frame rate of the multi-modal media
- Uplink: Periodicity matches video frame rate
- Downlink: Related to AI model inference speed
- Data rate depends on AI encoder output dimension and quantization parameters

#### 2. Low Latency Requirements
- Tight end-to-end latency for conferencing and robotics
- Network latency budget constrained by AS processing time for large AI models

#### 3. Error Resilience
- Packet success rate requirement linked to AI service characteristics
- **Error Tolerance Examples**:
  - **Autoregressive models**: Can predict missing AI data units; reasonable quality maintained even with data unit loss
  - **GRACE codec**: Trained for error resilience; maintains good SSIM with packet errors
  - **GenAI applications**: Payload error rates up to 20% tolerable with UE-side recovery

#### Differentiated Importance
- **Cross-modality**: Image AI data units more error-tolerant than text
- **Intra-modality**: Positional importance for image data units (preceding units more critical than subsequent ones)
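One way a sender could exploit this differentiated importance is to order data units before transmission. The scoring below (text over image, earlier positions over later) is purely an illustrative assumption, not a scheme from the pCR:

```python
def importance(unit):
    """Score a (modality, position) AI data unit: text outranks image,
    and earlier positions outrank later ones within a modality."""
    modality, position = unit
    base = {"text": 1000, "image": 0}[modality]   # cross-modality priority
    return base - position                        # intra-modality priority

def schedule(units):
    """Send the most important AI data units first."""
    return sorted(units, key=importance, reverse=True)
```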

### Example KPIs for GenAI Applications

| Traffic Type | Burst Size | Max Latency | Service Bit Rate | Delay | Payload Error Rate |
|--------------|------------|-------------|------------------|-------|-------------------|
| Image GenAI | 15 KB | 15 ms | 8 Mbps | 20 ms | ≤20% |
| Video GenAI | 1.5 MB | 100 ms | 120 Mbps | 20 ms | ≤20% |
| Chatbot | 0.5 KB | 20 ms | 200 Kbps | 30 ms | ≤20% |
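As a sanity check on the table, in each row the service bit rate equals the burst size divided by the maximum latency (reading KB/MB as 10^3/10^6 bytes):

```python
def required_bit_rate(burst_bits, max_latency_s):
    """Bit rate needed to deliver one burst within its latency budget."""
    return burst_bits / max_latency_s

rows = {
    "Image GenAI": (15e3 * 8, 15e-3),    # 15 KB within 15 ms
    "Video GenAI": (1.5e6 * 8, 100e-3),  # 1.5 MB within 100 ms
    "Chatbot":     (0.5e3 * 8, 20e-3),   # 0.5 KB within 20 ms
}
rates = {name: required_bit_rate(b, t) for name, (b, t) in rows.items()}
# Yields 8 Mbps, 120 Mbps, and 200 Kbps, matching the table's bit rates.
```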

### Evaluation Methodology (Clause 6.2.6.X.6)

- Simulation of packet loss and jitter
- P-Trace derivation from RTP header information (Sequence Number, Timestamp, Marker Bit)
- Packet size obtained at UDP layer
- Packet arrival time recorded at receiving port
- AI service-specific quality evaluation based on successful task completion
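The P-Trace derivation above can be sketched as follows. The 12-byte header layout parsed here is the standard RFC 3550 RTP header; the record field names loosely follow the TR 26.926 P-Trace style and are not copied from the pCR:

```python
import struct

def parse_rtp_header(data):
    """Extract sequence number, timestamp, and marker bit from the fixed
    12-byte RTP header (RFC 3550, network byte order)."""
    first, mpt, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
    return {"marker": bool(mpt >> 7), "seq": seq, "timestamp": ts}

def p_trace_record(udp_payload, arrival_time_s):
    """Build one P-Trace record from a received packet: size read at the
    UDP layer, arrival time recorded at the receiving port."""
    hdr = parse_rtp_header(udp_payload)
    return {
        "number": hdr["seq"],
        "size_bytes": len(udp_payload),
        "time_stamp_in_micro_s": int(arrival_time_s * 1e6),
        "last_in_frame": hdr["marker"],   # marker bit flags the frame end
        "rtp_timestamp": hdr["timestamp"],
    }
```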

### Summary and Network Implications (Clause 6.2.6.X.7)

The document concludes that AI traffic characteristics can be leveraged in 3GPP networks to improve transmission efficiency:
- **RAN awareness** of latency requirements, packet arrival patterns, error tolerance, and differentiated importance
- **Enhanced operations**: Improved scheduling and HARQ operations
- **System capacity**: Potential to increase supported number of UEs

## References Added

The document adds seven new normative/informative references including:
- TR 22.870 (6G Use Cases)
- TR 26.926 (Traffic Models and Quality Evaluation)
- Various academic papers on neural codecs (GRACE, Liquid, DVC)
- RP-253288 on AI services for 6G