# Summary of S4-260234: 6GMedia - Characteristics of AI-enabled Applications

## Introduction

This contribution from InterDigital addresses work topic 2 of the 6GMedia study, focusing on key characteristics of XR and AI-enabled mobile applications and services. The document proposes use cases and elaborates on requirements for interoperable and widespread deployment.

## Use Cases

The document identifies several representative use cases:

- **AR applications**: Require AI-based Spatial Computing functions (segmentation, semantic perception) for virtual content insertion in real environments (TR 26.819)
- **Personalized interactive immersive guided tour**: Requires AI inference for proper virtual content placement in fast-evolving real environments (TR 22.870, clause 9.12)
- **Video/image analysis**: Requires remote AI processing with adaptive upstream video quality adjustments (TR 22.870, clause 6.28)
- **Conversational services**: Uses AI for real-time translation and media transformation (TR 22.870, clause 6.42)
- **Context-aware recommendation**: Uses generative AI for environment-related queries (TR 22.870, clause 6.3)
- **AI model training/transfer/update**: Requires transmission of AI data including training data, models, and inference data (TR 22.870, clause 6.25)

## Technical Contributions

### 3.1 Heterogeneous and Multimodal Mobile Applications and Services

**Key Observations:**
- AI-enabled applications are highly heterogeneous and multimodal, encompassing video, image, audio, text, haptics, and sensor data
- Applications exchange AI/ML data including prompts, model parameters, and compressed/uncompressed intermediate data (embeddings); a minimal data-structure sketch follows this list
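
To make the heterogeneity concrete, here is a minimal sketch of what a single multimodal data unit exchanged by such an application could carry. The structure and field names are illustrative assumptions, not a format defined in the contribution.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class MediaPayload:
    """One captured media element (video frame, audio chunk, haptic sample, ...)."""
    modality: str        # e.g. "video", "audio", "haptics", "sensor"
    codec: str           # e.g. "H.266", "EVS", "MPEG haptics"
    timestamp_us: int    # capture time, used for cross-modal synchronization
    data: bytes = b""

@dataclass
class AIDataUnit:
    """Bundle of media and AI/ML data sent uplink for remote inference (illustrative)."""
    prompt: Optional[str] = None               # natural-language or structured prompt
    embeddings: Optional[List[float]] = None   # compressed/uncompressed intermediate data
    model_update: Optional[bytes] = None       # model parameters, e.g. an ONNX or NNC payload
    media: List[MediaPayload] = field(default_factory=list)

# Example: an AR client sends one video frame plus a prompt for scene understanding.
unit = AIDataUnit(
    prompt="segment planar surfaces suitable for virtual content placement",
    media=[MediaPayload(modality="video", codec="H.266", timestamp_us=1_723_001_245)],
)
```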

**Table 1 Analysis** provides detailed mapping of:
- **AR**: UL (video, audio, prompt, inference data) / DL (video, audio, dynamic 3D media, haptics, spatial descriptions) - requires MPEG haptics, scene description enhancements, dynamic mesh/Gaussian splat codecs
- **Real-time Object Detection**: Feature representations, MPEG-7 descriptors, MPEG FCM
- **Speech Recognition/Conversational AI**: ULBC, tokens, embeddings
- **Model Learning/Updates**: ONNX, GGUF, MPEG NNC formats
- **Avatar communication**: Upcoming MPEG avatar, Gaussian and mesh codecs
- **Context-aware recommendation**: W3C Media Annotations, MPEG-7 descriptors

**Observations and Proposals:**
- **Proposal 1**: SA4 should study support of additional media modalities and codecs/enhancements for 6G
- **Proposal 2**: SA4 should define terminology for AI/ML data (features, tokens, embeddings, latent, intent) and study relevant AI representation formats and interchangeable formats/codecs
- **Observation 2**: Some applications require remote AI-based Spatial Computing functions (TR 26.819)
- **Proposal 3**: SA4 should identify and study spatial computing functions that would benefit from off-device processing

### 3.2 QoS Granularity and QoE-driven Dynamic Media Adaptation

**Traffic Characteristics:**
- Applications are uplink-heavy with greatly varying characteristics across modalities
- Continuous video capture results in high-rate, periodic uplink traffic
- Audio/sensor data generates lower-rate, aperiodic, bursty transmissions
- Traffic composition changes dynamically based on user behavior, interaction patterns, mobility, and environmental factors (illustrated by the toy sketch below)
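
As a rough illustration of the traffic mix described above, the following toy model generates one second of uplink traffic for a hypothetical AR client: periodic, high-rate video frames alongside aperiodic, bursty sensor reports. All rates, sizes, and probabilities are assumed values, not figures from the contribution.

```python
import random

VIDEO_FPS = 30                # assumed continuous capture at 30 frames per second
VIDEO_FRAME_BYTES = 60_000    # assumed average encoded frame size (~14.4 Mbit/s uplink)
SENSOR_BURST_PROB = 0.2       # assumed probability of an audio/sensor burst per 10 ms slot
SENSOR_BURST_BYTES = 2_000    # assumed burst size

def simulate_uplink_second(seed: int = 0) -> dict:
    """Return bytes sent per traffic type over one simulated second (100 x 10 ms slots)."""
    rng = random.Random(seed)
    video_bytes = VIDEO_FPS * VIDEO_FRAME_BYTES     # periodic, high-rate component
    sensor_bytes = sum(
        SENSOR_BURST_BYTES
        for _ in range(100)
        if rng.random() < SENSOR_BURST_PROB          # aperiodic, bursty component
    )
    return {"video_ul_bytes": video_bytes, "sensor_ul_bytes": sensor_bytes}

print(simulate_uplink_second())
# Video dominates (~1.8 MB/s) while sensor traffic arrives in small, irregular bursts.
```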

**Table 2 Analysis** characterizes per-use-case requirements:
- **AR, Real-time Object Detection, Avatar communication**: High data rate, real-time latency, medium reliability, high need for QoE-based adaptation
- **Speech Recognition/Conversational AI, Context-aware Recommendation**: Medium data rate, real-time latency, medium reliability, medium adaptation need
- **Model Learning/Updates**: High data rate, non-real-time latency, medium reliability, low adaptation need

**Key Observations:**
- **Observation 3**: Diversity of applications and modalities makes traffic characteristics evaluation/classification challenging
- **Observation 4**: Temporal dependency and synchronization are required between media modalities and AI data for real-time/delay-bound AI inference
- **Observation 5**: Applications are characterized by uplink-intensive, bursty/continuous, multi-modal traffic with diverse latency sensitivity and QoE impact
- **Observation 6**: Current QoS frameworks lack application/context awareness, granularity, and adaptability for dynamic 6G network conditions

**Proposals:**
- **Proposal 4**: SA4 should develop generic QoS and QoE mechanisms suitable across diverse traffic patterns
- **Proposal 5**: SA4 should study QoS framework enhancements enabling finer granularity and context awareness
- **Proposal 6**: SA4 should specify procedures for real-time QoE-based adaptation of multimodal media and define QoE metrics for real-time/delay-bound AI inference (a toy adaptation loop is sketched below)
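
As a sketch of what the QoE-based adaptation in Proposal 6 might look like on the sending side, the loop below lowers the uplink video bitrate when measured end-to-end inference latency exceeds its budget and probes upward when there is headroom. The thresholds, step sizes, and the latency-only QoE signal are assumptions chosen for illustration.

```python
LATENCY_BUDGET_MS = 100       # assumed end-to-end inference deadline
MIN_BITRATE_KBPS = 1_000
MAX_BITRATE_KBPS = 20_000

def adapt_bitrate(current_kbps: int, measured_latency_ms: float) -> int:
    """Adjust the uplink video bitrate from a latency-based QoE signal (toy policy)."""
    if measured_latency_ms > LATENCY_BUDGET_MS:
        # Deadline missed or at risk: back off multiplicatively.
        return max(MIN_BITRATE_KBPS, int(current_kbps * 0.8))
    if measured_latency_ms < 0.7 * LATENCY_BUDGET_MS:
        # Comfortable headroom: probe upward additively.
        return min(MAX_BITRATE_KBPS, current_kbps + 500)
    return current_kbps  # within the comfort zone: hold steady

bitrate = 8_000
for latency_ms in (60, 80, 130, 120, 60):   # example latency samples
    bitrate = adapt_bitrate(bitrate, latency_ms)
print(bitrate)   # the rate dips after the late (130/120 ms) samples, then begins to recover
```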

### 3.3 New Protocols

**Key Points:**
- Transport protocols (QUIC-based, HTTP/3-based) are rapidly evolving to suit AI-enabled use cases
- These evolutions substantially impact traffic characteristics including latency, reliability, and resource utilization
- Rel-19 SA2 specified techniques for delivering Media Related Information (MRI) when XRM traffic is end-to-end encrypted (QUIC)
- TS 23.501 clause 5.37.9 specifies options for relaying MRI over the N6 interface
- Rel-18/19 SA4 specified solutions in TS 26.522 enabling RTP senders to transmit MRI using RTP header extensions (a toy header-extension sketch follows this list)
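
To make the RTP-based approach tangible, here is a toy builder for an RFC 8285 one-byte RTP header extension carrying a hypothetical MRI element (an importance score and a delay budget). The extension ID and field layout are placeholders for illustration; the actual MRI formats are those defined in TS 26.522.

```python
import struct

def build_one_byte_header_extension(ext_id: int, payload: bytes) -> bytes:
    """Build an RFC 8285 one-byte RTP header extension block (profile value 0xBEDE)."""
    assert 1 <= ext_id <= 14 and 1 <= len(payload) <= 16
    # Each element: 4-bit ID, 4-bit (length - 1), then the payload bytes.
    element = bytes([(ext_id << 4) | (len(payload) - 1)]) + payload
    element += b"\x00" * ((-len(element)) % 4)        # pad to a 32-bit boundary
    return struct.pack("!HH", 0xBEDE, len(element) // 4) + element

# Hypothetical MRI element: importance score (1 byte) and delay budget in ms (2 bytes).
mri_payload = struct.pack("!BH", 7, 80)
ext_block = build_one_byte_header_extension(ext_id=1, payload=mri_payload)
print(ext_block.hex())   # this block would follow the fixed RTP header when X=1
```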

**Observations and Proposals:**
- **Observation 8**: New transport protocols impact media transmission reliability, latency, and traffic characteristics
- **Proposal 7**: SA4 should characterize impact of QUIC-based protocols on AI data delivery and traffic characteristics, especially for real-time/delay-bound applications
- **Observation 9**: SA4 has specified RTP-based MRI solutions in TS 26.522
- **Proposal 8**: SA4 should study integration of SA2-defined QUIC-based transport extensions into media delivery architecture, leveraging FS_Q4RTC-MED study

### 3.4 Multi-Device Scenarios

**Key Characteristics:**
- AI-enabled services are deployed across smartphones, AI glasses, smartwatches, fitness devices, and companion compute devices
- Services involve continuous sensing, media capture/processing, on-device/distributed AI inference, and frequent network data exchange
- Services are inherently multi-device with different devices contributing sensing, media, compute, display, or connectivity functions
- This introduces QoS/QoE challenges for modality/format adaptation, coordination of AI processing with partial or full offload, and traffic correlation across UEs

**Figure 1** illustrates a UE tethering scenario in which AI-enabled services are delivered across multiple user devices that rely on a tethered UE for cellular connectivity and coordination.
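
A minimal sketch of the coordination problem this raises: for data captured on one device, the service must decide whether to run inference on that device, on a companion device, or in the network via the tethered UE. The device names, capability fields, and selection rule below are illustrative assumptions, not behaviour specified in the contribution.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Device:
    name: str
    has_cellular: bool     # only the tethered UE has a direct network connection
    compute_tops: float    # rough on-device AI compute budget
    battery_pct: int

def choose_inference_site(source: Device, devices: List[Device],
                          required_tops: float) -> str:
    """Pick where to run inference for data captured on `source` (toy policy)."""
    # Prefer the capture device itself if it has enough compute and battery headroom.
    if source.compute_tops >= required_tops and source.battery_pct > 30:
        return source.name
    # Otherwise try a companion device reachable via the tethered UE.
    candidates = [d for d in devices if d is not source and d.compute_tops >= required_tops]
    if candidates:
        return max(candidates, key=lambda d: d.battery_pct).name
    # Fall back to network-side inference over the tethered UE's cellular link.
    return "network (via tethered UE)"

glasses = Device("AI glasses", has_cellular=False, compute_tops=2, battery_pct=55)
phone = Device("smartphone (tethered UE)", has_cellular=True, compute_tops=20, battery_pct=80)
watch = Device("smartwatch", has_cellular=False, compute_tops=0.5, battery_pct=40)

print(choose_inference_site(glasses, [glasses, phone, watch], required_tops=10))
# -> smartphone (tethered UE)
```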

**Observations and Proposals:**
- **Observation 7**: AI-enabled services increasingly operate across heterogeneous multi-devices associated with the same user; modalities and AI processing may be distributed across them
- **Observation 8**: Existing system assumptions are UE-centric and do not address the QoS/QoE requirements of multi-device scenarios
- **Proposal 8**: SA4 should study the impact of multi-devices on the QoS and QoE framework
- **Observation 9**: QoS enhancement and QoE-driven dynamic media adaptation need to operate across heterogeneous multi-devices
- **Proposal 9**: SA4 should consider heterogeneous multi-devices in the QoE metrics definition and QoS enhancement study for real-time/delay-bound AI inference

## Conclusion

The document proposes that all proposals be discussed and agreed as part of the 6GMedia study and captured in a new section 6.X of the TR. The contribution emphasizes three main areas requiring SA4 attention:
1. Support for heterogeneous and multimodal media types including AI/ML data
2. Enhanced QoS/QoE frameworks with finer granularity and context awareness
3. Multi-device scenario support for AI-enabled services