Meeting: TSGS4_135_India | Agenda Item: 11.1
25 documents found
[FS_6G_MED] Work Plan for Media Aspects for 6G System
This document presents the work plan for the Feasibility Study on "Media Aspects for 6G System" (FS_6G_MED), approved at SA4#134 (S4-252142) and SA plenary #110 (SP-251652).
The study aims to:
- Identify dependencies with other WGs and collect information on relevant developments within 3GPP and externally
- Map work topics to basic functions and develop high-level call flows based on existing media delivery architectures and 6G design concepts
- Identify gaps and opportunities, recommending either:
  - Candidate solutions to address issues
- Coordinate with SA1, SA2, SA3, SA5, SA6 and external organizations (SVTA, CTA WAVE, ISO/IEC JTC1 SC 29, 5G-MAG, MSF, Khronos, IETF)
The study addresses the need for:
- CAPEX/OPEX reduction and monetization opportunities
- New services and experiences in the 6G era
- System simplification and integration of new technologies
- Support for new use cases: ISAC, XR/immersive communication, AI-based services
- Improved end-user QoE across diverse devices and network conditions
Study media delivery architecture aspects for 6G based on TS 26.501, TS 26.506 and new 6G architecture developments. Key aspects:
- Accommodation of new 6G use cases by current 5G media delivery architecture
- Identification of reusable components from 5G and earlier generations
- Architecture simplification for improved deployability/implementability
- Further harmonization between streaming and conversational services
- Collection of existing/emerging content delivery protocols
- Alignment with SA2 6G design concepts
- Accommodation of commercially relevant media services
Identify trends and expected services related to media, including immersive and AI-related media. Sub-topics:
a) End-to-end service quality: Study aspects for defining end-to-end service quality for media services, including capturing, rendering, and QoE metrics definition
b) Traffic characteristics: Study and identify traffic characteristics of media services and use cases from TR 22.870 to support 6G radio and service architecture design
c) Immersive media formats: Collect, categorize and characterize (3C) emerging media formats (including different media types) for 6G XR/immersive media services, building on TR 26.956
d) Media communication for emerging AI services: Study AI representation formats and traffic characteristics for AI-related media services (agents, multi-modal LLMs, diffusion models), identifying gaps in QoS requirements, dynamic traffic characteristics, or AI-representation format definitions
Study media-related impacts from SA2 study topics:
a) AI for 6G: Media-related impacts from "AI for 6G (e.g. AI agent, framework)" aligned with SA2 WT#3
b) Integration of Sensing and Communication: Media-related impacts aligned with SA2 WT#4
c) Data handling: Media-related impacts on data collection, distribution, processing, storage, access and exposure, considering access control/user consent and privacy, aligned with SA2 WT#5
d) Computing: Media-related impacts on computing support for UE and application servers, aligned with SA2 WT#6
Note: Analysis may confirm no impact on SA4 specifications. Topics may be updated based on SA2 decisions.
Study aspects and opportunities for media services on ubiquitous networks including NTN and other low bit-rate/low power scenarios beyond speech. Focus on supported bitrates, functionalities, delays, power consumption and other design vectors, considering FS_ULBC study information.
Study aspects and opportunities for trusted and private media communication in applications including generative AI or agent-to-agent communication, covering end-to-end workflows, authentication, trust and exploring 6G's role.
Note: Coordination with SA3 expected on authentication and trust-related topics.
The study has broad industry support from 50+ companies including operators (Vodafone, Deutsche Telekom, AT&T, Orange, China Mobile, NTT), vendors (Ericsson, Nokia, Huawei, Samsung, Qualcomm, MediaTek, Apple), content providers (Dolby, Sony, Tencent, Bytedance), and research organizations.
A work plan tracking sheet is maintained online at: https://docs.google.com/spreadsheets/d/1AHXc41lTVAJ84ENKfi2GgmpGx26hqHSoNQ7JKxnBNBo/edit?usp=sharing
Snapshot to be provided in published document showing:
- Topic title
- Lead
- TR 26.870 clause
- Completion percentage
- Rel-21 normative work decisions
- Stage-2 impact
- Contributors
Current overall progress: 0%
To be determined.
Host: Qualcomm, 16:00-17:30 CET
- Agree initial work and time plan
- Agree skeleton TR 26.870 for SA4#135 submissions
- Agree initial working procedures
- Prepare initial thoughts for work topics
- Identify initial contributors
- Submission deadline: Jan 13, 16:00 CET
Host: Qualcomm, 16:00-17:30 CET
- Preparation for extended AHG meeting on FS_6G_MED
- Identify common themes for workshop inputs
- Submission deadline: Feb 23, 16:00 CET
Host: Qualcomm, 15:00-17:00 CET
- Address baseline assumptions clustered in common themes
- Identify moderators/leads to summarize common topics
- Submission deadline: March 19, 15:00 CET
- Note: 5G-MAG workshop on media energy consumption scheduled March 19
TR skeleton for FS_6G_MED
This is an early-stage Technical Report (v0.0.1) establishing the framework for studying media-related aspects in 6G mobile networks. The document is in skeleton form, defining the structure and scope for investigating media opportunities and gaps in the context of 6G systems.
The study aims to:
The conclusions will form the basis for further detailed studies and normative work.
The document establishes three foundational areas:
This section will identify media-related industry trends from:
- Operators
- Third-party providers
- Verticals
These trends will inform the impact on 6G media architectures.
The study defines five initial work topics, each structured with:
- Description
- Key Issues
- Context and External Factors
- Potential Solutions
- Mapping of Issues to Solutions
- Conclusions
Focus on architectural aspects of media delivery in 6G systems.
Investigation of new media capabilities and services specific to 6G.
Coordination with SA2 architectural work to ensure media aspects are properly addressed.
Addressing media delivery across diverse access scenarios in 6G.
Security and privacy considerations for media services.
Additional work topics to be defined.
The study builds upon:
- TR 22.870: Study on 6G Use Cases and Service Requirements
- TR 23.801-01: Study on Architecture for 6G System Stage 2
- TS 22.ABC: 6G System Requirements (to be replaced with normative specification)
- TS 26.501: 5G Media Streaming (5GMS) architecture
- TS 26.506: 5G Real-time Media Communication Architecture
This is the initial skeleton version (0.0.1) from February 2026, SA4#135. All technical content sections contain editor's notes indicating that detailed content is yet to be developed. The document establishes the framework for comprehensive study of media aspects in 6G systems, with work to be populated in subsequent versions.
[FS_6G_MED] Some considerations on ways of working
This document proposes ways of working for the Feasibility Study on "Media Aspects for 6G System" (FS_6G_MED), which was agreed at SA4#134 (S4-252142) and approved by SA plenary #110 (SP-251652). The study aims to deliver TR 26.870 by TSG SA#115 (Mar-27), with an information milestone at TSG SA#114 (Dec-26).
The study encompasses five work topics:
- WT#1: Media Delivery Architecture
- WT#2: 6G Media
- WT#3: Media Aspects related to SA2 topics
- WT#4: Media for ubiquitous access
- WT#5: Trusted and private media communication
The document establishes several foundational principles:
The study should align with SA4's Terms of Reference (SP-241362) and create value within 3GPP and the broader ecosystem. Key objectives include:
The document proposes a flexible organizational approach:
The proposed TR structure includes:
The rationale emphasizes:
- Opportunistic and flexible approach
- Acceptance of slide decks and workshop-style contributions
- Accommodation of both flat organization (key issues) and detailed, objective-driven sections
- Serving as a baseline for future work, not a gating factor
- Prioritization and flexibility, especially for topics with external pressure
Priority: Low to Medium
- Initial work based on new needs or SA2 aspects not yet studied
- Monitoring existing or new dependencies on SA2
- Some work in 5G Advanced addresses enhancements (FS_AMD_Ph2)
Priority: Highest
- Highest complexity and breadth
- Includes traffic characteristics, QoE, and other aspects
- Requires significant focus and input to define scope and approach
Priority: Minimal unless clear dependency identified
- Monitor progress in SA2
- Key issues to monitor in TR 23.801-01:
  - Key Issue #20: Integrated Sensing and Communication (collection and transport of sensing data)
  - Key Issue #21: 6G data framework (data storage, retrieval, quality, latency, volume)
  - Key Issue #22: 6G Computing Support
Priority: Treated as near-separate study
- Leverage ULBC work
- Focus on data rates, scheduling, and implications for media services over NTN (GEO and LEO)
Priority: Medium
- Requires definition of key questions and scoping
- May evolve as separate study
- Contribution-driven priority
- Should be driven by use cases
AI traffic characteristics are identified as requiring higher priority:
To market SA4 and 3GPP work for 6G, larger themes and KPIs should be identified. Example themes include:
Input for new or larger themes is welcomed.
The document proposes to:
[FS_6G_MED] Preliminaries: assumptions and requirements
This change request introduces foundational content for the new FS_6G_MED (6G Media Delivery) study in TR 26.870. It establishes the preliminary assumptions, requirements framework, and baseline for existing media services that will inform the 6G media architecture work.
Adds comprehensive normative and informative references including:
- Core 6G specifications (TR 22.870, TR 23.801-01, TS 22.ABC)
- Existing 5G media specifications (TS 26.501, TS 26.506, TS 26.511, TS 26.512, TS 26.114, TS 26.117, TS 26.510, TS 26.517, TS 26.502, TS 26.143)
Establishes baseline architectural assumptions carried forward from TR 23.801-01:
Identifies SA2 key issues serving as baseline for 6G media delivery:
- Key Issue #3: Network Slicing
- Key Issue #4: User Plane Architecture
- Key Issue #5: QoS Framework
- Key Issue #7: Network Exposure
- Key Issue #11: Non-3GPP access support
- Key Issue #12: Voice Services
- Key Issue #15: Messaging Services
- Key Issue #19: 6G Network for AI
- Key Issue #20: Integrated Sensing and Communication
- Key Issue #21: 6G data framework
- Key Issue #22: 6G Computing Support
- Key Issue #23: 6G NTN Support
Establishes 5G media specifications as starting points:
Placeholder for architectural and media-related requirements from SA1, including use cases and requirements.
Defines 3GPP media services structure across working groups:
- SA1: Service definition
- SA2: Architectural support
- SA4: Media technologies (protocols, codecs, formats, QoE)
Distinguishes between:
- Full media services: Complete 3GPP-defined services
- Media service enablers: Components enabling interoperability for third-party services
Identifies existing services for 6G consideration:
SA4: TS 26.114 (codecs, formats, protocols, QoE, telephony acoustics)
XR (AR/VR/MR) Media Services
Lists existing 3GPP media service enabler components:
References TR 26.857 for formalized Media Service Enabler framework.
Multiple editor's notes indicate areas requiring further development:
- Additional assumptions may be added
- Requirements clause to be populated with SA1 inputs
- Additional media services and enablers to be added
- Existing media services status (relevancy/deployments) to be identified
[FS_6G_MED] Considerations on Work Topic 4: Ubiquitous access
This contribution from Qualcomm introduces initial considerations for Work Topic #4 (WT#4) "Media for Ubiquitous Access" in the FS_6G_MED study (TR 26.870). The document serves as a starting point for discussions on media services support in ubiquitous networks, particularly Non-Terrestrial Networks (NTN) and low bit-rate/low power scenarios beyond speech.
The document clarifies the primary focus of WT#4:
- Study aspects and opportunities for media services on ubiquitous networks including NTN
- Address low bit-rate/low power scenarios beyond speech
- Identify supported bitrates, functionalities, delays, power consumption and other design vectors
- Take into account information collected in the FS_ULBC study
The contribution proposes a structured approach with the following elements:
The CR adds relevant normative and informative references:
- TR 22.870 (SA1 6G use cases and service requirements)
- TR 23.801-01 (SA2 6G architecture study)
- TS 26.501 (5GMS architecture)
- TS 26.506 (5G RTC architecture)
- Multiple existing SA4 specifications (TS 26.114, 26.117, 26.502, 26.510, 26.511, 26.512, 26.517)
- ULBC study reference
Focuses on studying aspects and opportunities for media services on ubiquitous networks, with emphasis on:
- Non-Terrestrial Networks
- Low bit-rate/low power scenarios beyond speech
- Identification of supported bitrates, functionalities, delays, power consumption
- Leveraging FS_ULBC study findings
Placeholder for SA1 use cases (to be completed)
Identifies three key issues from TR 23.801-01:
- Key Issue #4: User Plane Architecture
- Key Issue #23: Support of 6G NTN
- Key Issue #24: Analyse 5GS IoT features and solutions
The document proposes two initial key issues as starting points:
Channel Characteristics: What are bitrate ranges, latencies and loss characteristics of relevant 3GPP Non-Terrestrial Networks?
Service Performance: How would existing services and applications perform under such channel conditions?
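As a rough orientation for the channel-characteristics question, here is a back-of-envelope sketch (our own, not from the contribution) of one-way propagation delays at typical GEO and LEO altitudes; the altitude values are common reference figures, not numbers taken from the study.

```python
# Illustrative back-of-envelope: one-way propagation delay for NTN links.
# Altitudes are typical values (GEO: 35,786 km; LEO: ~550 km), assumed here
# for illustration only.
SPEED_OF_LIGHT_KM_S = 299_792.458

def one_way_delay_ms(altitude_km: float) -> float:
    """One-way propagation delay (ms) for a satellite directly overhead."""
    return altitude_km / SPEED_OF_LIGHT_KM_S * 1000.0

print(f"GEO one-way delay: {one_way_delay_ms(35_786):.1f} ms")  # ~119 ms
print(f"LEO one-way delay: {one_way_delay_ms(550):.1f} ms")     # ~1.8 ms
```

The two-orders-of-magnitude gap between GEO and LEO delays is one reason the work topic distinguishes the two when assessing media service performance.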
The document includes several editor's notes indicating:
- More content to be added to the key issues section
- Need to complete use cases and requirements by checking SA1 work
- The subsection ordering may be adapted as appropriate for specific content
The study explicitly requires coordination with:
- 3GPP groups: SA1 (TR 22.870), SA2 (TR 23.801-01), RAN (TR 38.960), SA3, SA5, SA6
- External organizations: SVTA, CTA WAVE, ISO/IEC JTC1 SC 29, 5G-MAG, Metaverse Standards Forum, Khronos, IETF
[FS_6G_MED] Requirements and associated use cases
This pseudo-CR proposes draft content for Sub-clause 4.2 of TR 26.870 (FS_6G_MED), establishing preliminary requirements for the 6G media study. The contribution identifies media-related requirements derived from SA1 TR 22.870 use cases covering System and Operational Aspects (Clause 5) and AI (Clause 6).
Adds normative and informative references essential for 6G media requirements:
TS 22.ABC: 6G System Requirements
New Informative References:
Establishes structure for media-related requirements with an editor's note indicating revision pending SA1 consolidated 6G requirements completion.
Based on TR 22.870 Clause 5.3, identifies requirements for interoperability across access technologies:
Key Application: Leveraging non-3GPP technologies (ATSC, DVB) for multicast/broadcast delivery, complemented by 3GPP unicast for repair and UE join procedures.
Based on TR 22.870 Clause 5.4.2, identifies media-related capabilities requiring continuity from 5G:
Based on "Enhanced Network Service Awareness" use case (TR 22.870 Clause 5.9.8), addressing challenges of 6G services with multiple data streams (video, voice, text, sensor data, AI, sensing) having varied traffic patterns and QoS requirements:
Derived from multiple AI use cases in TR 22.870 Clause 6:
(Subject to operator policy and user consent; transformations via AI capabilities of 3rd party or 6G network including IMS)
[FS_6G_MED] Considerations on Work Topic 1: Media Delivery Architecture
This contribution from Qualcomm provides initial considerations and structure for Work Topic #1 (Media Delivery Architecture) in the FS_6G_MED study item. The document proposes text for TR 26.870 v0.0.1, establishing the foundation for studying media delivery architecture aspects for 6G systems.
The contribution establishes that the work topic will study a harmonized media delivery architecture for 6G based on: - TS 26.501 (5G Media Streaming architecture) - TS 26.506 (5G Real-time Media Communication Architecture) - New 6G architecture developments from TR 23.801-01
Key aspects to be studied include:
- Assessment of whether current 5G media delivery architecture functionalities accommodate new 6G use cases
- Identification of reusable and improvable components from 5G and earlier generations
- Architecture simplification for improved deployability and implementability
- Further harmonization of media delivery architecture for streaming and conversational services
- Collection and enablement of relevant existing and emerging content delivery protocols in 6G
- Alignment with SA2's 6G design concepts
- Accommodation of commercially relevant media services and evolving standardization activities
Relevant SA2 Key Issues from TR 23.801-01:
- Key Issue #2: SBA framework
- Key Issue #3: Support of Network Slicing in the 6G system
- Key Issue #4: User Plane Architecture
- Key Issue #5: QoS Framework for 6G
- Key Issue #7: Network Exposure
- Key Issue #17: Migration and Interworking
External coordination needed with:
- SA1 (use cases and requirements, TR 22.870)
- SA2 (architecture, TR 23.801-01)
- RAN (TR 38.960)
- External organizations: SVTA, CTA WAVE, ISO/IEC JTC1 SC 29, 5G-MAG, Metaverse Standards Forum, Khronos, IETF
The contribution proposes documenting several conceptual models to define how applications can benefit from the media delivery architecture:
Defines an 8-point model describing:
- Service Data Flows traversing the User Plane between Media Client and Media AS at reference point M4
- Support for bidirectional media content flow
- Multiplexing of Application Data Flows onto Service Data Flows
- Different service location endpoints during a session
- Mapping to different deployed servers (physical or virtual)
- Mapping to one or multiple physical network interfaces based on ANDSP/URSP
- Traversal of different Data Networks and Access Networks
- Service Data Flow migration due to mobility
Provides a structured table defining:
- AS instance: An AS instance in the Media Delivery System deployment
- Content Hosting Configuration: Corresponding to a single Provisioning Session in the AF
- Distribution Configuration: Modeling a service location exposed by the AS instance
- Downlink media streaming session: Modeling a session that may span multiple content items
- Service Data Flow: An HTTP connection at reference point M4d (IP 5-tuple)
- Application Data Flow: Series of media segment download requests, with support for multiplexing multiple HTTP requests on the same connection (e.g., different DASH Adaptation Sets on HTTP/3)
Provides a similar structured table for uplink with:
- 5GMSu AS instance: Edge AS instance in the 5GMS System deployment
- Content Publishing Configuration: Corresponding to a Provisioning Session in the 5GMSu AF
- Contribution Configuration: Modeling a service location exposed by the AS instance
- Uplink media streaming session: Modeling a session that may span multiple content items
- Service Data Flow: HTTP connection at reference point M4u (IP 5-tuple)
- Application Data Flow: Series of media segment upload requests with multiplexing support
Key technical details:
- Finest granularity visible to the 5GMS AS is the Service Data Flow (HTTP connection)
- Service Data Flows are associated with sessions via a media delivery session identifier in HTTP request headers (per TS 26.512 clause 6.2.3.6)
- Service location (HTTP authority and URL path) enables association with a Distribution/Contribution Configuration
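The relationship between Service Data Flows and Application Data Flows can be illustrated with a toy data model. This is a non-normative sketch; the class and field names are our own, not from TS 26.501 or TS 26.512.

```python
# Illustrative (non-normative) model: several Application Data Flows
# multiplexed onto one Service Data Flow (an HTTP connection at M4),
# associated with a session via a request-header identifier.
from dataclasses import dataclass, field

@dataclass
class ApplicationDataFlow:
    name: str                      # e.g. one DASH Adaptation Set
    requests: list = field(default_factory=list)

@dataclass
class ServiceDataFlow:
    five_tuple: tuple              # (src_ip, src_port, dst_ip, dst_port, proto)
    session_id: str                # media delivery session identifier (header value)
    app_flows: list = field(default_factory=list)

    def multiplex(self, adf: ApplicationDataFlow) -> None:
        # Multiple Application Data Flows can share one HTTP connection
        self.app_flows.append(adf)

sdf = ServiceDataFlow(("10.0.0.1", 443, "10.0.0.2", 50000, "TCP"), "session-123")
sdf.multiplex(ApplicationDataFlow("video-adaptation-set"))
sdf.multiplex(ApplicationDataFlow("audio-adaptation-set"))
print(len(sdf.app_flows))  # 2
```

The point of the sketch is only the granularity claim above: the network sees the connection (5-tuple), while the application-level flows are distinguished inside it.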
Placeholder for future completion.
Two initial key issues are identified for study:
Should the media delivery architecture for streaming and real-time communication services be harmonized or separated?
What relevant existing and emerging content delivery protocols would map to the 5G media delivery architecture, and what extensions or simplifications can be done for 6G Media Delivery?
The contribution adds relevant normative and informative references including:
- TR 22.870 (SA1 6G use cases)
- TR 23.801-01 (SA2 6G architecture)
- TS 26.501, TS 26.506 (5G media delivery architectures)
- Multiple TS 26.5xx series specifications
- Placeholder for TS 22.ABC (6G System Requirements)
The document includes several editor's notes indicating areas requiring future work:
- Completion of potentially relevant use cases and requirements from SA1
- Possible addition of more application models
- Completion of the real-time communication application service model
- Addition of more key issues
pCR [FS_6G_MED] Considerations on Work Topic 1: Media Delivery requirements for intelligent immersive calling
This pCR proposes updates to clause 4.2 of TR 26.870, introducing new requirements for intelligent immersive calling services. The contribution is motivated by SA1's work on TR 22.870, which identified use cases for immersive communication leveraging 6G technologies and AI capabilities. The focus is on enabling wearable devices (AR/VR glasses) and smart devices (smart TVs, watches) to provide intelligent immersive calling services, particularly targeting the aging population.
The use case describes one-to-one communication with multiple devices and media rendering capabilities (split-rendering or spatial computing rendering), where multi-modality data (video/text/audio/avatar) is transmitted. The service is empowered by AI capabilities including generative AI and multi-modal models, taking input from various smart devices and sensors.
The following technical requirements are introduced for intelligent immersive calling/conferencing:
The use case demonstrates requirements for:
- Multi-device coordination and synchronization
- Real-time video transcoding to immersive media formats
- AI-based processing for face rendering, intent recognition, and scene reconstruction
- Multi-modal data integration (video, audio, sensor data, biometric data)
- Device-aware QoE management
- Network-based rendering and processing capabilities
- Protocol extensions to support new modalities and interaction paradigms
Media related real-time AI traffic Characteristics
This is a pseudo Change Request (pCR) from Huawei/HiSilicon to TSG-SA WG4 Meeting #135, proposing to add a new clause on end-to-end real-time multi-modal AI traffic characteristics to a 6G media-related TR. The document follows the methodology established in TR 26.926 for traffic modeling and quality evaluation.
The document aims to characterize AI traffic for 6G use cases in real-time video conferencing and robotics by defining end-to-end architecture, procedures, content coding models, and delivery mechanisms for real-time AI inference applications.
The document defines a 10-step call flow:
1. UE connects and provides supported AI encoder information
2. AS configures AI model and corresponding decoder
3. Operational flow includes:
   - Media data collection and AI encoding at UE
   - Packetization using native or customized packet format
   - Transmission to AS
   - Optional AI decoding at AS (if compatibility required)
   - Media-related response generation and transmission back to UE
   - Response decoding and presentation at UE
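The operational flow above can be sketched as a minimal round trip, with a trivial stand-in for the AI encoder and decoder; all function names and the "reverse the tokens" response are illustrative, not from the pCR.

```python
# Minimal sketch of the UE <-> AS operational flow, with placeholder logic.

def ue_encode(media: bytes) -> list[int]:
    # Media data collection and AI encoding at the UE (placeholder: bytes -> tokens)
    return list(media)

def packetize(tokens: list[int], mtu: int = 4) -> list[list[int]]:
    # Packetization using a (here: trivial) packet format
    return [tokens[i:i + mtu] for i in range(0, len(tokens), mtu)]

def as_process(packets: list[list[int]]) -> list[int]:
    # AS reassembles, optionally decodes, and generates a media-related response
    tokens = [t for p in packets for t in p]
    return tokens[::-1]          # placeholder "response generation"

def ue_present(response: list[int]) -> bytes:
    # Response decoding and presentation at the UE
    return bytes(response)

packets = packetize(ue_encode(b"hello"))
print(ue_present(as_process(packets)))  # b'olleh'
```

Each function corresponds to one step of the operational flow; only the ordering of the steps, not the placeholder logic, reflects the document.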
Two types of AI encoders are defined:
Three key characteristics identified:
| Traffic Type | Burst Size | Max Latency | Service Bit Rate | Delay | Payload Error Rate |
|--------------|------------|-------------|------------------|-------|--------------------|
| Image GenAI  | 15 KB      | 15 ms       | 8 Mbps           | 20 ms | ≤20%               |
| Video GenAI  | 1.5 MB     | 100 ms      | 120 Mbps         | 20 ms | ≤20%               |
| Chatbot      | 0.5 KB     | 20 ms       | 200 Kbps         | 30 ms | ≤20%               |
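The table is internally consistent in one useful respect: each service bit rate equals the burst size divided by the max latency (assuming decimal units, 1 KB = 1000 bytes). A quick check:

```python
# Sanity check on the traffic table: burst_size * 8 / max_latency should
# reproduce the listed service bit rate for each traffic type.
rows = {
    # name: (burst_bytes, max_latency_s, service_bit_rate_bps)
    "Image GenAI": (15_000, 0.015, 8_000_000),
    "Video GenAI": (1_500_000, 0.100, 120_000_000),
    "Chatbot": (500, 0.020, 200_000),
}

for name, (burst, latency, rate) in rows.items():
    implied = burst * 8 / latency
    assert abs(implied - rate) / rate < 1e-9, name
    print(f"{name}: {implied / 1e6:.1f} Mbps")
```

In other words, the service bit rate is exactly what is needed to deliver one burst within its latency budget.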
The document concludes that AI traffic characteristics can be leveraged in 3GPP networks to improve transmission efficiency:
- RAN awareness of latency requirements, packet arrival patterns, error tolerance, and differentiated importance
- Enhanced operations: Improved scheduling and HARQ operations
- System capacity: Potential to increase supported number of UEs
The document adds seven new normative/informative references including:
- TR 22.870 (6G Use Cases)
- TR 26.926 (Traffic Models and Quality Evaluation)
- Various academic papers on neural codecs (GRACE, Liquid, DVC)
- RP-253288 on AI services for 6G
Neural Network Based Video Codec Architecture and Support for Error Resilience
This contribution proposes documenting neural network-based codec (NNC) architectures and their error resilience capabilities in the 6G Media study (FS_6G_MED). The document focuses on two specific NNC implementations: DVC and GRACE codecs, highlighting their potential relevance for 6G deployments targeting 2030.
The document describes the DVC (Deep Video Compression) codec proposed by Guo Lu et al. (2019), which represents a hybrid approach to neural network-based video coding:
Key Architecture Features:
- Replaces traditional video coding components with neural network equivalents while maintaining the overall predictive coding architecture
- Uses CNN models for optical flow estimation in motion estimation and compression
- Implements neural network-based motion compensation to generate predicted frames
- Maintains functional similarity between traditional and NNC components
Joint Optimization Approach: The codec jointly trains/optimizes multiple components:
- Motion estimation
- Motion compensation
- Residual compression
- Quantization and bit-rate estimation
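Joint optimization in codecs of this family is typically a rate-distortion objective of the form L = λ·D + R, minimized end to end across all the components listed above. A toy scalar version, with our own placeholder values for λ, distortion, and rate:

```python
# Toy rate-distortion objective: lambda weighs reconstruction error (D)
# against bit cost (R). Values below are illustrative, not from DVC.

def rd_loss(distortion: float, rate_bits: float, lam: float = 0.01) -> float:
    """Rate-distortion loss L = lam * D + R."""
    return lam * distortion + rate_bits

# Two hypothetical operating points of a codec:
high_quality = rd_loss(distortion=10.0, rate_bits=5000.0)   # spends more bits
low_quality = rd_loss(distortion=500.0, rate_bits=800.0)    # spends fewer bits
print(high_quality, low_quality)
```

Training all components against one such loss is what distinguishes the jointly optimized approach from tuning motion estimation, residual coding, and quantization separately.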
Performance:
- Achieves competitive results with H.264 and H.265
- Publicly available source code and research paper
- Similar approaches adopted in industry (Deep Render codec in FFMPEG and VLC)
The document presents GRACE codec (Yihua Cheng et al. 2025) as an extension of DVC with enhanced error resilience:
Channel-Aware Training:
- Jointly trains encoder and decoder under simulated packet loss conditions
- Enables codec awareness of specific loss patterns
- Implements channel-aware source coding design
Technical Implementation:
- Encodes each frame as a tensor split into independently decodable sub-tensors
- Uses arithmetic coding mapped to packets
- Tested across a wide range of loss rates
- Includes lighter profiles (GRACE-lite) for mobile devices
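The split into independently decodable sub-tensors can be sketched as follows. This is a pure-Python toy with no real entropy coding; zero-fill stands in for error concealment, and the frame values are arbitrary.

```python
# Toy illustration: one frame tensor split into sub-tensors (one per packet),
# so a lost packet degrades the frame instead of destroying it.

def split_frame(frame: list[float], n_packets: int) -> list[list[float]]:
    # Each chunk is independently decodable in this toy model
    size = (len(frame) + n_packets - 1) // n_packets
    return [frame[i:i + size] for i in range(0, len(frame), size)]

def decode_received(packets: list[list[float]], lost: set[int]) -> list[float]:
    # Lost sub-tensors are replaced by zeros (crude error concealment)
    out = []
    for i, p in enumerate(packets):
        out.extend([0.0] * len(p) if i in lost else p)
    return out

frame = [0.1 * i for i in range(8)]
packets = split_frame(frame, n_packets=4)
recon = decode_received(packets, lost={2})
print(recon)
```

The contrast with a conventional bitstream, where a single lost packet can invalidate everything that follows, is the point of the independently decodable design.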
Performance Validation:
- User study with 240 crowdsourced participants
- Tested 61 videos under realistic conditions
- Used Google GCC to emulate WebRTC congestion control
- Channel conditions: LTE and broadband traces (0.2-8 Mbps, 100 ms end-to-end delay)
- MOS scores up to 38% better than H.264/H.265 with AL-FEC and error concealment
Key Performance Improvements:
- Exceptional reduction in tail latency
- Reduced non-rendered frames
- Reduced stalls per second
- Improved video smoothness
Hardware Requirements:
- Original GRACE: NVIDIA A40 GPU (31.2-51.2 fps)
- GRACE-lite: Real-time capable on current mobile devices
Content Specificity: NNC performance may be content-specific due to training data dependencies.
Reconstruction Challenges:
- Potential reconstruction failures due to non-bit-exact arithmetic operations in GPU frameworks
- Issues with floating-point arithmetic and convolution operations
- Currently under discussion in ISO/IEC JTC1 SC 29
- Identified as a potential key enabler requiring resolution for future NNC codec adoption
The document makes two specific proposals:
Documentation Request: Document NNC features and their application to error-resilient AI traffic in the 6G MED TR under 6G Media (based on clauses 2 and 3)
Use Case Consideration: Include the use case of NNC with channel-aware source coding training in AI traffic characteristics
The contribution includes specific text proposals for:
- Change 1: Addition of two references to the normative references section
- Change 2: New clause 6.2.4.X under Work topic #2d (AI Traffic Characteristics) containing the technical description of DVC and GRACE codecs, including architecture diagrams and performance characteristics
Survey of Native AI formats for multi-modal AI
This document surveys AI native formats for addressing generic AI-related tasks including generation, comprehension, information retrieval, and recommendation in advanced multimedia use cases (multi-modal AI).
The document acknowledges that standardization of such formats is challenging due to:
- Constantly evolving field
- Task-specific nature of AI native formats
However, SA4 should:
- Document and study these formats in FS_6G_MED
- Track progress in this area
- Understand characteristics and QoS requirements relevant to 3GPP networks
- Consider these formats when analyzing AI traffic characteristics
Recent advances in AI, particularly Large Language Models (LLMs) and Multi-Modal LLMs, enable new applications in generation, comprehension, information retrieval, and recommendation. Multi-modal LLMs require AI-related pre-processing to create AI native formats.
The document presents a comprehensive survey based on [Jian Jia et al. 2025], extended with 2025 techniques, showing:
Input → Encoder → Latent Vector (z) → Quantization → Decoder → Output
With supervision feedback loop for training.
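The Input → Encoder → Latent → Quantization → Decoder pipeline can be illustrated with a toy scalar tokenizer. Real systems use learned neural encoders and codebooks, so every component here (the clamp "encoder", the 3-entry codebook) is a stand-in.

```python
# Toy tokenizer pipeline: encode -> quantize against a codebook -> decode.

CODEBOOK = [0.0, 0.5, 1.0]   # hypothetical 3-entry scalar codebook

def encode(x: float) -> float:
    # Placeholder "encoder": clamp the input into [0, 1]
    return min(max(x, 0.0), 1.0)

def quantize(z: float) -> int:
    # Token = index of the nearest codebook entry (vector quantization, 1-D)
    return min(range(len(CODEBOOK)), key=lambda i: abs(CODEBOOK[i] - z))

def decode(token: int) -> float:
    # Placeholder "decoder": look the token back up in the codebook
    return CODEBOOK[token]

tokens = [quantize(encode(x)) for x in (0.1, 0.6, 0.9)]
print(tokens)                          # [0, 1, 2]
print([decode(t) for t in tokens])     # [0.0, 0.5, 1.0]
```

The discrete token indices are what would be transmitted; as the note below observes, some tokenizers skip this quantization step and transmit floating-point latents instead.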
Different applications use different decoder models and encoder processing, making native AI formats often task-specific.
Following [Jian Jia et al. 2025], the survey identifies:
Note: Some tokenizers may not use quantization and rely on floating-point arithmetic.
Native AI formats have been used to develop codecs:
- JPEG AI [ISO/IEC 6048-1]
- Deep Render codec (from InterDigital, available on FFMPEG and VLC platforms)
The document provides an extensive table (Table 1) surveying AI pre-processing techniques with the following characteristics:
a) AI Traffic Characteristics: Take this information into account when developing an overview of AI traffic characteristics with native AI format or codec besides options for traditional codec.
b) 6G Split Inferencing: Consider that split operation may include AI processing/formatting in addition to traditional model splitting considered in 5G.
c) TR Update: Add text and diagram based on clause 2 to TR for FS_6G_MED.
Add references [x1] through [x10] to the TR, including key papers on:
- Discrete tokenizers survey
- JPEG AI standard
- Quantization techniques
- Transformer architectures
- CNN architectures
- Specific implementations
Add comprehensive new clause under "AI Traffic Characteristics" covering:
- Overview of multi-modal AI and native formats
- Reasons for AI split processing
- Encoder techniques (Transformer, CNN, MLP)
- Decoder processing
- Supervision methods
- Application types
- Quantization techniques
- AI-based codecs
This clause provides the technical foundation for understanding native AI formats in the context of 6G media services.
Embodied AI use case and related requirements
This document builds upon previous work from SA4#134 (S4-251826) and TR 22.870 clause 6.28, which established the importance of embodied AI for the FS_6G_MED study. The paper represents a paradigm shift from static observation sensors (fixed cameras with limited fields of view) to mobile embodied sensors (robots, UAVs) that actively interact with and explore physical environments. This shift is aligned with recent industry developments including NVIDIA's Isaac GR00T project and ITU-T SG21 workshop discussions.
The core use case involves devices equipped with multiple cameras capturing and uploading multi-modal concurrent data streams (video, point clouds) for network-based AI inference supporting tasks like multi-modal perception, 3D digital twin modeling, trajectory planning, and task orchestration across educational, home, industrial, and hazardous environments.
The document provides detailed descriptions of four state-of-the-art embodied AI tasks based on current research:
An autonomous agent explores previously unknown environments while providing natural language descriptions at key moments. The approach uses: - Curiosity-based exploration using forward/inverse dynamics models with neural network embeddings - Surprisal value (L2 norm between predicted and actual embeddings) as reward function - Speaker policy triggered by depth or curiosity thresholds - Transformer-based captioning model with self-attention
Evaluation metrics: Average surprisal score, coverage measure (intersection with ground-truth semantic classes), diversity score for consecutive captions.
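The surprisal-based reward and speaker trigger described above can be sketched in a few lines. This is a toy illustration: plain numeric vectors stand in for learned neural embeddings, and the depth-threshold semantics (caption when an obstacle is close) are an assumption.

```python
import math

def surprisal(predicted, actual):
    """Surprisal reward: L2 norm between the forward-dynamics model's
    predicted embedding and the actual next-state embedding.
    (Toy sketch; real systems use learned neural-network embeddings.)"""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)))

def should_speak(surprisal_value, depth, surprisal_thr=1.0, depth_thr=0.5):
    """Speaker policy trigger: caption when curiosity (surprisal) or
    depth crosses its threshold. Threshold semantics are assumptions."""
    return surprisal_value > surprisal_thr or depth < depth_thr

# A well-predicted state yields zero surprisal (no caption triggered)...
low = surprisal([0.1, 0.2, 0.3], [0.1, 0.2, 0.3])
# ...while a poorly predicted (novel) state yields high surprisal.
high = surprisal([0.0, 0.0, 0.0], [1.0, 1.0, 1.0])
```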
Agent identifies differences between an outdated map and current environment state, combining exploration with spatial reasoning.
Evaluation metrics: - Percentage of navigable area seen (Seen%) - Detection accuracy (Acc%) - Intersection over Union (IoU) for changed elements - Separate IoU+ (added objects) and IoU- (removed objects) - mAcc and mIoU (computed only on visited space)
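A minimal sketch of the set-based IoU computation behind IoU+ and IoU- (the grid cells and detections below are hypothetical):

```python
def iou(pred, gt):
    """Intersection over Union of two cell sets, e.g. grid-map cells
    flagged as changed. Empty-vs-empty is treated as a perfect match."""
    pred, gt = set(pred), set(gt)
    union = pred | gt
    return len(pred & gt) / len(union) if union else 1.0

# Hypothetical grid cells: ground-truth added objects vs. agent detections.
gt_added = {(0, 0), (0, 1), (1, 1)}
detected_added = {(0, 1), (1, 1), (2, 2)}
iou_plus = iou(detected_added, gt_added)  # IoU+ for added objects: 2/4 = 0.5
```

IoU- (removed objects) would be computed identically over the removed-cell sets; mIoU averages the per-class scores over visited space.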
Fundamental task for acquiring spatial information using deep reinforcement learning with intrinsic rewards (curiosity, novelty, coverage). Architecture comprises: - CNN-based mapper - Pose estimator - Hierarchical navigation policy
Evaluation metrics: IoU between reconstructed and ground-truth maps, map accuracy (m²), area seen (AS), free/occupied space metrics (FIoU, OIoU, FAS, OAS), mean positioning error.
Agent navigates to target destination guided only by natural language instructions, using: - 360° panorama encoding in 12×3 grid with 2048-dimensional feature maps - Attention mechanisms for instruction interpretation - Low-level actions (rotate, tilt, step ahead)
Evaluation metrics: Navigation error (NE), oracle success rate (OSR), success rate (SR), success rate weighted by path length (SPL).
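SPL, as commonly defined in the navigation literature, weights each successful episode by the ratio of shortest-path length to the agent's actual path length. A sketch with illustrative episode values:

```python
def spl(episodes):
    """Success weighted by Path Length:
    SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i),
    where S_i is the success flag (0/1), l_i the shortest-path length,
    and p_i the length of the path the agent actually took."""
    n = len(episodes)
    return sum(s * l / max(p, l) for s, l, p in episodes) / n if n else 0.0

# (success, shortest_path, agent_path) for three hypothetical episodes:
# perfect path, success with a 2x detour, and a failure.
episodes = [(1, 10.0, 10.0), (1, 10.0, 20.0), (0, 8.0, 5.0)]
score = spl(episodes)  # (1.0 + 0.5 + 0.0) / 3 = 0.5
```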
Observation 1: AI processing may occur at cloud/server, requiring transmission of either raw visual data (with standard compression) or pre-processed data (embeddings).
Observation 2: Cloud-based implementation requires low latency connectivity and error resilience for real-time navigation and environmental interaction.
Observation 3: Evaluation methods are highly task-dependent with different metrics for different tasks.
The document identifies specific scenarios where cloud-based AI processing is preferable:
For 6-8 cameras using 3GPP codecs (e.g., HEVC): - Peak data rates: 20-100 Mbit/s - Direction: Uplink - Characteristics: Bursty, ultra-low latency
Observation 4: Offloaded embodied AI may demand uplink bit-rates of 20-100 Mbit/s.
The document presents three transmission format categories:
| Transmission Format | UE Requirements | Network Requirements |
|---------------------|-----------------|----------------------|
| 3GPP codec (HEVC) | Support HEVC encoding and transmission | ~20-100 Mbit/s peak, bursty, uplink, ultra-low latency |
| Standardized feature map/codec (MPEG VCM/FCM, JPEG AI) | Support standard-based feature/image codec | Unknown peak bit-rate, bursty, ultra-low latency uplink |
| Proprietary/open source (embeddings, tokenizers) | Compute representation in software and transmit | Unknown, bursty, ultra-low latency uplink; efficient transmission support needed |
Observation 5: More investigation of proprietary and standardized feature map codecs is needed to support this use case.
The document proposes new clause 4.2.2.X "Requirements for embodied AI" incorporating: - Summary of example tasks (explore and explain, spot the difference, indoor exploration, vision and language navigation) - All five observations regarding cloud processing scenarios, latency requirements, task-dependent evaluation, uplink bit-rate demands, and codec investigation needs - Complete transmission format comparison table - Rationale for cloud offloading in different scenarios
This document makes significant contributions by: - Providing concrete, research-backed examples of embodied AI tasks with detailed technical descriptions - Establishing task-specific evaluation methodologies and metrics - Identifying network requirements for cloud-offloaded embodied AI (20-100 Mbit/s uplink, ultra-low latency, error resilience) - Analyzing alternative transmission formats beyond traditional video codecs (feature maps, embeddings) - Justifying cloud-based processing for specific deployment scenarios - Proposing specific text additions to the TR for FS_6G_MED study
Demonstration of real-time AI codec transmission in WebRTC
Source: Huawei, HiSilicon
Meeting: SA4 #135, Goa, India (9-13 Feb 2026)
Work Item: FS_6G_MED / Rel-20
Purpose: Demonstration of AI codec for real-time AI traffic over WebRTC
This document presents a practical demonstration of end-to-end AI media delivery using WebRTC, specifically implementing an AI-codec video streaming system with RTP. The demonstration shows the feasibility of real-time AI codec-based traffic transmission over WebRTC infrastructure.
The implementation utilizes three key tools:
Encoding Process: - Video converted to bits frame-by-frame through encoder neural network processing and entropy encoding - Codec-specific metadata carried in RTP Payload Header
Payload Format Structure:
[[Latent Shape | Hyperprior Byte Length | Latent Byte Length] | [Hyperprior Bytes | Latent Bytes]]
Payload Components: - Latent Shape: Shape of the latent representation - Hyperprior Byte Length: Length of hyperprior parameter bytes (used for probability distributions in entropy coding) - Latent Byte Length: Length of latent representation bytes
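The packing and parsing of this payload format can be sketched as follows. The field widths and byte order are assumptions for illustration, since the contribution does not specify them: latent shape as three uint16 values (C, H, W) and the two lengths as uint32, all in network byte order.

```python
import struct

# Assumed header layout: 3x uint16 latent shape + 2x uint32 lengths,
# network byte order (illustrative; not specified by the contribution).
HEADER = "!3H2I"

def pack_payload(shape, hyperprior, latent):
    """Serialize [shape | lengths | hyperprior bytes | latent bytes]."""
    header = struct.pack(HEADER, *shape, len(hyperprior), len(latent))
    return header + hyperprior + latent

def parse_payload(buf):
    """Recover shape and the two byte fields from a received payload."""
    hdr = struct.calcsize(HEADER)
    c, h, w, hp_len, lat_len = struct.unpack(HEADER, buf[:hdr])
    hyperprior = buf[hdr:hdr + hp_len]
    latent = buf[hdr + hp_len:hdr + hp_len + lat_len]
    return (c, h, w), hyperprior, latent

payload = pack_payload((192, 8, 12), b"\x01\x02", b"\x03\x04\x05")
shape, hp, lat = parse_payload(payload)
```

On the transmission side, aiortc would then fragment this payload across RTP packets when it exceeds the MTU.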
Transmission Side: - Large payloads fragmented due to MTU limitations - aiortc automatically appends standard RTP Header to each fragment - RTP packets transmitted with congestion control
Reception Side: - RTP packets buffered and reorganized per frame by aiortc - Packets parsed according to agreed format - Video frame restoration through entropy decoding and decoder neural network processing - Error resilient codec compensates for potential packet loss
Testing Methodology: - Random packet loss simulated using clumsy software - Wireshark captures received packets at receiver - Analysis based on RTP Header fields: Timestamps, Sequence Numbers, Marker Bits
Traffic Characteristics Analyzed: - Packet loss situation per frame - Performance of restored video frames - Packet size distribution - Packet arrival patterns - Packet success rate requirements
Current Implementation Status: - Actual AI codec deployed (preliminary version) - Uses bmshj2018_factorized model [R1] instead of Grace for moderate fps on CPU - Low-resolution video used due to computational constraints - End-to-end link feasibility proven
Demo Versions Provided: 1. With packet loss: Simulated using clumsy; RTP retransmission enabled; packet loss causes slight stuttering (error recovery not yet implemented) 2. Without packet loss: Clean transmission demonstration
[R1] https://arxiv.org/abs/1802.01436 (bmshj2018_factorized model)
[FS_6G_MED] Discussion on AI traffic trends
This contribution addresses Work Task 2b (Traffic characteristics) and 2d (Media communication for emerging AI services) from the 6G Media study, focusing on AI media traffic analysis. The document provides insights into popular AI applications, their traffic generation patterns, and proposes organizational structure for the TR clauses.
Four broad categories of consumer AI applications identified:
Chat and conversation: Text-based chats with general-purpose chatbots (e.g., ChatGPT) and voice conversations with AI services. Includes task-specific use cases (scene recognition, solving handwritten math problems). Some use cases take images as conditioning input, increasing UL data volumes and rates.
Document generation: Creation of longer texts and formatted documents (PDFs, presentations). Prompts include text, voice, documents, and images.
Image generation: Creation of images from scratch based on prompts and AI-powered image manipulation. Entertainment-driven adoption among younger demographics. Heavy traffic impact on network.
Video generation: AI-based video creation. Throughput-intensive in DL, while image inputs drive relatively high UL volumes.
Key Technical Characteristics: - Cloud-based AI inferencing creates bursts in uplink traffic - Uses existing web-based protocols (e.g., WebRTC for live audio/video) - Existing codecs (AVC, HEVC) used for encoding before transport - Text and images are base-64 encoded and encapsulated in JSON (OpenAI API, Gemini API) - Agentic AI apps becoming more common
Shifting UL/DL Ratios: - Uplink data growing faster than downlink traffic - Driven by conditioning inputs (images) transmitted to AI inference factories - Data volume spread per app session documented (Figure 1 reference)
Rising Data Volumes: - Multi-modal, user-friendly experiences increasing overall traffic - Users "talking with their data" and interacting with AI assistants - Sharing photos and videos from smartphones to refine prompts
Sensitivity to Latency: - Conversational AI services respond non-linearly to extended latency - Application-level reaction to network conditions varies by application - Example case study: the AI app responded linearly to inserted latency up to 0.5s, after which the response became non-linear; at ~1.5s inserted latency, response time grew by almost twice the inserted latency (Figure 2 reference)
Agentic AI Opportunities: - AI agents can shift inference loads and network traffic away from peak hours - Operating in scheduled, off-peak cycles ensures results ready when needed while avoiding congestion
Current Traffic Statistics: - AI traffic constitutes 0.06% of total traffic in observed mobile network - 74% downlink, 26% uplink
Architecture: - LLM-driven autonomous agent architecture with LLM as core reasoning engine - Additional components for planning, memory management, and interaction with external tools - Multi-agent systems with collaborative reasoning, persistent memory, and autonomous decision-making
Operational Characteristics: - Agentic tasks span multiple steps: data search, analysis, document generation in defined formats - Example: PDF-format travel plans including flights, accommodation, meeting schedules, budget limitations - Tested AI agents typically operated 10-20 minutes - Data volumes roughly in line with other AI apps analyzed - More data-rich outputs, partially offset by interim step results not sent to smartphone
Protocols for Agentic Communication:
Remote Procedure Calls (RPC): Run tasks on remote servers
Model Context Protocol (MCP): Open-source standard for connecting AI applications (e.g., LLMs like Claude, ChatGPT) to external systems (local files, databases), tools (search engines, calculators), and workflows. Uses JSON-RPC 2.0 as underlying RPC protocol.
Agent2Agent (A2A) Protocol: Open standard enabling seamless communication and collaboration between AI agents to solve complex tasks. Complementary to MCP. Provides standard methods and data structures for agent-to-agent communication over HTTPS, irrespective of underlying implementation. MCP can expose AI agents as tools to other agents, while A2A provides inter-agent communication.
NOTE: Agentic AI apps, like other AI apps, typically use existing transport protocols (e.g., HTTPS for A2A, JSON-RPC for MCP) and data types (e.g., encoded audio, video, text) for data exchange over the network.
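For illustration, a minimal MCP-style JSON-RPC 2.0 tool-call message as it would appear on the wire. The `tools/call` method with `name`/`arguments` parameters follows the MCP tool-invocation pattern; the specific tool name and arguments are hypothetical.

```python
import json

# JSON-RPC 2.0 request as used by MCP as its underlying RPC protocol.
# The tool name "web_search" and its arguments are illustrative only.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "web_search", "arguments": {"query": "6G media"}},
}

wire = json.dumps(request)     # serialized message sent over the transport
decoded = json.loads(wire)     # what the MCP server would parse
```

This illustrates the NOTE above: the agentic payload is ordinary JSON text over an existing transport, not a new media format.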
The document proposes the following agreements:
Add Clause 2 content to TR 26.870 clause 6.2 as a basis for further work
Take into account that current AIML traffic reuses existing protocols and formats (i.e., audio, video, text over HTTP, RTP, etc.)
Agree to prioritize characterization of existing popular AI apps and provide initial analysis to SA by June 2026
[FS_6G_MED] LLM-based AI services
This contribution addresses Work Task 2 objective (d) of FS_6G_MED, which focuses on media communication for emerging AI services. The objective aims to:
The contribution notes that SA1 TR 22.870 contains over 60 AI-related use cases, many referencing "tokens" as basic units for Gen-AI models. While tokenized traffic over networks is not yet widely deployed, the fast-paced research warrants SA4's attention to elaborate these terms and establish a framework.
The document proposes a more generic architecture than the voice translation-specific model in TR 26.847. Key definitions:
The contribution presents a generic (M)LLM architecture (Figure X.1) with the following components:
Input Processing: - Tokenizer: Function that converts data of a particular modality into tokens (e.g., words, image patches) - Modality Encoder: AI/ML model that encodes tokens into token embeddings (e.g., OpenAI's CLIP for images and text)
Processing: - Combination Layer: Combines input token embeddings with contextual token embeddings, potentially using techniques like RAG for context window management
Output Processing: - Media Decoder/Generator: Processes LLM output token embeddings into desired format (e.g., natural language)
Tokens: Discrete units of information in a given modality (words in text, audio frames, image patches) representing meaningful components of AI/ML data with clearly defined boundaries.
Token embeddings (or embeddings): Dense numerical tensors encoding semantic properties, relationships, and contextual meaning of tokens. Transform discrete tokens into continuous mathematical spaces where semantic relationships can be computed through vector operations.
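The token/embedding distinction above can be illustrated with a toy word-level tokenizer and a deterministic stand-in for a learned modality encoder. Both functions are illustrative sketches, not the mechanisms used by production LLMs (which use subword tokenizers and learned encoders such as CLIP).

```python
import hashlib

def tokenize(text):
    """Toy word-level tokenizer: splits text into discrete tokens with
    clear boundaries. Production LLMs use subword schemes (e.g., BPE)."""
    return text.lower().split()

def embed(token, dim=4):
    """Toy deterministic 'embedding': maps a discrete token to a dense
    numeric vector. A real modality encoder learns these vectors so that
    semantic relationships become computable via vector operations."""
    digest = hashlib.sha256(token.encode()).digest()
    return [b / 255 for b in digest[:dim]]

tokens = tokenize("Media aspects for 6G")          # discrete units
embeddings = [embed(t) for t in tokens]            # dense tensors
```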
NOTE 1: Current popular AI applications do not generate network traffic composed of token embeddings. Feasibility of such transport using existing protocols is FFS.
NOTE 2: Modern LLM services charge based on number of tokens processed (outcome of modality encoding and combination layers), but user input consists of traditional media (text, images, audio) in the form of prompts.
NOTE 3: In current AI applications, all components on the server side (right of dashed line in architecture) run on the server.
The document proposes to discuss and agree on the generic architecture and definitions for LLM-based AI applications in Clause 3 as a basis for further work in the study.
[FS_6G_MED] Testbed for AI Media Services traffic characterization
This contribution from Qualcomm proposes a comprehensive testbed framework for characterizing traffic patterns and QoE metrics of generative AI services in the context of the FS_6G_MED study. The testbed addresses the need for quantitative characterization of AI-native media services under diverse network conditions, which is a key requirement for the Study on Media Aspects for 6G System.
The testbed provides end-to-end measurement capabilities for multiple AI service types: - Chat services - Streaming services - Agentic tool use - Image generation - Multimodal analysis - Real-time conversational AI
The testbed captures comprehensive performance metrics including: - Latency metrics: TTFT (Time To First Token), TTLT (Time To Last Token), latency percentiles - Traffic metrics: UL/DL bytes and ratios, burstiness - Performance metrics: Success rate, token rate, tool-call latency, streaming stall statistics - Protocol analysis: All pcap-enabled analysis capabilities
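TTFT and TTLT measurement over a streaming response can be sketched as follows. The chunk delays below simulate network arrival times; an actual testbed would time chunks arriving from an SSE or WebSocket stream rather than sleeping.

```python
import time

def measure_streaming(chunks):
    """Measure TTFT (time to first token) and TTLT (time to last token)
    given an iterable of (delay_s, text) chunks. The delay stands in for
    waiting on a network stream in a real measurement."""
    start = time.monotonic()
    ttft = None
    for delay, _text in chunks:
        time.sleep(delay)  # stand-in for awaiting the next stream chunk
        if ttft is None:
            ttft = time.monotonic() - start  # first token arrived
    ttlt = time.monotonic() - start          # last token arrived
    return ttft, ttlt

# Simulated stream: first token after 20 ms, two more at 10 ms spacing.
ttft, ttlt = measure_streaming([(0.02, "Hel"), (0.01, "lo "), (0.01, "!")])
```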
Deep visibility into protocol and payload behavior is provided through trace logging functionality, which can be enabled via TRACE_PAYLOADS=1. This enables generation of:
- WebRTC SDP samples
- Exact computer-use request/response payloads
The testbed follows an orchestrator-centric architecture with clear separation of concerns:
The framework is designed for easy extension:
- New scenarios: Create a class extending BaseScenario, register in scenarios/__init__.py, and add YAML entry in configs/scenarios.yaml
- New providers: Implement a client subclassing LLMClient and register in the orchestrator client factory
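A hypothetical sketch of this extension pattern; the actual `BaseScenario`, registry, and `LLMClient` interfaces in the testbed may differ.

```python
# Hypothetical shapes for the testbed's extension points (assumptions).
class BaseScenario:
    name = "base"
    def run(self, client):
        raise NotImplementedError

SCENARIOS = {}  # stand-in for the registry in scenarios/__init__.py

def register(cls):
    """Register a scenario class under its name."""
    SCENARIOS[cls.name] = cls
    return cls

@register
class ChatBasic(BaseScenario):
    name = "chat_basic"
    def run(self, client):
        # Single-turn prompt; the orchestrator would record TTFT/TTLT.
        return client.complete("Hello")

class EchoClient:
    """Minimal stand-in for an LLMClient subclass."""
    def complete(self, prompt):
        return prompt.upper()

result = SCENARIOS["chat_basic"]().run(EchoClient())
```

A matching YAML entry in `configs/scenarios.yaml` would then make the scenario selectable from the orchestrator.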
The testbed includes vLLM client support (clients/vllm_client.py) enabling evaluation of self-hosted models via OpenAI-compatible API, with the same metrics and logging pipeline as hosted providers.
Configuration and execution:
- Scenario and network-profile definitions: configs/scenarios.yaml, configs/profiles.yaml
- Single-scenario run: python orchestrator.py --scenario chat_basic --profile 5g_urban --runs 10
- Full sweep: python orchestrator.py --scenario all --runs 5
- Optional capture flags: --capture-pcap, --capture-l7

The contribution includes preliminary evaluation results showing: - TTFT (Time To First Token) measurements across different scenarios - Average throughput measurements by scenario
Note: These initial results are presented as examples and are not intended for TR documentation.
The contribution proposes that SA4: - Agrees to adopt the proposed testbed as the baseline for AI traffic characterization evaluation - Documents the testbed in TR 26.870 (Study on Media Aspects for 6G System)
The contribution references: - [1] S4-260xxx: Generic Network Interface Emulator for Media Delivery Evaluation - [2] SP-251652: New SID on Media Aspects for 6G System (FS_6G_MED) - [3] 3GPP TR 22.870: Study on 6G Use Cases and Service Requirements - [4] 3GPP TR 26.998: Support of XR Services
[FS_6G_MED] Test scenarios for AI traffic characterization
This contribution from Qualcomm proposes test scenarios for characterizing AI traffic patterns in support of the 3GPP SA4 6G Media Study objectives. The work is based on AI-related use cases defined in TR 22.870 "Study on 6G Use Cases and Service Requirements", covering AI Agents, Large Language Models (LLMs), Generative AI, and real-time AI inference services.
A 6G AI Traffic Characterization Testbed has been developed to measure traffic characteristics of generative AI services, analyze agentic AI patterns, and evaluate QoE metrics under various network conditions.
The contribution identifies and categorizes relevant AI use cases from TR 22.870 into four main groups:
The contribution proposes 10 test scenarios with explicit mapping to TR 22.870 use cases:
| Scenario | Description | TR 22.870 Mapping |
|----------|-------------|-------------------|
| chat_basic | Basic single-turn LLM chat interaction | 6.11, 6.17, 6.22, 6.59 |
| chat_streaming | Multi-turn chat with streaming responses | 6.11, 6.17, 6.26, 6.31, 6.59 |
| shopping_agent | AI Agent with tool calling (MCP) | 6.6, 6.7, 6.8, 6.11 |
| web_search_agent | Research agent with web search capability | 6.6, 6.13, 6.21 |
| realtime_text | Real-time conversational AI via WebSocket | 6.3, 6.17, 6.22, 6.38, 6.49 |
| realtime_audio | Audio-based real-time conversation | 6.17, 6.22, 6.38, 6.49 |
| image_generation | Image generation using Generative AI | 6.26, 6.31, 6.33, 6.34, 6.50 |
| multimodal_analysis | Multimodal input analysis (image + text) | 6.3, 6.15, 6.26, 6.28, 6.38, 6.50 |
| video_streaming | Video upload for AI inference offloading | 6.28, 6.38, 6.50 |
| computer_control_agent | Computer use agent via GUI automation | 6.8, 6.9, 6.21, 6.28 |
Addresses TR 22.870 clauses 6.11, 6.17, 6.22, and 6.31.
Key Metrics: - Time-to-First-Token (TTFT): Critical QoE metric for perceived responsiveness - Time-to-Last-Token (TTLT): Total response generation time - Token streaming rate: Throughput in tokens per second - Uplink/Downlink byte volumes: Traffic volume for network dimensioning
Addresses TR 22.870 clauses 6.6, 6.7, 6.8, and 6.11. Uses Model Context Protocol (MCP) for tool calling.
Key Metrics: - Agent loop factor: Number of API calls per user prompt (agentic iterations) - Tool call latency: Time for external tool execution - Multi-step task completion time: End-to-end task duration - Burstiness patterns: Peak-to-mean traffic ratio and ON/OFF periods
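The peak-to-mean burstiness metric listed above can be computed from a per-interval byte trace; the trace values below are illustrative of ON/OFF agentic traffic, not measured data.

```python
def burstiness(bytes_per_interval):
    """Peak-to-mean ratio of per-interval traffic volume.
    Values near 1 indicate smooth traffic; values much greater than 1
    indicate bursty ON/OFF patterns typical of agentic tool calling."""
    mean = sum(bytes_per_interval) / len(bytes_per_interval)
    return max(bytes_per_interval) / mean if mean else 0.0

# Illustrative ON/OFF pattern: short request bursts separated by idle
# gaps while the agent waits on tool results.
trace = [12000, 0, 0, 300, 9000, 0, 0, 0]
ratio = burstiness(trace)  # peak 12000 B vs. mean 2662.5 B -> ~4.5
```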
Addresses TR 22.870 clauses 6.49, 6.38, and 6.3 (low-latency requirements).
Key Metrics: - WebSocket/WebRTC connection setup time - Streaming chunk delivery patterns - Stall detection metrics (rate, duration) - Audio byte volumes and durations (for voice scenarios)
Addresses TR 22.870 clauses 6.26, 6.28, 6.31, 6.33, and 6.50.
Key Metrics: - Image generation latency and payload sizes - Multimodal input processing requirements - UL/DL asymmetry ratios for different content types - Video upload bandwidth for AI inference offloading (20-100 Mbps per clause 6.28) - Frame-level packet error tolerance characteristics
The contribution proposes: 1. Adopt the identified test scenarios as described in this contribution and implemented in the AI testbed 2. Document the relevant AI use cases from TR 22.870 in an Annex in TR 26.870
This contribution provides a comprehensive framework for AI traffic characterization in 6G systems by: - Systematically mapping 10 test scenarios to specific SA1 use cases from TR 22.870 - Defining scenario-specific metrics covering QoE (TTFT, TTLT), traffic patterns (burstiness, asymmetry), and performance (latency, throughput) - Introducing AI-specific traffic characteristics such as agent loop factors, token streaming rates, and agentic iteration patterns - Addressing diverse AI service types including conversational AI, agentic AI with tool calling, real-time inference, and generative media services - Providing a testbed-based approach for empirical traffic characterization to support network dimensioning and QoS specification for 6G AI services
pCR [FS_6G_MED] Considerations on Work Topic 1 Media Delivery requirements for intelligent immersive calling
This pCR proposes updates to clause 4.2 of TR 26.870, introducing new requirements related to intelligent immersive calling services for 6G media delivery.
The change is motivated by SA1's work on TR 22.870, which introduces use cases for immersive communication targeting aging populations. The use case envisions leveraging 6G technologies (particularly AI capabilities such as generative AI and multi-modal models) to enable operators to provide intelligent immersive calling services through various wearable and smart devices including: - AR/VR glasses - Smart glasses - Smart TVs - Smart watches
The pCR introduces a new service definition describing intelligent immersive calling as: - An AI-empowered immersive calling service utilizing generative AI and multi-modal models - A service that aggregates inputs from multiple smart devices and sensors (cameras, smart watches, AR/VR glasses) - A service that can be natively provided by operators
The following technical requirements are introduced:
High-Quality Video Support: 4K + HDR uplink video capability for intelligent immersive calling
Eye Tracking Support: Ability to support eye tracking across different types of smart devices (e.g., AR/VR glasses)
User Intention Understanding: Capability to understand user intention through voice and gesture inputs
Device-Aware QoE: Tiered QoE support that takes the specific device capabilities into account
IMS Extensions: Extensibility of IMS to support these new capabilities
Multi-Media Protocol Extensions: Review of protocol extensions for multi-media transporting
[FS_6G_MED] pCR on Embodied Video for 6G Media
This is a pCR (proposed Change Request) to 3GPP TR 26.870 introducing Embodied Video Internet (EVI) as a new use case for 6G Media studies. The document proposes adding a new clause 6.1 to the technical report, focusing on media requirements for embodied AI systems (robots, UAVs) that actively capture and process video in dynamic environments.
Core Concept: - Defines Embodied AI as integration of AI into physical systems enabling real-world interaction - Introduces paradigm shift from static/passive recording to dynamic/mobile/embodied sensing - Distinguishes between: - Old Paradigm: Fixed cameras with limited FOV and constrained coverage - New Paradigm: Mobile devices (robots, UAVs) as "mobile eyes and limbs" actively exploring environments
Definition: - Embodied Video: Use of 6G networks enabling intelligent agents to capture, process, and react to visual information in real-time within dynamic environments
Extracts and summarizes four relevant use cases from TR 22.870:
Technical Requirements: - Multi-camera systems (6-8 cameras) with concurrent multi-modal data streams (video, point clouds) - Three operational scenarios defined: - Scenario I: 6x 1080p @ 15Hz → 20 Mbps - Scenario II: 4x 1080p + 2x 4K @ 15/30Hz → 60 Mbps - Scenario III: 2x 1080p + 4x 4K @ 15/30Hz → 100 Mbps - Alternative: 4x 1080p + 2x 4K @ 60Hz - E2E RTT: 100-300ms - Compression ratio: 240:1 assumed - Distributed AI inference tasks: multi-modal perception, 3D digital twin modeling, trajectory planning
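The three scenario rates are consistent with a back-of-envelope check: raw pixel rate divided by the stated 240:1 compression ratio. Assuming 24 bits per raw pixel (an assumption not stated in the document) reproduces the round figures to within rounding.

```python
def offload_rate_mbps(cameras, compression=240, bits_per_px=24):
    """Uplink rate estimate: raw pixel rate / compression ratio.
    bits_per_px=24 (raw RGB) is an assumption that matches the
    scenario figures; cameras is a list of (width, height, fps)."""
    raw_bps = sum(w * h * bits_per_px * fps for (w, h, fps) in cameras)
    return raw_bps / compression / 1e6

P1080, P4K = (1920, 1080), (3840, 2160)
s1 = offload_rate_mbps([(*P1080, 15)] * 6)                     # ~18.7 -> "20 Mbps"
s2 = offload_rate_mbps([(*P1080, 15)] * 4 + [(*P4K, 30)] * 2)  # ~62.2 -> "60 Mbps"
s3 = offload_rate_mbps([(*P1080, 15)] * 2 + [(*P4K, 30)] * 4)  # ~105.8 -> "100 Mbps"
```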
Media Requirements: - AI codec with error-tolerant capabilities (Grace method) - Real-time processing of high-resolution video and multi-modality data - High uplink data rate and low latency
Application Context: - Real-time infrastructure inspection (utility poles, guardrails) - Security surveillance - Network offloading for resource-intensive video analysis
Media Requirements: - Native integration of video analysis algorithms (object recognition, anomaly detection) - Low latency communication
System Architecture: - Embedded controllers for motion control (walking, grasping) - fast response - Network offloading for computing-intensive tasks (large AI models, control command generation)
KPI Requirements:
| Traffic Type | Message Size | Transfer Interval | Data Rate | E2E Latency | Reliability |
|--------------|--------------|-------------------|-----------|-------------|-------------|
| UL sensor data | 1250-12500 Bytes | 10 ms | 1-10 Mbps | 100-150 ms | 99.99% |
| UL LiDAR | 345600 Bytes | 100 ms | 27.6 Mbps | 100-150 ms | 99.99% |
| DL Control command | 625-12500 Bytes | 50 ms | 0.1-2 Mbps | - | - |
Technical Notes: - LiDAR: 10 Hz frame rate, 28800 points/frame, 12 bytes/point - E2E latency breakdown: ~40ms communication + ~100ms AI inference
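The LiDAR figures in the KPI table follow directly from the stated parameters:

```python
# LiDAR message size and data rate from the parameters given in the
# technical notes: 28800 points/frame, 12 bytes/point, 10 Hz frame rate.
POINTS_PER_FRAME = 28800
BYTES_PER_POINT = 12
FRAME_RATE_HZ = 10

frame_bytes = POINTS_PER_FRAME * BYTES_PER_POINT   # 345600 bytes/message
rate_mbps = frame_bytes * 8 * FRAME_RATE_HZ / 1e6  # 27.648 ~ 27.6 Mbps
```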
Media Requirements: - Real-time processing of multi-modality data (video, audio, point clouds, LiDAR)
Operational Concept: - UAVs with built-in AI capabilities for enhanced perception, decision-making, control - Swarm deployment for full area coverage and complex task execution - Network offloading during local computing overload (e.g., HD 3D map generation)
Media Requirements: - Real-time processing of multi-modality data from multiple UAVs
| Use Case | Video Resolution | Data Rate | E2E Latency | Reliability |
|----------|------------------|-----------|-------------|-------------|
| Traffic surveillance | 1080p | ≥5 Mbps | <100 ms | >99.99% |
| Traffic surveillance | 4K | >25 Mbps | <100 ms | >99.99% |
| Urban management | 1080p | ≥5 Mbps | 20-100 ms | - |
| Event security | 1K | ≥5 Mbps | ≤10 ms | - |
| Event security | 4K | ≥25 Mbps | ≤10 ms | - |
| Rural inspections | 4K | ≥25 Mbps | <100 ms | - |
| Use Case | Data Type | Data Rate | E2E Latency |
|----------|-----------|-----------|-------------|
| Topographic surveying | High-res video, LiDAR | ≥30 Mbps | 20-100 ms |
| Reconstruction | 4K video | ≥50 Mbps | 20-100 ms |
| Mine monitoring | Video, LiDAR, sensor | ≥30 Mbps | 20-100 ms |
| Rural governance | High-res video, LiDAR | ≥30 Mbps | 20-100 ms |
Four Key Requirements Identified:
1. AI Codec: Grace method for better UX vs. traditional codecs
2. AI-native Video Protocol: new protocol design for AI-driven video systems
3. Low-latency Video Transmission: critical for real-time embodied AI operations
4. QoE Model for Performance Measurement
This pCR establishes foundational requirements for supporting embodied AI systems in 6G media, addressing: - Multi-modal concurrent data streaming - Real-time AI inference offloading - High-reliability, low-latency video transmission - Novel QoE metrics for embodied video applications - AI-native codec and protocol requirements
The document bridges SA1 service requirements with SA4 media specifications, providing concrete KPIs and use case evidence for the FS_6G_MED study.
[FS_DCTC_eQoS_MED] Description of experimental approach and test setup for media transmission for AI inferencing
This Change Request (CR) to TR 26.823 v0.2.0 addresses the currently empty Clause 6.5.1 by providing detailed experimental approaches and test setups for evaluating media transmission for AI inferencing scenarios. The contribution is part of the Study on dynamically changing traffic characteristics and usage of enhanced QoS support in 5GS for media applications and services.
The document proposes two distinct experimental approaches corresponding to the AI inferencing in XR service scenario (Clause 5.6):
NOTE: Due to the black-box nature of commercial applications, the QoE metrics identified in Clause 5.6.3 cannot be measured, as appropriate Observation Points are lacking.
Similar to Approach #1: - Baseline: Wired network (ideal conditions) - 5G testing: Test channels or emulated 5G network - Network conditions: Nominal, cell-edge, multi-UE scenarios
NOTE: Test setup may be extended to mimic advanced AI-enabled AR devices (e.g., lightweight AR+AI glasses requiring remote rendering) by adding XR data (e.g., periodic pose information) to AI inference input data in uplink to evaluate impact on traffic characteristics.
| Aspect | Approach #1 (Commercial Apps) | Approach #2 (Standalone Platform) |
|--------|-------------------------------|-----------------------------------|
| Control | Black box | White box |
| QoE Metrics | Not measurable | Measurable |
| Traffic Characteristics | Measurable | Measurable |
| Flexibility | Limited | High (customizable) |
| Realism | High (actual commercial apps) | Medium (mimics commercial apps) |
| Metadata Support | No | Yes (per-packet) |
6GMedia - work topic 2- Characteristics of AI-enabled applications
This contribution from InterDigital addresses work topic 2 of the 6GMedia study, focusing on key characteristics of XR and AI-enabled mobile applications and services. The document proposes use cases and elaborates on requirements for interoperable and widespread deployment.
The document identifies several representative use cases:
Key Observations: - AI-enabled applications are highly heterogeneous and multimodal, encompassing video, image, audio, text, haptics, and sensor data - Applications exchange AI/ML data including prompts, model parameters, and compressed/uncompressed intermediate data (embeddings)
Table 1 Analysis provides detailed mapping of: - AR: UL (video, audio, prompt, inference data) / DL (video, audio, dynamic 3D media, haptics, spatial descriptions) - requires MPEG haptics, scene description enhancements, dynamic mesh/gaussian splat codecs - Real-time Object Detection: Feature representations, MPEG-7 descriptors, MPEG FCM - Speech Recognition/Conversational AI: ULBC, tokens, embeddings - Model Learning/Updates: ONNX, GGUF, MPEG NNC formats - Avatar communication: Upcoming MPEG avatar, gaussian and mesh codecs - Context-aware recommendation: W3C Media Annotations, MPEG-7 descriptors
Proposals: - Proposition 1: SA4 should study support of additional media modalities and codecs/enhancements for 6G - Proposition 2: SA4 should define terminology for AI/ML data (features, tokens, embeddings, latent, intent) and study relevant AI representation formats and interchangeable formats/codecs - Observation 2: Some applications require remote AI-based Spatial Computing functions (TR 26.819) - Proposition 3: SA4 should identify and study spatial compute functions benefiting from off-device processing
Traffic Characteristics: - Applications are uplink-heavy with greatly varying characteristics across modalities - Continuous video capture results in high-rate, periodic uplink traffic - Audio/sensor data generates lower-rate, aperiodic, bursty transmissions - Traffic composition changes dynamically based on user behavior, interaction patterns, mobility, and environmental factors
Table 2 Analysis characterizes requirements: - AR, Real-time Object Detection, Avatar communication: High data rate, real-time latency, mid reliability, high need for QoE-based adaptation - Speech Recognition/Conversational AI, Context-aware Recommendation: Mid data rate, real-time latency, mid reliability, mid adaptation need - Model Learning/Updates: High data rate, non-real-time latency, mid reliability, low adaptation need
Key Observations: - Observation 3: Diversity of applications and modalities makes traffic characteristics evaluation/classification challenging - Observation 4: Temporal dependency and synchronization required between media modalities and AI data for real-time/delay-bound AI inference - Observation 5: Applications characterized by uplink-intensive, bursty/continuous, multi-modal traffic with diverse latency sensitivity and QoE impact - Observation 6: Current QoS frameworks lack application/context awareness, granularity, and adaptability for dynamic 6G network conditions
Proposals: - Proposition 4: SA4 should develop generic QoS and QoE mechanisms suitable across diverse traffic patterns - Proposition 5: SA4 should study QoS framework enhancements enabling finer granularity and context awareness - Proposition 6: SA4 should specify procedures for real-time QoE-based adaptation of multimodal media and define QoE metrics for real-time/delay-bound AI inference
Key Points: - Transport protocols (QUIC-based, HTTP/3-based) are rapidly evolving to suit AI-enabled use cases - These evolutions substantially impact traffic characteristics including latency, reliability, and resource utilization - Rel-19 SA2 specified techniques for delivering Media Related Information (MRI) when XRM traffic is end-to-end encrypted (QUIC) - TS 23.501 clause 5.37.9 specifies options for relaying MRI over N6 interface - Rel-18/19 SA4 specified solutions in TS 26.522 enabling RTP senders to transmit MRI using RTP header extensions
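As a rough illustration of the RTP-header-extension mechanism referenced above, the sketch below packs an RFC 8285 one-byte-header extension block carrying a hypothetical MRI element (one importance byte plus a two-byte delay budget). The element ID and field layout are invented for illustration; they are not the TS 26.522 wire format:

```python
import struct

def one_byte_ext(elements):
    """Pack RFC 8285 one-byte-header RTP extension elements.

    elements: list of (ext_id, payload) with ids 1..14 and 1..16 payload
    bytes. Returns the full block: 0xBEDE marker, length in 32-bit words,
    elements, and zero padding to a word boundary.
    """
    body = b""
    for ext_id, data in elements:
        assert 1 <= ext_id <= 14 and 1 <= len(data) <= 16
        # One-byte element header: 4-bit ID, 4-bit (length - 1)
        body += bytes([(ext_id << 4) | (len(data) - 1)]) + data
    body += b"\x00" * ((-len(body)) % 4)
    return struct.pack("!HH", 0xBEDE, len(body) // 4) + body

# Hypothetical MRI element: importance level 3, 20 ms delay budget
ext = one_byte_ext([(5, bytes([3]) + (20).to_bytes(2, "big"))])
```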
Observations and Proposals: - Observation 8: New transport protocols impact media transmission reliability, latency, and traffic characteristics - Proposal 7: SA4 should characterize impact of QUIC-based protocols on AI data delivery and traffic characteristics, especially for real-time/delay-bound applications - Observation 9: SA4 has specified RTP-based MRI solutions in TS 26.522 - Proposal 8: SA4 should study integration of SA2-defined QUIC-based transport extensions into media delivery architecture, leveraging the FS_Q4RTC-MED study
Key Characteristics: - AI-enabled services deployed across smartphones, AI glasses, smartwatches, fitness devices, companion compute devices - Services involve continuous sensing, media capture/processing, on-device/distributed AI inference, and frequent network data exchange - Services are inherently multi-device with different devices contributing sensing, media, compute, display, or connectivity functions - Introduces QoS/QoE challenges for modality/format adaptation, AI processing coordination with partial/full offload, and traffic correlation across UEs
Figure 1 illustrates UE tethering where AI-enabled services are delivered across multiple user devices relying on a tethered UE for cellular connectivity and coordination.
Observations and Proposals: - Observation 7: AI-enabled services increasingly operate across heterogeneous multi-devices associated with the same user; modalities and AI processing may be distributed - Observation 8: Existing system assumptions are UE-centric and do not address QoS/QoE requirements of multi-device scenarios - Proposal 8: SA4 should study the impact of multi-devices on the QoS and QoE framework - Observation 9: QoS enhancement and QoE-driven dynamic media adaptation need to operate across heterogeneous multi-devices - Proposition 9: SA4 should consider heterogeneous multi-devices in the QoE metrics definition and QoS enhancement study for real-time/delay-bound AI inference
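One way to picture the multi-device QoE problem raised in Observations 7-9 is aggregating per-device QoE reports into a single user-level score. The role weights and the aggregation rule below are purely illustrative assumptions, not anything proposed in the contribution:

```python
def user_level_qoe(device_reports):
    """Aggregate per-device QoE reports into one user-level score (0..1).

    device_reports: {device_name: {"qoe": float, "role": str}}.
    The role weighting is an illustrative assumption: display-facing
    devices dominate perceived quality, capture and compute less so.
    """
    weights = {"display": 0.5, "capture": 0.3, "compute": 0.2}
    total_w = sum(weights.get(r["role"], 0.1) for r in device_reports.values())
    return sum(weights.get(r["role"], 0.1) * r["qoe"]
               for r in device_reports.values()) / total_w

score = user_level_qoe({
    "ai-glasses": {"qoe": 0.6, "role": "display"},
    "smartwatch": {"qoe": 0.9, "role": "capture"},
    "phone":      {"qoe": 0.8, "role": "compute"},
})
```

A UE-centric framework would only see the tethered phone's flows; a per-user aggregate like this is what the multi-device observations argue is missing.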
The document proposes to discuss and agree on all proposals as part of the 6GMedia study and document them in a new section 6.X of the TR. The contribution emphasizes three main areas requiring SA4 attention: 1. Support for heterogeneous and multimodal media types including AI/ML data 2. Enhanced QoS/QoE frameworks with finer granularity and context awareness 3. Multi-device scenario support for AI-enabled services
On SA4 work on AI traffic characteristics
This contribution from Apple addresses SA4's response to consultation requests from RAN2 and SA2 regarding AI traffic patterns and formats. The document establishes the context that SA4 must make application layer assumptions to develop traffic models for AI services, including LLMs and other AI agents. The paper argues for a specific approach to how these assumptions should be treated within the standardization process.
The document identifies multiple dimensions of AI traffic variation:
AI traffic is characterized as: - Bursty and unpredictable - Event-driven rather than steady-state streaming
To characterize traffic (latency, throughput, periodicity, burstiness), SA4 needs assumptions about data formats. However, the document notes: - Industry currently uses various transport methods - The domain is rapidly evolving - No clear interoperability requirement exists today - Traffic modeling targets deployment scenarios 5+ years in the future - Current leading formats may become obsolete by 6G deployment
The contribution proposes three key principles for SA4:
Non-normative Treatment: AI format assumptions for traffic modeling should be treated as guidance only, not as rigid normative standardization targets. SA4 should avoid normative work on AI formats (actual data packet structure) at this stage to prevent locking specifications into constraints that may not suit future technology evolution.
Focus on Traffic Characteristics: Work should concentrate on traffic characteristics (latency, throughput, periodicity, burstiness) rather than specific coding or file formats used to generate that traffic.
Continuous Review: SA4 should periodically review this approach as the AI traffic and format landscape continues its rapid evolution.
The main technical contribution is a strategic positioning paper that argues against premature normative standardization of AI data formats while supporting the development of traffic models based on reasonable assumptions. The paper advocates for a pragmatic approach that acknowledges the rapid evolution of AI technologies and focuses SA4 efforts on traffic characteristics that will inform network design rather than application-layer format specifications that may quickly become outdated.
6GMedia - AI terminology
This contribution addresses the need for standardized AI terminology in the context of 6GMedia work. The document recognizes that terminology such as tokens and embeddings lacks clarity across delegates and 3GPP working groups, and proposes definitions for AI representation formats to enable assessment of traffic characteristics and impact on SA4 specifications.
The document proposes comprehensive definitions for fundamental AI representation concepts:
The document provides a comprehensive mapping of representation types across different media modalities:
The document proposes distinguishing between: - Internal representation: Representation used by the model or agent for its internal process - Exchangeable/external representation: Format exchanged between two entities (e.g., UL or DL)
A matrix is provided mapping representation formats to internal/external usage, with most entries marked as FFS (For Further Study), except: - Learned based compressed representation: Not internal, external examples include JPEG AI, MPEG AI-PCC - Model exchange representation: Not internal, external examples include ONNX, NNEF, GGUF, NNC
The contribution proposes to include sections 1 to 3 in a relevant section of TR 26.870.
[FS_6G_MED] Consideration on Media Delivery Architecture
The contribution proposes a fundamental shift in 6G media delivery architecture from parallel network usage to functional decomposition and cooperative delivery across heterogeneous networks. Key aspects include:
Media content is decomposed into constituent elements, with each delivered through the most appropriate network:
The architecture enables semantic-aware delivery where:
The architecture applies to:
6G evolves from direct traffic accommodation to:
The 6G media architecture: - Supports coexistence of heterogeneous networks for media services - Separates transmission methods per media element with intelligent integration - Simultaneously supports massive simultaneous viewing and hyper-personalized interactive media - Applicable across multicast/broadcast, streaming, XR, and next-generation immersive media
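The per-element separation of transmission methods can be sketched as a routing policy that groups decomposed media elements by delivery network. Element names, network labels, and the policy itself are illustrative assumptions, not part of the proposed architecture:

```python
# Hypothetical policy: each constituent media element is mapped to the
# network deemed most appropriate for it.
ROUTING_POLICY = {
    "base_video_layer":    "broadcast",  # massive simultaneous viewing
    "enhancement_layer":   "unicast",    # per-user quality adaptation
    "interactive_overlay": "unicast",    # hyper-personalized elements
    "scene_description":   "multicast",
}

def route(elements, policy=ROUTING_POLICY):
    """Group media elements by the network selected to deliver them."""
    plan = {}
    for element in elements:
        plan.setdefault(policy.get(element, "unicast"), []).append(element)
    return plan

plan = route(["base_video_layer", "enhancement_layer", "interactive_overlay"])
```

The resulting plan delivers the shared base layer once over broadcast while personalized elements ride unicast, which is the cooperative-delivery idea in miniature.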
The proposal maps to existing TS 22.870 requirements:
References performance requirements for various media services on UAM aircraft including: - 8K video live broadcast (100 Mbps uplink, 200ms latency, 95% reliability) - Video streaming (4-100 Mbps depending on resolution, 100ms latency, 95% reliability) - Remote controller through HD video (≥25 Mbps uplink, 100ms latency, 99% reliability) - Video conferencing (25 Mbps bidirectional, 100ms latency, 99% reliability) - Immersive multimedia services/cloud gaming (100-500 Mbps downlink, 50ms latency, 99% reliability)
All services specified for up to 1000m altitude AGL in urban/rural/scenic areas.
Overview of inputs to RAN2#133 on AI traffic characteristics
This document provides a summary of contributions submitted to RAN2#133 regarding AI traffic characteristics. Following the RAN#110 plenary's assignment of RAN2 to lead AI traffic characteristics work in RAN and coordinate with SA WG4, this overview aims to align SA4 and RAN2 work at an early stage. The document explicitly recommends prioritizing discussion around the key dependencies identified by RAN2.
Multiple contributions converge on the following AI traffic characteristics:
Several contributions propose categorization based on timing: - Real-time vs Non-real-time: Most common distinction - Interactive vs Non-interactive: Request/response patterns
Multiple contributions distinguish:
- AI codec traffic: Native AI representation formats
- Non-AI codec traffic: Traditional encoding methods
- Type 1: Real-time AI application with non-AI codec
- Type 2: Real-time AI application with AI codec
- Type 3: Non-real-time AI application
Peng Cheng Lab (R2-2600153) proposes detailed service classes: - Service Class A: Generative AI and AI Agent Traffic (Token-Streaming Inference) - Service Class B: Perception/Analytics AI (Uplink-Intensive Inference), including Split Inference - Service Class C: Federated/Distributed Learning and Training Traffic (Bulk, Synchronized Uploads) - Composite Class D: AI-Enhanced Immersive Communication (XR + Digital Twin + AI Components)
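A decision rule over coarse traffic features can illustrate how the proposed service classes might be distinguished. The feature keys and the ordering of the checks are illustrative assumptions, not the classification logic of R2-2600153:

```python
def classify(traffic):
    """Map coarse traffic features to the Peng Cheng Lab service classes.

    traffic keys ("direction", "pattern", "token_stream", "immersive")
    and the rule ordering are hypothetical, for illustration only.
    """
    if traffic.get("immersive"):
        return "D"  # AI-enhanced immersive communication (composite)
    if traffic.get("token_stream"):
        return "A"  # generative AI / agent token-streaming inference
    if traffic.get("pattern") == "bulk-synchronized":
        return "C"  # federated/distributed learning bulk uploads
    if traffic.get("direction") == "uplink":
        return "B"  # perception/analytics, incl. split inference
    return "A"

cls = classify({"direction": "uplink", "pattern": "continuous",
                "token_stream": False, "immersive": False})
```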
Strong consensus on prioritizing: - Uplink enhancements: Primary focus for Rel-20 - Non-real-time applications: Particularly chatbot/GenAI use cases - Burstiness and unpredictability handling: Leveraging XR Phase 4 work - AI traffic awareness in RAN: Enable service-aware handling
Broader scope proposed for 6G: - Real-time uplink and downlink: Full bidirectional support - Unified framework: Comprehensive AI traffic handling - Native AI communication: AI-native RAN traffic support - Flexible QoS: Dynamic adaptation to AI traffic patterns - Downlink non-real-time: Extended coverage beyond Rel-20
Multiple contributions request SA4 input on: - Visibility of tokens to RAN - Packet-level characteristics - Packet importance variability - Data compression characteristics: Impact on traffic patterns - Multi-modality aspects: Synchronization requirements and characteristics
Several contributions explicitly reference or request coordination on:
Requests for SA4 input on:
Several contributions propose formal coordination:
- Companies: Nvidia, Offino, China Telecom, Spreadtrum/UNISOC, Panasonic, Lenovo, CMCC et al., Apple, Nokia, Fujitsu, Samsung, HONOR, Peng Cheng Lab, OPPO et al., Sharp, CATT, vivo. Key aspects: Importance differentiation, error tolerance, dependency, compression, RAN visibility, PDU mapping
- Nearly universal recognition of bursty, aperiodic traffic requiring specific RAN enhancements
- Strong consensus on Rel-20 focus for uplink mobile AI traffic with burstiness, unpredictability, and interactive characteristics
- Companies: Fraunhofer, Meta/Qualcomm et al., ZTE, Peng Cheng Lab, Samsung, Lenovo, HONOR, Nokia. Key aspects: Synchronization, MMSID usage, multi-device scenarios
- Companies: Offino, China Telecom, Spreadtrum/UNISOC, Panasonic, Lenovo, CMCC et al., NTT/Docomo, Fujitsu, Samsung, CATT, vivo, Huawei/HiSilicon. Key aspects: Variable tolerance, task-dependent, token-specific, importance-based
- Companies: Nvidia, Nokia, ZTE, Peng Cheng Lab, Xiaomi, Huawei/HiSilicon. Key aspects: Context-aware flow, L2 scheduling, UE-assisted coordination
- Companies: ZTE, Meta/Qualcomm et al., Samsung, Nokia, Apple, HONOR, Huawei/HiSilicon. Key aspects: Flexible adaptation, constrained latency, relative priorities
The document recommends taking into account the explicit dependencies identified by RAN2 and prioritizing discussion around these key dependencies, particularly: