All Summaries - Table View

Meeting: TSGS4_135_India | Agenda Item: 11.1

25 documents found

TDoc Number | Source | Title | Summary
Qualcomm Incorporated (Rapporteur)
[FS_6G_MED] Work Plan for Media Aspects for 6G System

Study on Media Aspects for 6G System - Work Plan

1. Introduction and Background

This document presents the work plan for the Feasibility Study on "Media Aspects for 6G System" (FS_6G_MED), approved at SA4#134 (S4-252142) and SA plenary #110 (SP-251652).

Study Objectives

The study aims to:

  1. Document work topics in detail, covering:
     • WT#1: Media Delivery Architecture
     • WT#2: 6G Media
     • WT#3: Media Aspects related to SA2 topics
     • WT#4: Media for ubiquitous access
     • WT#5: Trusted and private media communication

  2. Identify dependencies with other WGs and collect information on relevant developments within 3GPP and externally

  3. Map work topics to basic functions and develop high-level call flows based on existing media delivery architectures and 6G design concepts

  4. Identify gaps and opportunities, recommending either:
     • Further study or normative work (stage-2/stage-3)
     • Candidate solutions to address issues

  5. Coordinate with SA1, SA2, SA3, SA5, SA6 and external organizations (SVTA, CTA WAVE, ISO/IEC JTC1 SC 29, 5G-MAG, MSF, Khronos, IETF)

Timeline

  • For information: TSG SA#114 (Dec-26)
  • For approval: TSG SA#115 (Mar-27)
  • Rapporteur: Elmira Ramazanirend, Vodafone

2. Context and Justification

2.1 Motivation

The study addresses the need for:

  • CAPEX/OPEX reduction and monetization opportunities
  • New services and experiences in the 6G era
  • System simplification and integration of new technologies
  • Support for new use cases: ISAC, XR/immersive communication, AI-based services
  • Improved end-user QoE across diverse devices and network conditions

2.2 Work Topics Details

WT#1: Media Delivery Architecture

Study media delivery architecture aspects for 6G based on TS 26.501, TS 26.506 and new 6G architecture developments. Key aspects:

  • Accommodation of new 6G use cases by the current 5G media delivery architecture
  • Identification of reusable components from 5G and earlier generations
  • Architecture simplification for improved deployability/implementability
  • Further harmonization between streaming and conversational services
  • Collection of existing/emerging content delivery protocols
  • Alignment with SA2 6G design concepts
  • Accommodation of commercially relevant media services

WT#2: 6G Media

Identify trends and expected services related to media, including immersive and AI-related media. Sub-topics:

a) End-to-end service quality: Study aspects for defining end-to-end service quality for media services, including capturing, rendering, and QoE metrics definition

b) Traffic characteristics: Study and identify traffic characteristics of media services and use cases from TR 22.870 to support 6G radio and service architecture design

c) Immersive media formats: Collect, categorize and characterize (3C) emerging media formats (including different media types) for 6G XR/immersive media services, building on TR 26.956

d) Media communication for emerging AI services: Study AI representation formats and traffic characteristics for AI-related media services (agents, multi-modal LLMs, diffusion models), identifying gaps in QoS requirements, dynamic traffic characteristics, or AI-representation format definitions
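As a hedged illustration of the traffic-characterization work in sub-topics b) and d), a minimal sketch of deriving basic characteristics from a per-frame trace is shown below. The frame sizes, frame rate, and helper function are invented for illustration and are not figures from the contribution:

```python
# Illustrative sketch: deriving basic traffic characteristics from a
# per-frame trace of a hypothetical media stream. All numbers below are
# invented placeholders, not 3GPP-agreed values.

def traffic_characteristics(frame_sizes_bytes, fps):
    """Return average bitrate, instantaneous peak bitrate, and their ratio."""
    duration_s = len(frame_sizes_bytes) / fps
    avg_bps = sum(frame_sizes_bytes) * 8 / duration_s
    peak_bps = max(frame_sizes_bytes) * 8 * fps  # largest frame at full rate
    return {
        "avg_mbps": avg_bps / 1e6,
        "peak_mbps": peak_bps / 1e6,
        "peak_to_avg": peak_bps / avg_bps,
    }

# A toy 60 fps XR-like trace: one large I-frame followed by smaller P-frames.
trace = [120_000] + [30_000] * 59
stats = traffic_characteristics(trace, fps=60)
print(f"avg={stats['avg_mbps']:.1f} Mb/s, peak/avg={stats['peak_to_avg']:.2f}")
```

The peak-to-average ratio is one of the quantities that makes media traffic harder to schedule than constant-bitrate flows, which is why per-service characterization matters for 6G radio and architecture design.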

WT#3: Media Aspects related to SA2 topics

Study media-related impacts from SA2 study topics:

a) AI for 6G: Media-related impacts from "AI for 6G (e.g. AI agent, framework)" aligned with SA2 WT#3

b) Integration of Sensing and Communication: Media-related impacts aligned with SA2 WT#4

c) Data handling: Media-related impacts on data collection, distribution, processing, storage, access and exposure, considering access control/user consent and privacy, aligned with SA2 WT#5

d) Computing: Media-related impacts on computing support for UE and application servers, aligned with SA2 WT#6

Note: Analysis may confirm no impact on SA4 specifications. Topics may be updated based on SA2 decisions.

WT#4: Media for ubiquitous access

Study aspects and opportunities for media services on ubiquitous networks including NTN and other low bit-rate/low power scenarios beyond speech. Focus on supported bitrates, functionalities, delays, power consumption and other design vectors, considering FS_ULBC study information.

WT#5: Trusted and private communication for media

Study aspects and opportunities for trusted and private media communication in applications including generative AI or agent-to-agent communication, covering end-to-end workflows, authentication, trust and exploring 6G's role.

Note: Coordination with SA3 expected on authentication and trust-related topics.

2.3 Supporting Companies

The study has broad industry support from 50+ companies including operators (Vodafone, Deutsche Telekom, AT&T, Orange, China Mobile, NTT), vendors (Ericsson, Nokia, Huawei, Samsung, Qualcomm, MediaTek, Apple), content providers (Dolby, Sony, Tencent, Bytedance), and research organizations.

2.4 Dependencies on Other WGs

SA1

  • SP-241152: New SID on 6G Use Cases and Service Requirements
  • TR 22.870 v1.0.1: Study on 6G Use Cases and Service Requirements

SA2

  • TR 23.801-01 v0.3.0: Study on Architecture for 6G System; Stage 2
  • SP-251633: Revised SID for FS_6G_ARC
  • S2-2511308: Detailed FS_6G_ARC Work Task scopes
  • Note: SA2 6G study completion planned for March 2027 (same as this study); may need postponement to SA#116 to address dependencies

3. Work Organization

3.1 Work Topic Leads and Contributors

A work plan tracking sheet is maintained online at: https://docs.google.com/spreadsheets/d/1AHXc41lTVAJ84ENKfi2GgmpGx26hqHSoNQ7JKxnBNBo/edit?usp=sharing

A snapshot will be provided in the published document showing:

  • Topic title
  • Lead
  • TR 26.870 clause
  • Completion percentage
  • Rel-21 normative work decisions
  • Stage-2 impact
  • Contributors

Current overall progress: 0%

3.2 Working Methods

To be determined.

4. Detailed Work and Time Plan

SA4#134 (Nov 2025, Dallas) - Target: 0%

  • Agree Study Item
  • Initial discussion of time plan
  • Initial discussion of draft TR

SA#110 (Dec 2025, Baltimore)

  • Approve Study Item
  • Assign rapporteurs

3GPP SA4 AHG Telco (Jan 15, 2026) - Target: 3%

Host: Qualcomm, 16:00-17:30 CET

  • Agree initial work and time plan
  • Agree skeleton TR 26.870 for SA4#135 submissions
  • Agree initial working procedures
  • Prepare initial thoughts for work topics
  • Identify initial contributors
  • Submission deadline: Jan 13, 16:00 CET

SA4#135 (Feb 9-13, 2026, Goa) - Target: 10%

  • Agree work and time plan
  • Agree skeleton TR 26.870
  • Agree initial working procedures
  • Identify leads and contributors for work topics
  • Address baseline assumptions (SA1 use cases/requirements, new media trends, principal ideas)
  • Document scope and objectives of work topics in detail
  • Progress TR 26.870
  • Communicate with other 3GPP WGs and external organizations

3GPP SA4 AHG Telco (Feb 25, 2026)

Host: Qualcomm, 16:00-17:30 CET

  • Preparation for extended AHG meeting on FS_6G_MED
  • Identify common themes for workshop inputs
  • Submission deadline: Feb 23, 16:00 CET

SA#111 (Mar 10-13, 2026, Fukuoka)

  • No actions

3GPP SA4 AHG Telco (March 16 & 23, 2026) - Extended AHG Meeting

Host: Qualcomm, 15:00-17:00 CET

  • Address baseline assumptions clustered in common themes
  • Identify moderators/leads to summarize common topics
  • Submission deadline: March 19, 15:00 CET
  • Note: 5G-MAG workshop on media energy consumption scheduled March 19

SA4#135e-bis (Apr 13-17, 2026, online) - Target: 20%

  • Summary of workshop on 6G Media Topics
  • Progress work topics in detail
  • Start identifying dependencies with other WGs
  • Collect information on developments within 3GPP and externally
  • Progress TR 26.870
  • Communicate with other WGs and external organizations

SA4#136 (May 11-15, 2026, Montreal) - Target: 35%

  • Progress work topics in detail
  • Progress identifying dependencies and collecting information
  • Progress TR 26.870
  • Communicate with other WGs and external organizations

SA#112 (Jun 9-12, 2026, Singapore)

  • No actions

SA4#137-e (Aug 24-28, 2026, online) - Target: 50%

  • Progress work topics in detail
  • Progress identifying dependencies
  • Start mapping work topics to basic functions and develop high-level call flows
  • Start identifying potential gaps and opportunities
  • Progress TR 26.870
  • Communicate with other WGs and external organizations

SA#113 (Sep 15-18, 2026, Madrid)

  • No actions

SA4#138 (Nov 16-20, 2026, Calgary) - Target: 65%

  • Complete work topics documentation
  • Progress identifying dependencies
  • Progress mapping to basic functions and call flows
  • Start identifying gaps and opportunities
  • Start drafting Conclusions
  • Agree TR 26.870 v1.0.0
  • Communicate with other WGs and external organizations

SA#114 (Dec 8-11, 2026, Boston)

  • Present TR 26.870 v1.0.0 for information

SA4#139 (Feb 22-26, 2027, Korea) - Target: 90%

  • Complete work topics documentation
  • Complete identifying dependencies
  • Complete mapping to basic functions and call flows
  • Complete identifying gaps and opportunities
  • Agree on Conclusions
  • Agree TR 26.870 v2.0.0
  • Communicate with other WGs and external organizations

SA#115 (Mar 16-19, 2027, Europe)

  • Present TR 26.870 v2.0.0 for approval
  • Note: As SA2 FS_6G_ARC completes at this time, may require extension until SA#116
VODAFONE Group Plc
TR skeleton for FS_6G_MED

3GPP TR 26.870 - Study on Media Aspects for 6G System (Release 20)

Document Overview

This is an early-stage Technical Report (v0.0.1) establishing the framework for studying media-related aspects in 6G mobile networks. The document is in skeleton form, defining the structure and scope for investigating media opportunities and gaps in the context of 6G systems.

Scope and Objectives

The study aims to:

  • Support 6G studies in other working groups with media-related aspects
  • Identify media-related industry trends from operators, third-party providers, and verticals that may impact 6G media architectures
  • Improve existing services and support new services to meet 6G system requirements
  • Build upon service requirements from SA1 (TR 22.870, TS 22.abc) and architectural enhancements from SA2 (TR 23.801-01)

The conclusions will form the basis for further detailed studies and normative work.

Document Structure

Section 4: Preliminaries

The document establishes three foundational areas:

  1. Assumptions - Common architecture assumptions based on SA2 decisions and existing functions from earlier generations
  2. Requirements - Architectural and media-related requirements from SA1, including associated use cases
  3. Existing Media Services - Review of media services from 4G and 5G, assessing their relevancy and deployment status

Section 5: New Trends and Expected Services

This section will identify media-related industry trends from:

  • Operators
  • Third-party providers
  • Verticals

These trends will inform the impact on 6G media architectures.

Work Topics (Section 6)

The study defines five initial work topics, each structured with:

  • Description
  • Key Issues
  • Context and External Factors
  • Potential Solutions
  • Mapping of Issues to Solutions
  • Conclusions

Work Topic #1: Media Delivery Architecture

Focus on architectural aspects of media delivery in 6G systems.

Work Topic #2: 6G Media

Investigation of new media capabilities and services specific to 6G.

Work Topic #3: Media Aspects Related to SA2 Topics

Coordination with SA2 architectural work to ensure media aspects are properly addressed.

Work Topic #4: Media for Ubiquitous Access

Addressing media delivery across diverse access scenarios in 6G.

Work Topic #5: Trusted and Private Communication for Media

Security and privacy considerations for media services.

Work Topic #X: [Placeholder]

Additional work topics to be defined.

Consolidated Findings and Recommendations

  • Section 7 will consolidate findings from all work topics
  • Section 8 will provide recommendations for follow-up work
  • Annex A will collect supplementary background information for selected work topics

Key References

The study builds upon:

  • TR 22.870: Study on 6G Use Cases and Service Requirements
  • TR 23.801-01: Study on Architecture for 6G System; Stage 2
  • TS 22.ABC: 6G System Requirements (to be replaced with normative specification)
  • TS 26.501: 5G Media Streaming (5GMS) architecture
  • TS 26.506: 5G Real-time Media Communication Architecture

Current Status

This is the initial skeleton version (0.0.1) from February 2026, SA4#135. All technical content sections contain editor's notes indicating that detailed content is yet to be developed. The document establishes the framework for comprehensive study of media aspects in 6G systems, with work to be populated in subsequent versions.

Qualcomm Incorporated (Rapporteur)
[FS_6G_MED] Some considerations on ways of working

Summary of S4-260057: Ways of Working for FS_6G_MED

Introduction and Background

This document proposes ways of working for the Feasibility Study on "Media Aspects for 6G System" (FS_6G_MED), which was agreed at SA4#134 (S4-252142) and approved by SA plenary #110 (SP-251652). The study aims to deliver TR 26.870 by TSG SA#115 (Mar-27), with an information milestone at TSG SA#114 (Dec-26).

The study encompasses five work topics:

  • WT#1: Media Delivery Architecture
  • WT#2: 6G Media
  • WT#3: Media Aspects related to SA2 topics
  • WT#4: Media for ubiquitous access
  • WT#5: Trusted and private media communication

Basic Assumptions

The document establishes several foundational principles:

  • Monitoring Other WGs: Track work and timelines in other working groups to identify impacts on SA4-defined services
  • 6G Media is 5G Media Unless Agreed Differently: Improvements, new features, or changes to 5G media require motivation (e.g., new service requirements, gap analysis, optimization potentials). This ensures no disruption to existing services while enabling improvements for 6G
  • Reuse of Other 5G Media Studies: Collaboration with ongoing or completed 5G media studies and relevant organizations is encouraged
  • Work Topic Prioritization: Not all work topics have equal priority. Prioritization based on requests from other WGs, dependencies, service requirements, expected load, and company support

Work Towards SA4 Terms of Reference and DNA

The study should align with SA4's Terms of Reference (SP-241362) and create value within 3GPP and the broader ecosystem. Key objectives include:

  • Addressing market needs, deployment feasibility, sustainability, innovation platform, monetization opportunities, cost-consciousness
  • Timeliness in execution
  • Developing specifications against meaningful KPIs for media services
  • Implementability (test, evaluation, code, reference software)
  • Collaboration with industry partners and other SDOs (IETF, MPEG, Khronos)
  • Developer-friendly outputs (APIs, code, examples, git-environments)

Organization of Work

The document proposes a flexible organizational approach:

  • Initial Phase: Start by collecting input and documenting proposals opportunistically, allowing for both holistic and key-issue-based documentation
  • Rapid Response Capability: Structure work to enable quick prioritization if SA plenary or other WGs require rapid progress (e.g., for AI traffic)
  • Two Main Approaches:
    • Key Issues: Identify concrete open issues or optimization potential with candidate solutions (suitable for WT1 and WT3)
    • Larger-scoped Work Topics: Address broad scope as sub-studies within the overall 6G media study (suitable for WT4, WT5, and potentially WT2)
  • Documentation: Each work topic may use a dedicated Annex in TR 26.870 to collect agreements
  • Early TR Development: Initiate TR with sections on assumptions, requirements, existing media services, and industry trends

TR 26.870 Structure and Rationale

The proposed TR structure includes:

  • Introduction and scope
  • Clauses addressing assumptions, use cases and requirements (from SA1), and existing media services
  • Sections for industry trends and references to other studies
  • Main body divided into work topics (clustered by key issues or objectives)
  • Clause for consolidated findings and recommendations
  • Annexes for specific work topics if needed

The rationale emphasizes:

  • An opportunistic and flexible approach
  • Acceptance of slide decks and workshop-style contributions
  • Accommodation of both flat organization (key issues) and detailed, objective-driven sections
  • Serving as a baseline for future work, not a gating factor
  • Prioritization and flexibility, especially for topics with external pressure

Main Work Topics and Initial Priorities

Work Topic 1: Media Delivery Architecture

Priority: Low to Medium

  • Initial work based on new needs or SA2 aspects not yet studied
  • Monitoring of existing or new dependencies on SA2
  • Some work in 5G-Advanced already addresses enhancements (FS_AMD_Ph2)

Work Topic 2: 6G Media

Priority: Highest

  • Highest complexity and breadth
  • Includes traffic characteristics, QoE, and other aspects
  • Requires significant focus and input to define scope and approach

Work Topic 3: Media Aspects Related to SA2 Topics

Priority: Minimal unless a clear dependency is identified

  • Monitor progress in SA2
  • Key issues to monitor in TR 23.801-01:
    • Key Issue #20: Integrated Sensing and Communication (collection and transport of sensing data)
    • Key Issue #21: 6G data framework (data storage, retrieval, quality, latency, volume)
    • Key Issue #22: 6G Computing Support

Work Topic 4: Media for Ubiquitous Access

Priority: Treated as a near-separate study

  • Leverage ULBC work
  • Focus on data rates, scheduling, and implications for media services over NTN (GEO and LEO)

Work Topic 5: Trusted and Private Media Communication

Priority: Medium

  • Requires definition of key questions and scoping
  • May evolve into a separate study
  • Contribution-driven priority
  • Should be driven by use cases

Handling AI Traffic Characteristics

AI traffic characteristics are identified as requiring higher priority:

  • Accelerated Action Required: Due to interest from SA plenary and other WGs; SA4 expected to take the lead for media-related aspects
  • Coordination Essential: Active coordination with SA1 (requirements) and SA2 (architecture)
  • Deliverable: Define "AI formats" (tokens, embeddings, latents, compressed representations) and develop a framework for AI traffic characteristics:
    • First version ready for SA plenary reporting within a few months
    • Requires commitment from multiple contributors
  • Sharing Results: Results and frameworks should be shared with other WGs and SA plenary for generalization/adaptation
  • Network Independence: Traffic characteristics should be developed independent of access network to identify opportunities for future networks
  • Realistic Applications: Analysis should cover:
    • Generative AI services (LLMs, image/video generation)
    • Agentic AI patterns (multi-step tool calling, tool server workflows)
    • Different AI applications, formats, AI codecs, and protocols
    • QoE/user experience under emulated network conditions (latency, loss, bandwidth)
    • Measurements of existing applications under various network conditions
    • Subjective and objective quality measures (an SA4 core strength and differentiator)
  • Input Welcome: Inputs for AI Traffic Characteristics as part of WT2 are welcome to SA4#135
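To make the measurement idea concrete, a minimal sketch of how a token-streamed generative-AI response looks on the wire is given below. The generation interval (20 tokens/s) and one-way network delay (60 ms) are hypothetical placeholder values, not measurements from the contribution:

```python
# Illustrative sketch: timing metrics for an LLM token stream delivered
# over a fixed-delay network. All parameters are invented placeholders.

def arrival_times(n_tokens, gen_interval_s, one_way_delay_s):
    """Client-side arrival time of each token when the server emits one
    token per generation interval over a constant-delay path."""
    return [one_way_delay_s + i * gen_interval_s for i in range(n_tokens)]

def stream_metrics(times):
    """Summarize a token stream: startup latency, pacing, and duration."""
    inter = [b - a for a, b in zip(times, times[1:])]
    return {
        "time_to_first_token_s": times[0],
        "mean_inter_token_s": sum(inter) / len(inter),
        "stream_duration_s": times[-1] - times[0],
    }

# 100 tokens at 20 tokens/s over a 60 ms one-way path.
m = stream_metrics(arrival_times(100, gen_interval_s=0.05, one_way_delay_s=0.060))
print(m)
```

A constant delay only shifts time-to-first-token; extending the sketch with variable delay (jitter) or loss would perturb the inter-token pacing, which is exactly the kind of QoE-relevant traffic characteristic the work topic proposes to measure.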

Larger Themes

To market SA4 and 3GPP work for 6G, larger themes and KPIs should be identified. Example themes include:

  • User Experience-based services and applications
  • Networking AI-media services for improved experiences
  • Trusted media services and applications
  • Developer and implementer-friendly specifications

Input for new or larger themes is welcomed.

Proposals

The document proposes to:

  1. Agree on basic assumptions in clause 2 and document in TR 26.870
  2. Use SA4 TOR as guidance for FS_6G_MED study work
  3. Agree on organization of work based on clause 4 (covered by work plan in S4aP26002)
  4. Agree on TR 26.870 skeleton based on rationales in clause 5 (separate contribution S4aP26004)
  5. Agree on main work topics and priorities provided in clause 6
  6. Agree on handling of AI traffic characteristics based on clause 7
  7. Invite contributions on larger themes for 6G, using clause 8 as starting point
Qualcomm Korea
[FS_6G_MED] Preliminaries: assumptions and requirements

Summary of S4-260058: Preliminaries - Assumptions and Requirements for 6G Media Delivery

Overview

This change request introduces foundational content for the new FS_6G_MED (6G Media Delivery) study in TR 26.870. It establishes the preliminary assumptions, requirements framework, and baseline for existing media services that will inform the 6G media architecture work.

Main Technical Contributions

References Section

Adds comprehensive normative and informative references, including:

  • Core 6G specifications (TR 22.870, TR 23.801-01, TS 22.ABC)
  • Existing 5G media specifications (TS 26.501, TS 26.506, TS 26.511, TS 26.512, TS 26.114, TS 26.117, TS 26.510, TS 26.517, TS 26.502, TS 26.143)

Clause 4.1: Assumptions

Architecture Assumptions (from SA2)

Establishes baseline architectural assumptions carried forward from TR 23.801-01:

  1. SBA Framework: 5GC Service-Based Architecture as starting point
  2. Single Core Network: 6G RAN connects to single 6G CN type (standalone architecture)
  3. RAN/CN Split: Maintains 5GS RAN and CN functionality split
  4. No Duplication: Avoids functionality duplication between 6G RAN and 6G CN
  5. IMS for Real-time Services: MMTel voice/video services provided by IMS
  6. Native TN/NTN Support: Both Terrestrial and Non-Terrestrial Networks natively supported

Key Issues Baseline

Identifies SA2 key issues serving as the baseline for 6G media delivery:

  • Key Issue #3: Network Slicing
  • Key Issue #4: User Plane Architecture
  • Key Issue #5: QoS Framework
  • Key Issue #7: Network Exposure
  • Key Issue #11: Non-3GPP access support
  • Key Issue #12: Voice Services
  • Key Issue #15: Messaging Services
  • Key Issue #19: 6G Network for AI
  • Key Issue #20: Integrated Sensing and Communication
  • Key Issue #21: 6G data framework
  • Key Issue #22: 6G Computing Support
  • Key Issue #23: 6G NTN Support

Media-Specific Assumptions

Establishes 5G media specifications as starting points:

  1. Common Architecture: TS 26.501 (5GMS) and TS 26.506 (5G RTC) architectures as baseline
  2. Streaming Codecs/Formats: TS 26.511 as starting point for 6G media streaming
  3. Streaming Protocols: TS 26.512 as baseline for 6G media streaming protocols
  4. RTC Protocols/Codecs/Formats: TS 26.114 as baseline for Voice Services
  5. Messaging Codecs/Formats: TS 26.143 as baseline for Messaging Services

Clause 4.2: Requirements

Placeholder for architectural and media-related requirements from SA1, including use cases and requirements.

Clause 4.3: Existing Media Services

Media Services Classification

Defines the 3GPP media services structure across working groups:

  • SA1: Service definition
  • SA2: Architectural support
  • SA4: Media technologies (protocols, codecs, formats, QoE)

Distinguishes between:

  • Full media services: Complete 3GPP-defined services
  • Media service enablers: Components enabling interoperability for third-party services

Relevant Media Services (Clause 4.3.2)

Identifies existing services for 6G consideration:

  1. IMS-Based Multimedia Telephony and Communication Services
     • SA1: Service requirements
     • SA2: System architecture for multimedia services
     • SA4: TS 26.114 (codecs, formats, protocols, QoE, telephony acoustics)

  2. XR (AR/VR/MR) Media Services
     • SA1: Use cases and Stage-1 requirements
     • SA2: 5G XR/media architecture improvements (multi-modal QoS, information exposure)
     • SA4: XR traffic characteristics, formats, codecs, streaming frameworks

Media Service Enablers (Clauses 4.3.3 and 4.3.4)

Lists existing 3GPP media service enabler components:

  • Real-time Communication beyond IMS
  • 5G Media Streaming: TS 26.501, TS 26.510, TS 26.511, TS 26.512, TS 26.117
  • Messaging Media Profiles: TS 26.143
  • MBS User Services: TS 26.502, TS 26.517
  • Formal MSE: TS 26.565 (Split Rendering MSE)

References TR 26.857 for formalized Media Service Enabler framework.

Editorial Notes

Multiple editor's notes indicate areas requiring further development:

  • Additional assumptions may be added
  • Requirements clause to be populated with SA1 inputs
  • Additional media services and enablers to be added
  • Status of existing media services (relevancy/deployments) to be identified

Qualcomm Korea
[FS_6G_MED] Considerations on Work Topic 4: Ubiquitous access

Summary of S4-260059: Considerations on Work Topic 4 - Ubiquitous Access

Document Overview

This contribution from Qualcomm introduces initial considerations for Work Topic #4 (WT#4) "Media for Ubiquitous Access" in the FS_6G_MED study (TR 26.870). The document serves as a starting point for discussions on media services support in ubiquitous networks, particularly Non-Terrestrial Networks (NTN) and low bit-rate/low power scenarios beyond speech.

Main Technical Contributions

Work Topic Scope and Objectives

The document clarifies the primary focus of WT#4:

  • Study aspects and opportunities for media services on ubiquitous networks including NTN
  • Address low bit-rate/low power scenarios beyond speech
  • Identify supported bitrates, functionalities, delays, power consumption and other design vectors
  • Take into account information collected in the FS_ULBC study

Study Methodology

The contribution proposes a structured approach with the following elements:

  1. Documentation of work topics in detail, particularly their relation to media delivery
  2. Identification of dependencies with other working groups and collection of relevant developments within 3GPP and externally
  3. Mapping to basic functions based on existing media delivery architectures (5GMS, 5G RTC) and SA2 architectures
  4. Gap analysis to identify potential solutions requiring further study or normative work
  5. Coordination with other 3GPP groups (SA1, SA2, SA3, SA5, SA6, RAN) and external organizations (SVTA, CTA WAVE, ISO/IEC JTC1 SC29, 5G-MAG, MSF, Khronos, IETF)

References Section Updates

The CR adds relevant normative and informative references:

  • TR 22.870 (SA1 6G use cases and service requirements)
  • TR 23.801-01 (SA2 6G architecture study)
  • TS 26.501 (5GMS architecture)
  • TS 26.506 (5G RTC architecture)
  • Multiple existing SA4 specifications (TS 26.114, 26.117, 26.502, 26.510, 26.511, 26.512, 26.517)
  • ULBC study reference

Technical Content Structure

High-level Description

Focuses on studying aspects and opportunities for media services on ubiquitous networks, with emphasis on:

  • Non-Terrestrial Networks
  • Low bit-rate/low power scenarios beyond speech
  • Identification of supported bitrates, functionalities, delays, power consumption
  • Leveraging FS_ULBC study findings

Potentially Relevant Use Cases and Requirements

Placeholder for SA1 use cases (to be completed)

Potentially Relevant 6G Architecture Key Issues

Identifies three key issues from TR 23.801-01:

  • Key Issue #4: User Plane Architecture
  • Key Issue #23: Support of 6G NTN
  • Key Issue #24: Analyse 5GS IoT features and solutions

Initial Key Issues for Study

The document proposes two initial key issues as starting points:

  1. Channel Characteristics: What are bitrate ranges, latencies and loss characteristics of relevant 3GPP Non-Terrestrial Networks?

  2. Service Performance: How would existing services and applications perform under such channel conditions?
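For the first key issue, the dominant latency floor of a satellite link can be estimated from geometry alone. A back-of-the-envelope sketch, assuming a bent-pipe (transparent) payload and zenith passes, with nominal altitudes (GEO about 35,786 km, a LEO shell at about 550 km) that are not taken from the contribution:

```python
# Back-of-the-envelope sketch: minimum propagation RTT of a bent-pipe
# satellite link (ground -> satellite -> ground and back). Real links
# add processing, queuing, and longer non-zenith path lengths.

C_KM_PER_S = 299_792.458  # speed of light in vacuum

def bent_pipe_rtt_ms(altitude_km):
    """Minimum RTT: four traversals of the altitude distance
    (up and down on the forward path, then again on the return)."""
    one_leg_ms = altitude_km / C_KM_PER_S * 1000
    return 4 * one_leg_ms

geo_rtt = bent_pipe_rtt_ms(35_786)  # geostationary orbit
leo_rtt = bent_pipe_rtt_ms(550)    # a low-Earth-orbit shell
print(f"GEO >= {geo_rtt:.0f} ms RTT, LEO >= {leo_rtt:.1f} ms RTT")
```

The roughly half-second GEO floor versus single-digit-millisecond LEO floor illustrates why the two key issues are coupled: which existing media services remain usable (key issue 2) depends directly on which NTN channel the service runs over (key issue 1).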

Editorial Notes

The document includes several editor's notes indicating:

  • More content to be added to the key issues section
  • Need to complete use cases and requirements by checking SA1 work
  • Subsection ordering may be adapted as appropriate for specific content

Dependencies and Coordination

The study explicitly requires coordination with:

  • 3GPP groups: SA1 (TR 22.870), SA2 (TR 23.801-01), RAN (TR 38.960), SA3, SA5, SA6
  • External organizations: SVTA, CTA WAVE, ISO/IEC JTC1 SC 29, 5G-MAG, Metaverse Standards Forum, Khronos, IETF

IIT Bombay, Free Stream Technologies, One media 3.0
[FS_6G_MED] Requirements and associated use cases

Summary of 3GPP Change Request S4-260060

Document Overview

This pseudo-CR proposes draft content for Sub-clause 4.2 of TR 26.870 (FS_6G_MED), establishing preliminary requirements for the 6G media study. The contribution identifies media-related requirements derived from SA1 TR 22.870 use cases covering System and Operational Aspects (Clause 5) and AI (Clause 6).

Main Technical Contributions

1. References Update (Clause 2)

Adds normative and informative references essential for 6G media requirements:

  • New Normative References:
    • TR 22.870: 6G Use Cases and Service Requirements
    • TR 23.801-01: Architecture for 6G System Stage 2
    • TS 26.501: 5GMS General description and architecture
    • TS 26.506: 5G Real-time Media Communication Architecture
    • TS 22.ABC: 6G System Requirements

  • New Informative References:
    • TS 22.101: Service principles
    • TS 22.261: 5G system service requirements
    • TS 22.173: IMS Multimedia Telephony Service
    • TS 22.153: Multimedia priority service

2. Requirements Framework (Clause 4.2)

Establishes the structure for media-related requirements, with an editor's note indicating that the clause will be revised once SA1 completes the consolidated 6G requirements.

3. System and Operational Aspects Requirements (Clause 4.2.1)

3.1 Non-3GPP Access Support (Clause 4.2.1.1)

Based on TR 22.870 Clause 5.3, identifies requirements for interoperability across access technologies:

  • Requirement 1: Support user access to network services via 3GPP and/or non-3GPP access (e.g., WLAN, Wireline)
  • Requirement 2: Support 5G system requirements for non-3GPP access
  • Requirement 3: Support mobility between 6G 3GPP and non-3GPP access with minimal user experience impact (QoS, QoE), subject to operator policy
  • Requirement 4: Provide means for UEs to determine the appropriate access technology (3GPP vs non-3GPP) when congestion is detected, subject to operator policy (initial access only)

Key Application: Leveraging non-3GPP technologies (ATSC, DVB) for multicast/broadcast delivery, complemented by 3GPP unicast for repair and UE join procedures.

3.2 Interworking with Legacy Systems (Clause 4.2.1.2)

Based on TR 22.870 Clause 5.4.2, identifies media-related capabilities requiring continuity from 5G:

  • Multimedia communication services (TS 22.101, TS 22.261, TS 22.173)
  • Broadcast and Multicast Services (TS 22.261)
  • Multimedia Priority Service (MPS) (TS 22.153)

3.3 Network Aspects (Clause 4.2.1.3)

Based on "Enhanced Network Service Awareness" use case (TR 22.870 Clause 5.9.8), addressing challenges of 6G services with multiple data streams (video, voice, text, sensor data, AI, sensing) having varied traffic patterns and QoS requirements:

  • [PR 5.9.8.2-1]: Support authorized 3rd party service providers to provide service characteristics information for each traffic flow component to the 6G network (subject to operator policy)
  • [PR 5.9.8.2-2]: Support mechanisms to dynamically adjust and optimize network resources based on service characteristics, including predicted changes (subject to operator policy)
  • [PR 5.9.8.2-3]: Provide appropriate charging support for differentiated services per media component (e.g., SLA requirements)
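To make [PR 5.9.8.2-1] concrete, a purely hypothetical sketch of the kind of per-flow service-characteristics description a 3rd party provider might expose to the 6G network is shown below. All field names and values are invented for illustration; nothing here is a 3GPP-defined data model:

```python
# Hypothetical illustration of per-flow service characteristics that an
# authorized 3rd party could provide to the network ([PR 5.9.8.2-1]).
# Field names and numbers are invented, not from TR 22.870.

service_description = {
    "service_id": "example-xr-session",
    "flows": [
        {"component": "video", "direction": "downlink",
         "bitrate_kbps": {"avg": 15000, "max": 45000},
         "max_latency_ms": 20, "importance": "high"},
        {"component": "haptics", "direction": "uplink",
         "bitrate_kbps": {"avg": 64, "max": 128},
         "max_latency_ms": 5, "importance": "critical"},
    ],
}

def tightest_latency_ms(desc):
    """The most demanding latency bound across all flow components,
    e.g. as one input to resource optimization ([PR 5.9.8.2-2])."""
    return min(flow["max_latency_ms"] for flow in desc["flows"])

print(tightest_latency_ms(service_description))  # -> 5
```

Describing each traffic-flow component separately is what would allow the network to treat, say, a low-rate but latency-critical haptics flow differently from the bulkier video flow, and to charge per media component as [PR 5.9.8.2-3] suggests.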

4. AI-Related Media Requirements (Clause 4.2.2)

Derived from multiple AI use cases in TR 22.870 Clause 6:

4.1 Network-Assisted Video-Based AI Inference (Clause 6.28)

  • [PR 6.28.6-1]: Support means for authorized 3rd party to provide expected communication performance requirements (e.g., max/average packet error rate per video frame) for enhanced user experience (subject to operator policy and 3rd party agreement)

4.2 UE-Network Collaboration with AI Capabilities (Clause 6.31)

  • [PR 6.31.6-1]: Network or application enablement layer shall manage and coordinate AI tasks considering AI workload offloading into Service Hosting Environment

4.3 AI-Assisted Multi-Modal Communication Service (Clause 6.42)

  • [PR 6.42.6-1]: Support media transformations in multi-modal communication services, including:
  • Image to video
  • 2D video to 3D video/avatar media
  • Text to video and vice-versa

(Subject to operator policy and user consent; transformations via AI capabilities of 3rd party or 6G network including IMS)

4.4 Real-Time Video Super-Resolution Service (Clause 6.50)

  • [PR 6.50.6-1]: Manage and coordinate network operations (AI model training/selection, computing resource selection, communication performance monitoring) upon receiving combined 3GPP service requests (subject to operator policy)
  • [PR 6.50.6-2]: Support mechanism to guarantee user experience when providing combined 3GPP services (e.g., 6G AI service + communication service) (subject to operator policy)
  • [PR 6.50.6-3]: Monitor performance (e.g., AI model inference accuracy) and report to 3rd party (subject to operator policy)

Key Technical Themes

  1. Multi-access convergence: Seamless integration of 3GPP and non-3GPP access for media delivery
  2. Dynamic resource management: Fine-grained, per-component QoS handling for complex multi-stream services
  3. AI integration: Network support for AI-enhanced media processing, including offloading, transformation, and super-resolution
  4. 3rd party collaboration: Framework for authorized service providers to interact with network for optimized media delivery
  5. Legacy continuity: Maintaining 5G media capabilities while extending for 6G scenarios
Qualcomm Korea
[FS_6G_MED] Considerations on Work Topic 1: Media Delivery Architecture

Summary of S4-260061: Considerations on Work Topic 1 - Media Delivery Architecture

Document Overview

This contribution from Qualcomm provides initial considerations and structure for Work Topic #1 (Media Delivery Architecture) in the FS_6G_MED study item. The document proposes text for TR 26.870 v0.0.1, establishing the foundation for studying media delivery architecture aspects for 6G systems.

Main Technical Contributions

1. High-Level Description and Scope

The contribution establishes that the work topic will study a harmonized media delivery architecture for 6G based on:

  • TS 26.501 (5G Media Streaming architecture)
  • TS 26.506 (5G Real-time Media Communication architecture)
  • New 6G architecture developments from TR 23.801-01

Key aspects to be studied include:

  • Assessment of whether current 5G media delivery architecture functionalities accommodate new 6G use cases
  • Identification of reusable and improvable components from 5G and earlier generations
  • Architecture simplification for improved deployability and implementability
  • Further harmonization of the media delivery architecture for streaming and conversational services
  • Collection and enablement of relevant existing and emerging content delivery protocols in 6G
  • Alignment with SA2's 6G design concepts
  • Accommodation of commercially relevant media services and evolving standardization activities

2. Dependencies and Context

Relevant SA2 Key Issues from TR 23.801-01:

  • Key Issue #2: SBA framework
  • Key Issue #3: Support of Network Slicing in the 6G system
  • Key Issue #4: User Plane Architecture
  • Key Issue #5: QoS Framework for 6G
  • Key Issue #7: Network Exposure
  • Key Issue #17: Migration and Interworking

External coordination needed with:

  • SA1 (use cases and requirements: TR 22.870)
  • SA2 (architecture: TR 23.801-01)
  • RAN (TR 38.960)
  • External organizations: SVTA, CTA WAVE, ISO/IEC JTC1 SC 29, 5G-MAG, Metaverse Standards Forum, Khronos, IETF

3. Media Application Service Models

The contribution proposes documenting several conceptual models to define how applications can benefit from the media delivery architecture:

3.1 Generalized Media Application Service Model

Defines an 8-point model describing:

  • Service Data Flows traversing the User Plane between the Media Client and the Media AS at reference point M4
  • Support for bidirectional media content flow
  • Multiplexing of Application Data Flows onto Service Data Flows
  • Different service location endpoints during a session
  • Mapping to different deployed servers (physical or virtual)
  • Mapping to one or multiple physical network interfaces based on ANDSP/URSP
  • Traversal of different Data Networks and Access Networks
  • Service Data Flow migration due to mobility

3.2 Downlink Media Streaming Application Service Model

Provides a structured table defining:

  • AS instance: an AS instance in the Media Delivery System deployment
  • Content Hosting Configuration: corresponds to a single Provisioning Session in the AF
  • Distribution Configuration: models a service location exposed by the AS instance
  • Downlink media streaming session: models a session that may span multiple content items
  • Service Data Flow: an HTTP connection at reference point M4d (IP 5-tuple)
  • Application Data Flow: a series of media segment download requests, with support for multiplexing multiple HTTP requests on the same connection (e.g., different DASH Adaptation Sets on HTTP/3)

3.3 Uplink Media Streaming Application Service Model

Provides a similar structured table for uplink with:

  • 5GMSu AS instance: an edge AS instance in the 5GMS System deployment
  • Content Publishing Configuration: corresponds to a Provisioning Session in the 5GMSu AF
  • Contribution Configuration: models a service location exposed by the AS instance
  • Uplink media streaming session: models a session that may span multiple content items
  • Service Data Flow: an HTTP connection at reference point M4u (IP 5-tuple)
  • Application Data Flow: a series of media segment upload requests with multiplexing support

Key technical details:

  • The finest granularity visible to the 5GMS AS is the Service Data Flow (HTTP connection)
  • Service Data Flows are associated with sessions via a media delivery session identifier in HTTP request headers (per TS 26.512 clause 6.2.3.6)
  • The service location (HTTP authority and URL path) enables association with a Distribution/Contribution Configuration
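The association logic can be sketched in a few lines of Python. The header name and request shapes below are illustrative placeholders, not the normative identifiers defined in TS 26.512:

```python
# Sketch: grouping Service Data Flows (HTTP connections) into media
# delivery sessions by a session identifier carried in request headers.
# The header name is a placeholder, NOT the normative one from TS 26.512.
SESSION_HEADER = "Media-Delivery-Session-Id"

def associate_flows(requests):
    """Group HTTP requests by the media delivery session identifier
    found in their headers; requests without one stay unattributed."""
    sessions = {}
    for req in requests:
        session_id = req["headers"].get(SESSION_HEADER)
        if session_id is None:
            continue  # not attributable to any session
        sessions.setdefault(session_id, []).append(req["path"])
    return sessions

requests = [
    {"path": "/vod/seg1.m4s", "headers": {SESSION_HEADER: "sess-42"}},
    {"path": "/vod/seg2.m4s", "headers": {SESSION_HEADER: "sess-42"}},
    {"path": "/live/seg9.m4s", "headers": {SESSION_HEADER: "sess-7"}},
]
grouped = associate_flows(requests)
```

In this model the AS never sees individual Application Data Flows directly; it only observes HTTP connections and attributes them to sessions via the header.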

3.4 Real-time Communication Application Service Model

Placeholder for future completion.

4. Key Issues Identified

Two initial key issues are identified for study:

  1. Should the media delivery architecture for streaming and real-time communication services be harmonized or separated?

  2. What relevant existing and emerging content delivery protocols would map to the 5G media delivery architecture, and what extensions or simplifications can be done for 6G Media Delivery?

5. References

The contribution adds relevant normative and informative references, including:

  • TR 22.870 (SA1 6G use cases)
  • TR 23.801-01 (SA2 6G architecture)
  • TS 26.501 and TS 26.506 (5G media delivery architectures)
  • Multiple TS 26.5xx series specifications
  • A placeholder for TS 22.ABC (6G System Requirements)

Editorial Notes

The document includes several editor's notes indicating areas requiring future work:

  • Completion of potentially relevant use cases and requirements from SA1
  • Possible addition of more application models
  • Completion of the real-time communication application service model
  • Addition of more key issues

Huawei Technologies Co., Ltd
pCR [FS_6G_MED] Considerations on Work Topic 1: Media Delivery requirements for intelligent immersive calling

Summary of pCR [FS_6G_MED] Considerations on Work Topic 1: Media Delivery Requirements for Intelligent Immersive Calling

Document Information

  • Source: Huawei
  • Document Number: S4-260080
  • Meeting: TSG-SA4 Meeting #135, February 9-13, 2026, Goa, India
  • Specification: 3GPP TR 26.870
  • Purpose: Approval

Introduction and Rationale

This pCR proposes updates to clause 4.2 of TR 26.870, introducing new requirements for intelligent immersive calling services. The contribution is motivated by SA1's work on TR 22.870, which identified use cases for immersive communication leveraging 6G technologies and AI capabilities. The focus is on enabling wearable devices (AR/VR glasses) and smart devices (smart TVs, watches) to provide intelligent immersive calling services, particularly targeting the aging population.

Main Technical Contributions

Use Case: Intelligent Immersive Calling Services

Core Concept

The use case describes one-to-one communication with multiple devices and media rendering capabilities (split-rendering or spatial computing rendering), where multi-modality data (video/text/audio/avatar) is transmitted. The service is empowered by AI capabilities including generative AI and multi-modal models, taking input from various smart devices and sensors.

User Story 1: Medical Consultation Scenario

  • Participants: User A (elderly patient) and User B (medical examiner)
  • Setup: User A uses smart TV, cameras, medical sensors, and smart watches in her living room; User B uses VR glasses
  • Media Flow:
  • Uplink: Smart TV and cameras send video data to 6G system
  • Processing: System performs transcoding from video to immersive media codecs to reconstruct User A's living room
  • Downlink: Rendered immersive media sent to User B's VR glasses
  • Additional data: Smart watches and medical sensors collect and transmit medical data (blood pressure, heart rate)
  • Data presentation: Medical data transformed according to user intent (e.g., text-to-video) but not rendered if user chooses not to share

User Story 2: Eye-Contact Enablement

  • Challenge: VR glasses cover part of user's face
  • Solution:
  • Camera in VR glasses collects eye tracking data
  • Network receives eye tracking data and user's facial picture
  • Real-time face rendering performed by network
  • Rendered video sent downlink to other user
  • Result: User A can see User B's face with proper eye contact on her smart TV

User Story 3: Intent Understanding

  • Capability: User conveys intentions via voice/gestures
  • Processing:
  • Device captures media data and sends to network
  • Network interprets data using AI technologies (e.g., LLM)
  • System fulfills user intention
  • Example: User B requests medical data (blood pressure, heart rate) via voice or gesture; 6G system interprets intent and renders scene with requested medical data; further UI adjustments possible through gesture or voice

User Story 4: Multi-Party Scenario

  • Participant: User C (User A's son) joins using smart AR glass
  • Capability: Can adjust video stream from right to left camera via gestures for monitoring purposes

Requirements Identified

The following technical requirements are introduced for intelligent immersive calling/conferencing:

  1. Support for 4K + HDR uplink video for intelligent immersive calling
  2. Eye tracking capability support
  3. User intention understanding capability (via voice/gesture)
  4. Tiered QoE support taking device capabilities into account
  5. IMS extension support: protocol extensions should be possible
  6. Multimedia transport: review protocols for possible extensions

Technical Implications

The use case demonstrates requirements for:

  • Multi-device coordination and synchronization
  • Real-time video transcoding to immersive media formats
  • AI-based processing for face rendering, intent recognition, and scene reconstruction
  • Multi-modal data integration (video, audio, sensor data, biometric data)
  • Device-aware QoE management
  • Network-based rendering and processing capabilities
  • Protocol extensions to support new modalities and interaction paradigms

Huawei Tech. (UK) Co., Ltd
Media related real-time AI traffic Characteristics

Summary of S4-260094: Media Related Real-Time AI Traffic Characteristics

Document Overview

This is a pseudo Change Request (pCR) from Huawei/HiSilicon to TSG-SA WG4 Meeting #135, proposing to add a new clause on end-to-end real-time multi-modal AI traffic characteristics to a 6G media-related TR. The document follows the methodology established in TR 26.926 for traffic modeling and quality evaluation.

Main Objective

The document aims to characterize AI traffic for 6G use cases in real-time video conferencing and robotics by defining end-to-end architecture, procedures, content coding models, and delivery mechanisms for real-time AI inference applications.

Technical Contributions

End-to-End Architecture (Clause 6.2.6.X.1)

  • Core Concept: Multimodal Large Language Models (MLM) incorporating different AI encoders/decoders for various modalities (text, image, video, audio)
  • Architecture Components:
  • UE/client implements AI encoding and packetization
  • Application Server (AS) implements AI decoding
  • Media-related AI service request/response model
  • Key Innovation: Introduction of "native AI data units" - a new media format generated by AI encoders that can be used for media reconstruction, generation, and comprehension
  • Compatibility Handling: AI decoder at AS may be needed if UE's AI encoder is not compatible with AS's AI model; otherwise, encoded data can be processed directly

Basic Procedures (Clause 6.2.6.X.2)

The document defines a 10-step call flow:

  1. The UE connects and provides its supported AI encoder information
  2. The AS configures the AI model and corresponding decoder
  3. The operational flow includes:
  • Media data collection and AI encoding at the UE
  • Packetization using a native or customized packet format
  • Transmission to the AS
  • Optional AI decoding at the AS (if required for compatibility)
  • Media-related response generation and transmission back to the UE
  • Response decoding and presentation at the UE
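A minimal sketch of this call flow follows. Function names and message strings are illustrative, not taken from the contribution; the branch models the optional AS-side AI decoding when the UE's encoder is not directly compatible with the AS's AI model:

```python
# Illustrative sketch of the UE/AS call flow above (names are made up).
def run_call_flow(ue_encoder, as_compatible_encoders):
    """Return the ordered message/processing log for one request/response."""
    log = [f"UE->AS: supported encoder = {ue_encoder}"]        # capability exchange
    compatible = ue_encoder in as_compatible_encoders
    log.append("AS: configure AI model and decoder")
    log.append("UE: collect media, AI-encode, packetize")
    log.append("UE->AS: transmit native AI data units")
    if not compatible:
        # Only needed when the UE encoder output cannot feed the AS model directly
        log.append("AS: AI-decode before model processing")
    log.append("AS: generate media-related response")
    log.append("AS->UE: transmit response")
    log.append("UE: decode and present response")
    return log

steps = run_call_flow("grace", {"grace", "dvc"})  # compatible: no AS-side decode
```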

Content Coding Model (Clause 6.2.6.X.3)

Two types of AI encoders are defined:

Type 1: Reconstruction-Oriented AI Encoders

  • Examples: DVC, GRACE codec
  • GRACE Codec Details:
  • Input: 2×H×W×3 tensors (two consecutive frames)
  • Encoder: Analyzes inter-frame differences (motion vectors and residuals), maps to compact latent representation
  • Resilience mechanism: Latent randomly split into multiple chunks, individually entropy coded to prevent error propagation
  • Decoder: Entropy decoding, latent reorganization, lost chunks set to zeros, graceful quality degradation without cliff effect
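The chunked-latent resilience mechanism can be illustrated as follows. This is an interpretation of the description above with entropy coding omitted, not the GRACE implementation; chunk count and latent size are arbitrary:

```python
import numpy as np

# Interpretation of the GRACE-style mechanism described above: the latent
# is split into independently codable chunks; chunks lost in transit are
# zero-filled at the decoder, so quality degrades gracefully.
rng = np.random.default_rng(0)

def split_latent(latent, n_chunks):
    """Split a flat latent tensor into independently codable chunks."""
    return np.array_split(latent, n_chunks)

def reassemble(chunks, received_mask):
    """Zero-fill lost chunks, then concatenate back into a full latent."""
    out = [c if ok else np.zeros_like(c) for c, ok in zip(chunks, received_mask)]
    return np.concatenate(out)

latent = rng.standard_normal(256).astype(np.float32)
chunks = split_latent(latent, 8)       # 8 chunks of 32 coefficients each
mask = [True] * 8
mask[3] = False                        # simulate one lost packet
recovered = reassemble(chunks, mask)   # only indices 96..127 are zeroed
```

Because each chunk is decodable on its own, a single loss corrupts only its own coefficients rather than propagating through the frame.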

Type 2: AI Model Processing-Oriented Encoders

  • Examples: VILA-U, Liquid, Chameleon, Emu3, VQGAN
  • Processing Flow:
  • Pre-processing to predefined sizes (256×256 or 512×512 pixels, RGB)
  • Feature extraction via CNN or Transformer layers
  • Quantization to AI data units
  • Joint optimization with associated AI decoder
  • Benefits:
  • Distributed AI workload with privacy-sensitive offloading
  • Direct AI model processing without decompression/re-encoding
  • Reduced data size, latency, and bandwidth
  • Unified format for multiple modalities

Content Delivery Model (Clause 6.2.6.X.4)

  • Protocol Selection: RTP over UDP for real-time delivery
  • Packetization Approaches:

For Reconstruction-Oriented Encoders:

  • Latent chunks treated as NALUs
  • NALU aggregation or fragmentation for MTU (typically 1500 bytes)
  • Customized NALU headers for AI codec characteristics
  • Standard RTP/UDP/IP header structure
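The fragmentation step can be sketched as below. The per-packet header overhead budget is an illustrative assumption, and aggregation of small consecutive NALUs is omitted for brevity:

```python
# Sketch of MTU-driven fragmentation of latent-chunk NALUs (values
# illustrative): NALUs larger than the payload budget are split so each
# fragment fits one IP packet.
MTU = 1500
HEADER_OVERHEAD = 40            # assumed RTP/UDP/IP budget per packet
PAYLOAD_BUDGET = MTU - HEADER_OVERHEAD

def packetize(nalus):
    """Fragment each NALU into payloads no larger than the MTU budget."""
    packets = []
    for nalu in nalus:
        for offset in range(0, len(nalu), PAYLOAD_BUDGET):
            packets.append(nalu[offset:offset + PAYLOAD_BUDGET])
    return packets

pkts = packetize([b"\x01" * 4000, b"\x02" * 500])
# 4000-byte NALU -> 3 fragments (1460 + 1460 + 1080); 500-byte NALU -> 1 packet
```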

For AI Model Processing Encoders:

  • AI data units grouped as payload with customized payload header
  • Group size determined by protocol overhead and integration efficiency
  • AI data unit group limited to single IP packet size

AI Transmission Characteristics (Clause 6.2.6.X.5)

Three key characteristics identified:

1. Data Bursts and Periodicity

  • Burst pattern linked to intrinsic framerate of multi-modal media
  • Uplink: Periodicity matches video frame rate
  • Downlink: Related to AI model inference speed
  • Data rate depends on AI encoder output dimension and quantization parameters

2. Low Latency Requirements

  • Tight end-to-end latency for conferencing and robotics
  • Network latency budget constrained by AS processing time for large AI models

3. Error Resilience

  • Packet success rate requirement linked to AI service characteristics
  • Error Tolerance Examples:
  • Autoregressive models: Can predict missing AI data units; reasonable quality maintained even with data unit loss
  • GRACE codec: Trained for error resilience; maintains good SSIM with packet errors
  • GenAI applications: High error rates (≤20%) tolerable with UE-side recovery

Differentiated Importance

  • Cross-modality: Image AI data units more error-tolerant than text
  • Intra-modality: Positional importance for image data units (preceding units more critical than subsequent ones)

Example KPIs for GenAI Applications

| Traffic Type | Burst Size | Max Latency | Service Bit Rate | Delay | Payload Error Rate |
|--------------|------------|-------------|------------------|-------|--------------------|
| Image GenAI  | 15 KB      | 15 ms       | 8 Mbps           | 20 ms | ≤20%               |
| Video GenAI  | 1.5 MB     | 100 ms      | 120 Mbps         | 20 ms | ≤20%               |
| Chatbot      | 0.5 KB     | 20 ms       | 200 kbps         | 30 ms | ≤20%               |
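The rows are internally consistent: in each case the service bit rate equals the burst size delivered within the maximum burst latency (assuming 1 KB = 1000 bytes), which a few lines of arithmetic confirm:

```python
# Cross-check of the example KPI table: service bit rate should match
# burst size delivered within the maximum latency budget.
def implied_bit_rate(burst_bytes, max_latency_s):
    """Bit rate needed to deliver one burst within its latency budget."""
    return burst_bytes * 8 / max_latency_s

assert abs(implied_bit_rate(15_000, 0.015) - 8e6) < 1       # Image GenAI: 8 Mbps
assert abs(implied_bit_rate(1_500_000, 0.100) - 120e6) < 1  # Video GenAI: 120 Mbps
assert abs(implied_bit_rate(500, 0.020) - 200e3) < 1        # Chatbot: 200 kbps
```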

Evaluation Methodology (Clause 6.2.6.X.6)

  • Simulation of packet loss and jitter
  • P-Trace derivation from RTP header information (Sequence Number, Timestamp, Marker Bit)
  • Packet size obtained at UDP layer
  • Packet arrival time recorded at receiving port
  • AI service-specific quality evaluation based on successful task completion
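The P-Trace derivation can be sketched as a small parser over the RTP fixed header (layout per RFC 3550: version/flags, marker+payload type, sequence number, timestamp, SSRC). Field and dictionary names are illustrative:

```python
import struct

# Sketch: derive a P-Trace entry (sequence number, timestamp, marker bit,
# packet size at UDP layer, arrival time) from an RTP fixed header.
def rtp_trace_entry(udp_payload, arrival_time_s):
    """Parse the first 8 bytes of the RTP fixed header (RFC 3550)."""
    b0, b1, seq, ts = struct.unpack("!BBHI", udp_payload[:8])
    return {
        "seq": seq,
        "timestamp": ts,
        "marker": bool(b1 & 0x80),   # marker bit is the MSB of byte 1
        "size": len(udp_payload),    # packet size obtained at UDP layer
        "arrival": arrival_time_s,   # recorded at the receiving port
    }

# Sample packet: version 2, marker set, PT 96, seq 7, ts 90000, SSRC 1
pkt = struct.pack("!BBHII", 0x80, 0x80 | 96, 7, 90000, 1) + b"\x00" * 100
entry = rtp_trace_entry(pkt, 0.120)
```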

Summary and Network Implications (Clause 6.2.6.X.7)

The document concludes that AI traffic characteristics can be leveraged in 3GPP networks to improve transmission efficiency:

  • RAN awareness of latency requirements, packet arrival patterns, error tolerance, and differentiated importance
  • Enhanced operations: improved scheduling and HARQ operations
  • System capacity: potential to increase the supported number of UEs

References Added

The document adds seven new normative/informative references, including:

  • TR 22.870 (6G Use Cases)
  • TR 26.926 (Traffic Models and Quality Evaluation)
  • Various academic papers on neural codecs (GRACE, Liquid, DVC)
  • RP-253288 on AI services for 6G

Huawei Tech. (UK) Co., Ltd
Neural Network Based Video Codec Architecture and Support for Error Resilience

Summary of S4-260095: Neural Network Based Video Codec Architecture and Support for Error Resilience

Document Overview

This contribution proposes documenting neural network-based codec (NNC) architectures and their error resilience capabilities in the 6G Media study (FS_6G_MED). The document focuses on two specific NNC implementations: DVC and GRACE codecs, highlighting their potential relevance for 6G deployments targeting 2030.

Main Technical Contributions

DVC Codec Architecture

The document describes the DVC (Deep Video Compression) codec proposed by Guo Lu et al. (2019), which represents a hybrid approach to neural network-based video coding:

Key Architecture Features:

  • Replaces traditional video coding components with neural network equivalents while maintaining the overall predictive coding architecture
  • Uses CNN models for optical flow estimation in motion estimation and compression
  • Implements neural network-based motion compensation to generate predicted frames
  • Maintains functional similarity between traditional and NNC components

Joint Optimization Approach: the codec jointly trains and optimizes multiple components:

  • Motion estimation
  • Motion compensation
  • Residual compression
  • Quantization and bit-rate estimation

Performance:

  • Achieves results competitive with H.264 and H.265
  • Source code and research paper are publicly available
  • Similar approaches have been adopted in industry (Deep Render codec in FFmpeg and VLC)

GRACE Codec and Error Resilience Extensions

The document presents GRACE codec (Yihua Cheng et al. 2025) as an extension of DVC with enhanced error resilience:

Channel-Aware Training:

  • Jointly trains the encoder and decoder under simulated packet loss conditions
  • Enables codec awareness of specific loss patterns
  • Implements a channel-aware source coding design

Technical Implementation:

  • Encodes each frame as a tensor split into independently decodable sub-tensors
  • Uses arithmetic coding mapped to packets
  • Tested across a wide range of loss rates
  • Includes lighter profiles (GRACE-lite) for mobile devices

Performance Validation:

  • User study with 240 crowdsourced participants
  • Tested 61 videos under realistic conditions
  • Used Google GCC to emulate WebRTC congestion control
  • Channel conditions: LTE and broadband traces (0.2–8 Mbps, 100 ms end-to-end delay)
  • MOS scores up to 38% better than H.264/H.265 with AL-FEC and error concealment

Key Performance Improvements:

  • Substantial reduction in tail latency
  • Fewer non-rendered frames
  • Fewer stalls per second
  • Improved video smoothness

Hardware Requirements:

  • Original GRACE: NVIDIA A40 GPU (31.2–51.2 fps)
  • GRACE-lite: real-time capable on current mobile devices

Identified Limitations

Content Specificity: NNC performance may be content-specific due to training data dependencies.

Reconstruction Challenges:

  • Potential reconstruction failures due to non-bit-exact arithmetic operations in GPU frameworks
  • Issues with floating-point arithmetic and convolution operations
  • Currently under discussion in SC29 (the media coding standards subcommittee)
  • Identified as a potential key enabler requiring resolution before future NNC codec adoption

Proposals

The document makes two specific proposals:

  1. Documentation Request: Document NNC features and their application to error-resilient AI traffic in the 6G MED TR under 6G Media (based on clauses 2 and 3)

  2. Use Case Consideration: Include the use case of NNC with channel-aware source coding training in AI traffic characteristics

Text Proposal Structure

The contribution includes specific text proposals for:

  • Change 1: addition of two references to the normative references section
  • Change 2: a new clause 6.2.4.X under Work Topic #2d (AI Traffic Characteristics) containing the technical description of the DVC and GRACE codecs, including architecture diagrams and performance characteristics

Huawei Tech. (UK) Co., Ltd
Survey of Native AI formats for multi-modal AI

Survey of Native AI Formats for Multi-modal AI

1. Introduction

This document surveys AI native formats for addressing generic AI-related tasks including generation, comprehension, information retrieval, and recommendation in advanced multimedia use cases (multi-modal AI).

Key Differences from 5G Work

  • Broader scope: Covers generic applications beyond detection/segmentation/tracking, focusing on reconstruction, comprehension, recommendation, and information retrieval
  • Multi-Modal LLM support: Enables use of Multi-Modal Large Language Models
  • Generic multi-modal formats: Combines text, image, video, and audio modalities
  • Alternative split inference approach: Uses AI native format generation and AI pre-training instead of model-splitting

Standardization Considerations

The document acknowledges that standardization of such formats is challenging due to:

  • The constantly evolving field
  • The task-specific nature of native AI formats

However, SA4 should:

  • Document and study these formats in FS_6G_MED
  • Track progress in this area
  • Understand characteristics and QoS requirements relevant to 3GPP networks
  • Consider these formats when analyzing AI traffic characteristics

2. AI Processing and Related Native AI Formats

Overview

Recent advances in AI, particularly Large Language Models (LLMs) and Multi-Modal LLMs, enable new applications in generation, comprehension, information retrieval, and recommendation. Multi-modal LLMs require AI-related pre-processing to create AI native formats.

Reasons for AI Split Processing and Native AI Formatting

  1. Workload distribution: Offload privacy-sensitive and computationally intensive parts (similar to 5G split inferencing)
  2. LLM/MLM compatibility: Enable input suitable for auto-regressive models (discrete information in vectors)
  3. Modality combination: Merge text, image, video into relevant features
  4. Data reduction: Reduce data size, latency, and bandwidth requirements
  5. Task optimization: Optimize for different tasks (reconstruction vs. comprehension)

General AI Processing Architecture

The document presents a comprehensive survey based on [Jian Jia et al. 2025], extended with 2025 techniques, showing:

Input → Encoder → Latent Vector (z) → Quantization → Decoder → Output

With supervision feedback loop for training.

Encoder Techniques

Transformer [Vaswani et al 2017]

  • Attention-based model with significant performance enhancement
  • Text: Processed by tokenization then fed to transformers
  • 2D input: Segmented into patches, treated as sequences
  • 3D input: Sliced temporally, 2D patches represented as 3D tubes [Wang et al 2024b]
  • Handles large parameter sizes efficiently
  • Increasingly popular

Convolutional Neural Network (CNN) [O'Shea 2015]

  • Popular for image feature extraction (e.g., UNet [Ronneberger et al 2015])
  • Used for audio intermediate formats
  • Extends to video via 3D-CNN incorporating temporal dimension

Multi-Layer Perceptron (MLP)

  • Earlier architecture for embeddings in recommender systems
  • Used for latent space mapping [Rajput et al. 2023; Singh et al. 2024]

Decoder Processing

  • Applies related transforms for reconstruction
  • May include different models for specific tasks (generation, recommendation, information retrieval)
  • Can be jointly optimized with encoder

Supervision

  • Minimizes error between decoder output and encoder input
  • For reconstruction: Uses metrics like L2 norm distance
  • For comprehension/information retrieval: Requires additional ground truth information

Application Types

  1. Generation: Reconstruction or generating related content
  2. Comprehension: Understanding input (textual description, labeling)
  3. Information Retrieval: Retrieve related documents using semantic features
  4. Recommendation: Provide recommendations (mainly based on historical behavior)

Different applications use different decoder models and encoder processing, making native AI formats often task-specific.

Quantization Techniques

Following [Jian Jia et al. 2025], the survey identifies:

  • Vector Quantization (VQ): Vanilla vector quantization [Juang and Gray, 1982] using minimum distance codebook entry
  • Residual (level-wise) quantization (RQ): quantizes the residual error remaining after each level, so successive levels encode progressively smaller corrections
  • Group-wise Quantization (GRVQ): Splits vector into sub-components, quantizes separately
  • Lookup Free Quantization (LFQ): Quantization without specific lookup table
  • Finite Scalar Quantization (FSQ): Projects vector to few dimensions for rounded representation

Note: Some tokenizers may not use quantization and rely on floating-point arithmetic.
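A toy contrast of two quantizers from the list above, vanilla VQ (nearest codebook entry) and FSQ (bound each dimension, then round to a few levels). The codebook and level count are made up for the example:

```python
import numpy as np

# Toy vector quantization: replace z with its nearest codebook row.
def vq(z, codebook):
    dists = np.linalg.norm(codebook - z, axis=1)  # Euclidean distances
    idx = int(np.argmin(dists))
    return idx, codebook[idx]

# Toy finite scalar quantization: clamp each dimension to [-1, 1] and
# round to `levels` evenly spaced values (no lookup table needed).
def fsq(z, levels=5):
    half = (levels - 1) / 2
    return np.round(np.clip(z, -1, 1) * half) / half

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
idx, q = vq(np.array([0.9, 1.1]), codebook)  # nearest entry is index 1
f = fsq(np.array([0.3, -2.0]))               # -> [0.5, -1.0]
```

FSQ's lack of a codebook is what makes it "lookup free" in spirit; VQ's output is an index into a shared table, which is what gets transmitted or fed to a downstream model.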

AI-Based Codecs

Native AI formats have been used to develop codecs:

  • JPEG AI [ISO/IEC 6048-1]
  • Deep Render codec (from InterDigital, available on the FFmpeg and VLC platforms)

3. Survey of AI Processing for Native AI Formats

Comprehensive Survey Table

The document provides an extensive table (Table 1) surveying AI pre-processing techniques with the following characteristics:

Image Modality Techniques

  • VQVAE [Van Den Oord et al., 2017]: CNN encoder, VQ, Generation
  • VQGAN [Esser et al., 2021]: CNN encoder, VQ, Generation
  • ViT-VQGAN [Yu et al., 2022]: Transformer encoder, VQ, Generation
  • RQVAE [Lee et al., 2022]: CNN encoder, RQ, Generation
  • LQAE [Liu et al., 2023]: CNN encoder, VQ, Generation
  • SEED [Ge et al., 2024]: Transformer encoder, VQ, Generation & Comprehension
  • TiTok [Yu et al., 2024b]: Transformer encoder, VQ, Generation
  • Spectral image tokenizer [Esteves et al 2025]: Transformer encoder, VQ, Generation
  • Subobject-level [Chen et al 2024]: Transformer/CNN encoder, VQ, Generation
  • One-d-piece [Miwa et al 2025]: Transformer encoder, VQ, Generation & Comprehension
  • Semhitok [Chen et al 2025]: Transformer encoder, LFQ, Generation & Comprehension
  • Ming-univision [Z Huang et al]: Transformer encoder, N/A, Generation & Comprehension
  • SetTok [Geng et al 2025]: Transformer encoder, LFQ, Generation
  • UniTok [Chuofan Ma et al 2020]: CNN encoder, LFQ, Generation & Understanding
  • GloTok [Zhao et al 2025]: Transformer encoder, VQ, Generation
  • GaussianToken [Jiajun et al 2025]: CNN encoder, VQ, Generation
  • CAT [Shen et al 2025]: Transformer/CNN encoder, VQ, Generation
  • OpenAI CLIP: CNN encoder, N/A, Generation & Comprehension
  • JPEG AI [ISO/IEC 6048-1]: CNN/Transformer encoder, VQ, Reconstruction & Comprehension

Image & Video Modality Techniques

  • MAGVIT [Yu et al., 2023]: CNN encoder, VQ, Generation
  • MAGVIT-v2 [Yu et al., 2024a]: CNN encoder, LFQ, Generation
  • OmniTokenizer [Wang et al., 2024b]: Transformer encoder, VQ, Generation
  • SweetTokenizer [Tan et al., 2024]: Transformer encoder, VQ, Generation & Comprehension
  • Atoken [J Lu et al 2025]: Transformer encoder, FSQ, Generation & Comprehension
  • HieraTok [Chen et al 2025]: Transformer encoder, VQ, Generation

Video Modality Techniques

  • Cosmos [NVIDIA, 2025]: CNN/Transformer encoder, FSQ, Generation
  • VidTok [Tang et al., 2024]: CNN encoder, FSQ, Generation
  • Video-LaViT [Jin et al., 2024b]: Transformer encoder, VQ, Generation & Comprehension
  • Grace [Cheng et. al 2024]: CNN encoder, VQ, Reconstruction
  • MPEG FCM [Eimond et al 2025]: CNN encoder, VQ, Comprehension

Multi-Modal (Image/Audio/Text) Techniques

  • TEAL [Yang et al., 2023b]: Transformer encoder, VQ, Comprehension
  • AnyGPT [Zhan et al., 2024]: Transformer encoder, VQ, Generation & Comprehension
  • LaViT [Jin et al., 2024c]: Transformer encoder, VQ, Generation & Comprehension
  • ElasticTok [Yan et al., 2024]: Transformer encoder, VQ/FSQ, Generation & Comprehension
  • Chameleon [Team, 2024]: CNN/Transformer encoder, VQ, Generation & Comprehension
  • ShowO [Xie et al., 2024]: CNN/Transformer encoder, LFQ, Generation & Comprehension

Audio Modality Techniques

  • SoundStream [Zeghidour et al., 2021]: CNN encoder, RQ, Generation
  • HiFiCodec [Yang et al., 2023a]: CNN encoder, GRVQ, Generation
  • RepCodec [Huang et al., 2024]: CNN/Transformer encoder, RQ, Comprehension
  • SpeechTokenizer [Zhang et al., 2024]: CNN/Transformer encoder, RQ, Generation & Comprehension
  • NeuralSpeech-3 [Ju et al., 2024]: CNN/Transformer encoder, VQ, Generation & Comprehension
  • iRVQGAN [Kumar et al., 2024]: CNN encoder, RQ, Generation

Text-Based Recommendation Systems

  • TIGER [Rajput et al., 2023]: MLP encoder, RQ, Recommendation
  • SPM-SID [Singh et al., 2024]: MLP encoder, RQ, Recommendation
  • TokenRec [Qu et al., 2024]: MLP encoder, VQ, Recommendation
  • VQ-Rec [Hou et al., 2023]: MLP encoder, RQ, Recommendation
  • LC-Rec [Zheng et al., 2024]: MLP encoder, RQ, Recommendation
  • LETTER [Wang et al., 2024c]: MLP encoder, RQ, Recommendation
  • CoST [Zhu et al., 2024]: MLP encoder, RQ, Recommendation
  • ColaRec [Wang et al., 2024d]: MLP encoder, VQ, Recommendation
  • SEATER [Si et al., 2024]: MLP encoder, VQ, Recommendation
  • QARM [Luo et al., 2024]: MLP encoder, VQ, Recommendation

Text-Based Information Retrieval

  • DSI [Tay et al., 2022]: Transformer encoder, VQ, Information Retrieval
  • Ultron [Zhou et al., 2022]: Transformer encoder, RQ, Information Retrieval
  • GenRet [Sun et al., 2024]: Transformer encoder, VQ, Information Retrieval
  • LMINDEXER [Jin et al., 2024a]: Transformer encoder, VQ, Information Retrieval
  • RIPOR [Zeng et al., 2024]: Transformer encoder, RQ, Information Retrieval

4. Conclusions and Proposals

Proposals

a) AI Traffic Characteristics: take this information into account when developing an overview of AI traffic characteristics, covering native AI formats and codecs alongside traditional codec options.

b) 6G Split Inferencing: consider that the split operation may include AI processing/formatting in addition to the traditional model splitting considered in 5G.

c) TR Update: add text and a diagram based on clause 2 to the TR for FS_6G_MED.

Proposed Change Requests

Change 1: References Addition

Add references [x1] through [x10] to the TR, including key papers on: - Discrete tokenizers survey - JPEG AI standard - Quantization techniques - Transformer architectures - CNN architectures - Specific implementations

Change 2: New Clause 6.2.4.X - Native AI Formats

Add comprehensive new clause under "AI Traffic Characteristics" covering: - Overview of multi-modal AI and native formats - Reasons for AI split processing - Encoder techniques (Transformer, CNN, MLP) - Decoder processing - Supervision methods - Application types - Quantization techniques - AI-based codecs

This clause provides the technical foundation for understanding native AI formats in the context of 6G media services.

Huawei Tech.(UK) Co.. Ltd
Embodied AI use case and related requirements

Comprehensive Summary of S4-260097: Embodied AI Use Case and Requirements

1. Introduction and Background

This document builds upon previous work from SA4#134 (S4-251826) and TR 22.870 clause 6.28, which established the importance of embodied AI for the FS_6G_MED study. The paper represents a paradigm shift from static observation sensors (fixed cameras with limited fields of view) to mobile embodied sensors (robots, UAVs) that actively interact with and explore physical environments. This shift is aligned with recent industry developments including NVIDIA's Isaac GR00T project and ITU-T SG21 workshop discussions.

The core use case involves devices equipped with multiple cameras capturing and uploading multi-modal concurrent data streams (video, point clouds) for network-based AI inference supporting tasks like multi-modal perception, 3D digital twin modeling, trajectory planning, and task orchestration across educational, home, industrial, and hazardous environments.

2. Example Embodied AI Tasks

The document provides detailed descriptions of four state-of-the-art embodied AI tasks based on current research:

2.1 Task Example 1: Explore and Explain

An autonomous agent explores previously unknown environments while providing natural language descriptions at key moments. The approach uses: - Curiosity-based exploration using forward/inverse dynamics models with neural network embeddings - Surprisal value (L2 norm between predicted and actual embeddings) as reward function - Speaker policy triggered by depth or curiosity thresholds - Transformer-based captioning model with self-attention

Evaluation metrics: Average surprisal score, coverage measure (intersection with ground-truth semantic classes), diversity score for consecutive captions.
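The curiosity mechanism above reduces to a few lines of code: surprisal is the L2 norm between predicted and actual embeddings, and the speaker policy fires when it crosses a threshold. A minimal sketch — the threshold value is a hypothetical placeholder, not a number from the contribution:

```python
import math

def surprisal(predicted, actual):
    """Surprisal reward: L2 norm between predicted and actual embedding vectors."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)))

def should_speak(predicted, actual, threshold=1.0):
    """Speaker policy trigger: caption the scene when surprisal exceeds a
    (hypothetical) threshold."""
    return surprisal(predicted, actual) > threshold
```

For example, `surprisal([0.0, 0.0], [3.0, 4.0])` is 5.0, well above the placeholder threshold, so the agent would emit a caption at that step.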

2.2 Task Example 2: Spot the Difference

Agent identifies differences between an outdated map and current environment state, combining exploration with spatial reasoning.

Evaluation metrics: - Percentage of navigable area seen (Seen%) - Detection accuracy (Acc%) - Intersection over Union (IoU) for changed elements - Separate IoU+ (added objects) and IoU- (removed objects) - mAcc and mIoU (computed only on visited space)
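The IoU-style metrics listed above reduce to a set computation over changed map elements; a minimal sketch, where representing map cells as a set of hashable identifiers is an assumption for illustration:

```python
def iou(predicted, ground_truth):
    """Intersection over Union of predicted vs ground-truth changed map cells.
    Both arguments are sets of cell identifiers (an illustrative representation)."""
    predicted, ground_truth = set(predicted), set(ground_truth)
    if not predicted and not ground_truth:
        return 1.0  # nothing changed, nothing predicted: perfect agreement
    return len(predicted & ground_truth) / len(predicted | ground_truth)
```

IoU+ and IoU- follow the same formula, applied separately to the sets of added and removed objects.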

2.3 Task Example 3: Indoor Exploration

Fundamental task for acquiring spatial information using deep reinforcement learning with intrinsic rewards (curiosity, novelty, coverage). Architecture comprises: - CNN-based mapper - Pose estimator - Hierarchical navigation policy

Evaluation metrics: IoU between reconstructed and ground-truth maps, map accuracy (m²), area seen (AS), free/occupied space metrics (FIoU, OIoU, FAS, OAS), mean positioning error.

2.4 Task Example 4: Vision and Language Navigation (VLN)

Agent navigates to target destination guided only by natural language instructions, using: - 360° panorama encoding in 12×3 grid with 2048-dimensional feature maps - Attention mechanisms for instruction interpretation - Low-level actions (rotate, tilt, step ahead)

Evaluation metrics: Navigation error (NE), oracle success rate (OSR), success rate (SR), success rate weighted by path length (SPL).
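The SPL formula is not spelled out in the summary; the sketch below follows its common definition in the VLN literature (mean over episodes of success times shortest-path length divided by the longer of the actual and shortest path):

```python
def spl(successes, shortest_lengths, path_lengths):
    """Success weighted by Path Length: mean of S_i * l_i / max(p_i, l_i),
    where S_i is 0/1 success, l_i the shortest-path length, p_i the path taken."""
    episodes = zip(successes, shortest_lengths, path_lengths)
    return sum(s * l / max(p, l) for s, l, p in episodes) / len(successes)
```

An agent that succeeds but walks twice the shortest path scores 0.5 for that episode; failures score 0 regardless of path length.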

3. Key Observations from Task Analysis

Observation 1: AI processing may occur at cloud/server, requiring transmission of either raw visual data (with standard compression) or pre-processed data (embeddings).

Observation 2: Cloud-based implementation requires low latency connectivity and error resilience for real-time navigation and environmental interaction.

Observation 3: Evaluation methods are highly task-dependent with different metrics for different tasks.

4. Deployment Scenarios and Rationale for Cloud Offloading

The document identifies specific scenarios where cloud-based AI processing is preferable:

  • Hazardous environments: Keep robots simple/light to reduce vulnerability
  • Industrial environments: Centralize AI processing for multiple robots using similar processing
  • Home settings: Centralize processing at cloud or local gateway for multiple coupled devices
  • Educational settings: Centralized AI models serving many students/learners

5. Transmission Formats and Network Requirements

5.1 Current Requirements from TR 22.870

For 6-8 cameras using 3GPP codecs (e.g., HEVC): - Peak data rates: 20-100 Mbit/s - Direction: Uplink - Characteristics: Bursty, ultra-low latency

Observation 4: Offloaded embodied AI may demand uplink bit-rates of 20-100 Mbit/s.

5.2 Alternative Transmission Options

The document presents three transmission format categories:

| Transmission Format | UE Requirements | Network Requirements |
|---------------------|-----------------|----------------------|
| 3GPP codec (HEVC) | Support HEVC encoding and transmission | ~20-100 Mbit/s peak, bursty, uplink, ultra-low latency |
| Standardized feature map/codec (MPEG VCM/FCM, JPEG AI) | Support standard-based feature/image codec | Unknown peak bit-rate, bursty, ultra-low latency uplink |
| Proprietary/open source (embeddings, tokenizers) | Compute representation in software and transmit | Unknown, bursty, ultra-low latency uplink, efficient transmission support needed |

Observation 5: More investigation of proprietary and standardized feature map codecs is needed to support this use case.

6. Proposed Text for TR (CHANGE 1)

The document proposes new clause 4.2.2.X "Requirements for embodied AI" incorporating: - Summary of example tasks (explore and explain, spot the difference, indoor exploration, vision and language navigation) - All five observations regarding cloud processing scenarios, latency requirements, task-dependent evaluation, uplink bit-rate demands, and codec investigation needs - Complete transmission format comparison table - Rationale for cloud offloading in different scenarios

7. Proposals

  1. Take embodied AI requirements into account in FS_6G_MED, particularly real-time AI inference requirements
  2. Document the simplified embodied AI use case and related requirements based on the proposed text in clause 8

Technical Contributions Summary

This document makes significant contributions by: - Providing concrete, research-backed examples of embodied AI tasks with detailed technical descriptions - Establishing task-specific evaluation methodologies and metrics - Identifying network requirements for cloud-offloaded embodied AI (20-100 Mbit uplink, ultra-low latency, error resilience) - Analyzing alternative transmission formats beyond traditional video codecs (feature maps, embeddings) - Justifying cloud-based processing for specific deployment scenarios - Proposing specific text additions to the TR for FS_6G_MED study

Huawei Tech.(UK) Co.. Ltd
Demonstration of real-time AI codec transmission in WebRTC

Summary of S4-260098: Demonstration of Real-Time AI Codec Transmission in WebRTC

Document Overview

Source: Huawei, HiSilicon
Meeting: SA4 #135, Goa, India (9-13 Feb 2026)
Work Item: FS_6G_MED / Rel-20
Purpose: Demonstration of AI codec for real-time AI traffic over WebRTC

Main Technical Contribution

This document presents a practical demonstration of end-to-end AI media delivery using WebRTC, specifically implementing an AI-codec video streaming system with RTP. The demonstration proves the feasibility of real-time AI codec-based traffic transmission over WebRTC infrastructure.

Implementation Framework

Tools and Components

The implementation utilizes three key tools:

  • aiortc: Python-native WebRTC/ORTC library serving as the foundational media transport framework
  • Wireshark: Network protocol analyzer for capturing and inspecting RTP traffic traces for performance auditing
  • clumsy: Network simulation utility for injecting controlled packet loss and jitter for resilience testing

Technical Implementation Steps

Step 1: AI Video Codec Registration in aiortc

  • Extended the original aiortc framework which only supported legacy codecs (VP8, H264)
  • Registered new AI video codec including:
  • Codec name
  • Encoding function
  • Decoding function
  • Enabled codec recognition during SDP negotiation
  • Mapped encoding/decoding functions to transmitter and receiver operations

Step 2: Custom RTP Payload Format Design

Encoding Process: - Video converted to bits frame-by-frame through encoder neural network processing and entropy encoding - Codec-specific metadata carried in RTP Payload Header

Payload Format Structure: [[Latent Shape | Hyperprior Byte Length | Latent Byte Length] | [Hyperprior Bytes | Latent Bytes]]

Payload Components: - Latent Shape: Shape of the latent representation - Hyperprior Byte Length: Length of hyperprior parameter bytes (used for probability distributions in entropy coding) - Latent Byte Length: Length of latent representation bytes
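A minimal sketch of packing and parsing such a payload follows; the concrete field widths (uint16 shape entries, uint32 byte lengths, network byte order) are illustrative assumptions, since the contribution does not fix byte-level sizes:

```python
import struct

# Assumed header layout: 4 x uint16 latent-shape entries + 2 x uint32 lengths,
# network byte order. The real payload format may differ.
HEADER_FMT = "!4H2I"
HEADER_SIZE = struct.calcsize(HEADER_FMT)

def pack_payload(latent_shape, hyperprior, latent):
    """Serialize header (latent shape, byte lengths) followed by the two byte fields."""
    header = struct.pack(HEADER_FMT, *latent_shape, len(hyperprior), len(latent))
    return header + hyperprior + latent

def unpack_payload(payload):
    """Parse the header, then slice out the hyperprior and latent bytes."""
    *shape, hp_len, lat_len = struct.unpack_from(HEADER_FMT, payload, 0)
    hyperprior = payload[HEADER_SIZE:HEADER_SIZE + hp_len]
    latent = payload[HEADER_SIZE + hp_len:HEADER_SIZE + hp_len + lat_len]
    return tuple(shape), hyperprior, latent
```

In the demo pipeline, payloads like this would then be fragmented to fit the MTU, with aiortc adding the standard RTP header to each fragment.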

Step 3: RTP Packing, Transmission, and Unpacking

Transmission Side: - Large payloads fragmented due to MTU limitations - aiortc automatically appends standard RTP Header to each fragment - RTP packets transmitted with congestion control

Reception Side: - RTP packets buffered and reorganized per frame by aiortc - Packets parsed according to agreed format - Video frame restoration through entropy decoding and decoder neural network processing - Error resilient codec compensates for potential packet loss

Step 4: Traffic Trace Analysis

Testing Methodology: - Random packet loss simulated using clumsy software - Wireshark captures received packets at receiver - Analysis based on RTP Header fields: Timestamps, Sequence Numbers, Marker Bits

Traffic Characteristics Analyzed: - Packet loss situation per frame - Performance of restored video frames - Packet size distribution - Packet arrival patterns - Packet success rate requirements

Demo Implementation Details

Current Implementation Status: - Actual AI codec deployed (preliminary version) - Uses bmshj2018_factorized model [R1] instead of Grace for moderate fps on CPU - Low-resolution video used due to computational constraints - End-to-end link feasibility proven

Demo Versions Provided: 1. With packet loss: Simulated using clumsy; RTP retransmission enabled; packet loss causes slight stuttering (error recovery not yet implemented) 2. Without packet loss: Clean transmission demonstration

Proposals

  1. Take this approach into account as it demonstrates real-time AI codec-based traffic over WebRTC
  2. Consider the feasibility of this approach for generating traces for real-time AI traffic

Reference

[R1] https://arxiv.org/abs/1802.01436 (bmshj2018_factorized model)

Nokia
[FS_6G_MED] Discussion on AI traffic trends

Discussion on AI Traffic Trends for 6G Media

1. Introduction

This contribution addresses Work Task 2b (Traffic characteristics) and 2d (Media communication for emerging AI services) from the 6G Media study, focusing on AI media traffic analysis. The document provides insights into popular AI applications, their traffic generation patterns, and proposes organizational structure for the TR clauses.

2. Impact of AI Applications on Mobile Traffic

2.1 General Observations

  • AI-powered applications are emerging as contributors to mobile data traffic
  • As of 2025, AI-related network traffic across mobile carriers is still in preliminary phases
  • Expected to become significant contributor to 6G network traffic as adoption increases

2.2 Current AI Traffic Types

Four broad categories of consumer AI applications identified:

  1. Chat and conversation: Text-based chats with general-purpose chatbots (e.g., ChatGPT) and voice conversations with AI services. Includes task-specific use cases (scene recognition, solving handwritten math problems). Some use cases take images as conditioning input, increasing UL data volumes and rates.

  2. Document generation: Creation of longer texts and formatted documents (PDFs, presentations). Prompts include text, voice, documents, and images.

  3. Image generation: Creation of images from scratch based on prompts and AI-powered image manipulation. Entertainment-driven adoption among younger demographics. Heavy traffic impact on network.

  4. Video generation: AI-based video creation. Throughput-intensive in DL, while image inputs drive relatively high UL volumes.

Key Technical Characteristics: - Cloud-based AI inferencing creates bursts in uplink traffic - Uses existing web-based protocols (e.g., WebRTC for live audio/video) - Existing codecs (AVC, HEVC) used for encoding before transport - Text and images are base-64 encoded and encapsulated in JSON (OpenAI API, Gemini API) - Agentic AI apps becoming more common
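To illustrate the base-64-in-JSON pattern noted above, the sketch below builds a request body in the style of the OpenAI chat API; the field names mirror that public API but should be treated as illustrative, and `example-model` is a placeholder:

```python
import base64
import json

def build_chat_request(prompt, image_bytes):
    """Build an OpenAI-style JSON body carrying a text prompt plus an image
    base-64 encoded into a data URI. Field names follow the public
    chat-completions convention; treat them as illustrative."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    body = {
        "model": "example-model",  # placeholder
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    }
    return json.dumps(body)
```

Note that base-64 encoding inflates the image payload by roughly one third, one reason image conditioning inputs drive uplink volumes up.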

2.3 AI Traffic Trends

Shifting UL/DL Ratios: - Uplink data growing faster than downlink traffic - Driven by conditioning inputs (images) transmitted to AI inference factories - Data volume spread per app session documented (Figure 1 reference)

Rising Data Volumes: - Multi-modal, user-friendly experiences increasing overall traffic - Users "talking with their data" and interacting with AI assistants - Sharing photos and videos from smartphones to refine prompts

Sensitivity to Latency: - Conversational AI services respond non-linearly to extended latency - Application-level reaction to network conditions varies by application - Example case study: the AI app's response time grew linearly with inserted latency up to 0.5 s, after which the response became non-linear; at ~1.5 s of inserted latency, response time grew by almost twice the inserted latency (Figure 2 reference)

Agentic AI Opportunities: - AI agents can shift inference loads and network traffic away from peak hours - Operating in scheduled, off-peak cycles ensures results ready when needed while avoiding congestion

Current Traffic Statistics: - AI traffic constitutes 0.06% of total traffic in the observed mobile network - 74% downlink, 26% uplink

2.4 Agentic AI and Traffic Impact

Architecture: - LLM-driven autonomous agent architecture with LLM as core reasoning engine - Additional components for planning, memory management, and interaction with external tools - Multi-agent systems with collaborative reasoning, persistent memory, and autonomous decision-making

Operational Characteristics: - Agentic tasks span multiple steps: data search, analysis, document generation in defined formats - Example: PDF-format travel plans including flights, accommodation, meeting schedules, budget limitations - Tested AI agents typically operated 10-20 minutes - Data volumes roughly in line with other AI apps analyzed - More data-rich outputs, partially offset by interim step results not sent to smartphone

Protocols for Agentic Communication:

  1. Remote Procedure Calls (RPC): Run tasks on remote servers

  2. Model Context Protocol (MCP): Open-source standard for connecting AI applications (e.g., LLMs like Claude, ChatGPT) to external systems (local files, databases), tools (search engines, calculators), and workflows. Uses JSON-RPC 2.0 as underlying RPC protocol.

  3. Agent2Agent (A2A) Protocol: Open standard enabling seamless communication and collaboration between AI agents to solve complex tasks. Complementary to MCP. Provides standard methods and data structures for agent-to-agent communication over HTTPS, irrespective of underlying implementation. MCP can expose AI agents as tools to other agents, while A2A provides inter-agent communication.

NOTE: Agentic AI apps, like other AI apps, typically use existing transport protocols (e.g., HTTPS for A2A, JSON-RPC for MCP) and data types (e.g., encoded audio, video, text) for data exchange over the network.
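A minimal sketch of what an MCP-style tool invocation looks like on the wire, encoded as a JSON-RPC 2.0 request; the `tools/call` method name follows the MCP specification, while the tool name and arguments here are hypothetical:

```python
import itertools
import json

_request_ids = itertools.count(1)  # JSON-RPC requests carry a unique id

def mcp_tool_call(tool_name, arguments):
    """Encode an MCP tool invocation as a JSON-RPC 2.0 request string.
    "tools/call" is the MCP method; the tool itself is illustrative."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)
```

From the network's point of view this is ordinary JSON over the transport MCP is bound to, which is consistent with the NOTE above: agentic traffic reuses existing protocols and data types.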

3. Proposals

The document proposes the following agreements:

  1. Add Clause 2 content to TR 26.870 clause 6.2 as a basis for further work

  2. Take into account that current AIML traffic reuses existing protocols and formats (i.e., audio, video, text over HTTP, RTP, etc.)

  3. Agree to prioritize characterization of existing popular AI apps and provide initial analysis to SA by June 2026

4. References

  • [1] Nokia: "The impact of AI-apps on mobile networks", Nov 2025
  • [2] Ericsson: "GenAI data traffic today – Ericsson Mobility Report", June 2025
  • [3] Model Context Protocol documentation
  • [4] Agent2agent Protocol documentation
Nokia
[FS_6G_MED] LLM-based AI services

LLM-based AI Services for 6G Media Study

Introduction

This contribution addresses Work Task 2 objective (d) of FS_6G_MED, which focuses on media communication for emerging AI services. The objective aims to:

  • Collect and study AI representation formats and traffic characteristics for AI-related media services
  • Examine use cases including agents, multi-modal large language models, and diffusion models
  • Identify gaps in 3GPP specifications (e.g., QoS requirements, dynamic traffic characteristics, AI-representation formats)

The contribution notes that SA1 TR 22.870 contains over 60 AI-related use cases, many referencing "tokens" as basic units for Gen-AI models. While tokenized traffic over networks is not yet widely deployed, the fast-paced research warrants SA4's attention to elaborate these terms and establish a framework.

Generic Workflow and Architecture for LLMs

Background on LLMs and MLLMs

The document proposes a more generic architecture than the voice translation-specific model in TR 26.847. Key definitions:

  • Large Language Models (LLM): AI systems capable of processing and generating natural language, based on transformer architecture with self-attention
  • Generative Pre-trained Transformers (GPT): Type of LLM forming the basis of modern AI systems (ChatGPT, Gemini, Deepseek, Claude)
  • Multimodal Large Language Models (MLLM): Models processing multiple input/output modalities (text, images, audio, video) with learned cross-modal alignment

Proposed Generic Architecture

The contribution presents a generic (M)LLM architecture (Figure X.1) with the following components:

Input Processing: - Tokenizer: Function that converts data of a particular modality into tokens (e.g., words, image patches) - Modality Encoder: AI/ML model that encodes tokens into token embeddings (e.g., OpenAI's CLIP for images and text)

Processing: - Combination Layer: Combines input token embeddings with contextual token embeddings, potentially using techniques like RAG for context window management

Output Processing: - Media Decoder/Generator: Processes LLM output token embeddings into desired format (e.g., natural language)

Key Definitions

Tokens: Discrete units of information in a given modality (words in text, audio frames, image patches) representing meaningful components of AI/ML data with clearly defined boundaries.

Token embeddings (or embeddings): Dense numerical tensors encoding semantic properties, relationships, and contextual meaning of tokens. Transform discrete tokens into continuous mathematical spaces where semantic relationships can be computed through vector operations.
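The "vector operations" mentioned above typically mean similarity computations in the embedding space; a minimal cosine-similarity sketch, using plain Python lists as stand-ins for embedding tensors:

```python
import math

def cosine_similarity(u, v):
    """Semantic relatedness of two token embeddings: cosine of the angle
    between them (1.0 = same direction, 0.0 = orthogonal/unrelated)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm
```

This is how a continuous embedding space makes "semantic relationships" computable: nearby vectors encode related tokens, regardless of their discrete surface form.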

Important Notes

NOTE 1: Current popular AI applications do not generate network traffic composed of token embeddings. Feasibility of such transport using existing protocols is FFS.

NOTE 2: Modern LLM services charge based on number of tokens processed (outcome of modality encoding and combination layers), but user input consists of traditional media (text, images, audio) in the form of prompts.

NOTE 3: In current AI applications, all components on the server side (right of dashed line in architecture) run on the server.

Abbreviations

  • LLM: Large Language Model
  • MLLM: Multimodal Large Language Model
  • RAG: Retrieval-Augmented Generation

Proposal

The document proposes to discuss and agree on the generic architecture and definitions for LLM-based AI applications in Clause 3 as a basis for further work in the study.

Qualcomm Atheros, Inc.
[FS_6G_MED] Testbed for AI Media Services traffic characterization

Summary of S4-260114: Testbed for AI Media Services Traffic Characterization

Introduction and Motivation

This contribution from Qualcomm proposes a comprehensive testbed framework for characterizing traffic patterns and QoE metrics of generative AI services in the context of the FS_6G_MED study. The testbed addresses the need for quantitative characterization of AI-native media services under diverse network conditions, which is a key requirement for the Study on Media Aspects for 6G System.

The testbed provides end-to-end measurement capabilities for multiple AI service types: - Chat services - Streaming services - Agentic tool use - Image generation - Multimodal analysis - Real-time conversational AI

Key Technical Capabilities

Supported Metrics

The testbed captures comprehensive performance metrics including: - Latency metrics: TTFT (Time To First Token), TTLT (Time To Last Token), latency percentiles - Traffic metrics: UL/DL bytes and ratios, burstiness - Performance metrics: Success rate, token rate, tool-call latency, streaming stall statistics - Protocol analysis: All pcap-enabled analysis capabilities
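The latency metrics above can be derived from a request timestamp plus per-token arrival timestamps; the sketch below shows one plausible computation (the exact definitions the testbed uses are not reproduced in this summary):

```python
def streaming_metrics(request_time, token_times):
    """Derive TTFT, TTLT and token rate from a request timestamp and the
    arrival timestamps of streamed tokens (all in seconds)."""
    ttft = token_times[0] - request_time    # Time To First Token
    ttlt = token_times[-1] - request_time   # Time To Last Token
    rate = len(token_times) / ttlt if ttlt > 0 else float("inf")
    return {"ttft": ttft, "ttlt": ttlt, "tokens_per_s": rate}
```

Percentiles of these per-run values across repeated runs would then give the latency-percentile metrics listed above.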

Trace Logging

Deep visibility into protocol and payload behavior is provided through trace logging functionality, which can be enabled via TRACE_PAYLOADS=1. This enables generation of: - WebRTC SDP samples - Exact computer-use request/response payloads

Architecture and Implementation

Modular Design

The testbed follows an orchestrator-centric architecture with clear separation of concerns:

  • orchestrator.py: Coordinates scenario runs, applies network profiles, handles retries, and generates reports
  • scenarios/*: Implements traffic patterns for different AI service types (chat, agent, direct search, realtime, multimodal, image, video, computer use)
  • clients/*: Provides provider adapters for OpenAI®, Gemini®, DeepSeek® (OpenAI-compatible), and vLLM for self-hosted models
  • netem: External dependency on the proposed common network emulator module [1]
  • capture/*: Provides L3/L4 pcap capture and L7 capture via mitmproxy
  • analysis/*: Logs to SQLite, computes 3GPP-aligned metrics, and generates plots

Extensibility

The framework is designed for easy extension: - New scenarios: Create a class extending BaseScenario, register in scenarios/__init__.py, and add YAML entry in configs/scenarios.yaml - New providers: Implement a client subclassing LLMClient and register in the orchestrator client factory
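The extension pattern described above might look roughly as follows. `BaseScenario` is named by the contribution, but this stand-in class and the `SCENARIO_REGISTRY` dict are assumptions for illustration; in the real testbed, registration happens in `scenarios/__init__.py` plus a YAML entry in `configs/scenarios.yaml`:

```python
class BaseScenario:
    """Stand-in for the testbed's scenario base class (assumed interface)."""
    name = "base"
    def run(self, client):
        raise NotImplementedError

SCENARIO_REGISTRY = {}

def register(cls):
    """Mimic the registration step done in scenarios/__init__.py."""
    SCENARIO_REGISTRY[cls.name] = cls
    return cls

@register
class TranslationChatScenario(BaseScenario):
    """Illustrative new scenario: a single-turn translation prompt."""
    name = "translation_chat"
    def run(self, client):
        return client.complete("Translate 'hello' to French.")
```

Once registered, the orchestrator could select the scenario by name, the same way `--scenario chat_basic` selects a built-in one.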

Self-Hosted Model Support

The testbed includes vLLM client support (clients/vllm_client.py) enabling evaluation of self-hosted models via OpenAI-compatible API, with the same metrics and logging pipeline as hosted providers.

Usage and Configuration

Configuration Management

  • Scenarios and models configured in configs/scenarios.yaml
  • Network profiles configured in configs/profiles.yaml

Execution Options

  • Single scenario: python orchestrator.py --scenario chat_basic --profile 5g_urban --runs 10
  • Full matrix: python orchestrator.py --scenario all --runs 5
  • Enable L3/L4 capture: --capture-pcap
  • Enable L7 capture: --capture-l7

Initial Results

The contribution includes preliminary evaluation results showing: - TTFT (Time To First Token) measurements across different scenarios - Average throughput measurements by scenario

Note: These initial results are presented as examples and are not intended for TR documentation.

Proposal

The contribution proposes that SA4: - Agrees to adopt the proposed testbed as the baseline for AI traffic characterization evaluation - Documents the testbed in TR 26.870 (Study on Media Aspects for 6G System)

References

The contribution references: - [1] S4-260xxx: Generic Network Interface Emulator for Media Delivery Evaluation - [2] SP-251652: New SID on Media Aspects for 6G System (FS_6G_MED) - [3] 3GPP TR 22.870: Study on 6G Use Cases and Service Requirements - [4] 3GPP TR 26.998: Support of XR Services

Qualcomm Atheros, Inc.
[FS_6G_MED] Test scenarios for AI traffic characterization

Summary of S4-260115: Test Scenarios for AI Traffic Characterization based on SA1 Use Cases

1. Introduction

This contribution from Qualcomm proposes test scenarios for characterizing AI traffic patterns in support of the 3GPP SA4 6G Media Study objectives. The work is based on AI-related use cases defined in TR 22.870 "Study on 6G Use Cases and Service Requirements", covering AI Agents, Large Language Models (LLMs), Generative AI, and real-time AI inference services.

A 6G AI Traffic Characterization Testbed has been developed to measure traffic characteristics of generative AI services, analyze agentic AI patterns, and evaluate QoE metrics under various network conditions.

2. Relevant SA1 Use Cases from TR 22.870

The contribution identifies and categorizes relevant AI use cases from TR 22.870 into four main groups:

2.1 AI Agent Communication Use Cases

  • Clause 6.6: 6G AI Agent collaboration with third-party AI using LLM
  • Clause 6.7: AI Agents communication (multi-group task-oriented communication)
  • Clause 6.8: 6G system assisted AI Agent service
  • Clause 6.9: Collaborative AI Agents
  • Clause 6.11: Built-in Intelligent Communication Assistant

2.2 Generative AI and LLM Use Cases

  • Clause 6.13: Retrieval Augmented Generation for LLM
  • Clause 6.26: Optimizing user experience for GenAI applications
  • Clause 6.31: UE-Network collaboration with AI capabilities (LLM task offloading)
  • Clause 6.33: AI text-to-video generation supported by computing
  • Clause 6.59: 6G provided communication service for AI traffic

2.3 Real-time AI Inference Use Cases

  • Clause 6.3: End-to-end AI for connected cars
  • Clause 6.17: Intelligent communication assistant
  • Clause 6.22: Intelligent calling services
  • Clause 6.38: AI for disability support (real-time video/audio analysis and enhancement)
  • Clause 6.49: 6GS providing low-latency AI inference service

2.4 Computing and Resource Exposure Use Cases

  • Clause 6.2: Optimizing 6G infrastructure utilisation via resource exposure
  • Clause 6.24: Distributed 6G network for AI computing
  • Clause 6.28: Network-assisted video-based AI inference task offloading for mobile embodied AI
  • Clause 6.34: 6G computing support for AI model inference
  • Clause 6.50: Real time video super-resolution service

3. Test Scenarios for AI Traffic Characterization

The contribution proposes 10 test scenarios with explicit mapping to TR 22.870 use cases:

| Scenario | Description | TR 22.870 Mapping |
|----------|-------------|-------------------|
| chat_basic | Basic single-turn LLM chat interaction | 6.11, 6.17, 6.22, 6.59 |
| chat_streaming | Multi-turn chat with streaming responses | 6.11, 6.17, 6.26, 6.31, 6.59 |
| shopping_agent | AI Agent with tool calling (MCP) | 6.6, 6.7, 6.8, 6.11 |
| web_search_agent | Research agent with web search capability | 6.6, 6.13, 6.21 |
| realtime_text | Real-time conversational AI via WebSocket | 6.3, 6.17, 6.22, 6.38, 6.49 |
| realtime_audio | Audio-based real-time conversation | 6.17, 6.22, 6.38, 6.49 |
| image_generation | Image generation using Generative AI | 6.26, 6.31, 6.33, 6.34, 6.50 |
| multimodal_analysis | Multimodal input analysis (image + text) | 6.3, 6.15, 6.26, 6.28, 6.38, 6.50 |
| video_streaming | Video upload for AI inference offloading | 6.28, 6.38, 6.50 |
| computer_control_agent | Computer use agent via GUI automation | 6.8, 6.9, 6.21, 6.28 |

4. Relevant Metrics for the Selected Scenarios

4.1 Chat and Conversational AI Scenarios

Addresses TR 22.870 clauses 6.11, 6.17, 6.22, and 6.31.

Key Metrics: - Time-to-First-Token (TTFT): Critical QoE metric for perceived responsiveness - Time-to-Last-Token (TTLT): Total response generation time - Token streaming rate: Throughput in tokens per second - Uplink/Downlink byte volumes: Traffic volume for network dimensioning

4.2 AI Agent Scenarios

Addresses TR 22.870 clauses 6.6, 6.7, 6.8, and 6.11. Uses Model Context Protocol (MCP) for tool calling.

Key Metrics: - Agent loop factor: Number of API calls per user prompt (agentic iterations) - Tool call latency: Time for external tool execution - Multi-step task completion time: End-to-end task duration - Burstiness patterns: Peak-to-mean traffic ratio and ON/OFF periods
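Of the metrics above, peak-to-mean burstiness can be sketched directly; binning traffic volumes into fixed intervals is an assumed pre-processing step, not something the contribution prescribes:

```python
def peak_to_mean_ratio(bytes_per_interval):
    """Burstiness as the peak interval traffic volume divided by the mean.
    Input: byte counts per fixed-length time interval (assumed binning)."""
    mean = sum(bytes_per_interval) / len(bytes_per_interval)
    return max(bytes_per_interval) / mean if mean > 0 else float("inf")
```

A smooth stream scores close to 1.0, while agentic ON/OFF traffic, with idle tool-execution gaps between API-call bursts, scores much higher.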

4.3 Real-time AI Scenarios

Addresses TR 22.870 clauses 6.49, 6.38, and 6.3 (low-latency requirements).

Key Metrics: - WebSocket/WebRTC connection setup time - Streaming chunk delivery patterns - Stall detection metrics (rate, duration) - Audio byte volumes and durations (for voice scenarios)

4.4 Generative AI Media Scenarios

Addresses TR 22.870 clauses 6.26, 6.28, 6.31, 6.33, and 6.50.

Key Metrics: - Image generation latency and payload sizes - Multimodal input processing requirements - UL/DL asymmetry ratios for different content types - Video upload bandwidth for AI inference offloading (20-100 Mbps per clause 6.28) - Frame-level packet error tolerance characteristics

5. Proposal

The contribution proposes: 1. Adopt the identified test scenarios as described in this contribution and implemented in the AI testbed 2. Document the relevant AI use cases from TR 22.870 in an Annex in TR 26.870

Technical Contributions Summary

This contribution provides a comprehensive framework for AI traffic characterization in 6G systems by: - Systematically mapping 10 test scenarios to specific SA1 use cases from TR 22.870 - Defining scenario-specific metrics covering QoE (TTFT, TTLT), traffic patterns (burstiness, asymmetry), and performance (latency, throughput) - Introducing AI-specific traffic characteristics such as agent loop factors, token streaming rates, and agentic iteration patterns - Addressing diverse AI service types including conversational AI, agentic AI with tool calling, real-time inference, and generative media services - Providing a testbed-based approach for empirical traffic characterization to support network dimensioning and QoS specification for 6G AI services

HuaWei Technologies Co., Ltd
pCR [FS_6G_MED] Considerations on Work Topic 1 Media Delivery requirements for intelligent immersive calling

Summary of pCR [FS_6G_MED] Considerations on Work Topic 1: Media Delivery requirements for intelligent immersive calling

Document Information

  • Source: Huawei
  • Specification: 3GPP TR 26.870
  • Meeting: TSG-SA4 Meeting #135 (February 2026, Goa, India)
  • Document Number: S4-260080
  • Type: Change Request for Approval

Main Purpose

This pCR proposes updates to clause 4.2 of TR 26.870, introducing new requirements related to intelligent immersive calling services for 6G media delivery.

Background and Rationale

The change is motivated by SA1's work on TR 22.870, which introduces use cases for immersive communication targeting aging populations. The use case envisions leveraging 6G technologies (particularly AI capabilities such as generative AI and multi-modal models) to enable operators to provide intelligent immersive calling services through various wearable and smart devices including: - AR/VR glasses - Smart glasses - Smart TVs - Smart watches

Technical Contributions

New Clause 4.2.1: The Intelligent Immersive Calling Service

The pCR introduces a new service definition describing intelligent immersive calling as: - An AI-empowered immersive calling service utilizing generative AI and multi-modal models - A service that aggregates inputs from multiple smart devices and sensors (cameras, smart watches, AR/VR glasses) - A service that can be natively provided by operators

Requirements for Intelligent Immersive Calling

The following technical requirements are introduced:

  1. High-Quality Video Support: 4K + HDR uplink video capability for intelligent immersive calling

  2. Eye Tracking Support: Ability to support eye tracking across different types of smart devices (e.g., AR/VR glasses)

  3. User Intention Understanding: Capability to understand user intention through voice and gesture inputs

  4. Device-Aware QoE: Tiered QoE support that takes the specific device capabilities into account

  5. IMS Extensions: Extensibility of IMS to support these new capabilities

  6. Multi-Media Protocol Extensions: Review of protocol extensions for multi-media transporting

China Mobile Com. Corporation
[FS_6G_MED]pCR on Embodied Video for 6G Media

Summary of 3GPP Technical Document S4-260161

Document Overview

This is a pCR (pseudo Change Request) to 3GPP TR 26.870 introducing Embodied Video Internet (EVI) as a new use case for 6G Media studies. The document proposes adding a new clause 6.1 to the technical report, focusing on media requirements for embodied AI systems (robots, UAVs) that actively capture and process video in dynamic environments.

Main Technical Contributions

1. Introduction to Embodied Video Internet (Clause 6.1.1)

Core Concept: - Defines Embodied AI as integration of AI into physical systems enabling real-world interaction - Introduces paradigm shift from static/passive recording to dynamic/mobile/embodied sensing - Distinguishes between: - Old Paradigm: Fixed cameras with limited FOV and constrained coverage - New Paradigm: Mobile devices (robots, UAVs) as "mobile eyes and limbs" actively exploring environments

Definition: - Embodied Video: Use of 6G networks enabling intelligent agents to capture, process, and react to visual information in real-time within dynamic environments

2. SA1 Use Case Analysis (Clause 6.1.2)

Extracts and summarizes four relevant use cases from TR 22.870:

Use Case 6.28: Network-assisted Video-based AI Inference Task Offloading

Technical Requirements: - Multi-camera systems (6-8 cameras) with concurrent multi-modal data streams (video, point clouds) - Three operational scenarios defined: - Scenario I: 6x 1080p @ 15Hz → 20 Mbps - Scenario II: 4x 1080p + 2x 4K @ 15/30Hz → 60 Mbps - Scenario III: 2x 1080p + 4x 4K @ 15/30Hz → 100 Mbps - Alternative: 4x 1080p + 2x 4K @ 60Hz - E2E RTT: 100-300ms - Compression ratio: 240:1 assumed - Distributed AI inference tasks: multi-modal perception, 3D digital twin modeling, trajectory planning

Media Requirements: - AI codec with error-tolerant capabilities (Grace method) - Real-time processing of high-resolution video and multi-modality data - High uplink data rate and low latency

Use Case 6.19: AI-based Video Analysis

Application Context: - Real-time infrastructure inspection (utility poles, guardrails) - Security surveillance - Network offloading for resource-intensive video analysis

Media Requirements: - Native integration of video analysis algorithms (object recognition, anomaly detection) - Low latency communication

Use Case 6.48: Service Robot for Power Grid

System Architecture: - Embedded controllers for motion control (walking, grasping), providing fast response - Network offloading for computing-intensive tasks (large AI models, control command generation)

KPI Requirements:

| Traffic Type | Message Size | Transfer Interval | Data Rate | E2E Latency | Reliability |
|--------------|--------------|-------------------|-----------|-------------|-------------|
| UL sensor data | 1250-12500 Bytes | 10 ms | 1-10 Mbps | 100-150 ms | 99.99% |
| UL LiDAR | 345600 Bytes | 100 ms | 27.6 Mbps | 100-150 ms | 99.99% |
| DL Control command | 625-12500 Bytes | 50 ms | 0.1-2 Mbps | - | - |

Technical Notes: - LiDAR: 10 Hz frame rate, 28800 points/frame, 12 bytes/point - E2E latency breakdown: ~40ms communication + ~100ms AI inference
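The LiDAR figures quoted above are internally consistent, which a quick arithmetic check makes visible (illustrative only; assumes 8 bits per byte and the frame parameters as stated):

```python
# Quick consistency check of the LiDAR KPI figures quoted above.
points_per_frame = 28800   # from the technical notes
bytes_per_point = 12
frame_rate_hz = 10         # one frame per 100 ms transfer interval

frame_bytes = points_per_frame * bytes_per_point          # message size
data_rate_mbps = frame_bytes * frame_rate_hz * 8 / 1e6    # uplink rate

print(frame_bytes)      # → 345600, matching the table's message size
print(data_rate_mbps)   # → 27.648, i.e. the table's ~27.6 Mbps
```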

Media Requirements: - Real-time processing of multi-modality data (video, audio, point clouds, LiDAR)

Use Case 6.11: Intelligent UAV Swarms

Operational Concept: - UAVs with built-in AI capabilities for enhanced perception, decision-making, control - Swarm deployment for full area coverage and complex task execution - Network offloading during local computing overload (e.g., HD 3D map generation)

Media Requirements: - Real-time processing of multi-modality data from multiple UAVs

3. External Evidence - UAV Inspection Use Cases (Clause 6.1.3)

Table 2.1.3-1: UAV Inspection Requirements

| Use Case | Video Resolution | Data Rate | E2E Latency | Reliability |
|----------|------------------|-----------|-------------|-------------|
| Traffic surveillance | 1080p | ≥5 Mbps | <100 ms | >99.99% |
| Traffic surveillance | 4K | >25 Mbps | <100 ms | >99.99% |
| Urban management | 1080p | ≥5 Mbps | 20-100 ms | - |
| Event security | 1K | ≥5 Mbps | ≤10 ms | - |
| Event security | 4K | ≥25 Mbps | ≤10 ms | - |
| Rural inspections | 4K | ≥25 Mbps | <100 ms | - |

Table 2.1.3-2: UAV 3D Mapping Requirements

| Use Case | Data Type | Data Rate | E2E Latency |
|----------|-----------|-----------|-------------|
| Topographic surveying | High-res video, LiDAR | ≥30 Mbps | 20-100 ms |
| Reconstruction | 4K video | ≥50 Mbps | 20-100 ms |
| Mine monitoring | Video, LiDAR, sensor | ≥30 Mbps | 20-100 ms |
| Rural governance | High-res video, LiDAR | ≥30 Mbps | 20-100 ms |

4. Consolidated Requirements for 6G Media (Clause 6.1.4)

Four Key Requirements Identified:

  1. AI Codec Technology
    • Error-tolerant capabilities within frames
    • Grace method for better UX vs. traditional codecs

  2. AI-native Video Protocol
    • New protocol design for AI-driven video systems

  3. Low-latency Video Transmission
    • Critical for real-time embodied AI operations

  4. QoE Model for Performance Measurement
    • User-centric parameters: multi-stream data types, data quality, accuracy and reliability of feedback results
    • Network-centric parameters: network delivery speed, latency, end-to-end packet loss, network usage

Technical Significance

This pCR establishes foundational requirements for supporting embodied AI systems in 6G media, addressing: - Multi-modal concurrent data streaming - Real-time AI inference offloading - High-reliability, low-latency video transmission - Novel QoE metrics for embodied video applications - AI-native codec and protocol requirements

The document bridges SA1 service requirements with SA4 media specifications, providing concrete KPIs and use case evidence for the FS_6G_MED study.

InterDigital Pennsylvania
[FS_DCTC_eQoS_MED] Description of experimental approach and test setup for media transmission for AI inferencing

Summary of S4-260230: Experimental Approach and Test Setup for Media Transmission for AI Inferencing

Document Overview

This Change Request (CR) to TR 26.823 v0.2.0 addresses the currently empty Clause 6.5.1 by providing detailed experimental approaches and test setups for evaluating media transmission for AI inferencing scenarios. The contribution is part of the Study on dynamically changing traffic characteristics and usage of enhanced QoS support in 5GS for media applications and services.

Main Technical Contributions

Two Experimental Approaches Proposed

The document proposes two distinct experimental approaches corresponding to the AI inferencing in XR service scenario (Clause 5.6):

  1. Experimental approach #1: Based on commercially available generative AI applications
  2. Experimental approach #2: Based on a standalone generative AI platform for traffic measurement associated with QoE metrics

Experimental Approach #1: Using Commercially Available Generative AI Applications

Experimental Methodology

  • Black box approach: Uses commercially available generative AI applications (e.g., OpenAI ChatGPT™, META AI™, Google Gemini™) to measure and characterize media transmission traffic in uplink and downlink
  • Focus: Interactive or delay-bound AI inference responses as described in typical implementation and end-to-end procedures
  • Example use cases:
  • Meta AI glasses "ask Meta AI about what you see" functionality (voice + periodic images uplink, audio response downlink)
  • OpenAI ChatGPT™ voice mode
  • Google Gemini™ live mode with camera feed

Measurement Procedure

  • Baseline measurement: First conducted on wired network (ideal network conditions)
  • 5G network testing: Uses test channels or emulated 5G network for deterministic, controllable, and reproducible conditions
  • Network conditions considered: Nominal, cell-edge, multi-UE scenarios with system load

Test Setup

  • Observation Points (OPs):
  • UE side: 5G network emulator ingress of AI inference input data and egress of AI inference output data
  • UPF side: 5G network emulator egress of AI inference input data and ingress of AI inference output data
  • Measurement tool: Open-source network protocol analyzer (e.g., Wireshark) for IP packet analysis
  • Purpose: Measurements at both ingress and egress highlight network performance impact on traffic characteristics

Limitation

NOTE: Due to the black-box nature of this approach, the QoE metrics identified in Clause 5.6.3 cannot be measured, as no appropriate Observation Points are available.


Experimental Approach #2: Using Standalone Generative AI Platform

Experimental Methodology

  • White box approach: Standalone client/server generative AI platform mimicking commercially available applications
  • Client side: Generation and transmission of AI inference input data
  • Server side: AI inference, generation and transmission of AI inference output data
  • Key advantage: Control over open-source AI model and generation/reception of AI inference data enables QoE metrics measurement (Clause 5.6.3) in addition to traffic characteristics

QoE Measurement Capabilities

  • Per-packet metadata insertion: Enables latency QoE metrics measurement (e.g., time-to-first-token, end-to-end latency)
  • Packet marking: Uplink packets (questions) and corresponding downlink packets (inference responses) marked with metadata for correlation
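The marking-and-correlation step described above can be sketched as a join on a shared request identifier; the record layout, field names, and timestamps below are assumptions for illustration, not taken from the contribution:

```python
# Illustrative sketch of the per-packet metadata correlation described
# above (record layout and field names are assumptions): each uplink
# question packet and its downlink response packets carry a shared request
# id, so latency QoE metrics fall out of a join on that id.

def time_to_first_token(packets: list[dict]) -> dict[int, int]:
    """packets: [{'req_id': int, 'dir': 'UL' or 'DL', 'ts': int}, ...]
    with ts in milliseconds. Returns per-request TTFT: first DL packet
    arrival minus last UL packet departure."""
    ul_done, dl_first = {}, {}
    for p in sorted(packets, key=lambda p: p["ts"]):
        if p["dir"] == "UL":
            ul_done[p["req_id"]] = p["ts"]      # keep latest UL timestamp
        elif p["req_id"] not in dl_first:
            dl_first[p["req_id"]] = p["ts"]     # keep first DL timestamp
    return {r: dl_first[r] - ul_done[r] for r in dl_first if r in ul_done}

trace = [{"req_id": 1, "dir": "UL", "ts": 0},     # question starts
         {"req_id": 1, "dir": "UL", "ts": 50},    # question fully sent
         {"req_id": 1, "dir": "DL", "ts": 950},   # first response packet
         {"req_id": 1, "dir": "DL", "ts": 1400}]  # last response packet
print(time_to_first_token(trace))  # → {1: 900}
```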

Measurement Procedure

Similar to Approach #1: - Baseline: Wired network (ideal conditions) - 5G testing: Test channels or emulated 5G network - Network conditions: Nominal, cell-edge, multi-UE scenarios

Test Setup

Client Side (Custom Application, e.g., Unity 3D)

  • Generates and transmits input data (prompt/text, image) with metadata
  • Receives AI inference output data with metadata
  • Renders AI inference results

Server Side (Custom AI-enabled Application, e.g., Unity 3D)

  • Receives AI inference input data with metadata
  • Performs AI inferencing using open-source standalone Multi-Modal LLM (e.g., Llava model from LlamaSharp based on llama.cpp)
  • Collects QoE metrics
  • Transmits AI inference output data with metadata

Observation Points

  • UE and UPF side OPs: For uplink/downlink traffic characteristics (as in Approach #1)
  • Client and server-side OPs: For QoE metrics measurement

Variant Configuration

NOTE: Test setup may be extended to mimic advanced AI-enabled AR devices (e.g., lightweight AR+AI glasses requiring remote rendering) by adding XR data (e.g., periodic pose information) to AI inference input data in uplink to evaluate impact on traffic characteristics.


Key Technical Differentiators

| Aspect | Approach #1 (Commercial Apps) | Approach #2 (Standalone Platform) |
|--------|-------------------------------|-----------------------------------|
| Control | Black box | White box |
| QoE Metrics | Not measurable | Measurable |
| Traffic Characteristics | Measurable | Measurable |
| Flexibility | Limited | High (customizable) |
| Realism | High (actual commercial apps) | Medium (mimics commercial apps) |
| Metadata Support | No | Yes (per-packet) |

InterDigital New York
6GMedia - work topic 2- Characteristics of AI-enabled applications

Summary of S4-260234: 6GMedia - Characteristics of AI-enabled Applications

Introduction

This contribution from InterDigital addresses work topic 2 of the 6GMedia study, focusing on key characteristics of XR and AI-enabled mobile applications and services. The document proposes use cases and elaborates on requirements for interoperable and widespread deployment.

Use Cases

The document identifies several representative use cases:

  • AR applications: Require AI-based Spatial Computing functions (segmentation, semantic perception) for virtual content insertion in real environments (TR 26.819)
  • Personalized interactive immersive guided tour: Requires AI inference for proper virtual content placement in fast-evolving real environments (TR 22.870, clause 9.12)
  • Video/image analysis: Requires remote AI processing with adaptive upstream video quality adjustments (TR 22.870, clause 6.28)
  • Conversational services: Uses AI for real-time translation and media transformation (TR 22.870, clause 6.42)
  • Context-aware recommendation: Uses generative AI for environment-related queries (TR 22.870, clause 6.3)
  • AI model training/transfer/update: Requires transmission of AI data including training data, models, and inference data (TR 22.870, clause 6.25)

Technical Contributions

3.1 Heterogeneous and Multimodal Mobile Applications and Services

Key Observations: - AI-enabled applications are highly heterogeneous and multimodal, encompassing video, image, audio, text, haptics, and sensor data - Applications exchange AI/ML data including prompts, model parameters, and compressed/uncompressed intermediate data (embeddings)

Table 1 Analysis provides detailed mapping of: - AR: UL (video, audio, prompt, inference data) / DL (video, audio, dynamic 3D media, haptics, spatial descriptions) - requires MPEG haptics, scene description enhancements, dynamic mesh/gaussian splat codecs - Real-time Object Detection: Feature representations, MPEG-7 descriptors, MPEG FCM - Speech Recognition/Conversational AI: ULBC, tokens, embeddings - Model Learning/Updates: ONNX, GGUF, MPEG NNC formats - Avatar communication: Upcoming MPEG avatar, gaussian and mesh codecs - Context-aware recommendation: W3C Media Annotations, MPEG-7 descriptors

Proposals: - Proposition 1: SA4 should study support of additional media modalities and codecs/enhancements for 6G - Proposition 2: SA4 should define terminology for AI/ML data (features, tokens, embeddings, latent, intent) and study relevant AI representation formats and interchangeable formats/codecs - Observation 2: Some applications require remote AI-based Spatial Computing functions (TR 26.819) - Proposition 3: SA4 should identify and study spatial compute functions benefiting from off-device processing

3.2 QoS Granularity and QoE-driven Dynamic Media Adaptation

Traffic Characteristics: - Applications are uplink-heavy with greatly varying characteristics across modalities - Continuous video capture results in high-rate, periodic uplink traffic - Audio/sensor data generates lower-rate, aperiodic, bursty transmissions - Traffic composition changes dynamically based on user behavior, interaction patterns, mobility, and environmental factors

Table 2 Analysis characterizes requirements: - AR, Real-time Object Detection, Avatar communication: High data rate, real-time latency, mid reliability, high need for QoE-based adaptation - Speech Recognition/Conversational AI, Context-aware Recommendation: Mid data rate, real-time latency, mid reliability, mid adaptation need - Model Learning/Updates: High data rate, non-real-time latency, mid reliability, low adaptation need

Key Observations: - Observation 3: Diversity of applications and modalities makes traffic characteristics evaluation/classification challenging - Observation 4: Temporal dependency and synchronization required between media modalities and AI data for real-time/delay-bound AI inference - Observation 5: Applications characterized by uplink-intensive, bursty/continuous, multi-modal traffic with diverse latency sensitivity and QoE impact - Observation 6: Current QoS frameworks lack application/context awareness, granularity, and adaptability for dynamic 6G network conditions

Proposals: - Proposition 4: SA4 should develop generic QoS and QoE mechanisms suitable across diverse traffic patterns - Proposition 5: SA4 should study QoS framework enhancements enabling finer granularity and context awareness - Proposal 6: SA4 should specify procedures for real-time QoE-based adaptation of multimodal media and define QoE metrics for real-time/delay-bound AI inference

3.3 New Protocols

Key Points: - Transport protocols (QUIC-based, HTTP/3-based) are rapidly evolving to suit AI-enabled use cases - These evolutions substantially impact traffic characteristics including latency, reliability, and resource utilization - Rel-19 SA2 specified techniques for delivering Media Related Information (MRI) when XRM traffic is end-to-end encrypted (QUIC) - TS 23.501 clause 5.37.9 specifies options for relaying MRI over N6 interface - Rel-18/19 SA4 specified solutions in TS 26.522 enabling RTP senders to transmit MRI using RTP header extensions

Proposals: - Observation 8: New transport protocols impact media transmission reliability, latency, and traffic characteristics - Proposal 7: SA4 should characterize impact of QUIC-based protocols on AI data delivery and traffic characteristics, especially for real-time/delay-bound applications - Observation 9: SA4 has specified RTP-based MRI solutions in TS 26.522 - Proposal 8: SA4 should study integration of SA2-defined QUIC-based transport extensions into media delivery architecture, leveraging FS_Q4RTC-MED study

3.4 Multi-Device Scenarios

Key Characteristics: - AI-enabled services deployed across smartphones, AI glasses, smartwatches, fitness devices, companion compute devices - Services involve continuous sensing, media capture/processing, on-device/distributed AI inference, and frequent network data exchange - Services are inherently multi-device with different devices contributing sensing, media, compute, display, or connectivity functions - Introduces QoS/QoE challenges for modality/format adaptation, AI processing coordination with partial/full offload, and traffic correlation across UEs

Figure 1 illustrates UE tethering where AI-enabled services are delivered across multiple user devices relying on a tethered UE for cellular connectivity and coordination.

Observations and Proposals: - Observation 7: AI-enabled services increasingly operate across heterogeneous multi-devices associated with same user; modalities and AI processing may be distributed - Observation 8: Existing system assumptions are UE-centric and don't address QoS/QoE requirements of multi-device scenarios - Proposal 8: SA4 should study impact of multi-devices on QoS and QoE framework - Observation 9: QoS enhancement and QoE-driven dynamic media adaptation need to operate across heterogeneous multi-devices - Proposition 9: SA4 should consider heterogeneous multi-devices for QoE metrics definition and QoS enhancement study for real-time/delay-bound AI inference

Conclusion

The document proposes to discuss and agree on all proposals as part of the 6GMedia study and document them in a new section 6.X of the TR. The contribution emphasizes three main areas requiring SA4 attention: 1. Support for heterogeneous and multimodal media types including AI/ML data 2. Enhanced QoS/QoE frameworks with finer granularity and context awareness 3. Multi-device scenario support for AI-enabled services

Apple Inc.
On SA4 work on AI traffic characteristics

Summary of S4-260269: On SA4 Work on AI Traffic Characteristics

1. Introduction and Background

This contribution from Apple addresses SA4's response to consultation requests from RAN2 and SA2 regarding AI traffic patterns and formats. The document establishes the context that SA4 must make application layer assumptions to develop traffic models for AI services, including LLMs and other AI agents. The paper argues for a specific approach to how these assumptions should be treated within the standardization process.

2. General AI Traffic Characterization

2.1 Scenario Diversity

The document identifies multiple dimensions of AI traffic variation:

  • Text-to-Text: Current dominant scenario with small data packets (prompts) and token-based responses
  • Multimodal (Images/Video): Significantly larger prompt data, e.g., photo uploads
  • Interactive: Frequent, closely correlated exchanges requiring real-time performance (conversations, gaming)
  • User Type: Agentic vs. human users impacting traffic patterns
  • Model Formats: While most LLMs use HTTPS with RESTful APIs and JSON payloads, the data schema varies significantly across different models (references provided to OpenAI, Claude, and Gemini APIs)

2.2 Traffic Nature

AI traffic is characterized as: - Bursty and unpredictable - Event-driven rather than steady-state streaming

2.3 Modeling Requirements

To characterize traffic (latency, throughput, periodicity, burstiness), SA4 needs assumptions about data formats. However, the document notes: - Industry currently uses various transport methods - The domain is rapidly evolving - No clear interoperability requirement exists today - Traffic modeling targets deployment scenarios 5+ years in the future - Current leading formats may become obsolete by 6G deployment
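The format-agnostic modeling stance argued for above can be sketched as a simple event-driven traffic generator; every parameter value below is an illustrative assumption of this summary, not a figure from the paper:

```python
# Minimal, format-agnostic traffic sketch in the spirit argued for above
# (all parameter values are illustrative assumptions): requests arrive as
# sporadic events after exponential "think time", and each response is a
# burst of steadily streamed tokens, so the aggregate downlink traffic is
# bursty and event-driven rather than a constant stream.
import random

random.seed(0)  # reproducible illustration

def session_events(n_requests=5, mean_gap_s=10.0,
                   tokens_per_resp=200, token_rate_tps=50.0,
                   bytes_per_token=4):
    """Yield (time_s, n_bytes) downlink token events for one chat session."""
    t = 0.0
    for _ in range(n_requests):
        t += random.expovariate(1.0 / mean_gap_s)   # idle gap before request
        for k in range(tokens_per_resp):            # token-streaming burst
            yield (t + k / token_rate_tps, bytes_per_token)

events = list(session_events())
total_bytes = sum(b for _, b in events)
span_s = events[-1][0] - events[0][0]
print(total_bytes, round(total_bytes * 8 / span_s, 1))  # bytes, mean bps
```

Note that only the output characteristics (event times, burst sizes) matter to the network; nothing in the sketch depends on the payload format, which is exactly the separation the contribution advocates.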

3. Proposed Approach

The contribution proposes three key principles for SA4:

  1. Non-normative Treatment: AI format assumptions for traffic modeling should be treated as guidance only, not as rigid normative standardization targets. SA4 should avoid normative work on AI formats (actual data packet structure) at this stage to prevent locking specifications into constraints that may not suit future technology evolution.

  2. Focus on Traffic Characteristics: Work should concentrate on traffic characteristics (latency, throughput, periodicity, burstiness) rather than specific coding or file formats used to generate that traffic.

  3. Continuous Review: SA4 should periodically review this approach as the AI traffic and format landscape continues its rapid evolution.

Key Technical Contribution

The main technical contribution is a strategic positioning paper that argues against premature normative standardization of AI data formats while supporting the development of traffic models based on reasonable assumptions. The paper advocates for a pragmatic approach that acknowledges the rapid evolution of AI technologies and focuses SA4 efforts on traffic characteristics that will inform network design rather than application-layer format specifications that may quickly become outdated.

InterDigital New York
6GMedia - AI terminology

6GMedia – AI Terminology for Media Communication Services

Overview

This contribution addresses the need for standardized AI terminology in the context of 6GMedia work. The document recognizes that terminology such as tokens and embeddings lacks clarity across delegates and 3GPP working groups, and proposes definitions for AI representation formats to enable assessment of traffic characteristics and impact on SA4 specifications.

Main Technical Contributions

1. Core AI Representation Terminology

The document proposes comprehensive definitions for fundamental AI representation concepts:

Token Definitions

  • Token (general): Unit of representation processed sequentially or symbolically by a model, which can be discrete (hard token) or continuous (soft token)
  • Hard token: Discrete symbolic representation selected from a predefined vocabulary or codebook, processed as an atomic unit within a model's input or output sequence. Has explicit identifier without partial or weighted selection. Examples include text symbols, discrete audio or visual codewords
  • Soft token: Continuous vector representation that functionally replaces or augments a hard token, not corresponding to a single discrete vocabulary element. Typically learned and processed similarly to hard tokens

Other Core Representations

  • Embedding: Continuous vector representation encoding semantic, perceptual, or structural characteristics of content for comparison, retrieval, alignment, or conditioning. Not inherently part of a token sequence
  • Latent representation (latent): Continuous representation capturing compressed, abstract, or generative state during processing or synthesis
  • Hierarchical latent representation: Latent representation structured across multiple levels, with higher levels capturing coarse/global/semantic information and lower levels capturing fine-grained/local/residual information
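The distinctions above can be illustrated with a toy sketch (all values invented; none of this is from the contribution): a hard token is a discrete id from a predefined vocabulary, an embedding table maps ids to continuous vectors, and a soft token is a continuous vector inserted into the sequence without any vocabulary entry.

```python
# Toy illustration of the terminology defined above; all values invented.

vocab = {"6G": 0, "media": 1, "delivery": 2}        # predefined vocabulary

# Hard tokens: discrete, atomic units of the input sequence.
hard_tokens = [vocab[w] for w in ["6G", "media", "delivery"]]
print(hard_tokens)  # → [0, 1, 2]

# Each hard token id maps to a learned continuous vector.
table = {0: [0.9, 0.1, 0.0], 1: [0.2, 0.7, 0.1], 2: [0.0, 0.3, 0.6]}

# Soft token: a learned continuous vector that augments the sequence but
# corresponds to no single vocabulary element.
soft_token = [0.4, 0.4, 0.2]
model_input = [soft_token] + [table[t] for t in hard_tokens]
print(len(model_input))  # → 4
```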

2. Media-Specific and Exchange Format Terminology

Compression and Exchange Formats

  • Learned based media compression (representation): Discrete, syntax-defined coded form derived from latent representation after quantization and entropy coding (e.g., JPEG AI, MPEG AI-PCC)
  • Model exchange representation: Defines network topology, operators, parameters, and data types for interoperable deployment across distributed systems (e.g., ONNX, NNEF, GGUF)

Inference-Related Representations

  • Feature/descriptor: Structured representation of content characteristics, extracted for description, comparison, indexing, or retrieval (e.g., MPEG-7 descriptors)
  • Inference results: Output representation produced by AI system as result of applying model to input data, conveying decision, prediction, estimation, or detection outcome. May include labels, scores, confidence values, coordinates, masks, or structured outputs (e.g., W3C Media Annotations)
  • Intermediate data: Representation of partially processed information exchanged between inference components, enabling continuation of inference by downstream entity (includes intermediate coded representation, feature representation or descriptors). References TR 26.927 definition

3. Applicability Matrix to Media Types/Modalities

The document provides a comprehensive mapping of representation types across different media modalities:

Text

  • Prevalent method: Hard tokens
  • Hard tokens: Words, subwords, characters, punctuation symbols
  • Soft tokens: Learned prompt or prefix vectors controlling task or style
  • Embeddings: Vector encoding semantic meaning of sentence or document
  • Latent representation: Internal hidden states during text generation or transformation
  • Learned compression: Text compression remains predominantly symbolic, with limited use of learned latents

Audio

  • Prevalent method: Latents + embeddings
  • Hard tokens: Discrete speech units, phoneme identifiers, quantized audio codes
  • Soft tokens: Continuous conditioning vectors controlling speaker, emotion, or prosody
  • Embeddings: Vector encoding speaker identity, acoustic similarity, or musical features
  • Latent representation: Audio latents capturing timbre, rhythm, or spectral structure
  • Learned compression: Applicable

Image

  • Prevalent method: Latents
  • Hard tokens: Discrete visual codewords or quantized image patches
  • Soft tokens: Learned visual token vectors participating in image models
  • Embeddings: Vector encoding image content, objects, or style
  • Latent representation: Image latents used in generative or transformational models
  • Learned compression: Applicable (prevalent)

Video

  • Prevalent method: Latents (hierarchical)
  • Hard tokens: Discrete frame-level or spatiotemporal codewords
  • Soft tokens: Continuous vectors controlling temporal or contextual behavior
  • Embeddings: Vector encoding semantic or perceptual characteristics of video segment
  • Latent representation: Spatiotemporal latents capturing motion and scene dynamics
  • Learned compression: Applicable (prevalent)

Multimodal

  • Prevalent method: Mixed (tokens + embeddings + latents)
  • Hard tokens: Aligned discrete tokens from multiple media types
  • Soft tokens: Continuous token-equivalent vectors enabling cross-modal conditioning
  • Embeddings: Shared vector representation aligning content across modalities
  • Latent representation: Individual or multimodal latents supporting joint generation or transformation
  • Learned compression: Usually applied per modality but could be applied on shared latents

4. Internal vs. External Representation Framework

The document proposes distinguishing between: - Internal representation: Representation used by the model or agent for its internal process - Exchangeable/external representation: Format exchanged between two entities (e.g., UL or DL)

A matrix is provided mapping representation formats to internal/external usage, with most entries marked as FFS (For Further Study), except: - Learned based compressed representation: Not internal, external examples include JPEG AI, MPEG AI-PCC - Model exchange representation: Not internal, external examples include ONNX, NNEF, GGUF, NNC

Proposal

The contribution proposes to include sections 1 to 3 in a relevant section of TR 26.870.

LG Electronics Inc.
[FS_6G_MED] Consideration on Media Delivery Architecture

Summary of 3GPP Change Request S4-26xxxx

Document Information

  • Source: LG Electronics Inc.
  • Title: Consideration on Media Delivery Architecture
  • Specification: 3GPP TR 26.870 (FS_6G_MED)
  • Purpose: Proposal for Agreement

Main Technical Contribution

Multi-network Cooperative Media Delivery Architecture (New Clause 6.1.x)

Core Concept

The contribution proposes a fundamental shift in 6G media delivery architecture from parallel network usage to functional decomposition and cooperative delivery across heterogeneous networks. Key aspects include:

  • Integration of heterogeneous networks: Broadcast, mobile communication, fixed IP, and satellite networks operate as a unified media delivery system
  • Role-based network assignment: Each network assumes distinct roles based on content characteristics
  • Non-3GPP networks as equal partners: Broadcast and other non-3GPP networks treated as equal delivery resources rather than auxiliary means

Functional Decomposition Approach

Media content is decomposed into constituent elements, with each delivered through the most appropriate network:

  • Common content (e.g., main video stream of live sports): Delivered via broadcast/multicast networks for efficiency during massive simultaneous viewing
  • Personalized elements (e.g., advertisements, UI, viewpoint selection, commentary): Delivered via unicast/IP networks on per-user basis
  • Interactive data and control components: Handled through appropriate low-latency paths

Media-Aware Network Operation

The architecture enables semantic-aware delivery where:

  • Networks recognize media content meaning and characteristics
  • Dynamic adjustment between broadcast/unicast and fixed/mobile based on:
  • Scale of concurrent consumption
  • Latency sensitivity
  • Energy efficiency requirements
  • User mobility
  • AI-based control functions perform real-time optimization
  • Cooperation with user devices and service providers

Application Scenarios

The architecture applies to:

  • Streaming-based media services: Traditional video delivery with personalization
  • XR and spatial media services:
  • Base video streams/spatial information via broadcast paths
  • User-specific viewpoint changes and interaction data via low-latency 6G unicast
  • Achieves both scalability and immersion

6G Network Role Evolution

6G evolves from direct traffic accommodation to:

  • Central orchestrator of heterogeneous media delivery ecosystem
  • Higher-level platform providing integrated control over diverse network resources
  • Intelligent network composer automatically selecting optimal network combinations based on media characteristics

Standardization Considerations

  • Need for common interfaces enabling improved accessibility to non-3GPP standards (including broadcast systems like DVB and ATSC)
  • Addresses limitations of 5G approach which focused primarily on parallel network usage
  • Moves beyond independent treatment of broadcast and mobile systems

Conclusions (Clause 6.1.x.2)

The 6G media architecture:

  • Supports coexistence of heterogeneous networks for media services
  • Separates transmission methods per media element with intelligent integration
  • Supports both massive simultaneous viewing and hyper-personalized interactive media
  • Is applicable across multicast/broadcast, streaming, XR, and next-generation immersive media

Related Requirements (Clause 6.1.x.3)

The proposal maps to existing TS 22.870 requirements:

Clause 5.4.2 - Legacy Services Support

  • Broadcast and Multicast Services (ref TS 22.261)

Clause 5.9.8 - Enhanced Network Service Awareness

  • [PR 5.9.8.2-1]: 3rd party service provider information sharing on service characteristics per traffic flow component
  • [PR 5.9.8.2-2]: Dynamic network resource adjustment based on service characteristics and predicted changes
  • [PR 5.9.8.2-3]: Charging support for differentiated services per media component

Clause 11.2 - UAM Aircraft Communications

References performance requirements for various media services on UAM aircraft, including:

  • 8K video live broadcast: 100 Mbps uplink, 200 ms latency, 95% reliability
  • Video streaming: 4-100 Mbps depending on resolution, 100 ms latency, 95% reliability
  • Remote controller through HD video: ≥25 Mbps uplink, 100 ms latency, 99% reliability
  • Video conferencing: 25 Mbps bidirectional, 100 ms latency, 99% reliability
  • Immersive multimedia services/cloud gaming: 100-500 Mbps downlink, 50 ms latency, 99% reliability

All services are specified for altitudes up to 1000 m AGL in urban/rural/scenic areas.
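
For illustration, the quoted figures could be held in a small lookup used to check whether a given link budget satisfies a service. The dictionary keys and the helper are hypothetical; values are transcribed from the list above, with the cloud-gaming rate taken as the lower bound of its quoted 100-500 Mbps range:

```python
# Hypothetical encoding of the TS 22.870 clause 11.2 figures quoted above.
UAM_REQUIREMENTS = {
    "8k_live_broadcast":  {"rate_mbps": 100, "latency_ms": 200, "reliability": 0.95},
    "remote_control_hd":  {"rate_mbps": 25,  "latency_ms": 100, "reliability": 0.99},
    "video_conferencing": {"rate_mbps": 25,  "latency_ms": 100, "reliability": 0.99},
    "cloud_gaming":       {"rate_mbps": 100, "latency_ms": 50,  "reliability": 0.99},
}

def link_meets(service: str, rate_mbps: float,
               latency_ms: float, reliability: float) -> bool:
    """True if the offered link budget satisfies the service's requirements."""
    req = UAM_REQUIREMENTS[service]
    return (rate_mbps >= req["rate_mbps"]
            and latency_ms <= req["latency_ms"]
            and reliability >= req["reliability"])
```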

Huawei Tech. (UK) Co., Ltd
Overview of inputs to RAN2#133 on AI traffic characteristics

Overview of RAN2#133 Inputs on AI Traffic Characteristics

Document Purpose and Context

This document summarizes the contributions submitted to RAN2#133 on AI traffic characteristics. Following the RAN#110 plenary decision assigning RAN2 the lead on AI traffic characteristics work in RAN, in coordination with SA WG4, this overview aims to align SA4 and RAN2 work at an early stage. The document explicitly recommends prioritizing discussion around the key dependencies identified by RAN2.

Key Traffic Characteristics Identified Across Contributions

Common Traffic Patterns

Multiple contributions converge on the following AI traffic characteristics:

  • Bursty and aperiodic nature: Nearly universal observation across contributions
  • Uplink-heavy traffic: Particularly emphasized for mobile AI applications
  • Unpredictable bandwidth requirements: Dynamic and variable data rates
  • Small packet sizes: Frequent transmission of small data units
  • Multi-modal traffic: Synchronization requirements across different modalities
  • Asymmetrical traffic patterns: Different characteristics for UL vs DL
  • Error tolerance: Variable across different AI applications and data types
  • Token-based communication: Specific characteristics for tokenized AI traffic

Latency Characteristics

  • Delay-sensitive traffic: Strict end-to-end latency requirements
  • Low latency for initial packets: Critical for interactive applications
  • Variable packet delay budgets: Dependent on application type
  • Interactive with elastic latency: Some flexibility in certain scenarios

AI Traffic Categorization Approaches

By Real-Time Requirements

Several contributions propose categorization based on timing:

  • Real-time vs non-real-time: Most common distinction
  • Interactive vs non-interactive: Request/response patterns

By AI Codec Usage

Multiple contributions distinguish:

  • AI codec traffic: Native AI representation formats
  • Non-AI codec traffic: Traditional encoding methods
  • Type 1: Real-time AI application with non-AI codec
  • Type 2: Real-time AI application with AI codec
  • Type 3: Non-real-time AI application

By Service Class

Peng Cheng Lab (R2-2600153) proposes detailed service classes:

  • Service Class A: Generative AI and AI Agent Traffic (Token-Streaming Inference)
  • Service Class B: Perception/Analytics AI (Uplink-Intensive Inference), including Split Inference
  • Service Class C: Federated/Distributed Learning and Training Traffic (Bulk, Synchronized Uploads)
  • Composite Class D: AI-Enhanced Immersive Communication (XR + Digital Twin + AI Components)
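
The class assignment could be sketched as a simple decision rule. The boolean flags below are illustrative stand-ins for traffic observations, not fields defined in R2-2600153:

```python
def classify_flow(token_streaming: bool, uplink_heavy: bool,
                  bulk_synchronized: bool, immersive: bool) -> str:
    """Hypothetical mapping onto the service classes proposed in R2-2600153.
    Composite flows are checked first, then the more specific patterns."""
    if immersive:
        return "Composite Class D"  # XR + digital twin + AI components
    if bulk_synchronized:
        return "Service Class C"    # federated/distributed learning uploads
    if uplink_heavy:
        return "Service Class B"    # perception/analytics inference
    if token_streaming:
        return "Service Class A"    # generative AI / agent token streaming
    return "unclassified"
```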

By Use Case

  • Agentic (continuous) vs Non-agentic (bursty): Meta/Qualcomm et al. (R2-2600480)
  • Chatbot, Live AI, AI assistant: Ericsson (R2-2600885)
  • Intermediate data type: From TR 26.927 (NEC R2-2600552)

By Data Type

  • Training data, Model data, Inference data: CATT (R2-2600242)
  • Token vs non-token: Multiple contributions
  • Modality-based importance vs sequence-based importance: Ofinno (R2-2600853)

Release Strategy: 5G Rel-20 vs 6G

5G Rel-20 Focus

Strong consensus on prioritizing:

  • Uplink enhancements: Primary focus for Rel-20
  • Non-real-time applications: Particularly chatbot/GenAI use cases
  • Burstiness and unpredictability handling: Leveraging XR Phase 4 work
  • AI traffic awareness in RAN: Enable service-aware handling

6G Scope

Broader scope proposed for 6G:

  • Real-time uplink and downlink: Full bidirectional support
  • Unified framework: Comprehensive AI traffic handling
  • Native AI communication: AI-native RAN traffic support
  • Flexible QoS: Dynamic adaptation to AI traffic patterns
  • Downlink non-real-time: Extended coverage beyond Rel-20

QoS and RAN Enhancement Proposals

QoS Mechanisms

  • Dynamic QoS support: Constrained latency handling (ZTE R2-2600164)
  • Flexible QoS framework: 6G requirement for AI traffic adaptation
  • Context-aware traffic flow: Enable RAN awareness (Nvidia R2-2600925)
  • Enhanced reliability: Beyond current 5G capabilities (Samsung R2-2600389)
  • PDU Set concept reuse: Leverage XR mechanisms (vivo R2-2600074)

Uplink Enhancements

  • Irregular burst support: Handle unpredictable UL patterns
  • Delay-bound data bursts: Resource-efficient handling without over-provisioning
  • Small packet transmission in RRC inactive: Efficiency improvement
  • UE-assisted uplink reporting prediction: Proactive resource allocation
  • Multi-modal synchronization for uplink: Coordinate different data streams

Scheduling and Resource Management

  • Service awareness at L2: Enable intelligent scheduling decisions
  • UE-based coordination: Context and dependency awareness
  • Error-tolerant token transmission: Exploit AI traffic characteristics
  • Token importance differentiation: Priority-based handling
  • Downlink scheduling enhancement: Network-side optimizations

Multi-modality Support

  • Multi-modal synchronization: Beyond MMSID for QoS control
  • PDU Set binding for AI traffic: Token set/burst handling
  • Dependency structure handling: Inter-stream coordination

Power Efficiency

  • Energy savings for continuous agentic AI: Long-duration applications
  • Tethering and multi-device support: Multi-access scenarios
  • RRC state optimization: Balance latency and power consumption

Explicit SA4 Dependencies and Coordination Requests

Traffic Characteristics Clarification

Multiple contributions request SA4 input on:

  1. Token communication characteristics:
      • Token importance levels and granularity
      • Error tolerance properties
      • Token-to-PDU mapping
      • Dependency between tokens
      • Whether tokenization increases or decreases data size
      • Visibility of tokens to RAN

  2. Packet-level characteristics:
      • Packet delay budget
      • Packet size distributions
      • Packet arrival rates and patterns (streaming vs bursts)
      • Packet error rate tolerance
      • Packet importance variability

  3. Data compression characteristics: Impact on traffic patterns

  4. Multi-modality aspects: Synchronization requirements and characteristics
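
To make the requested trace-data shape concrete, here is a toy generator of bursty, aperiodic small-packet arrivals. All distributions and parameters are placeholders, not SA4-provided statistics:

```python
import random

def bursty_trace(n_bursts: int, seed: int = 0):
    """Illustrative generator for the kind of trace RAN2 requests:
    aperiodic bursts of small packets. Returns (arrival_ms, size_bytes)
    pairs sorted by arrival time."""
    rng = random.Random(seed)
    t = 0.0
    trace = []
    for _ in range(n_bursts):
        t += rng.expovariate(1 / 500)       # aperiodic inter-burst gap, ~500 ms mean
        for _ in range(rng.randint(1, 8)):  # a burst of a few small packets
            t += rng.uniform(0, 2)          # intra-burst jitter
            trace.append((round(t, 3), rng.randint(100, 1500)))
    return trace
```

From such traces one could derive the packet size distributions, arrival patterns, and delay budgets listed above.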

AI Codec Study Coordination

Several contributions explicitly reference or request coordination on:

  • TR 26.847 alignment: Token communication definitions
  • AI representation format clarification: Scope and characteristics
  • AI codec vs non-AI codec traffic: Differentiation and handling
  • Timeline and scope of SA4 AI codec study: Critical for RAN2 planning
  • Trace data provision: For derivation of packet size, arrival rate, delay budget, success rate

Service Type and Application Clarification

Requests for SA4 input on:

  • Service types definition: Categories and characteristics
  • Use case traffic patterns: Specific application behaviors
  • Intermediate data characteristics: From TR 26.927
  • End-to-end latency requirements: Impact on RAN design
  • Traffic encryption: Whether packets are encrypted at application layer

PDU Set and Annotation

  • PDU Set annotation: Importance and token information
  • PDU Set binding for AI traffic: Token set/burst definitions
  • PDU Set handling: AI-specific requirements
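
One way to picture PDU Set binding for token traffic, by analogy with XR PDU Sets, is to group consecutive tokens into a set and annotate each set with its members' maximum importance. Field names and the grouping granularity here are hypothetical, not a defined encoding:

```python
from dataclasses import dataclass

@dataclass
class Token:
    payload: bytes
    importance: int  # higher = more important; granularity is an open SA4 question

def bind_pdu_sets(tokens, tokens_per_set=4):
    """Sketch of PDU Set binding for token traffic: consecutive tokens are
    grouped into a set annotated with the max importance of its members,
    mirroring how XR PDU Sets carry per-set importance for RAN handling."""
    pdu_sets = []
    for i in range(0, len(tokens), tokens_per_set):
        members = tokens[i:i + tokens_per_set]
        pdu_sets.append({
            "seq": i // tokens_per_set,
            "importance": max(t.importance for t in members),
            "tokens": members,
        })
    return pdu_sets
```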

Specific Questions to SA4

  1. China Telecom (R2-2600685): More details on tokens and service types
  2. Spreadtrum/UNISOC (R2-2600673): Token-to-PDU mapping, importance granularity definition, UE processing requirements
  3. Panasonic (R2-2600757): Packet delay budget, packet size, error tolerance of token traffic
  4. Lenovo (R2-2600745): Confirm AI traffic characteristics including data compression, error tolerance, token importance, multimodality, burstiness, unpredictability
  5. CMCC et al. (R2-2600965): Whether token traffic characteristics align with TR 26.847
  6. Apple (R2-2600446): Input on token communication, delay budget, relative priority
  7. Nokia (R2-2600315): PDU Set annotation, importance and token information for AI traffic
  8. Fujitsu (R2-2600347): Tokenized AI feedback
  9. Samsung (R2-2600389): PDU Set handling of AI-related traffic
  10. HONOR (R2-2600515): Mobile AI arrival patterns (streaming or bursts) and corresponding characteristics
  11. NEC (R2-2600552): Intermediate data traffic characteristics from TR 26.927
  12. Peng Cheng Lab (R2-2600153): PDU Set binding for AI traffic, dependency structure, traffic model input
  13. OPPO et al. (R2-2600206): Timeline and scope of SA4 study, whether token/packet characteristics are in scope, AI representation format clarification
  14. Sharp (R2-2600183): Token traffic characteristics support
  15. CATT (R2-2600242): Burst traffic confirmation, end-to-end latency impact, encryption status, importance and error tolerance modeling, RAN visibility of tokens
  16. vivo (R2-2600074): Burst characteristics, end-to-end latency, traffic encryption, error tolerance, AI token characteristics, token visibility to RAN
  17. Huawei/HiSilicon (R2-2600148): Trace data for packet characteristics, whether AI codec apps have error tolerance and variable packet importance

Liaison and Coordination Proposals

Several contributions propose formal coordination:

  • OPPO et al. (R2-2600206): Inform SA4 about RAN2 decisions and progress, get timeline/scope information
  • vivo (R2-2600074): Inform SA4 that RAN2 leads AI traffic work in RAN
  • Peng Cheng Lab (R2-2600153): Send LS to SA2/SA4 to clarify service awareness points
  • Ericsson (R2-2600885): Coordinate with SA (not only SA4 but SA in general)

Divergent Views and Open Issues

Traffic Model Approach

  • Qualcomm et al. (R2-2600138): Adopt XR traffic models for real-time, MBB models for non-real-time
  • AT&T (R2-2600890): Proposes text-based conversational GenAI traffic model (suggests RAN1 scope)
  • Ericsson (R2-2600885): Cautions against optimizing for specific AI applications

Scope and Prioritization

  • MediaTek (R2-2600901): Stop referring to tokenizer, enhance UL, wait for AI codec study in SA4
  • CATT (R2-2600242): Prioritize network inference in RAN2
  • Lenovo (R2-2600745): Rel-20 XR Phase 4 focus on uplink, unified framework in 6G
  • Hanbat Univ (R2-2600409): Include AI native RAN traffic and RedCap

Error Tolerance Determination

  • NTT DOCOMO (R2-2600978): RAN2 should proactively determine error tolerance based on AI task and source data type, with concrete test results provided
  • Multiple others: Request SA4 to define error tolerance characteristics

XR Relationship

  • Ericsson (R2-2600885): Need to understand difference between XR and AI traffic
  • Fujitsu (R2-2600347): Need gap analysis from XR
  • Multiple others: Propose reusing XR mechanisms (PDU Set, traffic models)

Technical Contributions Summary by Topic

Token Communication (17 contributions)

Nvidia, Ofinno, China Telecom, Spreadtrum/UNISOC, Panasonic, Lenovo, CMCC et al., Apple, Nokia, Fujitsu, Samsung, HONOR, Peng Cheng Lab, OPPO et al., Sharp, CATT, vivo

Key aspects: Importance differentiation, error tolerance, dependency, compression, RAN visibility, PDU mapping

Burst Traffic Handling (20+ contributions)

Nearly universal recognition of bursty, aperiodic traffic requiring specific RAN enhancements

Uplink Enhancement (15+ contributions)

Strong consensus on Rel-20 focus for uplink mobile AI traffic with burstiness, unpredictability, and interactive characteristics

Multi-modality (8 contributions)

Fraunhofer, Meta/Qualcomm et al., ZTE, Peng Cheng Lab, Samsung, Lenovo, HONOR, Nokia

Key aspects: Synchronization, MMSID usage, multi-device scenarios

Error Tolerance (12 contributions)

Ofinno, China Telecom, Spreadtrum/UNISOC, Panasonic, Lenovo, CMCC et al., NTT DOCOMO, Fujitsu, Samsung, CATT, vivo, Huawei/HiSilicon

Key aspects: Variable tolerance, task-dependent, token-specific, importance-based

Service Awareness (6 contributions)

Nvidia, Nokia, ZTE, Peng Cheng Lab, Xiaomi, Huawei/HiSilicon

Key aspects: Context-aware flow, L2 scheduling, UE-assisted coordination

Dynamic QoS (7 contributions)

ZTE, Meta/Qualcomm et al., Samsung, Nokia, Apple, HONOR, Huawei/HiSilicon

Key aspects: Flexible adaptation, constrained latency, relative priorities

Recommendations

The document recommends taking into account the explicit dependencies identified by RAN2 and prioritizing discussion around them, particularly:

  1. Token communication characteristics and RAN visibility
  2. Packet-level traffic characteristics (size, arrival patterns, delay budgets)
  3. Error tolerance properties and importance differentiation
  4. AI codec study timeline and scope alignment
  5. PDU Set binding and annotation for AI traffic
  6. Multi-modality synchronization requirements
  7. Traffic model inputs and trace data provision
