All Summaries

Meeting: TSGS4_135_India | Agenda Item: 10.5

17 documents found

Huawei Tech. (UK) Co., Ltd
Title

Network, QoS and UE Considerations for client side inferencing AIML/IMS

Network, QoS and UE Considerations for Client Side Inferencing AIML/IMS

1. Introduction

This contribution addresses network-related issues in the previously discussed call flow for client/UE side inferencing (S4aR260004a). The main concerns relate to steps 12-16 of the draft call flow, which involve model download and deployment for UE-based AI inferencing.

2. Network Related Issues

2.1 Model Size

Problem Identification:

  • TR 26.927 indicates models are approximately 40 MB (Table 6.6.2-1)
  • Current publicly available models for practical use cases are significantly larger (100+ GB)
  • Example: the Hunyuan image generation model set is 169 GB (available on Hugging Face)
  • Simple language models (e.g., single-language translation) are approximately 100 MB

Required Action: Details on supported model sizes and required response times need to be defined.

2.2 Network QoS Support

Problem Identification:

  • For real-time request-response (500 ms or even 1000 ms), current mobile networks cannot support the required bit-rates
  • Example calculation: a 100 GB model with a 1000 ms response time requires ~800 Gbps
  • Such bit-rates are not realistic in current mobile networks
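As a sanity check, the figure follows directly from the model size and the deadline:

    100 GB × 8 bit/byte = 800 Gbit,  and  800 Gbit ÷ 1 s = 800 Gbit/s ≈ 800 Gbps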

Required Actions:

  • Define supported model size and transfer time requirements
  • Identify an appropriate QoS profile (5QI)
  • If no suitable 5QI exists, request SA2 to update the 5QI specifications for this use case

2.3 Compression and UE Support

Problem Identification:

  • TR 26.927 details NN compression with 2-20% compression ratios
  • Even with compression, the resulting bit-rates remain infeasible for mobile networks
  • No UE capabilities for NN codec support have been defined
  • UE support for such capabilities cannot be assumed

Required Action: Clarify whether NNC is required for client-side inferencing and document related requirements.

2.4 Protocol Support Issue

Problem Identification:

  • S4aR260004a mentions HTTP for download
  • HTTP/TCP is suboptimal for large, quick data downloads due to:
    • TCP slow start
    • Congestion control introducing additional latency
    • Tail latency from head-of-line blocking

Proposed Solutions:

  • Consider alternative protocols:
    • RTP with 3GPP burst QoS
    • QUIC (has bindings to the 5G XRM framework for improved QoS support)
  • Leverage 3GPP XRM QoS support for bursty data transfer (HTTP/3 with QUIC, or RTP)

2.5 Caching and Bandwidth Wastage

Problem Identification:

  • The current call flow indicates a model download for every request
  • No explicit caching or model update mechanism
  • This results in:
    • Huge bandwidth wastage
    • Network bit-rate requirements that are impossible in current mobile networks

Required Action: Include model updates and caching mechanisms in call flow rather than requesting new model from network each time.

3. Suggested Way Forward

The contribution emphasizes that the intention is not to exclude UE inferencing (as agreed for the work item), but to clarify limitations and requirements before agreeing to a CR detailing such call flows.

Proposed Actions:

  1. Scope Limitation: Add a note that client-side inferencing only works for simple cases:
    • Explicitly exclude complex VLM/LLM
    • Define maximum model size limits
    • Specify applicable use cases for smaller models

  2. Latency Requirements: Clarify end-to-end latency requirements and derive the required bit-rate/latency and loss profiles

  3. Protocol Clarification: Clarify correct protocol usage (typically not HTTP/TCP) to support the use case with the required latency

  4. SA2 Coordination: Ask SA2:
    • How such bursts can be supported
    • Whether a new QoS profile is needed or existing profiles suffice

  5. Codec Support: Clarify the required neural network codec support (if any) for the UE

  6. Caching Mechanism: Add caching and model update mechanisms to the call flow to avoid downloading the model for each task


Nokia
Title

[AI_IMS-MED] AI/ML media processing and task updating

Summary of S4-260112: AI/ML Media Processing and Task Updating

Document Overview

This contribution proposes updates to AI/ML media processing procedures and task updating call flows for IMS Data Channel (DC) applications. It builds upon TR 26.927 and TS 23.228 Annex AC, incorporating agreements from SA4#134 (S4-252075) and addressing feedback from SA2's reply LS on AIML for Media.

Main Technical Contributions

1. Refinement of AI/ML Task Processing Call Flows

Issues Identified with TR 26.927

  • Architectural ambiguity: Split media processing location (UE vs network) was unclear
  • DC AS introduction timing: Not properly specified after ADC establishment
  • Confusing step numbering: Parallel options (5a, 5b) caused confusion
  • MRF references: MRF should be removed as SA2 clarified it doesn't play a role in Data Channel (removed from TS 23.228)
  • Incomplete task update procedures: Steps 9-10 lacked detail on how UE updates AI/ML inference tasks

Updated Call Flow Structure (Steps 1-23)

The revised flow incorporates common call flows agreed in S4-252075:

Initial Setup (Steps 1-13):

  • UE1 registers to IMS with AI/ML capability indication
  • MMTel session establishment between UE1 and UE2
  • Bootstrap Data Channel (BDC) establishment between UE1 and MF
  • DCSF creates the DC application list based on:
    • Subscription list filter
    • UE static capabilities
  • The application list includes AI service information (e.g., an intelligent translation service)
  • UE1 downloads the application list and selects an application
  • Application Data Channel (ADC) establishment between UE1 and DC AS
  • Task selection and AI/ML model selection

Media Processing Execution (Steps 14-16):

  • Media session runs over the MMTel session
  • UE1 executes the selected task and transmits input media streams
  • Network runs inference and forwards the processed streams to UE2 (or UE1, or both, depending on the application)
  • Different alternatives are supported based on inference location (local/remote/split)

2. Task Reselection and Update Mechanisms

Task Reselection (Step 17)

  • Trigger: New actions in applications or other triggers during session
  • Process: UE1 reselects tasks from previously downloaded task metadata
  • Flow: Returns to step 10 (task selection from app manifest)

Task Update (Steps 17-23)

  • Use Case: New requirements during running IMS session not fulfilled by downloaded tasks
  • Example: New callee (UE3) joins call speaking new language requiring additional translation

Update Procedure:

  • Step 17: UE1 sends an UPDATE Task request over the ADC with:
    • Task ID
    • New parameters
    • Start time (when to apply the new parameters)
    • Optional additional parameters

  • Steps 18-19:
    • MF checks the request and reconfigures the task
    • MF may reject invalid requests
    • MF may establish new application DCs or media flows if needed
    • MF may stop existing flows that are no longer needed
    • MF forwards the UPDATE Task request to the DC AS if needed
    • DC AS reconfigures the task according to the new parameters

Alternative Execution Paths (Steps 20-22):

  • Alt a - Local Inference:
    • DC AS sends the UPDATE Task response (including new models) to UE1 via MF
    • UE1 runs the updated inference task locally

  • Alt b - Remote/Split Inference:
    • DC AS sends the UPDATE Task response to UE1 via MF
    • UE1 transmits media streams to the network for inference
    • Network runs inference and forwards the processed streams to UE2

  • Step 23: The remote UE (UE2) is informed when task updates impact it

3. Task Control Messages

3.1 START Task Message

Purpose: Request to start an inference task (for split or remote inference)

Message Content:

  • id: Message identifier
  • type: "urn:3gpp:aiml:start-task"
  • task_id: Task identifier (e.g., "speech-to-speech-translation")
  • parameters: Task-specific parameters (e.g., inputLanguage, outputLanguage)
  • input: Protocol and media stream identifier (mid from SDP)
  • output: Protocol and media stream identifier
  • timestamp: Timestamp of the request

Response Message:

  • task_session_id: Unique identifier for the specific task instance
  • response_code: Status (e.g., "200 OK")
  • Echoes task_id and parameters

Media Stream Identification: Uses "mid" identifier from RFC 8843 as included in SDP offer/answer. Multiple RTP streams identified by comma-separated mid values.
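For illustration, a minimal sketch of a START Task request using the fields above; the message id, mid values, and timestamp format are placeholders rather than normative syntax:

```json
{
  "id": "msg-001",
  "type": "urn:3gpp:aiml:start-task",
  "task_id": "speech-to-speech-translation",
  "parameters": { "inputLanguage": "en", "outputLanguage": "es" },
  "input": { "protocol": "RTP", "mid": "a1" },
  "output": { "protocol": "RTP", "mid": "a2" },
  "timestamp": "2026-02-09T10:15:00Z"
}
```

A corresponding response would echo the task and assign the session identifier:

```json
{
  "task_session_id": "tsess-42",
  "response_code": "200 OK",
  "task_id": "speech-to-speech-translation",
  "parameters": { "inputLanguage": "en", "outputLanguage": "es" }
}
```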

3.2 UPDATE Task Message

Purpose: Update existing task that has already been started (requires prior 200 OK response to START Task)

Use Cases:

  • Update the model, parameters, input, or output of an existing task
  • Indicate a new input/output stream (e.g., a new UE added to the call)

Message Content:

  • id: Message identifier
  • type: "urn:3gpp:aiml:update-task"
  • task_id: Task identifier
  • task_session_id: References the task from the START Task response
  • parameters: Updated parameters
  • output: Updated output stream information
  • timestamp: Timestamp of the request

Response Message:

  • task_session_id: Same as in the request
  • response_code: Status indication
  • Confirms task_id
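Continuing the sketch above, an UPDATE Task request for the new-callee scenario (a UE3 joins speaking a new language); the added mid and all values are placeholders:

```json
{
  "id": "msg-002",
  "type": "urn:3gpp:aiml:update-task",
  "task_id": "speech-to-speech-translation",
  "task_session_id": "tsess-42",
  "parameters": { "outputLanguage": "fr" },
  "output": { "protocol": "RTP", "mid": "a2,a3" },
  "timestamp": "2026-02-09T10:20:00Z"
}
```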

Key Technical Clarifications

Inference Location Flexibility

The specification supports three inference deployment models:

  1. Local inference: AI model downloaded and executed in the UE
  2. Remote inference: Inference executed in the network (MF)
  3. Split inference: Inference split between UE and network

Message Exchange Protocol

  • Task control messages exchanged over Application Data Channel (AaDC)
  • Messages use structured format with JSON-like syntax
  • Unique identifiers (task_session_id) maintain task context across updates

Network Entity Roles

  • DCSF: Creates and filters DC application list based on subscription and UE capabilities
  • MF: Manages media flows, coordinates with DC AS, executes inference tasks
  • DC AS: Provides AI applications and models, reconfigures tasks
  • MRF: Explicitly removed from procedures (per SA2 clarification)

Editorial Notes

  • Network functional entity for inference task execution depends on SA2's reply LS
  • Further details on message formats to be provided in future contributions

Samsung Electronics Iberia SA
Title

[AIML_IMS-MED] Base CR for TR 26.114

3GPP Technical Document Summary: CR 0607 to TS 26.114

Document Information

  • CR Number: 0607
  • Specification: TS 26.114 v19.2.0
  • Category: B (addition of feature)
  • Release: Rel-20
  • Work Item: AIML_IMS-MED
  • Source: Samsung Electronics Iberia SA

Purpose and Rationale

This Change Request introduces stage 3 specifications for AI/ML processing capabilities in IMS services. The CR addresses the missing technical specifications for AI/ML data delivery and signaling mechanisms required to support AI/ML-enhanced IMS services in Release 20.

Main Technical Contributions

1. References, Terms, and Abbreviations (Clauses 2, 3.1, 3.3)

Updates to include AI/ML-specific terminology, definitions, and abbreviations relevant to IMS services. Specific content marked as Editor's Notes for future completion.

2. New Annex AC: AI/ML Assisted Media Processing for MTSI

A comprehensive new normative annex is introduced covering all aspects of AI/ML integration with MTSI.

AC.1 Introduction

Provides introductory material on AI/ML capabilities in IMS services.

AC.2 Terminal Architecture

Defines updates to the terminal architecture to accommodate:

  • Inference engine
  • AI/ML models
  • Intermediate data handling

AC.3 End-to-End Reference Architecture

Potential updates to the end-to-end reference architecture for AI/ML support. Notes indicate possible liaison requirements with SA2.

3. AI/ML Call Flows (AC.4)

AC.4.1 AI/ML Model Delivery for Device Inferencing

Detailed 15-step call flow for AI/ML model delivery and execution:

Key Steps:

  1. Session Establishment: MMTel service establishment
  2. Bootstrap Data Channel (BDC) Setup: Established between UE and MF per TS 23.228
  3. Application Discovery: UE requests the application list via HTTP over the BDC
  4. Application List Creation: DCSF generates a user-specific DC application list with metadata including:
    • Generic app information (description, ID, URL)
    • AI-specific information (AI feature tags, task descriptions)
  5. Application Selection: User selects an app based on the AI service descriptions
  6-9. Application Download: The selected AI application is downloaded from DCSF via MF to the UE, including AI task metadata (the task manifest)
  10. Task Selection: User is presented with the AI task list and selects the desired tasks
  11. Model Request: Selected tasks and models are communicated to the MF via either:
    • BDC: HTTP GET with task/model URLs
    • ADC: AI Model Selection Request with model URNs
  12. Model Retrieval: MF fetches the AI models from either:
    • 12a: DCAR via DCSF
    • 12b: DC AS
  13. Model Download: UE downloads the AI models from the MF via either:
    • BDC: HTTP response with the AI models as a resource
    • ADC: AI Model Selection Response with model data
  14. Inference Execution: Tasks are executed on the UE
  15. Task Reselection: User/UE may reselect tasks during the session using the received metadata

Open Issues Identified:

  • Whether the MF needs to understand AI task semantics (FFS)
  • Application types that can be handled
  • Large model handling mechanisms

AC.4.2 Network Inferencing

Placeholder for network-based inference scenarios.

AC.4.3 Split Inferencing

Placeholder for distributed inference scenarios across UE and network.

4. AI/ML Capabilities (AC.5)

Defines capabilities and requirements for:

  • AC.5.1 UE Capabilities: Device-side AI/ML requirements
  • AC.5.2 Network Capabilities: Network-side AI/ML requirements

5. AI/ML Formats (AC.6)

Specification of formats for:

  • AI/ML models
  • Intermediate data

6. AI/ML Metadata (AC.7)

Definition of necessary metadata structures for AI/ML operations, including task manifests referenced in the call flows.

7. Negotiation and Signaling (AC.8)

Procedures for:

  • Model delivery negotiation
  • Inferencing coordination
  • General AI/ML media processing signaling

8. Data Channel Transport (AC.9)

Specification of AI/ML data transport mechanisms:

  • What data to transport over the BDC (Bootstrap Data Channel)
  • What data to transport over the ADC (Application Data Channel)
  • Transport procedures and protocols

Key Technical Entities

  • MF: Media Function
  • DCSF: Data Channel Selection Function
  • DCAR: Data Channel Application Repository
  • DC AS: Data Channel Application Server
  • BDC: Bootstrap Data Channel
  • ADC: Application Data Channel

Implementation Status

Most technical content is marked with Editor's Notes, indicating this is a skeleton CR establishing the structure for future detailed specifications. The most complete section is AC.4.1 (AI/ML model delivery for device inferencing), which provides a concrete call flow example.


Samsung Electronics Iberia SA
Title

[AIML_IMS-MED] Further details on DC app list

Further Details on DC Application List

Introduction

This contribution consolidates relevant text from existing 3GPP specifications (TS 23.228, TS 29.176, and TS 26.114) regarding the Data Channel (DC) application list request mechanism, particularly focusing on root URL replacement procedures. The document aims to clarify how these procedures are already defined in the context of Bootstrap Data Channel (BDC) setup and proposes their reuse for AIML_IMS-MED work.

Relevant Specifications Overview

Bootstrap Data Channel Setup Signalling (TS 23.228 Clause AC.7.1)

The specification defines the complete BDC establishment procedure for person-to-person use cases where the Media Function (MF) anchors the bootstrap data channel:

Key Steps:

  • Steps 1-2: UE#1 sends SIP INVITE with initial SDP containing bootstrap DC offers. IMS AS validates user subscription and determines if DCSF notification is required.

  • Steps 3-6: IMS AS notifies DCSF via Nimsas_SessionEventControl_Notify with session parameters. DCSF determines policy, reserves MDC1 media information for both originating and terminating sides, and responds with Nimsas_MediaControl_MediaInstruction containing:
    • MDC1 media endpoint addresses
    • DC Stream ID
    • Replacement HTTP URL representing the application list offered via the MDC1 interface

  • Steps 7-10: IMS AS selects MF and invokes Nmf_MRM_Create to allocate DC media resources. The request includes information for both Mb and MDC1 interfaces. MF responds with negotiated media resource information.

  • Steps 11-19: SDP negotiation completes through terminating network, with similar DCSF/MF resource allocation on terminating side. Bootstrap data channels are established.

  • Steps 20-24: Critical application list request flow:
    • UEs send application request messages to the MF via the established bootstrap data channel
    • MF replaces the root URL with the replacement URL received in step 8
    • MF forwards the message to the DCSF media endpoint
    • DCSF provides the application list and DC applications to the UEs, based on capabilities and choices, through the MF
    • Either UE may select applications from the local or remote DCSF (subject to DCSF policy)

Media Control Service Operation (TS 23.228 Clause AA.2.4.3.2)

The Nimsas_MediaControl_MediaInstruction service operation defines the MediaInstructionSet structure:

DC Media Specification includes:

  • Media proxy configuration (HTTP or UDP)
  • MDC1/MDC2 media endpoint address
  • Replacement HTTP URL per stream ID allocated by application layer representing the application list (e.g., graphical user interface) provided to IMS subscriber via MDC1 interface (used only in BDC establishment)
  • Data Channel Mapping and Configuration information
  • DC Interworking indication
  • Data Channel Port and SCTP Port

Media Instructions supported: TerminateMedia, OriginateMedia, TerminateAndOriginateMedia, UpdateMedia, DeleteMedia, RejectMedia.

MF Resource Management (TS 29.176 Clause 5.2.2.2)

The Nmf_MRM_Create service operation defines how NF service consumer (IMS AS) requests media context creation:

For DC media resource type, the request includes:

  • Media proxy configuration in mediaProxyConfig attribute
  • Data channel mapping and configuration in streams attribute (SCTP stream ID, subprotocol, order, maxRetry, maxTime, priority)
  • Remote SCTP and DTLS endpoint information in remoteDcEndpoint
  • Optional maximum message size

For bootstrap data channel specifically:

  • Remote MDC1 media specification in remoteMdc1Endpoint attribute within Mdc1Info data type
  • Replacement HTTP URL for each streamId allocated by application layer representing the application list offered to IMS subscriber via MDC1 interface

For P2A/A2P application data channel:

  • Remote MDC2 media specification in remoteMdc2Endpoint attribute within Mdc2Info data type

Data Channel Application Definition (TS 26.114 Clause 6.2.10.1)

A data channel application consists of:

  • An HTML web page including JavaScript(s)
  • Optionally, image(s) and style sheet(s)

The bootstrap data channel is defined as:

  • A data channel used to retrieve DC application(s) for a DCMTSI client
  • Data channel stream ID below 1000
  • Uses HTTP as the data channel subprotocol
  • The application accessible at the HTTP root ("/") URL describes the GUI and logic for further data channel usage
  • The authority (host) part of the URL and the "Host" HTTP header shall be ignored on reception and set to an empty value by the DCMTSI client

Discussion

The complete flow for application list handling is already specified:

  1. DCSF provides replacement HTTP URL to IMS AS in MediaInstructionSet (representing application list)
  2. IMS AS forwards replacement HTTP URL to MF during resource allocation via Nmf_MRM_Create
  3. UE sends HTTP GET request for application list to MF via bootstrap data channel
  4. MF performs root URL replacement using the replacement HTTP URL received from IMS AS
  5. MF forwards request to DCSF media endpoint (MDC1)
  6. DCSF provides application list and selected DC applications to UE through MF

The specifications explicitly state that "the details of how to provide the application list to the UE and how to use it by the UE are not defined in TS 23.228," but the transport mechanism and URL replacement procedures are fully defined.

Proposal

From the UE perspective, the following procedures are already well-defined in TS 23.228 as part of BDC setup signalling:

  • Request of an application list
  • Download of the application list
  • Request of a selected application
  • Download of the selected application

For AIML_IMS-MED work:

  1. Reuse these existing procedures and HTTP protocol for the same purposes
  2. Capability exchange negotiation between UE and MF (e.g., for task and/or model selection) should happen after the selection and download of a DC application
  3. Capability exchange should occur via an application data channel established for that specific application

This approach leverages existing standardized mechanisms and maintains consistency with current IMS DC architecture.


Samsung Electronics Iberia SA
Title

[AIML_IMS-MED] Call flow for split inferencing

Summary of S4-260129: Call Flow for Split Inferencing

Document Information

  • Source: Samsung Electronics Co., Ltd.
  • Meeting: TSG-SA WG4 Meeting #135 (February 2026, Goa, India)
  • Work Item: AIML_IMS-MED
  • Purpose: Approval of call flow for split inferencing

Main Technical Contribution

This document proposes a detailed call flow for split inferencing in IMS-based AI/ML services, where AI model execution is distributed between the UE and network elements (MF - Media Function). The contribution is intended for inclusion in clause AC.4.3 of the base Change Request.

Split Inferencing Call Flow

Session Establishment and Bootstrap (Steps 1-2)

  • MMTel service establishment
  • Bootstrap Data Channel (BDC) establishment between UE and MF per TS 23.228, clause AC.7.1

Application Discovery and Selection (Steps 3-6)

  • Application List Request: UE requests available DC applications from MF via HTTP over BDC
  • MF Routing: MF replaces root URL with replacement URL and forwards to DCSF
  • Application Metadata Creation: DCSF creates a user-specific DC application list based on:
    • User subscription information
    • Application metadata, including:
      • Generic app information (description, app ID, URL)
      • AI-specific information (AI feature tag indicating requirements, AI task descriptions)
  • Metadata Delivery: DCSF provides application list URL and metadata to UE via MF
  • User Selection: User selects application based on AI service description and AI task annotations

Application Download (Steps 7-9)

  • UE requests selected application from MF
  • MF fetches AI application from DCSF
  • Application downloaded to UE via BDC along with AI task metadata (expressed as task manifest per clause AC.7)

AI Task Selection and Configuration (Steps 10-13)

  • Task Presentation: User is presented with the list of AI tasks supported by the application, including:
    • Annotations from the AI task metadata
    • Task description information
    • Information on the execution endpoints supported by each task/subtask
  • User Task Selection: User selects the desired AI task(s)
  • Application Data Channel: Established between UE and DC AS per TS 23.228, clause AC.7.2
  • Split Configuration Decision: UE identifies which tasks/AI models to execute locally vs. in the network based on:
    • User-selected AI tasks
    • AI task metadata
    • UE capabilities
  • Configuration Request: UE requests split inference configuration from network, identifying AI models for UE and network execution

Model Distribution and Configuration Response (Steps 14-16)

  • Requirements Check: MF verifies requirements for network-side AI tasks/models; MF reallocation if requirements not met
  • Model Fetching: MF obtains AI models for both UE and network execution from either:
    • DCAR via DCSF (step 15a), or
    • DC AS (step 15b - alternative)
  • Configuration Response: MF sends response to UE including AI models for UE execution

Inference Execution (Steps 17-22)

  • SDP Re-negotiation: Associates media/data/intermediate data flows between UE and MF with corresponding tasks
  • UE Inference: Tasks designated for UE execution are performed
  • Data Transfer to Network: Output (media/data/intermediate data) from UE tasks sent to MF
  • Network Inference: MF executes tasks designated for network execution
  • Result Delivery: MF sends output (results or intermediate data for further UE processing) to UE
  • Optional Further UE Processing: UE may execute additional tasks as part of selected AI task(s)

Dynamic Task Reselection (Step 23)

  • User/UE may reselect AI tasks during session using AI task metadata from step 9
  • On reselection, flow returns to step 12 (split configuration decision)

Key Technical Features

Metadata Framework

  • Application metadata includes both generic and AI-specific information
  • AI task metadata (task manifest) provides detailed information on:
    • Task descriptions
    • Execution endpoint options
    • Requirements for split execution

Flexibility in Execution Distribution

  • UE determines split configuration based on capabilities and metadata
  • Network validates requirements and may reallocate MF resources
  • Dynamic task reselection supported during active session

Model Distribution Options

  • Multiple sources for AI model retrieval (DCAR via DCSF or DC AS)
  • Models distributed to both UE and network as needed for split execution

Media/Data Flow Management

  • SDP re-negotiation ensures proper association of data flows with tasks
  • Support for intermediate data exchange between UE and network for multi-stage inference pipelines

InterDigital Finland Oy
Title

[AIML_IMS-MED] Call flow for split inferencing

Comprehensive Summary of S4-260180: Call Flow for Split Inferencing

Document Overview

This change request proposes updates to the AIML call flow for split inferencing in IMS-based media services. It revises the previously agreed device inferencing call flow (S4aR260014) to accommodate split inferencing scenarios where AI model execution is partitioned between the UE and network-based DC AS (Data Channel Application Server).

Main Technical Contributions

1. Split Inferencing Capability Indication

Key Addition:

  • The UE now indicates split inferencing availability in the application request message sent to the MF (Media Function) when requesting the application list via the Bootstrap Data Channel (BDC)
  • This allows the network to understand the UE's capability to participate in distributed AI inference

2. Enhanced Application and Task Selection

Application Metadata Enhancements: Application-related metadata now includes:

  • Generic app information (description, app ID, URL)
  • AI-specific information, including AI feature tags indicating AI requirements
  • AI task-related descriptions for user-informed selection

Task Metadata:

  • AI task metadata is delivered with the application, potentially expressed as a task manifest
  • The task list presented to users includes annotations from the AI task metadata
  • The execution endpoints supported by each task and subtask are now exposed to enable split inference decisions

3. Model Partitioning Framework

Partitioning List Introduction: The CR introduces a comprehensive partitioning framework:

Request Phase (Step 10):

  • The UE requests both a model list and a partitioning list from the DCAS
  • The UE provides its capability metadata to enable appropriate partitioning options

Partitioning Metadata Definition: The partitioning list (submodel partitioning metadata) specifies:

  • Submodel identifiers - unique identification of model partitions
  • Execution endpoints - where each submodel executes (UE vs. network)
  • Input/output tensor characteristics - data interfaces between submodels
  • Operational characteristics - performance and resource requirements

Download Phase (Step 12): The UE downloads both the model list and the partitioning list corresponding to its capabilities.

4. User-Driven Partition Selection

Selection Criteria (Step 13):

  • The user is presented with lists of both the models and the partitions supported by the UE
  • The user selects the desired AI model(s) and partition
  • Partition selection may be based on:
    • Load distribution preferences
    • Battery impact considerations
    • Other task execution preferences

5. Split Inference Configuration and Execution

Configuration Phase (Step 14):

  • The UE configures split inference with the DCAS by selecting:
    • A specific model
    • A specific partition
  • From these selections, the corresponding submodel(s) to be executed are derived

Server-Side Preparation (Step 15):

  • DCAS prepares the server-side execution context
  • DCAS registers the sub-model(s) and associated metadata with the selected partitioning

Configuration Confirmation (Step 16):

  • DCAS indicates whether the requested configuration is accepted
  • DCAS confirms readiness to execute the server-side sub-model(s)

Submodel Deployment (Steps 17-18):

  • The selected tasks/models and corresponding AI submodels are communicated to the DCAS
  • The UE downloads the AI submodel(s) corresponding to the subtasks to be executed on the device side

Execution (Step 19): Tasks identified for split inference between the UE and the DCAS are executed in a distributed manner.

Key Differences from Device Inferencing

The main distinctions from pure device inferencing include:

  1. Distributed execution model - inference split across UE and network
  2. Partitioning metadata - new information element defining how models are divided
  3. Negotiation phase - explicit configuration of split points and execution distribution
  4. Submodel management - separate handling of device-side and server-side model components
  5. Execution coordination - mechanisms for DCAS to prepare and confirm readiness for server-side execution

Open Issues

The document notes one FFS (For Further Study) item:

  • How device capabilities are sent to obtain an accurate list of models (noted after Step 6)


InterDigital Finland Oy
Title

[AIML_IMS-MED] Negotiation messages

Summary of 3GPP Technical Document S4-260181

Document Overview

This is a revision of S4aR260012 proposing additional details for negotiation messages and associated metadata in support of AI/ML-based media services (AIML_IMS-MED). The document provides JSON-formatted metadata examples and updates to align with the agreed call flow from S4aR260014.

Main Technical Contributions

1. Negotiation Message Summary Table (Section A.4.2)

The document introduces Table A4.2-1 which defines the complete set of negotiation messages for local inferencing call flows. Key updates include:

  • AI_APPLICATION_DISCOVERY_REQUEST/RESPONSE: Discovery of AI/ML application families/types with optional UE capability filtering
  • AI_APPLICATION_REQUEST/RESPONSE: Selection of specific AI/ML application with URN, returning application binary data and metadata
  • CANDIDATE_MODELS_LIST_REQUEST/RESPONSE: Renamed from previous version, exchanges UE capabilities for list of candidate models
  • AI_MODEL_SELECTION_REQUEST/RESPONSE: Model selection using URN(s), returning model binary data and metadata

Each message is mapped to possible HTTP protocol operations (GET, POST, RESPONSE) and associated metadata parameters.

2. Metadata Information Definitions (Section A.4.3)

A.4.3.1 Application Metadata

Defines characteristics and requirements of AI/ML applications including:

  • applicationIdentifier: URN-based identification
  • taskList: Contains task type identifiers, supported task types (ASR, TTS, Translation)
  • Performance constraints:
    • maximumTaskInferenceLatency (milliseconds)
    • minimumTaskInferenceAccuracy
    • maximumLocalEnergyConsumption (joules)
    • taskAccuracy (e.g., mAP score)
  • taskOperationalCharacteristics: computeIntensity, memoryFootprint, latencySensitivity, energySensitivity
  • associatedModels: List of models with modelName and modelDescription
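A minimal JSON sketch of such application metadata, assuming one plausible nesting of the parameters listed above (the exact schema is not fixed by this summary; URNs and values are illustrative):

```json
{
  "applicationIdentifier": "urn:example:aiml:app:live-translation",
  "taskList": [
    {
      "taskType": "Translation",
      "maximumTaskInferenceLatency": 500,
      "minimumTaskInferenceAccuracy": 0.90,
      "maximumLocalEnergyConsumption": 2.0,
      "taskOperationalCharacteristics": {
        "computeIntensity": "medium",
        "memoryFootprint": "low",
        "latencySensitivity": "high",
        "energySensitivity": "medium"
      }
    }
  ],
  "associatedModels": [
    { "modelName": "nmt-en-es-small", "modelDescription": "Compact EN-to-ES translation model" }
  ]
}
```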

A.4.3.2 Endpoint Capabilities Metadata

Separates capabilities into static and dynamic categories:

Static Capabilities (fixed or infrequently changed):

  • endpointIdentifier
  • flopsProcessingCapabilities (peak compute in FLOPS)
  • macOpProcessingCapabilities (MAC operations)
  • supportedAiMlFrameworks
  • accelerationSupported (boolean)
  • supportedEngines (CPU, GPU, NPU)
  • supportedPrecision (FP32, FP16, INT8)

Dynamic Capabilities (runtime-dependent):

  • availableMemorySize
  • currentComputeLoad
  • energyMode (Eco/Balanced/Performance)
  • batteryLevel
  • acceleratorAvailability

This separation enables both long-term compatibility checks and short-term runtime optimization.
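A hedged sketch of endpoint capabilities grouped along that static/dynamic split (the grouping keys and all values are assumptions for illustration):

```json
{
  "endpointIdentifier": "urn:example:endpoint:ue-1234",
  "static": {
    "flopsProcessingCapabilities": 2.0e12,
    "macOpProcessingCapabilities": 1.0e12,
    "supportedAiMlFrameworks": ["ONNX", "TFLite"],
    "accelerationSupported": true,
    "supportedEngines": ["CPU", "GPU", "NPU"],
    "supportedPrecision": ["FP32", "FP16", "INT8"]
  },
  "dynamic": {
    "availableMemorySize": 512,
    "currentComputeLoad": 0.35,
    "energyMode": "Balanced",
    "batteryLevel": 0.80,
    "acceleratorAvailability": true
  }
}
```

A receiver could cache the static block once and refresh only the dynamic block during the session.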

A.4.3.3 Model Information Metadata

Comprehensive model characterization including:

  • Identification: modelIdentifier (URN), taskIdentifier (supports multi-task models)
  • Model properties: modelSize (MB), format, formatVersion, framework, frameworkVersion
  • Input/Output specifications:
    • inputMediaIdentifier, inputType, inputShape
    • outputIdentifier, outputType, outputShape, outputAccuracy
  • Performance metrics:
    • targetInferenceLatency (with hardwarePlatformIdentifier)
    • flopsProcessingCapabilities
    • macOpProcessingCapabilities
    • energyEstimation (joules, platform-specific)
  • Data types: modelDataType (Uint8, Float32, Float16)
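A corresponding model-information sketch using the fields above (structure and values are illustrative, not normative):

```json
{
  "modelIdentifier": "urn:example:aiml:model:nmt-en-es-small",
  "taskIdentifier": ["Translation"],
  "modelSize": 95,
  "format": "ONNX",
  "formatVersion": "1.14",
  "framework": "ONNX Runtime",
  "frameworkVersion": "1.17",
  "input": { "inputMediaIdentifier": "a1", "inputType": "text", "inputShape": [1, 256] },
  "output": { "outputIdentifier": "a2", "outputType": "text", "outputShape": [1, 256], "outputAccuracy": 0.92 },
  "targetInferenceLatency": { "milliseconds": 300, "hardwarePlatformIdentifier": "urn:example:platform:soc-x" },
  "energyEstimation": 1.5,
  "modelDataType": "Float16"
}
```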

3. Generic Negotiation Message Format (Section A.4.4)

Defines a transport-protocol-independent message format for AI metadata exchange over data channels:

Messages Container: An array of Message objects (1..n cardinality).

Message Data Type includes:

  • id: Unique identifier within the data channel session scope
  • type: Message subtype enumeration:
    • CANDIDATE_MODELS_REQUEST / CANDIDATE_MODELS_RESPONSE
    • AI_APPLICATION_DISCOVERY_REQUEST / RESPONSE
    • AI_APPLICATION_REQUEST / RESPONSE
    • AI_MODEL_SELECTION_REQUEST / RESPONSE
  • payload: Type-dependent message content
  • sessionId: Associated multimedia session identifier
  • sendingAtTime: Wall clock timestamp (optional)

This format provides flexibility for various transport protocols (e.g., HTTP) without imposing specific constraints.
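Putting the pieces together, a sketch of one envelope carrying a CANDIDATE_MODELS_REQUEST over the data channel; the identifiers are placeholders and the embedded capability payload would follow A.4.3.2:

```json
{
  "messages": [
    {
      "id": "m-17",
      "type": "CANDIDATE_MODELS_REQUEST",
      "payload": {
        "endpointIdentifier": "urn:example:endpoint:ue-1234",
        "supportedEngines": ["CPU", "NPU"],
        "availableMemorySize": 512
      },
      "sessionId": "sip-session-0001",
      "sendingAtTime": 1770630900000
    }
  ]
}
```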

Key Design Principles

  1. Separation of concerns: Application, endpoint, and model metadata are independently defined
  2. Static vs. dynamic distinction: Enables efficient capability negotiation and runtime adaptation
  3. Protocol independence: Generic message format supports multiple transport options
  4. Comprehensive metadata: Covers functional, performance, energy, and accuracy requirements
  5. Multi-task support: Models can serve multiple AI/ML tasks
  6. Platform-specific metrics: Latency and energy measurements tied to hardware platforms

Nokia
Title

[AI_IMS-MED] Adaptive Model Delivery

Summary of S4-260182: Adaptive Model Delivery for IMS DC Applications

1. Introduction

This contribution revises previous documents (S4-251799, S4aR250211) on adaptive model delivery, incorporating the agreed call flow for device inferencing from S4aR260014 (agreed in SA4#134). The work builds upon TR 26.927 which documented AI/ML model delivery procedures.

2. Discussion

2.1 Background and Motivation

The document addresses the critical challenge of timely model delivery for UE-centric inference in IMS DC-based AI/ML applications. Key points:

  • Real-time nature of multimedia communication sessions makes startup latency particularly problematic
  • Delayed inference startup adversely affects QoE and service usefulness
  • Adaptive model delivery can mitigate these challenges

2.2 Adaptive Model Delivery Concept

Based on TR 26.927 clause 5.2.2.2:

  • Reduces startup latency by delivering a smaller, lower precision but inference-ready model first
  • Subsequently updates to higher precision through model updates
  • Bit-incremental model update approach was evaluated in TR 26.847 clause 5.4

2.3 Reference Call Flows

The document references two agreed high-level call flows:

General AIML IMS DC Call Flow (from S4-252075)

Key steps include:

  1. MMTel service establishment
  2. BDC establishment between UE and MF
  3. DCSF creates the DC application list based on the subscription filter and UE static capabilities
  4. The application list includes AI service information
  5. User selects an app based on the AI service
  6. App download via the BDC
  7. Task selection and model variant selection
  8. ADC establishment
  9. Three inferencing modes: Local, Remote, or Split

Device Inferencing Call Flow (from S4aR260014)

Detailed 15-step procedure, including:

  • Application discovery with AI_APPLICATION_DISCOVERY_REQUEST/RESPONSE messages
  • Application metadata including AI feature tags and task descriptions
  • Task manifest delivery
  • Model selection and delivery via BDC or ADC
  • Support for task reselection during the session

3. Technical Proposal

3.1 New Clause: AI/ML Model Delivery to DCMTSI Client

3.1.1 General Model Delivery Procedure

Figure X.X-1: Basic Model Delivery over IMS DC

14-step procedure:

  0. UE1 registers to IMS with AI/ML capability indication
  1. MMTEL session establishment
  2. IMS AS allocates DC resources
  3. Session established between UE1 and UE2
  4. Bootstrap Data Channel (bDC) establishment
  5. DCSF creates the subscriber-specific application list
  6. Application list delivery over the bDC
  7. App selection and download with the app manifest (includes inference tasks and model lists)
  8. UE2-side DC procedures
  9-10. Application data channel establishment with the DC AS
  11-12. Model selection and delivery (from the DC AS, or DCAR via DCSF)
  13. Media exchange over the MMTEL session
  14. Inference execution on local or remote media

3.1.2 Adaptive Model Delivery Procedure

Figure X.Y-2: Adaptive Model Delivery over IMS DC

Enhanced procedure building on basic delivery:

Steps 1-10: Same as basic model delivery, with lower precision model selection in step 10

Step 11: Request for updatable model via MF

Steps 12a/12b: Model delivery from either:

  • Option a: DCAR via DCSF
  • Option b: DC-AS

Step 13: Model download to UE

Step 14: Inference loop starts and continues

Step 15: UE requests model update via MF

Steps 16a/16b: Model update delivery from either:

  • Option a: DCAR via DCSF
  • Option b: DC-AS

Step 17: Model update download via MF

Step 18: UE applies model update to initial model

Step 19: Inference continues with potential for further updates

3.2 Key Technical Features

  • Two-stage delivery: Initial lower precision model followed by precision updates
  • Dual source support: Models and updates can be sourced from either DCAR (via DCSF) or DC-AS
  • Continuous inference: Inference can continue while model updates are applied
  • Flexible model selection: Selection can be performed by UE, MF, or DC AS
  • Session-aware: Procedure integrated with IMS DC session lifecycle

Editor's Notes and Open Issues

The referenced S4aR260014 document contains Editor's Notes indicating:

  • Whether the MF needs to understand AI task semantics requires clarification (FFS)
  • Application type handling needs clarification
  • Large model handling procedures need clarification


InterDigital Finland Oy
Title

[AIML_IMS-MED] Negotiation messages for split inferencing

3GPP Change Request Summary: Split Inferencing Negotiation Messages

Document Overview

This contribution (S4-260183) proposes additional messages and associated metadata to enable split inferencing for AI/ML applications in IMS-based media services. It builds upon and updates contribution S4aR260009, with specific focus on defining the differences between device inferencing and split inferencing scenarios.

Main Technical Contributions

1. Negotiation Message Summary Table (Section A.4.2)

Key Addition: Introduction of Table A4.2-1 summarizing all negotiation messages for split inferencing call flows.

The table defines the following message pairs with their associated metadata:

  • Application Discovery Messages:
    • AI_APPLICATION_DISCOVERY_REQUEST (HTTP GET) - carries the family/type of AI/ML applications
    • AI_APPLICATION_DISCOVERY_RESPONSE (HTTP RESPONSE) - returns the list of AI/ML applications

  • Application Selection Messages:
    • AI_APPLICATION_REQUEST (HTTP GET) - carries the URN of the selected application
    • AI_APPLICATION_RESPONSE (HTTP RESPONSE) - returns the selected application binary and metadata

  • Split Model List Messages:
    • MODELS_LIST_REQUEST (HTTP POST) - carries UE capabilities
    • MODELS_LIST_RESPONSE (HTTP RESPONSE) - returns candidate AI/ML models and partitionings

  • Split Inference Configuration Messages:
    • AI_SPLIT_INFERENCE_CONFIGURATION_REQUEST (HTTP POST) - carries the URN(s) of the selected models and submodel partitioning
    • AI_SPLIT_INFERENCE_CONFIGURATION_RESPONSE (HTTP RESPONSE) - returns the selected models/submodels binary and metadata

  • Model Selection Messages:
    • AI_MODEL_SELECTION_REQUEST - carries the URN(s) of the selected models/submodels
    • AI_MODEL_SELECTION_RESPONSE - returns the selected models/submodels binary and metadata

2. Common Metadata Information (Section A.4.3)

A.4.3.1 Application Metadata

  • Defines characteristics and requirements of applications and associated AI/ML media processing tasks
  • Includes performance, accuracy, energy constraints, and supported models
  • New for split inferencing: Indicates supported split and remote inference modes and whether model supports partitioning

A.4.3.2 Endpoint Capabilities Metadata

Introduces separation between static and dynamic capabilities:

  • Static capabilities: Fixed or infrequently changing properties
    • Processing architecture
    • Peak compute capacity
    • Supported AI/ML frameworks
    • Available execution engines (CPU, GPU, NPU)
    • Supported numerical precisions
    • Hardware acceleration features

  • Dynamic capabilities: Runtime-dependent characteristics
    • Available memory
    • Current compute load
    • Energy mode
    • Battery level
    • Accelerator availability

This separation enables both long-term compatibility checks and short-term runtime optimization.

A.4.3.3 Model Information Metadata

  • Describes functional, structural, and performance characteristics of AI/ML models
  • Includes supported tasks, input/output specifications, resource requirements, latency/energy metrics
  • New: Indicates whether model supports partitioning

3. Split Inferencing-Specific Metadata (Section A.4.3.4)

A.4.3.4.1 Submodel Partitioning Metadata

Major technical contribution: Comprehensive metadata structure for describing model partitioning for split inferencing.

Key metadata elements:

| Field | Description |
|-------|-------------|
| submodelsPartitioningIdentifier | URN identifying the partitioning configuration |
| submodelComposition | Array of submodel objects (1..N) |
| submodelIdentifier | URN of the individual submodel |
| endpointType | Execution location (UE, SERVER, EDGE, CLOUD, CUSTOM) |
| subtaskTypeIdentifier | Subtask type supported by the submodel |
| submodelType | Role in the pipeline (HEAD, INTERMEDIATE1, INTERMEDIATE2, TAIL) |
| size | Submodel file size in MB |
| submodelInputs/Outputs | Tensor specifications (ID, type, shape) |
| outputAccuracy | Trained accuracy percentage |
| subModelDataType | Data type (Uint8, Float32, Float16) |

Tensor specifications include:

  • tensorID - identifier of the input/output tensor
  • tensorType - data type (integer, float32, float16)
  • tensorShape - tensor dimensions (e.g., (1,3,300,300))

JSON Example provided: Complete example showing HEAD submodel on UE and TAIL submodel on DCAS for object detection task.
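That example is not reproduced in this summary; a minimal reconstruction under the field definitions above might look as follows (all URNs, shapes, sizes, and accuracies are placeholders):

```json
{
  "submodelsPartitioningIdentifier": "urn:example:partitioning:object-detection:ue-server",
  "submodelComposition": [
    {
      "submodelIdentifier": "urn:example:submodel:od:head",
      "endpointType": "UE",
      "subtaskTypeIdentifier": "feature-extraction",
      "submodelType": "HEAD",
      "size": 12,
      "submodelInputs": [ { "tensorID": "t0", "tensorType": "float32", "tensorShape": [1, 3, 300, 300] } ],
      "submodelOutputs": [ { "tensorID": "t1", "tensorType": "float16", "tensorShape": [1, 256, 38, 38] } ],
      "subModelDataType": "Float16"
    },
    {
      "submodelIdentifier": "urn:example:submodel:od:tail",
      "endpointType": "SERVER",
      "subtaskTypeIdentifier": "detection",
      "submodelType": "TAIL",
      "size": 30,
      "submodelInputs": [ { "tensorID": "t1", "tensorType": "float16", "tensorShape": [1, 256, 38, 38] } ],
      "submodelOutputs": [ { "tensorID": "t2", "tensorType": "float32", "tensorShape": [1, 100, 6] } ],
      "outputAccuracy": 74.0,
      "subModelDataType": "Float32"
    }
  ]
}
```

Note how the HEAD submodel's output tensor (t1) matches the TAIL submodel's input tensor, which is what makes the partition a valid pipeline.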

4. Negotiation Message Format (Section A.4.5)

Generic message structure defined:

Table 5: AI Metadata Messages Format

  • messages: Array of Message objects (1..n)
  • Each message follows Message data type specification

Table 6: Metadata Message Data Type

| Field | Type | Cardinality | Description |
|-------|------|-------------|-------------|
| id | string | 1..1 | Unique identifier within the data channel session |
| type | number | 1..1 | Message subtype identifier |
| payload | object | 1..1 | Type-dependent message payload |
| sessionId | string | 1..1 | Associated multimedia session identifier |
| sendingAtTime | number | 0..1 | Wall clock transmission time |

Defined message types:

  • MODELS_LIST_REQUEST / MODELS_LIST_RESPONSE
  • SPLIT_INFERENCE_CONFIGURATION_REQUEST
  • AI_APPLICATION_DISCOVERY_REQUEST / AI_APPLICATION_DISCOVERY_RESPONSE
  • AI_APPLICATION_REQUEST / AI_APPLICATION_RESPONSE
  • AI_SERVER_CONFIGURATION_REQUEST / AI_SERVER_CONFIGURATION_RESPONSE
  • AI_MODEL_SELECTION_REQUEST / AI_MODEL_SELECTION_RESPONSE
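As a combined illustration, one envelope per Tables 5 and 6 carrying a SPLIT_INFERENCE_CONFIGURATION_REQUEST; the numeric type code is a placeholder (the summary does not give the name-to-number mapping), and the payload fields are assumptions tying back to the partitioning metadata above:

```json
{
  "messages": [
    {
      "id": "m-23",
      "type": 3,
      "payload": {
        "message": "SPLIT_INFERENCE_CONFIGURATION_REQUEST",
        "modelIdentifier": "urn:example:aiml:model:object-detection",
        "submodelsPartitioningIdentifier": "urn:example:partitioning:object-detection:ue-server"
      },
      "sessionId": "sip-session-0002",
      "sendingAtTime": 1770631200000
    }
  ]
}
```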

Summary of Changes

The CR introduces three main changes:

  1. Complete message taxonomy for split inferencing negotiation with HTTP protocol mapping
  2. Comprehensive metadata definitions covering applications, endpoint capabilities, models, and split-specific partitioning information
  3. Generic message format for AI metadata exchange over data channels with extensible type system

The contribution enables complete end-to-end split inferencing capability negotiation between UE and remote endpoints, with particular emphasis on submodel partitioning metadata that allows flexible distribution of AI/ML model execution across network nodes.


Nokia, Samsung Electronics Co., Ltd
Title

[AI_IMS_MED] On Application Manifest for AIML applications

Summary of S4-260184: Application Manifest for AIML Applications

1. Introduction

This contribution proposes IMS Data Channel (DC) application metadata for AI/ML applications. The document merges metadata elements from S4aR250213 and S4aR250208 based on previous RTC SWG discussions and email exchanges. It addresses comments from RTC Telco Post SA4#134-2 regarding the origin and transfer of the AIML application manifest.

2. Main Technical Contributions

2.1 General Framework for AI/ML Support over Data Channel

The contribution defines AI/ML DC applications as IMS DC applications that:

  • Interact with AI/ML models (e.g., performing inference on the UE)
  • Communicate AI/ML data
  • Support different inference paradigms: local inference, remote inference, and split inference

Key architectural elements:

  • DCSF (via MF) provides policy- and subscription-appropriate data channel applications to the UE
  • The DC Application Repository (DCAR) stores verified data channel applications
  • DCSF downloads applications from the DCAR for distribution to the UE
  • The DCMTSI client uses the metadata to select appropriate toolchains or execution environments

2.2 Base Application Manifest Structure

The manifest contains essential information for AI/ML DC applications:

Core elements:

  • baseUrl: URI template for downloading models, with the format baseurl/$taskId$/$version$/$framework$/$subtask$/$variant$/model.$format$
  • tasks: Array of AI tasks enabled by the application
  • taskParameters: Configuration parameters for different conditions
  • models: Array of AI/ML model objects with metadata

A combined sketch of a manifest with these elements follows clause 2.5 below.

Task-level metadata includes:

  • taskId: Unique identifier
  • taskName/description: Human-readable task identifier (e.g., "Speech-to-speech Translation")
  • version: Task version number
  • capabilityIndex: Minimum capability requirements
  • executionCandidate: Supported endpoint locations (e.g., UE or MF)

2.3 Task Input/Output Specification

Task inputs (taskInputs):

  • taskInputId: Unique identifier
  • media-type: Input media type
  • route-to: Specifies the subtaskInputId for data routing

Task outputs (taskOutputs):

  • taskOutputId: Unique identifier
  • media-type: Output media type
  • from: Specifies the subtaskOutputId the output data originates from

2.4 Model Metadata

Each model object contains:

  • id: Unique model identifier
  • version: Model version/variant
  • capabilityIndex: Minimum capability requirements
  • url: Model download location
  • latency: Maximum latency requirement (milliseconds)
  • accuracy: Minimum accuracy requirement (metrics/value/direction - FFS)

2.5 Subtask Metadata (Extension Parameters)

For tasks comprising multiple subtasks, the manifest includes detailed subtask information:

Subtask-level parameters:

  • id: Unique subtask identifier
  • function: Description of the subtask function
  • capabilityIndex: Capability requirements (matches the AI model capability)
  • executionTarget: Intended endpoint location
  • executionFallback: Alternative endpoint when the primary is unavailable

Subtask inputs (subtaskInputs):

  • subtaskInputId: Unique identifier
  • pipe-type: Logic for multiple data inputs (0 = first available, 1 = wait for all)
  • media-type: Input media type
  • from: Originating subtaskOutputId or taskInputId

Subtask outputs (subtaskOutputs):

  • subtaskOutputId: Unique identifier
  • media-type: Output media type
  • route-to: Destination subtaskInputId or taskOutputId

Subtask AI model parameters:

  • id, capabilityIndex, url, latency, accuracy (as per the main model metadata)
  • contextSize: Maximum amount of input data the model can process (typically in tokens)
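The promised combined sketch of a manifest, assuming one plausible nesting of the fields from clauses 2.2-2.5 (URLs, identifiers, and values are illustrative; the exact accuracy structure is FFS in the source):

```json
{
  "baseUrl": "https://dcar.example.com/models",
  "tasks": [
    {
      "taskId": "s2st-01",
      "taskName": "Speech-to-speech Translation",
      "version": "1.0",
      "capabilityIndex": 3,
      "executionCandidate": ["UE", "MF"],
      "taskInputs": [ { "taskInputId": "in0", "media-type": "audio/opus", "route-to": "st-in0" } ],
      "taskOutputs": [ { "taskOutputId": "out0", "media-type": "audio/opus", "from": "st-out0" } ],
      "subtasks": [
        {
          "id": "asr",
          "function": "speech recognition",
          "capabilityIndex": 2,
          "executionTarget": "UE",
          "executionFallback": "MF",
          "subtaskInputs": [ { "subtaskInputId": "st-in0", "pipe-type": 0, "media-type": "audio/opus", "from": "in0" } ],
          "subtaskOutputs": [ { "subtaskOutputId": "st-out0", "media-type": "text/plain", "route-to": "out0" } ]
        }
      ]
    }
  ],
  "models": [
    {
      "id": "asr-en-small",
      "version": "int8",
      "capabilityIndex": 2,
      "url": "https://dcar.example.com/models/s2st-01/1.0/onnx/asr/int8/model.onnx",
      "latency": 200,
      "accuracy": { "metric": "WER", "value": 0.08 },
      "contextSize": 4096
    }
  ]
}
```

The model url is expanded here from the baseUrl template, and the subtask's route-to/from identifiers chain the task inputs and outputs through the subtask, which is the routing logic the pipe-type and route-to fields are meant to express.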

3. Open Issues

Several aspects remain FFS (For Further Study):

  • Editor's Note: A definition of AI/ML task may be needed (referencing TR 26.927)
  • Editor's Note: Whether all fields in the tables are needed, and their definitions
  • Editor's Note: Capability index definition and usage
  • Editor's Note: Clear definition of the accuracy metrics
  • Editor's Note: The pipe-type parameter needs further clarification
  • Model metadata specification alignment with TR 26.927

4. Document Type

This is a text proposal for the AI_IMS_MED work item, proposing new clauses (marked as "All New Text") to be added to the base CR.


InterDigital Finland Oy
Title

[AI_IMS_MED] Call flow for split inferencing loop

Summary of S4-260185: Call Flow for Split Inferencing Loop

Document Metadata

  • Source: Interdigital Finland Oy
  • Meeting: TSG-SA4 Meeting #135, Goa, India (9-13 February 2026)
  • Work Item: AIML_IMS-MED
  • Type: Change Request / Text Proposal

Main Technical Contribution

This contribution proposes a call flow for split inferencing operations between the UE and the DCAS (Data Channel Application Server), building upon previous work in TR 26.927 and earlier contributions.

Split Inferencing Architecture

The proposed call flow describes a collaborative inference execution model where:

  • The UE and DCAS jointly execute an inference task
  • The inference workload is split between the two entities
  • Intermediate inference results are exchanged over the user plane
  • Communication is facilitated through the MF (Media Function)

Proposed Call Flow Steps

The text proposal adds the following procedural steps:

  1. Configuration Phase
    • UE and DCAS (via MF) configure intermediate data format parameters over the ADC (Application Data Channel)
    • Parameters include tensor characteristics and compression profile identifiers

  2. UE-Side Processing
    • UE captures input media data
    • UE executes its inference subtask using the selected UE submodel
    • UE generates intermediate data for continuation at the DCAS

  3. Data Exchange
    • UE transmits the intermediate data to the DCAS (via MF) according to the configured format

  4. DCAS-Side Processing
    • DCAS executes its inference task on the received intermediate data
    • DCAS uses the selected remote submodel
    • DCAS generates processed media data based on the inference results

  5. Result Delivery
    • DCAS transmits the processed media data to the UE (via MF)
    • UE renders the final processed media data

Technical Significance

This proposal enables distributed AI/ML inference for media processing, allowing workload distribution between device and network based on computational capabilities, latency requirements, and network conditions. The standardization of intermediate data format parameters ensures interoperability in split inference scenarios.


InterDigital Finland Oy
Title

[AIML_IMS-MED] AI intermediate data format

Comprehensive Summary of S4-260189: AI Intermediate Data Format

1. Introduction and Scope

This contribution proposes defining an intermediate data carriage format for AI/ML split inferencing, derived from TR 26.927. The document introduces:

  • A description of intermediate data
  • Definition of intermediate data structure
  • An example format structure (proposed as an Annex) including:
    • AI Parameter Set (AIPS) specifying AI-related parameters
    • TLV encapsulation for both AIPS and intermediate data

2. Technical Background and Motivation

2.1 Split Inferencing Requirements

Split inferencing, approved and mandated in 5G, is a key objective of the work item. The solution must support:

  • Different input data types producing intermediate data
  • Multiple media modalities (video, audio, text) without restriction to one
  • An agnostic transport format for 5G use cases

2.2 Source and Derivation

The proposed format is derived from:

  • User-plane data structure in Clause 6.8 of TR 26.927
  • Addition of a partition identifier (previously "split-point identifier") from Clause 6.6 of TR 26.927
  • The partition identifier enables selection of pre-configured partitioning negotiated during configuration phase

2.3 Dynamic Nature of Tensor Characteristics

Tensor characteristics are not static and may change dynamically based on:

  • Resolution of input inference
  • Content of input inference

These characteristics must be conveyed through the user plane for accurate interpretation at the receiving end.

3. Main Technical Contributions

3.1 Intermediate Data Definition (Clause X.X.1)

Key Definition: Intermediate data refers to output tensor(s) computed by a sub-model executing an inference subtask up to a defined and negotiated partitioning, transferred between endpoints (device, edge, server) to serve as input to a subsequent sub-model.

Characteristics:

  • May be compressed and/or encoded before transmission
  • Processing shall not alter the semantics required by the receiving sub-model
  • Non-persistent, dynamic, and context-dependent
  • Characteristics (shape, size, format) vary as a function of:
    • The input data
    • The selected inference partitioning
    • The runtime configuration

3.2 Intermediate Data Structure (Clause X.X.2)

Configuration Stage: Structure defined and exchanged at configuration stage, referred to as partitioning configuration.

Dynamic Factors:

  • Changes in input media size/resolution may alter the tensor shape
  • The selected partitioning identifies the active partitioning among the pre-configured options
  • The selected compression profile (algorithm and parameters) is optimized for efficiency

Required Information in the Format:

  • Tensor identifier
  • Inferred tensor length (derived from the current tensor shape)
  • Partitioning identifier (referencing the negotiated configuration)
  • Compression profile identifier (indicating the compression method)

Solution: AI Parameter Set (AIPS) defined to capture information applicable to all tensors and associated data.

3.3 AI Parameter Set (AIPS) Definition (Annex X.X.1-3)

Purpose: Carries metadata (tensor metadata) associated with intermediate data payload.

AIPS Lifetime:

  • Starts: When the decoder first receives and parses the AIPS TLV unit
  • Ends: When any of the following occurs:
    • A new AIPS with the same or a different ai_parameter_set_id is received
    • A new session begins
    • The decoder is reset
    • The number of tensors or the tensor shape changes

AIPS Fields (Table X.X.13-1):

| Field | Meaning |
|-------|---------|
| ai_parameter_set_id | Unique ID of the AIPS |
| split_point_id or partition_id | Key identifier of the split point / partition |
| num_tensors | Number of tensors |
| For each tensor: | |
| tensor_id | Tensor identifier |
| dtype | Data type of the tensor data |
| rank | Number of dimensions |
| dimension (for each dimension) | Size of the dimension |
| compression_profile_id | Compression profile identifier |
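For illustration, a JSON-style rendering of a single AIPS instance with these fields (the AIPS itself is carried as a binary TLV unit; this rendering and its values are purely illustrative):

```json
{
  "ai_parameter_set_id": 1,
  "partition_id": 2,
  "num_tensors": 1,
  "tensors": [
    {
      "tensor_id": 1,
      "dtype": "float16",
      "rank": 4,
      "dimensions": [1, 256, 38, 38],
      "compression_profile_id": 3
    }
  ]
}
```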

3.4 TLV Encapsulation (Clause X.X.2-4)

TLV Message Components:

  • Type: Indicates the type of information in the payload
  • Length: Length of the payload
  • Payload: The data (the value)

TLV Unit Types (Table X.X.24-1):

| Type Value | Description |
|------------|-------------|
| 0 | Reserved |
| 1 | AI Parameter Set data (AIPS) |
| 2 | Intermediate data |
| 3-255 | Undefined |

Encapsulation Scenarios:

  1. AIPS Data Encapsulation (X.X.24.2): TLV unit encapsulating AIPS data as defined in clause 1.3

  2. Single Tensor Encapsulation (X.X.24.3):
    • The TLV unit value comprises the AIPS identifier and the tensor data
    • Tensor data includes: tensor identifier, tensor length (optional), tensor payload data
    • The tensor payload contains a flattened byte array, possibly compressed per the AIPS compression profile ID

  3. Multiple Tensors Encapsulation (X.X.24.4): TLV unit encapsulating more than one tensor's data

4. Key Changes from Previous Version

Terminology Updates:

  • "Split point" terminology changed to "partitioning" throughout
  • "Head sub-model" and "tail sub-model" terminology refined to "sub-model" and "subsequent sub-model"

Structural Additions:

  • Addition of a partition identifier (highlighted as new in the original document)
  • Formalization of AIPS lifetime management
  • Complete TLV encapsulation framework

5. Proposal for Integration

The document proposes:

  1. Incorporate changes 1 and 2 into a base CR
  2. Include change 3 (AIPS and TLV details) in a dedicated annex for illustration purposes

Qualcomm Inc.
Title

CR on AIML processing in IMS calls

3GPP CR 0608 - AI/ML Processing in IMS Calls

Change Request Overview

Specification: TS 26.114 v19.2.0
Category: B (Addition of feature)
Release: Rel-20
Work Item: AIML_IMS-MED

This CR introduces normative procedures, formats, and signaling for AI/ML assisted media processing in DCMTSI (Data Channel Multimedia Telephony Service over IMS).


Main Technical Contributions

1. General Framework and Architecture (AD.1, AD.2, AD.3)

Key Definitions

  • AI/ML application: Data channel application providing AI/ML assisted media processing during IMS sessions
  • AI/ML processing task: Well-defined AI/ML functions (e.g., speech-to-text, translation, noise suppression, scene description)
  • AI/ML model: Parameters and metadata required for inference execution
  • AI/ML inference engine: Local UE execution environment (e.g., WebNN-aligned runtime)
  • AI/ML metadata: Data derived from media streams with timing and binding information
  • Task manifest: UTF-8 JSON describing supported tasks and candidate models
  • Model card: UTF-8 JSON describing model identity, format, artifacts, I/O conventions, runtime requirements
  • Model artifact: Downloadable model binary and auxiliary files

Terminal Architecture Requirements

DCMTSI clients must support: - Media engine functions for RTP-based audio/video - Data channel client (bootstrap and application data channels per clauses 6.2.10, 6.2.13) - AI/ML application execution environment (e.g., web runtime) - AI/ML inference engine for local model execution - Capability discovery function (execution devices, operators, data types, resource limits) - Model validation function (integrity/authenticity verification via SHA-256 and digital signatures) - Binding and synchronization function (associates AI/ML tasks/metadata to RTP streams using SDP identifiers and media time anchors)

Reference Architecture

  • UE establishes Bootstrap Data Channel (BDC) to MF for retrieving DC application lists, AI/ML applications, and model artifacts via HTTP
  • DCSF and repositories (e.g., DCAR) provide provisioning of AI/ML applications and models
  • Application Data Channel (ADC) may be established to DC AS for task control, policy exchange, and metadata delivery
  • IMS Media Function does not perform inference or process RTP media for AI/ML purposes

2. Call Flows (AD.4)

AD.4.1 AI/ML Application and Model Delivery for Device Inferencing

14-step procedure:

  1. MMTel service establishment
  2. BDC establishment between UE and MF (per TS 23.228, clause AC.7.1)
  3. UE requests application list from MF via HTTP over BDC; MF forwards to DCSF
  4. DCSF creates user-specific DC application list (JSON/HTML) with:
    • Generic app info (description, ID, URL)
    • AI-specific info (AI feature tag, task descriptions)
  5. DCSF provides URL to application list; UE downloads list with metadata
  6. User selects app based on AI service description
  7. UE requests selected app from MF
  8. MF fetches AI application from DCSF
  9. AI application downloaded to UE via BDC with AI task metadata (task manifest)
  10. User presented with AI task list (with annotations from task metadata, execution endpoint info)
  11. Selected tasks/models informed to MF via:
    • BDC: HTTP GET with task/model URLs
    • ADC: AI Model Selection Request with model URNs
  12. MF fetches AI models from:
    • 12a: DCAR via DCSF
    • 12b: DC AS (alternative)
  13. UE downloads AI models from MF via:
    • BDC: HTTP response with model resources
    • ADC: AI Model Selection Response with model data
  14. Tasks executed for inference in UE

During the session, the user/UE may reselect AI/ML tasks using the received metadata.

Editor's Note: Clarification needed on whether MF understands AI task nature, application handling types, and large model handling.

AD.4.2 On-Device Inferencing and Split Inference Operation

  • User/application selects AI/ML processing task during session
  • AI/ML application performs local capability discovery and selects compatible model artifact
  • Inference engine configured and task bound to RTP media streams using binding rules (clause AD.8)
  • If DC AS coordination required:
  • UE establishes application data channels (clause 6.2.13)
  • Associates with AI/ML application using a=3gpp-req-app SDP attribute
  • Exchanges capability, task, configuration, status via "3gpp-ai" subprotocol (clause AD.9.2)
  • Derived AI/ML metadata used for local rendering and/or transmitted over ADC
  • Metadata includes RTP stream identifier (mid) and media time anchor for alignment with RTP playout

Note: Split inference may use on-device inference for one task (e.g., STT) and DC AS for another (e.g., translation) while keeping RTP media unchanged.


3. Capabilities (AD.5)

AD.5.1 UE Capabilities

DCMTSI clients must determine and expose to AI/ML application: - Supported execution devices (CPU, GPU, NPU, accelerators) - Supported operator sets and data types (per local inference framework) - Resource limits (memory constraints, concurrent task limits) - Availability of audio/video media access points (e.g., decoded media frames)

Web runtime capability discovery may align with WebNN. Capability summary may be conveyed to DC AS using capability message type (clause AD.9.2).

AD.5.2 Network Capabilities

DC AS supporting AI/ML processing may provide: - Repositories and discovery information for AI/ML applications/models - Policy information (restrictions on tasks, model usage, data retention) - Application data channels for coordination with AI/ML application - Note: Network-side inference capabilities are outside Phase 1 scope


4. AI/ML Formats (AD.6)

Mandatory Model Format: - ONNX format conforming to ONNX version 1.16.0 - Minimum required opset version: 18 - Encoding: ONNX Protocol Buffers representation
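
For illustration, a UE-side check of the mandated format could look like the sketch below, using the public onnx Python package; reading "minimum opset 18" as the decoder supporting at least opset 18 is an interpretation, not a quote from the CR:

```python
import onnx

def model_is_supported(path: str, supported_opset: int = 18) -> bool:
    model = onnx.load(path)            # parses the ONNX Protocol Buffers encoding
    onnx.checker.check_model(model)    # raises if the model is structurally invalid
    required = max((imp.version for imp in model.opset_import
                    if imp.domain in ("", "ai.onnx")), default=0)
    return required <= supported_opset
```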


5. Task Manifest and Model Card (AD.7)

AD.7.1 Task Manifest

UTF-8 JSON object included with AI/ML application delivery, containing: - List of supported tasks and optional subtasks with human-readable descriptions - For each task: candidate model identifiers (model_id, model_version_id) and model card resource reference - Task-specific configuration parameters including RTP stream mid binding requirements

AD.7.2 Model Card

UTF-8 JSON object provided for each candidate model, including: - Model identifier and version identifier - Model format specification (ONNX version, minimum opset, IR version) - Model I/O description: - Tensor element type and shape - Dynamic axes, layout, normalization conventions - Execution constraints: - Required operator support - Required data types - Quantization convention - Minimum resource requirements - Downloadable model artifacts: - Artifact URI, size, content type - Integrity information (SHA-256 digest) - Optional digital signature and key identifier

AD.7.2.1 JSON Schema for Model Card

Comprehensive JSON schema provided defining structure for: - model_card_version: Schema version (semver pattern) - identity: model_id, model_version_id, name, description, publisher, license, timestamps, tasks, languages, tags - format: type (const: "onnx"), onnx_version (const: "1.16.0"), min_opset (≥18), onnx_ir_version, encoding (enum: "protobuf") - artifacts: Array of downloadable artifacts with: - artifact_id, uri, content_type, size_bytes, sha256 - Optional compression (none/gzip/zstd) - Optional signature (alg, kid, sig) - variant (precision, quantization, preferred_devices, max_latency_ms) - selection_constraints (requires_webnn, requires_ops, requires_data_types, min_memory_mib, min_peak_scratch_mib) - io: inputs/outputs (tensorSpec arrays), preprocessing (audio/text), postprocessing (stt/tts), output_application_format - runtime: min_memory_mib, min_peak_scratch_mib, max_concurrent_instances, required_operator_sets, required_data_types, webnn preferences, device_preference - selection_policy: strategy (min_latency/min_energy/best_accuracy/balanced/custom), fallback_order

tensorSpec definition: - name, element_type (float32/float16/int8/int32/uint8/bool) - shape (array with integers or strings for dynamic axes) - Optional layout and dynamic_axes mapping
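
An illustrative tensorSpec instance for a mono audio input with a dynamic time axis; all values are invented for the example, and the direction of the dynamic_axes mapping is an assumption:

```python
tensor_spec = {
    "name": "audio",
    "element_type": "float32",
    "shape": [1, "num_samples"],         # strings mark dynamic axes
    "layout": "NC",
    "dynamic_axes": {"num_samples": 1},  # dynamic axis name -> axis index (assumed)
}
```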

AD.7.3 Model Artifact Selection and Validation

Procedure:

  1. UE performs capability discovery (devices, operators, data types, memory limits)
  2. UE filters artifacts whose selection_constraints are satisfied by the UE capabilities
  3. UE selects the preferred artifact based on selection_policy and device_preference
  4. UE downloads the selected artifact URI via HTTP over the BDC
  5. UE verifies the artifact using the SHA-256 digest from the model card
  6. UE should verify the digital signature when provided
  7. UE instantiates the inference engine and binds model I/O per the model card (io.preprocessing, io.inputs, io.outputs, io.postprocessing)
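
A sketch of steps 2, 3, and 5 under assumed dict layouts mirroring the AD.7.2.1 schema; the helper names (satisfies, select_and_verify, download) are illustrative, not normative:

```python
import hashlib

def satisfies(constraints: dict, caps: dict) -> bool:
    # Step 2: check an artifact's selection_constraints against UE capabilities.
    return (set(constraints.get("requires_ops", [])) <= set(caps["operators"])
            and set(constraints.get("requires_data_types", [])) <= set(caps["data_types"])
            and constraints.get("min_memory_mib", 0) <= caps["memory_mib"])

def select_and_verify(model_card: dict, caps: dict, download) -> bytes:
    candidates = [a for a in model_card["artifacts"]
                  if satisfies(a.get("selection_constraints", {}), caps)]
    if not candidates:
        raise RuntimeError("no artifact satisfies UE capabilities")
    # Step 3: order candidates by the model card's fallback_order, if present.
    order = model_card.get("selection_policy", {}).get("fallback_order", [])
    candidates.sort(key=lambda a: order.index(a["artifact_id"])
                    if a["artifact_id"] in order else len(order))
    chosen = candidates[0]
    data = download(chosen["uri"])     # step 4: HTTP GET over the BDC
    # Step 5: integrity check against the SHA-256 digest from the model card.
    if hashlib.sha256(data).hexdigest() != chosen["sha256"]:
        raise ValueError("artifact integrity check failed")
    return data
```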


6. Negotiation, Signaling, and Media Time Binding (AD.8)

AD.8.1 Binding to RTP Streams

  • AI/ML tasks operating on RTP media bound to RTP streams using SDP "mid" identifier
  • Task configuration and AI/ML metadata messages include relevant mid value

AD.8.2 Media Time Binding for AI/ML Metadata

  • AI/ML metadata over ADC may experience different delay/jitter vs. RTP media
  • To avoid drift, metadata messages shall include media time anchor derived from RTP media clock of stream identified by mid
  • For audio tasks, media time anchor may use:
  • NTP-based timestamp associated with RTP stream + duration in audio samples, OR
  • RTP timestamp
  • Time anchor representation must be consistent within session for given task
  • When DC AS forwards AI/ML metadata between endpoints, DC AS shall preserve mid binding and media time anchor for receiver alignment with RTP playout
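
A receiver-side sketch of using the NTP-anchor option, with field names borrowed from the AD.9.2 example message; the 32.32 fixed-point NTP interpretation and the 16 kHz sample rate are assumptions for illustration:

```python
def should_render(msg: dict, playout_ntp_s: float, sample_rate_hz: int = 16000) -> bool:
    start_s = msg["ntpTs"] / 2**32                        # assumed NTP 32.32 format
    end_s = start_s + msg["durSamples"] / sample_rate_hz  # duration in audio samples
    # Present the metadata only while its media interval overlaps RTP playout.
    return start_s <= playout_ntp_s <= end_s
```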

7. Data Channel Transport (AD.9)

AD.9.1 Bootstrap Data Channel Transport

  • BDC uses HTTP subprotocol (clause 6.2.10)
  • AI/ML applications, task manifests, model cards, model artifacts retrieved via HTTP GET over BDC
  • DCMTSI client shall not transmit user media over BDC

AD.9.2 Application Data Channel Transport

Subprotocol: "3gpp-ai" for AI/ML control and metadata
Message Format: UTF-8 encoded JSON objects

Generic Message Types: - capability: UE inference capability summary - task: AI/ML processing task selection and model identifiers - configuration: Task configuration parameters including media stream mid binding and media time anchor representation - status: Lifecycle state and error reporting - metadata: Derived AI/ML metadata bound to media stream (mid) and media time

Detailed schema specified by AI/ML application. For cross-vendor interoperability, schema should be standardized for specific task.

Example metadata message:

```json
{
  "type": "metadata",
  "task": "stt",
  "mid": "audio",
  "segmentId": 1842,
  "ntpTs": 381245120,
  "durSamples": 16000,
  "text": "...",
  "conf": 0.87
}
```


Summary

This CR establishes comprehensive normative framework for AI/ML assisted media processing in DCMTSI, covering: - Complete architecture with on-device and split inference support - Detailed call flows for application/model delivery and runtime operation - Capability discovery mechanisms for UE and network - Standardized ONNX model format requirements - Rich metadata structures (task manifests and model cards with JSON schemas) - Deterministic model selection and validation procedures - Media time binding mechanisms for metadata synchronization - Data channel transport protocols for control and metadata exchange

The framework enables AI/ML tasks (STT, translation, TTS, noise suppression, scene description) while maintaining compatibility with existing DCMTSI media handling.


Fraunhofer HHI, Nokia
Title

[AIML_IMS-MED] NNC web decoder demo

Summary of S4-260197: NNC Web Decoder Demo

1. Introduction

This contribution presents a live demonstration of a web-based Neural Network Codec (NNC) decoder, following up on previous telco discussions where decoding times and end-to-end latency were reported. The demonstration shows substantial latency reductions under realistic download conditions. The document also addresses security concerns regarding WebAssembly (Wasm) that were raised in the previous telco.

2. Decoder Implementation

Technical Architecture

  • Base Implementation: Built on NNCodec and MPEG's reference software NCTM
  • Language: Reuses existing C++ entropy coding (CABAC) components with additional functionality ported from Python to C++
  • Web Deployment: Compiled into WebAssembly (Wasm) library using Emscripten

Supported Features

  • Supports NNC edition 2
  • Limitation: Does not support tools using temporal prediction

Performance Optimizations

  • Parallelization: CABAC decoding parallelized across NNR data units
  • Scheduling Strategy: Prioritizes largest available NNR data unit first to reduce tail latency when multiple units are pending
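
The largest-first strategy can be pictured with the Python sketch below (the actual demo decoder is C++ compiled to Wasm); decode_unit stands in for CABAC decoding of one NNR data unit:

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def decode_all(units, decode_unit, num_threads=4):
    # Max-heap on unit size: negate sizes for Python's min-heap.
    heap = [(-len(u), i, u) for i, u in enumerate(units)]
    heapq.heapify(heap)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = []
        while heap:
            _, _, unit = heapq.heappop(heap)   # largest pending unit first
            futures.append(pool.submit(decode_unit, unit))
        return [f.result() for f in futures]
```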

3. Web Application

Integration

  • Wasm decoder library embedded into JavaScript web application
  • Executable in standard browsers
  • JavaScript application invokes Wasm decoder and provides user interface for timing measurements

User Interface Features

  • Configuration Options:
  • Simulated download rate selection
  • Number of decoding threads selection
  • Execution Modes:
  • Decoding after complete model download
  • Simultaneous download and decoding (progressive decoding of fully received NNR data units)

Measurement Capabilities

  • Download Simulation: Delays availability of each tensor/NNR data unit according to selected throughput
  • Metrics Captured:
  • Decoding time
  • Total end-to-end latency (from download start to complete model decoding)
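
A back-of-envelope model of the two execution modes, assuming a single decode thread and illustrative per-unit figures; it only shows why overlapping download and decoding reduces end-to-end latency:

```python
def sequential_latency_s(unit_sizes_bytes, rate_bps, decode_times_s):
    # Mode 1: decode only after the complete model has been downloaded.
    download = sum(unit_sizes_bytes) * 8 / rate_bps
    return download + sum(decode_times_s)

def progressive_latency_s(unit_sizes_bytes, rate_bps, decode_times_s):
    # Mode 2: decode each NNR data unit as soon as it is fully received.
    arrival = decode_done = 0.0
    for size, dt in zip(unit_sizes_bytes, decode_times_s):
        arrival += size * 8 / rate_bps        # unit fully received at this time
        decode_done = max(decode_done, arrival) + dt
    return decode_done
```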

4. Test Conditions

Model and Configuration

  • Model: Wav2Vec for automatic speech recognition (evaluated in 3GPP TR 26.847)
  • Encoder Settings:
  • Dependent scalar quantization (use_dq)
  • Parameter optimization for DeepCABAC (param_opt)
  • Unary binarization length 11 (cabac_unary_length_minus1)
  • QP −27
  • No data-driven tools

Compression Performance

  • Original Model: ~377 MB (94.4M float32 parameters)
  • Compressed Size: ~49 MB
  • Compression Ratio: ~13%

ASR Performance (LibriSpeech test-clean)

  • Original WER: 3.4%
  • Compressed WER: 3.6%

Test Environment

  • Browser: Brave 1.86.142 (64-bit), Chromium 144.0.7559.97
  • Hardware: Dell Precision 7680 Laptop, Intel Core i9-13950HX, 64 GB RAM
  • OS: Windows 10 Enterprise

5. WebAssembly Security Analysis

The contribution addresses security concerns raised in the previous telco with four key arguments:

5.1 Expert Development and Maintenance

  • Developed within W3C by WebAssembly Working Group
  • Participation from major browser vendors and technology companies (Mozilla, Microsoft, Google, Apple, Intel, ByteDance, Red Hat)
  • Browser support since 2017
  • Actively maintained (latest core draft: 16 June 2025)

5.2 Security Model and Mechanisms

  • Operates under web security model in browsers
  • Key Security Features:
  • Sandboxed execution
  • No implicit privileges
  • Module validation before execution
  • Memory isolation
  • Enforcement of standard browser security policies

5.3 Broad Industry Deployment

Examples of widely deployed Wasm applications: - Adobe Photoshop on the web - Google Earth on the web - TensorFlow.js (WebAssembly backend) - ONNX Runtime Web (Microsoft) - AutoCAD Web - ffmpeg.wasm project

This broad deployment indicates strong industry confidence in WebAssembly's security model.

5.4 3GPP-Specific Considerations

  • IMS DC applications have different threat model than open web
  • Applications come from trusted sources
  • Authentication and authorization required before execution on UE
  • Applications authorized by DCSF/DC-AR before download/execution
  • Precedent: SA4 already considers WebAssembly in TR 26.858 (Study on APIs for 3GPP Speech and Audio Codecs) in clauses 5.3.3 and 6

6. Conclusion

The contribution proposes scheduling a time slot for a live demonstration (e.g., during a meeting break) and concludes that WebAssembly is secure for running the NNC decoder in web environments, based on: 1. Expert-driven standardization and ongoing maintenance 2. Sandboxed execution model and security mechanisms 3. Broad deployment across major browsers and applications 4. Security considerations specific to IMS DC applications


Nokia, Fraunhofer HHI, Deutsche Telekom, InterDigital Europe
Title

[AIML_IMS-MED] On Compression of AI/ML data in IMS

Summary of S4-260198: On Compression of AI/ML Data in IMS

1. Introduction and Motivation

This contribution proposes the adoption of efficient compression techniques for AI/ML data transport in IMS services, specifically advocating for the specification of MPEG's Neural Network Coding standard ISO/IEC 15938-17 (NNC) as a representation format.

2. Technical Justification

2.1 Use Case Requirements

The document identifies critical challenges in AI/ML data exchange based on SA1 and SA4 use cases:

  • Model delivery for local UE inference: Multiple context-dependent downloads (location, time, task) with limited local storage requiring frequent model re-downloads
  • Incremental AI/ML model updates: Both unidirectional (continuous UE updates) and multidirectional (co-learning between UEs and edge nodes) scenarios

2.2 Benefits of Compression

The contribution highlights three key advantages:

  • Bandwidth Optimization: Reduced model size minimizes data transfer and operational costs
  • Reduced Latency: Faster transmission to UEs and edge devices for real-time applications
  • Broader Accessibility: Enables AI/ML applications in bandwidth-constrained networks

2.3 NNC Standard Capabilities

The document presents NNC (ISO/IEC 15938-17) as the solution, demonstrating:

  • Compression performance: 0.1% to 20% of original size with transparent performance (validated in SA4 and MPEG evaluations)
  • Standardized format: Ensures interoperability for multi-party scenarios (e.g., third-party model providers, application server execution)

2.4 Advanced NNC Features

Key technical features beyond compression:

  • Topology Signalling: Generic syntax for AI/ML model architecture encoding
  • Random Access: Independent tensor decoding enabling parallelization
  • Parameter Update Signalling: Metadata for incremental update dependencies and relations
  • Robustness and Error Resilience: Configurable prioritization/error-protection through packetization; missing parameter update detection
  • Performance Indicator: Signals model performance metrics (e.g., accuracy)
  • Encapsulation Flexibility: Integration of existing formats (PyTorch, ONNX, NNEF, TensorFlow) with generic support for others

2.5 Web Application Suitability

WASM-based NNC decoder validation demonstrates: - Browser-side decoding feasibility - Reduced end-to-end latency (download + decoding) compared to uncompressed delivery - Multi-fold speed-ups under representative network conditions

3. Proposal

The contribution proposes considering NNC-based compression for inclusion in IMS-based AI/ML services.

Annex: Detailed NNC Technical Syntax

A.1 Data Components

A.1.1 Payload Types

NNC specifies representation through NNR compressed data units (NNR_NDU) with multiple payload types:

| Payload Type | Compressed Parameter Type | Description |
|--------------|---------------------------|-------------|
| NNR_PT_INT | - | Integer parameter tensor |
| NNR_PT_FLOAT | - | Float parameter tensor |
| NNR_PT_RAW_FLOAT | - | Uncompressed float parameter tensor |
| NNR_PT_BLOCK | NNR_CPT_DC (0x01) | Weight tensor decomposition |
| | NNR_CPT_LS (0x02) | Local scaling parameters |
| | NNR_CPT_BI (0x04) | Biases present |
| | NNR_CPT_BN (0x08) | Batch norm parameters |

  • Context-adaptive entropy coding using DeepCABAC (except NNR_PT_RAW_FLOAT)
  • Support for various bit depths via nnr_decompressed_data_format
  • Pre-quantized float parameter tensor representation

A.1.2 Topology Data

NNR topology units (NNR_TPL) signal AI/ML topology: - Storage format and compression signaled via topology_storage_format and topology_compression_format - Byte sequence representation (typically null-terminated UTF-8 strings) - Optional deflation per RFC 1950 - Topology element specification in NNR_NDU via topology_elem_id or topology_elem_id_index

A.1.3 Meta Data

NNR_NDU meta data syntax elements: - Tensor dimensions: tensor_dimensions_flag, tensor_dimension_list() - Scan order: Mapping of parameter values to dimensions - Entry points: bit_offset_delta1, bit_offset_delta2 for individual tensor decoding

Incremental coding support: - Parameter update tree (PUT) structure with parent-child relationships - Node identification via: - Enumeration: device_id, parameter_id, put_node_depth - Hash-based: parent_node_payload_sha256, parent_node_payload_sha512 - Global NN meta data in NNR_MPS including base_model_id for update relationships

A.1.4 Performance Data

Performance metrics signaled in NNR_MPS and NNR_LPS: - Presence and type specification via validation_set_performance_present_flag, metric_type_performance_map_valid_flag, performance_metric_type - Validation set performance indication - Performance maps for different optimization variants: - sparsification_performance_map() - pruning_performance_map() - unification_performance_map() - decomposition_performance_map()

A.1.5 Format Encapsulation

NNC encapsulates existing formats (NNEF, ONNX, PyTorch, TensorFlow): - Topology data transmission in NNR topology data units - Quantization meta data in NNR quantization data units - Format-specific specifications in Annexes A-D of the standard

A.2 Coding Tools

A.2.1 Parameter Reduction Methods

NNR_PT_BLOCK payload additional parameters: - Local scaling adaptation - Batch norm folding - Tensor decomposition with decomposition_rank and g_number_of_rows

Predictive Residual Encoding (PRE): - Enabled via nnr_pre_flag in NNR_MPS - Codes difference between current and previous parameter updates

Row-skipping mechanism: - Enabled via row_skip_enabled_flag - row_skip_list specifies entirely-zero tensor rows

A.2.2 Quantization and Codebook

Quantization control in quant_tensor(): - Method specification: lps_quantization_method_flags, mps_quantization_method_flags, codebook_present_flag - Quantization type: Uniform or dependent (dq_flag) - Step size: qp_value, lps_qp_density, mps_qp_density - Dependent quantization state: dq_state_list for entry point initialization

Codebook mapping: - Integer value remapping via integer_codebook() structure

A.2.3 Entropy Coding

DeepCABAC (context adaptive binary arithmetic coding): - Applied to all payloads except NNR_PT_RAW_FLOAT - Binarization syntax elements: sig_flag, sign_flag, abs_level_greater_x flags, abs_remainder - Binarization control: cabac_unary_length - Probability estimation: Initialization and update via shift_idx_minus_1 - Random access support: scan_order, bit_offset_delta1, cabac_offset_list for entry points and state signaling

Incremental update coding modes: - Temporal context modeling: temporal_context_modeling_flag for probability estimation dependency on previous tensors - Histogram-dependent probability: hist_dep_sig_prob_enabled_flag for multi-tensor dependency


Nokia, Fraunhofer HHI, Deutsche Telekom, InterDigital Europe, Vodafone Group Plc
Title

[AIML_IMS-MED] Inclusion of NNC to AIML_IMS-MED

Summary of S4-260200: Inclusion of NNC to AIML_IMS-MED

1. Introduction and Context

This contribution proposes the addition of Neural Network Coding (NNC) compression capabilities to the AIML_IMS-MED work item. The proposal is motivated by S4-260198, which demonstrates the necessity for compression of AI/ML data in IMS-based transport scenarios. The document presents changes to be incorporated into the common base Change Request for AIML_IMS-MED.

2. Main Technical Contributions

2.1 NNC Decoder Support Requirement

The proposal mandates that DCMTSI clients supporting AI/ML model download or incremental model download shall support NNC decoding as specified in ISO/IEC 15938-17. Specifically:

  • NNC Edition 2 support is enabled by setting the general_profile_idc syntax element equal to 1
  • This establishes a baseline compression capability for AI/ML model transport over IMS

2.2 Configuration for Full AI/ML Model Download

For DCMTSI clients supporting complete AI/ML model download, the following NNC parameter configuration is specified:

  • Payload type: nnr_compressed_data_unit_payload_type = NNR_PT_BLOCK
  • Compressed parameter types: compressed_parameter_types = NNR_CPT_LS | NNR_CPT_BN (enabling local scaling and batch normalization)
  • Quantization options: Either dq_flag = 1 (dependent quantization) OR codebook_present_flag = 1 (codebook-based quantization)
  • Probability estimation: shift_idx_minus_1_present_flag = 1 (optimal initialization)

Functionality enabled: This configuration supports local scaling adaptation, batch norm folding, flexible quantization approaches, and optimized probability estimation for entropy coding.

2.3 Configuration for Incremental AI/ML Model Data Exchange

For DCMTSI clients supporting incremental model updates, an extended parameter set is defined:

  • Basic parameters: Same payload type (NNR_PT_BLOCK) and compressed parameter types (NNR_CPT_LS | NNR_CPT_BN) as full model download
  • Update tree support: mps_parent_signalling_enabled_flag = 1 and parent_node_id_present_flag = 1
  • Efficiency features:
  • row_skip_enabled_flag = 1 (row skipping)
  • nnr_pre_flag = 1 (predictive residual coding)
  • hist_dep_sig_prob_enabled_flag = 1 (history-dependent significance probability)
  • temporal_context_modeling_flag = 1 (temporal context adaptation)
  • scan_order > 0 (parallel decoding support)

Functionality enabled: This configuration provides comprehensive support for efficient incremental updates through parameter update trees, spatial/temporal prediction, adaptive probability modeling, and parallel processing capabilities.
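
For illustration, the two configurations can be collected as plain dicts whose keys mirror the ISO/IEC 15938-17 syntax elements listed above; the dict representation itself is not part of the proposal:

```python
NNR_CPT_LS, NNR_CPT_BN = 0x02, 0x08

FULL_MODEL_DOWNLOAD = {
    "nnr_compressed_data_unit_payload_type": "NNR_PT_BLOCK",
    "compressed_parameter_types": NNR_CPT_LS | NNR_CPT_BN,  # == 0x0A
    "dq_flag": 1,                          # or codebook_present_flag = 1 instead
    "shift_idx_minus_1_present_flag": 1,
}

INCREMENTAL_UPDATE = dict(FULL_MODEL_DOWNLOAD,
    mps_parent_signalling_enabled_flag=1,
    parent_node_id_present_flag=1,
    row_skip_enabled_flag=1,
    nnr_pre_flag=1,
    hist_dep_sig_prob_enabled_flag=1,
    temporal_context_modeling_flag=1,
    scan_order=1,                          # any value > 0 enables parallel decoding
)
```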

2.4 Normative Reference Addition

The proposal adds ISO/IEC 15938-17:2024 Edition 2 as a normative reference, establishing the technical foundation for NNC compression in the specification.

Technical Significance

The contribution establishes two distinct NNC profiles optimized for different AI/ML model transport scenarios in IMS networks: 1. A baseline profile for complete model downloads with essential compression features 2. An advanced profile for incremental updates with sophisticated prediction and adaptation mechanisms to minimize update payload sizes


Nokia, Fraunhofer HHI, Deutsche Telekom, InterDigital Europe, Vodafone Group Plc
Title

[AIML_IMS-MED] On Compression of AI/ML data in IMS

Comprehensive Summary: Compression of AI/ML Data in IMS

Document Overview

This contribution (S4-260286, revision of S4-260198) proposes the adoption of MPEG's Neural Network Coding standard ISO/IEC 15938-17 (NNC) for efficient compression and transport of AI/ML data in IMS services. The document is submitted by Nokia, Fraunhofer HHI, Deutsche Telekom, InterDigital Europe, and Vodafone Group Plc.

Main Technical Contributions

Motivation and Use Case Requirements

The contribution identifies critical challenges in AI/ML data exchange for IMS services:

  • Model Delivery Challenges: Use cases require multiple context-dependent model downloads (location, time, task-specific) rather than single downloads. Limited UE storage necessitates frequent model discarding and re-downloading.

  • Incremental Updates: Applications require both unidirectional continuous model updates to UEs and multidirectional updates for co-learning scenarios involving multiple UEs and edge nodes.

  • Key Benefits of Compression:

  • Bandwidth optimization reducing operational costs
  • Reduced latency through faster transmission
  • Broader accessibility in reduced-bandwidth networks
  • Interoperability through standardized data formats

NNC Standard Capabilities

The contribution highlights NNC's compression performance (0.1% to 20% of original size with transparent performance) and advanced features:

  • Topology Signalling: Generic syntax for encoding AI/ML model architecture
  • Random Access: Independent tensor decoding enabling parallelization
  • Parameter Update Signalling: Metadata for incremental update dependencies and relations
  • Robustness: Configurable prioritization/error-protection through packetization; missing update detection
  • Performance Indicators: Signaling of model performance metrics (e.g., accuracy)
  • Encapsulation Flexibility: Support for PyTorch, ONNX, NNEF, TensorFlow formats

The document also references WASM-based NNC decoder feasibility in web applications, demonstrating multi-fold latency reductions under representative network conditions.

Technical Details (Annex)

NNC Data Components

Payload Types (NNR_NDU)

NNC specifies multiple payload types via nnr_compressed_data_unit_payload_type:

  • NNR_PT_INT: Integer parameter tensors
  • NNR_PT_FLOAT: Float parameter tensors
  • NNR_PT_RAW_FLOAT: Uncompressed float tensors
  • NNR_PT_BLOCK: Block-structured float parameters with sub-types:
  • NNR_CPT_DC (0x01): Decomposed weight tensors
  • NNR_CPT_LS (0x02): Local scaling parameters
  • NNR_CPT_BI (0x04): Biases
  • NNR_CPT_BN (0x08): Batch normalization parameters

Non-RAW payloads use context-adaptive entropy coding (DeepCABAC). The compressed_parameter_types element is the bitwise OR of the compressed parameter type IDs (see the snippet below). Various bit depths are supported via nnr_decompressed_data_format, along with pre-quantized float tensor representation.
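
Concretely, combining the local-scaling and batch-norm parameter types from the list above:

```python
NNR_CPT_LS, NNR_CPT_BN = 0x02, 0x08
compressed_parameter_types = NNR_CPT_LS | NNR_CPT_BN   # bitwise OR == 0x0A
```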

Topology Data (NNR_TPL)

Topology units signal AI/ML architecture via: - topology_storage_format: Storage format specification - topology_compression_format: Optional compression (RFC 1950 deflate) - topology_data: Byte sequence (typically UTF-8 string) - topology_elem_id / topology_elem_id_index: Topology element references in NNR_NDU

Metadata

NNR_NDU metadata includes: - Tensor Dimensions: tensor_dimensions_flag, tensor_dimension_list() - Scan Order: scan_order for parameter-to-dimension mapping - Entry Points: bit_offset_delta1, bit_offset_delta2 for parallel decoding

Incremental Coding Support: - Parameter Update Tree (PUT) structure via mps_parent_signalling_enabled_flag, parent_node_id_present_flag - Node identification through: - Enumeration: device_id, parameter_id, put_node_depth - Hash-based: parent_node_payload_sha256, parent_node_payload_sha512 - Global metadata in NNR_MPS including base_model_id

Performance Data

Performance metrics signaled in NNR_MPS and NNR_LPS: - validation_set_performance_present_flag, metric_type_performance_map_valid_flag, performance_metric_type - validation_set_performance: Performance on validation set - Performance maps for post-processing operations: - sparsification_performance_map() - pruning_performance_map() - unification_performance_map() - decomposition_performance_map()

Format Encapsulation

Annexes A-D specify encapsulation of NNEF, ONNX, PyTorch, and TensorFlow data through NNR topology and quantization data units.

Coding Tools

Parameter Reduction Methods

  • NNR_PT_BLOCK Reconstruction: Local scaling adaptation, batch norm folding, tensor decomposition with decomposition_rank and g_number_of_rows
  • Predictive Residual Encoding (PRE): nnr_pre_flag enables differential coding against previous updates
  • Row-Skipping: row_skip_enabled_flag and row_skip_list for zero-row signaling

Quantization and Codebook

  • Quantization control via lps_quantization_method_flags, mps_quantization_method_flags, codebook_present_flag
  • dq_flag: Uniform vs. dependent quantization selection
  • Quantization step size: qp_value, lps_qp_density, mps_qp_density
  • Dependent quantization state: dq_state_list for entry point initialization
  • Codebook mapping: integer_codebook() structure for value remapping

Entropy Coding (DeepCABAC)

Context-adaptive binary arithmetic coding for non-RAW payloads:

Binarization: sig_flag, sign_flag, abs_level_greater_x flags, abs_remainder with cabac_unary_length specification

Probability Estimation: - Initialization/update: shift_idx_minus_1 - Random access: scan_order, bit_offset_delta1, cabac_offset_list

Incremental Update Modes: - temporal_context_modeling_flag: Probability estimation from previous tensor - hist_dep_sig_prob_enabled_flag: Multi-tensor historical dependency

Proposal

The contribution proposes considering NNC-based compression for inclusion in IMS-based AI/ML services, based on its compression efficiency, standardized format, and advanced features supporting various AI/ML data exchange scenarios.