Meeting: TSGS4_135_India | Agenda Item: 10.5
17 documents found

---

## Huawei Tech. (UK) Co., Ltd - Network, QoS and UE Considerations for Client Side Inferencing AIML/IMS
### 1. Introduction
This contribution addresses network-related issues in the previously discussed call flow for client/UE-side inferencing (S4aR260004a). The main concerns relate to steps 12-16 of the draft call flow, which involve model download and deployment for UE-based AI inferencing.

### 2. Network-Related Issues

#### 2.1 Model Size
Problem identification:
- TR 26.927 indicates models are approximately 40 MB (Table 6.6.2-1)
- Current publicly available models for practical use cases are significantly larger (100+ GB)
- Example: the Hunyuan image generation model set is 169 GB (available on Hugging Face)
- Simple language models (e.g., single-language translation) are approximately 100 MB

Required action: details on supported model sizes and required response times need to be defined.

#### 2.2 Network QoS Support
Problem identification:
- For real-time request-response (500 ms or even 1000 ms), current mobile networks cannot support the required bit-rates
- Example calculation: a 100 GB model with a 1000 ms response time requires ~800 Gbit/s
- Such bit-rates are not realistic in current mobile networks

Required actions:
- Define supported model size and transfer time requirements
- Identify an appropriate QoS profile (5QI)
- If no suitable 5QI exists, request SA2 to update the 5QI specifications for this use case

#### 2.3 Compression and UE Support
Problem identification:
- TR 26.927 details NN compression with 2-20% compression ratios
- Even with compression, the resulting bit-rates remain infeasible for mobile networks
- No UE capabilities for NN codec support have been defined
- UE support for such capabilities cannot be assumed

Required action: clarify whether NNC is required for client-side inferencing and document the related requirements.
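The feasibility argument in clause 2.2 reduces to a one-line calculation; a minimal sketch:

```python
# Sketch of the feasibility check in clause 2.2: the sustained bit-rate
# needed to deliver a model of a given size within a response-time budget.
def required_bitrate_gbps(model_size_gb: float, response_time_ms: float) -> float:
    """Return the bit-rate in Gbit/s needed to transfer `model_size_gb`
    gigabytes within `response_time_ms` milliseconds."""
    bits = model_size_gb * 8e9          # GB -> bits
    seconds = response_time_ms / 1000.0
    return bits / seconds / 1e9         # bit/s -> Gbit/s

# The contribution's example: a 100 GB model with a 1000 ms deadline
# needs ~800 Gbit/s, far beyond current mobile-network capabilities.
print(required_bitrate_gbps(100, 1000))   # 800.0
```

Even the TR 26.927 figure of roughly 40 MB within 500 ms still needs ~0.64 Gbit/s sustained, which illustrates why the contribution asks for explicit size and transfer-time requirements.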
#### 2.4 Protocol Support
Problem identification:
- S4aR260004a mentions HTTP for download
- HTTP/TCP is suboptimal for large, quick data downloads due to:
  - TCP slow start
  - Congestion control introducing additional latency
  - Tail latency from head-of-line blocking

Proposed solutions:
- Consider alternative protocols:
  - RTP with 3GPP burst QoS
  - QUIC (has bindings to the 5G XRM framework for improved QoS support)
- Leverage 3GPP XRM QoS support for bursty data transfer (HTTP/3 with QUIC, or RTP)

#### 2.5 Caching and Bandwidth Wastage
Problem identification:
- The current call flow indicates a model download for every request
- No explicit caching or model update mechanism
- This results in:
  - Huge bandwidth wastage
  - Network bit-rate requirements that current mobile networks cannot meet

Required action: include model update and caching mechanisms in the call flow rather than requesting a new model from the network each time.

### 3. Suggested Way Forward
The contribution emphasizes that the intention is not to exclude UE inferencing (as agreed for the work item), but to clarify limitations and requirements before agreeing to a CR detailing such call flows.

Proposed actions:
---

## Nokia - [AI_IMS-MED] AI/ML media processing and task updating
### Document Overview (S4-260112)
This contribution proposes updates to AI/ML media processing procedures and task updating call flows for IMS Data Channel (DC) applications. It builds upon TR 26.927 and TS 23.228 Annex AC, incorporating agreements from SA4#134 (S4-252075) and addressing feedback from SA2's reply LS on AI/ML for Media.

### Main Technical Contributions

#### 1. Refinement of AI/ML Task Processing Call Flows
Issues identified with TR 26.927:
Updated call flow structure (steps 1-23). The revised flow incorporates the common call flows agreed in S4-252075.

Initial setup (steps 1-13):
- UE1 registers to IMS with AI/ML capability indication
- MMTel session establishment between UE1 and UE2
- Bootstrap Data Channel (BDC) establishment between UE1 and the MF
- The DCSF creates the DC application list based on:
  - Subscription list filter
  - UE static capabilities
- The application list includes AI service information (e.g., an intelligent translation service)
- UE1 downloads the application list and selects an application
- Application Data Channel (ADC) establishment between UE1 and the DC AS
- Task selection and AI/ML model selection

Media processing execution (steps 14-16):
- The media session runs over the MMTel session
- UE1 executes the selected task and transmits input media streams
- The network runs inference and forwards the processed streams to UE2 (or UE1, or both, depending on the application)
- Different alternatives are supported based on the inference location (local/remote/split)

#### 2. Task Reselection and Update Mechanisms
Task reselection (step 17):
Task update (steps 17-23).

Update procedure:
- Step 17: UE1 sends an UPDATE Task request over the ADC with:
  - Task ID
  - New parameters
  - Start time (when to apply the new parameters)
  - Optional additional parameters

Alternative execution paths (steps 20-22):
- Alt a, local inference:
  - The DC AS sends the UPDATE Task response (including new models) to UE1 via the MF
  - UE1 runs the updated inference task locally
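As an illustration of the step-17 request described above, a minimal JSON sketch (field names are assumptions for illustration, not the contribution's normative syntax):

```python
import json

# Hypothetical UPDATE Task request sent over the ADC in step 17.
# All field names and values are illustrative assumptions.
update_task_request = {
    "type": "UPDATE_TASK",
    "taskId": "task-42",                     # task already started via START Task
    "parameters": {"targetLanguage": "fr"},  # new parameters to apply
    "startTime": "2026-03-01T10:00:00Z",     # when the new parameters take effect
}

wire = json.dumps(update_task_request)   # serialized for the data channel
decoded = json.loads(wire)
print(decoded["taskId"])   # task-42
```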
#### 3. Task Control Messages

##### 3.1 START Task Message
Purpose: request to start an inference task (for split or remote inference).

Message content:

Response message:

Media stream identification: uses the "mid" identifier from RFC 8843 as included in the SDP offer/answer. Multiple RTP streams are identified by comma-separated mid values.

##### 3.2 UPDATE Task Message
Purpose: update an existing task that has already been started (requires a prior 200 OK response to START Task).

Use cases:
- Update the model, parameters, input or output of an existing task
- Indicate a new input/output stream (e.g., a new UE added to the call)

Message content:

Response message:
#### Key Technical Clarifications
Inference location flexibility: the specification supports three inference deployment models:
1. Local inference: the AI model is downloaded and executed in the UE
2. Remote inference: inference is executed in the network (MF)
3. Split inference: inference is split between the UE and the network

Message exchange protocol:

Network entity roles:

Editorial notes:
---

## Samsung Electronics Iberia SA - [AIML_IMS-MED] Base CR for TR 26.114
### CR 0607 to TS 26.114

Document information:
### Purpose and Rationale
This Change Request introduces stage 3 specifications for AI/ML processing capabilities in IMS services. The CR addresses the missing technical specifications for the AI/ML data delivery and signaling mechanisms required to support AI/ML-enhanced IMS services in Release 20.

### Main Technical Contributions

#### 1. References, Terms, and Abbreviations (Clauses 2, 3.1, 3.3)
Updates to include AI/ML-specific terminology, definitions, and abbreviations relevant to IMS services. Specific content is marked as Editor's Notes for future completion.

#### 2. New Annex AC: AI/ML Assisted Media Processing for MTSI
A comprehensive new normative annex is introduced covering all aspects of AI/ML integration with MTSI:
- AC.1 Introduction: introductory material on AI/ML capabilities in IMS services
- AC.2 Terminal Architecture: updates to the terminal architecture to accommodate the inference engine, AI/ML models, and intermediate data handling
- AC.3 End-to-End Reference Architecture: potential updates to the end-to-end reference architecture for AI/ML support; notes indicate possible liaison requirements with SA2

#### 3. AI/ML Call Flows (AC.4)

##### AC.4.1 AI/ML Model Delivery for Device Inferencing
A detailed 15-step call flow for AI/ML model delivery and execution:

1. Session establishment: MMTel service establishment
2. Bootstrap Data Channel (BDC) setup: established between the UE and the MF per TS 23.228
3. Application discovery: the UE requests the application list via HTTP over the BDC
4. Application list creation: the DCSF generates a user-specific DC application list with metadata including generic app information (description, ID, URL) and AI-specific information (AI feature tags, task descriptions)
5. Application selection: the user selects an app based on the AI service descriptions
6-9. Application download: the selected AI application is downloaded from the DCSF via the MF to the UE, including AI task metadata (task manifest)
10. Task selection: the user is presented with the AI task list and selects the desired tasks
11. Model request: the selected tasks and models are communicated to the MF via either the BDC (HTTP GET with task/model URLs) or the ADC (AI Model Selection Request with model URNs)
12. Model retrieval: the MF fetches the AI models from either (12a) the DCAR via the DCSF, or (12b) the DC AS
13. Model download: the UE downloads the AI models from the MF via either the BDC (HTTP response with the AI models as a resource) or the ADC (AI Model Selection Response with model data)
14. Inference execution: tasks are executed on the UE
15. Task reselection: the user/UE may reselect tasks during the session using the received metadata

Open issues identified:
- Whether the MF needs to understand AI task semantics (FFS)
- Application types that can be handled
- Large model handling mechanisms

##### AC.4.2 Network Inferencing
Placeholder for network-based inference scenarios.

##### AC.4.3 Split Inferencing
Placeholder for distributed inference scenarios across the UE and the network.

#### 4. AI/ML Capabilities (AC.5)
Defines capabilities and requirements for:
- AC.5.1 UE capabilities: device-side AI/ML requirements
- AC.5.2 Network capabilities: network-side AI/ML requirements

#### 5. AI/ML Formats (AC.6)
Specification of formats for AI/ML models and intermediate data.

#### 6. AI/ML Metadata (AC.7)
Definition of the necessary metadata structures for AI/ML operations, including the task manifests referenced in the call flows.

#### 7. Negotiation and Signaling (AC.8)
Procedures for model delivery negotiation, inferencing coordination, and general AI/ML media processing signaling.

#### 8. Data Channel Transport (AC.9)
Specification of AI/ML data transport mechanisms:
- What data to transport over the BDC (Bootstrap Data Channel)
- What data to transport over the ADC (Application Data Channel)
- Transport procedures and protocols

Key technical entities:
### Implementation Status
Most technical content is marked with Editor's Notes, indicating that this is a skeleton CR establishing the structure for future detailed specifications. The most complete section is AC.4.1 (AI/ML model delivery for device inferencing), which provides a concrete call flow example.
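As an illustration of the ADC variant of steps 11 and 13 in AC.4.1, a hypothetical AI Model Selection Request/Response pair (message and field names beyond the message type are assumptions; the CR leaves the concrete syntax to Editor's Notes):

```python
import json

# Hypothetical ADC exchange for model selection (AC.4.1, steps 11 and 13).
# The request carries model URNs; the response would carry the model data
# or, as sketched here, references to it. URNs are invented examples.
selection_request = {
    "type": "AI_MODEL_SELECTION_REQUEST",
    "taskId": "speech-translation",
    "modelUrns": ["urn:example:model:asr-small", "urn:example:model:mt-en-es"],
}

selection_response = {
    "type": "AI_MODEL_SELECTION_RESPONSE",
    "taskId": selection_request["taskId"],
    "models": [{"urn": u, "status": "available"}
               for u in selection_request["modelUrns"]],
}

wire = json.dumps(selection_response)
print(json.loads(wire)["type"])   # AI_MODEL_SELECTION_RESPONSE
```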
---

## Samsung Electronics Iberia SA - [AIML_IMS-MED] Further details on DC app list
### Introduction
This contribution consolidates relevant text from existing 3GPP specifications (TS 23.228, TS 29.176, and TS 26.114) regarding the Data Channel (DC) application list request mechanism, focusing in particular on root URL replacement procedures. The document aims to clarify how these procedures are already defined in the context of Bootstrap Data Channel (BDC) setup and proposes their reuse for the AIML_IMS-MED work.

### Relevant Specifications Overview

#### Bootstrap Data Channel Setup Signalling (TS 23.228 Clause AC.7.1)
The specification defines the complete BDC establishment procedure for person-to-person use cases where the Media Function (MF) anchors the bootstrap data channel.

Key steps:
#### Media Control Service Operation (TS 23.228 Clause AA.2.4.3.2)
The DC Media Specification includes:

Supported media instructions:
- TerminateMedia
- OriginateMedia
- TerminateAndOriginateMedia
- UpdateMedia
- DeleteMedia
- RejectMedia

#### MF Resource Management (TS 29.176 Clause 5.2.2.2)
For the DC media resource type, the request includes:

For the bootstrap data channel specifically:

For the P2A/A2P application data channel:
#### Data Channel Application Definition (TS 26.114 Clause 6.2.10.1)
A data channel application consists of:
- An HTML web page including JavaScript(s)
- Optionally, image(s) and style sheet(s)

The bootstrap data channel is defined as:
- A data channel used to retrieve DC application(s) for the DCMTSI client
- Data channel stream ID below 1000
- Uses HTTP as the data channel sub-protocol
- The application accessible at the HTTP root ("/") URL describes the GUI and logic for further data channel usage
- The authority (host) part of the URL and the "Host" HTTP header shall be ignored on reception and set to an empty value by the DCMTSI client

### Discussion
The complete flow for application list handling is already specified:

The specifications explicitly state that "the details of how to provide the application list to the UE and how to use it by the UE are not defined in TS 23.228", but the transport mechanism and URL replacement procedures are fully defined.

### Proposal
From the UE perspective, the following procedures are already well defined in TS 23.228 as part of the BDC setup signalling:

For the AIML_IMS-MED work:

This approach leverages existing standardized mechanisms and maintains consistency with the current IMS DC architecture.
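The root-URL rule quoted above from TS 26.114 clause 6.2.10.1 can be sketched as follows (a minimal illustration of ignoring the authority part, not normative client behavior):

```python
from urllib.parse import urlsplit, urlunsplit

# Sketch of the TS 26.114 clause 6.2.10.1 rule: a DCMTSI client ignores
# the authority (host) part of a bootstrap-DC URL, setting it to an empty
# value and keeping only the path-side components relative to the root "/".
def normalize_bdc_url(url: str) -> str:
    parts = urlsplit(url)
    path = parts.path or "/"   # the application is served from the root URL
    return urlunsplit(("", "", path, parts.query, parts.fragment))

print(normalize_bdc_url("http://ignored.example.com/apps/list?user=1"))
# -> /apps/list?user=1
```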
---

## Samsung Electronics Iberia SA - [AIML_IMS-MED] Call flow for split inferencing
### Document Information (S4-260129)

### Main Technical Contribution
This document proposes a detailed call flow for split inferencing in IMS-based AI/ML services, where AI model execution is distributed between the UE and network elements (the MF, Media Function). The contribution is intended for inclusion in clause AC.4.3 of the base Change Request.

### Split Inferencing Call Flow

Session establishment and bootstrap (steps 1-2):
Application discovery and selection (steps 3-6):

Application download (steps 7-9):

AI task selection and configuration (steps 10-13):

Model distribution and configuration response (steps 14-16):

Inference execution (steps 17-22):

Dynamic task reselection (step 23):

### Key Technical Features

Metadata framework:

Flexibility in execution distribution:

Model distribution options:

Media/data flow management:
---

## InterDigital Finland Oy - [AIML_IMS-MED] Call flow for split inferencing
### Document Overview (S4-260180)
This change request proposes updates to the AI/ML call flow for split inferencing in IMS-based media services. It revises the previously agreed device inferencing call flow (S4aR260014) to accommodate split inferencing scenarios where AI model execution is partitioned between the UE and the network-based DC AS (Data Channel Application Server).

### Main Technical Contributions

#### 1. Split Inferencing Capability Indication
Key addition:
- The UE now indicates split inferencing availability in the application request message sent to the MF (Media Function) when requesting the application list via the Bootstrap Data Channel (BDC)
- This allows the network to understand the UE's capability to participate in distributed AI inference

#### 2. Enhanced Application and Task Selection
Application metadata enhancements. Application-related metadata now includes:
- Generic app information (description, app ID, URL)
- AI-specific information, including AI feature tags indicating AI requirements
- AI task-related descriptions for user-informed selection

Task metadata:
- AI task metadata is delivered with the application, potentially expressed as a task manifest
- The task list presented to users includes annotations from the AI task metadata
- The execution endpoints supported by each task and subtask are now exposed to enable split inference decisions

#### 3. Model Partitioning Framework
The CR introduces a comprehensive partitioning framework.

Request phase (step 10):
- The UE requests both a model list and a partitioning list from the DCAS
- The UE provides its capability metadata to enable appropriate partitioning options

Partitioning metadata definition. The partitioning list/submodel partitioning metadata specifies:
- Submodel identifiers: unique identification of model partitions
- Execution endpoints: where each submodel executes (UE vs. network)
- Input/output tensor characteristics: data interfaces between submodels
- Operational characteristics: performance and resource requirements

Download phase (step 12):
- The UE downloads both the model list and the partitioning list corresponding to its capabilities

#### 4. User-Driven Partition Selection
Selection criteria (step 13):
- The user is presented with lists of both the models and the partitions supported by the UE
- The user selects the desired AI model(s) and a partition
- Partition selection may be based on:
  - Load distribution preferences
  - Battery impact considerations
  - Other task execution preferences

#### 5. Split Inference Configuration and Execution
Configuration phase (step 14):
- The UE configures split inference with the DCAS by selecting a specific model and a specific partition
- From these selections, the corresponding submodel(s) to be executed are derived

Server-side preparation (step 15):
- The DCAS prepares the server-side execution context
- The DCAS registers the submodel(s) and associated metadata with the selected partitioning

Configuration confirmation (step 16):
- The DCAS indicates whether the requested configuration is accepted
- The DCAS confirms readiness to execute the server-side submodel(s)

Submodel deployment (steps 17-18):
- The selected tasks/models and the corresponding AI submodels are communicated to the DCAS
- The UE downloads the AI submodel(s) corresponding to the subtasks to be executed on the device side

Execution (step 19):
- Tasks identified for split inference between the UE and the DCAS are executed in a distributed manner

### Key Differences from Device Inferencing
The main distinctions from pure device inferencing include:
### Open Issues
The document notes one FFS (For Further Study) item: how device capabilities are sent to obtain an accurate list of models (noted after step 6).
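The partitioning metadata elements listed in section 3 could be carried in a structure like the following sketch (all field names and values are illustrative assumptions; the related CRs use a HEAD/TAIL split for an object detection task as their example):

```python
import json

# Illustrative partitioning-list entry: submodel identifiers, execution
# endpoints, tensor interfaces between submodels, and operational
# characteristics. Shapes, dtypes, and names are invented for illustration.
partitioning_list = [
    {
        "partitionId": "p1",
        "submodels": [
            {"submodelId": "head",
             "executionEndpoint": "UE",
             "outputTensors": [{"name": "features",
                                "shape": [1, 256, 32, 32],
                                "dtype": "FP16"}]},
            {"submodelId": "tail",
             "executionEndpoint": "DCAS",
             "inputTensors": [{"name": "features",
                               "shape": [1, 256, 32, 32],
                               "dtype": "FP16"}]},
        ],
        "operational": {"ueComputeShare": 0.3},  # resource/performance hints
    }
]

print(json.dumps(partitioning_list[0]["partitionId"]))
```

Note how the head submodel's output tensor matches the tail submodel's input tensor: that shared interface is exactly what the tensor-characteristics metadata has to pin down for a partition to be executable.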
---

## InterDigital Finland Oy - [AIML_IMS-MED] Negotiation messages
### Document Overview (S4-260181)
This is a revision of S4aR260012 proposing additional details for negotiation messages and associated metadata in support of AI/ML-based media services (AIML_IMS-MED). The document provides JSON-formatted metadata examples and updates to align with the agreed call flow from S4aR260014.

### Main Technical Contributions

#### 1. Negotiation Message Summary Table (Section A.4.2)
The document introduces Table A4.2-1, which defines the complete set of negotiation messages for local inferencing call flows. Key updates include:

Each message is mapped to the possible HTTP protocol operations (GET, POST, RESPONSE) and the associated metadata parameters.

#### 2. Metadata Information Definitions (Section A.4.3)

##### A.4.3.1 Application Metadata
Defines the characteristics and requirements of AI/ML applications, including:
##### A.4.3.2 Endpoint Capabilities Metadata
Separates capabilities into static and dynamic categories.

Static capabilities (fixed or infrequently changed):
- endpointIdentifier
- flopsProcessingCapabilities (peak compute in FLOPS)
- macOpProcessingCapabilities (MAC operations)
- supportedAiMlFrameworks
- accelerationSupported (boolean)
- supportedEngines (CPU, GPU, NPU)
- supportedPrecision (FP32, FP16, INT8)

Dynamic capabilities (runtime-dependent):
- availableMemorySize
- currentComputeLoad
- energyMode (Eco/Balanced/Performance)
- batteryLevel
- acceleratorAvailability

This separation enables both long-term compatibility checks and short-term runtime optimization.

##### A.4.3.3 Model Information Metadata
Comprehensive model characterization, including:

#### 3. Generic Negotiation Message Format (Section A.4.4)
Defines a transport-protocol-independent message format for AI metadata exchange over data channels.

Messages container:
- An array of Message objects (cardinality 1..n)

The Message data type includes:
- id: unique identifier within the data channel session scope
- type: message subtype enumeration:
  - CANDIDATE_MODELS_REQUEST / CANDIDATE_MODELS_RESPONSE
  - AI_APPLICATION_DISCOVERY_REQUEST/RESPONSE
  - AI_APPLICATION_REQUEST/RESPONSE
  - AI_MODEL_SELECTION_REQUEST/RESPONSE
- payload: type-dependent message content
- sessionId: associated multimedia session identifier
- sendingAtTime: wall-clock timestamp (optional)

This format provides flexibility for various transport protocols (e.g., HTTP) without imposing specific constraints.

### Key Design Principles
---

## Nokia - [AI_IMS-MED] Adaptive Model Delivery
### 1. Introduction (S4-260182)
This contribution revises previous documents (S4-251799, S4aR250211) on adaptive model delivery, incorporating the agreed call flow for device inferencing from S4aR260014 (agreed at SA4#134). The work builds upon TR 26.927, which documented AI/ML model delivery procedures.

### 2. Discussion

#### 2.1 Background and Motivation
The document addresses the critical challenge of timely model delivery for UE-centric inference in IMS DC-based AI/ML applications. Key points:

#### 2.2 Adaptive Model Delivery Concept
Based on TR 26.927 clause 5.2.2.2:
#### 2.3 Reference Call Flows
The document references two agreed high-level call flows.

General AI/ML IMS DC call flow (from S4-252075). Key steps include:
1. MMTel service establishment
2. BDC establishment between the UE and the MF
3. The DCSF creates the DC application list based on the subscription filter and UE static capabilities
4. The application list includes AI service information
5. The user selects an app based on the AI service
6. App download via the BDC
7. Task selection and model variant selection
8. ADC establishment
9. Three inferencing modes: local, remote, or split

Device inferencing call flow (from S4aR260014). A detailed 15-step procedure including:
- Application discovery with AI_APPLICATION_DISCOVERY_REQUEST/RESPONSE messages
- Application metadata including AI feature tags and task descriptions
- Task manifest delivery
- Model selection and delivery via the BDC or the ADC
- Support for task reselection during the session

### 3. Technical Proposal

#### 3.1 New Clause: AI/ML Model Delivery to DCMTSI Client

##### 3.1.1 General Model Delivery Procedure
Figure X.X-1: Basic model delivery over IMS DC. A 14-step procedure:
0. UE1 registers to IMS with AI/ML capability indication
1. MMTel session establishment
2. The IMS AS allocates DC resources
3. Session established between UE1 and UE2
4. Bootstrap Data Channel (BDC) establishment
5. The DCSF creates a subscriber-specific application list
6. Application list delivery over the BDC
7. App selection and download with the app manifest (includes inference tasks and model lists)
8. UE2-side DC procedures
9-10. Application data channel establishment with the DC AS
11-12. Model selection and delivery (from the DC AS, or from the DCAR via the DCSF)
13. Media exchange over the MMTel session
14. Inference execution on local or remote media

##### 3.1.2 Adaptive Model Delivery Procedure
Figure X.Y-2: Adaptive model delivery over IMS DC. An enhanced procedure building on the basic delivery:
- Steps 1-10: same as the basic model delivery, with a lower-precision model selected in step 10
- Step 11: request for an updatable model via the MF
- Steps 12a/12b: model delivery from either (a) the DCAR via the DCSF, or (b) the DC AS
- Step 13: model download to the UE
- Step 14: the inference loop starts and continues
- Step 15: the UE requests a model update via the MF
- Steps 16a/16b: model update delivery from either:
  - Option a: the DCAR via the DCSF
- Step 17: model update download via the MF
- Step 18: the UE applies the model update to the initial model
- Step 19: inference continues, with the potential for further updates

#### 3.2 Key Technical Features
### Editor's Notes and Open Issues
The referenced S4aR260014 document contains Editor's Notes indicating that:
- Whether the MF needs to understand AI task semantics requires clarification (FFS)
- Application type handling needs clarification
- Large model handling procedures need clarification
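The selection logic behind clause 3.1.2 (start with a lower-precision model that is deliverable now, upgrade later via updates) can be sketched as follows; the variant names, sizes, and the budget heuristic are all assumptions for illustration:

```python
# Sketch of the adaptive-delivery idea: pick the best model variant that
# fits the current delivery budget, and queue higher-precision variants
# as later updates (steps 15-18 of the procedure).
def adaptive_delivery(variants, network_budget_mb):
    """Return (chosen_variant, pending_upgrades) for the given budget."""
    affordable = [v for v in variants if v["size_mb"] <= network_budget_mb]
    chosen = max(affordable, key=lambda v: v["precision_bits"])
    upgrades = [v for v in variants
                if v["precision_bits"] > chosen["precision_bits"]]
    return chosen, upgrades

variants = [
    {"name": "int8", "precision_bits": 8,  "size_mb": 25},
    {"name": "fp16", "precision_bits": 16, "size_mb": 50},
    {"name": "fp32", "precision_bits": 32, "size_mb": 100},
]

chosen, upgrades = adaptive_delivery(variants, network_budget_mb=60)
print(chosen["name"])                  # fp16: best precision deliverable now
print([u["name"] for u in upgrades])   # ['fp32']: requested later as an update
```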
---

## InterDigital Finland Oy - [AIML_IMS-MED] Negotiation messages for split inferencing
### Document Overview (S4-260183)
This contribution proposes additional messages and associated metadata to enable split inferencing for AI/ML applications in IMS-based media services. It builds upon and updates contribution S4aR260009, with a specific focus on defining the differences between device inferencing and split inferencing scenarios.

### Main Technical Contributions

#### 1. Negotiation Message Summary Table (Section A.4.2)
Key addition: introduction of Table A4.2-1 summarizing all negotiation messages for split inferencing call flows. The table defines the following message pairs with their associated metadata:

#### 2. Common Metadata Information (Section A.4.3)

##### A.4.3.1 Application Metadata

##### A.4.3.2 Endpoint Capabilities Metadata
Introduces a separation between static and dynamic capabilities:

This separation enables both long-term compatibility checks and short-term runtime optimization.

##### A.4.3.3 Model Information Metadata
#### 3. Split Inferencing-Specific Metadata (Section A.4.3.4)

##### A.4.3.4.1 Submodel Partitioning Metadata
Major technical contribution: a comprehensive metadata structure, given as a Field/Description table, for describing model partitioning for split inferencing.

Tensor specifications include:

A JSON example is provided: a complete example showing a HEAD submodel on the UE and a TAIL submodel on the DCAS for an object detection task.

#### 4. Negotiation Message Format (Section A.4.5)
A generic message structure is defined in Table 5 (AI Metadata Messages Format) and Table 6 (Metadata Message Data Type, with Field/Type/Cardinality/Description columns).

Defined message types:
### Summary of Changes
The CR introduces three main changes:

The contribution enables complete end-to-end split inferencing capability negotiation between the UE and remote endpoints, with particular emphasis on the submodel partitioning metadata that allows flexible distribution of AI/ML model execution across network nodes.
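The static/dynamic capability split referenced in A.4.3.2 can be illustrated as follows (field names follow the companion S4-260181 summary; the values and the compatibility check are invented for illustration):

```python
# Illustrative endpoint-capabilities report with the static/dynamic split.
# Static fields change rarely; dynamic fields reflect the current runtime.
capabilities = {
    "static": {
        "endpointIdentifier": "ue-001",
        "flopsProcessingCapabilities": 4.0e12,   # peak compute in FLOPS
        "supportedAiMlFrameworks": ["ONNX"],
        "accelerationSupported": True,
        "supportedEngines": ["CPU", "GPU", "NPU"],
        "supportedPrecision": ["FP32", "FP16", "INT8"],
    },
    "dynamic": {
        "availableMemorySize": 512,   # MB, runtime-dependent
        "currentComputeLoad": 0.4,
        "energyMode": "Balanced",
        "batteryLevel": 0.8,
        "acceleratorAvailability": True,
    },
}

def can_run(model_req, caps):
    """Long-term check against static capabilities, short-term check
    against dynamic ones, as the two-level split intends."""
    return (model_req["precision"] in caps["static"]["supportedPrecision"]
            and model_req["memory_mb"] <= caps["dynamic"]["availableMemorySize"])

print(can_run({"precision": "INT8", "memory_mb": 256}, capabilities))   # True
```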
---

## Nokia, Samsung Electronics Co., Ltd - [AI_IMS_MED] On Application Manifest for AIML applications
### 1. Introduction (S4-260184)
This contribution proposes IMS Data Channel (DC) application metadata for AI/ML applications. The document merges metadata elements from S4aR250213 and S4aR250208 based on previous RTC SWG discussions and email exchanges. It addresses comments from RTC Telco Post SA4#134-2 regarding the origin and transfer of the AI/ML application manifest.

### 2. Main Technical Contributions

#### 2.1 General Framework for AI/ML Support over Data Channel
The contribution defines AI/ML DC applications as IMS DC applications that:
- Interact with AI/ML models (e.g., performing inference on the UE)
- Communicate AI/ML data
- Support different inference paradigms: local inference, remote inference, and split inference

Key architectural elements:
- The DCSF (via the MF) provides policy- and subscription-appropriate data channel applications to the UE
- The DC Application Repository (DCAR) stores verified data channel applications
- The DCSF downloads applications from the DCAR for distribution to the UE
- The DCMTSI client uses the metadata to select appropriate toolchains or execution environments

#### 2.2 Base Application Manifest Structure
The manifest contains the essential information for AI/ML DC applications.

Core elements:
- baseUrl: URI template for downloading models, with the format:

Task-level metadata includes:
- taskId: unique identifier
- taskName/description: human-readable task identifier (e.g., "Speech-to-speech Translation")
- version: task version number
- capabilityIndex: minimum capability requirements
- executionCandidate: supported endpoint locations (e.g., UE or MF)

#### 2.3 Task Input/Output Specification
Task inputs (taskInputs):
- taskInputId: unique identifier
- media-type: input media type
- route-to: specifies the subtaskInputId for data routing

Task outputs (taskOutputs):
- taskOutputId: unique identifier
- media-type: output media type
- from: specifies the subtaskOutputId for the output data origin

#### 2.4 Model Metadata
Each model object contains:
- id: unique model identifier
- version: model version/variant
- capabilityIndex: minimum capability requirements
- url: model download location
- latency: maximum latency requirement (milliseconds)
- accuracy: minimum accuracy requirement (metrics/value/direction; FFS)

#### 2.5 Subtask Metadata (Extension Parameters)
For tasks comprising multiple subtasks, the manifest includes detailed subtask information.

Subtask-level parameters:
- id: unique subtask identifier
- function: description of the subtask function
- capabilityIndex: capability requirements (matches the AI model capability)
- executionTarget: intended endpoint location
- executionFallback: alternative endpoint when the primary is unavailable

Subtask inputs (subtaskInputs):
- subtaskInputId: unique identifier
- pipe-type: logic for multiple data inputs (0 = first available, 1 = wait for all)
- media-type: input media type
- from: origin subtaskOutputId or taskInputId

Subtask outputs (subtaskOutputs):
- subtaskOutputId: unique identifier
- media-type: output media type
- route-to: destination subtaskInputId or taskOutputId

Subtask AI model parameters:
- id, capabilityIndex, url, latency, accuracy (as per the main model metadata)
- contextSize: maximum amount of input data the model can process (typically in tokens)

### 3. Open Issues
Several aspects remain FFS (For Further Study):
- Editor's Note: a definition of AI/ML task may be needed (referencing TR 26.927)
- Editor's Note: whether all fields in the tables are needed, and their definitions
- Editor's Note: capability index definition and usage
- Editor's Note: a clear definition of the accuracy metrics
- Editor's Note: the pipe-type parameter needs further clarification
- Model metadata specification alignment with TR 26.927

### 4. Document Type
This is a text proposal for the AI_IMS_MED work item, proposing new clauses (marked as "All New Text") to be added to the base CR.
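A manifest fragment following the structure in sections 2.2-2.4 might look like this sketch (all values are invented, the URL is a placeholder, and capabilityIndex semantics remain FFS in the contribution):

```python
import json

# Illustrative application-manifest fragment: one task with its inputs,
# outputs, and a single model entry, using the field names from the summary.
manifest = {
    "tasks": [{
        "taskId": "t1",
        "taskName": "Speech-to-speech Translation",
        "version": "1.0",
        "capabilityIndex": 3,                  # semantics FFS in the source
        "executionCandidate": ["UE", "MF"],
        "taskInputs": [{"taskInputId": "in1", "media-type": "audio",
                        "route-to": "st1-in"}],
        "taskOutputs": [{"taskOutputId": "out1", "media-type": "audio",
                         "from": "st2-out"}],
        "models": [{"id": "m1", "version": "1.0", "capabilityIndex": 3,
                    "url": "https://example.invalid/models/m1",
                    "latency": 200,            # maximum latency, ms
                    "accuracy": 0.9}],         # minimum accuracy, FFS
    }]
}

wire = json.dumps(manifest)
print(json.loads(wire)["tasks"][0]["taskId"])   # t1
```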
---

## InterDigital Finland Oy - [AI_IMS_MED] Call flow for split inferencing loop
### Document Metadata (S4-260185)

### Main Technical Contribution
This contribution proposes a call flow for split inferencing operations between the UE and the DCAS (Data Channel Application Server), building upon previous work in TR 26.927 and earlier contributions.

### Split Inferencing Architecture
The proposed call flow describes a collaborative inference execution model where:
- The UE and the DCAS jointly execute an inference task
- The inference workload is split between the two entities
- Intermediate inference results are exchanged over the user plane
- Communication is facilitated through the MF (Media Function)

### Proposed Call Flow Steps
The text proposal adds the following procedural steps:

### Technical Significance
This proposal enables distributed AI/ML inference for media processing, allowing workload distribution between device and network based on computational capabilities, latency requirements, and network conditions. The standardization of intermediate data format parameters ensures interoperability in split inference scenarios.
---

## InterDigital Finland Oy - [AIML_IMS-MED] AI intermediate data format
### 1. Introduction and Scope (S4-260189)
This contribution proposes defining an intermediate data carriage format for AI/ML split inferencing, derived from TR 26.927. The document introduces:

### 2. Technical Background and Motivation

#### 2.1 Split Inferencing Requirements
Split inferencing, approved and mandated in 5G, is a key objective of the work item. The solution must support:

#### 2.2 Source and Derivation
The proposed format is derived from:

#### 2.3 Dynamic Nature of Tensor Characteristics
Tensor characteristics are not static and may change dynamically based on:
These characteristics must be conveyed through the user plane for accurate interpretation at the receiving end.

### 3. Main Technical Contributions

#### 3.1 Intermediate Data Definition (Clause X.X.1)
Key definition: intermediate data refers to the output tensor(s) computed by a sub-model executing an inference subtask up to a defined and negotiated partitioning, transferred between endpoints (device, edge, server) to serve as input to a subsequent sub-model.

Characteristics:
- May be compressed and/or encoded before transmission
- Processing shall not alter the semantics required by the receiving sub-model
- Non-persistent, dynamic, and context-dependent
- Characteristics (shape, size, format) vary as a function of:
  - The input data
  - The selected inference partitioning
  - The runtime configuration

#### 3.2 Intermediate Data Structure (Clause X.X.2)
Configuration stage: the structure is defined and exchanged at the configuration stage, referred to as the partitioning configuration.

Dynamic factors:
- Input media size/resolution changes may alter the tensor shape
- The selected partitioning identifies the active partitioning among the pre-configured options
- The selected compression profile (algorithm and parameters) is optimized for efficiency

Required information in the format:
- Tensor identifier
- Inferred tensor length (derived from the current tensor shape)
- Partitioning identifier (referencing the negotiated configuration)
- Compression profile identifier (indicating the compression method)

Solution: an AI Parameter Set (AIPS) is defined to capture information applicable to all tensors and associated data.

#### 3.3 AI Parameter Set (AIPS) Definition (Annex X.X.1-3)
Purpose: carries metadata (tensor metadata) associated with the intermediate data payload.

AIPS lifetime:
- Starts: When the decoder first receives and parses an AIPS TLV unit
- Ends: When a new AIPS with same or different … is received

AIPS Fields (Table X.X.13-1):

| Field | Meaning |
|-------|---------|
3.4 TLV Encapsulation (Clause X.X.2-4)
TLV Message Components:
- Type: indicates the kind of payload carried
- Length: length of the payload
- Payload: the data itself

TLV Unit Types (Table X.X.24-1):

| Type Value | Description |
|------------|-------------|
| 0 | Reserved |
| 1 | AI Parameter Set (AIPS) data |
| 2 | Intermediate data |
| 3-255 | Undefined |

Encapsulation Scenarios:
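The TLV framing described above can be sketched in a few lines. This is an illustrative encoding only: the field widths (1-byte type, 4-byte length) and the AIPS payload content are assumptions, not taken from the draft.

```python
import struct

# TLV unit types from Table X.X.24-1
TLV_AIPS = 1
TLV_INTERMEDIATE_DATA = 2

def encode_tlv(tlv_type: int, payload: bytes) -> bytes:
    """Pack one TLV unit: 1-byte type, 4-byte big-endian length, then payload.
    The field widths are illustrative assumptions, not from the draft."""
    return struct.pack("!BI", tlv_type, len(payload)) + payload

def decode_tlv(buf: bytes, offset: int = 0):
    """Unpack one TLV unit at `offset`; return (type, payload, next_offset)."""
    tlv_type, length = struct.unpack_from("!BI", buf, offset)
    start = offset + 5
    return tlv_type, buf[start:start + length], start + length

# An AIPS unit followed by an intermediate-data unit in one stream;
# the AIPS field names below are hypothetical stand-ins.
aips = b'{"partitioning_id": 3, "compression_profile_id": 1}'
tensor = bytes(16)  # stand-in for a compressed tensor payload
stream = encode_tlv(TLV_AIPS, aips) + encode_tlv(TLV_INTERMEDIATE_DATA, tensor)

t1, p1, off = decode_tlv(stream)
t2, p2, _ = decode_tlv(stream, off)
assert (t1, p1) == (TLV_AIPS, aips)
assert (t2, len(p2)) == (TLV_INTERMEDIATE_DATA, 16)
```

Because the length field is explicit, a receiver can skip unit types it does not understand (e.g., the 3-255 undefined range) without losing framing.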
4. Key Changes from Previous Version

Terminology Updates:
- "Split point" terminology changed to "partitioning" throughout
- "Head sub-model" and "tail sub-model" refined to "sub-model" and "subsequent sub-model"

Structural Additions:
- Addition of a partition identifier (highlighted as new in the original document)
- Formalization of AIPS lifetime management
- Complete TLV encapsulation framework

5. Proposal for Integration
The document proposes:
|
|
| Qualcomm Inc. |
CR on AIML processing in IMS calls
|
3GPP CR 0608 - AI/ML Processing in IMS Calls

Change Request Overview
Specification: TS 26.114 v19.2.0

This CR introduces normative procedures, formats, and signaling for AI/ML assisted media processing in DCMTSI (Data Channel for Multimedia Telephony Service over IMS).

Main Technical Contributions

1. General Framework and Architecture (AD.1, AD.2, AD.3)

Key Definitions
Terminal Architecture Requirements
DCMTSI clients must support:
- Media engine functions for RTP-based audio/video
- Data channel client (bootstrap and application data channels per clauses 6.2.10, 6.2.13)
- AI/ML application execution environment (e.g., web runtime)
- AI/ML inference engine for local model execution
- Capability discovery function (execution devices, operators, data types, resource limits)
- Model validation function (integrity/authenticity verification via SHA-256 and digital signatures)
- Binding and synchronization function (associates AI/ML tasks/metadata with RTP streams using SDP identifiers and media time anchors)

Reference Architecture
2. Call Flows (AD.4)

AD.4.1 AI/ML Application and Model Delivery for Device Inferencing
14-step procedure:
Editor's Note: Clarification is needed on whether the MF understands the nature of AI tasks, the types of application handling, and the handling of large models.

AD.4.2 On-Device Inferencing and Split Inference Operation
Note: Split inference may use on-device inference for one task (e.g., STT) and the DC AS for another (e.g., translation) while keeping RTP media unchanged.

3. Capabilities (AD.5)

AD.5.1 UE Capabilities
DCMTSI clients must determine and expose to the AI/ML application:
- Supported execution devices (CPU, GPU, NPU, accelerators)
- Supported operator sets and data types (per the local inference framework)
- Resource limits (memory constraints, concurrent task limits)
- Availability of audio/video media access points (e.g., decoded media frames)

Web runtime capability discovery may align with WebNN. A capability summary may be conveyed to the DC AS using the capability message type (clause AD.9.2).

AD.5.2 Network Capabilities
A DC AS supporting AI/ML processing may provide:
- Repositories and discovery information for AI/ML applications/models
- Policy information (restrictions on tasks, model usage, data retention)
- Application data channels for coordination with the AI/ML application
- Note: network-side inference capabilities are outside Phase 1 scope

4. AI/ML Formats (AD.6)
Mandatory Model Format:
- ONNX format conforming to ONNX version 1.16.0
- Minimum required opset version: 18
- Encoding: ONNX Protocol Buffers representation

5. Task Manifest and Model Card (AD.7)

AD.7.1 Task Manifest
A UTF-8 JSON object included with AI/ML application delivery, containing:
- List of supported tasks and optional subtasks with human-readable descriptions
- For each task: candidate model identifiers (model_id, model_version_id) and a model card resource reference
- Task-specific configuration parameters, including RTP stream mid binding requirements

AD.7.2 Model Card
A UTF-8 JSON object provided for each candidate model, including:
- Model identifier and version identifier
- Model format specification (ONNX version, minimum opset, IR version)
- Model I/O description:
  - Tensor element type and shape
  - Dynamic axes, layout, normalization conventions
- Execution constraints:
  - Required operator support
  - Required data types
  - Quantization convention
  - Minimum resource requirements
- Downloadable model artifacts:
  - Artifact URI, size, content type
  - Integrity information (SHA-256 digest)
  - Optional digital signature and key identifier

AD.7.2.1 JSON Schema for Model Card
A comprehensive JSON schema defines the structure:
- model_card_version: schema version (semver pattern)
- identity: model_id, model_version_id, name, description, publisher, license, timestamps, tasks, languages, tags
- format: type (const: "onnx"), onnx_version (const: "1.16.0"), min_opset (≥18), onnx_ir_version, encoding (enum: "protobuf")
- artifacts: array of downloadable artifacts with:
  - artifact_id, uri, content_type, size_bytes, sha256
  - Optional compression (none/gzip/zstd)
  - Optional signature (alg, kid, sig)
  - variant (precision, quantization, preferred_devices, max_latency_ms)
  - selection_constraints (requires_webnn, requires_ops, requires_data_types, min_memory_mib, min_peak_scratch_mib)
- io: inputs/outputs (tensorSpec arrays), preprocessing (audio/text), postprocessing (stt/tts), output_application_format
- runtime: min_memory_mib, min_peak_scratch_mib, max_concurrent_instances, required_operator_sets, required_data_types, webnn preferences, device_preference
- selection_policy: strategy (min_latency/min_energy/best_accuracy/balanced/custom), fallback_order

tensorSpec definition:
- name, element_type (float32/float16/int8/int32/uint8/bool)
- shape (array with integers, or strings for dynamic axes)
- Optional layout and dynamic_axes mapping

AD.7.3 Model Artifact Selection and Validation
Procedure:
1. UE performs capability discovery (devices, operators, data types, memory limits)
2. UE filters artifacts whose selection_constraints are satisfied by the UE capabilities
3. UE selects the preferred artifact based on selection_policy and device_preference
4. UE downloads the selected artifact URI via HTTP over the BDC
5. UE verifies the artifact using the SHA-256 digest from the model card
6. UE should verify the digital signature when provided
7. UE instantiates the inference engine and binds model I/O per the model card (io.preprocessing, io.inputs, io.outputs, io.postprocessing)

6. Negotiation, Signaling, and Media Time Binding (AD.8)

AD.8.1 Binding to RTP Streams
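The selection-and-validation steps above can be sketched as follows. The field names follow the AD.7.2.1 schema outline, but the concrete values, the helper function, and the capability summary are illustrative assumptions, not normative content.

```python
import hashlib

# Step 1: hypothetical UE capability summary from capability discovery
ue_caps = {"webnn": True, "data_types": {"float32", "float16"}, "memory_mib": 2048}

# Two artifacts from a hypothetical model card (AD.7.2); values are illustrative
artifacts = [
    {"artifact_id": "fp32", "selection_constraints":
        {"requires_webnn": True, "requires_data_types": ["float32"], "min_memory_mib": 4096}},
    {"artifact_id": "fp16", "selection_constraints":
        {"requires_webnn": True, "requires_data_types": ["float16"], "min_memory_mib": 1024}},
]

def satisfiable(c: dict, caps: dict) -> bool:
    """Step 2: check one artifact's selection_constraints against UE capabilities."""
    return ((not c.get("requires_webnn") or caps["webnn"])
            and set(c.get("requires_data_types", [])) <= caps["data_types"]
            and c.get("min_memory_mib", 0) <= caps["memory_mib"])

candidates = [a for a in artifacts if satisfiable(a["selection_constraints"], ue_caps)]
selected = candidates[0]  # step 3: selection_policy/device_preference would rank these

# Steps 4-5: stand-in for the HTTP download over the BDC, then the digest check
model_bytes = b"stand-in for the ONNX artifact bytes"
card_digest = hashlib.sha256(model_bytes).hexdigest()  # digest published in the model card
assert hashlib.sha256(model_bytes).hexdigest() == card_digest
assert selected["artifact_id"] == "fp16"  # fp32 fails the 4096 MiB minimum
```

Filtering before selection (steps 2-3) keeps the choice deterministic: given the same model card and capability summary, every UE arrives at the same artifact.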
AD.8.2 Media Time Binding for AI/ML Metadata
7. Data Channel Transport (AD.9)AD.9.1 Bootstrap Data Channel Transport
AD.9.2 Application Data Channel Transport
Subprotocol: "3gpp-ai" for AI/ML control and metadata

Generic Message Types:
- capability: UE inference capability summary
- task: AI/ML processing task selection and model identifiers
- configuration: task configuration parameters, including media stream mid binding and media time anchor representation
- status: lifecycle state and error reporting
- metadata: derived AI/ML metadata bound to a media stream (mid) and media time

The detailed schema is specified by the AI/ML application. For cross-vendor interoperability, the schema should be standardized per task. Example metadata message:
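The CR leaves the detailed schema to the AI/ML application; a hypothetical metadata message consistent with the fields listed above (message type, mid binding, media time anchor, derived payload) might look like the following. Every field name here is illustrative, not normative.

```python
import json

# Illustrative "3gpp-ai" metadata message: field names are assumptions,
# chosen to match the described binding (type, mid, media time, payload).
metadata_msg = {
    "type": "metadata",            # capability | task | configuration | status | metadata
    "mid": "audio0",               # SDP media stream identifier the result is bound to
    "media_time": {"rtp_timestamp": 160000, "clock_rate": 16000},
    "task_id": "stt-en",           # hypothetical task identifier from the task manifest
    "payload": {"transcript": "hello world", "confidence": 0.93},
}

wire = json.dumps(metadata_msg)    # serialized for the application data channel
received = json.loads(wire)
assert received["mid"] == "audio0"
```

Binding each message to a mid and an RTP media time anchor is what lets the receiver align derived metadata (e.g., a transcript) with the corresponding media frames.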
Summary
This CR establishes a comprehensive normative framework for AI/ML assisted media processing in DCMTSI, covering:
- Complete architecture with on-device and split inference support
- Detailed call flows for application/model delivery and runtime operation
- Capability discovery mechanisms for UE and network
- Standardized ONNX model format requirements
- Rich metadata structures (task manifests and model cards with JSON schemas)
- Deterministic model selection and validation procedures
- Media time binding mechanisms for metadata synchronization
- Data channel transport protocols for control and metadata exchange

The framework enables AI/ML tasks (STT, translation, TTS, noise suppression, scene description) while maintaining compatibility with existing DCMTSI media handling. |
|
| Fraunhofer HHI, Nokia |
[AIML_IMS-MED] NNC web decoder demo
|
Summary of S4-260197: NNC Web Decoder Demo

1. Introduction
This contribution presents a live demonstration of a web-based Neural Network Coding (NNC) decoder, following up on previous telco discussions where decoding times and end-to-end latency were reported. The demonstration shows substantial latency reductions under realistic download conditions. The document also addresses security concerns regarding WebAssembly (Wasm) raised in the previous telco.

2. Decoder Implementation

Technical Architecture
Supported Features
Performance Optimizations
3. Web ApplicationIntegration
User Interface Features
Measurement Capabilities
4. Test ConditionsModel and Configuration
Compression Performance
ASR Performance (LibriSpeech test-clean)
Test Environment
5. WebAssembly Security Analysis
The contribution addresses the security concerns raised in the previous telco with four key arguments:

5.1 Expert Development and Maintenance
5.2 Security Model and Mechanisms
5.3 Broad Industry Deployment
Examples of widely deployed Wasm applications:
- Adobe Photoshop on the web
- Google Earth on the web
- TensorFlow.js (WebAssembly backend)
- ONNX Runtime Web (Microsoft)
- AutoCAD Web
- ffmpeg.wasm project

This broad deployment indicates strong industry confidence in WebAssembly's security model.

5.4 3GPP-Specific Considerations
6. Conclusion
The contribution proposes scheduling a time slot for a live demonstration (e.g., during a meeting break) and concludes that WebAssembly is secure for running the NNC decoder in web environments, based on:
1. Expert-driven standardization and ongoing maintenance
2. Sandboxed execution model and security mechanisms
3. Broad deployment across major browsers and applications
4. Security considerations specific to IMS DC applications |
|
| Nokia, Fraunhofer HHI, Deutsche Telekom, InterDigital Europe |
[AIML_IMS-MED] On Compression of AI/ML data in IMS
|
Summary of S4-260198: On Compression of AI/ML Data in IMS

1. Introduction and Motivation
This contribution proposes the adoption of efficient compression techniques for AI/ML data transport in IMS services, specifically advocating specification of MPEG's Neural Network Coding standard, ISO/IEC 15938-17 (NNC), as a representation format.

2. Technical Justification

2.1 Use Case Requirements
The document identifies critical challenges in AI/ML data exchange based on SA1 and SA4 use cases:
2.2 Benefits of Compression
The contribution highlights three key advantages:

2.3 NNC Standard Capabilities
The document presents NNC (ISO/IEC 15938-17) as the solution, demonstrating:

2.4 Advanced NNC Features
Key technical features beyond compression:
2.5 Web Application Suitability
WASM-based NNC decoder validation demonstrates:
- Browser-side decoding feasibility
- Reduced end-to-end latency (download + decoding) compared to uncompressed delivery
- Multi-fold speed-ups under representative network conditions

3. Proposal
The contribution proposes considering NNC-based compression for inclusion in IMS-based AI/ML services.

Annex: Detailed NNC Technical Syntax

A.1 Data Components

A.1.1 Payload Types
NNC specifies representation through NNR compressed data units (NNR_NDU) with multiple payload types:

| Payload Type | Compressed Parameter Type | Description |
|--------------|---------------------------|-------------|
| NNR_PT_INT | - | Integer parameter tensor |
| NNR_PT_FLOAT | - | Float parameter tensor |
| NNR_PT_RAW_FLOAT | - | Uncompressed float parameter tensor |
| NNR_PT_BLOCK | NNR_CPT_DC (0x01) | Weight tensor decomposition |
| | NNR_CPT_LS (0x02) | Local scaling parameters |
| | NNR_CPT_BI (0x04) | Biases present |
| | NNR_CPT_BN (0x08) | Batch norm parameters |
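The NNR_CPT values in the table are powers of two, which suggests they combine as a bitmask within an NNR_PT_BLOCK unit. The sketch below works under that assumption; the flag values come from the table, but the decoding helper and its name are illustrative.

```python
# NNR_PT_BLOCK compressed-parameter-type flags (values from the table above).
# Treating them as a combinable bitmask is an assumption based on their
# power-of-two values; the helper below is illustrative, not from the standard.
NNR_CPT_DC = 0x01  # weight tensor decomposition
NNR_CPT_LS = 0x02  # local scaling parameters
NNR_CPT_BI = 0x04  # biases present
NNR_CPT_BN = 0x08  # batch norm parameters

FLAG_NAMES = {NNR_CPT_DC: "DC", NNR_CPT_LS: "LS", NNR_CPT_BI: "BI", NNR_CPT_BN: "BN"}

def decode_cpt(flags: int) -> list[str]:
    """List the compressed parameter types present in a block unit."""
    return [name for bit, name in FLAG_NAMES.items() if flags & bit]

# A block carrying tensor decomposition plus batch norm parameters:
assert decode_cpt(NNR_CPT_DC | NNR_CPT_BN) == ["DC", "BN"]
```

A bitmask lets one block unit signal any combination of decomposition, scaling, bias, and batch-norm parameters without separate payload types for each combination.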
A.1.2 Topology Data
NNR topology units (NNR_TPL) signal AI/ML topology:
- Storage format and compression signaled via …

A.1.3 Meta Data
NNR_NDU meta data syntax elements:
- Tensor dimensions: …

Incremental coding support:
- Parameter update tree (PUT) structure with parent-child relationships
- Node identification via:
- Enumeration: …

A.1.4 Performance Data
Performance metrics signaled in NNR_MPS and NNR_LPS:
- Presence and type specification via …

A.1.5 Format Encapsulation
NNC encapsulates existing formats (NNEF, ONNX, PyTorch, TensorFlow):
- Topology data transmission in NNR topology data units
- Quantization meta data in NNR quantization data units
- Format-specific specifications in Annexes A-D of the standard

A.2 Coding Tools

A.2.1 Parameter Reduction Methods
NNR_PT_BLOCK payload additional parameters:
- Local scaling adaptation
- Batch norm folding
- Tensor decomposition with Predictive Residual Encoding (PRE):
- Enabled via …

Row-skipping mechanism:
- Enabled via …

A.2.2 Quantization and Codebook
Quantization control in …

Codebook mapping:
- Integer value remapping via …

A.2.3 Entropy Coding
DeepCABAC (context-adaptive binary arithmetic coding):
- Applied to all payloads except NNR_PT_RAW_FLOAT
- Binarization syntax elements: …

Incremental update coding modes:
- Temporal context modeling: …
|
|
| Nokia, Fraunhofer HHI, Deutsche Telekom, InterDigital Europe, Vodafone Group Plc |
[AIML_IMS-MED] Inclusion of NNC to AIML_IMS-MED
|
Summary of S4-260200: Inclusion of NNC to AIML_IMS-MED

1. Introduction and Context
This contribution proposes the addition of Neural Network Coding (NNC) compression capabilities to the AIML_IMS-MED work item. The proposal is motivated by S4-260198, which demonstrates the necessity of compressing AI/ML data in IMS-based transport scenarios. The document presents changes to be incorporated into the common base Change Request for AIML_IMS-MED.

2. Main Technical Contributions

2.1 NNC Decoder Support Requirement
The proposal mandates that DCMTSI clients supporting AI/ML model download or incremental model download shall support NNC decoding as specified in ISO/IEC 15938-17. Specifically:
2.2 Configuration for Full AI/ML Model Download
For DCMTSI clients supporting complete AI/ML model download, the following NNC parameter configuration is specified:

Functionality enabled: this configuration supports local scaling adaptation, batch norm folding, flexible quantization approaches, and optimized probability estimation for entropy coding.

2.3 Configuration for Incremental AI/ML Model Data Exchange
For DCMTSI clients supporting incremental model updates, an extended parameter set is defined:

Functionality enabled: this configuration provides comprehensive support for efficient incremental updates through parameter update trees, spatial/temporal prediction, adaptive probability modeling, and parallel processing capabilities.

2.4 Normative Reference Addition
The proposal adds ISO/IEC 15938-17:2024 Edition 2 as a normative reference, establishing the technical foundation for NNC compression in the specification.

Technical Significance
The contribution establishes two distinct NNC profiles optimized for different AI/ML model transport scenarios in IMS networks:
1. A baseline profile for complete model downloads with essential compression features
2. An advanced profile for incremental updates with sophisticated prediction and adaptation mechanisms to minimize update payload sizes |
|
| Nokia, Fraunhofer HHI, Deutsche Telekom, InterDigital Europe, Vodafone Group Plc |
[AIML_IMS-MED] On Compression of AI/ML data in IMS
|
Comprehensive Summary: Compression of AI/ML Data in IMS

Document Overview
This contribution (S4-260286, a revision of S4-260198) proposes the adoption of MPEG's Neural Network Coding standard, ISO/IEC 15938-17 (NNC), for efficient compression and transport of AI/ML data in IMS services. The document is submitted by Nokia, Fraunhofer HHI, Deutsche Telekom, InterDigital Europe, and Vodafone Group Plc.

Main Technical Contributions

Motivation and Use Case Requirements
The contribution identifies critical challenges in AI/ML data exchange for IMS services:
NNC Standard Capabilities
The contribution highlights NNC's compression performance (0.1% to 20% of original size with transparent performance) and its advanced features:
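The quoted 0.1%-20% ratios translate directly into download-time savings. A back-of-envelope sketch: the 100 MB model size matches the simple-translation-model figure quoted elsewhere in this agenda item, while the 2% ratio and the 50 Mbps link rate are assumptions for illustration.

```python
# Transfer time for a model before and after NNC compression.
# Model size follows a figure quoted in this agenda item; the 2% ratio
# (within NNC's 0.1%-20% range) and the 50 Mbps link are assumptions.
model_bytes = 100 * 1024 * 1024      # 100 MB translation model
ratio = 0.02                          # compressed size as fraction of original
link_bps = 50e6                       # assumed 50 Mbps downlink

uncompressed_s = model_bytes * 8 / link_bps
compressed_s = model_bytes * ratio * 8 / link_bps
print(f"uncompressed: {uncompressed_s:.1f} s, compressed: {compressed_s:.2f} s")
# roughly 16.8 s versus 0.34 s on this assumed link
```

Even before accounting for decoding time, the transfer alone shrinks by the compression factor, which is the source of the multi-fold end-to-end speed-ups reported for the WASM decoder demo.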
The document also references WASM-based NNC decoder feasibility in web applications, demonstrating multi-fold latency reductions under representative network conditions.

Technical Details (Annex)

NNC Data Components

Payload Types (NNR_NDU)
NNC specifies multiple payload types via …
Non-RAW payloads use context-adaptive entropy coding (DeepCABAC).

Topology Data (NNR_TPL)
Topology units signal AI/ML architecture via: …

Metadata
NNR_NDU metadata includes:
- Tensor Dimensions: …

Incremental Coding Support:
- Parameter Update Tree (PUT) structure via …

Performance Data
Performance metrics signaled in NNR_MPS and NNR_LPS: …

Format Encapsulation
Annexes A-D specify encapsulation of NNEF, ONNX, PyTorch, and TensorFlow data through NNR topology and quantization data units.

Coding Tools

Parameter Reduction Methods
Quantization and Codebook
Entropy Coding (DeepCABAC)
Context-adaptive binary arithmetic coding for non-RAW payloads:
- Binarization: …
- Probability Estimation: …
- Initialization/update: …

Incremental Update Modes:
- Temporal context modeling: …

Proposal
The contribution proposes considering NNC-based compression for inclusion in IMS-based AI/ML services, based on its compression efficiency, standardized format, and advanced features supporting various AI/ML data exchange scenarios. |
Total Summaries: 17 | PDFs Available: 17