S4-260180 - AI Summary

[AIML_IMS-MED] Call flow for split inferencing


Comprehensive Summary of S4-260180: Call Flow for Split Inferencing

Document Overview

This change request proposes updates to the AI/ML call flow for split inferencing in IMS-based media services. It revises the previously agreed device inferencing call flow (S4aR260014) to accommodate split inferencing scenarios, in which AI model execution is partitioned between the UE and a network-based DCAS (Data Channel Application Server).

Main Technical Contributions

1. Split Inferencing Capability Indication

Key Addition:
- The UE now indicates split inferencing availability in the application request message sent to the MF (Media Function) when requesting the application list via the Bootstrap Data Channel (BDC)
- This allows the network to understand the UE's capability to participate in distributed AI inference
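As a purely illustrative sketch (the CR does not define message syntax; all field names here are assumptions), the application-list request over the BDC could carry the split-inferencing indication alongside other UE capabilities:

```python
# Hypothetical application-list request from the UE to the MF over the
# Bootstrap Data Channel (BDC). Field names are illustrative assumptions.
application_request = {
    "request_type": "application_list",
    "ue_capabilities": {
        "split_inferencing": True,   # UE indicates split-inference availability
        "device_inferencing": True,
    },
}

def supports_split_inferencing(request: dict) -> bool:
    """Check whether the requesting UE advertised split-inferencing support."""
    return bool(request.get("ue_capabilities", {}).get("split_inferencing"))
```

With such an indication, the network can decide early whether to offer partitioned models at all for this UE.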

2. Enhanced Application and Task Selection

Application Metadata Enhancements:
- Application-related metadata now includes:
  - Generic app information (description, app ID, URL)
  - AI-specific information, including AI feature tags indicating AI requirements
  - AI task-related descriptions enabling user-informed selection

Task Metadata:
- AI task metadata is delivered with the application, potentially expressed as a task manifest
- Task list presented to users includes annotations from AI task metadata
- Execution endpoints supported by each task and subtask are now exposed to enable split inference decisions
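To make the task-manifest idea concrete, here is a hypothetical manifest structure (keys and values are assumptions, not taken from the CR) in which each subtask exposes the execution endpoints it supports:

```python
# Illustrative task manifest delivered with an application. A subtask that
# supports both "UE" and "network" endpoints is a candidate split point.
task_manifest = {
    "app_id": "example.app",
    "tasks": [
        {
            "task_id": "task-1",
            "annotation": "Background segmentation",  # from AI task metadata
            "subtasks": [
                {"subtask_id": "task-1a", "execution_endpoints": ["UE"]},
                {"subtask_id": "task-1b", "execution_endpoints": ["UE", "network"]},
            ],
        }
    ],
}

def split_capable_tasks(manifest: dict) -> list:
    """Return task IDs with at least one subtask executable on either endpoint."""
    result = []
    for task in manifest["tasks"]:
        if any({"UE", "network"} <= set(st["execution_endpoints"])
               for st in task["subtasks"]):
            result.append(task["task_id"])
    return result
```

Exposing endpoints per subtask is what allows the UE (or user) to reason about where a split is even possible before selecting a partition.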

3. Model Partitioning Framework

Partitioning List Introduction:
The CR introduces a comprehensive partitioning framework:

Request Phase (Step 10):
- UE requests both a model list and a partitioning list from DCAS
- UE provides its capability metadata to enable appropriate partitioning options

Partitioning Metadata Definition:
The partitioning list (i.e., the submodel partitioning metadata) specifies:
- Submodel identifiers - unique identification of model partitions
- Execution endpoints - where each submodel executes (UE vs. network)
- Input/output tensor characteristics - data interfaces between submodels
- Operational characteristics - performance and resource requirements
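The four metadata elements above can be pictured as one entry in the partitioning list. The structure, tensor shapes, and operational figures below are hypothetical, used only to show how the UE-side output interface must match the network-side input interface:

```python
# Sketch of one partitioning-list entry. All fields are assumptions.
partition_entry = {
    "partition_id": "p1",
    "submodels": [
        {
            "submodel_id": "sm-ue",                 # submodel identifier
            "execution_endpoint": "UE",             # where it executes
            "output_tensor": {"shape": [1, 256, 28, 28], "dtype": "float16"},
            "operational": {"flops": 1.2e9, "memory_mb": 45},
        },
        {
            "submodel_id": "sm-net",
            "execution_endpoint": "network",
            "input_tensor": {"shape": [1, 256, 28, 28], "dtype": "float16"},
            "operational": {"flops": 3.4e9, "memory_mb": 210},
        },
    ],
}

def tensors_compatible(entry: dict) -> bool:
    """Verify the UE-side output tensor matches the network-side input tensor."""
    ue, net = entry["submodels"]
    return ue["output_tensor"] == net["input_tensor"]
```

The tensor characteristics are the contract between submodels: a mismatch at the split point would make distributed execution impossible.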

Download Phase (Step 12):
- UE downloads both the model list and partitioning list corresponding to its capabilities

4. User-Driven Partition Selection

Selection Criteria (Step 13):
- User is presented with lists of both models and partitions supported by the UE
- User selects desired AI model(s) and partition
- Partition selection may be based on:
  - Load distribution preferences
  - Battery impact considerations
  - Other task execution preferences
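A toy selection heuristic can illustrate how such preferences might be applied. The criterion below (minimize estimated UE-side compute as a proxy for battery impact) and the per-partition figures are invented for illustration; the CR leaves the selection policy to the user:

```python
# Hypothetical battery-friendly partition selection: prefer the partition
# with the least estimated UE-side compute. Values are illustrative.
partitions = [
    {"partition_id": "p1", "ue_flops": 1.2e9},
    {"partition_id": "p2", "ue_flops": 0.4e9},
    {"partition_id": "p3", "ue_flops": 2.9e9},
]

def pick_battery_friendly(parts: list) -> str:
    """Return the partition ID minimizing device-side compute."""
    return min(parts, key=lambda p: p["ue_flops"])["partition_id"]
```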

5. Split Inference Configuration and Execution

Configuration Phase (Step 14):
- UE configures split inference with DCAS by selecting:
  - A specific model
  - A specific partition
- From these selections, the corresponding submodel(s) to be executed are derived

Server-Side Preparation (Step 15):
- DCAS prepares the server-side execution context
- DCAS registers the sub-model(s) and associated metadata with the selected partitioning

Configuration Confirmation (Step 16):
- DCAS indicates whether the requested configuration is accepted
- DCAS confirms readiness to execute the server-side sub-model(s)
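Steps 14 through 16 amount to a request/confirm exchange. The following sketch of DCAS-side handling is an assumption for illustration (message fields, the registry, and the helper name are invented), showing how the server-side submodel is derived from the selected model and partition before readiness is confirmed:

```python
# Hypothetical DCAS handling of a split-inference configuration request
# (Steps 14-16). All names and structures are illustrative assumptions.
config_request = {"model_id": "m1", "partition_id": "p1"}

# Registered (model, partition) -> server-side submodel mappings.
SERVER_SUBMODELS = {("m1", "p1"): "sm-net"}

def handle_configuration(request: dict) -> dict:
    """Derive the server-side submodel and confirm or reject the configuration."""
    key = (request["model_id"], request["partition_id"])
    submodel = SERVER_SUBMODELS.get(key)
    if submodel is None:
        return {"status": "rejected"}
    # In Step 15, the DCAS would register the submodel and prepare its
    # execution context here before confirming readiness.
    return {"status": "accepted", "server_submodel": submodel, "ready": True}
```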

Submodel Deployment (Steps 17-18):
- Selected tasks/models and corresponding AI submodels are communicated to DCAS
- UE downloads the AI submodel(s) corresponding to subtasks to be executed on the device side

Execution (Step 19):
- Tasks identified for split inference between UE and DCAS are executed in a distributed manner
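The end-to-end data path of Step 19 can be reduced to a minimal sketch: the UE runs its submodel, the intermediate tensor crosses to the DCAS, and the DCAS runs the server-side submodel. The placeholder functions below stand in for real model partitions and transport:

```python
# Toy distributed execution (Step 19). The "submodels" are placeholders;
# in practice the intermediate tensor would be sent over the data channel.
def ue_submodel(x: list) -> list:
    """Device-side partition (placeholder computation)."""
    return [v * 2 for v in x]

def dcas_submodel(x: list) -> float:
    """Server-side partition (placeholder computation)."""
    return sum(x)

def split_inference(x: list) -> float:
    intermediate = ue_submodel(x)       # executed on the UE
    return dcas_submodel(intermediate)  # intermediate result handed to the DCAS
```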

Key Differences from Device Inferencing

The main distinctions from pure device inferencing include:

  1. Distributed execution model - inference split across UE and network
  2. Partitioning metadata - new information element defining how models are divided
  3. Negotiation phase - explicit configuration of split points and execution distribution
  4. Submodel management - separate handling of device-side and server-side model components
  5. Execution coordination - mechanisms for DCAS to prepare and confirm readiness for server-side execution

Open Issues

The document notes one FFS (For Further Study) item:
- How device capabilities are sent to obtain an accurate list of models (noted after Step 6)

Document Information

Source: InterDigital Finland Oy
Doc type: discussion
For: Agreement
Title: [AIML_IMS-MED] Call flow for split inferencing
Agenda item: 10.5 (AI_IMS-MED: Media aspects for AI/ML in IMS services)
Contact: Stephane Onno (Contact ID: 84864)
Uploaded: 2026-02-03T19:11:22.463000
Reservation date: 03/02/2026 16:26:22
Is revision of: S4aR260008
Revised to: S4-260449
TDoc Status: revised
Agenda item sort order: 52
Original Document: View on 3GPP