# Comprehensive Summary of S4-260180: Call Flow for Split Inferencing

## Document Overview

This change request proposes updates to the AIML call flow for split inferencing in IMS-based media services. It revises the previously agreed device inferencing call flow (S4aR260014) to accommodate split inferencing scenarios, where AI model execution is partitioned between the UE and a network-based Data Channel Application Server (DCAS).

## Main Technical Contributions

### 1. Split Inferencing Capability Indication

**Key Addition:**
- The UE now indicates **split inferencing availability** in the application request message sent to the MF (Media Function) when requesting the application list via the Bootstrap Data Channel (BDC)
- This allows the network to understand the UE's capability to participate in distributed AI inference
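As a minimal sketch of this indication, the application list request could carry a boolean availability flag alongside the UE identity. The field names below are illustrative assumptions, not normative; the CR only states that the request indicates split inferencing availability.

```python
# Hypothetical shape of the UE's application list request sent to the MF
# over the Bootstrap Data Channel. Field names are assumptions.

def build_application_request(ue_id: str, split_inferencing: bool) -> dict:
    """Build an application list request carrying the UE's
    split-inferencing availability indication."""
    return {
        "ueId": ue_id,
        "requestType": "applicationList",
        # New in this CR: the UE advertises whether it can take part
        # in distributed (split) AI inference.
        "splitInferencingAvailable": split_inferencing,
    }

request = build_application_request("ue-001", split_inferencing=True)
```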

### 2. Enhanced Application and Task Selection

**Application Metadata Enhancements:**
- Application-related metadata now includes:
  - Generic app information (description, app ID, URL)
  - **AI-specific information** including AI feature tags indicating AI requirements
  - **AI task-related descriptions** for user-informed selection

**Task Metadata:**
- AI task metadata is delivered with the application, potentially expressed as a **task manifest**
- Task list presented to users includes annotations from AI task metadata
- **Execution endpoints supported by each task and subtask** are now exposed to enable split inference decisions
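A task manifest combining the application and task metadata above might look as follows. All field names and the example task are illustrative assumptions; the CR defines the categories of information, not a concrete encoding.

```python
# Illustrative task manifest: generic app info, AI feature tags,
# task annotations, and per-subtask execution endpoints.
task_manifest = {
    "appId": "app-42",
    "description": "AI-based background replacement",
    "url": "https://example.com/app-42",
    "aiFeatureTags": ["vision", "segmentation"],
    "tasks": [
        {
            "taskId": "segmentation",
            "annotation": "Separates speaker from background",
            "subtasks": [
                {"subtaskId": "encoder", "executionEndpoints": ["UE", "network"]},
                {"subtaskId": "decoder", "executionEndpoints": ["network"]},
            ],
        }
    ],
}

def endpoints_for(manifest: dict, task_id: str, subtask_id: str) -> list:
    """Return the execution endpoints exposed for a given subtask,
    enabling the UE to make split-inference decisions."""
    for task in manifest["tasks"]:
        if task["taskId"] == task_id:
            for sub in task["subtasks"]:
                if sub["subtaskId"] == subtask_id:
                    return sub["executionEndpoints"]
    return []
```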

### 3. Model Partitioning Framework

**Partitioning List Introduction:**
The CR introduces a comprehensive partitioning framework:

**Request Phase (Step 10):**
- UE requests both a model list **and a partitioning list** from DCAS
- UE provides its capability metadata to enable appropriate partitioning options

**Partitioning Metadata Definition:**
The partitioning list (i.e., the submodel partitioning metadata) specifies:
- **Submodel identifiers** - unique identification of model partitions
- **Execution endpoints** - where each submodel executes (UE vs. network)
- **Input/output tensor characteristics** - data interfaces between submodels
- **Operational characteristics** - performance and resource requirements
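The four metadata categories above could be grouped per partition as sketched below. The field names, shapes, and figures are assumptions for illustration; the helper simply checks that the tensor interface between submodels is consistent at the split point.

```python
# Hypothetical partitioning list entry covering submodel identifiers,
# execution endpoints, tensor interfaces, and operational characteristics.
partitioning_list = [
    {
        "partitionId": "p1",
        "submodels": [
            {
                "submodelId": "sm-ue",
                "executionEndpoint": "UE",
                "outputTensor": {"shape": [1, 256, 14, 14], "dtype": "float16"},
                "operational": {"peakMemoryMB": 120, "estLatencyMs": 18},
            },
            {
                "submodelId": "sm-net",
                "executionEndpoint": "network",
                "inputTensor": {"shape": [1, 256, 14, 14], "dtype": "float16"},
                "operational": {"peakMemoryMB": 900, "estLatencyMs": 7},
            },
        ],
    }
]

def tensors_match(partition: dict) -> bool:
    """Check that the UE-side output tensor matches the network-side
    input tensor at the split point."""
    out = partition["submodels"][0]["outputTensor"]
    inp = partition["submodels"][1]["inputTensor"]
    return out == inp
```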

**Download Phase (Step 12):**
- UE downloads both the model list and partitioning list corresponding to its capabilities

### 4. User-Driven Partition Selection

**Selection Criteria (Step 13):**
- User is presented with lists of both models **and partitions** supported by the UE
- User selects desired AI model(s) **and partition**
- Partition selection may be based on:
  - **Load distribution** preferences
  - **Battery impact** considerations
  - Other task execution preferences
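One way the UE could rank partitions against these preferences is a weighted cost over normalized scores. The scoring scheme and field names are assumptions; the CR only names load distribution and battery impact as possible criteria, and the actual selection is made by the user.

```python
# Sketch of preference-driven partition ranking. batteryImpact and
# ueLoad are assumed to be normalized scores in [0, 1]; weights are
# user preferences.
def select_partition(partitions: list, battery_weight: float, load_weight: float) -> dict:
    """Pick the candidate partition with the lowest weighted cost."""
    def cost(p: dict) -> float:
        return battery_weight * p["batteryImpact"] + load_weight * p["ueLoad"]
    return min(partitions, key=cost)

candidates = [
    {"partitionId": "p1", "batteryImpact": 0.8, "ueLoad": 0.7},  # mostly on-device
    {"partitionId": "p2", "batteryImpact": 0.3, "ueLoad": 0.2},  # mostly network
]
chosen = select_partition(candidates, battery_weight=0.6, load_weight=0.4)
```

In this toy example the network-heavy partition `p2` wins because it minimizes both battery impact and UE load.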

### 5. Split Inference Configuration and Execution

**Configuration Phase (Step 14):**
- UE configures split inference with DCAS by selecting:
  - A specific model
  - A specific partition
- From these selections, the corresponding submodel(s) to be executed are derived

**Server-Side Preparation (Step 15):**
- DCAS **prepares the server-side execution context**
- DCAS **registers the submodel(s) and associated metadata** with the selected partitioning

**Configuration Confirmation (Step 16):**
- DCAS indicates whether the requested configuration is accepted
- DCAS confirms readiness to execute the server-side submodel(s)
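The Steps 14 to 16 exchange can be sketched as a single request/acknowledgement pair: the UE identifies a model and a partition, and the DCAS derives, registers, and confirms the server-side submodel(s). All names and the registry structure are illustrative assumptions.

```python
# Simulated DCAS handling of a split-inference configuration request
# (Steps 14-16). The registry maps (model, partition) pairs to their
# submodel lists; in practice this would come from the partitioning list.
def configure_split_inference(dcas_registry: dict, model_id: str, partition_id: str) -> dict:
    """Derive the server-side submodel(s) for the selected model and
    partition, and return an acceptance indication."""
    partition = dcas_registry.get((model_id, partition_id))
    if partition is None:
        # Requested configuration is not supported (Step 16: rejected).
        return {"accepted": False}
    # Step 15: register the server-side submodel(s) for execution.
    server_submodels = [s["submodelId"] for s in partition
                        if s["executionEndpoint"] == "network"]
    # Step 16: confirm readiness to execute them.
    return {"accepted": True, "serverSubmodels": server_submodels}

registry = {
    ("model-A", "p1"): [
        {"submodelId": "sm-ue", "executionEndpoint": "UE"},
        {"submodelId": "sm-net", "executionEndpoint": "network"},
    ]
}
ack = configure_split_inference(registry, "model-A", "p1")
```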

**Submodel Deployment (Steps 17-18):**
- Selected tasks/models and corresponding AI submodels are communicated to DCAS
- UE downloads the AI submodel(s) corresponding to subtasks to be executed on the device side

**Execution (Step 19):**
- Tasks identified for split inference between UE and DCAS are executed in a distributed manner
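As a toy illustration of distributed execution, the UE runs its submodel locally and hands the intermediate result to the DCAS, which completes the inference. Here a plain function call stands in for the data channel transfer, and the "models" are stand-in arithmetic functions, not real AI workloads.

```python
def ue_submodel(x: float) -> float:
    """Device-side partition: stands in for, e.g., a feature extractor."""
    return x * 2.0

def dcas_submodel(intermediate: float) -> float:
    """Server-side partition: stands in for, e.g., a classifier head."""
    return intermediate + 1.0

def run_split_inference(x: float) -> float:
    """End-to-end split inference across UE and DCAS."""
    intermediate = ue_submodel(x)       # executed on the UE
    # The intermediate tensor would be sent over the application
    # data channel to the DCAS.
    return dcas_submodel(intermediate)  # executed on the DCAS
```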

## Key Differences from Device Inferencing

The main distinctions from pure device inferencing include:

1. **Distributed execution model** - inference split across UE and network
2. **Partitioning metadata** - new information element defining how models are divided
3. **Negotiation phase** - explicit configuration of split points and execution distribution
4. **Submodel management** - separate handling of device-side and server-side model components
5. **Execution coordination** - mechanisms for DCAS to prepare and confirm readiness for server-side execution

## Open Issues

The document notes one FFS (For Further Study) item:
- **How device capabilities are sent to obtain an accurate list of models** (noted after Step 6)