# 3GPP Change Request Summary: Split Inferencing Negotiation Messages

## Document Overview

This contribution (S4-260183) proposes additional messages and associated metadata to enable split inferencing for AI/ML applications in IMS-based media services. It builds upon and updates contribution S4aR260009, focusing in particular on defining how split inferencing differs from device inferencing.

## Main Technical Contributions

### 1. Negotiation Message Summary Table (Section A.4.2)

**Key Addition:** Introduction of Table A4.2-1 summarizing all negotiation messages for split inferencing call flows.

The table defines the following message pairs with their associated metadata:

- **Application Discovery Messages:**
  - `AI_APPLICATION_DISCOVERY_REQUEST` (HTTP GET) - carries family/type of AI/ML applications
  - `AI_APPLICATION_DISCOVERY_RESPONSE` (HTTP RESPONSE) - returns list of AI/ML applications

- **Application Selection Messages:**
  - `AI_APPLICATION_REQUEST` (HTTP GET) - carries URN of selected application
  - `AI_APPLICATION_RESPONSE` (HTTP RESPONSE) - returns selected application binary and metadata

- **Split Model List Messages:**
  - `MODELS_LIST_REQUEST` (HTTP POST) - carries UE capabilities
  - `MODELS_LIST_RESPONSE` (HTTP RESPONSE) - returns candidate AI/ML models and partitionings

- **Split Inference Configuration Messages:**
  - `AI_SPLIT_INFERENCE_CONFIGURATION_REQUEST` (HTTP POST) - carries URN(s) of selected models and submodel partitioning
  - `AI_SPLIT_INFERENCE_CONFIGURATION_RESPONSE` (HTTP RESPONSE) - returns selected models/submodels binary and metadata

- **Model Selection Messages:**
  - `AI_MODEL_SELECTION_REQUEST` - carries URN(s) of selected models/submodels
  - `AI_MODEL_SELECTION_RESPONSE` - returns selected models/submodels binary and metadata
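The request-side HTTP mapping above can be captured as a simple lookup table. This is a sketch derived from Table A4.2-1; endpoint URLs and request headers are not specified in this summary, and the model-selection messages have no HTTP method listed, so they are omitted:

```python
# Sketch: HTTP method mapping for the negotiation request messages
# summarized in Table A4.2-1. The corresponding *_RESPONSE messages
# are carried as HTTP responses to these requests.
NEGOTIATION_REQUEST_METHODS = {
    "AI_APPLICATION_DISCOVERY_REQUEST": "GET",           # family/type of AI/ML applications
    "AI_APPLICATION_REQUEST": "GET",                     # URN of selected application
    "MODELS_LIST_REQUEST": "POST",                       # UE capabilities in the body
    "AI_SPLIT_INFERENCE_CONFIGURATION_REQUEST": "POST",  # selected model/submodel URNs
}
```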

### 2. Common Metadata Information (Section A.4.3)

#### A.4.3.1 Application Metadata
- Defines characteristics and requirements of applications and associated AI/ML media processing tasks
- Includes performance, accuracy, energy constraints, and supported models
- **New for split inferencing:** Indicates supported split and remote inference modes and whether model supports partitioning

#### A.4.3.2 Endpoint Capabilities Metadata
Introduces separation between **static** and **dynamic** capabilities:

- **Static capabilities:** Fixed or infrequently changing properties
  - Processing architecture
  - Peak compute capacity
  - Supported AI/ML frameworks
  - Available execution engines (CPU, GPU, NPU)
  - Supported numerical precisions
  - Hardware acceleration features

- **Dynamic capabilities:** Runtime-dependent characteristics
  - Available memory
  - Current compute load
  - Energy mode
  - Battery level
  - Accelerator availability

This separation enables both long-term compatibility checks and short-term runtime optimization.
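As a minimal sketch of this split (field names and units here are illustrative, not normative), the two capability groups might be modeled as:

```python
from dataclasses import dataclass

@dataclass
class StaticCapabilities:
    """Fixed or infrequently changing endpoint properties (long-term compatibility checks)."""
    processing_architecture: str        # e.g. "arm64"
    peak_compute_tops: float            # peak compute capacity
    supported_frameworks: list[str]     # supported AI/ML frameworks
    execution_engines: list[str]        # e.g. ["CPU", "GPU", "NPU"]
    numerical_precisions: list[str]     # e.g. ["Float32", "Float16", "Uint8"]

@dataclass
class DynamicCapabilities:
    """Runtime-dependent endpoint state (short-term runtime optimization)."""
    available_memory_mb: int
    compute_load_percent: float
    energy_mode: str                    # e.g. "low-power"
    battery_level_percent: int
    accelerator_available: bool
```

Static capabilities would typically be exchanged once per session, while dynamic capabilities can be refreshed whenever runtime conditions change.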

#### A.4.3.3 Model Information Metadata
- Describes functional, structural, and performance characteristics of AI/ML models
- Includes supported tasks, input/output specifications, resource requirements, latency/energy metrics
- **New:** Indicates whether model supports partitioning

### 3. Split Inferencing-Specific Metadata (Section A.4.3.4)

#### A.4.3.4.1 Submodel Partitioning Metadata

**Major technical contribution:** Comprehensive metadata structure for describing model partitioning for split inferencing.

**Key metadata elements:**

| Field | Description |
|-------|-------------|
| `submodelsPartitioningIdentifier` | URN identifying the partitioning configuration |
| `submodelComposition` | Array of submodel objects (1..N) |
| `submodelIdentifier` | URN of individual submodel |
| `endpointType` | Execution location (UE, SERVER, EDGE, CLOUD, CUSTOM) |
| `subtaskTypeIdentifier` | Subtask type supported by submodel |
| `submodelType` | Role in pipeline (HEAD, INTERMEDIATE1, INTERMEDIATE2, TAIL) |
| `size` | Submodel file size in MB |
| `submodelInputs/Outputs` | Tensor specifications (ID, type, shape) |
| `outputAccuracy` | Trained accuracy percentage |
| `subModelDataType` | Data type (Uint8, Float32, Float16) |

**Tensor specifications include:**
- `tensorID` - identifier for input/output tensor
- `tensorType` - data type (integer, float32, float16)
- `tensorShape` - tensor dimensions (e.g., (1,3,300,300))

**JSON example provided:** Complete example showing a HEAD submodel running on the UE and a TAIL submodel running on the DCAS for an object detection task.
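A hypothetical instance of the partitioning metadata, mirroring the HEAD-on-UE / TAIL-on-DCAS object-detection shape and using the field names from the table above. All URNs, sizes, shapes, and accuracy values below are invented for illustration:

```python
import json

# Hypothetical two-way partitioning: HEAD on the UE, TAIL on the server side.
partitioning = {
    "submodelsPartitioningIdentifier": "urn:example:partitioning:detector-2way",
    "submodelComposition": [
        {
            "submodelIdentifier": "urn:example:submodel:detector-head",
            "endpointType": "UE",
            "subtaskTypeIdentifier": "feature-extraction",
            "submodelType": "HEAD",
            "size": 12.5,  # MB
            "submodelInputs": [
                {"tensorID": "in0", "tensorType": "float32",
                 "tensorShape": [1, 3, 300, 300]},
            ],
            "submodelOutputs": [
                {"tensorID": "feat0", "tensorType": "float16",
                 "tensorShape": [1, 256, 38, 38]},
            ],
            "subModelDataType": "Float16",
        },
        {
            "submodelIdentifier": "urn:example:submodel:detector-tail",
            "endpointType": "SERVER",
            "subtaskTypeIdentifier": "object-detection",
            "submodelType": "TAIL",
            "size": 48.0,  # MB
            "submodelInputs": [
                {"tensorID": "feat0", "tensorType": "float16",
                 "tensorShape": [1, 256, 38, 38]},
            ],
            "submodelOutputs": [
                {"tensorID": "detections", "tensorType": "float32",
                 "tensorShape": [1, 100, 6]},
            ],
            "outputAccuracy": 74.3,  # trained accuracy, %
            "subModelDataType": "Float32",
        },
    ],
}

# Sanity check: the HEAD's output tensor must match the TAIL's input tensor.
head, tail = partitioning["submodelComposition"]
assert head["submodelOutputs"][0]["tensorShape"] == tail["submodelInputs"][0]["tensorShape"]
serialized = json.dumps(partitioning)  # the structure is plain JSON
```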

### 4. Negotiation Message Format (Section A.4.5)

**Generic message structure defined:**

#### Table 5: AI Metadata Messages Format
- `messages`: Array of Message objects (1..n)
- Each message follows Message data type specification

#### Table 6: Metadata Message Data Type

| Field | Type | Cardinality | Description |
|-------|------|-------------|-------------|
| `id` | string | 1..1 | Unique identifier within data channel session |
| `type` | number | 1..1 | Message subtype identifier |
| `payload` | object | 1..1 | Type-dependent message payload |
| `sessionId` | string | 1..1 | Associated multimedia session identifier |
| `sendingAtTime` | number | 0..1 | Wall clock transmission time |

**Defined message types:**
- `MODELS_LIST_REQUEST`
- `MODELS_LIST_RESPONSE`
- `SPLIT_INFERENCE_CONFIGURATION_REQUEST`
- `AI_APPLICATION_DISCOVERY_REQUEST`
- `AI_APPLICATION_DISCOVERY_RESPONSE`
- `AI_APPLICATION_REQUEST`
- `AI_APPLICATION_RESPONSE`
- `AI_SERVER_CONFIGURATION_REQUEST`
- `AI_SERVER_CONFIGURATION_RESPONSE`
- `AI_MODEL_SELECTION_REQUEST`
- `AI_MODEL_SELECTION_RESPONSE`
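Putting Table 6 and the type list together, a message instance can be sketched as follows. Note that this summary does not assign numeric codes to the subtypes, so the enumeration order below is an assumption, and the validation helper is invented for illustration:

```python
from typing import Any

# Assumed numbering of the defined message types (illustrative only).
MESSAGE_TYPES = {name: code for code, name in enumerate([
    "MODELS_LIST_REQUEST",
    "MODELS_LIST_RESPONSE",
    "SPLIT_INFERENCE_CONFIGURATION_REQUEST",
    "AI_APPLICATION_DISCOVERY_REQUEST",
    "AI_APPLICATION_DISCOVERY_RESPONSE",
    "AI_APPLICATION_REQUEST",
    "AI_APPLICATION_RESPONSE",
    "AI_SERVER_CONFIGURATION_REQUEST",
    "AI_SERVER_CONFIGURATION_RESPONSE",
    "AI_MODEL_SELECTION_REQUEST",
    "AI_MODEL_SELECTION_RESPONSE",
])}

# Mandatory (1..1) fields of the Metadata Message data type in Table 6.
REQUIRED_FIELDS = {"id": str, "type": int, "payload": dict, "sessionId": str}

def validate_message(msg: dict[str, Any]) -> bool:
    """Check the 1..1 fields of Table 6; sendingAtTime (0..1) is optional."""
    return all(isinstance(msg.get(name), typ) for name, typ in REQUIRED_FIELDS.items())

msg = {
    "id": "msg-001",                                  # unique within the data channel session
    "type": MESSAGE_TYPES["MODELS_LIST_REQUEST"],
    "payload": {},                                    # UE capabilities would go here
    "sessionId": "session-42",                        # associated multimedia session
    "sendingAtTime": 1700000000,                      # optional wall-clock time
}
assert validate_message(msg)
```

Because `payload` is type-dependent, a real implementation would dispatch on `type` to a per-message payload schema rather than accepting any object.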

## Summary of Changes

The CR introduces three main changes:

1. **Complete message taxonomy** for split inferencing negotiation with HTTP protocol mapping
2. **Comprehensive metadata definitions** covering applications, endpoint capabilities, models, and split-specific partitioning information
3. **Generic message format** for AI metadata exchange over data channels with extensible type system

The contribution enables complete end-to-end split inferencing capability negotiation between UE and remote endpoints, with particular emphasis on submodel partitioning metadata that allows flexible distribution of AI/ML model execution across network nodes.