[AIML_IMS-MED] Negotiation messages for split inferencing
This contribution (S4-260183) proposes additional messages and associated metadata to enable split inferencing for AI/ML applications in IMS-based media services. It builds upon and updates contribution S4aR260009, with specific focus on defining the differences between device inferencing and split inferencing scenarios.
Key Addition: Introduction of Table A4.2-1 summarizing all negotiation messages for split inferencing call flows.
The table defines the following message pairs with their associated metadata:
Application Discovery Messages:
AI_APPLICATION_DISCOVERY_REQUEST (HTTP GET) - carries family/type of AI/ML applications
AI_APPLICATION_DISCOVERY_RESPONSE (HTTP RESPONSE) - returns list of AI/ML applications
Application Selection Messages:
AI_APPLICATION_REQUEST (HTTP GET) - carries URN of selected application
AI_APPLICATION_RESPONSE (HTTP RESPONSE) - returns selected application binary and metadata
Split Model List Messages:
MODELS_LIST_REQUEST (HTTP POST) - carries UE capabilities
MODELS_LIST_RESPONSE (HTTP RESPONSE) - returns candidate AI/ML models and partitionings
Split Inference Configuration Messages:
AI_SPLIT_INFERENCE_CONFIGURATION_REQUEST (HTTP POST) - carries URN(s) of selected models and submodel partitioning
AI_SPLIT_INFERENCE_CONFIGURATION_RESPONSE (HTTP RESPONSE) - returns selected models/submodels binary and metadata
Model Selection Messages:
AI_MODEL_SELECTION_REQUEST - carries URN(s) of selected models/submodels
AI_MODEL_SELECTION_RESPONSE - returns selected models/submodels binary and metadata
The contribution also introduces a separation between static and dynamic capabilities:
Static capabilities: Fixed device characteristics such as hardware acceleration features
Dynamic capabilities: Runtime-dependent characteristics
This separation enables both long-term compatibility checks and short-term runtime optimization.
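As a minimal sketch, the static/dynamic split described above could be carried in the UE-capabilities body of a MODELS_LIST_REQUEST. All field names and values here are illustrative assumptions, not taken from the specification:

```python
import json

# Illustrative UE capabilities payload for a MODELS_LIST_REQUEST (HTTP POST).
# Field names are assumptions for illustration; the spec defines the real ones.
ue_capabilities = {
    "staticCapabilities": {
        # Long-lived, hardware-bound characteristics used for compatibility checks
        "hardwareAcceleration": ["GPU", "NPU"],
        "maxModelSizeMB": 512,
    },
    "dynamicCapabilities": {
        # Runtime-dependent characteristics sampled at request time,
        # used for short-term runtime optimization
        "availableMemoryMB": 1024,
        "batteryLevelPercent": 78,
    },
}

def build_models_list_request(capabilities: dict) -> str:
    """Serialize the capabilities into a MODELS_LIST_REQUEST body."""
    return json.dumps({"ueCapabilities": capabilities})

body = build_models_list_request(ue_capabilities)
```

Keeping the two capability groups in separate objects lets the network re-query only the dynamic block during a session without re-validating static compatibility.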
Major technical contribution: Comprehensive metadata structure for describing model partitioning for split inferencing.
Key metadata elements:
| Field | Description |
|-------|-------------|
| submodelsPartitioningIdentifier | URN identifying the partitioning configuration |
| submodelComposition | Array of submodel objects (1..N) |
| submodelIdentifier | URN of individual submodel |
| endpointType | Execution location (UE, SERVER, EDGE, CLOUD, CUSTOM) |
| subtaskTypeIdentifier | Subtask type supported by submodel |
| submodelType | Role in pipeline (HEAD, INTERMEDIATE1, INTERMEDIATE2, TAIL) |
| size | Submodel file size in MB |
| submodelInputs/Outputs | Tensor specifications (ID, type, shape) |
| outputAccuracy | Trained accuracy percentage |
| subModelDataType | Data type (Uint8, Float32, Float16) |
Tensor specifications include:
- tensorID - identifier for input/output tensor
- tensorType - data type (integer, float32, float16)
- tensorShape - tensor dimensions (e.g., (1,3,300,300))
JSON example provided: a complete example showing the HEAD submodel on the UE and the TAIL submodel on the DCAS for an object detection task.
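A sketch of what such partitioning metadata could look like, built from the field names in the table above; the URNs, subtask identifiers, sizes, and tensor shapes are placeholder values, not the contribution's actual example:

```python
import json

# Illustrative partitioning metadata for a HEAD (UE) / TAIL (server) split.
# Field names follow the table above; all values are placeholders.
partitioning = {
    "submodelsPartitioningIdentifier": "urn:example:partitioning:objdet:1",
    "submodelComposition": [
        {
            "submodelIdentifier": "urn:example:submodel:head",
            "endpointType": "UE",
            "subtaskTypeIdentifier": "feature-extraction",  # placeholder
            "submodelType": "HEAD",
            "size": 12,  # submodel file size in MB
            "subModelDataType": "Float16",
            "submodelInputs": [
                {"tensorID": "in0", "tensorType": "float16",
                 "tensorShape": [1, 3, 300, 300]},
            ],
            "submodelOutputs": [
                {"tensorID": "feat0", "tensorType": "float16",
                 "tensorShape": [1, 256, 38, 38]},  # placeholder shape
            ],
        },
        {
            "submodelIdentifier": "urn:example:submodel:tail",
            "endpointType": "SERVER",
            "subtaskTypeIdentifier": "object-detection",  # placeholder
            "submodelType": "TAIL",
            "size": 48,
            "outputAccuracy": 91.5,  # trained accuracy, percent (placeholder)
            "subModelDataType": "Float32",
        },
    ],
}

doc = json.dumps(partitioning, indent=2)
```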
Generic message structure defined:
messages: Array of Message objects (1..n)
| Field | Type | Cardinality | Description |
|-------|------|-------------|-------------|
| id | string | 1..1 | Unique identifier within data channel session |
| type | number | 1..1 | Message subtype identifier |
| payload | object | 1..1 | Type-dependent message payload |
| sessionId | string | 1..1 | Associated multimedia session identifier |
| sendingAtTime | number | 0..1 | Wall clock transmission time |
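The generic envelope above can be sketched as a small builder; the five fields match the table, while the use of a UUID for `id` is an implementation assumption (the spec only requires uniqueness within the data channel session):

```python
import json
import time
import uuid

# Minimal sketch of the generic message envelope defined above.
# Only the five listed fields are populated; sendingAtTime (0..1) is optional.
def make_message(msg_type: int, payload: dict, session_id: str,
                 include_timestamp: bool = True) -> dict:
    message = {
        "id": str(uuid.uuid4()),   # unique within the data channel session (assumed UUID)
        "type": msg_type,          # message subtype identifier
        "payload": payload,        # type-dependent message payload
        "sessionId": session_id,   # associated multimedia session identifier
    }
    if include_timestamp:
        message["sendingAtTime"] = time.time()  # wall clock transmission time
    return message

# Top-level structure: messages is an array of Message objects (1..n)
batch = {"messages": [make_message(1, {"ueCapabilities": {}}, "sess-42")]}
wire = json.dumps(batch)
```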
Defined message types:
- MODELS_LIST_REQUEST
- MODELS_LIST_RESPONSE
- SPLIT_INFERENCE_CONFIGURATION_REQUEST
- AI_APPLICATION_DISCOVERY_REQUEST
- AI_APPLICATION_DISCOVERY_RESPONSE
- AI_APPLICATION_REQUEST
- AI_APPLICATION_RESPONSE
- AI_SERVER_CONFIGURATION_REQUEST
- AI_SERVER_CONFIGURATION_RESPONSE
- AI_MODEL_SELECTION_REQUEST
- AI_MODEL_SELECTION_RESPONSE
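The message types above can be modeled as an enumeration together with a request-to-response pairing for validating exchanges. The numeric values produced by `auto()` are placeholders; the spec assigns the actual `type` subtype numbers:

```python
from enum import Enum, auto

# The defined message types as an enumeration. auto() numbering is a
# placeholder; the spec assigns the real numeric subtype values.
class AimlMessageType(Enum):
    MODELS_LIST_REQUEST = auto()
    MODELS_LIST_RESPONSE = auto()
    SPLIT_INFERENCE_CONFIGURATION_REQUEST = auto()
    AI_APPLICATION_DISCOVERY_REQUEST = auto()
    AI_APPLICATION_DISCOVERY_RESPONSE = auto()
    AI_APPLICATION_REQUEST = auto()
    AI_APPLICATION_RESPONSE = auto()
    AI_SERVER_CONFIGURATION_REQUEST = auto()
    AI_SERVER_CONFIGURATION_RESPONSE = auto()
    AI_MODEL_SELECTION_REQUEST = auto()
    AI_MODEL_SELECTION_RESPONSE = auto()

# Request-to-response pairing for the message pairs listed above
# (SPLIT_INFERENCE_CONFIGURATION_REQUEST has no response in this list).
RESPONSE_FOR = {
    AimlMessageType.MODELS_LIST_REQUEST:
        AimlMessageType.MODELS_LIST_RESPONSE,
    AimlMessageType.AI_APPLICATION_DISCOVERY_REQUEST:
        AimlMessageType.AI_APPLICATION_DISCOVERY_RESPONSE,
    AimlMessageType.AI_APPLICATION_REQUEST:
        AimlMessageType.AI_APPLICATION_RESPONSE,
    AimlMessageType.AI_SERVER_CONFIGURATION_REQUEST:
        AimlMessageType.AI_SERVER_CONFIGURATION_RESPONSE,
    AimlMessageType.AI_MODEL_SELECTION_REQUEST:
        AimlMessageType.AI_MODEL_SELECTION_RESPONSE,
}
```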
The CR introduces three main changes, which together enable complete end-to-end split inferencing capability negotiation between the UE and remote endpoints, with particular emphasis on the submodel partitioning metadata that allows flexible distribution of AI/ML model execution across network nodes.