CR on AI/ML processing in IMS calls
Specification: TS 26.114 v19.2.0
Category: B (Addition of feature)
Release: Rel-20
Work Item: AIML_IMS-MED
This CR introduces normative procedures, formats, and signaling for AI/ML-assisted media processing in DCMTSI (Data Channel Multimedia Telephony Service over IMS).
DCMTSI clients must support:
- Media engine functions for RTP-based audio/video
- Data channel client (bootstrap and application data channels per clauses 6.2.10, 6.2.13)
- AI/ML application execution environment (e.g., web runtime)
- AI/ML inference engine for local model execution
- Capability discovery function (execution devices, operators, data types, resource limits)
- Model validation function (integrity/authenticity verification via SHA-256 and digital signatures)
- Binding and synchronization function (associates AI/ML tasks/metadata to RTP streams using SDP identifiers and media time anchors)
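The binding and synchronization function above associates metadata with an RTP stream via its SDP "mid" and a media time anchor. A minimal sketch of that time-anchor arithmetic, assuming an anchor structure of illustrative design (the names MediaAnchor and media_offset_seconds are not from the CR):

```python
# Hypothetical sketch of binding AI/ML metadata to media time: an anchor
# ties an SDP "mid" to an RTP timestamp, and later RTP timestamps are
# converted to elapsed media time using the stream's RTP clock rate.
from dataclasses import dataclass

@dataclass
class MediaAnchor:
    mid: str            # SDP media stream identifier (e.g. "audio")
    rtp_timestamp: int  # RTP timestamp at the anchor point
    clock_rate: int     # RTP clock rate from SDP (e.g. 48000 Hz)

def media_offset_seconds(anchor: MediaAnchor, later_rtp_ts: int) -> float:
    """Elapsed media time from the anchor to a later RTP timestamp,
    handling 32-bit RTP timestamp wrap-around."""
    delta = (later_rtp_ts - anchor.rtp_timestamp) % (1 << 32)
    return delta / anchor.clock_rate

anchor = MediaAnchor(mid="audio", rtp_timestamp=960_000, clock_rate=48_000)
print(media_offset_seconds(anchor, 960_000 + 48_000))  # 1.0
```

The modulo over 2^32 keeps the offset correct across RTP timestamp wrap, which matters for long-lived calls.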
14-step procedure:
Editor's Note: Clarification needed on whether MF understands AI task nature, application handling types, and large model handling.
Note: Split inference may use on-device inference for one task (e.g., STT) and DC AS for another (e.g., translation) while keeping RTP media unchanged.
DCMTSI clients must determine and expose to AI/ML application:
- Supported execution devices (CPU, GPU, NPU, accelerators)
- Supported operator sets and data types (per local inference framework)
- Resource limits (memory constraints, concurrent task limits)
- Availability of audio/video media access points (e.g., decoded media frames)
Web runtime capability discovery may align with WebNN. Capability summary may be conveyed to DC AS using capability message type (clause AD.9.2).
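The discovered capabilities above can be summarized into a single "capability" message for the DC AS. A sketch under assumed field names (the JSON keys below are illustrative; clause AD.9.2 defines the actual message):

```python
# Illustrative assembly of a UE capability summary as a UTF-8 JSON
# "capability" message. Key names (devices, operator_sets, limits,
# media_access) are assumptions, not normative.
import json

def build_capability_message() -> str:
    caps = {
        "type": "capability",
        "devices": ["cpu", "npu"],                    # discovered execution devices
        "operator_sets": ["ai.onnx@18"],              # per local inference framework
        "data_types": ["float32", "float16", "int8"],
        "limits": {"memory_mib": 512, "max_concurrent_tasks": 2},
        "media_access": {"decoded_audio": True, "decoded_video": False},
    }
    return json.dumps(caps)

msg = json.loads(build_capability_message())
print(msg["type"])  # capability
```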
DC AS supporting AI/ML processing may provide:
- Repositories and discovery information for AI/ML applications/models
- Policy information (restrictions on tasks, model usage, data retention)
- Application data channels for coordination with AI/ML application
- Note: Network-side inference capabilities are outside Phase 1 scope
Mandatory Model Format:
- ONNX format conforming to ONNX version 1.16.0
- Minimum required opset version: 18
- Encoding: ONNX Protocol Buffers representation
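A receiving UE can reject non-conforming model declarations against these constants before downloading anything. A minimal sketch, assuming a model-card "format" object with the fields listed in this clause:

```python
# Sketch: checking a declared model format against the mandatory values
# (ONNX 1.16.0, opset >= 18, Protocol Buffers encoding). The dict layout
# mirrors the model-card schema summarized in this CR; helper name is
# illustrative.
def format_ok(fmt: dict) -> bool:
    return (fmt.get("type") == "onnx"
            and fmt.get("onnx_version") == "1.16.0"
            and fmt.get("min_opset", 0) >= 18
            and fmt.get("encoding") == "protobuf")

print(format_ok({"type": "onnx", "onnx_version": "1.16.0",
                 "min_opset": 18, "encoding": "protobuf"}))  # True
```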
Task manifest: UTF-8 JSON object included with AI/ML application delivery, containing:
- List of supported tasks and optional subtasks with human-readable descriptions
- For each task: candidate model identifiers (model_id, model_version_id) and model card resource reference
- Task-specific configuration parameters including RTP stream mid binding requirements
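A task-manifest fragment with the fields above, plus a lookup helper, might look as follows (key names are illustrative, not normative):

```python
# Hypothetical task-manifest fragment: tasks with descriptions, candidate
# model identifiers, and configuration including required mid bindings.
manifest = {
    "tasks": [
        {
            "task": "stt",
            "description": "Speech-to-text transcription",
            "candidate_models": [
                {"model_id": "stt-base", "model_version_id": "1.0.0",
                 "model_card": "cards/stt-base.json"},
            ],
            "config": {"requires_mid": ["audio"]},
        }
    ]
}

def candidates_for(manifest: dict, task: str) -> list:
    """Return the candidate model identifiers declared for a task."""
    for entry in manifest["tasks"]:
        if entry["task"] == task:
            return entry["candidate_models"]
    return []

print(candidates_for(manifest, "stt")[0]["model_id"])  # stt-base
```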
Model card: UTF-8 JSON object provided for each candidate model, including:
- Model identifier and version identifier
- Model format specification (ONNX version, minimum opset, IR version)
- Model I/O description:
- Tensor element type and shape
- Dynamic axes, layout, normalization conventions
- Execution constraints:
- Required operator support
- Required data types
- Quantization convention
- Minimum resource requirements
- Downloadable model artifacts:
- Artifact URI, size, content type
- Integrity information (SHA-256 digest)
- Optional digital signature and key identifier
Comprehensive JSON schema provided defining structure for:
- model_card_version: Schema version (semver pattern)
- identity: model_id, model_version_id, name, description, publisher, license, timestamps, tasks, languages, tags
- format: type (const: "onnx"), onnx_version (const: "1.16.0"), min_opset (≥18), onnx_ir_version, encoding (enum: "protobuf")
- artifacts: Array of downloadable artifacts with:
- artifact_id, uri, content_type, size_bytes, sha256
- Optional compression (none/gzip/zstd)
- Optional signature (alg, kid, sig)
- variant (precision, quantization, preferred_devices, max_latency_ms)
- selection_constraints (requires_webnn, requires_ops, requires_data_types, min_memory_mib, min_peak_scratch_mib)
- io: inputs/outputs (tensorSpec arrays), preprocessing (audio/text), postprocessing (stt/tts), output_application_format
- runtime: min_memory_mib, min_peak_scratch_mib, max_concurrent_instances, required_operator_sets, required_data_types, webnn preferences, device_preference
- selection_policy: strategy (min_latency/min_energy/best_accuracy/balanced/custom), fallback_order
tensorSpec definition:
- name, element_type (float32/float16/int8/int32/uint8/bool)
- shape (array with integers or strings for dynamic axes)
- Optional layout and dynamic_axes mapping
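The tensorSpec shape convention (integers fixed, strings dynamic) implies a simple compatibility check when binding model I/O. A sketch, with illustrative helper naming:

```python
# Sketch: matching a concrete tensor shape against a tensorSpec-style
# shape, where an integer dimension must match exactly and a string
# dimension (a named dynamic axis such as "batch") matches any size.
def shape_matches(spec_shape: list, actual: tuple) -> bool:
    if len(spec_shape) != len(actual):
        return False
    return all(isinstance(s, str) or s == a
               for s, a in zip(spec_shape, actual))

spec = ["batch", 16000]                   # dynamic batch, fixed 16000 samples
print(shape_matches(spec, (1, 16000)))    # True
print(shape_matches(spec, (1, 8000)))     # False
```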
Model selection and validation procedure:
1. UE performs capability discovery (devices, operators, data types, memory limits)
2. UE filters artifacts satisfying selection_constraints against UE capabilities
3. UE selects preferred artifact based on selection_policy and device_preference
4. UE downloads selected artifact URI via HTTP over the bootstrap data channel (BDC)
5. UE verifies artifact using SHA-256 digest from model card
6. UE should verify digital signature when provided
7. UE instantiates inference engine and binds model I/O per model card (io.preprocessing, io.inputs, io.outputs, io.postprocessing)
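Steps 2, 3, and 5 above can be sketched as follows. This is an illustrative implementation under assumed capability and constraint key names (the selection_constraints fields follow the schema summary; helper names are not from the CR, and step 3's full selection_policy handling is elided):

```python
# Sketch: filter artifacts whose selection_constraints are satisfied by
# UE capabilities, pick one, and verify the SHA-256 digest carried in
# the model card.
import hashlib

def satisfies(constraints: dict, caps: dict) -> bool:
    return (set(constraints.get("requires_ops", [])) <= set(caps["ops"])
            and set(constraints.get("requires_data_types", [])) <= set(caps["data_types"])
            and constraints.get("min_memory_mib", 0) <= caps["memory_mib"])

def select_artifact(artifacts: list, caps: dict):
    viable = [a for a in artifacts if satisfies(a.get("selection_constraints", {}), caps)]
    # A full implementation would rank `viable` per selection_policy
    # (min_latency / min_energy / best_accuracy / balanced) and device_preference.
    return viable[0] if viable else None

def verify_sha256(data: bytes, expected_hex: str) -> bool:
    return hashlib.sha256(data).hexdigest() == expected_hex

caps = {"ops": ["Conv", "MatMul"], "data_types": ["float32"], "memory_mib": 512}
artifacts = [
    {"artifact_id": "fp16", "selection_constraints": {"requires_data_types": ["float16"]}},
    {"artifact_id": "fp32", "selection_constraints": {"requires_data_types": ["float32"]}},
]
print(select_artifact(artifacts, caps)["artifact_id"])  # fp32
```

Failing the SHA-256 check (step 5) should abort instantiation; signature verification (step 6) is additionally recommended when a signature is present.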
Subprotocol: "3gpp-ai" for AI/ML control and metadata
Message Format: UTF-8 encoded JSON objects
Generic Message Types:
- capability: UE inference capability summary
- task: AI/ML processing task selection and model identifiers
- configuration: Task configuration parameters including media stream mid binding and media time anchor representation
- status: Lifecycle state and error reporting
- metadata: Derived AI/ML metadata bound to media stream (mid) and media time
Detailed schema is specified by the AI/ML application; for cross-vendor interoperability, the schema should be standardized per task.
Example metadata message:
{
  "type": "metadata",
  "task": "stt",
  "mid": "audio",
  "segmentId": 1842,
  "ntpTs": 381245120,
  "durSamples": 16000,
  "text": "...",
  "conf": 0.87
}
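A receiver of the "3gpp-ai" subprotocol would parse each UTF-8 JSON object and check type-specific required fields. A minimal validation sketch; the required-field sets below are assumptions derived from the message-type descriptions above, not normative:

```python
# Sketch: receiver-side validation of "3gpp-ai" subprotocol messages.
# REQUIRED maps each generic message type to an assumed minimal key set.
import json

REQUIRED = {
    "capability":    {"type"},
    "task":          {"type", "task"},
    "configuration": {"type", "mid"},
    "status":        {"type", "state"},
    "metadata":      {"type", "task", "mid"},
}

def validate(raw: str) -> dict:
    """Parse one UTF-8 JSON message and check its required keys."""
    msg = json.loads(raw)
    required = REQUIRED.get(msg.get("type"))
    if required is None or not required <= msg.keys():
        raise ValueError(f"malformed 3gpp-ai message: {msg.get('type')!r}")
    return msg

msg = validate('{"type": "metadata", "task": "stt", "mid": "audio", "text": "hi"}')
print(msg["task"])  # stt
```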
This CR establishes a comprehensive normative framework for AI/ML-assisted media processing in DCMTSI, covering:
- Complete architecture with on-device and split inference support
- Detailed call flows for application/model delivery and runtime operation
- Capability discovery mechanisms for UE and network
- Standardized ONNX model format requirements
- Rich metadata structures (task manifests and model cards with JSON schemas)
- Deterministic model selection and validation procedures
- Media time binding mechanisms for metadata synchronization
- Data channel transport protocols for control and metadata exchange
The framework enables AI/ML tasks (STT, translation, TTS, noise suppression, scene description) while maintaining compatibility with existing DCMTSI media handling.