[AIML_IMS-MED] On Compression of AI/ML data in IMS
This contribution proposes the adoption of efficient compression techniques for AI/ML data transport in IMS services, specifically advocating for the specification of MPEG's Neural Network Coding standard ISO/IEC 15938-17 (NNC) as a representation format.
The document identifies critical challenges in AI/ML data exchange based on SA1 and SA4 use cases:
The contribution highlights three key advantages:
The document presents NNC (ISO/IEC 15938-17) as the solution, demonstrating:
Key technical features beyond compression:
WASM-based NNC decoder validation demonstrates:
- Browser-side decoding feasibility
- Reduced end-to-end latency (download + decoding) compared to uncompressed delivery
- Multi-fold speed-ups under representative network conditions
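
The latency comparison above reduces to simple arithmetic: transfer time plus decoding time. The sketch below shows that calculation. The function name and every number in the usage example are hypothetical placeholders, not measurements from the contribution.

```python
def end_to_end_latency(size_bytes: float, bandwidth_bps: float,
                       decode_seconds: float = 0.0) -> float:
    """Download time (size over link rate) plus decoding time, in seconds."""
    return size_bytes * 8 / bandwidth_bps + decode_seconds


# Hypothetical inputs, chosen only to illustrate the arithmetic.
uncompressed = end_to_end_latency(100e6, 50e6)        # no decoding step
compressed = end_to_end_latency(10e6, 50e6, 0.4)      # smaller file + decode
speed_up = uncompressed / compressed
print(f"{uncompressed:.1f} s vs {compressed:.1f} s, speed-up {speed_up:.1f}x")
```

Compression pays off whenever the download time saved exceeds the added decoding time, so the gain grows as the link gets slower.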
The contribution proposes considering NNC-based compression for inclusion in IMS-based AI/ML services.
NNC specifies representation through NNR compressed data units (NNR_NDU) with multiple payload types:
| Payload Type | Compressed Parameter Type | Description |
|--------------|---------------------------|-------------|
| NNR_PT_INT | - | Integer parameter tensor |
| NNR_PT_FLOAT | - | Float parameter tensor |
| NNR_PT_RAW_FLOAT | - | Uncompressed float parameter tensor |
| NNR_PT_BLOCK | NNR_CPT_DC (0x01) | Weight tensor decomposition |
| | NNR_CPT_LS (0x02) | Local scaling parameters |
| | NNR_CPT_BI (0x04) | Biases present |
| | NNR_CPT_BN (0x08) | Batch norm parameters |
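
The compressed parameter type values in the table are distinct powers of two, which suggests they are combined as a bitmask. A minimal sketch of reading such a mask, assuming bitwise-OR combination (the class and function names are illustrative, not taken from the standard):

```python
from enum import IntFlag


class CompressedParameterType(IntFlag):
    """Flag values copied from the table; the class name is illustrative."""
    NNR_CPT_DC = 0x01  # weight tensor decomposition
    NNR_CPT_LS = 0x02  # local scaling parameters
    NNR_CPT_BI = 0x04  # biases present
    NNR_CPT_BN = 0x08  # batch norm parameters


def describe_block(compressed_parameter_types: int) -> list[str]:
    """Return the names of all flags set in the given bitmask."""
    mask = CompressedParameterType(compressed_parameter_types)
    return [flag.name for flag in CompressedParameterType if flag in mask]


print(describe_block(0x0A))  # ['NNR_CPT_LS', 'NNR_CPT_BN']
```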
NNR topology units (NNR_TPL) signal AI/ML topology:
- Storage format and compression signaled via topology_storage_format and topology_compression_format
- Byte sequence representation (typically null-terminated UTF-8 strings)
- Optional deflation per RFC 1950
- Topology element specification in NNR_NDU via topology_elem_id or topology_elem_id_index
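
The topology payload handling listed above can be sketched with Python's `zlib` module, which reads and writes the RFC 1950 stream format. The helper names and the boolean `deflate` switch are assumptions standing in for the signaled topology_compression_format. The surrounding NNR unit header is not modelled.

```python
import zlib


def pack_topology(topology_text: str, deflate: bool) -> bytes:
    """Encode a topology description as a null-terminated UTF-8 byte
    sequence, optionally wrapped in a zlib (RFC 1950) stream."""
    raw = topology_text.encode("utf-8") + b"\x00"
    return zlib.compress(raw) if deflate else raw


def unpack_topology(payload: bytes, deflated: bool) -> str:
    """Reverse pack_topology: inflate if needed, then cut at the terminator."""
    raw = zlib.decompress(payload) if deflated else payload
    return raw.split(b"\x00", 1)[0].decode("utf-8")


text = "graph { conv1 -> relu1 -> fc1 }"  # placeholder topology string
assert unpack_topology(pack_topology(text, True), True) == text
```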
NNR_NDU metadata syntax elements:
- Tensor dimensions: tensor_dimensions_flag, tensor_dimension_list()
- Scan order: Mapping of parameter values to dimensions
- Entry points: bit_offset_delta1, bit_offset_delta2 for individual tensor decoding
Incremental coding support:
- Parameter update tree (PUT) structure with parent-child relationships
- Node identification via:
  - Enumeration: device_id, parameter_id, put_node_depth
  - Hash-based: parent_node_payload_sha256, parent_node_payload_sha512
- Global NN metadata in NNR_MPS including base_model_id for update relationships
Performance metrics signaled in NNR_MPS and NNR_LPS:
- Presence and type specification via validation_set_performance_present_flag, metric_type_performance_map_valid_flag, performance_metric_type
- Validation set performance indication
- Performance maps for different optimization variants:
  - sparsification_performance_map()
  - pruning_performance_map()
  - unification_performance_map()
  - decomposition_performance_map()
NNC encapsulates existing formats (NNEF, ONNX, PyTorch, TensorFlow):
- Topology data transmission in NNR topology data units
- Quantization metadata in NNR quantization data units
- Format-specific specifications in Annexes A-D of the standard
NNR_PT_BLOCK payload additional parameters:
- Local scaling adaptation
- Batch norm folding
- Tensor decomposition with decomposition_rank and g_number_of_rows
Predictive Residual Encoding (PRE):
- Enabled via nnr_pre_flag in NNR_MPS
- Codes difference between current and previous parameter updates
Row-skipping mechanism:
- Enabled via row_skip_enabled_flag
- row_skip_list specifies entirely-zero tensor rows
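
A minimal sketch of row skipping, assuming a 2-D tensor stored as a list of rows and one flag per row that is set when the row is entirely zero (the helper names are hypothetical):

```python
def build_row_skip_list(tensor: list[list[int]]) -> list[bool]:
    """True for every row whose entries are all zero."""
    return [all(v == 0 for v in row) for row in tensor]


def drop_skipped_rows(tensor: list[list[int]], skip: list[bool]) -> list[list[int]]:
    """Keep only the rows that still need to be coded."""
    return [row for row, skipped in zip(tensor, skip) if not skipped]


def restore_rows(coded: list[list[int]], skip: list[bool], width: int) -> list[list[int]]:
    """Re-insert all-zero rows at the positions marked in the skip list."""
    remaining = iter(coded)
    return [[0] * width if skipped else next(remaining) for skipped in skip]


tensor = [[0, 0, 0], [1, 0, -2], [0, 0, 0], [0, 3, 0]]
skip = build_row_skip_list(tensor)
coded = drop_skipped_rows(tensor, skip)
```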
Quantization control in quant_tensor():
- Method specification: lps_quantization_method_flags, mps_quantization_method_flags, codebook_present_flag
- Quantization type: Uniform or dependent (dq_flag)
- Step size: qp_value, lps_qp_density, mps_qp_density
- Dependent quantization state: dq_state_list for entry point initialization
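
For the uniform case, quantization maps each weight to an integer level and reconstruction multiplies the level by the step size. The sketch below takes the step size as a direct input. How the standard derives it from qp_value and the qp_density elements is not reproduced here, and dependent quantization is not covered.

```python
import math


def quantize_uniform(weights: list[float], step_size: float) -> list[int]:
    """Round each weight to the nearest multiple of step_size."""
    # floor(x + 0.5) avoids the round-half-to-even behaviour of round().
    return [math.floor(w / step_size + 0.5) for w in weights]


def dequantize_uniform(levels: list[int], step_size: float) -> list[float]:
    """Reconstruct approximate weights from integer levels."""
    return [level * step_size for level in levels]


levels = quantize_uniform([0.26, -0.74, 1.0], step_size=0.5)
print(levels)  # [1, -1, 2]
```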
Codebook mapping:
- Integer value remapping via integer_codebook() structure
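
Codebook mapping replaces each integer value by its index in a table of the distinct values that actually occur. A sketch, assuming a sorted codebook (the layout of integer_codebook() itself is not modelled and the function names are illustrative):

```python
def build_codebook(values: list[int]) -> tuple[list[int], list[int]]:
    """Return (codebook, indices) such that codebook[indices[i]] == values[i]."""
    codebook = sorted(set(values))
    position = {value: index for index, value in enumerate(codebook)}
    return codebook, [position[v] for v in values]


def apply_codebook(codebook: list[int], indices: list[int]) -> list[int]:
    """Map indices back to the original integer values."""
    return [codebook[i] for i in indices]


codebook, indices = build_codebook([-7, 0, 0, 12, -7])
print(codebook, indices)  # [-7, 0, 12] [0, 1, 1, 2, 0]
```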
DeepCABAC (context-adaptive binary arithmetic coding):
- Applied to all payloads except NNR_PT_RAW_FLOAT
- Binarization syntax elements: sig_flag, sign_flag, abs_level_greater-flags, abs_remainder
- Binarization control: cabac_unary_length
- Probability estimation: Initialization and update via shift_idx_minus_1
- Random access support: scan_order, bit_offset_delta1, cabac_offset_list for entry points and state signaling
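
The binarization elements listed above can be illustrated with a simplified sketch that turns one quantized level into a sequence of named bins. It makes several assumptions: a sign_flag of 1 marks a negative value, the per-threshold flag names are illustrative, and the remainder is kept as a plain integer rather than being binarized further. Context selection and the arithmetic coding itself are omitted.

```python
def binarize(level: int, cabac_unary_length: int) -> list[tuple[str, int]]:
    """Simplified binarization of one integer level into (name, value) bins."""
    bins = [("sig_flag", int(level != 0))]
    if level == 0:
        return bins
    bins.append(("sign_flag", int(level < 0)))  # assumption: 1 means negative
    abs_level = abs(level)
    # Unary part: one "greater than k" flag for k = 1..cabac_unary_length,
    # stopping at the first flag that is 0.
    for k in range(1, cabac_unary_length + 1):
        greater = int(abs_level > k)
        bins.append((f"abs_level_greater_{k}", greater))
        if not greater:
            return bins
    # Whatever magnitude is left after the unary part.
    bins.append(("abs_remainder", abs_level - cabac_unary_length - 1))
    return bins


print(binarize(-5, cabac_unary_length=2))
```

A larger cabac_unary_length spends more context-coded flags on small magnitudes before falling back to the remainder.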
Incremental update coding modes:
- Temporal context modeling: temporal_context_modeling_flag for probability estimation dependency on previous tensors
- Histogram-dependent probability: hist_dep_sig_prob_enabled_flag for multi-tensor dependency