# Comprehensive Summary: Compression of AI/ML Data in IMS

## Document Overview

This contribution (S4-260286, revision of S4-260198) proposes the adoption of MPEG's Neural Network Coding standard ISO/IEC 15938-17 (NNC) for efficient compression and transport of AI/ML data in IMS services. The document is submitted by Nokia, Fraunhofer HHI, Deutsche Telekom, InterDigital Europe, and Vodafone Group Plc.

## Main Technical Contributions

### Motivation and Use Case Requirements

The contribution identifies critical challenges in AI/ML data exchange for IMS services:

- **Model Delivery Challenges**: Use cases require multiple context-dependent model downloads (location, time, task-specific) rather than single downloads. Limited UE storage necessitates frequent model discarding and re-downloading.

- **Incremental Updates**: Applications require both unidirectional continuous model updates to UEs and multidirectional updates for co-learning scenarios involving multiple UEs and edge nodes.

- **Key Benefits of Compression**:
  - Bandwidth optimization reducing operational costs
  - Reduced latency through faster transmission
  - Broader accessibility in reduced-bandwidth networks
  - Interoperability through standardized data formats

### NNC Standard Capabilities

The contribution highlights NNC's compression performance (compressed sizes of 0.1% to 20% of the original model size with transparent, i.e. practically unchanged, model performance) and advanced features:

- **Topology Signalling**: Generic syntax for encoding AI/ML model architecture
- **Random Access**: Independent tensor decoding enabling parallelization
- **Parameter Update Signalling**: Metadata for incremental update dependencies and relations
- **Robustness**: Configurable prioritization/error-protection through packetization; missing update detection
- **Performance Indicators**: Signaling of model performance metrics (e.g., accuracy)
- **Encapsulation Flexibility**: Support for PyTorch, ONNX, NNEF, TensorFlow formats

The document also references WASM-based NNC decoder feasibility in web applications, demonstrating multi-fold latency reductions under representative network conditions.

## Technical Details (Annex)

### NNC Data Components

#### Payload Types (NNR_NDU)

NNC specifies multiple payload types via `nnr_compressed_data_unit_payload_type`:

- **NNR_PT_INT**: Integer parameter tensors
- **NNR_PT_FLOAT**: Float parameter tensors
- **NNR_PT_RAW_FLOAT**: Uncompressed float tensors
- **NNR_PT_BLOCK**: Block-structured float parameters with sub-types:
  - NNR_CPT_DC (0x01): Decomposed weight tensors
  - NNR_CPT_LS (0x02): Local scaling parameters
  - NNR_CPT_BI (0x04): Biases
  - NNR_CPT_BN (0x08): Batch normalization parameters

Non-RAW payloads use context-adaptive entropy coding (DeepCABAC). The `compressed_parameter_types` element combines the parameter-type IDs above by bitwise OR. Various bit depths are supported via `nnr_decompressed_data_format`, as are pre-quantized float tensors.
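The OR-combination of parameter-type IDs can be sketched as follows. The `NNR_CPT_*` values are taken from the list above; the helper functions are illustrative, not part of the ISO/IEC 15938-17 syntax:

```python
# Parameter-type IDs as listed for NNR_PT_BLOCK sub-types above.
NNR_CPT_DC = 0x01  # decomposed weight tensors
NNR_CPT_LS = 0x02  # local scaling parameters
NNR_CPT_BI = 0x04  # biases
NNR_CPT_BN = 0x08  # batch normalization parameters

def combine(*flags: int) -> int:
    """OR-combine individual parameter-type IDs into one field."""
    value = 0
    for f in flags:
        value |= f
    return value

def contains(field: int, flag: int) -> bool:
    """Check whether a combined field announces a given parameter type."""
    return (field & flag) != 0

field = combine(NNR_CPT_DC, NNR_CPT_BN)
assert contains(field, NNR_CPT_BN)
assert not contains(field, NNR_CPT_LS)
```

Because the IDs occupy distinct bit positions, a single integer field can announce any subset of parameter kinds carried by a block payload.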

#### Topology Data (NNR_TPL)

Topology units signal AI/ML architecture via:
- `topology_storage_format`: Storage format specification
- `topology_compression_format`: Optional compression (RFC 1950 deflate)
- `topology_data`: Byte sequence (typically UTF-8 string)
- `topology_elem_id` / `topology_elem_id_index`: Topology element references in NNR_NDU
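A minimal sketch of a topology unit round trip, assuming the field names listed above; the `TopologyUnit` class itself is hypothetical and not the normative bitstream syntax. Python's `zlib.compress` emits an RFC 1950 stream, matching the optional deflate compression noted above:

```python
import zlib
from dataclasses import dataclass

@dataclass
class TopologyUnit:
    topology_storage_format: int   # identifier for the storage format in use
    topology_data: bytes           # typically a UTF-8 string describing the graph
    compressed: bool = False       # models an optional topology_compression_format

    def serialize_payload(self) -> bytes:
        # Optionally deflate-compress the topology byte sequence (RFC 1950).
        return zlib.compress(self.topology_data) if self.compressed else self.topology_data

    @staticmethod
    def restore(payload: bytes, compressed: bool) -> str:
        raw = zlib.decompress(payload) if compressed else payload
        return raw.decode("utf-8")

unit = TopologyUnit(topology_storage_format=1,
                    topology_data="graph main(input) -> (output)".encode("utf-8"),
                    compressed=True)
blob = unit.serialize_payload()
assert TopologyUnit.restore(blob, compressed=True) == "graph main(input) -> (output)"
```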

#### Metadata

NNR_NDU metadata includes:
- **Tensor Dimensions**: `tensor_dimensions_flag`, `tensor_dimension_list()`
- **Scan Order**: `scan_order` for parameter-to-dimension mapping
- **Entry Points**: `bit_offset_delta1`, `bit_offset_delta2` for parallel decoding

**Incremental Coding Support**:
- Parameter Update Tree (PUT) structure via `mps_parent_signalling_enabled_flag`, `parent_node_id_present_flag`
- Node identification through:
  - Enumeration: `device_id`, `parameter_id`, `put_node_depth`
  - Hash-based: `parent_node_payload_sha256`, `parent_node_payload_sha512`
- Global metadata in NNR_MPS including `base_model_id`
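Hash-based parent identification can be sketched as below: an incremental update carries the SHA-256 of its parent's payload (`parent_node_payload_sha256`), so a receiver can verify it holds the correct base before applying a delta. The `update` dict and `can_apply` helper are illustrative only:

```python
import hashlib

def node_id_sha256(payload: bytes) -> bytes:
    """Identify a PUT node by the SHA-256 digest of its payload."""
    return hashlib.sha256(payload).digest()

base_payload = bytes(range(16))        # stand-in for a coded base tensor
update = {
    "parent_node_payload_sha256": node_id_sha256(base_payload),
    "delta": b"\x07\x07",              # stand-in for the coded update
}

def can_apply(update: dict, candidate_base: bytes) -> bool:
    """Apply only if the candidate base matches the signalled parent hash."""
    return update["parent_node_payload_sha256"] == node_id_sha256(candidate_base)

assert can_apply(update, base_payload)
assert not can_apply(update, bytes(16))   # 16 zero bytes: wrong parent
```

This is also the mechanism behind the missing-update detection noted earlier: a hash mismatch reveals that an intermediate update was lost or applied out of order.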

#### Performance Data

Performance metrics signaled in NNR_MPS and NNR_LPS:
- `validation_set_performance_present_flag`, `metric_type_performance_map_valid_flag`, `performance_metric_type`
- `validation_set_performance`: Performance on validation set
- Performance maps for post-processing operations:
  - `sparsification_performance_map()`
  - `pruning_performance_map()`
  - `unification_performance_map()`
  - `decomposition_performance_map()`
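One intended use of these maps can be sketched as follows: each entry pairs a post-processing operating point (here, a hypothetical sparsification ratio) with validation-set performance, letting a receiver pick the strongest reduction that still meets its accuracy target. All values are invented for illustration:

```python
# Hypothetical contents of a sparsification performance map:
# sparsification ratio -> validation-set accuracy.
sparsification_performance_map = {
    0.0: 0.912,    # no sparsification: baseline accuracy
    0.5: 0.910,
    0.8: 0.901,
    0.95: 0.862,
}

def best_ratio(perf_map: dict, min_accuracy: float):
    """Highest sparsification ratio whose accuracy still meets the target."""
    eligible = [r for r, acc in perf_map.items() if acc >= min_accuracy]
    return max(eligible) if eligible else None

assert best_ratio(sparsification_performance_map, 0.90) == 0.8
```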

#### Format Encapsulation

Annexes A-D specify encapsulation of NNEF, ONNX, PyTorch, and TensorFlow data through NNR topology and quantization data units.

### Coding Tools

#### Parameter Reduction Methods

- **NNR_PT_BLOCK Reconstruction**: Local scaling adaptation, batch norm folding, tensor decomposition with `decomposition_rank` and `g_number_of_rows`
- **Predictive Residual Encoding (PRE)**: `nnr_pre_flag` enables differential coding against previous updates
- **Row-Skipping**: `row_skip_enabled_flag` and `row_skip_list` for zero-row signaling
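The row-skipping idea can be sketched as follows: all-zero rows of a 2-D parameter tensor are flagged in a skip list and omitted from the coded payload, and the decoder re-inserts them. Lists of lists stand in for tensors here, and the function names are illustrative:

```python
def encode_rows(tensor):
    """Flag all-zero rows and drop them from the coded payload."""
    row_skip_list = [all(v == 0 for v in row) for row in tensor]
    coded_rows = [row for row, skip in zip(tensor, row_skip_list) if not skip]
    return row_skip_list, coded_rows

def decode_rows(row_skip_list, coded_rows, width):
    """Re-insert zero rows at the flagged positions."""
    it = iter(coded_rows)
    return [[0] * width if skip else next(it) for skip in row_skip_list]

tensor = [[1, 2], [0, 0], [3, 0]]
skips, coded = encode_rows(tensor)
assert skips == [False, True, False]
assert decode_rows(skips, coded, width=2) == tensor
```

This is particularly effective after pruning, which tends to zero out entire rows of weight matrices.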

#### Quantization and Codebook

- Quantization control via `lps_quantization_method_flags`, `mps_quantization_method_flags`, `codebook_present_flag`
- `dq_flag`: Uniform vs. dependent quantization selection
- Quantization step size: `qp_value`, `lps_qp_density`, `mps_qp_density`
- Dependent quantization state: `dq_state_list` for entry point initialization
- Codebook mapping: `integer_codebook()` structure for value remapping
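The basic operation is uniform scalar quantization of parameter values before entropy coding. The exact derivation of the step size from `qp_value` and the `qp_density` fields is defined in ISO/IEC 15938-17 and not reproduced here; the step size below is simply a chosen value for illustration:

```python
def quantize(values, step_size):
    """Map each value to its nearest integer multiple of the step size."""
    return [round(v / step_size) for v in values]

def dequantize(levels, step_size):
    """Reconstruct approximate values from integer levels."""
    return [q * step_size for q in levels]

weights = [0.31, -0.07, 0.0, 1.24]
levels = quantize(weights, step_size=0.1)
assert levels == [3, -1, 0, 12]
```

Dependent quantization (selected via `dq_flag`) refines this by making each reconstruction level depend on a coder state, which is why `dq_state_list` must be signalled at entry points for random access.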

#### Entropy Coding (DeepCABAC)

Context-adaptive binary arithmetic coding for non-RAW payloads:

**Binarization**: `sig_flag`, `sign_flag`, `abs_level_greater` flags, and `abs_remainder`, with the unary prefix length given by `cabac_unary_length`
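The bin structure (not the arithmetic coding of each bin) can be sketched as below: a quantized level becomes a significance flag, a sign flag, a run of greater-than flags up to the unary prefix length, and a remainder for what is left. This is an illustrative simplification of the spec's binarization, not its normative form:

```python
def binarize(level: int, unary_length: int) -> dict:
    """Return the bin sequence for one quantized level (illustrative)."""
    if level == 0:
        return {"sig_flag": 0}
    mag = abs(level)
    greater_flags = []
    for threshold in range(1, unary_length + 1):
        greater_flags.append(int(mag > threshold))
        if mag <= threshold:
            break   # a 0 flag terminates the unary prefix
    # Remainder is present only when the full prefix is exhausted.
    remainder = mag - unary_length - 1 if mag > unary_length else None
    return {"sig_flag": 1,
            "sign_flag": int(level < 0),
            "abs_level_greater": greater_flags,
            "abs_remainder": remainder}

assert binarize(0, 4) == {"sig_flag": 0}
b = binarize(-3, 4)
assert b["sign_flag"] == 1 and b["abs_level_greater"] == [1, 1, 0]
```

Each bin is then arithmetically coded with an adaptive context model, which is where the probability-estimation parameters below come into play.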

**Probability Estimation**:
- Initialization/update: `shift_idx_minus_1`
- Random access: `scan_order`, `bit_offset_delta1`, `cabac_offset_list`

**Incremental Update Modes**:
- `temporal_context_modeling_flag`: Probability estimation from previous tensor
- `hist_dep_sig_prob_enabled_flag`: Multi-tensor historical dependency

## Proposal

The contribution proposes that NNC-based compression be considered for inclusion in IMS-based AI/ML services, citing its compression efficiency, standardized format, and advanced features that support a wide range of AI/ML data exchange scenarios.