# Summary of S4-260198: On Compression of AI/ML Data in IMS

## 1. Introduction and Motivation

This contribution proposes the adoption of efficient compression techniques for AI/ML data transport in IMS services, specifically advocating for the specification of MPEG's Neural Network Coding standard **ISO/IEC 15938-17 (NNC)** as a representation format.

## 2. Technical Justification

### 2.1 Use Case Requirements

The document identifies critical challenges in AI/ML data exchange based on SA1 and SA4 use cases:

- **Model delivery for local UE inference**: Models must be re-downloaded frequently as context (location, time, task) changes, while limited local storage on the UE prevents caching them all
- **Incremental AI/ML model updates**: Both unidirectional (continuous UE updates) and multidirectional (co-learning between UEs and edge nodes) scenarios

### 2.2 Benefits of Compression

The contribution highlights three key advantages:

- **Bandwidth Optimization**: Reduced model size minimizes data transfer and operational costs
- **Reduced Latency**: Faster transmission to UEs and edge devices for real-time applications
- **Broader Accessibility**: Enables AI/ML applications in bandwidth-constrained networks

### 2.3 NNC Standard Capabilities

The document presents NNC (ISO/IEC 15938-17) as the solution, demonstrating:

- **Compression performance**: Compressed models at 0.1% to 20% of the original size with transparent (i.e., preserved) task performance, as validated in SA4 and MPEG evaluations
- **Standardized format**: Ensures interoperability for multi-party scenarios (e.g., third-party model providers, application server execution)

### 2.4 Advanced NNC Features

Key technical features beyond compression:

- **Topology Signalling**: Generic syntax for AI/ML model architecture encoding
- **Random Access**: Independent tensor decoding enabling parallelization
- **Parameter Update Signalling**: Metadata for incremental update dependencies and relations
- **Robustness and Error Resilience**: Configurable prioritization/error-protection through packetization; missing parameter update detection
- **Performance Indicator**: Signals model performance metrics (e.g., accuracy)
- **Encapsulation Flexibility**: Integration of existing formats (PyTorch, ONNX, NNEF, TensorFlow) with generic support for others

### 2.5 Web Application Suitability

Validation of a WASM-based NNC decoder demonstrates:
- Browser-side decoding feasibility
- Reduced end-to-end latency (download + decoding) compared to uncompressed delivery
- Multi-fold speed-ups under representative network conditions
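The latency argument can be made concrete with a back-of-envelope sketch: compressed delivery wins whenever download time plus browser-side decode time undercuts the uncompressed download. All numbers below (model size, link rate, decode time) are hypothetical and chosen only for illustration.

```python
# Break-even sketch: end-to-end latency = download + decode.
# Model size, link rate, and decode time are assumed, not measured values.

def end_to_end_s(size_bytes: float, link_bps: float, decode_s: float = 0.0) -> float:
    """Download time over the link plus any client-side decode time."""
    return size_bytes * 8 / link_bps + decode_s

link_bps = 20e6                                    # assumed 20 Mbit/s downlink
t_uncompressed = end_to_end_s(100e6, link_bps)     # 100 MB model, no decode step
t_compressed = end_to_end_s(5e6, link_bps, 1.5)    # 5 MB model + 1.5 s WASM decode

speedup = t_uncompressed / t_compressed
print(f"{t_uncompressed:.1f} s vs {t_compressed:.1f} s -> {speedup:.1f}x")
```

Under these assumed conditions the compressed path is roughly an order of magnitude faster, consistent with the multi-fold speed-ups reported above.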

## 3. Proposal

The contribution proposes that NNC-based compression be considered for specification in IMS-based AI/ML services.

## Annex: Detailed NNC Technical Syntax

### A.1 Data Components

#### A.1.1 Payload Types

NNC specifies representation through **NNR compressed data units (NNR_NDU)** with multiple payload types:

| Payload Type | Compressed Parameter Type | Description |
|--------------|---------------------------|-------------|
| NNR_PT_INT | - | Integer parameter tensor |
| NNR_PT_FLOAT | - | Float parameter tensor |
| NNR_PT_RAW_FLOAT | - | Uncompressed float parameter tensor |
| NNR_PT_BLOCK | NNR_CPT_DC (0x01) | Weight tensor decomposition |
| | NNR_CPT_LS (0x02) | Local scaling parameters |
| | NNR_CPT_BI (0x04) | Biases present |
| | NNR_CPT_BN (0x08) | Batch norm parameters |

- Context-adaptive entropy coding using **DeepCABAC** (except NNR_PT_RAW_FLOAT)
- Support for various bit depths via `nnr_decompressed_data_format`
- Pre-quantized float parameter tensor representation
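The NNR_CPT values in the table are bit flags, so several compressed parameter types can be combined in a single field of an NNR_PT_BLOCK unit. A minimal sketch of that flag handling (the flag values come from the table; the helper names are hypothetical):

```python
# NNR_CPT_* values behave as combinable bit flags; values per the table above.
NNR_CPT_DC = 0x01  # weight tensor decomposition
NNR_CPT_LS = 0x02  # local scaling parameters
NNR_CPT_BI = 0x04  # biases present
NNR_CPT_BN = 0x08  # batch-norm parameters

FLAG_NAMES = {
    NNR_CPT_DC: "NNR_CPT_DC",
    NNR_CPT_LS: "NNR_CPT_LS",
    NNR_CPT_BI: "NNR_CPT_BI",
    NNR_CPT_BN: "NNR_CPT_BN",
}

def decode_cpt(cpt: int) -> list[str]:
    """Return the names of all flags set in a compressed-parameter-type value."""
    return [name for flag, name in FLAG_NAMES.items() if cpt & flag]

# A block carrying biases and batch-norm parameters:
print(decode_cpt(NNR_CPT_BI | NNR_CPT_BN))  # ['NNR_CPT_BI', 'NNR_CPT_BN']
```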

#### A.1.2 Topology Data

**NNR topology units (NNR_TPL)** signal AI/ML topology:
- Storage format and compression signaled via `topology_storage_format` and `topology_compression_format`
- Byte sequence representation (typically null-terminated UTF-8 strings)
- Optional deflation per RFC 1950
- Topology element specification in NNR_NDU via `topology_elem_id` or `topology_elem_id_index`
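The optional RFC 1950 deflation maps directly onto Python's `zlib` module, which produces exactly that stream format. A sketch of the round trip for a topology byte sequence (the topology string itself is made up):

```python
import zlib

# A topology description carried as a null-terminated UTF-8 byte sequence,
# optionally deflated per RFC 1950 (the format zlib.compress emits).
topology = "input -> conv1 -> relu1 -> fc1 -> output"   # hypothetical topology
raw = topology.encode("utf-8") + b"\x00"                # null-terminated

deflated = zlib.compress(raw)          # RFC 1950 zlib stream
inflated = zlib.decompress(deflated)

assert inflated == raw
print(len(raw), "->", len(deflated), "bytes")
```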

#### A.1.3 Meta Data

NNR_NDU meta data syntax elements:
- **Tensor dimensions**: `tensor_dimensions_flag`, `tensor_dimension_list()`
- **Scan order**: Mapping of parameter values to dimensions
- **Entry points**: `bit_offset_delta1`, `bit_offset_delta2` for individual tensor decoding

**Incremental coding support**:
- Parameter update tree (PUT) structure with parent-child relationships
- Node identification via:
  - Enumeration: `device_id`, `parameter_id`, `put_node_depth`
  - Hash-based: `parent_node_payload_sha256`, `parent_node_payload_sha512`
- Global NN meta data in **NNR_MPS** including `base_model_id` for update relationships
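The hash-based identification above can be sketched as follows: each update node references its parent by the SHA-256 digest of the parent's payload, mirroring `parent_node_payload_sha256`. The class and everything apart from that element are hypothetical simplifications.

```python
import hashlib

class PutNode:
    """Hypothetical parameter-update-tree node with hash-based parent linkage."""

    def __init__(self, payload: bytes, parent: "PutNode | None" = None):
        self.payload = payload
        # SHA-256 of the parent payload identifies the parent node, as with
        # the parent_node_payload_sha256 syntax element.
        self.parent_node_payload_sha256 = (
            hashlib.sha256(parent.payload).digest() if parent else None
        )

base = PutNode(b"base model payload")
update1 = PutNode(b"incremental update 1", parent=base)

# A receiver can verify it holds the right parent before applying an update:
assert update1.parent_node_payload_sha256 == hashlib.sha256(base.payload).digest()
```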

#### A.1.4 Performance Data

Performance metrics signaled in **NNR_MPS** and **NNR_LPS**:
- Presence and type specification via `validation_set_performance_present_flag`, `metric_type_performance_map_valid_flag`, `performance_metric_type`
- Validation set performance indication
- Performance maps for different optimization variants:
  - `sparsification_performance_map()`
  - `pruning_performance_map()`
  - `unification_performance_map()`
  - `decomposition_performance_map()`

#### A.1.5 Format Encapsulation

NNC encapsulates existing formats (NNEF, ONNX, PyTorch, TensorFlow):
- Topology data transmission in NNR topology data units
- Quantization meta data in NNR quantization data units
- Format-specific specifications in Annexes A-D of the standard

### A.2 Coding Tools

#### A.2.1 Parameter Reduction Methods

**NNR_PT_BLOCK payload** additional parameters:
- Local scaling adaptation
- Batch norm folding
- Tensor decomposition with `decomposition_rank` and `g_number_of_rows`

**Predictive Residual Encoding (PRE)**:
- Enabled via `nnr_pre_flag` in NNR_MPS
- Codes difference between current and previous parameter updates
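The PRE idea can be sketched in a few lines: instead of coding the new parameter values, code the element-wise difference to the previous update, which is typically near zero and thus cheap to entropy-code; the decoder adds the residual back. Plain lists stand in for tensors and the function names are hypothetical.

```python
def encode_residual(current, previous):
    """Difference between the current and previous parameter updates."""
    return [c - p for c, p in zip(current, previous)]

def decode_residual(residual, previous):
    """Reconstruct the current update from residual plus previous update."""
    return [r + p for r, p in zip(residual, previous)]

prev_update = [0.50, -0.25, 0.10, 0.00]
curr_update = [0.52, -0.25, 0.08, 0.01]

residual = encode_residual(curr_update, prev_update)   # mostly near-zero values
restored = decode_residual(residual, prev_update)
assert restored == curr_update
```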

**Row-skipping mechanism**:
- Enabled via `row_skip_enabled_flag`
- `row_skip_list` specifies entirely-zero tensor rows
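A minimal sketch of the row-skipping idea: flag the rows of a 2-D weight tensor that are entirely zero so their values need not be entropy-coded, and let the decoder reconstitute them from the flags. `row_skip_list` mirrors the syntax element above; everything else is hypothetical.

```python
def build_row_skip_list(tensor):
    """True for each row that is entirely zero and can be skipped."""
    return [all(v == 0 for v in row) for row in tensor]

def nonzero_rows(tensor, row_skip_list):
    """Only the rows that actually need to be entropy-coded."""
    return [row for row, skip in zip(tensor, row_skip_list) if not skip]

weights = [
    [0.0, 0.0, 0.0],   # pruned row -> skipped
    [0.3, -0.1, 0.2],
    [0.0, 0.0, 0.0],   # pruned row -> skipped
]

skip = build_row_skip_list(weights)        # [True, False, True]
coded = nonzero_rows(weights, skip)        # only one row left to code
assert skip == [True, False, True] and len(coded) == 1
```

This pairs naturally with pruning, where entire output rows are often zeroed out.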

#### A.2.2 Quantization and Codebook

Quantization control in `quant_tensor()`:
- Method specification: `lps_quantization_method_flags`, `mps_quantization_method_flags`, `codebook_present_flag`
- **Quantization type**: Uniform or dependent (`dq_flag`)
- **Step size**: `qp_value`, `lps_qp_density`, `mps_qp_density`
- **Dependent quantization state**: `dq_state_list` for entry point initialization

**Codebook mapping**:
- Integer value remapping via `integer_codebook()` structure
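A generic sketch of uniform quantization plus an integer codebook remap may help here. The exact derivation of the step size from `qp_value` and the qp-density elements is defined in the standard; below a step size is simply assumed, and all function names are hypothetical.

```python
def quantize(values, step):
    """Uniform quantization: round each value to an integer level."""
    return [round(v / step) for v in values]

def dequantize(levels, step):
    """Inverse mapping from integer levels back to reconstructed values."""
    return [q * step for q in levels]

step = 0.1                                           # assumed step size
levels = quantize([0.31, -0.29, 0.0, 1.02], step)    # -> [3, -3, 0, 10]

# Codebook remap in the spirit of integer_codebook(): transmit indices into
# a table of the integer levels that actually occur.
codebook = sorted(set(levels))
indices = [codebook.index(q) for q in levels]

restored = dequantize([codebook[i] for i in indices], step)
assert quantize(restored, step) == levels            # round trip is consistent
```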

#### A.2.3 Entropy Coding

**DeepCABAC** (context adaptive binary arithmetic coding):
- Applied to all payloads except NNR_PT_RAW_FLOAT
- **Binarization syntax elements**: `sig_flag`, `sign_flag`, `abs_level_greater`-flags, `abs_remainder`
- Binarization control: `cabac_unary_length`
- **Probability estimation**: Initialization and update via `shift_idx_minus_1`
- **Random access support**: `scan_order`, `bit_offset_delta1`, `cabac_offset_list` for entry points and state signaling

**Incremental update coding modes**:
- **Temporal context modeling**: `temporal_context_modeling_flag` for probability estimation dependency on previous tensors
- **Histogram-dependent probability**: `hist_dep_sig_prob_enabled_flag` for multi-tensor dependency
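The binarization step can be illustrated with a small sketch: a quantized level is decomposed into a significance flag, a sign flag, a unary run of greater-than flags bounded by `cabac_unary_length`, and a remainder. The binary arithmetic coder itself is omitted; the names mirror the syntax elements above, but the decomposition details are a simplification, not the normative process.

```python
def binarize(level: int, unary_length: int = 4):
    """Simplified DeepCABAC-style binarization of one quantized level."""
    bins = {"sig_flag": int(level != 0)}
    if level == 0:
        return bins
    bins["sign_flag"] = int(level < 0)
    mag = abs(level)
    # abs_level_greater flags: is |level| > 1, > 2, ... up to the unary length
    greater = [int(mag > x) for x in range(1, unary_length + 1)]
    bins["abs_level_greater"] = greater
    if all(greater):                       # magnitude exceeded the unary part
        bins["abs_remainder"] = mag - (unary_length + 1)
    return bins

print(binarize(0))    # {'sig_flag': 0}
print(binarize(-3))   # sign_flag set, greater flags [1, 1, 0, 0]
print(binarize(7))    # remainder of 2 beyond the unary part
```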