S4-260198 - AI Summary

[AIML_IMS-MED] On Compression of AI/ML data in IMS

Summary of S4-260198: On Compression of AI/ML Data in IMS

1. Introduction and Motivation

This contribution proposes the adoption of efficient compression techniques for AI/ML data transport in IMS services, specifically advocating for the specification of MPEG's Neural Network Coding standard ISO/IEC 15938-17 (NNC) as a representation format.

2. Technical Justification

2.1 Use Case Requirements

The document identifies critical challenges in AI/ML data exchange based on SA1 and SA4 use cases:

Model delivery for local UE inference: Models are downloaded repeatedly depending on context (location, time, task), and limited local storage forces frequent re-downloads
Incremental AI/ML model updates: Both unidirectional (continuous UE updates) and multidirectional (co-learning between UEs and edge nodes) scenarios

2.2 Benefits of Compression

The contribution highlights three key advantages:

Bandwidth Optimization: Reduced model size minimizes data transfer and operational costs
Reduced Latency: Faster transmission to UEs and edge devices for real-time applications
Broader Accessibility: Enables AI/ML applications in bandwidth-constrained networks

2.3 NNC Standard Capabilities

The document presents NNC (ISO/IEC 15938-17) as the solution, demonstrating:

Compression performance: Models are compressed to between 0.1% and 20% of their original size while remaining transparent, i.e. without degrading model performance (validated in SA4 and MPEG evaluations)
Standardized format: Ensures interoperability for multi-party scenarios (e.g., third-party model providers, application server execution)
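To make the quoted range concrete, the following minimal sketch applies the two endpoint ratios (0.1% and 20%) to a model size. The 100 MB model size and the function name `compressed_size_bytes` are assumptions chosen for illustration; they do not come from the contribution.

```python
def compressed_size_bytes(original_bytes: int, ratio: float) -> float:
    """Return the compressed size for a given compressed/original ratio.

    The text quotes ratios between 0.001 (0.1%) and 0.20 (20%).
    """
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    return original_bytes * ratio


# Hypothetical 100 MB model; the size is illustrative only.
ORIGINAL_BYTES = 100_000_000

for label, ratio in (("0.1%", 0.001), ("20%", 0.20)):
    size = compressed_size_bytes(ORIGINAL_BYTES, ratio)
    print(f"{label:>5} of original: {size / 1e6:.1f} MB")
```

Under these assumptions the same model would occupy between 0.1 MB and 20 MB after compression, which is the spread that drives the bandwidth and latency arguments in clause 2.2.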

2.4 Advanced NNC Features

Key technical features beyond compression:

Topology Signalling: Generic syntax for AI/ML model architecture encoding
Random Access: Independent tensor decoding enabling parallelization
Parameter Update Signalling: Metadata for incremental update dependencies and relations
Robustness and Error Resilience: Configurable prioritization/error-protection through packetization; missing parameter update detection
Performance Indicator: Signals model performance metrics (e.g., accuracy)
Encapsulation Flexibility: Integration of existing formats (PyTorch, ONNX, NNEF, TensorFlow) with generic support for others

2.5 Web Application Suitability

WASM-based NNC decoder validation demonstrates:
- Browser-side decoding feasibility
- Reduced end-to-end latency (download + decoding) compared to uncompressed delivery
- Multi-fold speed-ups under representative network conditions
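The latency claim can be reasoned about with a simple model: end-to-end time is download time plus decoding time, so compression pays off as long as the saved transfer time exceeds the decoding cost. The sketch below is only such a model; all numbers (model size, link rate, decode time) and function names are hypothetical and are not measurements from the contribution.

```python
def transfer_seconds(size_bytes: float, bandwidth_bps: float) -> float:
    """Time to move size_bytes over a link of bandwidth_bps (bits per second)."""
    return size_bytes * 8 / bandwidth_bps


def speed_up(original_bytes: float, compressed_bytes: float,
             bandwidth_bps: float, decode_seconds: float) -> float:
    """Ratio of uncompressed delivery time to (download + decode) time."""
    uncompressed = transfer_seconds(original_bytes, bandwidth_bps)
    compressed = transfer_seconds(compressed_bytes, bandwidth_bps) + decode_seconds
    return uncompressed / compressed


def break_even_bandwidth_bps(original_bytes: float, compressed_bytes: float,
                             decode_seconds: float) -> float:
    """Link rate above which decoding costs more time than the download saves."""
    return (original_bytes - compressed_bytes) * 8 / decode_seconds


# Hypothetical scenario: 50 MB model compressed to 5 MB, 20 Mbit/s link,
# 1 second of browser-side decoding.
ORIGINAL, COMPRESSED = 50_000_000, 5_000_000
LINK_BPS, DECODE_S = 20_000_000, 1.0

print(f"speed-up: {speed_up(ORIGINAL, COMPRESSED, LINK_BPS, DECODE_S):.2f}x")
print(f"break-even link rate: "
      f"{break_even_bandwidth_bps(ORIGINAL, COMPRESSED, DECODE_S) / 1e6:.0f} Mbit/s")
```

In this made-up scenario the uncompressed download takes 20 s against 3 s for download plus decoding, and compression stays beneficial for any link slower than 360 Mbit/s.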

3. Proposal

The contribution proposes considering NNC-based compression for inclusion in IMS-based AI/ML services.

Annex: Detailed NNC Technical Syntax

A.1 Data Components

A.1.1 Payload Types

NNC specifies representation through NNR compressed data units (NNR_NDU) with multiple payload types:

| Payload Type | Compressed Parameter Type | Description |
|--------------|---------------------------|-------------|
| NNR_PT_INT | - | Integer parameter tensor |
| NNR_PT_FLOAT | - | Float parameter tensor |
| NNR_PT_RAW_FLOAT | - | Uncompressed float parameter tensor |
| NNR_PT_BLOCK | NNR_CPT_DC (0x01) | Weight tensor decomposition |
| | NNR_CPT_LS (0x02) | Local scaling parameters |
| | NNR_CPT_BI (0x04) | Biases present |
| | NNR_CPT_BN (0x08) | Batch norm parameters |

- Context-adaptive entropy coding using DeepCABAC (except NNR_PT_RAW_FLOAT)
- Support for various bit depths via nnr_decompressed_data_format
- Pre-quantized float parameter tensor representation
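The hexadecimal values listed for NNR_PT_BLOCK in the table are powers of two, which suggests they are combined as a bit mask. Assuming that reading, a receiver could model them as follows. The class name `CompressedParameterType` and the helper `present_components` are hypothetical; only the member names and values are taken from the table.

```python
from enum import IntFlag


class CompressedParameterType(IntFlag):
    """Compressed parameter types for NNR_PT_BLOCK, values as listed in the table."""
    NNR_CPT_DC = 0x01  # weight tensor decomposition
    NNR_CPT_LS = 0x02  # local scaling parameters
    NNR_CPT_BI = 0x04  # biases present
    NNR_CPT_BN = 0x08  # batch norm parameters


def present_components(value: int) -> list[str]:
    """List the names of the components signalled in a combined flag value."""
    flags = CompressedParameterType(value)
    return [member.name for member in CompressedParameterType if member in flags]


# A block carrying biases and batch norm parameters: 0x04 | 0x08 == 0x0C.
print(present_components(0x0C))
```

An `IntFlag` keeps the raw integer available for serialization while still allowing membership tests such as `CompressedParameterType.NNR_CPT_BN in flags`.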

A.1.2 Topology Data

NNR topology units (NNR_TPL) signal AI/ML topology:
- Storage format and compression signaled via topology_storage_format and topology_compression_format
- Byte sequence representation (typically null-terminated UTF-8 strings)
- Optional deflation per RFC 1950
- Topology element specification in NNR_NDU via topology_elem_id or topology_elem_id_index
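As an illustration of the byte-sequence representation and the optional RFC 1950 compression, the sketch below packs a topology description as a null-terminated UTF-8 string and optionally compresses it with Python's `zlib` module, which produces the RFC 1950 (zlib) container by default. The function names and the sample topology string are assumptions; the actual NNR_TPL unit syntax is not reproduced here.

```python
import zlib


def pack_topology(description: str, deflate: bool) -> bytes:
    """Encode a topology description as null-terminated UTF-8, optionally deflated."""
    raw = description.encode("utf-8") + b"\x00"
    return zlib.compress(raw) if deflate else raw


def unpack_topology(payload: bytes, deflated: bool) -> str:
    """Inverse of pack_topology: inflate if needed and strip the terminator."""
    raw = zlib.decompress(payload) if deflated else payload
    return raw.split(b"\x00", 1)[0].decode("utf-8")


# Hypothetical, highly repetitive topology text (illustrative only).
topology = "conv3x3->relu->" * 50 + "softmax"
packed = pack_topology(topology, deflate=True)

print(len(topology) + 1, "bytes raw,", len(packed), "bytes deflated")
assert unpack_topology(packed, deflated=True) == topology
```

Textual topology descriptions tend to be repetitive, which is why a general-purpose compressor is usually sufficient for this part of the bitstream.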

A.1.3 Meta Data

NNR_NDU meta data syntax elements:
- Tensor dimensions: tensor_dimensions_flag, tensor_dimension_list()
- Scan order: Mapping of parameter values to dimensions
- Entry points: bit_offset_delta1, bit_offset_delta2 for individual tensor decoding

Incremental coding support:
- Parameter update tree (PUT) structure with parent-child relationships
- Node identification via:
  - Enumeration: device_id, parameter_id, put_node_depth
  - Hash-based: parent_node_payload_sha256, parent_node_payload_sha512
- Global NN meta data in NNR_MPS including base_model_id for update relationships
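The hash-based variant of the parameter update tree can be pictured as a chain in which each update names its parent by the SHA-256 digest of the parent's payload. The sketch below shows how a receiver might use that to detect a missing update before applying a later one. The `UpdateNode` class, the `can_apply` helper and the payload bytes are hypothetical; the real PUT syntax carries more fields than shown.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class UpdateNode:
    payload: bytes
    parent_sha256: Optional[bytes]  # None marks the base model (root of the tree)


def payload_hash(payload: bytes) -> bytes:
    """Digest used to reference a node from its children."""
    return hashlib.sha256(payload).digest()


def can_apply(node: UpdateNode, received: dict[bytes, UpdateNode]) -> bool:
    """True if every ancestor of node, up to the base model, has been received."""
    while node.parent_sha256 is not None:
        parent = received.get(node.parent_sha256)
        if parent is None:
            return False  # a parameter update in the chain is missing
        node = parent
    return True


base = UpdateNode(b"base-model", None)
update1 = UpdateNode(b"update-1", payload_hash(base.payload))
update2 = UpdateNode(b"update-2", payload_hash(update1.payload))

# update1 was lost in transit, so update2 cannot be applied yet.
received = {payload_hash(base.payload): base}
print(can_apply(update2, received))
```

Referencing parents by payload hash lets a receiver verify the dependency without any shared numbering scheme, at the cost of hashing each payload once.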

A.1.4 Performance Data

Performance metrics signaled in NNR_MPS and NNR_LPS:
- Presence and type specification via validation_set_performance_present_flag, metric_type_performance_map_valid_flag, performance_metric_type
- Validation set performance indication
- Performance maps for different optimization variants:
  - sparsification_performance_map()
  - pruning_performance_map()
  - unification_performance_map()
  - decomposition_performance_map()
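One way a receiver might use such a performance map is to pick the most aggressively compressed variant that still meets an application-defined quality target. The sketch below assumes a map can be read as pairs of (compression setting, metric value); that reading, the function name and all numbers are assumptions made for illustration, not the normative structure of the performance maps.

```python
from typing import Optional


def most_compact_variant(performance_map: list[tuple[float, float]],
                         min_metric: float) -> Optional[tuple[float, float]]:
    """Return the (setting, metric) pair with the highest setting meeting min_metric."""
    candidates = [(s, m) for s, m in performance_map if m >= min_metric]
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[0])


# Made-up map: sparsification ratio -> validation accuracy.
SPARSIFICATION_MAP = [(0.50, 0.912), (0.75, 0.905), (0.90, 0.871)]

print(most_compact_variant(SPARSIFICATION_MAP, min_metric=0.90))
```

With these made-up values the 75% sparsified variant is selected, since the 90% variant falls below the 0.90 accuracy target.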

A.1.5 Format Encapsulation

NNC encapsulates existing formats (NNEF, ONNX, PyTorch, TensorFlow):
- Topology data transmission in NNR topology data units
- Quantization meta data in NNR quantization data units
- Format-specific specifications in Annexes A-D of the standard

A.2 Coding Tools

A.2.1 Parameter Reduction Methods

NNR_PT_BLOCK payload additional parameters:
- Local scaling adaptation
- Batch norm folding
- Tensor decomposition with decomposition_rank and g_number_of_rows
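The benefit of tensor decomposition can be estimated with generic low-rank arithmetic: a rows × cols weight matrix approximated by the product of a rows × rank and a rank × cols matrix needs rank · (rows + cols) parameters instead of rows · cols. The sketch below only counts parameters under that assumption; the function names are hypothetical, and how NNC uses decomposition_rank and g_number_of_rows in detail is not reproduced.

```python
def full_params(rows: int, cols: int) -> int:
    """Parameter count of the original rows x cols weight matrix."""
    return rows * cols


def decomposed_params(rows: int, cols: int, rank: int) -> int:
    """Parameter count of a (rows x rank) times (rank x cols) factorization."""
    return rank * (rows + cols)


def max_useful_rank(rows: int, cols: int) -> int:
    """Largest rank for which the factorization is strictly smaller than the original."""
    return (rows * cols - 1) // (rows + cols)


# Hypothetical 1024 x 1024 layer decomposed with rank 64.
rows, cols, rank = 1024, 1024, 64
ratio = decomposed_params(rows, cols, rank) / full_params(rows, cols)
print(f"rank {rank}: {ratio:.1%} of the original parameters")
print("decomposition stops paying off above rank", max_useful_rank(rows, cols))
```

For the hypothetical layer, rank 64 leaves 12.5% of the parameters, and any rank above 511 would no longer reduce the parameter count.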

Predictive Residual Encoding (PRE):
- Enabled via nnr_pre_flag in NNR_MPS
- Codes difference between current and previous parameter updates
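The idea behind predictive residual encoding can be sketched as plain difference coding on quantized integer levels: the encoder sends the element-wise difference to the previous update, and the decoder adds it back. This is a simplified illustration; the function names and sample values are assumptions, and the actual prediction and signalling in NNC may differ.

```python
def encode_residual(current: list[int], previous: list[int]) -> list[int]:
    """Element-wise difference between the current and the previous update."""
    if len(current) != len(previous):
        raise ValueError("updates must have the same number of parameters")
    return [c - p for c, p in zip(current, previous)]


def decode_residual(residual: list[int], previous: list[int]) -> list[int]:
    """Reconstruct the current update from the residual and the previous update."""
    return [r + p for r, p in zip(residual, previous)]


# Hypothetical quantized levels of two consecutive parameter updates.
previous_update = [12, -7, 0, 31, -18]
current_update = [13, -7, 1, 30, -18]

residual = encode_residual(current_update, previous_update)
print(residual)
assert decode_residual(residual, previous_update) == current_update
```

When consecutive updates are similar, the residual is dominated by zeros and small values, which an entropy coder can represent in fewer bits than the updates themselves.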

Row-skipping mechanism:
- Enabled via row_skip_enabled_flag
- row_skip_list specifies entirely-zero tensor rows
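Row skipping can be pictured as one flag per row plus the payload of the remaining rows. The sketch below derives such a flag list from a small integer tensor and rebuilds the tensor from it. The helper names are hypothetical and the tensor values are made up; only the concept of flagging entirely-zero rows comes from the text.

```python
def split_rows(tensor: list[list[int]]) -> tuple[list[bool], list[list[int]]]:
    """Return (row_skip_list, coded_rows); skipped rows are the all-zero ones."""
    row_skip_list = [all(value == 0 for value in row) for row in tensor]
    coded_rows = [row for row, skip in zip(tensor, row_skip_list) if not skip]
    return row_skip_list, coded_rows


def merge_rows(row_skip_list: list[bool], coded_rows: list[list[int]],
               num_cols: int) -> list[list[int]]:
    """Rebuild the tensor, inserting zero rows wherever a row was skipped."""
    remaining = iter(coded_rows)
    return [[0] * num_cols if skip else list(next(remaining))
            for skip in row_skip_list]


# Hypothetical sparsified 4 x 3 tensor with two all-zero rows.
tensor = [[0, 0, 0], [5, 0, -2], [0, 0, 0], [1, 1, 0]]
skip_list, coded = split_rows(tensor)

print(skip_list, coded)
assert merge_rows(skip_list, coded, num_cols=3) == tensor
```

The mechanism is most effective after structured pruning, where whole rows (for example, output channels) are zeroed out.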

A.2.2 Quantization and Codebook

Quantization control in quant_tensor():
- Method specification: lps_quantization_method_flags, mps_quantization_method_flags, codebook_present_flag
- Quantization type: Uniform or dependent (dq_flag)
- Step size: qp_value, lps_qp_density, mps_qp_density
- Dependent quantization state: dq_state_list for entry point initialization
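For orientation, the uniform case can be sketched as rounding each parameter to the nearest multiple of a step size. The sketch below takes the step size directly as an input; how it is derived from qp_value and the qp_density elements is not reproduced here, and dependent quantization (dq_flag) is not covered. Function names and sample values are assumptions.

```python
def quantize(values: list[float], step: float) -> list[int]:
    """Map each value to the index of the nearest multiple of step."""
    return [round(v / step) for v in values]


def dequantize(levels: list[int], step: float) -> list[float]:
    """Reconstruct approximate values from integer levels."""
    return [level * step for level in levels]


# Hypothetical weights and step size (illustrative only).
weights = [0.13, -0.42, 0.0, 0.91]
step = 0.25

levels = quantize(weights, step)
reconstructed = dequantize(levels, step)
max_error = max(abs(w - r) for w, r in zip(weights, reconstructed))

print(levels, reconstructed)
assert max_error <= step / 2  # uniform quantization error is bounded by half a step
```

A larger step size yields smaller integer levels and therefore fewer bits after entropy coding, at the cost of a larger reconstruction error.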

Codebook mapping:
- Integer value remapping via integer_codebook() structure
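The effect of a codebook can be illustrated as follows: the distinct integer values of a tensor are collected once, and the tensor is then coded as small indices into that list. This is a generic sketch; the function names and values are assumptions, and the actual integer_codebook() syntax is not reproduced.

```python
def build_codebook(levels: list[int]) -> tuple[list[int], list[int]]:
    """Return (codebook, indices) so that codebook[indices[i]] == levels[i]."""
    codebook = sorted(set(levels))
    index_of = {value: index for index, value in enumerate(codebook)}
    return codebook, [index_of[value] for value in levels]


def apply_codebook(codebook: list[int], indices: list[int]) -> list[int]:
    """Map indices back to the original integer values."""
    return [codebook[index] for index in indices]


# Hypothetical quantized tensor that uses only three distinct values.
levels = [-64, 0, 0, 128, -64]
codebook, indices = build_codebook(levels)

print(codebook, indices)
assert apply_codebook(codebook, indices) == levels
```

Remapping helps when a tensor uses few distinct but large-magnitude values, because the indices are smaller and cheaper to binarize than the values themselves.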

A.2.3 Entropy Coding

DeepCABAC (context-adaptive binary arithmetic coding):
- Applied to all payloads except NNR_PT_RAW_FLOAT
- Binarization syntax elements: sig_flag, sign_flag, abs_level_greater-flags, abs_remainder
- Binarization control: cabac_unary_length
- Probability estimation: Initialization and update via shift_idx_minus_1
- Random access support: scan_order, bit_offset_delta1, cabac_offset_list for entry points and state signaling
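The roles of the binarization elements can be pictured with a simplified scheme: a significance flag, a sign flag, a unary run of "greater than" flags capped at cabac_unary_length, and a remainder for larger magnitudes. The sketch below is an assumption-laden illustration of that general pattern, not the normative NNC binarization; the dictionary layout and the function names are hypothetical, and the arithmetic coding of the resulting bins is omitted.

```python
def binarize(level: int, cabac_unary_length: int) -> dict:
    """Split an integer level into simplified binarization elements."""
    elements = {"sig_flag": int(level != 0)}
    if level == 0:
        return elements
    elements["sign_flag"] = int(level < 0)
    magnitude = abs(level)
    greater_flags = []
    for threshold in range(1, cabac_unary_length + 1):
        flag = int(magnitude > threshold)
        greater_flags.append(flag)
        if not flag:
            break
    else:
        # Unary part exhausted: the rest of the magnitude goes into a remainder.
        elements["abs_remainder"] = magnitude - cabac_unary_length - 1
    elements["abs_level_greater_flags"] = greater_flags
    return elements


def debinarize(elements: dict) -> int:
    """Inverse of binarize for the simplified scheme above."""
    if not elements["sig_flag"]:
        return 0
    magnitude = 1 + sum(elements["abs_level_greater_flags"])
    magnitude += elements.get("abs_remainder", 0)
    return -magnitude if elements["sign_flag"] else magnitude


for level in (0, 1, -3, 7):
    bins = binarize(level, cabac_unary_length=2)
    print(level, bins)
    assert debinarize(bins) == level
```

A longer unary part gives more context-coded bins for common small magnitudes, while the remainder handles the rare large values.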

Incremental update coding modes:
- Temporal context modeling: temporal_context_modeling_flag for probability estimation dependency on previous tensors
- Histogram-dependent probability: hist_dep_sig_prob_enabled_flag for multi-tensor dependency

Document Information

TDoc: S4-260198

Source: Nokia, Fraunhofer HHI, Deutsche Telekom, InterDigital Europe

Type: discussion

For: Discussion

Agenda item: 10.5

Agenda item description: AI_IMS-MED (Media aspects for AI/ML in IMS services)

Contact: Gerhard Tech

Uploaded: 2026-02-03T18:32:20.377000

Contact ID: 91711

Revised to: S4-260286

TDoc Status: revised

Reservation date: 03/02/2026 17:13:42

Agenda item sort order: 52