S4-260286 - AI Summary

[AIML_IMS-MED] On Compression of AI/ML data in IMS

Comprehensive Summary: Compression of AI/ML Data in IMS

Document Overview

This contribution (S4-260286, revision of S4-260198) proposes the adoption of MPEG's Neural Network Coding standard ISO/IEC 15938-17 (NNC) for efficient compression and transport of AI/ML data in IMS services. The document is submitted by Nokia, Fraunhofer HHI, Deutsche Telekom, InterDigital Europe, and Vodafone Group Plc.

Main Technical Contributions

Motivation and Use Case Requirements

The contribution identifies critical challenges in AI/ML data exchange for IMS services:

  • Model Delivery Challenges: Use cases require multiple context-dependent model downloads (location, time, task-specific) rather than single downloads. Limited UE storage necessitates frequent model discarding and re-downloading.

  • Incremental Updates: Applications require both unidirectional continuous model updates to UEs and multidirectional updates for co-learning scenarios involving multiple UEs and edge nodes.

  • Key Benefits of Compression:

      • Bandwidth optimization reducing operational costs
      • Reduced latency through faster transmission
      • Broader accessibility in reduced-bandwidth networks
      • Interoperability through standardized data formats

NNC Standard Capabilities

The contribution highlights NNC's compression performance (models compressed to between 0.1% and 20% of their original size while keeping model performance transparent, i.e., essentially unchanged) and the following advanced features:

  • Topology Signalling: Generic syntax for encoding AI/ML model architecture
  • Random Access: Independent tensor decoding enabling parallelization
  • Parameter Update Signalling: Metadata for incremental update dependencies and relations
  • Robustness: Configurable prioritization/error-protection through packetization; missing update detection
  • Performance Indicators: Signaling of model performance metrics (e.g., accuracy)
  • Encapsulation Flexibility: Support for PyTorch, ONNX, NNEF, TensorFlow formats

The document also references WASM-based NNC decoder feasibility in web applications, demonstrating multi-fold latency reductions under representative network conditions.
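
To make the link between the quoted compression range and latency concrete, the following back-of-the-envelope sketch computes pure transfer time for an uncompressed model and for the two ends of the 0.1% to 20% range. The model size and link rate are hypothetical values chosen for illustration; they are not taken from the contribution, and decoding time, protocol overhead, and round trips are ignored.

```python
def transfer_seconds(size_bytes: float, link_bits_per_second: float) -> float:
    """Time to move size_bytes over a link, ignoring overhead and round trips."""
    return size_bytes * 8 / link_bits_per_second


MODEL_BYTES = 100e6  # hypothetical: 100 MB uncompressed model
LINK_BPS = 20e6      # hypothetical: 20 Mbit/s downlink

uncompressed_s = transfer_seconds(MODEL_BYTES, LINK_BPS)
print(f"uncompressed: {uncompressed_s:.2f} s")

# The two ends of the size range quoted in the summary (0.1% and 20%).
for ratio in (0.001, 0.20):
    compressed_s = transfer_seconds(MODEL_BYTES * ratio, LINK_BPS)
    print(f"compressed to {ratio:.1%}: {compressed_s:.2f} s")
```

Under these assumed numbers the download shrinks from 40 s to somewhere between 0.04 s and 8 s; the end-to-end gain also depends on decoder speed, which this sketch does not model.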

Technical Details (Annex)

NNC Data Components

Payload Types (NNR_NDU)

NNC specifies multiple payload types via nnr_compressed_data_unit_payload_type:

  • NNR_PT_INT: Integer parameter tensors
  • NNR_PT_FLOAT: Float parameter tensors
  • NNR_PT_RAW_FLOAT: Uncompressed float tensors
  • NNR_PT_BLOCK: Block-structured float parameters with sub-types:
      • NNR_CPT_DC (0x01): Decomposed weight tensors
      • NNR_CPT_LS (0x02): Local scaling parameters
      • NNR_CPT_BI (0x04): Biases
      • NNR_CPT_BN (0x08): Batch normalization parameters

Non-RAW payloads use context-adaptive entropy coding (DeepCABAC). The compressed_parameter_types element is the bitwise OR of the parameter type IDs listed above, so a single value signals which parameter types are present. Different bit depths are supported via nnr_decompressed_data_format, as are pre-quantized float tensors.
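
The OR-combination can be illustrated with a small sketch. The four ID values are the ones listed above; the class name and the helper `present_types` are hypothetical and only serve the illustration.

```python
from enum import IntFlag


class CompressedParameterType(IntFlag):
    NNR_CPT_DC = 0x01  # decomposed weight tensors
    NNR_CPT_LS = 0x02  # local scaling parameters
    NNR_CPT_BI = 0x04  # biases
    NNR_CPT_BN = 0x08  # batch normalization parameters


def present_types(compressed_parameter_types: int) -> list[str]:
    """Return the names of all parameter types whose bit is set (hypothetical helper)."""
    return [t.name for t in CompressedParameterType if compressed_parameter_types & t]


# A block carrying local scaling parameters and biases.
field = CompressedParameterType.NNR_CPT_LS | CompressedParameterType.NNR_CPT_BI
print(hex(field), present_types(field))
```

Because each ID occupies its own bit, any combination of the four types maps to a unique value between 0x00 and 0x0F.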

Topology Data (NNR_TPL)

Topology units signal AI/ML architecture via:
- topology_storage_format: Storage format specification
- topology_compression_format: Optional compression (RFC 1950 deflate)
- topology_data: Byte sequence (typically UTF-8 string)
- topology_elem_id / topology_elem_id_index: Topology element references in NNR_NDU
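
As a rough illustration of the optional topology compression, the sketch below compresses a UTF-8 topology string with Python's `zlib` module, whose default output is the RFC 1950 (zlib) container around a deflate stream. The topology string is a made-up placeholder, and how the resulting bytes are placed into an NNR_TPL unit is not shown.

```python
import zlib

# Hypothetical topology description; real content depends on topology_storage_format.
topology_text = '{"layers": ["conv1", "relu1", "fc1"]}'

topology_data = topology_text.encode("utf-8")  # byte sequence to be carried
compressed = zlib.compress(topology_data, 9)   # RFC 1950 stream, maximum compression level
restored = zlib.decompress(compressed).decode("utf-8")

print(len(topology_data), len(compressed), restored == topology_text)
```

For a string this short the compressed form may be no smaller than the input; the benefit appears with realistic, repetitive topology descriptions.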

Metadata

NNR_NDU metadata includes:
- Tensor Dimensions: tensor_dimensions_flag, tensor_dimension_list()
- Scan Order: scan_order for parameter-to-dimension mapping
- Entry Points: bit_offset_delta1, bit_offset_delta2 for parallel decoding

Incremental Coding Support:
- Parameter Update Tree (PUT) structure via mps_parent_signalling_enabled_flag, parent_node_id_present_flag
- Node identification through:
  - Enumeration: device_id, parameter_id, put_node_depth
  - Hash-based: parent_node_payload_sha256, parent_node_payload_sha512
- Global metadata in NNR_MPS including base_model_id
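
Hash-based parent identification can be sketched as follows: a receiver hashes the payloads it already holds and looks for the one whose digest matches the announced parent hash. This is a minimal sketch, assuming the hash covers the whole parent payload; exactly which bytes are hashed is defined by the standard and is not stated in this summary. The helper names and payload contents are hypothetical.

```python
import hashlib
from typing import Optional


def sha256_of(payload: bytes) -> bytes:
    """SHA-256 digest of a payload (assumed here to cover the entire payload)."""
    return hashlib.sha256(payload).digest()


def find_parent(parent_node_payload_sha256: bytes,
                stored_payloads: list[bytes]) -> Optional[bytes]:
    """Return the stored payload matching the announced parent hash, or None."""
    for payload in stored_payloads:
        if sha256_of(payload) == parent_node_payload_sha256:
            return payload
    return None


stored = [b"base model payload", b"first incremental update"]
announced = sha256_of(b"first incremental update")

print(find_parent(announced, stored) is not None)             # parent available
print(find_parent(sha256_of(b"never received"), stored))      # missing update
```

A `None` result corresponds to the missing-update case: the receiver knows it cannot apply the new update until the parent node has been obtained.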

Performance Data

Performance metrics signaled in NNR_MPS and NNR_LPS:
- validation_set_performance_present_flag, metric_type_performance_map_valid_flag, performance_metric_type
- validation_set_performance: Performance on validation set
- Performance maps for post-processing operations:
  - sparsification_performance_map()
  - pruning_performance_map()
  - unification_performance_map()
  - decomposition_performance_map()

Format Encapsulation

Annexes A-D specify encapsulation of NNEF, ONNX, PyTorch, and TensorFlow data through NNR topology and quantization data units.

Coding Tools

Parameter Reduction Methods

  • NNR_PT_BLOCK Reconstruction: Local scaling adaptation, batch norm folding, tensor decomposition with decomposition_rank and g_number_of_rows
  • Predictive Residual Encoding (PRE): nnr_pre_flag enables differential coding against previous updates
  • Row-Skipping: row_skip_enabled_flag and row_skip_list for zero-row signaling
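
The interplay of differential coding and row-skipping can be sketched on integer (already quantized) matrices: the encoder codes the difference to the previous update and flags rows whose residual is entirely zero, so only the remaining rows need to be entropy coded. This is an illustrative simplification; the function names are hypothetical and the normative syntax and ordering in NNC are not reproduced here.

```python
def encode_update(current, previous):
    """Return (row_skip_list, coded_rows) for the residual current - previous."""
    residual = [[c - p for c, p in zip(cur_row, prev_row)]
                for cur_row, prev_row in zip(current, previous)]
    row_skip_list = [all(v == 0 for v in row) for row in residual]
    coded_rows = [row for row, skip in zip(residual, row_skip_list) if not skip]
    return row_skip_list, coded_rows


def decode_update(row_skip_list, coded_rows, previous):
    """Rebuild the current tensor from the skip flags, coded rows, and previous tensor."""
    rows = iter(coded_rows)
    out = []
    for skip, prev_row in zip(row_skip_list, previous):
        residual = [0] * len(prev_row) if skip else next(rows)
        out.append([p + r for p, r in zip(prev_row, residual)])
    return out


previous = [[1, 2], [3, 4], [5, 6]]
current = [[1, 2], [3, 5], [5, 6]]   # only the middle row changed

skips, rows = encode_update(current, previous)
print(skips, rows)
print(decode_update(skips, rows, previous) == current)
```

When an update touches only a few rows, most of the tensor is represented by one flag per row, which is where the saving comes from.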

Quantization and Codebook

  • Quantization control via lps_quantization_method_flags, mps_quantization_method_flags, codebook_present_flag
  • dq_flag: Uniform vs. dependent quantization selection
  • Quantization step size: qp_value, lps_qp_density, mps_qp_density
  • Dependent quantization state: dq_state_list for entry point initialization
  • Codebook mapping: integer_codebook() structure for value remapping
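
A minimal sketch of uniform quantization followed by a codebook remapping is given below. It assumes the step size is supplied directly; the derivation of the step size from qp_value and the qp_density elements, as well as dependent quantization, are defined by the standard and deliberately left out. All function names are hypothetical.

```python
def quantize(values, step_size):
    """Uniform quantization: map each value to the nearest integer level."""
    return [round(v / step_size) for v in values]


def dequantize(levels, step_size):
    """Reconstruct approximate values from integer levels."""
    return [q * step_size for q in levels]


def build_codebook(levels):
    """Remap the distinct levels to compact indices 0..K-1 (illustrative codebook)."""
    codebook = sorted(set(levels))
    index_of = {level: i for i, level in enumerate(codebook)}
    return codebook, [index_of[q] for q in levels]


def apply_codebook(codebook, indices):
    """Inverse mapping: look each index up in the codebook."""
    return [codebook[i] for i in indices]


weights = [0.26, -0.49, 0.0, 0.26, 1.02]
step = 0.25  # assumed step size, not derived from qp_value

levels = quantize(weights, step)
codebook, indices = build_codebook(levels)
print(levels, codebook, indices)
print(dequantize(apply_codebook(codebook, indices), step))
```

The codebook pays off when the set of distinct levels is small and sparse, because the indices then span a narrower range than the raw levels.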

Entropy Coding (DeepCABAC)

Context-adaptive binary arithmetic coding for non-RAW payloads:

Binarization: each quantized value is split into sig_flag, sign_flag, a sequence of abs_level_greater flags, and abs_remainder, with cabac_unary_length specifying the length of the unary part.
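
The idea behind this binarization can be sketched as follows. The sketch is a simplification under stated assumptions: the flag names follow the summary, but the dictionary layout, the threshold sequence 1, 2, 3, ..., and leaving abs_remainder as a plain integer (rather than a further binarized codeword) are illustrative choices, not the normative syntax.

```python
def binarize(level: int, cabac_unary_length: int) -> dict:
    """Split an integer level into significance, sign, unary flags, and remainder."""
    if level == 0:
        return {"sig_flag": 0}
    magnitude = abs(level)
    greater_flags = []
    for threshold in range(1, cabac_unary_length + 1):
        is_greater = int(magnitude > threshold)
        greater_flags.append(is_greater)
        if not is_greater:
            break  # unary part terminates at the first 0
    symbols = {"sig_flag": 1, "sign_flag": int(level < 0),
               "abs_level_greater_flags": greater_flags}
    if len(greater_flags) == cabac_unary_length and all(greater_flags):
        # Unary part exhausted: the rest is carried as a remainder.
        symbols["abs_remainder"] = magnitude - (cabac_unary_length + 1)
    return symbols


def debinarize(symbols: dict) -> int:
    """Inverse of binarize."""
    if symbols["sig_flag"] == 0:
        return 0
    magnitude = 1 + sum(symbols["abs_level_greater_flags"]) + symbols.get("abs_remainder", 0)
    return -magnitude if symbols["sign_flag"] else magnitude


for value in (0, -2, 7):
    print(value, binarize(value, cabac_unary_length=3))
```

Small magnitudes, which dominate in quantized weight tensors, are fully described by a few context-coded flags; only rare large values need the remainder.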

Probability Estimation:
- Initialization/update: shift_idx_minus_1
- Random access: scan_order, bit_offset_delta1, cabac_offset_list
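
The role of a shift parameter in adaptive probability estimation can be illustrated with a generic shift-based update, in which the estimate moves toward each observed bit by a fraction of 2^-shift. This is a sketch of the general technique, not the DeepCABAC estimator itself; the 15-bit precision and the assumption that the shift would be derived as shift_idx_minus_1 + 1 are illustrative only.

```python
PRECISION_BITS = 15  # assumed fixed-point precision of the probability state


def update_probability(p_one: int, bit: int, shift: int) -> int:
    """Move the fixed-point estimate of P(bit == 1) toward the observed bit."""
    target = (1 << PRECISION_BITS) if bit else 0
    return p_one + ((target - p_one) >> shift)


def run(bits, shift, p_one=1 << (PRECISION_BITS - 1)):
    """Feed a bit sequence through the estimator, starting from P = 0.5."""
    for bit in bits:
        p_one = update_probability(p_one, bit, shift)
    return p_one


ones = [1] * 8
fast = run(ones, shift=1)  # small shift: adapts quickly
slow = run(ones, shift=5)  # large shift: adapts slowly, but is more stable
print(fast / (1 << PRECISION_BITS), slow / (1 << PRECISION_BITS))
```

A smaller shift tracks changing statistics faster, while a larger shift gives a smoother estimate, which is why signalling the shift per tensor can help coding efficiency. A production coder would additionally clamp the state away from 0 and 1.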

Incremental Update Modes:
- temporal_context_modeling_flag: Probability estimation from previous tensor
- hist_dep_sig_prob_enabled_flag: Multi-tensor historical dependency

Proposal

The contribution proposes considering NNC-based compression for inclusion in IMS-based AI/ML services, based on its compression efficiency, standardized format, and advanced features supporting various AI/ML data exchange scenarios.

Document Information
Source: Nokia, Fraunhofer HHI, Deutsche Telekom, InterDigital Europe, Vodafone Group Plc
Type: discussion
For: Discussion
Title: [AIML_IMS-MED] On Compression of AI/ML data in IMS
Agenda item: 10.5
Agenda item description: AI_IMS-MED (Media aspects for AI/ML in IMS services)
Doc type: discussion
For action: Discussion
Contact: Gerhard Tech
Uploaded: 2026-02-04T18:18:10.383000
Contact ID: 91711
TDoc Status: noted
Is revision of: S4-260198
Reservation date: 04/02/2026 18:02:35
Agenda item sort order: 52