Neural Network Based Video Codec Architecture and Support for Error Resilience
This contribution proposes documenting neural network-based codec (NNC) architectures and their error resilience capabilities in the 6G Media study (FS_6G_MED). It focuses on two specific NNC implementations, the DVC and GRACE codecs, and highlights their potential relevance for 6G deployments targeting 2030.
The document describes the DVC (Deep Video Compression) codec proposed by Guo Lu et al. (2019), which represents a hybrid approach to neural network-based video coding:
Key Architecture Features:
- Replaces traditional video coding components with neural network equivalents while maintaining the overall predictive coding architecture
- Uses CNN models to estimate optical flow for motion estimation and to compress the resulting motion information
- Implements neural network-based motion compensation to generate predicted frames
- Maintains functional similarity between traditional and NNC components
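The predictive coding structure described above can be illustrated with a toy example. This is only a minimal sketch and not the DVC implementation: a global integer shift stands in for the learned optical flow and warping networks, and uniform rounding stands in for the learned residual codec. The function names and the `step` parameter are hypothetical.

```python
import numpy as np

def motion_compensate(reference, flow):
    """Predict the current frame by shifting the reference frame.

    Toy stand-in for a learned motion compensation network:
    `flow` is a single global (dy, dx) integer shift.
    """
    dy, dx = flow
    return np.roll(reference, shift=(dy, dx), axis=(0, 1))

def encode_frame(current, reference, flow, step=4.0):
    """Return quantized residual symbols for the current frame."""
    predicted = motion_compensate(reference, flow)
    residual = current - predicted
    return np.round(residual / step).astype(np.int32)

def decode_frame(symbols, reference, flow, step=4.0):
    """Rebuild the frame from the same prediction plus the dequantized residual."""
    predicted = motion_compensate(reference, flow)
    return predicted + symbols * step

# Tiny demo: the current frame is a shifted, slightly brighter reference.
reference = np.arange(64, dtype=np.float64).reshape(8, 8)
current = np.roll(reference, shift=(1, 2), axis=(0, 1)) + 1.0
flow = (1, 2)

symbols = encode_frame(current, reference, flow)
reconstructed = decode_frame(symbols, reference, flow)
# Uniform quantization bounds the per-pixel error by step / 2.
max_error = float(np.max(np.abs(current - reconstructed)))
```

Encoder and decoder run the same prediction step, which is the property a neural codec keeps when it replaces each block with a network.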
Joint Optimization Approach:
The codec jointly trains/optimizes multiple components:
- Motion estimation
- Motion compensation
- Residual compression
- Quantization and bit-rate estimation
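Joint optimization of these components is typically expressed as a single rate-distortion objective, a weighted sum of reconstruction error and estimated bit rate. The sketch below shows that objective in its simplest form, assuming mean squared error as the distortion and bits per pixel as the rate. The function name, the argument names and the value of `lam` are illustrative assumptions.

```python
import numpy as np

def rate_distortion_loss(original, reconstructed, bits_motion, bits_residual, lam=256.0):
    """Compute lam * distortion + rate for one frame.

    distortion: mean squared error between original and reconstruction
    rate:       estimated bits for motion and residual, per pixel
    lam:        trade-off weight (illustrative value, an assumption)
    """
    num_pixels = original.shape[0] * original.shape[1]
    distortion = float(np.mean((original - reconstructed) ** 2))
    rate_bpp = (bits_motion + bits_residual) / num_pixels
    return lam * distortion + rate_bpp

# Example: a 4x4 frame reconstructed with an error of 1 at every pixel.
original = np.zeros((4, 4))
reconstructed = np.ones((4, 4))
loss = rate_distortion_loss(original, reconstructed,
                            bits_motion=8, bits_residual=8, lam=2.0)
```

Because every component contributes to either the distortion or the rate term, minimizing this one scalar trains all of them together rather than tuning each block in isolation.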
Performance:
- Achieves results competitive with H.264 and H.265
- Publicly available source code and research paper
- Similar approaches have been adopted in industry (Deep Render codec in FFmpeg and VLC)
The document presents the GRACE codec (Yihua Cheng et al., 2025) as an extension of DVC with enhanced error resilience:
Channel-Aware Training:
- Jointly trains encoder and decoder under simulated packet loss conditions
- Enables codec awareness of specific loss patterns
- Implements channel-aware source coding design
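Training under simulated loss can be pictured as inserting a random erasure step between encoder and decoder. The following is a minimal sketch of the forward pass only, assuming lost data is replaced by zeros and that the loss rate is drawn uniformly per sample. The toy encoder and decoder, the function names and `max_loss_rate` are hypothetical, and no gradient update is shown.

```python
import numpy as np

def simulate_packet_loss(latent, loss_rate, rng):
    """Zero out a random fraction of latent elements, as if their packets were lost."""
    keep = rng.random(latent.shape) >= loss_rate
    return latent * keep

def lossy_forward(frame, encoder, decoder, rng, max_loss_rate=0.8):
    """One forward pass of channel-aware training: encode, drop, decode, measure error."""
    loss_rate = rng.uniform(0.0, max_loss_rate)  # hypothetical loss-rate schedule
    latent = encoder(frame)
    received = simulate_packet_loss(latent, loss_rate, rng)
    reconstruction = decoder(received)
    distortion = float(np.mean((frame - reconstruction) ** 2))
    return distortion, loss_rate

# Toy stand-ins for the learned networks (assumptions, not real models).
def toy_encoder(x):
    return 2.0 * x

def toy_decoder(z):
    return 0.5 * z

rng = np.random.default_rng(0)
frame = np.ones((8, 8))
distortion, loss_rate = lossy_forward(frame, toy_encoder, toy_decoder, rng)
```

In an actual training loop the distortion measured after the erasure would be backpropagated through both networks, so the decoder learns to reconstruct from partial data and the encoder learns to spread information so that partial data remains useful.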
Technical Implementation:
- Encodes each frame as a tensor split into independently decodable sub-tensors
- Uses arithmetic coding mapped to packets
- Tested across a wide range of loss rates
- Includes lighter profiles (GRACE-lite) for mobile devices
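The idea of independently decodable sub-tensors can be sketched as follows. This is an illustration under assumptions, not the GRACE packetizer: elements are assigned to packets by a seeded pseudo-random permutation, lost elements are filled with zeros, and the arithmetic coding stage is omitted entirely. The function names and the `seed` parameter are hypothetical.

```python
import numpy as np

def packetize(latent, num_packets, seed=0):
    """Split a latent tensor into packets of (indices, values).

    A seeded permutation spreads each packet's elements across the
    whole tensor, so losing one packet removes scattered elements
    rather than one contiguous region.
    """
    flat = latent.ravel()
    perm = np.random.default_rng(seed).permutation(flat.size)
    return [(idx, flat[idx]) for idx in np.array_split(perm, num_packets)]

def depacketize(received, shape):
    """Rebuild the tensor from whichever packets arrived; missing elements become 0."""
    flat = np.zeros(int(np.prod(shape)), dtype=np.float64)
    for idx, values in received:
        flat[idx] = values
    return flat.reshape(shape)

latent = np.arange(24, dtype=np.float64).reshape(2, 3, 4) + 1.0
packets = packetize(latent, num_packets=4)

complete = depacketize(packets, latent.shape)                   # nothing lost
partial = depacketize(packets[:1] + packets[2:], latent.shape)  # packet 1 lost
```

Each packet carries its own element indices here for clarity. A real system could instead derive them from a shared seed so that only the values need to be transmitted.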
Performance Validation:
- User study with 240 crowdsourced participants
- Tested 61 videos under realistic conditions
- Used Google Congestion Control (GCC) to emulate WebRTC congestion control
- Channel conditions: LTE and broadband traces (0.2-8 Mbps, 100 ms end-to-end delay)
- MOS up to 38% higher than H.264/H.265 with AL-FEC and error concealment
Key Performance Improvements:
- Exceptional reduction in tail latency
- Reduced non-rendered frames
- Reduced stalls per second
- Improved video smoothness
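To make these metrics concrete, the sketch below computes them from a per-frame delay log. The definitions are assumptions chosen for illustration and may differ from those used in the GRACE evaluation: tail latency is taken as the 98th percentile of rendered-frame delay, a non-rendered frame is marked as NaN, and a stall is a run of consecutive non-rendered frames.

```python
import numpy as np

def session_metrics(frame_delays_ms, duration_s, tail_percentile=98.0):
    """Summarize a session from per-frame delays (NaN = frame never rendered)."""
    delays = np.asarray(frame_delays_ms, dtype=np.float64)
    missing = np.isnan(delays)

    # Tail latency over the frames that were actually rendered.
    tail_latency_ms = float(np.nanpercentile(delays, tail_percentile))

    # Fraction of frames that never reached the screen.
    non_rendered_fraction = float(missing.mean())

    # A stall starts where a non-rendered frame follows a rendered one
    # (or opens the session).
    previous_missing = np.concatenate(([False], missing[:-1]))
    stall_count = int(np.sum(missing & ~previous_missing))

    return {
        "tail_latency_ms": tail_latency_ms,
        "non_rendered_fraction": non_rendered_fraction,
        "stalls_per_second": stall_count / duration_s,
    }

delays = [50, 60, np.nan, np.nan, 70, np.nan, 80, 90, 100, 110]
metrics = session_metrics(delays, duration_s=2.0)
```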
Hardware Requirements:
- Original GRACE: NVIDIA A40 GPU (31.2-51.2 fps)
- GRACE-lite: Real-time capable on current mobile devices
Content Specificity:
- NNC performance may be content-specific due to training data dependencies
Reconstruction Challenges:
- Reconstruction may fail because arithmetic operations in GPU frameworks are not bit-exact across platforms
- Affected operations include floating-point arithmetic and convolutions
- Currently under discussion in ISO/IEC JTC 1/SC 29 (the standards committee for coding of audio, picture, multimedia and hypermedia information)
- Resolving this issue is identified as a potential key enabler for future NNC adoption
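The bit-exactness problem follows from the fact that floating-point addition is not associative: two devices that accumulate the same terms in a different order can obtain different sums, and after quantization a different symbol. The sketch below reproduces the effect with float32 values chosen to make it obvious. The `accumulate` helper is hypothetical and is not taken from any GPU framework.

```python
import numpy as np

def accumulate(terms):
    """Sum terms one by one in float32, in the given order."""
    total = np.float32(0.0)
    for term in terms:
        total = np.float32(total + np.float32(term))
    return total

# The same three terms, as two devices might accumulate them inside a convolution.
order_a = accumulate([1e8, 1.0, -1e8])   # (1e8 + 1) - 1e8: the 1 is absorbed
order_b = accumulate([1e8, -1e8, 1.0])   # (1e8 - 1e8) + 1: the 1 survives

# Quantizing the two sums yields different integer symbols.
symbol_a = int(np.round(order_a))
symbol_b = int(np.round(order_b))
mismatch = symbol_a != symbol_b
```

Real mismatches are usually far smaller than in this example, but a single differing symbol can be enough to desynchronize entropy decoding or to propagate drift through subsequent predicted frames.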
The document makes two specific proposals:
- Documentation Request: Document NNC features and their application to error-resilient AI traffic in the 6G MED TR under 6G Media (based on clauses 2 and 3)
- Use Case Consideration: Include the use case of NNC with channel-aware source coding training in AI traffic characteristics
The contribution includes specific text proposals for:
- Change 1: Addition of two references to the normative references section
- Change 2: New clause 6.2.4.X under Work topic #2d (AI Traffic Characteristics) containing the technical description of DVC and GRACE codecs, including architecture diagrams and performance characteristics