Meeting: TSGS4_135_India | Agenda Item: 9.5
6 documents found
[FS_AVFOPS_MED] Work Plan for Advanced Video Formats and Operation Points (FS_AVFOPS)
This document (S4-260130) from Xiaomi proposes a work plan for the Study Item on Advanced Video Formats and Operation Points (FS_AVFOPS_MED). The study focuses on identifying and documenting new video representation formats and their integration into 3GPP specifications, particularly TS 26.265.
The study item encompasses eight main objectives, including:
Study the feasibility of generating video signals corresponding to the hypothetical optical system (interoperability point 2) by:
- Sub-objective 2a: Document requirements of real-time capturing systems to meet the video signal characteristics (interface points 1a and 1b), including:
  - Temporal alignment of captured pictures
  - Frame rate
  - Bit depth
- Sub-objective 2b: Identify the different types of video processing functions applied to video source signals from typical UE capturing systems, including the possibility of AI-based image generation (interoperability point 2)
Identify gaps in specifications and provide guidance:
- In 3GPP specifications: especially on operation points, provide guidance for potential normative work
- In other SDO specifications (e.g., MPEG): coordinate possible actions with the relevant SDOs
The study adopts the following working methods:
- Individual CRs are created for specific aspects of the work (e.g., one CR per new scenario)
- A merged CR will be created at the end of the study to compile all changes to TR 26.966
- Company proposals to progress CRs are submitted as "discussion" TDocs
- Upon agreement, the responsible person for the CR implements the agreement
Three CRs are currently in progress:
| CR | Title | Latest TDoc | Clause | Responsible |
|---|---|---|---|---|
| 0001r12 | New scenario: Video with changeable background | S4-252016 / S4aV250069 | 5.6 (new) | Emmanuel Thomas |
| 0002r12 | New scenario: Refocusable video | S4aV250071 / S4-252018 | 5.7 (new) | Emmanuel Thomas |
| 0003 | New scenario: Video with semantic segmentation map | S4aV250072 | 5.8 (new) | Emmanuel Thomas |
Three telcos are planned (dates TBD):
- First two telcos: continue work on objectives 1, 2, and 3
- Third telco:
  - Continue work on objectives 1, 2, and 3
  - Start work on objectives 4, 5, 6, and 7
- Host: Qualcomm
The document proposes to agree on the work plan provided in the timeline above.
[FS_AVFOPS_MED] New scenario: Video with changeable background
This CR proposes adding a new scenario to TR 26.966 addressing video with changeable background functionality, progressing objective 1 on identifying relevant new representation formats not yet documented in TS 26.265.
The CR introduces a scenario addressing the growing use case of mobile video editing, where users:
- Record and edit videos directly on their devices
- Upload the source video to cloud services for editing, or
- Use local applications to generate the final video
The key technical requirement is video compositing, where:
- One video is overlaid on other visual content
- Alpha blending is performed using an alpha channel
- Pixel-accurate transparency information is required
- The alpha channel is typically carried in a lossless manner
The CR includes four illustrative figures showing:
1. The original video frame
2. The associated alpha plane
3. The video frame with the alpha plane applied
4. The alpha-blended video frame with a background image
The CR notes that the coded representation of alpha auxiliary channels as part of a video bitstream has not previously been addressed in 3GPP specifications.
URN: urn:mpeg:mpegB:cicp:systems:auxiliary:alpha

Alpha Plane Interpretation Rules:
- Minimum sample value: transparency (the co-located pixel is transparent)
- Maximum sample value: opacity (the co-located pixel is opaque)
- Alpha value: normalized between 0.0 and 1.0
- Sample values divided by the maximum value (e.g., 255 for 8-bit) give the multiplier for the master image intensity
- Requirement: encoded alpha planes must use the full sample range (0-255 for 8-bit)
Defines signal properties for key/alpha/matte signals (terms used interchangeably):
Properties:
- Black level: complete transparency
- White level: complete opacity
- Transfer function: out of scope, but assumed linear; black/white levels conform to the image format specifications
- Alpha value: normalized, 0.0 (fully transparent) to 1.0 (fully opaque)
- Sample mapping: co-located with the corresponding fill luminance or RGB samples (zero pixel offset)
- Timing: timed coincident with the associated fill video signal
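The normalization and blending rules above (a sample value divided by the maximum value gives the alpha multiplier) can be sketched as follows, assuming 8-bit samples; the function names are illustrative, not from the CR:

```python
def alpha_multiplier(sample: int, bit_depth: int = 8) -> float:
    """Normalize an alpha sample to [0.0, 1.0] (minimum = transparent, maximum = opaque)."""
    max_value = (1 << bit_depth) - 1  # 255 for 8-bit
    return sample / max_value

def blend_pixel(fg: float, bg: float, alpha_sample: int, bit_depth: int = 8) -> float:
    """Alpha-blend one co-located foreground/background intensity pair."""
    a = alpha_multiplier(alpha_sample, bit_depth)
    return a * fg + (1.0 - a) * bg
```

An alpha sample of 255 keeps the foreground intensity unchanged, while 0 keeps the background, matching the black/white level semantics above.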
Code Points Supporting Alpha:
- Value 4: Alpha (matte)
- Value 51: R, G, B, Alpha (A)
- Value 52: A, B, G, R
- Value 101: CB, Y, A, CR, Y, A (4:2:2:4)
- Value 103: CB, Y, CR, A (4:4:4:4)
The CR establishes a functional analysis framework based on two possibilities:
1. Existing hardware support: reference to example hardware products
2. No existing hardware support: discussion/description with justifications of the expected hardware implementation impact, or reference to existing demos
[FS_AVFOPS_MED] New scenario: Refocusable video
This CR proposes adding a new scenario (Scenario #6) on Refocusable Video to TR 26.966, addressing objective 1 of identifying relevant new representation formats not yet documented in TS 26.265.
The CR introduces the concept of refocusable video, which enables post-capture modification of depth of field effects (bokeh). Key points:
Identifies a gap: the coded representation of depth maps as part of a video bitstream has not been addressed in 3GPP specifications.
Comprehensive survey of depth map representation across multiple standards bodies:
URN: urn:mpeg:mpegB:cicp:systems:auxiliary:depth

Defines a comprehensive depth map data representation with key definitions:
Terminology:
- Reference camera: the camera corresponding to the viewpoint (can be virtual)
- Depth map: an array of depth values corresponding to image pixels
- Depth value: the distance in meters from the reference camera to the object surface, measured parallel to the optical axis
- Relative depth value: an offset and scaled representation of the depth value
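One common convention for such offset-and-scaled representations (used, for example, in MPEG depth representation signalling) quantizes uniformly in inverse depth between a near and a far plane. The sketch below illustrates that convention only; it is not quoted from the CR, and the function name and parameters are illustrative:

```python
def depth_from_relative(v: int, z_near: float, z_far: float, bit_depth: int = 16) -> float:
    """Map a quantized relative depth value to metric depth (meters).

    v = max value -> z_near (closest), v = 0 -> z_far (farthest), with
    intermediate values spaced uniformly in 1/z (disparity-like spacing).
    """
    max_value = (1 << bit_depth) - 1
    w = v / max_value  # normalized relative depth in [0, 1]
    inv_z = w * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    return 1.0 / inv_z
```

Inverse-depth spacing allocates more precision to near objects, which matters for refocusing effects such as bokeh.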
Two representations are specified:
- Co-located sample mapping
- 16-bit floating point
Digital Picture Exchange Format v2.0 for moving pictures:
Depth component support: - Code value 8: Depth (Z) component
Transfer characteristics: - Code 11: Z (depth) – linear - Code 12: Z (depth) – homogeneous (requires distance to screen and angle of view in user-defined section)
Outlines an analysis framework based on:
- Option b: describe the expected hardware implementation impact with justifications
- Codec capabilities: TBD
The CR adds 9 new normative/informative references covering:
- Android AOSP camera bokeh documentation
- JVET documents on CICP extensions
- ISO/IEC standards (MIAF, ISOBMFF amendments)
- SMPTE standards (RP 157, ST 268-1, ST 2087)
- Google Dynamic Depth specification
- Android MP4-AT file format
[FS_AVFOPS_MED] New scenario: Video with semantic segmentation map
This CR proposes adding a new scenario to TR 26.966 addressing video with semantic segmentation maps, progressing objective 1 on identifying relevant new representation formats not yet documented in TS 26.265.
Semantic Segmentation Fundamentals:
- A technique in which every pixel in an image is classified into one or more semantic classes
- Example classes from the Android ARCore Scene Semantics API: sky, building, tree, road, vehicle, sidewalk, terrain, structure, water, object, person
- Enables AR applications with advanced video processing (sky replacement, realistic lighting effects)
Mobile Implementation Context:
- Real-time capture of segmentation maps alongside the camera view is commonly available on recent mobile devices
- Leverages high-capacity camera/video pipelines and AI frameworks with hardware optimizations
- Specialized models exist for specific content types (e.g., multi-class selfie segmentation)
Multi-class Selfie Segmentation Model:
- Provides 6 classes for selfie shots:
  - Background
  - Hair
  - Body-skin
  - Face-skin
  - Clothes
  - Others (accessories)
Use Cases:
- Video effects (hair replacement, face filtering)
- Video indexing
- AI search
Processing Pipeline. Three main steps are identified:
1. Frame acquisition
2. AI inference
3. Generation of the segmentation map
Implementation Details:
- Uses the Google MediaPipe framework API for image segmentation
- The AI model performs inference on camera frames
- Output format: a 2D array of unsigned 8-bit integers
- Each value represents the estimated category for the corresponding input pixel
Class Identifier Mapping. For multi-class selfie segmentation:
- 0: background
- 1: hair
- 2: body-skin
- 3: face-skin
- 4: clothes
- 5: others (accessories)
Efficiency Considerations:
- Direct class identifier representation is inefficient (only 6 of the 256 possible 8-bit values are used)
- Mapping class identifiers to sample value ranges improves:
  - Transport efficiency
  - Robustness to encoding artifacts
Example Mapping Table:

| Class ID | Assigned Value | Sample Range |
|----------|----------------|--------------|
| 0 | 21 | 0-42 |
| 1 | 64 | 43-85 |
| 2 | 107 | 86-128 |
| 3 | 150 | 129-171 |
| 4 | 193 | 172-214 |
| 5 | 235 | 215-255 |
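The table's mapping can be sketched as follows; the encode/decode helpers are illustrative, assuming six classes spread uniformly over the 8-bit range with each assigned value at the center of its range:

```python
# Assigned (range-center) sample value for each class identifier
ASSIGNED_VALUE = {0: 21, 1: 64, 2: 107, 3: 150, 4: 193, 5: 235}

def encode_class(class_id: int) -> int:
    """Map a class identifier to its assigned sample value."""
    return ASSIGNED_VALUE[class_id]

def decode_class(sample: int) -> int:
    """Recover the class identifier from a decoded sample via its range.

    Ranges are 43 values wide (0-42, 43-85, ...), so integer division
    works; clamping to 5 absorbs the slightly wider last range (215-255).
    """
    return min(sample // 43, 5)
```

Because each class occupies a 43-value band, a decoded sample perturbed by moderate coding noise still falls in the correct band, which is the robustness benefit noted above.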
Current Status in JVET:
- Encoding of semantic maps is not currently enabled by the MV-HEVC standard
- JVET is developing a possible MV-HEVC extension with:
  - A new auxiliary layer type called "segmentation plane"
  - A picture segmentation information SEI message for interpreting decoded samples as class identifiers
- Reference: JVET-AN2032 (40th meeting, Geneva, October 2025)
Assessment Framework. Two aspects for functional analysis:
1. Option b: no existing hardware support (provide justification/description of the expected implementation impact)
2. Codec capabilities
[FS_AVFOPS_MED] Updates to possible solutions and mapping to scenarios
This CR updates the possible solutions related to new use cases, specifically adding solutions for Scenario #5: Video with changeable background.
The CR extends Table 6.0-1 to include two new solutions for Scenario #5:
These solutions address video with changeable background use cases.
This solution leverages HEVC multi-layer extensions to carry alpha planes as auxiliary channels:
Auxiliary Picture Signalling:
- Uses scalability_mask_flag in the Video Parameter Set (VPS)
- Sets scalability mask index to '3' (reserved for "Auxiliary" scalability dimension)
- AuxId value determines auxiliary picture type:
- AuxId = 1: Alpha plane (AUX_ALPHA)
- AuxId = 2: Depth picture (AUX_DEPTH)
- Additional interpretation information carried via SEI messages (Alpha channel information, Depth representation information)
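As a rough sketch of how a parser might interpret the VPS signalling above, the helper below is a simplification of the HEVC multi-layer extension semantics, with the flag-list inputs being an assumption chosen for illustration:

```python
AUX_ALPHA, AUX_DEPTH = 1, 2
AUXILIARY_DIMENSION = 3  # scalability mask index reserved for "Auxiliary"

def auxiliary_picture_type(scalability_mask, dimension_id):
    """Return the auxiliary picture type of a layer, or None if not auxiliary.

    scalability_mask: per-dimension boolean flags from the VPS
                      (scalability_mask_flag), in dimension order.
    dimension_id:     the layer's dimension_id values, one per active
                      dimension, in mask order (a simplification of the
                      Annex F ScalabilityId derivation).
    """
    if not scalability_mask[AUXILIARY_DIMENSION]:
        return None
    # Position of the auxiliary dimension among the active dimensions
    pos = sum(1 for f in scalability_mask[:AUXILIARY_DIMENSION] if f)
    aux_id = dimension_id[pos]
    return {AUX_ALPHA: "AUX_ALPHA", AUX_DEPTH: "AUX_DEPTH"}.get(aux_id)
```

For a layer where only the auxiliary dimension is active and AuxId is 1, this yields the alpha-plane interpretation described above.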
Two possible approaches are identified for further study:
1. Multiview profiles (though only one view is present)
2. Combination of a non-Multiview profile for the base layer with a monochrome profile for the auxiliary layer
Open Issues:
- Different chroma subsampling between layers
- Different encoding configurations
- Spatial resolution differences
- Bit depth variations
This solution uses two independent HEVC bitstreams:
1. First bitstream: the video content
2. Second bitstream: the alpha plane sequence
Alpha Plane Signalling:
- Alpha plane carried as a single-layer HEVC bitstream
- Current HEVC specification lacks explicit signalling for alpha plane sequences
- Proposed solution: Use specific code points in VUI information
- Reference to potential CICP extension [x2] for signalling via colour_primaries parameter in VUI
VUI Parameters:
- Signalling through colour_description_present_flag and related parameters
- colour_primaries, transfer_characteristics, and matrix_coeffs fields in VUI
Since the bitstreams are independent:
- Video content bitstream: any HEVC profile
- Alpha plane bitstream:
  - Monochrome profiles (Monochrome, Monochrome 10, Monochrome 12, Monochrome 16)
  - 4:2:0 profiles
Both solutions (#5.1 and #5.2) have evaluation sections marked "For further study", indicating:
- Performance evaluation pending
- Profile compatibility analysis needed
- Implementation considerations to be determined
[FS_AVFOPS_MED] Permanent document on conformance v1.1.0
Source: Xiaomi (PD editor)
Title: AVFOPS Permanent Document v1.1.0
Version: 1.1.0
Meeting: SA4#135, February 2026, Goa, India
Agenda Item: 9.5
Document for: Agreement
This permanent document consolidates all conformance-related material for video operation points (VOPS), gathering requirements, frameworks, and test content submitted to SA4 meetings. The document has evolved from the VOPS work item to the FS_AVFOPS study item.
The platform architecture consists of:
- Database: contains descriptions of available sample bitstreams
- Hosting server(s): store submitted bitstreams
- Public portal: enables external users to search and download bitstreams
- Bitstream validator: validates compliance with TS 26.265 constraints prior to upload
The database is proposed as a git repository on web-based platforms (GitHub/GitLab) using JSON/markup files. Each bitstream links to TS numbers and profiles via URNs.
Repository location: https://forge.3gpp.org/rep/sa4/ts-26.265/conformance/bitstream-validator
Key capabilities:
- Validates bitstream compliance with video coding specifications and profiles
- Validates compliance with TS 26.265 bitstream constraints
- Uses the JVET reference decoder for codec conformance checking
- Implements programmatic constraint validation via XML schema
Technical approach:
1. Parse the input bitstream and generate an XML dump of its syntax elements
2. Express VOPS constraints as XML schemas (XSD 1.1)
3. Validate the XML bitstream description against the constraint schemas
Usage workflow:
```
python -m sa4_bitstream_validator dump bitstream_path description.xml
python -m sa4_bitstream_validator validate description.xml bitstream_rules/operation_point.xsd
```
Advantages:
- Codec-agnostic constraint expression
- No programming knowledge required for constraint definition
- Reusable bitstream descriptions for the database
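The dump-then-validate idea can be illustrated with the Python standard library alone (the actual validator uses XSD 1.1 schemas instead of hand-written checks; the XML element names here are hypothetical, chosen to mirror the HEVC VUI fields listed later in this document):

```python
import xml.etree.ElementTree as ET

# Hypothetical XML dump of a few VUI syntax elements
DUMP = """
<bitstream>
  <sps>
    <vui_parameters_present_flag>1</vui_parameters_present_flag>
    <aspect_ratio_info_present_flag>1</aspect_ratio_info_present_flag>
    <video_full_range_flag>0</video_full_range_flag>
  </sps>
</bitstream>
"""

# Constraint table: syntax element name -> required value
CONSTRAINTS = {
    "vui_parameters_present_flag": 1,
    "aspect_ratio_info_present_flag": 1,
    "video_full_range_flag": 0,
}

def violations(xml_text, constraints):
    """Return the names of syntax elements that violate their constraint."""
    root = ET.fromstring(xml_text)
    bad = []
    for name, required in constraints.items():
        node = root.find(f".//{name}")
        if node is None or int(node.text) != required:
            bad.append(name)
    return bad
```

Expressing the constraints as a schema rather than code is what gives the codec-agnostic, no-programming-needed properties listed above.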
Constraints defined using XSD 1.1 with xs:assert elements. Example schema provided for MV-HEVC stereo operation point (vops_3gpp-mv-hevc-stereo.xsd) includes validation of:
- VPS multi-layer parameters
- Layer set configuration
- Scalability mask and ScalabilityId constraints
- VUI-specific constraints for MV-HEVC
- three_dimensional_reference_displays_info SEI message parameters
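For illustration, an XSD 1.1 assertion over such an XML dump could look like the fragment below. This is a sketch, not an excerpt from the actual vops_3gpp-mv-hevc-stereo.xsd; the element names mirror syntax elements listed later in this document:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning"
           vc:minVersion="1.1">
  <xs:element name="vps">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="vps_num_layer_sets_minus1" type="xs:nonNegativeInteger"/>
        <xs:element name="default_output_layer_idc" type="xs:nonNegativeInteger"/>
      </xs:sequence>
      <!-- XSD 1.1 assertions express the operation point constraints -->
      <xs:assert test="vps_num_layer_sets_minus1 ge 1"/>
      <xs:assert test="default_output_layer_idc eq 0"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```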
A comprehensive status tracking table is provided, covering:
AVC Bitstreams:
- Motion-vector constraints: none implemented
- Rate constraints: none implemented
HEVC Bitstreams:
- Progressive constraints: done
- VUI constraints: work in progress (done but not tested with bitstreams)
- Frame-packing constraints: none implemented
Specific VUI constraint validations include:
- vui_parameters_present_flag = 1
- aspect_ratio_info_present_flag = 1
- video_signal_type_present_flag = 1 and colour_description_present_flag = 1
- video_full_range_flag = 0
- overscan_info_present_flag = 0
- chroma_loc_info_present_flag = 1
Note: Timing information constraints proposed for removal (marked as issues)
Status for various decoder profiles:
- AVC decoders (FullHD, UHD, 8K): none implemented
- HEVC decoders (HD, FullHD, 8K): none implemented
- MV-HEVC-Main-Dual-layers-UHD420-Dec: work in progress
- MV-HEVC-Ext-Dual-layers-UHD420-Dec: none implemented
- HEVC-Frame-Packed-Stereo-Dec: none implemented
Alignment with TS 26.265 V19.1.0:
- Validation of multi-layer parameters in the VPS
- Validation of the ScalabilityId constraint added
- Validation of VUI-specific constraints for MV-HEVC operation points
- Validation of the three_dimensional_reference_displays_info SEI message
MV-HEVC Stereo Common Bitstream Requirements (6.3.6.2):
Implemented validations:
- vps_num_layer_sets_minus1 >= 1
- layer_id_included_flag[1][0] = 1 with at least one other layer included
- scalability_mask_flag[1] = 1
- ScalabilityId[1][1] = 1
- default_output_layer_idc = 0
- chroma_format_idc = 1
- aspect_ratio_idc = 1
- Colour primaries/transfer/matrix combinations for SDR HD or HDR
- three_dimensional_reference_displays_info SEI message presence and constraints:
- num_ref_displays_minus1 = 0
- left_view_id[0] and right_view_id[0] validation against view_id_val
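The implemented validations above could be expressed programmatically along the following lines, assuming the syntax elements have already been parsed into a flat dictionary; the rule names follow the list above, but the helper itself is illustrative:

```python
def check_mv_hevc_stereo(se):
    """Return the list of violated MV-HEVC stereo requirements (empty if compliant)."""
    rules = [
        ("vps_num_layer_sets_minus1 >= 1",
         se["vps_num_layer_sets_minus1"] >= 1),
        ("scalability_mask_flag[1] = 1",
         se["scalability_mask_flag"][1] == 1),
        ("ScalabilityId[1][1] = 1",
         se["ScalabilityId"][1][1] == 1),
        ("default_output_layer_idc = 0",
         se["default_output_layer_idc"] == 0),
        ("chroma_format_idc = 1",
         se["chroma_format_idc"] == 1),
        ("aspect_ratio_idc = 1",
         se["aspect_ratio_idc"] == 1),
    ]
    return [name for name, ok in rules if not ok]
```

In the actual tool these checks live in the XSD schema rather than in code, so the same bitstream description can be validated against different operation point definitions.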
Source Content:
- Polytech Nantes database: 31 sequences, 1920x1080, 10-bit 4:2:2 YUV at 25 fps (availability issues noted)
Compressed Bitstreams:
- Level 4, 30 fps, 300 frames
- Hummingbird_Spatial (5.1.2.2): new submission
Reference Software:
- HM reference software for HEVC
- HTM reference software for MV-HEVC and 3D-HEVC extensions
Annex A provides background on DASH-IF conformance suite approach, noting that existing tools (DASH-IF, GPAC/MP4Box) can parse NAL units and generate XML dumps but do not implement comprehensive video bitstream validation against 3GPP operation point constraints.
This document represents significant progress toward a comprehensive conformance framework for 3GPP video operation points. The work-in-progress items focus primarily on MV-HEVC stereo operation points, with most AVC and single-layer HEVC operation points awaiting implementation.