S4-260174 - AI Summary

[FS_AVFOPS_MED] New scenario: Video with semantic segmentation map


Summary of 3GPP Technical Document S4-260174

Document Information

  • Type: Change Request (CR 0003 rev 1)
  • Specification: TS 26.966 v19.0.0
  • Category: B (addition of feature)
  • Release: Rel-19
  • Work Item: FS_AVFOPS_MED (Study of Advanced Video Formats and Operation Points)
  • Source: Xiaomi Communications

Purpose

This CR proposes adding a new scenario to TS 26.966 addressing video with semantic segmentation maps, progressing objective 1 on identifying relevant new representation formats not yet documented in TS 26.265.

Main Technical Contributions

5.8 Scenario #57: Video with Semantic Segmentation Maps

5.8.1 Overview and Use Case Description

Semantic Segmentation Fundamentals
- Technique where every pixel in an image is classified into one or more semantic classes
- Example classes from Android ARCore Scene Semantics API: sky, building, tree, road, vehicle, sidewalk, terrain, structure, water, object, person
- Enables AR applications with advanced video processing (sky replacement, realistic lighting effects)

Mobile Implementation Context
- Real-time capture of segmentation maps alongside camera view is commonly available on recent mobile devices
- Leverages high-capacity camera/video pipelines and AI frameworks with hardware optimizations
- Specialized models exist for specific content types (e.g., multi-class selfie segmentation)

Multi-class Selfie Segmentation Model
- Provides 6 classes for selfie shots:
  - Background
  - Hair
  - Body-skin
  - Face-skin
  - Clothes
  - Others (accessories)

Use Cases
- Video effects (hair replacement, face filtering)
- Video indexing
- AI search

5.8.1.2 Example Image Segmentation Method on Mobile Platform

Processing Pipeline
Three main steps identified:
1. Frame acquisition
2. AI inference
3. Generation of segmentation map

Implementation Details
- Uses the Google MediaPipe framework API for image segmentation
- AI model performs inference on camera frames
- Output format: 2D array of unsigned 8-bit integers
- Each value represents estimated category for each input pixel
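
The three steps and the output format above can be sketched in plain Python. This is a minimal illustration and not the MediaPipe API: `acquire_frame` and `infer_classes` are hypothetical stand-ins (a synthetic grayscale frame, and a brightness threshold in place of a real AI model). Only the shape of the result, one unsigned 8-bit class identifier per input pixel, follows the description.

```python
from typing import List

Frame = List[List[int]]  # grayscale frame, one 0-255 luma value per pixel


def acquire_frame(width: int, height: int) -> Frame:
    """Step 1 (stand-in): synthesize a dark frame with a bright centered square."""
    frame = [[10] * width for _ in range(height)]
    for y in range(height // 4, 3 * height // 4):
        for x in range(width // 4, 3 * width // 4):
            frame[y][x] = 200
    return frame


def infer_classes(frame: Frame) -> List[List[int]]:
    """Step 2 (stand-in): a brightness threshold replaces the real model.

    Dark pixels get class 0 (background), bright pixels get class 4 (clothes).
    """
    return [[4 if luma >= 128 else 0 for luma in row] for row in frame]


def build_segmentation_map(classes: List[List[int]]) -> List[bytearray]:
    """Step 3: pack class identifiers into rows of unsigned 8-bit integers."""
    return [bytearray(row) for row in classes]


frame = acquire_frame(8, 8)
seg_map = build_segmentation_map(infer_classes(frame))
```

Each row of `seg_map` is a `bytearray`, so every entry is constrained to the 0-255 range of an unsigned 8-bit sample, and the map has the same dimensions as the input frame.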

Class Identifier Mapping
For multi-class selfie segmentation:
- 0: background
- 1: hair
- 2: body-skin
- 3: face-skin
- 4: clothes
- 5: others (accessories)
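
As an illustration of how an application might consume these identifiers, the sketch below names the six classes and derives a per-class mask and a pixel count from a segmentation map. The names `SelfieClass`, `class_mask`, and `class_histogram` are hypothetical and chosen for this example; they are not part of MediaPipe or of the CR.

```python
from enum import IntEnum
from typing import Dict, List, Sequence


class SelfieClass(IntEnum):
    """Class identifiers listed above for multi-class selfie segmentation."""
    BACKGROUND = 0
    HAIR = 1
    BODY_SKIN = 2
    FACE_SKIN = 3
    CLOTHES = 4
    OTHERS = 5


def class_mask(seg_map: Sequence[Sequence[int]], target: SelfieClass) -> List[List[bool]]:
    """Return True wherever the segmentation map holds the target class."""
    return [[value == target for value in row] for row in seg_map]


def class_histogram(seg_map: Sequence[Sequence[int]]) -> Dict[SelfieClass, int]:
    """Count pixels per class; raises ValueError on an identifier outside 0-5."""
    counts = {cls: 0 for cls in SelfieClass}
    for row in seg_map:
        for value in row:
            counts[SelfieClass(value)] += 1
    return counts


# Tiny hand-written map used only for demonstration.
example_map = [
    [0, 1, 1, 0],
    [0, 3, 3, 0],
    [4, 2, 2, 4],
]
hair = class_mask(example_map, SelfieClass.HAIR)
counts = class_histogram(example_map)
```

A mask such as `hair` is the kind of input a hair-replacement effect would need, while `counts` could feed video indexing or search.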

Efficiency Considerations
- Using class identifiers directly as sample values is inefficient (only 6 of the 256 possible 8-bit values are used)
- Mapping class identifiers to sample value ranges improves:
  - Transport efficiency
  - Robustness to encoding artifacts

Example Mapping Table
| Class ID | Assigned Value | Sample Range |
|----------|---------------|--------------|
| 0 | 21 | 0-42 |
| 1 | 64 | 43-85 |
| 2 | 107 | 86-128 |
| 3 | 150 | 129-171 |
| 4 | 193 | 172-214 |
| 5 | 235 | 215-255 |
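
The table can be reproduced with a short sketch. The construction rule used here is an assumption inferred from the numbers, not something the summary states: the 256 sample values are split into equal-width ranges of 43 values (the last range is clipped at 255), and each class is assigned the midpoint of its range. The function names are hypothetical.

```python
NUM_CLASSES = 6
NUM_SAMPLE_VALUES = 256  # unsigned 8-bit samples

# Assumed rule: equal-width ranges, computed with ceiling division (gives 43).
RANGE_WIDTH = -(-NUM_SAMPLE_VALUES // NUM_CLASSES)


def sample_range(class_id: int) -> tuple:
    """Inclusive (low, high) sample range for a class; the last range is clipped."""
    lo = class_id * RANGE_WIDTH
    hi = min(lo + RANGE_WIDTH - 1, NUM_SAMPLE_VALUES - 1)
    return lo, hi


def class_to_sample(class_id: int) -> int:
    """Sample value written before encoding: the midpoint of the class range."""
    lo, hi = sample_range(class_id)
    return (lo + hi) // 2


def sample_to_class(sample: int) -> int:
    """Recover the class from a decoded sample, tolerating in-range distortion."""
    return min(sample // RANGE_WIDTH, NUM_CLASSES - 1)


table = [(c, class_to_sample(c), sample_range(c)) for c in range(NUM_CLASSES)]
```

Under this rule, a decoded sample of 70 (the assigned value 64 shifted by a coding error of 6) still maps back to class 1, which is the robustness benefit the mapping is meant to provide.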

5.8.2 Review of Previous Work

  • Coded representation of semantic segmentation maps as part of video bitstream has not been addressed in 3GPP specifications until now

5.8.3 Review of Related Work

5.8.3.1 In ISO/IEC 23008-2 HEVC / ITU-T H.265

Current Status in JVET
- Encoding of semantic maps is not currently enabled by the MV-HEVC standard
- JVET is developing a possible MV-HEVC extension with:
  - A new auxiliary layer type called "segmentation plane"
  - A picture segmentation information SEI message for interpreting decoded samples as class identifiers
- Reference: JVET-AN2032 (40th Meeting, Geneva, October 2025)

5.8.4 Functional Requirements

Assessment Framework
Two aspects for functional analysis:

  1. Hardware Impact Assessment
     - Option a: Existing hardware product-grade support (provide examples)
     - Option b: No existing hardware support (provide justification/description of expected implementation impact)

  2. Codec Capabilities: TBD (to be determined)

References

  • [x1]: ARCore Scene Semantics API documentation
  • [x2]: Google AI Edge MediaPipe image segmentation guide
  • [x3]: Qualcomm AI Hub semantic segmentation Android
  • [x4]: JVET-AN2032 on VSEI extensions
  • [x5]: MediaPipe ImageSegmenter API documentation
  • [x6]: MediaPipe multi-class selfie segmentation model card
Document Information
Source: Xiaomi Communications
Type: CR
For: Endorsement
Title: [FS_AVFOPS_MED] New scenario: Video with semantic segmentation map
Agenda item: 9.5
Agenda item description: FS_AVFOPS_MED (Study of Advanced Video Formats and Operation Points)
Doc type: CR
For action: Endorsement
Secretary remarks: Source modified on 2/3/2026. Original source: Xiaomi Communications
Release: Rel-19
Specification: 26.966
Version: 19.0.0
Related WIs: FS_AVFOPS_MED
CR number: 0003
CR revision: 1
CR category: B
Clauses affected: 5.8 (new)
CN: True
CR: 3.0
ME: True
Spec: 26.966
Contact: Emmanuel Thomas
Uploaded: 2026-02-03T22:21:38.467000
Contact ID: 92007
TDoc Status: endorsed
Is revision of: S4aV250072
Reservation date: 03/02/2026 15:33:08
Agenda item sort order: 40