[FS_AVFOPS_MED] New scenario: Video with semantic segmentation map
This CR proposes adding a new scenario to TS 26.966 addressing video with semantic segmentation maps, progressing objective 1 on identifying relevant new representation formats not yet documented in TS 26.265.
Semantic Segmentation Fundamentals
- Technique where every pixel in an image is assigned to one of a set of semantic classes
- Example classes from Android ARCore Scene Semantics API: sky, building, tree, road, vehicle, sidewalk, terrain, structure, water, object, person
- Enables AR applications with advanced video processing (sky replacement, realistic lighting effects)
Mobile Implementation Context
- Real-time capture of segmentation maps alongside camera view is commonly available on recent mobile devices
- Leverages high-capacity camera/video pipelines and AI frameworks with hardware optimizations
- Specialized models exist for specific content types (e.g., multi-class selfie segmentation)
Multi-class Selfie Segmentation Model
- Provides 6 classes for selfie shots:
  - Background
  - Hair
  - Body-skin
  - Face-skin
  - Clothes
  - Others (accessories)
Use Cases
- Video effects (hair replacement, face filtering)
- Video indexing
- AI search
Processing Pipeline
Three main steps identified:
1. Frame acquisition
2. AI inference
3. Generation of segmentation map
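The three steps above can be sketched as follows. This is a minimal illustration, not the actual device pipeline: the inference step is a stub returning random per-pixel class scores, standing in for a real segmentation model, and the function names are hypothetical.

```python
import numpy as np

NUM_CLASSES = 6  # multi-class selfie segmentation

def acquire_frame(height=256, width=256):
    """Step 1: frame acquisition (stub: a random RGB camera frame)."""
    return np.random.randint(0, 256, (height, width, 3), dtype=np.uint8)

def run_inference(frame):
    """Step 2: AI inference (stub: random per-pixel class scores).

    A real implementation would run a hardware-accelerated model here.
    """
    h, w, _ = frame.shape
    return np.random.rand(h, w, NUM_CLASSES).astype(np.float32)

def to_segmentation_map(scores):
    """Step 3: generate the segmentation map by taking the most likely
    class per pixel, stored as a 2D array of unsigned 8-bit integers."""
    return scores.argmax(axis=-1).astype(np.uint8)

frame = acquire_frame()
seg_map = to_segmentation_map(run_inference(frame))
assert seg_map.shape == frame.shape[:2] and seg_map.dtype == np.uint8
```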
Implementation Details
- Uses the Google MediaPipe framework API for image segmentation
- AI model performs inference on camera frames
- Output format: 2D array of unsigned 8-bit integers
- Each value represents estimated category for each input pixel
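Given a category mask in this output format, a per-class boolean mask can be extracted for downstream effects such as hair replacement. A small sketch with a hypothetical hand-written mask (the `class_mask` helper is illustrative, not part of the MediaPipe API):

```python
import numpy as np

HAIR = 1  # class identifier from the multi-class selfie mapping

def class_mask(seg_map, class_id):
    """Return a boolean mask selecting the pixels of one class,
    e.g. for hair replacement or background blurring."""
    return seg_map == class_id

# Hypothetical 2x3 category mask: background (0) with three hair (1) pixels.
seg_map = np.array([[0, 1, 1],
                    [0, 0, 1]], dtype=np.uint8)
mask = class_mask(seg_map, HAIR)
assert mask.sum() == 3
```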
Class Identifier Mapping
For multi-class selfie segmentation:
- 0: background
- 1: hair
- 2: body-skin
- 3: face-skin
- 4: clothes
- 5: others (accessories)
Efficiency Considerations
- Direct class identifier representation is inefficient (only 6 of the 256 possible 8-bit values are used)
- Mapping class identifiers to sample value ranges improves:
  - Transport efficiency
  - Robustness to encoding artifacts
Example Mapping Table
| Class ID | Assigned Value | Sample Range |
|----------|---------------|--------------|
| 0 | 21 | 0-42 |
| 1 | 64 | 43-85 |
| 2 | 107 | 86-128 |
| 3 | 150 | 129-171 |
| 4 | 193 | 172-214 |
| 5 | 235 | 215-255 |
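A sketch of how the example mapping table could be applied and inverted. The assigned values come straight from the table; the decode step exploits the fact that, in this example, each class owns a band of roughly 43 sample values, so integer division by 43 (clamped to the last class) recovers the class identifier even when decoded samples are distorted within their band. The function names are illustrative, not from any standard.

```python
import numpy as np

# Assigned values from the example mapping table (class ID -> sample value).
ASSIGNED = np.array([21, 64, 107, 150, 193, 235], dtype=np.uint8)

def encode_map(seg_map):
    """Map class identifiers to mid-range sample values before video encoding."""
    return ASSIGNED[seg_map]

def decode_map(samples):
    """Recover class identifiers from (possibly distorted) decoded samples.

    Each class owns a band of ~43 sample values (0-42, 43-85, ...),
    so integer division by 43, clamped to class 5, inverts the mapping."""
    return np.minimum(samples.astype(np.int32) // 43, 5).astype(np.uint8)

seg_map = np.array([[0, 1, 2], [3, 4, 5]], dtype=np.uint8)
assert np.array_equal(decode_map(encode_map(seg_map)), seg_map)

# Robustness: samples distorted by up to +/-10 still decode to the same class.
noisy = np.clip(encode_map(seg_map).astype(np.int32) + 10, 0, 255).astype(np.uint8)
assert np.array_equal(decode_map(noisy), seg_map)
```

With direct class identifiers (0-5), a distortion of a few code values would move a pixel to a different class; with mid-range assigned values, distortions of up to about 21 code values are absorbed.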
Current Status in JVET
- Encoding of semantic segmentation maps is not currently enabled by the MV-HEVC standard
- JVET is developing a possible MV-HEVC extension with:
  - A new auxiliary layer type called "segmentation plane"
  - A picture segmentation information SEI message for interpreting decoded samples as class identifiers
- Reference: JVET-AN2032 (40th Meeting, Geneva, October 2025)
Assessment Framework
Two aspects for functional analysis:
Option b: No existing hardware support (provide justification/description of expected implementation impact)
Codec Capabilities