# Summary of 3GPP Technical Document S4-260174

## Document Information
- **Type**: Change Request (CR 0003 rev 1)
- **Specification**: TS 26.966 v19.0.0
- **Category**: B (addition of feature)
- **Release**: Rel-19
- **Work Item**: FS_AVFOPS_MED (Feasibility Study on AVFOPS for Media)
- **Source**: Xiaomi Communications

## Purpose
This CR proposes adding a new scenario to TS 26.966 addressing video with semantic segmentation maps, progressing objective 1 on identifying relevant new representation formats not yet documented in TS 26.265.

## Main Technical Contributions

### 5.8 Scenario #57: Video with Semantic Segmentation Maps

### 5.8.1 Overview and Use Case Description

**Semantic Segmentation Fundamentals**
- Technique where every pixel in an image is classified into one or more semantic classes
- Example classes from Android ARCore Scene Semantics API: sky, building, tree, road, vehicle, sidewalk, terrain, structure, water, object, person
- Enables AR applications with advanced video processing (sky replacement, realistic lighting effects)

**Mobile Implementation Context**
- Real-time capture of segmentation maps alongside camera view is commonly available on recent mobile devices
- Leverages high-capacity camera/video pipelines and AI frameworks with hardware optimizations
- Specialized models exist for specific content types (e.g., multi-class selfie segmentation)

**Multi-class Selfie Segmentation Model**
- Provides 6 classes for selfie shots:
  - Background
  - Hair
  - Body-skin
  - Face-skin
  - Clothes
  - Others (accessories)

**Use Cases**
- Video effects (hair replacement, face filtering)
- Video indexing
- AI search

#### 5.8.1.2 Example Image Segmentation Method on Mobile Platform

**Processing Pipeline**
Three main steps identified:
1. **Frame acquisition**
2. **AI inference**
3. **Generation of segmentation map**

**Implementation Details**
- Uses the Google MediaPipe framework API for image segmentation
- AI model performs inference on camera frames
- Output format: 2D array of unsigned 8-bit integers
- Each value represents estimated category for each input pixel
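
The three steps can be sketched end to end with stand-ins. This is a minimal illustration, not the MediaPipe API: `acquire_frame` and `run_inference` are hypothetical placeholders (a synthetic 2x3 frame and an arbitrary toy "model"), and producing the map by taking the highest-scoring class per pixel is an assumption about how a category map can be derived from per-class scores.

```python
from typing import List

Frame = List[List[int]]            # hypothetical frame: one intensity value per pixel
Scores = List[List[List[float]]]   # per pixel, one confidence score per class

NUM_CLASSES = 6


def acquire_frame() -> Frame:
    """Step 1 (stand-in): return a tiny synthetic frame instead of a camera image."""
    return [[10, 120, 250],
            [40, 200, 90]]


def run_inference(frame: Frame) -> Scores:
    """Step 2 (stand-in): arbitrary toy 'model' that favours class (pixel % NUM_CLASSES)."""
    scores = []
    for row in frame:
        out_row = []
        for pixel in row:
            favoured = pixel % NUM_CLASSES
            out_row.append([0.9 if c == favoured else 0.02 for c in range(NUM_CLASSES)])
        scores.append(out_row)
    return scores


def to_segmentation_map(scores: Scores) -> List[List[int]]:
    """Step 3: keep the index of the highest-scoring class for each pixel."""
    return [[max(range(len(px)), key=px.__getitem__) for px in row] for row in scores]


segmentation_map = to_segmentation_map(run_inference(acquire_frame()))

# Every entry is a class identifier that fits in an unsigned 8-bit integer.
assert all(0 <= v <= 255 for row in segmentation_map for v in row)
```

The result has the same height and width as the input frame, with one class identifier per pixel.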

**Class Identifier Mapping**
For multi-class selfie segmentation:
- 0: background
- 1: hair
- 2: body-skin
- 3: face-skin
- 4: clothes
- 5: others (accessories)
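
As an illustration of how an application might consume these identifiers, the sketch below names them with an enumeration, extracts a per-class binary mask (the kind of mask a hair-replacement effect could use), and counts pixels per class. The names `SelfieClass`, `binary_mask`, and `class_histogram` and the toy map are hypothetical; only the identifier values come from the list above.

```python
from collections import Counter
from enum import IntEnum
from typing import Dict, List


class SelfieClass(IntEnum):
    """Class identifiers as listed for multi-class selfie segmentation."""
    BACKGROUND = 0
    HAIR = 1
    BODY_SKIN = 2
    FACE_SKIN = 3
    CLOTHES = 4
    OTHERS = 5


def binary_mask(seg_map: List[List[int]], target: SelfieClass) -> List[List[bool]]:
    """True wherever the pixel carries the target class identifier."""
    return [[value == target for value in row] for row in seg_map]


def class_histogram(seg_map: List[List[int]]) -> Dict[str, int]:
    """Number of pixels assigned to each class, including classes that are absent."""
    counts = Counter(value for row in seg_map for value in row)
    return {cls.name: counts.get(cls.value, 0) for cls in SelfieClass}


# Hypothetical 3x4 segmentation map used only for illustration.
toy_map = [
    [0, 1, 1, 0],
    [0, 3, 3, 0],
    [4, 4, 4, 4],
]
hair_mask = binary_mask(toy_map, SelfieClass.HAIR)
histogram = class_histogram(toy_map)
```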

**Efficiency Considerations**
- Direct class identifier representation is inefficient (only 6 of the 256 possible 8-bit values are used)
- Mapping class identifiers to sample value ranges improves:
  - Transport efficiency
  - Robustness to encoding artifacts

**Example Mapping Table**
| Class ID | Assigned Value | Sample Range |
|----------|---------------|--------------|
| 0 | 21 | 0-42 |
| 1 | 64 | 43-85 |
| 2 | 107 | 86-128 |
| 3 | 150 | 129-171 |
| 4 | 193 | 172-214 |
| 5 | 235 | 215-255 |
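
The table can be exercised with a small round trip: class identifiers are written as assigned values, perturbed, and read back through the sample ranges. This is a sketch under stated assumptions: the `encode` and `decode` helpers are hypothetical, and the `distortion` offsets are made-up numbers standing in for coding artifacts, not measured codec behaviour.

```python
from typing import List

# (class_id, assigned_value, range_low, range_high), copied from the table above.
MAPPING = [
    (0, 21, 0, 42),
    (1, 64, 43, 85),
    (2, 107, 86, 128),
    (3, 150, 129, 171),
    (4, 193, 172, 214),
    (5, 235, 215, 255),
]
ASSIGNED = {class_id: value for class_id, value, _, _ in MAPPING}


def encode(class_ids: List[int]) -> List[int]:
    """Replace each class identifier with its assigned sample value."""
    return [ASSIGNED[c] for c in class_ids]


def decode(samples: List[int]) -> List[int]:
    """Map each (possibly distorted) sample back to the class whose range contains it."""
    decoded = []
    for sample in samples:
        for class_id, _, low, high in MAPPING:
            if low <= sample <= high:
                decoded.append(class_id)
                break
        else:
            raise ValueError(f"sample {sample} is outside the 8-bit range")
    return decoded


original = [0, 1, 2, 3, 4, 5]
samples = encode(original)

# Hypothetical distortion, chosen to stay within each range.
distortion = [15, -20, 8, -21, 21, 20]
distorted = [min(255, max(0, s + d)) for s, d in zip(samples, distortion)]
recovered = decode(distorted)
assert recovered == original
```

By contrast, if the raw identifiers 0 to 5 were carried as sample values, an error of a single level would already change the decoded class.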

### 5.8.2 Review of Previous Work

- Coded representation of semantic segmentation maps as part of video bitstream has not been addressed in 3GPP specifications until now

### 5.8.3 Review of Related Work

#### 5.8.3.1 In ISO/IEC 23008-2 HEVC / ITU-T H.265

**Current Status in JVET**
- Encoding of semantic maps not currently enabled by MV-HEVC standard
- JVET developing possible MV-HEVC extension with:
  - New auxiliary layer type called "segmentation plane"
  - Picture segmentation information SEI message for interpreting decoded samples as class identifiers
- Reference: JVET-AN2032 (40th Meeting, Geneva, October 2025)

### 5.8.4 Functional Requirements

**Assessment Framework**
Two aspects for functional analysis:

1. **Hardware Impact Assessment**
   - Option a: Existing hardware product-grade support (provide examples)
   - Option b: No existing hardware support (provide justification/description of expected implementation impact)

2. **Codec Capabilities**
   - TBD (to be determined)

## References
- [x1]: ARCore Scene Semantics API documentation
- [x2]: Google AI Edge MediaPipe image segmentation guide
- [x3]: Qualcomm AI Hub semantic segmentation Android
- [x4]: JVET-AN2032 on VSEI extensions
- [x5]: MediaPipe ImageSegmenter API documentation
- [x6]: MediaPipe multi-class selfie segmentation model card