S4-260285 - AI Summary

Avatar-update to section 6.3.4


Summary of S4-260285: AVATAR-Update to section 6.3.4

Document Overview

This contribution updates section 6.3.4 of the AVATAR specification to align with the current status of MPEG Avatar Representation Format (ARF) work. The document reflects progression of ISO/IEC 23090-39 from early development to Committee Draft International Standard (CDIS) stage.

Main Technical Changes

MPEG ARF Specification Status Update

  • Reference document updated: From WG03N1316 to WG03N1693
  • Specification maturity: ISO/IEC 23090-39 has reached Committee Draft International Standard (CDIS) stage
  • Scope refinement: Editorial improvements to description of avatar representation format scope, including interchange format for computer-generated avatars, containers, and animation stream formats

Avatar Data Model and Representation Format

Restructured Data Model Description

The document significantly restructures how the ARF data model components are described:

High-level Avatar Information (Metadata Object):
- Name, Identifier, Age, Gender
- Holds avatar-level descriptive information for system adaptation and policy control

Preamble Section (new addition):
- Signature string for unique document identification
- Version string tied to specific ARF revision
- Optional authenticationFeatures with encrypted facial/voice feature vectors and public key URI
- supportedAnimations object specifying compatible facial, body, hand, landmark, and texture animation frameworks using URNs
- Optional proprietaryAnimations for vendor-specific schemes (e.g., ML-based reconstruction models)
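The preamble fields above can be pictured as a small structured document. The sketch below is illustrative only: the field names follow the summary, but the signature value, URNs, and URI are hypothetical placeholders, and the normative schema is defined in ISO/IEC 23090-39.

```python
# Illustrative shape of an ARF preamble (field names from the summary above;
# all values are hypothetical, not taken from the specification).
preamble = {
    "signature": "ARF",                       # unique document identification
    "version": "1.0",                         # tied to a specific ARF revision
    "authenticationFeatures": {               # optional
        "encryptedFeatureVectors": "...",     # encrypted facial/voice features
        "publicKeyUri": "https://example.com/keys/avatar.pem",  # hypothetical
    },
    "supportedAnimations": {                  # compatible frameworks, as URNs
        "facial": ["urn:example:facial:v1"],  # hypothetical URN
        "body": ["urn:example:body:v1"],      # hypothetical URN
    },
    "proprietaryAnimations": [                # optional vendor-specific schemes
        {"scheme": "urn:vendor:ml-reconstruction:v1"}  # hypothetical
    ],
}
```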

Components Section (detailed expansion):
- Skeleton: Defines joints as a subset of scene graph nodes; references an inverse bind matrices data item (N×16 tensor); optional animationInfo
- Node: Scene graph objects with names, IDs, parent/child relations, semantic mappings, TRS or 4x4 matrix transformations
- Skin: Links mesh to skeleton, optional blendshape/landmark/texture sets, per-vertex joint weights tensor (NxM)
- Mesh: Geometric primitives with name, ID, optional path, data items containing geometry
- BlendshapeSets: Shape targets for base mesh, references geometry-only shapes (GLB files), optional animationInfo
- LandmarkSets: Vertex/face indices with barycentric weights for landmark positioning
- TextureSets: Material resources linked to texture targets and animation frameworks
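The N×16 inverse bind matrices tensor referenced by the Skeleton holds one flattened 4×4 matrix per joint. A minimal sketch of how a reader might recover the per-joint matrices, assuming row-major flattening (the spec defines the actual ordering):

```python
import numpy as np

# N x 16 tensor: one flattened 4x4 inverse bind matrix per joint.
# Identity matrices are used here as toy data; row-major order is assumed.
num_joints = 3
ibm_flat = np.tile(np.eye(4).reshape(16), (num_joints, 1))  # shape (N, 16)
ibm = ibm_flat.reshape(num_joints, 4, 4)                    # shape (N, 4, 4)
assert ibm.shape == (3, 4, 4)
```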

Container Format

  • Supports partial access to avatar components
  • Two container formats: ISOBMFF (ISO/IEC 14496-12) and Zip-based (ISO/IEC 21320-1)
  • ISOBMFF containers: ARF document in MetaBox item, may include animation tracks
  • Zip-based containers: Top-level ARF document with relative component file references
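Because ISO/IEC 21320-1 is a constrained profile of the Zip format, a Zip-based ARF container can be read with ordinary Zip tooling. A minimal sketch of partial access, with hypothetical file names (the spec defines the actual layout):

```python
import io
import zipfile

# Build a toy Zip-based container in memory; names are hypothetical.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("avatar.arf", '{"signature": "ARF"}')  # top-level ARF document
    z.writestr("components/head.glb", b"glTF...")     # referenced by relative path

# Partial access: fetch only the top-level document, not every component.
with zipfile.ZipFile(buf) as z:
    arf_doc = z.read("avatar.arf").decode("utf-8")
assert "ARF" in arf_doc
```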

Scene Description Integration

  • Designed to work with MPEG Scene Description (ISO/IEC 23090-14) based on glTF
  • Not limited to MPEG Scene Description
  • ISO/IEC 23090-14 defines MPEG_node_avatar extension
  • ISO/IEC 23090-39 extends MPEG_node_avatar for better ARF integration

Reference Software (ISO/IEC 23090-43)

Major update from "under development" to a defined implementation:

arfref Module (C++ and Python):
- Parsing of ARF containers
- Helper functions for asset decoding
- Partial glTF 2.0 encoding/decoding support for meshes
- Animation mapping (AnimationLink objects)
- Animation stream decoding
- Accessible from Python in addition to the C++ implementation

arfviewer Module:
- Avatar Animation Units (AAUs) support
- Time-sequence blendshape weights with optional confidence metrics
- Joint transformations for skeletal animation
- AAU format with chronological data blocks
- Inverse kinematics system for missing joint information
- Blendshape animator managing neutral mesh vertices and deltas with weighted summation

Reference Client Architecture

  • Based on ISO/IEC 23090-14 concepts
  • Avatar pipeline as part of Media Access Function (MAF)
  • Fetches ARF container and animation streams
  • Reconstructs avatar and provides to Presentation Engine through 3D mesh component buffers

Animation Bitstream Format

Comprehensive new section detailing AAU-based animation stream format:

Avatar Animation Units (AAUs) Structure

  • Sequence of AAUs with header, payload, and optional padding
  • Header: 7-bit AAU type code, AAU payload length in bytes
  • Payload: 32-bit timestamp in "ticks" plus type-specific data
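A parsing sketch for the AAU layout above. The summary fixes only the 7-bit type code, a byte-length field, and a leading 32-bit timestamp; the reserved-bit position, length-field width, and endianness used below are assumptions, and ISO/IEC 23090-39 defines the normative bit packing.

```python
import struct

def parse_aau(data: bytes):
    """Parse one AAU: assumed layout is 1 header byte (low 7 bits = type),
    a 16-bit big-endian payload length, then the payload, which starts
    with a 32-bit timestamp in ticks. Field widths are assumptions."""
    type_byte, length = struct.unpack_from(">BH", data, 0)
    aau_type = type_byte & 0x7F                 # 7-bit AAU type code
    payload = data[3:3 + length]
    (timestamp,) = struct.unpack_from(">I", payload, 0)
    return aau_type, timestamp, payload[4:]     # type, ticks, type-specific data

# Toy AAU: type 2, 8-byte payload = timestamp 9000 plus 4 data bytes.
sample = struct.pack(">BH", 0x02, 8) + struct.pack(">I", 9000) + b"\x00" * 4
aau_type, ts, body = parse_aau(sample)
assert (aau_type, ts, len(body)) == (2, 9000, 4)
```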

AAU Types Defined

  • AAU_CONFIG: Configuration unit
  • AAU_BLENDSHAPE: Facial animation sample
  • AAU_JOINT: Body/hand joint animation sample
  • AAU_LANDMARK: Landmark animation sample
  • AAU_TEXTURE: Texture animation sample
  • Reserved ranges for future extensions

Configuration Units

  • Animation profile string (UTF-8 encoded)
  • Timescale value (32-bit float) for ticks-per-second conversion
  • Profile string identifies constraints and options
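The timescale turns AAU timestamps in ticks into wall-clock time: seconds = ticks / timescale. The values below are illustrative, not mandated by the spec:

```python
# Timescale from an AAU_CONFIG unit (32-bit float, ticks per second).
timescale = 30000.0   # illustrative value
ticks = 45000         # timestamp carried in a later AAU payload
seconds = ticks / timescale
assert seconds == 1.5
```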

Facial Animation Samples (AAU_BLENDSHAPE)

  • Target blendshape set identifier
  • Per-blendshape confidence flag
  • Number of blendshape entries
  • Per entry: index, weight (32-bit float), optional confidence
  • Deformation formula: v = v₀ + Σₖ wₖ · Δvₖ
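The deformation formula above, v = v₀ + Σₖ wₖ · Δvₖ, applied with NumPy: v₀ is the neutral mesh, Δvₖ the per-blendshape vertex deltas, and wₖ the weights decoded from the AAU_BLENDSHAPE entries (toy values here):

```python
import numpy as np

v0 = np.zeros((4, 3))            # 4 neutral vertices (toy data)
deltas = np.ones((2, 4, 3))      # 2 blendshapes, same vertex count
weights = np.array([0.25, 0.5])  # per-entry weights from the AAU
v = v0 + np.tensordot(weights, deltas, axes=1)  # v0 + sum_k w_k * dv_k
assert np.allclose(v, 0.75)
```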

Joint Animation Samples (AAU_JOINT)

  • Target joint set identifier
  • Per-joint velocity flag
  • Number of joint entries
  • Per entry: joint index, 4×4 transformation matrix, optional velocity matrix
  • Linear Blend Skinning (LBS) formula: vᵢ = Σⱼ wᵢⱼ · Mⱼ · vᵢ⁰
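The LBS formula above, vᵢ = Σⱼ wᵢⱼ · Mⱼ · vᵢ⁰, in a minimal sketch: the joint matrices Mⱼ come from an AAU_JOINT sample and the weights wᵢⱼ from the Skin component's N×M tensor (toy data below):

```python
import numpy as np

v0 = np.array([1.0, 2.0, 3.0, 1.0])   # one vertex, homogeneous coordinates
M = np.stack([np.eye(4), np.eye(4)])  # 2 joint transforms
M[1, :3, 3] = [1.0, 0.0, 0.0]         # joint 1 translates +1 in x
w = np.array([0.5, 0.5])              # per-vertex joint weights, sum to 1
v = sum(w[j] * (M[j] @ v0) for j in range(2))  # sum_j w_ij * M_j * v_i0
assert np.allclose(v, [1.5, 2.0, 3.0, 1.0])
```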

Landmark Animation Samples (AAU_LANDMARK)

  • Landmark set ID
  • Velocity and confidence flags
  • Dimensionality flag (2D vs 3D)
  • Number of landmarks
  • Per landmark: index, coordinates, optional velocity and confidence
  • Use cases: facial tracking overlays, sensor-mesh registration, calibration
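A landmark anchored to a mesh face by barycentric weights evaluates to the weighted sum of that face's vertex positions, p = Σᵢ bᵢ · vᵢ. A sketch with toy data (the face index and weights would come from the LandmarkSets entry):

```python
import numpy as np

face_vertices = np.array([[0.0, 0.0, 0.0],   # the referenced face's 3 vertices
                          [1.0, 0.0, 0.0],
                          [0.0, 1.0, 0.0]])
bary = np.array([1/3, 1/3, 1/3])             # barycentric weights, sum to 1
p = bary @ face_vertices                     # landmark position on the face
assert np.allclose(p, [1/3, 1/3, 0.0])
```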

Texture Animation Samples (AAU_TEXTURE)

  • Parametric texture weights for TextureSet targets
  • Similar structure to blendshape samples
  • Controls micro-geometry patterns, makeup, dynamic material variations

Animation Stream Delivery

  • Live transmission as AAU sequences
  • Storage as avatar animation tracks in ISOBMFF-based ARF containers
  • Sample grouping for pre-recorded sequences (e.g., "smile," "wave," "dance")
  • Dual use for real-time communication and offline authoring/replay

Exploration Experiments

Status changed from "initiated" to "continues":

  • Compression for Animation Streams: Evaluate compression methods for facial and body animations
  • Integrating Geometry Data Components: Specify integration into interoperable container format
  • Animation Sample Formats: Develop structures for blend shapes, facial landmarks, animation controllers, joint transforms
  • Content Discovery and Partial Access: Evaluate solutions
  • Animation Controllers: Study combination of blend shape and joint animation

Editorial Corrections

  • Revision 1 corrects reference software status
  • Various grammatical and formatting improvements throughout
  • Consistent terminology usage (e.g., "with" instead of "to with")
Document Information

Source: InterDigital New York
Type: discussion
For: Agreement
Title: Avatar-update to section 6.3.4
Agenda item: 9.8
Agenda item description: FS_Avatar_Ph2_MED (Study on Avatar communication Phase 2)
Release: Rel-20
Specification: 26.813
Contact: Gaelle Martin-Cocher
Contact ID: 91571
Uploaded: 2026-02-04T14:56:21.570000
Reservation date: 04/02/2026 14:53:40
TDoc Status: agreed
Is revision of: S4-260251