S4-260285 - AI Summary

Avatar-update to section 6.3.4


Summary of S4-260285: AVATAR-Update to section 6.3.4

Document Overview

This contribution updates section 6.3.4 of the AVATAR specification to align with the current status of MPEG Avatar Representation Format (ARF) work. The document reflects progression of ISO/IEC 23090-39 from early development to Committee Draft International Standard (CDIS) stage.

Main Technical Changes

MPEG ARF Specification Status Update

  • Reference document updated: From WG03N1316 to WG03N1693
  • Specification maturity: ISO/IEC 23090-39 has reached Committee Draft International Standard (CDIS) stage
  • Scope refinement: Editorial improvements to description of avatar representation format scope, including interchange format for computer-generated avatars, containers, and animation stream formats

Avatar Data Model and Representation Format

Restructured Data Model Description

The document significantly restructures how the ARF data model components are described:

High-level Avatar Information (Metadata Object):
- Name, Identifier, Age, Gender
- Holds avatar-level descriptive information for system adaptation and policy control

Preamble Section (new addition):
- Signature string for unique document identification
- Version string tied to specific ARF revision
- Optional authenticationFeatures with encrypted facial/voice feature vectors and public key URI
- supportedAnimations object specifying compatible facial, body, hand, landmark, and texture animation frameworks using URNs
- Optional proprietaryAnimations for vendor-specific schemes (e.g., ML-based reconstruction models)
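The preamble fields above can be pictured as a small structured document. The sketch below is illustrative only: the field names follow the summary, but the signature value, URNs, and URI are hypothetical placeholders, and the normative schema is defined in ISO/IEC 23090-39.

```python
# Illustrative shape of an ARF preamble (field names from the summary above;
# all values are hypothetical, not taken from the specification).
preamble = {
    "signature": "ARF",                       # unique document identification
    "version": "1.0",                         # tied to a specific ARF revision
    "authenticationFeatures": {               # optional
        "encryptedFeatureVectors": "...",     # encrypted facial/voice features
        "publicKeyUri": "https://example.com/keys/avatar.pem",  # hypothetical
    },
    "supportedAnimations": {                  # compatible frameworks, as URNs
        "facial": ["urn:example:facial:v1"],  # hypothetical URN
        "body": ["urn:example:body:v1"],      # hypothetical URN
    },
    "proprietaryAnimations": [                # optional vendor-specific schemes
        {"scheme": "urn:vendor:ml-reconstruction:v1"}  # hypothetical
    ],
}
```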

Components Section (detailed expansion):
- Skeleton: Defines joints as a subset of scene graph nodes; references an inverse bind matrices data item (N×16 tensor); optional animationInfo
- Node: Scene graph objects with names, IDs, parent/child relations, semantic mappings, TRS or 4x4 matrix transformations
- Skin: Links mesh to skeleton, optional blendshape/landmark/texture sets, per-vertex joint weights tensor (NxM)
- Mesh: Geometric primitives with name, ID, optional path, data items containing geometry
- BlendshapeSets: Shape targets for base mesh, references geometry-only shapes (GLB files), optional animationInfo
- LandmarkSets: Vertex/face indices with barycentric weights for landmark positioning
- TextureSets: Material resources linked to texture targets and animation frameworks
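The N×16 inverse bind matrices tensor referenced by the Skeleton holds one flattened 4×4 matrix per joint. A minimal sketch of how a reader might recover the per-joint matrices, assuming row-major flattening (the spec defines the actual ordering):

```python
import numpy as np

# N x 16 tensor: one flattened 4x4 inverse bind matrix per joint.
# Identity matrices are used here as toy data; row-major order is assumed.
num_joints = 3
ibm_flat = np.tile(np.eye(4).reshape(16), (num_joints, 1))  # shape (N, 16)
ibm = ibm_flat.reshape(num_joints, 4, 4)                    # shape (N, 4, 4)
assert ibm.shape == (3, 4, 4)
```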

Container Format

  • Supports partial access to avatar components
  • Two container formats: ISOBMFF (ISO/IEC 14496-12) and Zip-based (ISO/IEC 21320-1)
  • ISOBMFF containers: ARF document in MetaBox item, may include animation tracks
  • Zip-based containers: Top-level ARF document with relative component file references
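Because ISO/IEC 21320-1 is a constrained profile of the Zip format, a Zip-based ARF container can be read with ordinary Zip tooling. A minimal sketch of partial access, with hypothetical file names (the spec defines the actual layout):

```python
import io
import zipfile

# Build a toy Zip-based container in memory; names are hypothetical.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("avatar.arf", '{"signature": "ARF"}')  # top-level ARF document
    z.writestr("components/head.glb", b"glTF...")     # referenced by relative path

# Partial access: fetch only the top-level document, not every component.
with zipfile.ZipFile(buf) as z:
    arf_doc = z.read("avatar.arf").decode("utf-8")
assert "ARF" in arf_doc
```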

Scene Description Integration

  • Designed to work with MPEG Scene Description (ISO/IEC 23090-14) based on glTF
  • Not limited to MPEG Scene Description
  • ISO/IEC 23090-14 defines MPEG_node_avatar extension
  • ISO/IEC 23090-39 extends MPEG_node_avatar for better ARF integration

Reference Software (ISO/IEC 23090-43)

Major update from "under development" to a defined implementation:

arfref Module (C++ and Python):
- Parsing of ARF containers
- Helper functions for asset decoding
- Partial glTF 2.0 encoding/decoding support for meshes
- Animation mapping (AnimationLink objects)
- Animation stream decoding
- Accessible from Python in addition to the C++ implementation

arfviewer Module:
- Avatar Animation Units (AAUs) support
- Time-sequence blendshape weights with optional confidence metrics
- Joint transformations for skeletal animation
- AAU format with chronological data blocks
- Inverse kinematics system for missing joint information
- Blendshape animator managing neutral mesh vertices and deltas with weighted summation

Reference Client Architecture

  • Based on ISO/IEC 23090-14 concepts
  • Avatar pipeline as part of Media Access Function (MAF)
  • Fetches ARF container and animation streams
  • Reconstructs avatar and provides to Presentation Engine through 3D mesh component buffers

Animation Bitstream Format

Comprehensive new section detailing AAU-based animation stream format:

Avatar Animation Units (AAUs) Structure

  • Sequence of AAUs with header, payload, and optional padding
  • Header: 7-bit AAU type code, AAU payload length in bytes
  • Payload: 32-bit timestamp in "ticks" plus type-specific data
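A parsing sketch for the AAU layout above. The summary fixes only the 7-bit type code, a byte-length field, and a leading 32-bit timestamp; the reserved-bit position, length-field width, and endianness used below are assumptions, and ISO/IEC 23090-39 defines the normative bit packing.

```python
import struct

def parse_aau(data: bytes):
    """Parse one AAU: assumed layout is 1 header byte (low 7 bits = type),
    a 16-bit big-endian payload length, then the payload, which starts
    with a 32-bit timestamp in ticks. Field widths are assumptions."""
    type_byte, length = struct.unpack_from(">BH", data, 0)
    aau_type = type_byte & 0x7F                 # 7-bit AAU type code
    payload = data[3:3 + length]
    (timestamp,) = struct.unpack_from(">I", payload, 0)
    return aau_type, timestamp, payload[4:]     # type, ticks, type-specific data

# Toy AAU: type 2, 8-byte payload = timestamp 9000 plus 4 data bytes.
sample = struct.pack(">BH", 0x02, 8) + struct.pack(">I", 9000) + b"\x00" * 4
aau_type, ts, body = parse_aau(sample)
assert (aau_type, ts, len(body)) == (2, 9000, 4)
```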

AAU Types Defined

  • AAU_CONFIG: Configuration unit
  • AAU_BLENDSHAPE: Facial animation sample
  • AAU_JOINT: Body/hand joint animation sample
  • AAU_LANDMARK: Landmark animation sample
  • AAU_TEXTURE: Texture animation sample
  • Reserved ranges for future extensions

Configuration Units

  • Animation profile string (UTF-8 encoded)
  • Timescale value (32-bit float) for ticks-per-second conversion
  • Profile string identifies constraints and options
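The timescale turns AAU timestamps in ticks into wall-clock time: seconds = ticks / timescale. The values below are illustrative, not mandated by the spec:

```python
# Timescale from an AAU_CONFIG unit (32-bit float, ticks per second).
timescale = 30000.0   # illustrative value
ticks = 45000         # timestamp carried in a later AAU payload
seconds = ticks / timescale
assert seconds == 1.5
```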

Facial Animation Samples (AAU_BLENDSHAPE)

  • Target blendshape set identifier
  • Per-blendshape confidence flag
  • Number of blendshape entries
  • Per entry: index, weight (32-bit float), optional confidence
  • Deformation formula: v = v₀ + Σₖ wₖ · Δvₖ
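The deformation formula above, v = v₀ + Σₖ wₖ · Δvₖ, applied with NumPy: v₀ is the neutral mesh, Δvₖ the per-blendshape vertex deltas, and wₖ the weights decoded from the AAU_BLENDSHAPE entries (toy values here):

```python
import numpy as np

v0 = np.zeros((4, 3))            # 4 neutral vertices (toy data)
deltas = np.ones((2, 4, 3))      # 2 blendshapes, same vertex count
weights = np.array([0.25, 0.5])  # per-entry weights from the AAU
v = v0 + np.tensordot(weights, deltas, axes=1)  # v0 + sum_k w_k * dv_k
assert np.allclose(v, 0.75)
```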

Joint Animation Samples (AAU_JOINT)

  • Target joint set identifier
  • Per-joint velocity flag
  • Number of joint entries
  • Per entry: joint index, 4×4 transformation matrix, optional velocity matrix
  • Linear Blend Skinning (LBS) formula: vᵢ = Σⱼ wᵢⱼ · Mⱼ · vᵢ⁰
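The LBS formula above, vᵢ = Σⱼ wᵢⱼ · Mⱼ · vᵢ⁰, in a minimal sketch: the joint matrices Mⱼ come from an AAU_JOINT sample and the weights wᵢⱼ from the Skin component's N×M tensor (toy data below):

```python
import numpy as np

v0 = np.array([1.0, 2.0, 3.0, 1.0])   # one vertex, homogeneous coordinates
M = np.stack([np.eye(4), np.eye(4)])  # 2 joint transforms
M[1, :3, 3] = [1.0, 0.0, 0.0]         # joint 1 translates +1 in x
w = np.array([0.5, 0.5])              # per-vertex joint weights, sum to 1
v = sum(w[j] * (M[j] @ v0) for j in range(2))  # sum_j w_ij * M_j * v_i0
assert np.allclose(v, [1.5, 2.0, 3.0, 1.0])
```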

Landmark Animation Samples (AAU_LANDMARK)

  • Landmark set ID
  • Velocity and confidence flags
  • Dimensionality flag (2D vs 3D)
  • Number of landmarks
  • Per landmark: index, coordinates, optional velocity and confidence
  • Use cases: facial tracking overlays, sensor-mesh registration, calibration
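A landmark anchored to a mesh face by barycentric weights evaluates to the weighted sum of that face's vertex positions, p = Σᵢ bᵢ · vᵢ. A sketch with toy data (the face index and weights would come from the LandmarkSets entry):

```python
import numpy as np

face_vertices = np.array([[0.0, 0.0, 0.0],   # the referenced face's 3 vertices
                          [1.0, 0.0, 0.0],
                          [0.0, 1.0, 0.0]])
bary = np.array([1/3, 1/3, 1/3])             # barycentric weights, sum to 1
p = bary @ face_vertices                     # landmark position on the face
assert np.allclose(p, [1/3, 1/3, 0.0])
```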

Texture Animation Samples (AAU_TEXTURE)

  • Parametric texture weights for TextureSet targets
  • Similar structure to blendshape samples
  • Controls micro-geometry patterns, makeup, dynamic material variations

Animation Stream Delivery

  • Live transmission as AAU sequences
  • Storage as avatar animation tracks in ISOBMFF-based ARF containers
  • Sample grouping for pre-recorded sequences (e.g., "smile," "wave," "dance")
  • Dual use for real-time communication and offline authoring/replay

Exploration Experiments

Status changed from "initiated" to "continues":

  • Compression for Animation Streams: Evaluate compression methods for facial and body animations
  • Integrating Geometry Data Components: Specify integration into interoperable container format
  • Animation Sample Formats: Develop structures for blend shapes, facial landmarks, animation controllers, joint transforms
  • Content Discovery and Partial Access: Evaluate solutions
  • Animation Controllers: Study combination of blend shape and joint animation

Editorial Corrections

  • Revision 1 corrects reference software status
  • Various grammatical and formatting improvements throughout
  • Consistent terminology usage (e.g., "with" instead of "to with")
Document Information

Source: InterDigital New York
Type: discussion
For: Agreement
Title: Avatar-update to section 6.3.4
Agenda item: 9.8
Agenda item description: FS_Avatar_Ph2_MED (Study on Avatar communication Phase 2)
Release: Rel-20
Specification: 26.813
Contact: Gaelle Martin-Cocher
Contact ID: 91571
Uploaded: 2026-02-04T14:56:21.570000
Reservation date: 04/02/2026 14:53:40
TDoc Status: agreed
Is revision of: S4-260251