S4-260251 - AI Summary

Avatar update to section 6.3.4


Technical Summary: AVATAR - Update to Section 6.3.4

Overview

This document provides a comprehensive update to section 6.3.4 concerning the MPEG Avatar Representation Format (ARF), now standardized as ISO/IEC 23090-39. The document reflects the progression of the standard from its initial development phase to reaching Committee Draft International Standard (CDIS) stage.

MPEG Avatar Representation Format Development

Scope and Objectives

The MPEG WG03 (Systems) working group is developing a new standard for avatar representation format with the following scope:

  • Develop an interchange representation format for computer-generated avatars and associated containers
  • Define an animation stream format to represent avatar dynamics and time-based information
  • Include geometrical models and all associated data (blendshapes, skeleton, normals, textures, maps, metadata)
  • Provide a streamable format for dynamics (animation parameters, tracking information, contextual data)
  • Ensure interoperability between existing models and formats

Requirements and Priorities

The Phase 1 requirements are grouped into three priority levels (High, Medium, Low) across multiple categories:

High Priority Requirements:
- Suitable exchange format for conversion between avatar representation formats
- Mesh-based format for representation and animation
- Signal coding format
- Semantic and signal representation
- Multiple levels of detail for geometry
- Facial and body animation
- Delay-sensitive animation streams
- Partial transport of base avatar
- Various storage and transport capabilities

Medium Priority Requirements:
- DRM protection support
- Integration into scene description
- Avatar authenticity and user association protection

Low Priority Requirements:
- Avatar-avatar, user-avatar, avatar-scene interactions
- Storage and replay of animation streams

ARF Data Model and Structure

Core Components

The ARF data model (Figure 12) includes the following components:

Preamble Section:
- Signature string for unique document identification
- Version string tied to specific ARF revision
- Optional authenticationFeatures (encrypted facial and voice feature vectors with public key URI)
- supportedAnimations object specifying compatible animation frameworks (facial, body, hand, landmark, texture)
- Optional proprietaryAnimations for vendor-specific schemes

Metadata Object:
- Avatar-level descriptive information (name, unique identifier, age, gender)
- Used for experience adaptation and policy/access control

Components Section:
- Skeleton: Defines joints with inverse bind matrices, optional animationInfo
- Node: Scene graph objects with names, IDs, parent/child relations, semantic mappings, TRS or 4×4 matrix transformations
- Skin: Links mesh to skeleton, optional blendshape/landmark/texture sets, per-vertex joint weights
- Mesh: Geometric primitives with name, ID, optional path, geometry data items
- BlendshapeSets: Shape targets for base mesh with optional animationInfo
- LandmarkSets: Vertex/face indices with barycentric weights for tracked landmarks
- TextureSets: Material resources with texture targets and animation links
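
The LandmarkSets component above locates each tracked landmark on a mesh face via vertex/face indices and barycentric weights. A minimal sketch of that interpolation, with hypothetical function and parameter names (the standard's exact data layout is not reproduced here):

```python
def landmark_position(vertices, face_indices, bary_weights):
    """Interpolate a landmark's 3D position from a mesh face.

    vertices: list of (x, y, z) tuples for the whole mesh
    face_indices: the three vertex indices of the face holding the landmark
    bary_weights: barycentric weights (w0, w1, w2), summing to 1
    """
    return tuple(
        sum(w * vertices[i][axis] for w, i in zip(bary_weights, face_indices))
        for axis in range(3)
    )
```

With weights (0.25, 0.5, 0.25) the landmark lands inside the triangle, weighted toward the second vertex.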

Container Formats

Two container formats are supported:

  1. ISOBMFF containers (ISO/IEC 14496-12):
     - ARF document stored as an ISOBMFF item in the top-level MetaBox
     - Additional items for each component
     - May include an animation track with time-based samples

  2. Zip-based containers (ISO/IEC 21320-1):
     - Top-level ARF document
     - Component files referenced relative to the document location

Both formats support partial access to avatar components.
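
For the Zip-based container, partial access falls out of the archive format itself: a single component can be read without extracting the rest. A minimal sketch, assuming the component path inside the archive matches the relative path referenced by the ARF document (the container and member names here are illustrative):

```python
import zipfile


def read_arf_component(container_path, component_path):
    """Read one component from a Zip-based ARF container without
    extracting the whole archive (partial access).

    container_path: path to the Zip-based ARF container on disk
    component_path: archive-relative path of the component to read
    """
    with zipfile.ZipFile(container_path) as zf:
        with zf.open(component_path) as f:
            return f.read()
```

Partial access to an ISOBMFF container would instead locate the item for the component in the top-level MetaBox.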

Integration with MPEG Scene Description

Scene Description Integration

  • ARF designed to work with MPEG Scene Description (ISO/IEC 23090-14) based on glTF
  • Not limited to MPEG Scene Description; can integrate with any scene description solution
  • ISO/IEC 23090-14 defines MPEG_node_avatar extension
  • ISO/IEC 23090-39 extends MPEG_node_avatar for better ARF integration

Reference Client Architecture

The reference architecture (Figure 13) includes:

  • Avatar Pipeline: part of the Media Access Function (MAF); it:
    - Retrieves the avatar model and associated information
    - Fetches the ARF container and animation streams
    - Animates and reconstructs the avatar
    - Provides the reconstructed avatar to the Presentation Engine through buffers containing 3D mesh components

Animation Bitstream Format

Avatar Animation Units (AAUs)

The animation stream format uses AAUs as the fundamental structure (Figure 14):

AAU Structure:
- Header:
  - AAU type (7-bit code)
  - AAU payload length (bytes)
- Payload:
  - 32-bit timestamp in "ticks"
  - Type-specific data
  - Optional padding for byte alignment
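
The header-then-payload layout above can be sketched as a small parser. Byte-level details beyond the description (the 7-bit type packed into one header byte, a 32-bit big-endian length field) are assumptions for illustration, not the normative syntax:

```python
import struct


def parse_aau(buf, offset=0):
    """Parse one Avatar Animation Unit from a byte buffer.

    Assumed layout: 1 header byte whose low 7 bits carry the AAU type,
    a 32-bit big-endian payload length, then the payload, whose first
    4 bytes are the 32-bit timestamp in ticks.

    Returns (aau_type, timestamp, type_specific_data, next_offset).
    """
    aau_type = buf[offset] & 0x7F
    (length,) = struct.unpack_from(">I", buf, offset + 1)
    payload = buf[offset + 5 : offset + 5 + length]
    (timestamp,) = struct.unpack_from(">I", payload, 0)
    return aau_type, timestamp, payload[4:], offset + 5 + length
```

The returned `next_offset` lets a caller walk a live stream AAU by AAU.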

AAU Types:
- AAU_CONFIG: Configuration unit
- AAU_BLENDSHAPE: Facial animation sample
- AAU_JOINT: Body/hand joint animation sample
- AAU_LANDMARK: Landmark animation sample
- AAU_TEXTURE: Texture animation sample

Configuration Units

Configuration AAUs communicate stream-level parameters:
- Animation profile string (UTF-8 encoded)
- Timescale value (32-bit float, ticks per second)
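
A configuration payload carrying these two fields might be decoded as below. The 2-byte length prefix for the profile string is an assumption made so the sketch is self-delimiting; the standard's actual encoding may differ:

```python
import struct


def parse_config_payload(payload):
    """Parse a configuration AAU payload: a UTF-8 animation profile
    string followed by a 32-bit float timescale (ticks per second).

    Assumed layout: 2-byte big-endian length prefix, profile string
    bytes, then the timescale float.
    """
    (strlen,) = struct.unpack_from(">H", payload, 0)
    profile = payload[2 : 2 + strlen].decode("utf-8")
    (timescale,) = struct.unpack_from(">f", payload, 2 + strlen)
    return profile, timescale
```

With the timescale in hand, an AAU timestamp in ticks converts to seconds as `timestamp / timescale`.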

Facial Animation Samples (AAU_BLENDSHAPE)

Structure includes:
- Target blendshape set identifier
- Per-blendshape confidence flag
- Number of blendshape entries
- For each entry: blendshape index, weight (32-bit float), optional confidence (32-bit float)

Deformation Formula:

v = v₀ + Σₖ wₖ · Δvₖ

Where:
- v₀: base vertex position
- Δvₖ: offset for blendshape k
- wₖ: transmitted blendshape weight
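
The deformation formula applies directly per vertex; a minimal sketch using plain tuples (data-structure choices here are illustrative, not the standard's):

```python
def apply_blendshapes(base_vertices, blendshape_offsets, weights):
    """Apply v = v0 + sum_k (w_k * dv_k) to every vertex.

    base_vertices: list of (x, y, z) rest positions v0
    blendshape_offsets: dict mapping blendshape index k to a list of
        per-vertex offsets dv_k (same length as base_vertices)
    weights: dict mapping blendshape index k to transmitted weight w_k
    """
    deformed = [list(v) for v in base_vertices]
    for k, w in weights.items():
        for vi, dv in enumerate(blendshape_offsets[k]):
            for axis in range(3):
                deformed[vi][axis] += w * dv[axis]
    return [tuple(v) for v in deformed]
```

Only the blendshape indices carried in the AAU need to appear in `weights`; absent blendshapes contribute nothing.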

Joint Animation Samples (AAU_JOINT)

Structure includes:
- Target joint set identifier
- Per-joint velocity flag
- Number of joint entries
- For each entry: joint index, 4×4 transformation matrix (16 floats), optional 4×4 velocity matrix

Linear Blend Skinning (LBS) Formula:

vᵢ = Σⱼ wᵢⱼ · Mⱼ · vᵢ⁰

Where:
- wᵢⱼ: weight of joint j on vertex i
- Mⱼ: global transformation matrix for joint j
- vᵢ⁰: rest position of vertex i
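
The LBS formula for a single vertex can be sketched as follows, with the rest position promoted to homogeneous coordinates so the 4×4 joint matrices apply directly (the dict-based containers are illustrative):

```python
def skin_vertex(rest_position, joint_weights, joint_matrices):
    """Linear blend skinning: v_i = sum_j (w_ij * M_j * v_i0).

    rest_position: (x, y, z) rest position v_i0 of the vertex
    joint_weights: dict mapping joint index j to weight w_ij
    joint_matrices: dict mapping joint index j to a 4x4 global
        transformation matrix M_j as nested lists (row-major)
    """
    x, y, z = rest_position
    homogeneous = (x, y, z, 1.0)
    out = [0.0, 0.0, 0.0]
    for j, w in joint_weights.items():
        m = joint_matrices[j]
        for row in range(3):
            out[row] += w * sum(m[row][c] * homogeneous[c] for c in range(4))
    return tuple(out)
```

The per-vertex joint weights come from the Skin component; the matrices come from the AAU_JOINT sample (combined with the skeleton's inverse bind matrices).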

Landmark Animation Samples (AAU_LANDMARK)

Structure includes:
- Landmark set ID
- Velocity and confidence flags
- Dimensionality flag (2D vs. 3D)
- Number of landmarks
- For each landmark: index, coordinates (2D or 3D), optional velocity and confidence

Use cases: facial tracking overlays, sensor-mesh registration, animation data calibration
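
A decoder for landmark entries must branch on the dimensionality, velocity, and confidence flags described above. A sketch under assumed field widths (16-bit landmark index, 32-bit big-endian floats), which are illustrative rather than normative:

```python
import struct


def parse_landmark_entries(payload, num, is_3d, has_velocity, has_confidence):
    """Parse landmark entries from an AAU_LANDMARK payload body.

    Each entry: a 16-bit index, 2 or 3 float coordinates depending on
    the dimensionality flag, then optional velocity (same dimensionality)
    and an optional confidence float.
    """
    dims = 3 if is_3d else 2
    entries = []
    offset = 0
    for _ in range(num):
        (index,) = struct.unpack_from(">H", payload, offset)
        offset += 2
        coords = struct.unpack_from(">%df" % dims, payload, offset)
        offset += 4 * dims
        velocity = None
        if has_velocity:
            velocity = struct.unpack_from(">%df" % dims, payload, offset)
            offset += 4 * dims
        confidence = None
        if has_confidence:
            (confidence,) = struct.unpack_from(">f", payload, offset)
            offset += 4
        entries.append((index, coords, velocity, confidence))
    return entries
```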

Texture Animation Samples (AAU_TEXTURE)

Structure analogous to blendshape samples but applied to texture targets:
- Controls parametric texture effects (micro-geometry patterns, makeup, dynamic material variations)

Animation Stream Delivery

Dual delivery modes:
1. Live transmission: Sequences of AAUs for real-time avatar driving
2. Stored format: Avatar animation tracks in ISOBMFF-based ARF container with sample grouping for pre-recorded sequences ("smile," "wave," "dance")

Ongoing Exploration Experiments

The group continues exploration on:

  1. Compression for Animation Streams: Methods for compressing facial and body animation streams
  2. Integrating Geometry Data Components: Specifying integration of avatar data into interoperable container format
  3. Animation Sample Formats: Developing structures for various animation data types (blend shapes, facial landmarks, animation controllers, joint transforms)
  4. Content Discovery and Partial Access: Solutions for content discovery and partial access
  5. Animation Controllers: Study on combining blend shape and joint animation

Document Information
Source:
InterDigital New York
Type:
discussion
For:
Agreement
Original Document:
View on 3GPP
Title: Avatar update to section 6.3.4
Agenda item: 9.8
Agenda item description: FS_Avatar_Ph2_MED (Study on Avatar communication Phase 2)
Release: Rel-20
Specification: 26.813
Contact: Gaelle Martin-Cocher
Uploaded: 2026-02-03T21:28:19.900000
Contact ID: 91571
Revised to: S4-260285
TDoc Status: revised
Reservation date: 03/02/2026 21:15:23
Agenda item sort order: 43