All Summaries

Meeting: TSGS4_135_India | Agenda Item: 9.8

Qualcomm Atheros, Inc.

[FS_Avatar_Ph2_MED] 3D Gaussian Splatting Avatar Methods for Real-Time Communication


Introduction

This contribution surveys 3D Gaussian Splatting (3DGS) methods for avatar representation in the context of the Avatar Communication Phase 2 study (FS_Avatar_Ph2_MED, SP-251663), specifically addressing Objective 3 on animation techniques for avatar reconstruction and rendering. The document evaluates 3DGS methods for real-time communication scenarios and their compatibility with MPEG Avatar Representation Format (ARF, ISO/IEC 23090-39).

Key Technical Background:

  • 3DGS represents objects as sets of anisotropic 3D Gaussians (splats)
  • Each Gaussian stores: 3D mean position, oriented covariance (ellipsoidal footprint), opacity, and appearance parameters (RGB or spherical harmonics coefficients)
  • Rendering projects 3D Gaussians into screen space as 2D Gaussians with depth-ordered alpha compositing
  • Achieves real-time rendering at 100-370 FPS on desktop GPUs with quality comparable to neural radiance fields
  • Maps well to GPU compute and graphics pipelines

Critical Question for Avatar Communication: how Gaussians deform under animation, either by binding to parametric meshes (FLAME for faces, SMPL/SMPL-X for bodies) or by using small neural networks for residual motion prediction.

Survey of 3DGS Avatar Methods

Head and Face Avatar Methods

Key Differentiation Axes

Head/face methods differ along three practical dimensions:

  1. Binding domain: mesh surface anchors, UV space anchors, or volumetric anchors
  2. Runtime neural inference: whether MLPs are required at runtime
  3. Non-FLAME component handling: hair, teeth, tongue, and eye occlusion

Most interoperable approaches: fully explicit runtimes in which the Gaussians are driven by the same blendshape and skeletal parameters as mesh renderers.

Method Comparison

| Method | FPS | Gaussians | Parametric Model | Runtime MLP | Key Feature |
|--------|-----|-----------|------------------|-------------|-------------|
| GaussianBlendshape | 370 | 70K | Custom blendshapes | No | Linear blending identical to mesh blendshapes, 32-39 dB PSNR |
| SplattingAvatar | 300+ | ~100K | FLAME mesh | No | Mesh-embedded via barycentric coords, 30 FPS on iPhone 13 |
| FlashAvatar | 300 | 10-50K | FLAME | Small MLP | UV-based init on FLAME, small MLPs for expression offsets |
| GaussianAvatars | 90-100 | ~100K | FLAME | No | FLAME-rigged, multi-view training, explicit binding |
| HHAvatar | ~100 | ~150K | FLAME | Temporal modules | First method for dynamic hair physics modeling |
| MeGA | ~90 | ~200K | FLAME (face) + 3DGS (hair) | No | Hybrid mesh+Gaussian, occlusion-aware blending, editable |

Standout Methods for Real-Time Communication: GaussianBlendshape and SplattingAvatar use purely explicit representations with no runtime neural networks, enabling deterministic rendering and direct ARF compatibility.

Mesh-Embedded Gaussian Splatting

Technical Approach:

  • Each Gaussian anchored to an animatable mesh supporting standard blendshapes and skeletal skinning
  • Parameterization: triangle index + barycentric coordinates + optional offset vector in the local tangent-normal frame
  • Runtime: the receiver deforms the mesh using joint transforms and blendshape weights, then reconstructs each Gaussian center from the animated triangle vertices using barycentric weights
  • No per-frame neural inference required; purely algebraic reconstruction ensures deterministic motion
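As a rough illustration of this algebraic reconstruction, the sketch below recomputes a Gaussian center from an animated triangle; the function name and field layout are illustrative, not taken from any of the surveyed methods.

```python
import numpy as np

def gaussian_center(tri: np.ndarray, bary: np.ndarray, offset: float) -> np.ndarray:
    """tri: (3, 3) animated vertex positions of the anchor triangle.
    bary: (3,) barycentric coordinates stored with the Gaussian.
    offset: optional displacement along the triangle normal (0.0 if unused)."""
    p = bary @ tri                                  # barycentric interpolation
    n = np.cross(tri[1] - tri[0], tri[2] - tri[0])  # animated triangle normal
    n /= np.linalg.norm(n)
    return p + offset * n                           # displace along the local normal
```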

Orientation and Footprint Handling:

  • Gaussians stored in a local frame aligned to the triangle (per-axis scales + local rotation)
  • The local-to-world transform from the animated triangle frame transports the covariance
  • Keeps the projected splat stable under motion and avoids jitter
  • Appearance parameters (opacity, color coefficients) remain static unless dynamic effects are explicitly modeled

Standardization Advantages:

  • Reuses the same animation signals as the mesh avatar
  • Enables graceful fallback: mesh-only renderers can ignore the Gaussian extension and still animate
  • 3DGS-capable renderers can render the Gaussians alone or a hybrid mesh+Gaussian composition

Limitation: a coarse driving mesh can restrict fine-scale effects (lip roll, eyelid thickness, hair motion). This is addressed by higher-resolution parametric meshes, local offsets, or dedicated Gaussian subsets for non-mesh components.

Gaussian Blendshapes

Technical Approach:

  • Mirrors classical mesh blendshape animation
  • Each Gaussian has neutral parameters plus per-expression deltas for center position, scale, and opacity
  • Runtime computes a linear combination identical to the mesh blendshape pipeline
  • Key advantage: determinism and ARF-friendly control; the same blendshape weight stream drives both mesh vertices and Gaussian deltas
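A minimal sketch of the linear combination, assuming Gaussian attributes packed into arrays (names and shapes are illustrative):

```python
import numpy as np

def blend_gaussians(neutral: np.ndarray, deltas: np.ndarray,
                    weights: np.ndarray) -> np.ndarray:
    """neutral: (N, D) neutral Gaussian attributes (center, scale, opacity).
    deltas: (K, N, D) per-expression attribute deltas.
    weights: (K,) blendshape weights shared with the mesh pipeline."""
    # Same weighted sum the mesh blendshape pipeline applies to vertices
    return neutral + np.tensordot(weights, deltas, axes=1)
```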

Hybrid Methods with Small MLPs

Technical Approach:

  • Parametric model for global control plus small neural modules outputting residual offsets conditioned on expression, pose, or time
  • Improves fine detail and handles effects difficult to capture with purely linear blendshapes

Tradeoff: Runtime inference and model distribution become part of the interoperability surface (model versioning, determinism, platform-specific performance)

Full-Body Avatar Methods

Full-body methods have converged on SMPL/SMPL-X parametric body models, enabling compatibility with standard skeletal animation systems.

Method Comparison

| Method | FPS | Gaussians | Body Model | Training | Key Feature |
|--------|-----|-----------|------------|----------|-------------|
| GauHuman | 189 | ~13K | SMPL | 1-2 min | Fastest training, ~3.5 MB storage, KL divergence split/clone |
| HUGS | 60 | ~200K | SMPL | 30 min | Disentangles human/scene |
| ASH | ~60 | ~100K | SMPL | ~1 hour | 2D texture-space parameterization, Dual Quaternion skinning, motion retargeting |
| GART | >150 | ~50K | SMPL | sec-min | Latent bones for non-rigid deformations (dresses, loose clothing) |
| ExAvatar | ~60 | ~150K | SMPL-X | ~2 hours | Only SMPL-X method with unified body/face/hand animation |

Standout Methods:

  • GauHuman: Best combination of minimal storage (~3.5 MB) and fast training (1-2 min)
  • ExAvatar: Only method providing unified body/face/hand animation through SMPL-X, critical for immersive communication

Animation Architecture:

  • Body model provides a compact, standardized animation interface
  • Base avatar: static set of Gaussians plus binding metadata
  • Runtime: joint transforms from SMPL/SMPL-X pose parameters deform the body via skinning
  • Gaussian propagation: surface anchoring (barycentric/UV coordinates) or direct skinning weights per Gaussian
  • Enables motion retargeting by sending only the pose stream while keeping the high-fidelity Gaussian appearance fixed

Non-Rigid Effects Challenge:

  • Clothing, long hair, and accessories don't follow the body surface under rigid skinning
  • Solutions: latent bones or local deformation modules (additional control points beyond the SMPL skeleton)
  • ARF integration consideration: distinguish between body-locked Gaussians (fully driven by the standardized skeleton) and secondary Gaussians (which may require optional control signals or local simulation)

Distribution Size Considerations:

  • Full-body avatars require tens to hundreds of thousands of Gaussians
  • Each Gaussian includes geometry and appearance attributes
  • Compression and level-of-detail are essential for real deployments
  • A practical ARF profile should specify a default Gaussian count budget and allow progressive refinement layers for high-end devices
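As a back-of-envelope consistency check, assuming an uncompressed float32 layout of 3 position + 4 rotation + 3 scale + 1 opacity + 48 spherical-harmonics values per Gaussian (an assumed layout, not a figure from the contribution):

```python
floats_per_gaussian = 3 + 4 + 3 + 1 + 48       # assumed attribute layout
bytes_per_gaussian = 4 * floats_per_gaussian   # float32 -> 236 bytes
total_mb = 13_000 * bytes_per_gaussian / 1e6   # GauHuman-scale splat count
print(f"{total_mb:.1f} MB")                    # ~3.1 MB, near the cited ~3.5 MB
```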

Animation Compatibility Classification

Methods classified into three categories based on runtime architecture:

1. Purely Explicit (no MLPs)

Methods: SplattingAvatar, GaussianBlendshape, GaussianAvatars

  • Performance: 300-370 FPS
  • ARF Compatibility: Direct mapping
  • Animation: Driven entirely by standard skeletal joints and blendshape weights
  • Fully compatible with the ARF Animation Stream Format

2. Hybrid (small MLPs)

Methods: 3DGS-Avatar, FlashAvatar, HUGS

  • Performance: 50-100 FPS (near-real-time)
  • Architecture: Small MLPs add expression-dependent offsets without fundamentally changing the animation interface
  • ARF Integration: Can still be driven by blendshape parameters with MLP weights distributed as part of the base avatar

3. Fully Neural

Methods: Gaussian Head Avatar, GaussianHead

  • Training: 1-2 days
  • Latency: Higher
  • ARF Integration: May be integrated into ARF containers as proprietary customized models

Interoperability Key Question: Not whether an MLP exists, but whether the animation interface remains the same. If the avatar is driven solely by joints and blendshape weights, the ARF Animation Stream Format remains sufficient and the decoder only needs to choose a renderer.

Determinism Considerations:

  • Explicit methods: Naturally deterministic given fixed floating-point rules, with no dependency on platform-specific neural inference
  • Hybrid methods: Viable if the MLP is small and shipped as part of the base avatar, but conformance should define fixed operator sets and numerical tolerances
  • Fully neural pipelines: Better treated as optional proprietary components inside the ARF container rather than as a baseline interoperable tool

Proposed Architecture for ARF Integration

Four-Step Integration Approach

Step 1: Storage

  • Store mesh-embedded Gaussians as auxiliary data within glTF/ARF containers
  • Parameterization: relative to the mesh surface using barycentric coordinates (SplattingAvatar) or linear blendshape offsets (GaussianBlendshape)
  • Preserves backward compatibility with mesh-only renderers

Step 2: Animation

  • Animate via the standard skeletal and blendshape parameters already defined in the ARF Animation Stream Format
  • No changes to the animation stream required
  • Gaussian positions derived from the same joint transforms and blendshape weights used for mesh animation

Step 3: Compression

  • Apply GS compression to the Gaussian attributes within the base avatar to minimize distribution size

Step 4: Streaming

  • Stream only AAUs at approximately 40 KB/s for real-time animation
  • Base avatar (including compressed Gaussian data) distributed once at session establishment
  • Enables high-quality Gaussian splatting rendering on capable devices while maintaining mesh-based rendering compatibility on constrained devices
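A hedged bitrate estimate for a facial AAU stream; the entry and header byte widths are assumptions, with only the 32-bit weight and timestamp fields coming from the AAU description summarized later under this agenda item:

```python
entries = 52                         # assumed ARKit-style blendshape count
entry_bytes = 2 + 4                  # assumed index size + 32-bit float weight
overhead_bytes = 2 + 4               # assumed AAU header + 32-bit tick timestamp
fps = 30
kbps = (overhead_bytes + entries * entry_bytes) * fps / 1000
print(f"~{kbps:.1f} KB/s")           # ~9.5 KB/s, well inside the ~40 KB/s budget
```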

Deployment Requirements

Capability Exchange:

  • Endpoints signal support for 3DGS rendering, supported attribute sets, and supported Gaussian count budgets
  • Fallback to mesh rendering if 3DGS is not supported or resources are constrained
  • Avoids ecosystem fragmentation and maintains backward compatibility

Proposals

The document proposes that SA4 consider the following for the FS_Avatar_Ph2_MED study:

  1. Acknowledge 3D Gaussian Splatting as a viable rendering primitive for avatar communication

  2. Coordinate with MPEG on integration of Gaussian splatting data within ARF Base Avatar Format (ISO/IEC 23090-39)

  3. Evaluate compression techniques (SPZ, L-GSC, HAC++, Compact3D) for inclusion in study of static and animation data compression (Objective 7)

  4. Define capability signaling and conformance points for 3DGS avatar rendering:
     • Supported Gaussian count budgets
     • Supported attribute sets
     • Required numerical tolerances for determinism

  5. Study hybrid approaches with small MLPs - whether they warrant an optional ARF profile, and if so, constrain operator sets and model sizes to preserve portability


Qualcomm Atheros, Inc.

[FS_Avatar_Ph2_MED] Avatar Evaluation Framework and Objective Metrics

Summary of S4-260121: Avatar Evaluation Framework and Objective Metrics

Introduction

This contribution addresses Objectives 2 and 3 of the Avatar Communication Phase 2 SID (SP-251663), which concern QoE metrics, evaluation frameworks, and evaluation criteria for animation techniques. The document proposes a practical evaluation methodology designed to deliver repeatable, automated, and vendor-neutral results based on a core principle: evaluate what the user actually sees by measuring quality from rendered video output rather than internal system parameters.

Evaluation Framework

Design Principles

The framework is built on three key principles:

  1. Black-box evaluation: Metrics computed from rendered output video, not internal system states, ensuring cross-vendor comparability
  2. Reproducibility: Fixed test content, deterministic rendering conditions, and standardized capture workflows for consistent results
  3. Automation: All metrics computable without human intervention for large-scale testing

Testbed Architecture

The proposed testbed comprises five key components:

  • Stimulus player: Feeds avatar system with animation streams (blendshape weights, landmarks, joint poses)
  • Render configuration: Locks camera intrinsics, lighting, background, and resolution to eliminate variability
  • Capture module: Records rendered frames using lossless/visually lossless compression with frame-accurate timestamps
  • Network emulator: Applies controlled latency, jitter, bandwidth limits, and packet loss for transport testing
  • Metrics engine: Computes frame-level and clip-level objective metrics from captured assets

Objective Metrics for Avatar Evaluation

The contribution proposes metrics across three quality dimensions:

Visual Quality Metrics

  • PSNR (dB): Peak signal-to-noise ratio between reference and test frames
  • SSIM (0-1): Structural similarity index

Animation Quality Metrics

Video-based computation extracting landmarks and skeletons from rendered output:

  • Lip Vertex Error (LVE) (pixels/mm): RMS error of mouth landmarks; critical for lip sync evaluation
  • Facial Distance Deviation (FDD) (pixels/mm): Deviation of expression-related landmark distances; measures facial expression accuracy
  • Motion Vertex Error (MVE) (pixels/mm): RMS error of body joint positions; evaluates full-body animation fidelity
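A minimal sketch of the LVE computation from extracted landmark tracks; the array shapes are assumptions, since the document does not fix a data layout:

```python
import numpy as np

def lip_vertex_error(ref: np.ndarray, test: np.ndarray) -> float:
    """ref, test: (frames, mouth_landmarks, 2 or 3) in pixels or mm.
    Returns the RMS distance over all frames and mouth landmarks."""
    d = np.linalg.norm(ref - test, axis=-1)     # per-landmark Euclidean error
    return float(np.sqrt(np.mean(d ** 2)))      # RMS aggregation
```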

Temporal and Synchronization Metrics

Proposed for second phase evaluation due to complexity:

  • Rendering Frame Rate (FPS): Computed from frame timestamp deltas
  • Dropped Frame Ratio (%): Percentage of missing or repeated frame indices
  • Motion-to-Photon Latency (ms): Time from input motion event to visible response
  • End-to-End Latency (ms): Total delay from sender capture to receiver presentation
  • Audio-Visual Sync Offset (ms): Offset between mouth motion and corresponding audio via cross-correlation
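The audio-visual sync offset could be estimated as sketched below, cross-correlating a mouth-opening signal against the audio envelope; the signal extraction and sign convention are assumptions:

```python
import numpy as np

def av_sync_offset_ms(mouth_open: np.ndarray, audio_env: np.ndarray,
                      fps: float) -> float:
    """Both signals resampled to the video frame rate; positive = video lags audio."""
    a = (mouth_open - mouth_open.mean()) / (mouth_open.std() + 1e-9)
    b = (audio_env - audio_env.mean()) / (audio_env.std() + 1e-9)
    xcorr = np.correlate(a, b, mode="full")
    lag_frames = int(np.argmax(xcorr)) - (len(b) - 1)  # zero lag sits at len(b)-1
    return 1000.0 * lag_frames / fps
```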

Test Content

Standardized animation streams should cover:

  • Neutral speech: Clear visemes and steady head motion for baseline lip sync
  • Expressive speech: Emotions (happiness, surprise, concern) for facial expression testing
  • Conversational turn-taking: Gaze shifts, nods, backchannel gestures
  • Non-verbal body motion: Pointing, waving, posture changes, idle animation

Each test set should contain reference audio, reference animation streams, and reference rendered video from both high-quality reference pipeline and source capture.

Proposals

The contribution proposes to:

  1. Adopt objective evaluation approach based on rendered video output as primary evaluation method for reproducibility
  2. Include the proposed metric set (visual quality, animation fidelity, temporal performance) in TR 26.813
  3. Define normative capture workflow using lossless recording, timecode embedding, and reference alignment for consistent metric computation across implementations

Qualcomm Atheros, Inc.

[FS_Avatar_Ph2_MED] Interoperability guidance for ARF

Interoperability Guidance for ARF

Introduction

This contribution addresses the FFS noted in TS 26.264 clause 5.6.1 regarding evaluation of MPEG ARF and interoperability aspects. The key interoperability challenge is mapping: receivers can only animate an avatar if they can correctly map incoming animation parameters to the appropriate Skeleton, BlendshapeSet, and LandmarkSet in the ARF container.

ISO/IEC 23090-39 defines signalling to declare supported animation frameworks and provide mapping tables. This contribution proposes concrete interoperability guidance with detailed examples for both linear and non-linear mappings.

Interoperability Framework

Interoperability Principles

The proposed guidance is based on four core principles:

  1. Single source of truth: The ARF document is the normative description for interpreting an animation stream for a given avatar

  2. Explicit profile identification: Each animation stream identifies its animation profile, and the ARF document lists supported profile URNs

  3. Deterministic mapping: When the stream profile doesn't directly match stored assets, the ARF document provides mapping tables from the stream profile to the target asset set

  4. Sender responsibility: The sender (who owns the ARF container) ensures either direct matching identifiers are used or mappings are present. The receiver is not expected to guess.

Mapping Signalling in ARF

ARF provides three signalling layers for mapping between animation frameworks:

SupportedAnimations: Lists supported face, body, hand, landmark, and texture animation profiles as URNs. Each URN identifies a framework and specific parameter set (e.g., blendshape set or joint set).

AnimationInfo and AnimationLink: Each animatable asset in components (Skeleton, BlendshapeSet, LandmarkSet) includes animationInfo. Each AnimationLink points to one SupportedAnimations entry as the target for that asset.

Mapping Objects: When additional frameworks are used for capture or streaming, animationInfo can include Mapping objects that map from a source SupportedAnimations entry to the target entry. Two mapping types are supported:

  • LinearAssociation: Expresses a weighted sum from multiple source parameters to one target parameter
  • NonLinearAssociation: Expresses non-linear transforms using one or more channels with lookup tables and interpolation

Mapping indices refer to parameter identifiers in the animation stream (ShapeKey.id for blendshapes, target joint index for joint animation, target landmark index for landmark animation).

Receiver Processing Procedure

The receiver applies the following procedure:

  1. Parse preamble.supportedAnimations and build an index-to-URN map for each animation type

  2. Determine the animation profile used by the received stream and find its index in the corresponding SupportedAnimations list

  3. Select the target Skeleton, BlendshapeSet, or LandmarkSet to animate and find the matching AnimationLink entry

  4. If the stream profile index equals AnimationLink.target, apply stream parameters directly to target assets

  5. Otherwise, find a Mapping entry where Mapping.source equals the stream profile index, then apply the LinearAssociation or NonLinearAssociation rules to compute target parameters

  6. For any target indices not produced by mapping, use neutral defaults (0.0 for blendshape weights, bind pose for joints, neutral position for landmarks)
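A minimal sketch of steps 4-6 for blendshape weights; attribute names such as `link.target` and `m.source` mirror, but do not reproduce, the ARF field names:

```python
def resolve_target_weights(stream_profile, stream_weights, link, mappings, n_targets):
    """stream_weights: {stream parameter index: weight} from the received AAU."""
    out = [0.0] * n_targets                       # step 6: neutral defaults
    if stream_profile == link.target:             # step 4: direct match
        for idx, w in stream_weights.items():
            out[idx] = w
        return out
    for m in mappings:                            # step 5: mapped profile
        if m.source == stream_profile:
            for a in m.linear_associations:       # weighted sum per target
                out[a.target_index] = sum(
                    w * stream_weights.get(s, 0.0)
                    for s, w in zip(a.source_indices, a.weights))
            return out
    raise ValueError("no mapping from stream profile to target asset")
```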

Mapping Mechanisms

Direct Match and Identifier Spaces

The simplest case occurs when the sender generates the animation stream using the same framework and parameter set as the target asset in ARF.

| Scenario | Typical Issue | ARF Signalling and Behaviour |
|----------|---------------|------------------------------|
| Direct match | Stream profile and parameter identifiers match target assets in ARF container | No mapping needed. Receiver applies parameters directly. ARF document declares profile in SupportedAnimations and links target assets with AnimationLink.target |
| Subset | Source and target use same semantics but target has fewer parameters | Unmapped target parameters default to neutral values |

Linear Mappings

Linear mappings are suitable when a target parameter can be expressed as a weighted sum of one or more source parameters. Typical use cases include mirroring left/right shapes, splitting/merging parameters, and simple scaling. Represented in ARF by LinearAssociation with targetIndex, sourceIndices, and weights.

Examples:

| Target Parameter (ARF) | Source Parameters (Stream) | Linear Association |
|------------------------|----------------------------|--------------------|
| Smile (targetIndex 12) | mouthSmileLeft (5), mouthSmileRight (6) | w12 = 0.5\*w5 + 0.5\*w6 |
| JawOpen (targetIndex 3) | jawOpen (13) | w3 = 1.0\*w13 |
| MouthCornerPull (targetIndex 20) | mouthSmileLeft (5), mouthSmileRight (6), cheekSquintLeft (26), cheekSquintRight (27) | w20 = 0.4\*w5 + 0.4\*w6 + 0.1\*w26 + 0.1\*w27 |

Non-linear Mappings

Non-linear mappings are needed when linear blending is insufficient. Typical cases include dead zones, saturation, perceptual calibration curves, and gating where one parameter modulates another. Represented in ARF by NonLinearAssociation. Each channel maps one source parameter through a lookup table defined by Data items. Channel outputs are combined using COMBINATION_SUM or COMBINATION_MUL.

Examples:

| Target Parameter (ARF) | Source Parameter(s) | Non-linear Mapping |
|------------------------|---------------------|--------------------|
| JawOpen (targetIndex 3) | jawOpen (13) | Piecewise curve with deadzone and saturation. Example input [0.0, 0.1, 0.4, 1.0] maps to output [0.0, 0.0, 0.7, 1.0] with INTERPOLATION_LINEAR |
| Blink (targetIndex 7) | eyeBlinkLeft (1), eyeBlinkRight (2) | Each eye uses a threshold curve. INTERPOLATION_STEP converts the soft signal into a binary blink. Combine with COMBINATION_SUM and clamp to [0, 1] |
| MouthOpenSmile (targetIndex 30) | jawOpen (13) and Smile (12, after linear mapping) | Use COMBINATION_MUL to gate smile by jaw opening. Channel 1 maps jawOpen through a deadzone curve. Channel 2 maps smile through an S curve. Multiply channel outputs |
| BrowRaise (targetIndex 15) | browInnerUp (9) | Gamma curve to better match the target rig. Example output = pow(input, 0.5). Approximated with a LUT and INTERPOLATION_CUBICSPLINE |
| Landmark mouthMidTop (targetIndex 18) | landmarks 50 and 52 | Non-linear only if needed for stabilization or bias compensation. Example: apply a LUT to compress extreme motion before writing the 2D or 3D coordinate |
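A sketch of NonLinearAssociation evaluation under linear interpolation (STEP and CUBICSPLINE variants omitted); the channel tuple layout is an assumption:

```python
import numpy as np

def eval_nonlinear(sources, channels, combine="SUM"):
    """channels: list of (source_index, lut_input, lut_output) lookup tables.
    Channel outputs are combined per COMBINATION_SUM or COMBINATION_MUL."""
    outs = [np.interp(sources[src], lut_in, lut_out)
            for src, lut_in, lut_out in channels]
    y = sum(outs) if combine == "SUM" else float(np.prod(outs))
    return float(np.clip(y, 0.0, 1.0))            # clamp to the valid weight range

# JawOpen row from the table: deadzone below 0.1, saturation toward 1.0
jaw = eval_nonlinear({13: 0.4}, [(13, [0.0, 0.1, 0.4, 1.0], [0.0, 0.0, 0.7, 1.0])])
print(jaw)  # 0.7
```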

Proposal

The contribution proposes:

  1. Document the content of sections 2 and 3 in TR 26.813

  2. Add explicit text to TS 26.264 clause 5.6.1 stating that the avatar owner shall ensure either that:
     • identifiers used in avatar animation streams directly match the identifiers of the target Skeleton, BlendshapeSet, and LandmarkSet stored in ARF, or
     • mapping tables are present in the ARF document to convert from the stream profile to the target assets

  3. Remove the corresponding note from TS 26.264 and declare it as resolved


Nokia

[FS_Avatar_Ph2_MED] Draft LS on MPEG I ARF compression aspects

3GPP SA4 LS on Compression Aspects of MPEG-I ARF (ISO/IEC DIS 23090-39)

Document Overview

This is a Liaison Statement (LS) from 3GPP TSG SA WG4 to ISO/IEC JTC1/SC29/WG7 and WG3 regarding compression aspects of avatar representation formats for Release 20 work on avatar communication Phase 2.

Background Context

Release 19 Baseline

  • 3GPP SA4 has adopted ISO/IEC DIS 23090-39 (MPEG-I: Avatar Representation Formats) as the representation format for user avatars in avatar communication over IMS
  • This adoption is specified in TS 26.264 for Release 19
  • Limitation identified: Compression aspects of avatar data were not addressed in Release 19 because ISO/IEC DIS 23090-39 does not yet specify compression mechanisms

Release 20 Phase 2 Study

  • SA4 has initiated a Phase 2 study on avatar communication in Release 20
  • Key objective: Compression of static avatar data and animation data
  • Study conclusion timeline: SA4#137e meeting (August 24-28, 2026)
  • Normative work completion: March 2027

Technical Questions to ISO/IEC

SA4 is seeking clarification on two critical aspects:

Question 1: Existing MPEG Compression Technologies

Are there existing MPEG technologies that can be utilized to compress:

  • Avatar static data (especially meshes)
  • Avatar animation data, including blend shape sets, skeletal animation, and other animation-related information

Question 2: Integration Timeline

If such compression technologies exist:

  • Are there plans to integrate them into ISO/IEC DIS 23090-39?
  • What is the anticipated timeline for such integration in the context of the 3GPP Release 20 schedule?

Requested Action

SA4 formally requests ISO/IEC SC29/WG7 and ISO/IEC SC29/WG3 to provide answers to both questions above, considering the Release 20 timeline constraints.


Nokia

[FS_Avatar_Ph2_MED] Considerations on security aspects

Summary of S4-260190: Considerations on Security Aspects for Avatar Phase 2

Document Overview

This contribution from Nokia addresses security-related gaps in the Rel-20 study item FS_Avatar_Ph2_MED, specifically focusing on security mechanisms for Avatar communications in 3GPP systems.

Background and Context

Study Item Scope

The Rel-20 SID FS_Avatar_Ph2_MED (approved at SA#110, December 2025) aims to address gaps from previous work and resolve open points identified in TS 26.264 Rel-19. Objective 6 specifically mandates collaboration with SA3 to study security implications including:

  • Identification and authentication (including schemes for Avatar-related APIs)
  • Privacy preservation
  • Content protection (e.g., watermarking and DRM)
  • Secure distribution mechanisms for Avatar data

Current Status in Specifications

TS 26.264 Gaps:

  • No dedicated security clause exists
  • Clause 5.6.2.2 NOTE 2 identifies content protection aspects as FFS

TR 26.813 Coverage:

  • Clause 8 describes Access Protection mechanisms for the BAR API
  • Clause 9 addresses security and privacy aspects
  • However, neither clause explores how these methods apply to Avatar calls in 3GPP systems
  • The conclusion acknowledges the need for robust authentication, encryption, and DRM mechanisms with further SA3 collaboration

TS 33.328 Limitations:

  • The new Annex R (Rel-19) specifies security for IMS avatar communication
  • Covers procedures to prevent a UE from providing unauthorized Avatar IDs
  • Covers authorization for avatar downloads from the BAR
  • Does not cover security controls to prevent the sending UE from using fake avatar representations not belonging to the user

Key Observations

  • The Rapporteur's base CR (S4aV260006, presented at 27 January 2026 SA4 Video SWG telco) does not address Objective 6
  • Security aspects remain critical for avatar communications and require dedicated study
  • Collaboration with SA3 is necessary for appropriate solutions

Technical Proposal

The contribution proposes adding a new sub-clause (suggested as 8.3.4) to the base CR for TR 26.813, specifically under Clause 8 (Avatar integration into 3GPP services and enablers). This new sub-clause should:

  • Be dedicated to addressing security aspects
  • Focus primarily on security mechanisms and solutions for Avatar calls via the generalized IMS DC architecture
  • Cover authentication, encryption, and content protection mechanisms

Nokia

[FS_Avatar_Ph2_MED] Authentication for avatar data

Summary of S4-260192: Authentication for Avatar Data

Document Overview

This contribution from Nokia proposes authentication mechanisms for avatar data in IMS-based avatar calls as part of the FS_Avatar_Ph2_MED study item (Rel-20). The document addresses security gaps identified in Rel-19 TS 26.264, specifically focusing on authentication schemes for avatar-related APIs.

Background and Motivation

The Rel-20 SID FS_Avatar_Ph2_MED (approved at SA#110, Dec 2025) includes an objective to study security implications in collaboration with SA3, covering:

  • Identification and authentication (including schemes for Avatar-related APIs)
  • Privacy preservation
  • Content protection (watermarking and DRM)
  • Secure distribution mechanisms for Avatar data

Currently, TR 26.813 and TS 33.328 do not address these security aspects.

Main Technical Contributions

Proposed Security Framework for IMS-based Avatar Calls

The contribution proposes adding a new sub-clause 8.3.4 covering security considerations for IMS-based avatar calls.

Authentication Mechanism

Core Concept:

  • Introduces a Digital Credential-based solution using a Base Avatar Assertion (BAA)
  • The BAA cryptographically binds the Base Avatar Representation to the avatar owner
  • Ensures that a base avatar represents the actual user of the avatar

Architecture Components:

  1. Authenticator:
     • Deployed on UEs
     • Receives and securely stores the BAA from the issuer
     • Verifies the BAA for avatar authentication
     • Proves possession of the private key corresponding to the public key in the BAA

  2. Issuer:
     • May be part of the operator network operating the IMS for avatar calls, or a trusted external entity
     • Verifies that a Base Avatar represents its owner
     • Provides the Digital Credential (BAA) to the avatar owner's UE
     • Verifies that the authenticator possesses the private/public key pair
     • Provides authorization functions and a provisioning server

Base Avatar Assertion (BAA) Structure:

  • A Digital Credential proving that a Base Avatar represents a user owning a specific private/public key pair
  • Generic structure shown in Figure Y (referenced but not detailed in the text)

Operational Procedures

BAA Issuance Procedure (Steps 1-7):

  1. Authorization Request: Application on UE1 generates authorization request for selected avatar representation and sends to Issuer
  2. User Verification: Issuer verifies request and authorizes user (UE1) if verification succeeds
  3. Authorization Response: Issuer sends response to UE1 indicating successful authorization, including Issuer ID
  4. Key Pair Generation: Authenticator on UE1 creates a public/private key pair associated with the selected avatar representation
  5. Enrolment Request: Authenticator sends enrolment request to Issuer (including public key)
  6. BAA Creation: Issuer verifies enrolment request (user-avatar match, public key validity), creates BAA with signature
  7. BAA Delivery: Issuer sends enrolment response to UE1 including BAA; Authenticator stores BAA for future authentication

Avatar Authentication Procedure (Step 8):

  • When UE1 initiates avatar call with UE2 using selected avatar
  • UE2 obtains BAA from UE1 or Issuer
  • Application/authenticator on UE2 verifies BAA validity (e.g., signature verification to confirm trusted issuer)
  • Authentication typically occurs at:
  • Beginning of avatar calls
  • Session resumption or restart

Implementation Example

Figure Z provides an example implementation of authenticator and issuer in the current system architecture (specific details not provided in text).

Proposal

The contribution proposes to add the above content as a base CR to address authentication requirements for avatar data in IMS-based avatar calls.


InterDigital Pennsylvania

[FS_Avatar_Ph2_MED] Media Configuration for Avatar Calls

Summary of S4-260226: Media Configuration for Avatar Calls

Introduction

This discussion paper addresses media configuration requirements for AR-MTSI clients supporting Avatar communication within the context of the Study on Avatar communication Phase 2. The Phase 2 study focuses on enabling additional Avatar use cases and enhancing Avatar-based RTC services with emphasis on quality of experience and advanced animation features for photo-realistic and immersive user experiences.

Background on Existing AR-MTSI Media Configuration

Current AR Support Parameters (TS 26.264 Clause 7)

The document reviews existing media configuration requirements defined in TS 26.264 for AR-MTSI clients:

  • +sip.3gpp-ar-support parameter in SIP REGISTER Contact header with two values:
  • "ar-capable": Terminal fully capable of receiving and rendering AR media
  • "ar-assisted": Terminal capable of transmitting AR metadata on uplink but requires network rendering support (no support for processing/rendering 3D scenes)

Current Avatar Support Parameters (TS 26.264 Clause 7.3.1)

  • +sip.3gpp-avatar-support parameter in SIP REGISTER Contact header with two values:
  • "avatar-capable": Terminal fully capable of receiving, animating and rendering avatars
  • "avatar-assisted": Terminal requires network support for avatar animation and rendering

Network-Assisted Avatar Rendering (TS 26.264 Clause 7.3.2)

When network animation and rendering is requested, an AR AS shall:

  • Allocate an MF capable of real-time avatar rendering
  • Configure the MF with appropriate rendering parameters based on the receiving UE's video capabilities
  • Modify the SDP to route avatar animation data to the MF instead of the receiving UE, inserting the MF into the media path

Identified Gap

The document identifies a critical gap: TS 26.264 has not yet documented the media configuration details for an AR-MTSI client in terminal that intends to participate in an avatar call.

When media configuration details were proposed in S4-251845 at SA4-134 Dallas meeting, feedback indicated that the behavior of IMS network elements is unspecified in the IMS architecture when an MTSI client sends the new Contact header field parameters "+sip.3gpp-ar-support" and/or "+sip.3gpp-avatar-support" in a SIP REGISTER message.

Proposal

The document proposes to send a Liaison Statement to SA2 requesting:

  • Definition of IMS network behavior when an AR-MTSI client registers with Contact header field "+sip.3gpp-ar-support" and/or "+sip.3gpp-avatar-support"
  • Specification of how to provide a suitable MF capable of providing AR rendering and/or avatar rendering support in an IMS session

A draft LS is provided in companion document S4-260227.


InterDigital Pennsylvania

[FS_Avatar_Ph2_MED] LS on IMS network behaviour for new Contact header parameters

LS on IMS Network Behaviour for New Contact Header Parameters

Document Information

  • Source: SA4
  • Target: SA2
  • Release: Rel-19/Rel-20
  • Work Items: AvCall-MED, FS_Avatar_Ph2_MED
  • Meeting: SA4 Meeting #135, Goa, India, February 9-13, 2026

Overall Description

Context and Background

SA4 has introduced new media configuration requirements in TS 26.264 for Augmented Reality (AR) and avatar-based MTSI clients. These developments have architectural implications for IMS network behavior that require SA2's attention and clarification.

AR-MTSI Client Requirements (Clause 7 of TS 26.264)

New Contact Header Parameter: +sip.3gpp-ar-support

  • Indicates the level of AR capability during SIP registration
  • Signals that the terminal requires network-based rendering support for AR call participation
  • AR-MTSI terminals must register with appropriate parameter values depending on whether terminal-based or network-based rendering is used

Avatar Support Requirements (Clause 7.3 of TS 26.264)

New Contact Header Parameter: +sip.3gpp-avatar-support

  • Values:
  • "avatar-capable": Terminal can animate and render avatars locally
  • "avatar-assisted": Network support required for avatar animation and rendering

Network-Based Avatar Rendering (Clause 7.3.2): When network-based avatar animation/rendering is requested, an AR Application Server shall:

  • Allocate a Media Function (MF) capable of real-time avatar rendering
  • Configure the MF based on the receiving UE's video capabilities
  • Modify the SDP to insert the MF into the media path

Identified Architectural Gap

SA4 has identified that IMS architecture specifications do not currently define the behavior of IMS network elements when MTSI clients include these new Contact header field parameters (+sip.3gpp-ar-support and/or +sip.3gpp-avatar-support) in SIP REGISTER messages.

Question to SA2

Is there any architectural guidance on:

  • Whether or how IMS entities should interpret these parameters?
  • How suitable Media Functions providing AR rendering and/or avatar rendering support should be selected and invoked?

Request to SA2

If guidance is not currently available, SA4 requests SA2 to consider studying, in the IMS architecture specifications, the expected behavior of IMS network elements when an AR-MTSI client registers its capabilities via Contact header field parameters.

This includes (but is not limited to):

  • Mechanisms for recognizing terminal capabilities during registration
  • Enabling the provisioning or insertion of appropriate Media Functions to support AR rendering, avatar rendering, and future similar services within an IMS session

Rationale

SA4 believes such clarification would:

  • Ensure architectural consistency
  • Facilitate interoperable deployment of AR and avatar-based services in IMS

Action Requested

SA4 kindly asks SA2 to review the above information and provide guidance on the way forward.


InterDigital New York

Avatar update to section 6.3.4

Technical Summary: AVATAR - Update to Section 6.3.4

Overview

This document provides a comprehensive update to section 6.3.4 concerning the MPEG Avatar Representation Format (ARF), now standardized as ISO/IEC 23090-39. The document reflects the progression of the standard from its initial development phase to reaching Committee Draft International Standard (CDIS) stage.

MPEG Avatar Representation Format Development

Scope and Objectives

The MPEG WG03 (Systems) workgroup is developing a new standard for avatar representation format with the following scope:

  • Develop an interchange representation format for computer-generated avatars and associated containers
  • Define an animation stream format to represent avatar dynamics and time-based information
  • Include geometrical models and all associated data (blendshapes, skeleton, normals, textures, maps, metadata)
  • Provide a streamable format for dynamics (animation parameters, tracking information, contextual data)
  • Ensure interoperability between existing models and formats

Requirements and Priorities

The Phase 1 requirements are categorized with three priority levels (High, Medium, Low) across multiple categories:

High Priority Requirements:

  • Suitable exchange format for conversion between avatar representation formats
  • Mesh-based format for representation and animation
  • Signal coding format
  • Semantic and signal representation
  • Multiple levels of detail for geometry
  • Facial and body animation
  • Delay-sensitive animation streams
  • Partial transport of base avatar
  • Various storage and transport capabilities

Medium Priority Requirements:

  • DRM protection support
  • Integration into scene description
  • Avatar authenticity and user association protection

Low Priority Requirements:

  • Avatar-avatar, user-avatar, and avatar-scene interactions
  • Storage and replay of animation streams

ARF Data Model and Structure

Core Components

The ARF data model (Figure 12) includes the following components:

Preamble Section:

  • Signature string for unique document identification
  • Version string tied to a specific ARF revision
  • Optional authenticationFeatures (encrypted facial and voice feature vectors with a public key URI)
  • supportedAnimations object specifying compatible animation frameworks (facial, body, hand, landmark, texture)
  • Optional proprietaryAnimations for vendor-specific schemes

Metadata Object:

  • Avatar-level descriptive information (name, unique identifier, age, gender)
  • Used for experience adaptation and policy/access control

Components Section:

  • Skeleton: Defines joints with inverse bind matrices, optional animationInfo
  • Node: Scene graph objects with names, IDs, parent/child relations, semantic mappings, TRS or 4×4 matrix transformations
  • Skin: Links mesh to skeleton, optional blendshape/landmark/texture sets, per-vertex joint weights
  • Mesh: Geometric primitives with name, ID, optional path, geometry data items
  • BlendshapeSets: Shape targets for the base mesh with optional animationInfo
  • LandmarkSets: Vertex/face indices with barycentric weights for tracked landmarks
  • TextureSets: Material resources with texture targets and animation links

Container Formats

Two container formats are supported:

  1. ISOBMFF containers (ISO/IEC 14496-12):
     • ARF document in an ISOBMFF item in the top-level MetaBox
     • Additional items for each component
     • May include an animation track with time-based samples

  2. Zip-based containers (ISO/IEC 21320-1):
     • Top-level ARF document
     • Component files referenced relative to the document location
Both formats support partial access to avatar components.

Integration with MPEG Scene Description

Scene Description Integration

  • ARF designed to work with MPEG Scene Description (ISO/IEC 23090-14) based on glTF
  • Not limited to MPEG Scene Description; can integrate with any scene description solution
  • ISO/IEC 23090-14 defines MPEG_node_avatar extension
  • ISO/IEC 23090-39 extends MPEG_node_avatar for better ARF integration

Reference Client Architecture

The reference architecture (Figure 13) includes:

  • Avatar Pipeline: Part of Media Access Function (MAF)
  • Retrieves avatar model and associated information
  • Fetches ARF container and animation streams
  • Animates and reconstructs avatar
  • Provides reconstructed avatar to Presentation Engine through buffers containing 3D mesh components

Animation Bitstream Format

Avatar Animation Units (AAUs)

The animation stream format uses AAUs as the fundamental structure (Figure 14):

AAU Structure:

  • Header: AAU type (7-bit code) and AAU payload length in bytes
  • Payload: 32-bit timestamp in "ticks", type-specific data, and optional padding for byte alignment

AAU Types:

  • AAU_CONFIG: Configuration unit
  • AAU_BLENDSHAPE: Facial animation sample
  • AAU_JOINT: Body/hand joint animation sample
  • AAU_LANDMARK: Landmark animation sample
  • AAU_TEXTURE: Texture animation sample
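A minimal parsing sketch for this structure; the byte widths of the type and length fields, the endianness, and the numeric type codes are assumptions, since only the 7-bit type code and the 32-bit timestamp are fixed by the summary:

```python
import struct

# Assumed numeric type codes; the spec assigns 7-bit codes whose values the
# summary does not list.
AAU_CONFIG, AAU_BLENDSHAPE, AAU_JOINT, AAU_LANDMARK, AAU_TEXTURE = range(5)

def read_aau(buf: bytes, off: int):
    """Return (type, timestamp_ticks, body, next_offset) for one AAU."""
    aau_type = buf[off] & 0x7F                         # 7-bit AAU type code
    length, = struct.unpack_from(">I", buf, off + 1)   # payload length in bytes
    payload = buf[off + 5: off + 5 + length]
    ticks, = struct.unpack_from(">I", payload, 0)      # 32-bit timestamp in ticks
    return aau_type, ticks, payload[4:], off + 5 + length
```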

Configuration Units

Configuration AAUs communicate stream-level parameters:

  • Animation profile string (UTF-8 encoded)
  • Timescale value (32-bit float, ticks per second)

Facial Animation Samples (AAU_BLENDSHAPE)

Structure includes:

  • Target blendshape set identifier
  • Per-blendshape confidence flag
  • Number of blendshape entries
  • For each entry: blendshape index, weight (32-bit float), optional confidence (32-bit float)

Deformation Formula: v = v₀ + Σₖ wₖ · Δvₖ

where:

  • v₀: base vertex position
  • Δvₖ: offset for blendshape k
  • wₖ: transmitted blendshape weight

Joint Animation Samples (AAU_JOINT)

Structure includes:

  • Target joint set identifier
  • Per-joint velocity flag
  • Number of joint entries
  • For each entry: joint index, 4×4 transformation matrix (16 floats), optional 4×4 velocity matrix

Linear Blend Skinning (LBS) Formula: vᵢ = Σⱼ wᵢⱼ · Mⱼ · vᵢ⁰

where:

  • wᵢⱼ: weight of joint j on vertex i
  • Mⱼ: global transformation matrix for joint j
  • vᵢ⁰: rest position of vertex i
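A vectorized sketch of this formula, assuming the inverse bind matrices are already folded into each Mⱼ (as the global transforms imply):

```python
import numpy as np

def linear_blend_skinning(rest: np.ndarray, joint_mats: np.ndarray,
                          weights: np.ndarray) -> np.ndarray:
    """rest: (N, 3) rest-pose vertices; joint_mats: (J, 4, 4) global joint
    transforms M_j; weights: (N, J) skinning weights w_ij."""
    rest_h = np.hstack([rest, np.ones((len(rest), 1))])     # homogeneous (N, 4)
    posed = np.einsum("jab,nb->nja", joint_mats, rest_h)    # M_j · v_i^0 for all i, j
    return np.einsum("nj,nja->na", weights, posed)[:, :3]   # weighted sum over joints
```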

Landmark Animation Samples (AAU_LANDMARK)

Structure includes:

  • Landmark set ID
  • Velocity and confidence flags
  • Dimensionality flag (2D vs. 3D)
  • Number of landmarks
  • For each landmark: index, coordinates (2D or 3D), optional velocity and confidence

Use cases: facial tracking overlays, sensor-mesh registration, animation data calibration

Texture Animation Samples (AAU_TEXTURE)

Structure analogous to blendshape samples but applied to texture targets; controls parametric texture effects (micro-geometry patterns, makeup, dynamic material variations)

Animation Stream Delivery

Dual delivery modes:

  1. Live transmission: Sequences of AAUs for real-time avatar driving
  2. Stored format: Avatar animation tracks in an ISOBMFF-based ARF container with sample grouping for pre-recorded sequences ("smile," "wave," "dance")

Ongoing Exploration Experiments

The group continues exploration on:

  1. Compression for Animation Streams: Methods for compressing facial and body animation streams
  2. Integrating Geometry Data Components: Specifying integration of avatar data into interoperable container format
  3. Animation Sample Formats: Developing structures for various animation data types (blend shapes, facial landmarks, animation controllers, joint transforms)
  4. Content Discovery and Partial Access: Solutions for content discovery and partial access
  5. Animation Controllers: Study on combining blend shape and joint animation

InterDigital Canada

[FS_Avatar_Ph2_MED] Procedures for BAR API Operations

Comprehensive Summary: Procedures for BAR API Operations

1. Introduction and Context

This contribution addresses procedures for Base Avatar Repository (BAR) APIs that were defined in Rel-19 as part of the AvCall-MED work item and integrated into TS 26.264 Annex B. The document was originally presented as S4-251909 at SA4#134 meeting but was redirected to the Rel-20 FS_Avatar_Ph2_MED study. The contribution provides detailed operational procedures for BAR APIs enabling UE or MF interaction with the Base Avatar Repository.

2. Base Avatar Models API Procedures

2.1 Create Base Avatar Model

Procedure Flow:

  • Requestor (DC AS or MF) invokes Mbar_Management_Avatars_CreateBaseAvatarModel via HTTP POST
  • Binary ARF container included in the request body
  • BAR authenticates/authorizes the request
  • Upon authorization, BAR stores the ARF container locally and creates an Avatar resource entity with a globally unique identifier
  • Response: 201 Created with the Avatar resource entity

Request Information Elements:

  • Security credentials (M)
  • Binary ARF container (M)

Response Information Elements:

  • Avatar resource entity (CM) - present on successful creation

Note: DC AS or BAR apply restrictions on created avatar container (location access, User ID authentication, etc.)
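A hedged sketch of the create operation with the `requests` library; the resource path and bearer-token scheme are illustrative, while the operation name, payload, and status code come from the procedure above:

```python
import requests

def create_base_avatar_model(bar_url: str, token: str, arf_path: str) -> dict:
    """Invoke Mbar_Management_Avatars_CreateBaseAvatarModel via HTTP POST."""
    with open(arf_path, "rb") as f:
        resp = requests.post(
            f"{bar_url}/avatars",                          # hypothetical resource path
            headers={"Authorization": f"Bearer {token}",   # assumed credential scheme
                     "Content-Type": "application/octet-stream"},
            data=f)                                        # binary ARF container in body
    resp.raise_for_status()                                # expect 201 Created
    return resp.json()                                     # Avatar resource entity
```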

2.2 Get Base Avatar Model

Procedure Flow:

  • Requestor invokes Mbar_Management_BaseAvatarModels_GetBaseAvatarModel via HTTP GET
  • {avatarId} replaced in the resource path
  • BAR retrieves the ARF container corresponding to avatarId
  • Response: 200 OK with the Avatar resource and binary ARF container

Request Information Elements:

  • Security credentials (M)

Response Information Elements:

  • Avatar resource entity (M)
  • Binary container (M)

2.3 Update Base Avatar Model

Procedure Flow:

  • Requestor invokes Mbar_Management_Avatars_UpdateBaseAvatarModel via HTTP PUT or PATCH
  • HTTP PUT: Binary ARF container in the body (full replacement)
  • HTTP PATCH: Multipart/mixed message per RFC 2046 with an asset identifiers list and binary assets
  • BAR authenticates/authorizes and performs the update
  • Response: 200 OK with the updated Avatar resource entity

Request Information Elements:

  • Security credentials (M)
  • Binary container (CM) - PUT only
  • AssetIds (CM) - PATCH only
  • Assets (CM) - PATCH only

Response Information Elements:

  • Avatar resource entity (CM) - present on successful update

2.4 Delete Base Avatar Model

Procedure Flow:

  • Requestor invokes Mbar_Management_Avatars_DeleteBaseAvatarModel via HTTP DELETE
  • BAR authenticates/authorizes the request
  • Upon authorization, BAR deletes the ARF container and destroys the resource
  • Response: 204 No Content

Request Information Elements:

  • Security credentials (M)

Response: No payload

3. Assets API Procedures

3.1 Create Asset

Procedure Flow:

  • Requestor invokes Mbar_Management_Assets_CreateAsset via HTTP POST
  • {avatarId} replaced in the resource path
  • BAR retrieves the avatar container and adds the binary asset
  • Response: 201 Created with assetId

Request Information Elements:

  • Security credentials (M)
  • Binary asset (M)

Response Information Elements:

  • Avatar resource entity (M) - updated container with the new asset

3.2 Retrieve Asset

Procedure Flow:

  • Requestor invokes Mbar_Management_Assets_RetrieveAsset via HTTP GET
  • {avatarId} and {assetId} replaced in the resource path
  • BAR retrieves the ARF container and extracts the asset
  • Response: 200 OK with the binary asset

Request Information Elements:

  • Security credentials (M)

Response Information Elements:

  • Binary asset (M)

3.3 Update Asset

Procedure Flow:

  • Requestor invokes Mbar_Management_Assets_UpdateAsset via HTTP PUT or PATCH
  • HTTP PUT: Binary asset in the body (full replacement)
  • HTTP PATCH: Multipart/mixed message with the LoDs/components to replace
  • BAR authenticates/authorizes and performs the update
  • Response: 200 OK with the updated Avatar resource entity

Request Information Elements:

  • Requestor identifier (M)
  • Security credentials (M)
  • Asset (CM)

Response Information Elements:

  • Avatar resource entity (CM) - present on successful update

3.4 Delete Asset

Procedure Flow:

  • Requestor invokes Mbar_Management_Assets_DestroyAsset via HTTP DELETE
  • BAR authenticates/authorizes the request
  • Upon authorization, BAR retrieves the ARF container, deletes the asset, and may repackage the container
  • Response: 204 No Content

Request Information Elements:

  • Security credentials (M)

Response: No payload

4. Avatar Representations API Procedures

4.1 Create Avatar Representation

Procedure Flow:

  • Requestor invokes Mbar_Management_AvatarRepresentations_CreateAvatarRepresentation via HTTP POST
  • Request body includes the Avatar Representation resource without avatarRepresentationId
  • BAR authenticates/authorizes and creates an AvatarRepresentation resource entity with a globally unique identifier
  • Response: 201 Created

Request Information Elements:

  • Security credentials (M)
  • Avatar representation (M) - with avatarId and assetIds properties set

Response Information Elements:

  • AvatarRepresentation resource entity (CM) - present on successful creation

4.2 Retrieve Avatar Representation

Procedure Flow:

  • Requestor invokes Mbar_Management_AvatarRepresentations_GetAvatarRepresentation via HTTP GET
  • {avatarId} replaced in the resource path
  • BAR retrieves the assets listed in the Avatar Representation and compiles the container
  • Response: 200 OK with the Avatar Representation resource and binary container

Request Information Elements:

  • Security credentials (M)

Response Information Elements:

  • AvatarRepresentation resource entity (M)
  • Binary container (M)

4.3 Update Avatar Representation

Procedure Flow:

  • Requestor invokes Mbar_Management_AvatarRepresentations_UpdateAvatarRepresentation via HTTP PUT or PATCH
  • HTTP PUT: Avatar Representation object in the body (full replacement)
  • HTTP PATCH: Multipart/mixed message with asset identifier mappings (source to replacement) or a map data structure
  • Only asset identifiers existing in the respective Avatar resource are allowed
  • BAR authenticates/authorizes and performs the update
  • Response: 200 OK with the updated Avatar Representation resource entity

Note: Only the avatar representation owner is allowed to modify the representation

Request Information Elements:

  • Security credentials (M)
  • Avatar Representation (CM) - PUT only
  • Source Asset Ids (CM) - PATCH only
  • New Asset Ids (CM) - PATCH only

Response Information Elements:

  • AvatarRepresentation resource entity (CM) - present on successful update

4.4 Destroy Avatar Representation

Procedure Flow:

  • Requestor invokes Mbar_Management_AvatarRepresentations_DeleteAvatarRepresentation via HTTP DELETE
  • BAR authenticates/authorizes and deletes the Avatar Representation resource
  • Response: 204 No Content (or 200 OK if a response body is needed)

Request Information Elements:

  • Security credentials (M)

Response: No payload

5. Associated Information API Procedures

5.1 Retrieve Associated Information

Procedure Flow:

  • Requestor invokes Mbar_Management_AssociatedInformation_GetAssociatedInformation via HTTP GET
  • {avatarId} replaced in the resource path
  • BAR retrieves the AssociatedInfo object from the Avatar Representation resource
  • Response: 200 OK with the AssociatedInfo object

Request Information Elements:

  • Security credentials (M)

Response Information Elements:

  • AssociatedInfo object (M)

6. Proposal

The contribution proposes to:

  • Document section 2 contents as a new clause 8.3.3.4 in the aggregated CR to TR 26.813
  • Add an editor's note to clause 8.3.3.2 indicating the need for updates to reflect the BAR APIs defined in TS 26.264

7. References

IETF RFC 2046: "Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types"


InterDigital New York

Avatar update to section 6.3.4

Summary of S4-260285: AVATAR-Update to section 6.3.4

Document Overview

This contribution updates section 6.3.4 of the AVATAR specification to align with the current status of MPEG Avatar Representation Format (ARF) work. The document reflects progression of ISO/IEC 23090-39 from early development to Committee Draft International Standard (CDIS) stage.

Main Technical Changes

MPEG ARF Specification Status Update

  • Reference document updated: From WG03N1316 to WG03N1693
  • Specification maturity: ISO/IEC 23090-39 has reached Committee Draft International Standard (CDIS) stage
  • Scope refinement: Editorial improvements to description of avatar representation format scope, including interchange format for computer-generated avatars, containers, and animation stream formats

Avatar Data Model and Representation Format

Restructured Data Model Description

The document significantly restructures how the ARF data model components are described:

High-level Avatar Information (Metadata Object):

  • Name, Identifier, Age, Gender
  • Holds avatar-level descriptive information for system adaptation and policy control

Preamble Section (new addition):

  • Signature string for unique document identification
  • Version string tied to a specific ARF revision
  • Optional authenticationFeatures with encrypted facial/voice feature vectors and a public key URI
  • supportedAnimations object specifying compatible facial, body, hand, landmark, and texture animation frameworks using URNs
  • Optional proprietaryAnimations for vendor-specific schemes (e.g., ML-based reconstruction models)

Components Section (detailed expansion):

  • Skeleton: Defines joints as a subset of scene graph nodes, references an inverse bind matrices data item (N×16 tensor), optional animationInfo
  • Node: Scene graph objects with names, IDs, parent/child relations, semantic mappings, TRS or 4×4 matrix transformations
  • Skin: Links mesh to skeleton, optional blendshape/landmark/texture sets, per-vertex joint weights tensor (N×M)
  • Mesh: Geometric primitives with name, ID, optional path, data items containing geometry
  • BlendshapeSets: Shape targets for the base mesh, references geometry-only shapes (GLB files), optional animationInfo
  • LandmarkSets: Vertex/face indices with barycentric weights for landmark positioning
  • TextureSets: Material resources linked to texture targets and animation frameworks

Container Format

  • Supports partial access to avatar components
  • Two container formats: ISOBMFF (ISO/IEC 14496-12) and Zip-based (ISO/IEC 21320-1)
  • ISOBMFF containers: ARF document in MetaBox item, may include animation tracks
  • Zip-based containers: Top-level ARF document with relative component file references

Scene Description Integration

  • Designed to work with MPEG Scene Description (ISO/IEC 23090-14) based on glTF
  • Not limited to MPEG Scene Description
  • ISO/IEC 23090-14 defines MPEG_node_avatar extension
  • ISO/IEC 23090-39 extends MPEG_node_avatar for better ARF integration

Reference Software (ISO/IEC 23090-43)

Major update from "under development" to defined implementation:

arfref Module (C++ and Python):

  • Parsing of ARF containers
  • Helper functions for asset decoding
  • Partial glTF 2.0 encoding/decoding support for meshes
  • Animation mapping (AnimationLink objects)
  • Animation stream decoding
  • Available through the Python language

arfviewer Module:

  • Avatar Animation Units (AAUs) support
  • Time-sequence blendshape weights with optional confidence metrics
  • Joint transformations for skeletal animation
  • AAU format with chronological data blocks
  • Inverse kinematics system for missing joint information
  • Blendshape animator managing neutral mesh vertices and deltas with weighted summation

Reference Client Architecture

  • Based on ISO/IEC 23090-14 concepts
  • Avatar pipeline as part of Media Access Function (MAF)
  • Fetches ARF container and animation streams
  • Reconstructs avatar and provides to Presentation Engine through 3D mesh component buffers

Animation Bitstream Format

Comprehensive new section detailing AAU-based animation stream format:

Avatar Animation Units (AAUs) Structure

  • Sequence of AAUs with header, payload, and optional padding
  • Header: 7-bit AAU type code, AAU payload length in bytes
  • Payload: 32-bit timestamp in "ticks" plus type-specific data

AAU Types Defined

  • AAU_CONFIG: Configuration unit
  • AAU_BLENDSHAPE: Facial animation sample
  • AAU_JOINT: Body/hand joint animation sample
  • AAU_LANDMARK: Landmark animation sample
  • AAU_TEXTURE: Texture animation sample
  • Reserved ranges for future extensions

Configuration Units

  • Animation profile string (UTF-8 encoded)
  • Timescale value (32-bit float) for ticks-per-second conversion
  • Profile string identifies constraints and options

Facial Animation Samples (AAU_BLENDSHAPE)

  • Target blendshape set identifier
  • Per-blendshape confidence flag
  • Number of blendshape entries
  • Per entry: index, weight (32-bit float), optional confidence
  • Deformation formula: v = v₀ + Σₖ wₖ · Δvₖ

Joint Animation Samples (AAU_JOINT)

  • Target joint set identifier
  • Per-joint velocity flag
  • Number of joint entries
  • Per entry: joint index, 4×4 transformation matrix, optional velocity matrix
  • Linear Blend Skinning (LBS) formula: vᵢ = Σⱼ wᵢⱼ · Mⱼ · vᵢ⁰

Landmark Animation Samples (AAU_LANDMARK)

  • Landmark set ID
  • Velocity and confidence flags
  • Dimensionality flag (2D vs 3D)
  • Number of landmarks
  • Per landmark: index, coordinates, optional velocity and confidence
  • Use cases: facial tracking overlays, sensor-mesh registration, calibration

Texture Animation Samples (AAU_TEXTURE)

  • Parametric texture weights for TextureSet targets
  • Similar structure to blendshape samples
  • Controls micro-geometry patterns, makeup, dynamic material variations

Animation Stream Delivery

  • Live transmission as AAU sequences
  • Storage as avatar animation tracks in ISOBMFF-based ARF containers
  • Sample grouping for pre-recorded sequences (e.g., "smile," "wave," "dance")
  • Dual use for real-time communication and offline authoring/replay

Exploration Experiments

Status changed from "initiated" to "continues":

  • Compression for Animation Streams: Evaluate compression methods for facial and body animations
  • Integrating Geometry Data Components: Specify integration into interoperable container format
  • Animation Sample Formats: Develop structures for blend shapes, facial landmarks, animation controllers, joint transforms
  • Content Discovery and Partial Access: Evaluate solutions
  • Animation Controllers: Study combination of blend shape and joint animation

Editorial Corrections

  • Revision 1 corrects reference software status
  • Various grammatical and formatting improvements throughout
  • Consistent terminology usage (e.g., "with" instead of "to with")