Meeting: TSGS4_135_India | Agenda Item: 9.8
[FS_Avatar_Ph2_MED] 3D Gaussian Splatting Avatar Methods for Real-Time Communication
This contribution surveys 3D Gaussian Splatting (3DGS) methods for avatar representation in the context of the Avatar Communication Phase 2 study (FS_Avatar_Ph2_MED, SP-251663), specifically addressing Objective 3 on animation techniques for avatar reconstruction and rendering. The document evaluates 3DGS methods for real-time communication scenarios and their compatibility with MPEG Avatar Representation Format (ARF, ISO/IEC 23090-39).
Key Technical Background:
- 3DGS represents objects as sets of anisotropic 3D Gaussians (splats)
- Each Gaussian stores: 3D mean position, oriented covariance (ellipsoidal footprint), opacity, and appearance parameters (RGB or spherical harmonics coefficients)
- Rendering projects the 3D Gaussians into screen space as 2D Gaussians with depth-ordered alpha compositing
- Achieves real-time rendering at 100-370 FPS on desktop GPUs with quality comparable to neural radiance fields
- Maps well to GPU compute and graphics pipelines
Critical Question for Avatar Communication: How Gaussians deform under animation - either by binding to parametric meshes (FLAME for faces, SMPL/SMPL-X for bodies) or using small neural networks for residual motion prediction.
Head/face methods differ along three practical dimensions:
Most interoperable approaches: Fully explicit runtime with Gaussians driven from same blendshape and skeletal parameters as mesh renderers.
| Method | FPS | Gaussians | Parametric Model | Runtime MLP | Key Feature |
|--------|-----|-----------|------------------|-------------|-------------|
| GaussianBlendshape | 370 | 70K | Custom blendshapes | No | Linear blending identical to mesh blendshapes, 32-39 dB PSNR |
| SplattingAvatar | 300+ | ~100K | FLAME mesh | No | Mesh-embedded via barycentric coords, 30 FPS on iPhone 13 |
| FlashAvatar | 300 | 10-50K | FLAME | Small MLP | UV-based init on FLAME, small MLPs for expression offsets |
| GaussianAvatars | 90-100 | ~100K | FLAME | No | FLAME-rigged, multi-view training, explicit binding |
| HHAvatar | ~100 | ~150K | FLAME | Temporal modules | First method for dynamic hair physics modeling |
| MeGA | ~90 | ~200K | FLAME (face) + 3DGS (hair) | No | Hybrid mesh+Gaussian, occlusion-aware blending, editable |
Standout Methods for Real-Time Communication: GaussianBlendshape and SplattingAvatar use purely explicit representations with no runtime neural networks, enabling deterministic rendering and direct ARF compatibility.
Technical Approach:
- Each Gaussian is anchored to an animatable mesh supporting standard blendshapes and skeletal skinning
- Parameterization: triangle index + barycentric coordinates + optional offset vector in the local tangent-normal frame
- Runtime: the receiver deforms the mesh using joint transforms and blendshape weights, then reconstructs each Gaussian center from the animated triangle vertices using barycentric weights
- No per-frame neural inference required; purely algebraic reconstruction ensures deterministic motion
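The algebraic reconstruction described above can be sketched in a few lines (a NumPy sketch under assumed array layouts; the barycentric binding is in the spirit of SplattingAvatar, but the function name and normal-offset convention are illustrative, not any method's normative parameterization):

```python
import numpy as np

def gaussian_centers_from_mesh(vertices, tri_indices, bary_coords, offsets=None):
    """Reconstruct Gaussian centers from an already-animated mesh.

    vertices    : (V, 3) animated vertex positions (after blendshapes + skinning)
    tri_indices : (G, 3) vertex indices of the triangle each Gaussian is bound to
    bary_coords : (G, 3) barycentric weights per Gaussian (rows sum to 1)
    offsets     : (G,) optional scalar offsets along the triangle normal
    """
    tris = vertices[tri_indices]                          # (G, 3, 3)
    centers = np.einsum('gk,gkj->gj', bary_coords, tris)  # barycentric blend
    if offsets is not None:
        # Transport the offset along the animated triangle normal
        e1 = tris[:, 1] - tris[:, 0]
        e2 = tris[:, 2] - tris[:, 0]
        n = np.cross(e1, e2)
        n /= np.linalg.norm(n, axis=1, keepdims=True)
        centers += offsets[:, None] * n
    return centers
```

Because the reconstruction is purely algebraic, the same animated mesh always yields the same Gaussian centers, which is exactly the determinism property the text highlights.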
Orientation and Footprint Handling:
- Gaussians are stored in a local frame aligned to the triangle (per-axis scales + local rotation)
- A local-to-world transform from the animated triangle frame transports the covariance
- Keeps the projected splat stable under motion and avoids jitter
- Appearance parameters (opacity, color coefficients) remain static unless dynamic effects are explicitly modeled
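The covariance transport can be sketched similarly (an illustrative NumPy sketch; the tangent-tangent-normal frame construction and storage layout are assumptions for the example, not a normative definition):

```python
import numpy as np

def world_covariances(tris, local_scales, local_rots):
    """Transport per-Gaussian covariance from the local triangle frame to world.

    tris         : (G, 3, 3) animated triangle vertices per Gaussian
    local_scales : (G, 3) per-axis Gaussian scales in the local frame
    local_rots   : (G, 3, 3) local rotation matrices (e.g. from stored quaternions)
    """
    # Orthonormal tangent-tangent-normal frame from the animated triangle
    e1 = tris[:, 1] - tris[:, 0]
    t = e1 / np.linalg.norm(e1, axis=1, keepdims=True)
    n = np.cross(e1, tris[:, 2] - tris[:, 0])
    n /= np.linalg.norm(n, axis=1, keepdims=True)
    b = np.cross(n, t)
    frame = np.stack([t, b, n], axis=2)        # (G, 3, 3), columns = frame axes
    R = frame @ local_rots                     # world-space rotation per Gaussian
    S = local_scales[:, :, None] * np.eye(3)   # (G, 3, 3) diagonal scale matrices
    M = R @ S
    return M @ M.transpose(0, 2, 1)            # covariance = R S S^T R^T
```

Since the frame follows the animated triangle rigidly, the splat footprint rotates and translates with the surface rather than jittering frame to frame.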
Standardization Advantages: - Reuses same animation signals as mesh avatar - Enables graceful fallback: mesh-only renderers can ignore Gaussian extension and still animate - 3DGS-capable renderers can render Gaussians alone or hybrid mesh+Gaussians composition
Limitation: Coarse driving mesh can restrict fine-scale effects (lip roll, eyelid thickness, hair motion). Addressed by higher resolution parametric meshes, local offsets, or dedicated Gaussian subsets for non-mesh components.
Technical Approach:
- Mirrors classical mesh blendshape animation
- Each Gaussian has neutral parameters plus per-expression deltas for center position, scale, and opacity
- Runtime computes a linear combination identical to the mesh blendshape pipeline
- Key advantage: determinism and ARF-friendly control; the same blendshape weight stream drives both mesh vertices and Gaussian deltas
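A minimal sketch of this linear blending, assuming per-expression delta arrays stored alongside the neutral Gaussian parameters (array shapes and the opacity clamp are illustrative choices, not GaussianBlendshape's exact layout):

```python
import numpy as np

def blend_gaussians(neutral_pos, neutral_scale, neutral_opacity,
                    d_pos, d_scale, d_opacity, weights):
    """Linearly blend per-expression Gaussian deltas.

    neutral_* : neutral parameters, e.g. neutral_pos with shape (G, 3)
    d_*       : per-expression deltas, e.g. d_pos with shape (E, G, 3)
    weights   : (E,) blendshape weights from the animation stream
    """
    pos = neutral_pos + np.einsum('e,egj->gj', weights, d_pos)
    scale = neutral_scale + np.einsum('e,egj->gj', weights, d_scale)
    opacity = neutral_opacity + np.einsum('e,eg->g', weights, d_opacity)
    return pos, scale, np.clip(opacity, 0.0, 1.0)
```

The same `weights` vector that drives the mesh blendshapes drives this function, which is why no separate animation signalling is needed.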
Technical Approach: - Parametric model for global control + small neural modules outputting residual offsets conditioned on expression, pose, or time - Improves fine detail and handles effects difficult to capture with purely linear blendshapes
Tradeoff: Runtime inference and model distribution become part of interoperability (model versioning, determinism, platform-specific performance)
Full-body methods have converged on SMPL/SMPL-X parametric body models, enabling compatibility with standard skeletal animation systems.
| Method | FPS | Gaussians | Body Model | Training | Key Feature |
|--------|-----|-----------|------------|----------|-------------|
| GauHuman | 189 | ~13K | SMPL | 1-2 min | Fastest training, ~3.5 MB storage, KL divergence split/clone |
| HUGS | 60 | ~200K | SMPL | 30 min | Disentangles human/scene |
| ASH | ~60 | ~100K | SMPL | ~1 hour | 2D texture-space parameterization, Dual Quaternion skinning, motion retargeting |
| GART | >150 | ~50K | SMPL | sec-min | Latent bones for non-rigid deformations (dresses, loose clothing) |
| ExAvatar | ~60 | ~150K | SMPL-X | ~2 hours | Only SMPL-X method with unified body/face/hand animation |
Standout Methods: - GauHuman: Best combination of minimal storage (~3.5 MB) and fast training (1-2 min) - ExAvatar: Only method providing unified body/face/hand animation through SMPL-X - critical for immersive communication
Animation Architecture:
- Body model provides a compact, standardized animation interface
- Base avatar: static set of Gaussians + binding metadata
- Runtime: joint transforms from SMPL/SMPL-X pose parameters deform the body via skinning
- Gaussian propagation: surface anchoring (barycentric/UV coordinates) or direct skinning weights per Gaussian
- Enables motion retargeting by sending only the pose stream while keeping the high-fidelity Gaussian appearance fixed
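The skinning-weight variant of Gaussian propagation can be sketched as follows (an illustrative NumPy sketch; 4×4 joint matrices are assumed to already be bind-pose-relative, as is conventional for linear blend skinning):

```python
import numpy as np

def skin_gaussian_centers(rest_centers, joint_mats, skin_weights):
    """Deform Gaussian centers with linear blend skinning.

    rest_centers : (G, 3) Gaussian centers in the rest pose
    joint_mats   : (J, 4, 4) global joint transforms (bind-relative)
    skin_weights : (G, J) per-Gaussian skinning weights (rows sum to 1)
    """
    hom = np.concatenate([rest_centers,
                          np.ones((len(rest_centers), 1))], axis=1)  # (G, 4)
    # Blend the joint matrices per Gaussian: M_g = sum_j w_gj * M_j
    blended = np.einsum('gj,jab->gab', skin_weights, joint_mats)     # (G, 4, 4)
    out = np.einsum('gab,gb->ga', blended, hom)
    return out[:, :3]
```

Only `joint_mats` changes per frame, derived from the transmitted pose stream; the rest positions, weights, and appearance attributes stay fixed in the base avatar, which is what makes retargeting cheap.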
Non-Rigid Effects Challenge:
- Clothing, long hair, and accessories do not follow the body surface under rigid skinning
- Solutions: latent bones or local deformation modules (additional control points beyond the SMPL skeleton)
- ARF integration consideration: distinguish between body-locked Gaussians (fully driven by the standardized skeleton) and secondary Gaussians (which may require optional control signals or local simulation)
Distribution Size Considerations:
- Full-body avatars require tens to hundreds of thousands of Gaussians
- Each Gaussian includes geometry and appearance attributes
- Compression and level-of-detail are essential for real deployments
- A practical ARF profile should specify a default Gaussian count budget and allow progressive refinement layers for high-end devices
Methods classified into three categories based on runtime architecture:
Methods: SplattingAvatar, GaussianBlendshape, GaussianAvatars - Performance: 300-370 FPS - ARF Compatibility: Direct mapping - Animation: Driven entirely by standard skeletal joints and blendshape weights - Fully compatible with ARF Animation Stream Format
Methods: 3DGS-Avatar, FlashAvatar, HUGS - Performance: 50-100 FPS (near-real-time) - Architecture: Small MLPs add expression-dependent offsets without fundamentally changing animation interface - ARF Integration: Can still be driven by blendshape parameters with MLP weights distributed as part of base avatar
Methods: Gaussian Head Avatar, GaussianHead - Training: 1-2 days - Latency: Higher - ARF Integration: May be integrated into ARF containers as proprietary customized models
Interoperability Key Question: Not whether an MLP exists, but whether the animation interface remains the same. If the avatar is driven solely by joints and blendshape weights, the ARF Animation Stream Format remains sufficient and the decoder only needs to select a renderer.
Determinism Considerations:
- Explicit methods: Naturally deterministic given fixed floating-point rules, with no platform-specific neural inference dependency
- Hybrid methods: Viable if the MLP is small and shipped as part of the base avatar, but conformance should define fixed operator sets and numerical tolerances
- Fully neural pipelines: Better treated as optional proprietary components inside the ARF container rather than a baseline interoperable tool
Step 1: Storage - Store mesh-embedded Gaussians as auxiliary data within glTF/ARF containers - Parameterization: relative to mesh surface using barycentric coordinates (SplattingAvatar) or linear blendshape offsets (GaussianBlendshape) - Preserves backward compatibility with mesh-only renderers
Step 2: Animation - Animate via standard skeletal and blendshape parameters already defined in ARF Animation Stream Format - No changes to animation stream required - Gaussian positions derived from same joint transforms and blendshape weights used for mesh animation
Step 3: Compression - Apply GS compression for Gaussian attributes within base avatar to minimize distribution size
Step 4: Streaming - Stream only AAUs at approximately 40 KB/s for real-time animation - Base avatar (including compressed Gaussian data) distributed once at session establishment - Enables high-quality Gaussian splatting rendering on capable devices while maintaining mesh-based rendering compatibility on constrained devices
Capability Exchange: - Endpoints signal support for 3DGS rendering - Supported attribute sets - Supported Gaussian count budgets - Fallback to mesh rendering if 3DGS not supported or resources constrained - Avoids ecosystem fragmentation and maintains backward compatibility
The document proposes that SA4 consider the following for the FS_Avatar_Ph2_MED study:
Acknowledge 3D Gaussian Splatting as a viable rendering primitive for avatar communication
Coordinate with MPEG on integration of Gaussian splatting data within ARF Base Avatar Format (ISO/IEC 23090-39)
Evaluate compression techniques (SPZ, L-GSC, HAC++, Compact3D) for inclusion in study of static and animation data compression (Objective 7)
Define capability signaling and conformance points for 3DGS avatar rendering:
Required numerical tolerances for determinism
Study hybrid approaches with small MLPs - whether they warrant optional ARF profile, and if so, constrain operator sets and model sizes to preserve portability
[FS_Avatar_Ph2_MED] Avatar Evaluation Framework and Objective Metrics
This contribution addresses Objectives 2 and 3 of the Avatar Communication Phase 2 SID (SP-251663), which concern QoE metrics, evaluation frameworks, and evaluation criteria for animation techniques. The document proposes a practical evaluation methodology designed to deliver repeatable, automated, and vendor-neutral results based on a core principle: evaluate what the user actually sees by measuring quality from rendered video output rather than internal system parameters.
The framework is built on four key principles:
The proposed testbed comprises five key components:
The contribution proposes metrics across three quality dimensions:
Video-based computation extracting landmarks and skeletons from rendered output:
Proposed for second phase evaluation due to complexity:
Standardized animation streams should cover:
Each test set should contain reference audio, reference animation streams, and reference rendered video from both high-quality reference pipeline and source capture.
The contribution proposes to:
[FS_Avatar_Ph2_MED] Interoperability guidance for ARF
This contribution addresses the FFS noted in TS 26.264 clause 5.6.1 regarding evaluation of MPEG ARF and interoperability aspects. The key interoperability challenge is mapping: receivers can only animate an avatar if they can correctly map incoming animation parameters to the appropriate Skeleton, BlendshapeSet, and LandmarkSet in the ARF container.
ISO/IEC 23090-39 defines signalling to declare supported animation frameworks and provide mapping tables. This contribution proposes concrete interoperability guidance with detailed examples for both linear and non-linear mappings.
The proposed guidance is based on four core principles:
Single source of truth: The ARF document is the normative description for interpreting an animation stream for a given avatar
Explicit profile identification: Each animation stream identifies its animation profile, and the ARF document lists supported profile URNs
Deterministic mapping: When the stream profile doesn't directly match stored assets, the ARF document provides mapping tables from the stream profile to the target asset set
Sender responsibility: The sender (who owns the ARF container) ensures either direct matching identifiers are used or mappings are present. The receiver is not expected to guess.
ARF provides three signalling layers for mapping between animation frameworks:
SupportedAnimations: Lists supported face, body, hand, landmark, and texture animation profiles as URNs. Each URN identifies a framework and specific parameter set (e.g., blendshape set or joint set).
AnimationInfo and AnimationLink: Each animatable asset in components (Skeleton, BlendshapeSet, LandmarkSet) includes animationInfo. Each AnimationLink points to one SupportedAnimations entry as the target for that asset.
Mapping Objects: When additional frameworks are used for capture or streaming, animationInfo can include Mapping objects that map from a source SupportedAnimations entry to the target entry. Two mapping types are supported:
- LinearAssociation: Expresses a weighted sum from multiple source parameters to one target parameter
- NonLinearAssociation: Expresses non-linear transforms using one or more channels with lookup tables and interpolation
Mapping indices refer to parameter identifiers in the animation stream (ShapeKey.id for blendshapes, target joint index for joint animation, target landmark index for landmark animation).
The receiver applies the following procedure:
Parse preamble.supportedAnimations and build an index-to-URN map for each animation type
Determine the animation profile used by the received stream and find its index in the corresponding SupportedAnimations list
Select the target Skeleton, BlendshapeSet, or LandmarkSet to animate and find the matching AnimationLink entry
If the stream profile index equals AnimationLink.target, apply stream parameters directly to target assets
Otherwise, find a Mapping entry where Mapping.source equals the stream profile index, then apply the LinearAssociation or NonLinearAssociation rules to compute target parameters
For any target indices not produced by mapping, use neutral defaults (0.0 for blendshape weights, bind pose for joints, neutral position for landmarks)
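The receiver procedure above can be sketched for the blendshape case (a simplified Python sketch; the dict/list structures here stand in for the actual ARF JSON schema and are not its normative field names):

```python
def map_blendshape_weights(stream_profile_idx, stream_weights,
                           animation_link, mappings, n_target):
    """Receiver-side mapping step for blendshape weights (simplified).

    stream_weights : {source parameter index: weight} from the animation stream
    animation_link : SupportedAnimations index the target asset links to
    mappings       : list of {'source': idx, 'linear': [(target_index,
                     [source_indices], [weights]), ...]} entries
    """
    target = [0.0] * n_target                      # neutral defaults
    if stream_profile_idx == animation_link:
        for idx, w in stream_weights.items():      # direct match: apply as-is
            target[idx] = w
        return target
    for m in mappings:                             # otherwise look up a mapping
        if m['source'] != stream_profile_idx:
            continue
        for t_idx, s_idxs, ws in m['linear']:      # LinearAssociation rule
            target[t_idx] = sum(w * stream_weights.get(s, 0.0)
                                for s, w in zip(s_idxs, ws))
        return target
    raise ValueError('no mapping from stream profile to target asset')
```

Unmapped target indices simply keep their neutral default of 0.0, matching step 6 of the procedure.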
The simplest case occurs when the sender generates the animation stream using the same framework and parameter set as the target asset in ARF.
| Scenario | Typical Issue | ARF Signalling and Behaviour |
|----------|---------------|------------------------------|
| Direct match | Stream profile and parameter identifiers match target assets in ARF container | No mapping needed. Receiver applies parameters directly. ARF document declares profile in SupportedAnimations and links target assets with AnimationLink.target |
| Subset | Source and target use same semantics but target has fewer parameters | Unmapped target parameters default to neutral values |
Linear mappings are suitable when a target parameter can be expressed as a weighted sum of one or more source parameters. Typical use cases include mirroring left/right shapes, splitting/merging parameters, and simple scaling. Represented in ARF by LinearAssociation with targetIndex, sourceIndices, and weights.
Examples:
| Target Parameter (ARF) | Source Parameters (Stream) | Linear Association |
|------------------------|----------------------------|--------------------|
| Smile (targetIndex 12) | mouthSmileLeft (5), mouthSmileRight (6) | w12 = 0.5·w5 + 0.5·w6 |
| JawOpen (targetIndex 3) | jawOpen (13) | w3 = 1.0·w13 |
| MouthCornerPull (targetIndex 20) | mouthSmileLeft (5), mouthSmileRight (6), cheekSquintLeft (26), cheekSquintRight (27) | w20 = 0.4·w5 + 0.4·w6 + 0.1·w26 + 0.1·w27 |
Non-linear mappings are needed when linear blending is insufficient. Typical cases include dead zones, saturation, perceptual calibration curves, and gating where one parameter modulates another. Represented in ARF by NonLinearAssociation. Each channel maps one source parameter through a lookup table defined by Data items. Channel outputs are combined using COMBINATION_SUM or COMBINATION_MUL.
Examples:
| Target Parameter (ARF) | Source Parameter(s) | Non-linear Mapping |
|------------------------|---------------------|--------------------|
| JawOpen (targetIndex 3) | jawOpen (13) | Piecewise curve with deadzone and saturation. Example input [0.0, 0.1, 0.4, 1.0] maps to output [0.0, 0.0, 0.7, 1.0] with INTERPOLATION_LINEAR |
| Blink (targetIndex 7) | eyeBlinkLeft (1), eyeBlinkRight (2) | Each eye uses a threshold curve. INTERPOLATION_STEP converts the soft signal into a binary blink. Combine with COMBINATION_SUM and clamp to [0, 1] |
| MouthOpenSmile (targetIndex 30) | jawOpen (13) and Smile (12, after linear mapping) | Use COMBINATION_MUL to gate smile by jaw opening. Channel 1 maps jawOpen through a deadzone curve. Channel 2 maps smile through an S curve. Multiply channel outputs |
| BrowRaise (targetIndex 15) | browInnerUp (9) | Gamma curve to better match the target rig. Example output = pow(input, 0.5). Approximated with a LUT and INTERPOLATION_CUBICSPLINE |
| Landmark mouthMidTop (targetIndex 18) | landmarks 50 and 52 | Non-linear only if needed for stabilization or bias compensation. Example: apply a LUT to compress extreme motion before writing the 2D or 3D coordinate |
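A sketch of one lookup-table channel evaluation (illustrative Python; `lut_channel` is a hypothetical helper approximating the LUT-plus-interpolation behaviour described above, not ARF-normative syntax):

```python
import numpy as np

def lut_channel(x, xs, ys, interpolation='linear'):
    """Map one source parameter through a lookup-table channel."""
    if interpolation == 'step':
        # Hold the last table value at or below x (INTERPOLATION_STEP-like)
        idx = np.searchsorted(xs, x, side='right') - 1
        return ys[max(idx, 0)]
    return float(np.interp(x, xs, ys))  # piecewise-linear interpolation

# JawOpen curve with dead zone and saturation (values from the table above)
xs, ys = [0.0, 0.1, 0.4, 1.0], [0.0, 0.0, 0.7, 1.0]
assert lut_channel(0.05, xs, ys) == 0.0   # inside the dead zone
jaw = lut_channel(0.25, xs, ys)           # ~0.35, halfway up the middle segment

# Gating (COMBINATION_MUL-like): smile only shows once the jaw opens
mouth_open_smile = jaw * lut_channel(0.6, [0.0, 1.0], [0.0, 1.0])
```

COMBINATION_SUM would instead add the channel outputs before clamping, as in the blink example.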
The contribution proposes:
Document the content of sections 2 and 3 in TR 26.813
Add explicit text to TS 26.264 clause 5.6.1 stating that the avatar owner shall ensure that identifiers used in avatar animation streams either:
- Directly match the target assets declared in the ARF document, or
- Have mapping tables present in the ARF document to convert from the stream profile to the target assets
Remove the corresponding note from TS 26.264 and declare it as resolved
[FS_Avatar_Ph2_MED] Draft LS on MPEG I ARF compression aspects
This is a Liaison Statement (LS) from 3GPP TSG SA WG4 to ISO/IEC JTC1/SC29/WG7 and WG3 regarding compression aspects of avatar representation formats for Release 20 work on avatar communication Phase 2.
SA4 is seeking clarification on two critical aspects:
Are there existing MPEG technologies that can be utilized to compress:
- Avatar static data (especially meshes)
- Avatar animation data, including:
  - Blend shape sets
  - Skeletal animation
  - Other animation-related information
If such compression technologies exist: - Are there plans to integrate them into ISO/IEC DIS 23090-39? - What is the anticipated timeline for such integration in the context of 3GPP Release 20 schedule?
SA4 formally requests ISO/IEC SC29/WG7 and ISO/IEC SC29/WG3 to provide answers to both questions above, considering the Release 20 timeline constraints.
[FS_Avatar_Ph2_MED] Considerations on security aspects
This contribution from Nokia addresses security-related gaps in the Rel-20 study item FS_Avatar_Ph2_MED, specifically focusing on security mechanisms for Avatar communications in 3GPP systems.
The Rel-20 SID FS_Avatar_Ph2_MED (approved at SA#110, December 2025) aims to address gaps from previous work and resolve open points identified in TS 26.264 Rel-19. Objective 6 specifically mandates collaboration with SA3 to study security implications including: - Identification and authentication (including schemes for Avatar-related APIs) - Privacy preservation - Content protection (e.g., watermarking and DRM) - Secure distribution mechanisms for Avatar data
TS 26.264 Gaps: - No dedicated security clause exists - Clause 5.6.2.2 NOTE 2 identifies content protection aspects as FFS
TR 26.813 Coverage: - Clause 8 describes Access Protection mechanisms for BAR API - Clause 9 addresses security and privacy aspects - However, no exploration of how these methods apply to Avatar calls in 3GPP systems - Conclusion acknowledges need for robust authentication, encryption, and DRM mechanisms with further SA3 collaboration
TS 33.328 Limitations: - New Annex R (Rel-19) specifies security for IMS avatar communication - Covers procedures to prevent UE from providing unauthorized Avatar IDs - Covers authorization for avatar downloads from BAR - Does not cover security controls to prevent sending UE from using fake avatar representations not belonging to the user
The contribution proposes adding a new sub-clause (suggested as 8.3.4) to the base CR for TR 26.813, specifically under Clause 8 (Avatar integration into 3GPP services and enablers). This new sub-clause should:
[FS_Avatar_Ph2_MED] Authentication for avatar data
This contribution from Nokia proposes authentication mechanisms for avatar data in IMS-based avatar calls as part of the FS_Avatar_Ph2_MED study item (Rel-20). The document addresses security gaps identified in Rel-19 TS 26.264, specifically focusing on authentication schemes for avatar-related APIs.
The Rel-20 SID FS_Avatar_Ph2_MED (approved at SA#110, Dec 2025) includes an objective to study security implications in collaboration with SA3, covering: - Identification and authentication (including schemes for Avatar related APIs) - Privacy preservation - Content protection (watermarking and DRM) - Secure distribution mechanisms for Avatar data
Currently, TR 26.813 and TS 33.328 do not address these security aspects.
The contribution proposes adding a new sub-clause 8.3.4 covering security considerations for IMS-based avatar calls.
Core Concept: - Introduces a Digital Credential-based solution using Base Avatar Assertion (BAA) - BAA cryptographically binds the Base Avatar Representation to the avatar owner - Ensures that a base avatar represents the actual user of the avatar
Architecture Components:
Proves possession of private key corresponding to public key in BAA
Issuer:
Base Avatar Assertion (BAA) Structure: - Digital Credential proving a Base Avatar represents a user owning a specific private/public key pair - Generic structure shown in Figure Y (referenced but not detailed in text)
Figure Z provides an example implementation of authenticator and issuer in the current system architecture (specific details not provided in text).
The contribution proposes to add the above content as a base CR to address authentication requirements for avatar data in IMS-based avatar calls.
[FS_Avatar_Ph2_MED] Media Configuration for Avatar Calls
This discussion paper addresses media configuration requirements for AR-MTSI clients supporting Avatar communication within the context of the Study on Avatar communication Phase 2. The Phase 2 study focuses on enabling additional Avatar use cases and enhancing Avatar-based RTC services with emphasis on quality of experience and advanced animation features for photo-realistic and immersive user experiences.
The document reviews existing media configuration requirements defined in TS 26.264 for AR-MTSI clients:
When network animation and rendering is requested, an AR AS shall: - Allocate an MF capable of real-time avatar rendering - Configure the MF with appropriate rendering parameters based on receiving UE's video capabilities - Modify SDP to route avatar animation data to the MF instead of receiving UE, inserting the MF into the media path
The document identifies a critical gap: TS 26.264 has not yet documented the media configuration details for an AR-MTSI client in terminal that intends to participate in an avatar call.
When media configuration details were proposed in S4-251845 at SA4-134 Dallas meeting, feedback indicated that the behavior of IMS network elements is unspecified in the IMS architecture when an MTSI client sends the new Contact header field parameters "+sip.3gpp-ar-support" and/or "+sip.3gpp-avatar-support" in a SIP REGISTER message.
The document proposes to send a Liaison Statement to SA2 requesting: - Definition of IMS network behavior when an AR-MTSI client registers with Contact header field "+sip.3gpp-ar-support" and/or "+sip.3gpp-avatar-support" - Specification of how to provide a suitable MF capable of providing AR rendering and/or avatar rendering support in an IMS session
A draft LS is provided in companion document S4-260227.
[FS_Avatar_Ph2_MED] LS on IMS network behaviour for new Contact header parameters
SA4 has introduced new media configuration requirements in TS 26.264 for Augmented Reality (AR) and avatar-based MTSI clients. These developments have architectural implications for IMS network behavior that require SA2's attention and clarification.
New Contact Header Parameter: +sip.3gpp-ar-support
New Contact Header Parameter: +sip.3gpp-avatar-support
- "avatar-capable": Terminal can animate and render avatars locally
- "avatar-assisted": Network support required for avatar animation and rendering

Network-Based Avatar Rendering (Clause 7.3.2): When network-based avatar animation/rendering is requested, an AR Application Server shall:
- Allocate a Media Function (MF) capable of real-time avatar rendering
- Configure the MF based on the receiving UE's video capabilities
- Modify SDP to insert the MF into the media path
SA4 has identified that IMS architecture specifications do not currently define the behavior of IMS network elements when MTSI clients include these new Contact header field parameters (+sip.3gpp-ar-support and/or +sip.3gpp-avatar-support) in SIP REGISTER messages.
Is there any architectural guidance on: - Whether or how IMS entities should interpret these parameters? - How suitable Media Functions providing AR rendering and/or avatar rendering support should be selected and invoked?
If guidance is not currently available, SA4 requests SA2 to consider studying, in the IMS architecture specifications, the expected behavior of IMS network elements when an AR-MTSI client registers its capabilities via Contact header field parameters.
This includes (but is not limited to): - Mechanisms for recognizing terminal capabilities during registration - Enabling the provisioning or insertion of appropriate Media Functions to support: - AR rendering - Avatar rendering - Future similar services within an IMS session
SA4 believes such clarification would: - Ensure architectural consistency - Facilitate interoperable deployment of AR and avatar-based services in IMS
SA4 kindly asks SA2 to review the above information and provide guidance on the way forward.
Avatar update to section 6.3.4
This document provides a comprehensive update to section 6.3.4 concerning the MPEG Avatar Representation Format (ARF), now standardized as ISO/IEC 23090-39. The document reflects the progression of the standard from its initial development phase to reaching Committee Draft International Standard (CDIS) stage.
The MPEG WG03 (Systems) workgroup is developing a new standard for avatar representation format with the following scope:
The Phase 1 requirements are categorized with three priority levels (High, Medium, Low) across multiple categories:
High Priority Requirements: - Suitable exchange format for conversion between avatar representation formats - Mesh-based format for representation and animation - Signal coding format - Semantic and signal representation - Multiple levels of detail for geometry - Facial and body animation - Delay-sensitive animation streams - Partial transport of base avatar - Various storage and transport capabilities
Medium Priority Requirements: - DRM protection support - Integration into scene description - Avatar authenticity and user association protection
Low Priority Requirements: - Avatar-avatar, user-avatar, avatar-scene interactions - Storage and replay of animation streams
The ARF data model (Figure 12) includes the following components:
Preamble Section: - Signature string for unique document identification - Version string tied to specific ARF revision - Optional authenticationFeatures (encrypted facial and voice feature vectors with public key URI) - supportedAnimations object specifying compatible animation frameworks (facial, body, hand, landmark, texture) - Optional proprietaryAnimations for vendor-specific schemes
Metadata Object: - Avatar-level descriptive information (name, unique identifier, age, gender) - Used for experience adaptation and policy/access control
Components Section:
- Skeleton: Defines joints with inverse bind matrices, optional animationInfo
- Node: Scene graph objects with names, IDs, parent/child relations, semantic mappings, TRS or 4×4 matrix transformations
- Skin: Links mesh to skeleton, optional blendshape/landmark/texture sets, per-vertex joint weights
- Mesh: Geometric primitives with name, ID, optional path, geometry data items
- BlendshapeSets: Shape targets for the base mesh with optional animationInfo
- LandmarkSets: Vertex/face indices with barycentric weights for tracked landmarks
- TextureSets: Material resources with texture targets and animation links
Two container formats are supported:
May include animation track with time-based samples
Zip-based containers (ISO/IEC 21320-1):
Both formats support partial access to avatar components.
The reference architecture (Figure 13) includes:
The animation stream format uses AAUs as the fundamental structure (Figure 14):
AAU Structure:
- Header:
  - AAU type (7-bit code)
  - AAU payload length (bytes)
- Payload:
  - 32-bit timestamp in "ticks"
  - Type-specific data
  - Optional padding for byte alignment
AAU Types: - AAU_CONFIG: Configuration unit - AAU_BLENDSHAPE: Facial animation sample - AAU_JOINT: Body/hand joint animation sample - AAU_LANDMARK: Landmark animation sample - AAU_TEXTURE: Texture animation sample
Configuration AAUs communicate stream-level parameters: - Animation profile string (UTF-8 encoded) - Timescale value (32-bit float, ticks per second)
Structure includes: - Target blendshape set identifier - Per-blendshape confidence flag - Number of blendshape entries - For each entry: blendshape index, weight (32-bit float), optional confidence (32-bit float)
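For illustration, a parser for such a payload might look as follows (the exact field widths and ordering here are assumptions for the sketch, not the normative ISO/IEC 23090-39 bitstream syntax):

```python
import struct

def parse_blendshape_aau_payload(buf):
    """Parse an illustrative blendshape-AAU payload.

    Assumed little-endian layout (a sketch, not normative syntax):
      uint32 timestamp (ticks) | uint8 target set id | uint8 confidence flag |
      uint16 entry count | per entry: uint16 blendshape index, float32 weight,
      float32 confidence (present only if the confidence flag is set)
    """
    ts, set_id, conf_flag, count = struct.unpack_from('<IBBH', buf, 0)
    off = 8
    entries = []
    for _ in range(count):
        idx, weight = struct.unpack_from('<Hf', buf, off)
        off += 6
        conf = None
        if conf_flag:
            (conf,) = struct.unpack_from('<f', buf, off)
            off += 4
        entries.append((idx, weight, conf))
    return ts, set_id, entries
```

The per-entry confidence field being conditional on a single stream-level flag mirrors the "per-blendshape confidence flag" in the structure above.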
Deformation Formula:
v = v₀ + Σₖ wₖ · Δvₖ
Where:
- v₀: base vertex position
- Δvₖ: offset for blendshape k
- wₖ: transmitted blendshape weight
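Vectorized over all vertices, the formula above is a single NumPy expression (array shapes are illustrative):

```python
import numpy as np

def deform(v0, deltas, w):
    """Apply v = v0 + sum_k w_k * dv_k over all vertices.

    v0     : (V, 3) base vertex positions
    deltas : (K, V, 3) per-blendshape vertex offsets
    w      : (K,) transmitted blendshape weights
    """
    return v0 + np.einsum('k,kvj->vj', w, deltas)
```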
Structure includes: - Target joint set identifier - Per-joint velocity flag - Number of joint entries - For each entry: joint index, 4×4 transformation matrix (16 floats), optional 4×4 velocity matrix
Linear Blend Skinning (LBS) Formula:
vᵢ = Σⱼ wᵢⱼ · Mⱼ · vᵢ⁰
Where:
- wᵢⱼ: weight of joint j on vertex i
- Mⱼ: global transformation matrix for joint j
- vᵢ⁰: rest position of vertex i
Structure includes: - Landmark set ID - Velocity and confidence flags - Dimensionality flag (2D vs. 3D) - Number of landmarks - For each landmark: index, coordinates (2D or 3D), optional velocity and confidence
Use cases: facial tracking overlays, sensor-mesh registration, animation data calibration
Structure analogous to blendshape samples but applied to texture targets: - Controls parametric texture effects (micro-geometry patterns, makeup, dynamic material variations)
Dual delivery modes: 1. Live transmission: Sequences of AAUs for real-time avatar driving 2. Stored format: Avatar animation tracks in ISOBMFF-based ARF container with sample grouping for pre-recorded sequences ("smile," "wave," "dance")
The group continues exploration on:
[FS_Avatar_Ph2_MED] Procedures for BAR API Operations
This contribution addresses procedures for Base Avatar Repository (BAR) APIs that were defined in Rel-19 as part of the AvCall-MED work item and integrated into TS 26.264 Annex B. The document was originally presented as S4-251909 at SA4#134 meeting but was redirected to the Rel-20 FS_Avatar_Ph2_MED study. The contribution provides detailed operational procedures for BAR APIs enabling UE or MF interaction with the Base Avatar Repository.
Procedure Flow:
- Requestor (DC AS or MF) invokes Mbar_Management_Avatars_CreateBaseAvatarModel via HTTP POST
- Binary ARF container included in request body
- BAR authenticates/authorizes request
- Upon authorization, BAR stores ARF container locally and creates Avatar resource entity with globally unique identifier
- Response: 201 Created with Avatar resource entity
Request Information Elements: - Security credentials (M) - Binary ARF container (M)
Response Information Elements: - Avatar resource entity (CM) - present on successful creation
Note: The DC AS or BAR may apply restrictions to the created avatar container (location-based access, User ID authentication, etc.)
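The creation flow above can be sketched as building the HTTP POST request (the resource path `/avatars` and bearer-token credentials are assumptions for illustration; the request is built but not sent):

```python
import urllib.request

def build_create_request(bar_url, arf_bytes, token):
    """Build the POST for Mbar_Management_Avatars_CreateBaseAvatarModel.

    The binary ARF container goes in the request body; on success the
    BAR answers 201 Created with the Avatar resource entity, including
    its globally unique identifier.
    """
    return urllib.request.Request(
        url=f"{bar_url}/avatars",                 # hypothetical resource path
        data=arf_bytes,                           # binary ARF container
        method="POST",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/octet-stream"},
    )

req = build_create_request("https://bar.example.com/mbar/v1", b"\x00ARF", "tok")
```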
Procedure Flow:
- Requestor invokes Mbar_Management_BaseAvatarModels_GetBaseAvatarModel via HTTP GET
- {avatarId} replaced in resource path
- BAR retrieves ARF container corresponding to avatarId
- Response: 200 OK with Avatar resource and binary ARF container
Request Information Elements: - Security credentials (M)
Response Information Elements: - Avatar resource entity (M) - Binary container (M)
Procedure Flow:
- Requestor invokes Mbar_Management_Avatars_UpdateBaseAvatarModel via HTTP PUT or PATCH
- HTTP PUT: Binary ARF container in body (full replacement)
- HTTP PATCH: Multipart/mixed message per RFC 2046 with asset identifiers list and binary assets
- BAR authenticates/authorizes and performs update
- Response: 200 OK with updated Avatar resource entity
Request Information Elements: - Security credentials (M) - Binary container (CM) - PUT only - AssetIds (CM) - PATCH only - Assets (CM) - PATCH only
Response Information Elements: - Avatar resource entity (CM) - present on successful update
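The PATCH variant carries a multipart/mixed message per RFC 2046 with the asset identifier list and the replacement binary assets. A sketch of assembling such a body (the part ordering and the JSON field name `assetIds` are assumptions for illustration):

```python
import json
import uuid

def build_patch_body(asset_ids, assets):
    """Assemble a multipart/mixed body: a JSON part listing the asset
    identifiers, followed by one binary part per replacement asset."""
    boundary = uuid.uuid4().hex
    parts = [
        f"--{boundary}\r\nContent-Type: application/json\r\n\r\n"
        f"{json.dumps({'assetIds': asset_ids})}\r\n".encode()
    ]
    for blob in assets:
        parts.append(
            f"--{boundary}\r\nContent-Type: application/octet-stream\r\n\r\n".encode()
            + blob + b"\r\n"
        )
    parts.append(f"--{boundary}--\r\n".encode())       # closing delimiter
    content_type = f"multipart/mixed; boundary={boundary}"
    return content_type, b"".join(parts)

ct, body = build_patch_body(["asset-1"], [b"\x01\x02"])
```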
Procedure Flow:
- Requestor invokes Mbar_Management_Avatars_DeleteBaseAvatarModel via HTTP DELETE
- BAR authenticates/authorizes request
- Upon authorization, BAR deletes ARF container and destroys resource
- Response: 204 No Content
Request Information Elements: - Security credentials (M)
Response: No payload
Procedure Flow:
- Requestor invokes Mbar_Management_Assets_CreateAsset via HTTP POST
- {avatarId} replaced in resource path
- BAR retrieves avatar container and adds binary asset
- Response: 201 Created with assetId
Request Information Elements: - Security credentials (M) - Binary asset (M)
Response Information Elements: - Avatar resource entity (M) - updated container with new asset
Procedure Flow:
- Requestor invokes Mbar_Management_Assets_RetrieveAsset via HTTP GET
- {avatarId} and {assetId} replaced in resource path
- BAR retrieves ARF container and extracts asset
- Response: 200 OK with binary asset
Request Information Elements: - Security credentials (M)
Response Information Elements: - Binary asset (M)
Procedure Flow:
- Requestor invokes Mbar_Management_Assets_UpdateAsset via HTTP PUT or PATCH
- HTTP PUT: Binary asset in body (full replacement)
- HTTP PATCH: Multipart/mixed message with LoDs/components to replace
- BAR authenticates/authorizes and performs update
- Response: 200 OK with updated Avatar resource entity
Request Information Elements: - Requestor identifier (M) - Security credentials (M) - Asset (CM)
Response Information Elements: - Avatar resource entity (CM) - present on successful update
Procedure Flow:
- Requestor invokes Mbar_Management_Assets_DestroyAsset via HTTP DELETE
- BAR authenticates/authorizes request
- Upon authorization, retrieves ARF container, deletes asset, and may repackage container
- Response: 204 No Content
Request Information Elements: - Security credentials (M)
Response: No payload
Procedure Flow:
- Requestor invokes Mbar_Management_AvatarRepresentations_CreateAvatarRepresentation via HTTP POST
- Request body includes Avatar Representation resource without avatarRepresentationId
- BAR authenticates/authorizes and creates AvatarRepresentation resource entity with globally unique identifier
- Response: 201 Created
Request Information Elements: - Security credentials (M) - Avatar representation (M) - with avatarId and assetIds properties set
Response Information Elements: - AvatarRepresentation resource entity (CM) - present on successful creation
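A sketch of the POST body for this procedure, using the `avatarId` and `assetIds` properties named in the request Information Elements (the JSON shape beyond those two properties is an assumption):

```python
import json

def make_avatar_representation_body(avatar_id, asset_ids):
    """Build the Avatar Representation resource for the POST body.

    Per the procedure, the requestor omits avatarRepresentationId;
    the BAR assigns a globally unique identifier and returns the
    resource entity with 201 Created.
    """
    return json.dumps({
        "avatarId": avatar_id,   # Avatar resource this representation refers to
        "assetIds": asset_ids,   # assets selected from that Avatar resource
    })

body = make_avatar_representation_body("av-1", ["as-1", "as-2"])
```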
Procedure Flow:
- Requestor invokes Mbar_Management_AvatarRepresentations_GetAvatarRepresentation via HTTP GET
- {avatarId} replaced in resource path
- BAR retrieves assets listed in Avatar Representation and compiles container
- Response: 200 OK with Avatar Representation resource and binary container
Request Information Elements: - Security credentials (M)
Response Information Elements: - AvatarRepresentation resource entity (M) - Binary container (M)
Procedure Flow:
- Requestor invokes Mbar_Management_AvatarRepresentations_UpdateAvatarRepresentation via HTTP PUT or PATCH
- HTTP PUT: Avatar Representation object in body (full replacement)
- HTTP PATCH: Multipart/mixed message with asset identifier mappings (source to replacement) or map data structure
- Only asset identifiers existing in respective Avatar resource are allowed
- BAR authenticates/authorizes and performs update
- Response: 200 OK with updated Avatar Representation resource entity
Note: Only the avatar representation owner is allowed to modify the representation
Request Information Elements: - Security credentials (M) - Avatar Representation (CM) - PUT only - Source Asset Ids (CM) - PATCH only - New Asset Ids (CM) - PATCH only
Response Information Elements: - AvatarRepresentation resource entity (CM) - present on successful update
Procedure Flow:
- Requestor invokes Mbar_Management_AvatarRepresentations_DeleteAvatarRepresentation via HTTP DELETE
- BAR authenticates/authorizes and deletes Avatar Representation resource
- Response: 204 No Content (or 200 OK if response body needed)
Request Information Elements: - Security credentials (M)
Response: No payload
Procedure Flow:
- Requestor invokes Mbar_Management_AssociatedInformation_GetAssociatedInformation via HTTP GET
- {avatarId} replaced in resource path
- BAR retrieves AssociatedInfo object from Avatar Representation resource
- Response: 200 OK with AssociatedInfo object
Request Information Elements: - Security credentials (M)
Response Information Elements: - AssociatedInfo object (M)
The contribution proposes to: - Document section 2 contents as new clause 8.3.3.4 in aggregated CR to TR 26.813 - Add editor's note to clause 8.3.3.2 indicating need for updates to reflect BAR APIs defined in TS 26.264
IETF RFC 2046: "Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types"
[FS_Avatar_Ph2_MED] Avatar update to section 6.3.4
This contribution updates section 6.3.4 of the AVATAR specification to align with the current status of MPEG Avatar Representation Format (ARF) work. The document reflects progression of ISO/IEC 23090-39 from early development to Committee Draft International Standard (CDIS) stage.
The document significantly restructures how the ARF data model components are described:
High-level Avatar Information (Metadata Object): - Name, Identifier, Age, Gender - Holds avatar-level descriptive information for system adaptation and policy control
Preamble Section (new addition): - Signature string for unique document identification - Version string tied to specific ARF revision - Optional authenticationFeatures with encrypted facial/voice feature vectors and public key URI - supportedAnimations object specifying compatible facial, body, hand, landmark, and texture animation frameworks using URNs - Optional proprietaryAnimations for vendor-specific schemes (e.g., ML-based reconstruction models)
Components Section (detailed expansion): - Skeleton: Defines joints as scene graph nodes subset, references inverse bind matrices data item (Nx16 tensor), optional animationInfo - Node: Scene graph objects with names, IDs, parent/child relations, semantic mappings, TRS or 4x4 matrix transformations - Skin: Links mesh to skeleton, optional blendshape/landmark/texture sets, per-vertex joint weights tensor (NxM) - Mesh: Geometric primitives with name, ID, optional path, data items containing geometry - BlendshapeSets: Shape targets for base mesh, references geometry-only shapes (GLB files), optional animationInfo - LandmarkSets: Vertex/face indices with barycentric weights for landmark positioning - TextureSets: Material resources linked to texture targets and animation frameworks
Major update from "under development" to defined implementation:
arfref Module (C++ and Python): - Parsing of ARF containers - Helper functions for asset decoding - Partial glTF 2.0 encoding/decoding support for meshes - Animation mapping (AnimationLink objects) - Animation stream decoding
arfviewer Module: - Avatar Animation Units (AAUs) support - Time-sequence blendshape weights with optional confidence metrics - Joint transformations for skeletal animation - AAU format with chronological data blocks - Inverse kinematics system for missing joint information - Blendshape animator managing neutral mesh vertices and deltas with weighted summation
A comprehensive new section details the AAU-based animation stream format.
The study status changed from "initiated" to "continues".