Meeting: TSGS4_135_India | Agenda Item: 9.8
11 documents found
| TDoc Number | Source | Title | Summary |
|---|---|---|---|
| Qualcomm Atheros, Inc. |
[FS_Avatar_Ph2_MED] 3D Gaussian Splatting Avatar Methods for Real-Time Communication
|
**3D Gaussian Splatting Avatar Methods for Real-Time Communication**

**Introduction**

This contribution surveys 3D Gaussian Splatting (3DGS) methods for avatar representation in the context of the Avatar Communication Phase 2 study (FS_Avatar_Ph2_MED, SP-251663), specifically addressing Objective 3 on animation techniques for avatar reconstruction and rendering. The document evaluates 3DGS methods for real-time communication scenarios and their compatibility with the MPEG Avatar Representation Format (ARF, ISO/IEC 23090-39).

**Key Technical Background:**
- 3DGS represents objects as sets of anisotropic 3D Gaussians (splats)
- Each Gaussian stores: 3D mean position, oriented covariance (ellipsoidal footprint), opacity, and appearance parameters (RGB or spherical harmonics coefficients)
- Rendering projects 3D Gaussians into screen space as 2D Gaussians with depth-ordered alpha compositing
- Achieves real-time rendering at 100-370 FPS on desktop GPUs with quality comparable to neural radiance fields
- Maps well to GPU compute and graphics pipelines

**Critical Question for Avatar Communication:** How Gaussians deform under animation: either by binding to parametric meshes (FLAME for faces, SMPL/SMPL-X for bodies) or by using small neural networks for residual motion prediction.

**Survey of 3DGS Avatar Methods**

**Head and Face Avatar Methods**

**Key Differentiation Axes**

Head/face methods differ along three practical dimensions:
**Most interoperable approaches:** Fully explicit runtime with Gaussians driven from the same blendshape and skeletal parameters as mesh renderers.

**Method Comparison**

| Method | FPS | Gaussians | Parametric Model | Runtime MLP | Key Feature |
|--------|-----|-----------|------------------|-------------|-------------|
| GaussianBlendshape | 370 | 70K | Custom blendshapes | No | Linear blending identical to mesh blendshapes, 32-39 dB PSNR |
| SplattingAvatar | 300+ | ~100K | FLAME mesh | No | Mesh-embedded via barycentric coords, 30 FPS on iPhone 13 |
| FlashAvatar | 300 | 10-50K | FLAME | Small MLP | UV-based init on FLAME, small MLPs for expression offsets |
| GaussianAvatars | 90-100 | ~100K | FLAME | No | FLAME-rigged, multi-view training, explicit binding |
| HHAvatar | ~100 | ~150K | FLAME | Temporal modules | First method for dynamic hair physics modeling |
| MeGA | ~90 | ~200K | FLAME (face) + 3DGS (hair) | No | Hybrid mesh+Gaussian, occlusion-aware blending, editable |

**Standout Methods for Real-Time Communication:** GaussianBlendshape and SplattingAvatar use purely explicit representations with no runtime neural networks, enabling deterministic rendering and direct ARF compatibility.
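The "linear blending identical to mesh blendshapes" highlighted for GaussianBlendshape amounts to one weighted sum over per-Gaussian attribute deltas. A minimal illustrative sketch, not taken from the contribution; the attribute layout and function name are assumptions:

```python
import numpy as np

def blend_gaussians(neutral, deltas, weights):
    """Animate Gaussian attributes by linear blendshape combination.

    neutral: (N, D) per-Gaussian attributes for the neutral expression
             (e.g. D = 3 position + 3 scale + 1 opacity).
    deltas:  (K, N, D) per-expression attribute deltas.
    weights: (K,) blendshape weights -- the same weight stream that
             drives the mesh blendshapes.
    """
    # attributes(t) = neutral + sum_k w_k * delta_k
    # Purely algebraic, no neural inference, so the result is deterministic.
    return neutral + np.tensordot(weights, deltas, axes=1)

# Toy example: 2 Gaussians, 7 attributes each, 2 expressions.
neutral = np.zeros((2, 7))
deltas = np.ones((2, 2, 7))
out = blend_gaussians(neutral, deltas, np.array([0.25, 0.25]))
```

Because the same weight vector drives both the mesh vertices and the Gaussian deltas, a receiver can fall back to mesh-only rendering without changing the animation stream.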
**Mesh-Embedded Gaussian Splatting**

**Technical Approach:**
- Each Gaussian anchored to an animatable mesh supporting standard blendshapes and skeletal skinning
- Parameterization: triangle index + barycentric coordinates + optional offset vector in a local tangent-normal frame
- Runtime: the receiver deforms the mesh using joint transforms and blendshape weights, then reconstructs each Gaussian center from the animated triangle vertices using barycentric weights
- No per-frame neural inference required: purely algebraic reconstruction ensures deterministic motion

**Orientation and Footprint Handling:**
- Gaussians stored in a local frame aligned to the triangle (per-axis scales + local rotation)
- Local-to-world transform from the animated triangle frame transports the covariance
- Keeps the projected splat stable under motion and avoids jitter
- Appearance parameters (opacity, color coefficients) remain static unless dynamic effects are explicitly modeled

**Standardization Advantages:**
- Reuses the same animation signals as the mesh avatar
- Enables graceful fallback: mesh-only renderers can ignore the Gaussian extension and still animate
- 3DGS-capable renderers can render Gaussians alone or a hybrid mesh+Gaussian composition

**Limitation:** A coarse driving mesh can restrict fine-scale effects (lip roll, eyelid thickness, hair motion). Addressed by higher-resolution parametric meshes, local offsets, or dedicated Gaussian subsets for non-mesh components.
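The runtime reconstruction described above is a few lines of arithmetic. A hedged sketch, assuming the offset is stored as a scalar along the triangle normal (the contribution allows a full tangent-normal-frame vector; names are illustrative):

```python
import numpy as np

def gaussian_center(tri_verts, bary, offset=0.0):
    """Reconstruct a mesh-embedded Gaussian center after mesh animation.

    tri_verts: (3, 3) animated positions of the anchoring triangle's vertices.
    bary:      (3,) barycentric coordinates (non-negative, summing to 1).
    offset:    displacement along the animated triangle normal (simplified
               normal component of the optional local-frame offset).
    """
    # Barycentric interpolation of the animated vertex positions.
    p = bary @ tri_verts
    # Unit normal of the animated triangle carries the optional offset.
    n = np.cross(tri_verts[1] - tri_verts[0], tri_verts[2] - tri_verts[0])
    n = n / np.linalg.norm(n)
    return p + offset * n

# Unit triangle in the xy-plane, Gaussian at the centroid, lifted by 0.1.
tri = np.array([[0.0, 0.0, 0.0],
                [1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0]])
c = gaussian_center(tri, np.array([1/3, 1/3, 1/3]), offset=0.1)
```

Since only `tri_verts` changes per frame, the reconstruction is deterministic given the mesh deformation, which is the property the contribution emphasizes for interoperability.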
**Gaussian Blendshapes**

**Technical Approach:**
- Mirrors classical mesh blendshape animation
- Each Gaussian has neutral parameters + per-expression deltas for center position, scale, and opacity
- Runtime computes a linear combination identical to the mesh blendshape pipeline
- Key advantage: determinism and ARF-friendly control; the same blendshape weight stream drives both mesh vertices and Gaussian deltas

**Hybrid Methods with Small MLPs**

**Technical Approach:**
- Parametric model for global control + small neural modules outputting residual offsets conditioned on expression, pose, or time
- Improves fine detail and handles effects difficult to capture with purely linear blendshapes

**Tradeoff:** Runtime inference and model distribution become part of interoperability (model versioning, determinism, platform-specific performance)

**Full-Body Avatar Methods**

Full-body methods have converged on SMPL/SMPL-X parametric body models, enabling compatibility with standard skeletal animation systems.

**Method Comparison**

| Method | FPS | Gaussians | Body Model | Training | Key Feature |
|--------|-----|-----------|------------|----------|-------------|
| GauHuman | 189 | ~13K | SMPL | 1-2 min | Fastest training, ~3.5 MB storage, KL divergence split/clone |
| HUGS | 60 | ~200K | SMPL | 30 min | Disentangles human/scene |
| ASH | ~60 | ~100K | SMPL | ~1 hour | 2D texture-space parameterization, Dual Quaternion skinning, motion retargeting |
| GART | >150 | ~50K | SMPL | sec-min | Latent bones for non-rigid deformations (dresses, loose clothing) |
| ExAvatar | ~60 | ~150K | SMPL-X | ~2 hours | Only SMPL-X method with unified body/face/hand animation |

**Standout Methods:**
- GauHuman: best combination of minimal storage (~3.5 MB) and fast training (1-2 min)
- ExAvatar: the only method providing unified body/face/hand animation through SMPL-X, critical for immersive communication

**Animation Architecture:**
- Body model provides a compact, standardized animation interface
- Base avatar: static set of Gaussians + binding metadata
- Runtime: joint transforms from SMPL/SMPL-X pose parameters deform the body via skinning
- Gaussian propagation: surface anchoring (barycentric/UV coordinates) or direct skinning weights per Gaussian
- Enables motion retargeting by sending only the pose stream while keeping the high-fidelity Gaussian appearance fixed

**Non-Rigid Effects Challenge:**
- Clothing, long hair, and accessories do not follow the body surface under rigid skinning
- Solutions: latent bones or local deformation modules (additional control points beyond the SMPL skeleton)
- ARF integration consideration: distinguish between body-locked Gaussians (fully driven by the standardized skeleton) and secondary Gaussians (which may require optional control signals or local simulation)

**Distribution Size Considerations:**
- Full-body avatars require tens to hundreds of thousands of Gaussians
- Each Gaussian includes geometry and appearance attributes
- Compression and level-of-detail are essential for real deployments
- A practical ARF profile should specify a default Gaussian count budget and allow progressive refinement layers for high-end devices

**Animation Compatibility Classification**

Methods are classified into three categories based on runtime architecture:

**1. Purely Explicit (no MLPs)**
- Methods: SplattingAvatar, GaussianBlendshape, GaussianAvatars
- Performance: 300-370 FPS
- ARF Compatibility: direct mapping
- Animation: driven entirely by standard skeletal joints and blendshape weights
- Fully compatible with the ARF Animation Stream Format

**2. Hybrid (small MLPs)**
- Methods: 3DGS-Avatar, FlashAvatar, HUGS
- Performance: 50-100 FPS (near-real-time)
- Architecture: small MLPs add expression-dependent offsets without fundamentally changing the animation interface
- ARF Integration: can still be driven by blendshape parameters, with MLP weights distributed as part of the base avatar

**3. Fully Neural**
- Methods: Gaussian Head Avatar, GaussianHead
- Training: 1-2 days
- Latency: higher
- ARF Integration: may be integrated into ARF containers as proprietary customized models

**Interoperability Key Question:** Not whether an MLP exists, but whether the animation interface remains the same. If driven solely by joints and blendshape weights, the ARF Animation Stream Format remains sufficient and the decoder only needs a renderer choice.

**Determinism Considerations:**
- Explicit methods: naturally deterministic given fixed floating-point rules; no platform-specific neural inference dependency
- Hybrid methods: viable if the MLP is small and shipped as part of the base avatar, but conformance should define fixed operator sets and numerical tolerances
- Fully neural pipelines: better treated as optional proprietary components inside the ARF container rather than as a baseline interoperable tool

**Proposed Architecture for ARF Integration**

**Four-Step Integration Approach**

**Step 1: Storage**
- Store mesh-embedded Gaussians as auxiliary data within glTF/ARF containers
- Parameterization: relative to the mesh surface using barycentric coordinates (SplattingAvatar) or linear blendshape offsets (GaussianBlendshape)
- Preserves backward compatibility with mesh-only renderers

**Step 2: Animation**
- Animate via the standard skeletal and blendshape parameters already defined in the ARF Animation Stream Format
- No changes to the animation stream required
- Gaussian positions derived from the same joint transforms and blendshape weights used for mesh animation

**Step 3: Compression**
- Apply GS compression for Gaussian attributes within the base avatar to minimize distribution size

**Step 4: Streaming**
- Stream only AAUs at approximately 40 KB/s for real-time animation
- Base avatar (including compressed Gaussian data) distributed once at session establishment
- Enables high-quality Gaussian splatting rendering on capable devices while maintaining mesh-based rendering compatibility on constrained devices

**Deployment Requirements**

**Capability Exchange:**
- Endpoints signal support for 3DGS rendering
- Supported attribute sets
- Supported Gaussian count budgets
- Fallback to mesh rendering if 3DGS is not supported or resources are constrained
- Avoids ecosystem fragmentation and maintains backward compatibility

**Proposals**

The document proposes that SA4 considers the following for the FS_Avatar_Ph2_MED study:
|
|
| Qualcomm Atheros, Inc. |
[FS_Avatar_Ph2_MED] Avatar Evaluation Framework and Objective Metrics
|
**Summary of S4-260121: Avatar Evaluation Framework and Objective Metrics**

**Introduction**

This contribution addresses Objectives 2 and 3 of the Avatar Communication Phase 2 SID (SP-251663), which concern QoE metrics, evaluation frameworks, and evaluation criteria for animation techniques. The document proposes a practical evaluation methodology designed to deliver repeatable, automated, and vendor-neutral results based on a core principle: evaluate what the user actually sees by measuring quality from rendered video output rather than internal system parameters.

**Evaluation Framework**

**Design Principles**

The framework is built on four key principles:
**Testbed Architecture**

The proposed testbed comprises five key components:
**Objective Metrics for Avatar Evaluation**

The contribution proposes metrics across three quality dimensions:

**Visual Quality Metrics**
**Animation Quality Metrics**

Video-based computation extracting landmarks and skeletons from rendered output:
**Temporal and Synchronization Metrics**

Proposed for the second phase of evaluation due to complexity:
**Test Content**

Standardized animation streams should cover:
Each test set should contain reference audio, reference animation streams, and reference rendered video from both the high-quality reference pipeline and the source capture.

**Proposals**

The contribution proposes to:
|
|
| Qualcomm Atheros, Inc. |
[FS_Avatar_Ph2_MED] Interoperability guidance for ARF
|
**Interoperability Guidance for ARF**

**Introduction**

This contribution addresses the FFS noted in TS 26.264 clause 5.6.1 regarding evaluation of MPEG ARF and interoperability aspects. The key interoperability challenge is mapping: receivers can only animate an avatar if they can correctly map incoming animation parameters to the appropriate Skeleton, BlendshapeSet, and LandmarkSet in the ARF container. ISO/IEC 23090-39 defines signalling to declare supported animation frameworks and provide mapping tables. This contribution proposes concrete interoperability guidance with detailed examples for both linear and non-linear mappings.

**Interoperability Framework**

**Interoperability Principles**

The proposed guidance is based on four core principles:
**Mapping Signalling in ARF**

ARF provides three signalling layers for mapping between animation frameworks:

**SupportedAnimations:** Lists supported face, body, hand, landmark, and texture animation profiles as URNs. Each URN identifies a framework and a specific parameter set (e.g., blendshape set or joint set).

**AnimationInfo and AnimationLink:** Each animatable asset in components (Skeleton, BlendshapeSet, LandmarkSet) includes animationInfo. Each AnimationLink points to one SupportedAnimations entry as the target for that asset.

**Mapping Objects:** When additional frameworks are used for capture or streaming, animationInfo can include Mapping objects that map from a source SupportedAnimations entry to the target entry. Two mapping types are supported:
- LinearAssociation: expresses a weighted sum from multiple source parameters to one target parameter
- NonLinearAssociation: expresses non-linear transforms using one or more channels with lookup tables and interpolation

Mapping indices refer to parameter identifiers in the animation stream (ShapeKey.id for blendshapes, target joint index for joint animation, target landmark index for landmark animation).

**Receiver Processing Procedure**

The receiver applies the following procedure:
**Mapping Mechanisms**

**Direct Match and Identifier Spaces**

The simplest case occurs when the sender generates the animation stream using the same framework and parameter set as the target asset in ARF.

| Scenario | Typical Issue | ARF Signalling and Behaviour |
|----------|---------------|------------------------------|
| Direct match | Stream profile and parameter identifiers match target assets in the ARF container | No mapping needed. The receiver applies parameters directly. The ARF document declares the profile in SupportedAnimations and links target assets with AnimationLink.target |
| Subset | Source and target use the same semantics but the target has fewer parameters | Unmapped target parameters default to neutral values |

**Linear Mappings**

Linear mappings are suitable when a target parameter can be expressed as a weighted sum of one or more source parameters. Typical use cases include mirroring left/right shapes, splitting/merging parameters, and simple scaling. Represented in ARF by LinearAssociation with targetIndex, sourceIndices, and weights.

Examples:

| Target Parameter (ARF) | Source Parameters (Stream) | Linear Association |
|------------------------|----------------------------|--------------------|
| Smile (targetIndex 12) | mouthSmileLeft (5), mouthSmileRight (6) | w12 = 0.5·w5 + 0.5·w6 |
| JawOpen (targetIndex 3) | jawOpen (13) | w3 = 1.0·w13 |
| MouthCornerPull (targetIndex 20) | mouthSmileLeft (5), mouthSmileRight (6), cheekSquintLeft (26), cheekSquintRight (27) | w20 = 0.4·w5 + 0.4·w6 + 0.1·w26 + 0.1·w27 |

**Non-linear Mappings**

Non-linear mappings are needed when linear blending is insufficient. Typical cases include dead zones, saturation, perceptual calibration curves, and gating where one parameter modulates another. Represented in ARF by NonLinearAssociation. Each channel maps one source parameter through a lookup table defined by Data items. Channel outputs are combined using COMBINATION_SUM or COMBINATION_MUL.
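To make the two mapping types concrete, here is an illustrative evaluation sketch. The data layout is an assumption for readability; ARF carries these as LinearAssociation/NonLinearAssociation objects in animationInfo, not as Python structures:

```python
import numpy as np

def apply_linear(source, source_indices, weights):
    """LinearAssociation: target value = weighted sum of source parameters."""
    return sum(w * source[i] for i, w in zip(source_indices, weights))

def apply_nonlinear(source, channels, combine="sum"):
    """NonLinearAssociation sketch: each channel maps one source parameter
    through a lookup table (linear interpolation stands in for the
    signalled interpolation mode); channel outputs are combined, mirroring
    COMBINATION_SUM / COMBINATION_MUL."""
    outs = [np.interp(source[idx], lut_in, lut_out)
            for idx, lut_in, lut_out in channels]
    return float(sum(outs)) if combine == "sum" else float(np.prod(outs))

# Incoming stream values, keyed by source parameter identifier.
stream = {5: 0.8, 6: 0.6, 13: 0.4}

# Smile (targetIndex 12) from mouthSmileLeft (5) and mouthSmileRight (6).
smile = apply_linear(stream, [5, 6], [0.5, 0.5])

# JawOpen (targetIndex 3): dead-zone/saturation curve on jawOpen (13),
# using the LUT from the examples table below.
jaw = apply_nonlinear(stream, [(13, [0.0, 0.1, 0.4, 1.0],
                                    [0.0, 0.0, 0.7, 1.0])])
```

The receiver-side cost is thus a handful of multiply-adds and table lookups per target parameter, which supports the contribution's claim that mapping does not threaten real-time budgets.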
Examples:

| Target Parameter (ARF) | Source Parameter(s) | Non-linear Mapping |
|------------------------|---------------------|--------------------|
| JawOpen (targetIndex 3) | jawOpen (13) | Piecewise curve with dead zone and saturation. Example: input [0.0, 0.1, 0.4, 1.0] maps to output [0.0, 0.0, 0.7, 1.0] with INTERPOLATION_LINEAR |
| Blink (targetIndex 7) | eyeBlinkLeft (1), eyeBlinkRight (2) | Each eye uses a threshold curve. INTERPOLATION_STEP converts the soft signal into a binary blink. Combine with COMBINATION_SUM and clamp to [0, 1] |
| MouthOpenSmile (targetIndex 30) | jawOpen (13) and Smile (12, after linear mapping) | Use COMBINATION_MUL to gate the smile by jaw opening. Channel 1 maps jawOpen through a dead-zone curve. Channel 2 maps smile through an S-curve. Multiply the channel outputs |
| BrowRaise (targetIndex 15) | browInnerUp (9) | Gamma curve to better match the target rig. Example: output = pow(input, 0.5). Approximated with a LUT and INTERPOLATION_CUBICSPLINE |
| Landmark mouthMidTop (targetIndex 18) | landmarks 50 and 52 | Non-linear only if needed for stabilization or bias compensation. Example: apply a LUT to compress extreme motion before writing the 2D or 3D coordinate |

**Proposal**

The contribution proposes:
|
|
| Nokia |
[FS_Avatar_Ph2_MED] Draft LS on MPEG-I ARF compression aspects
|
**3GPP SA4 LS on Compression Aspects of MPEG-I ARF (ISO/IEC DIS 23090-39)**

**Document Overview**

This is a Liaison Statement (LS) from 3GPP TSG SA WG4 to ISO/IEC JTC1/SC29/WG7 and WG3 regarding compression aspects of avatar representation formats for the Release 20 work on avatar communication Phase 2.

**Background Context**

**Release 19 Baseline**
**Release 20 Phase 2 Study**
**Technical Questions to ISO/IEC**

SA4 is seeking clarification on two critical aspects:

**Question 1: Existing MPEG Compression Technologies**

Are there existing MPEG technologies that can be utilized to compress:
- Avatar static data (especially meshes)
- Avatar animation data, including:
  - Blend shape sets
  - Skeletal animation
  - Other animation-related information

**Question 2: Integration Timeline**

If such compression technologies exist:
- Are there plans to integrate them into ISO/IEC DIS 23090-39?
- What is the anticipated timeline for such integration in the context of the 3GPP Release 20 schedule?

**Requested Action**

SA4 formally requests ISO/IEC SC29/WG7 and ISO/IEC SC29/WG3 to provide answers to both questions above, considering the Release 20 timeline constraints. |
|
| Nokia |
[FS_Avatar_Ph2_MED] Considerations on security aspects
|
**Summary of S4-260190: Considerations on Security Aspects for Avatar Phase 2**

**Document Overview**

This contribution from Nokia addresses security-related gaps in the Rel-20 study item FS_Avatar_Ph2_MED, specifically focusing on security mechanisms for Avatar communications in 3GPP systems.

**Background and Context**

**Study Item Scope**

The Rel-20 SID FS_Avatar_Ph2_MED (approved at SA#110, December 2025) aims to address gaps from previous work and resolve open points identified in TS 26.264 Rel-19. Objective 6 specifically mandates collaboration with SA3 to study security implications, including:
- Identification and authentication (including schemes for Avatar-related APIs)
- Privacy preservation
- Content protection (e.g., watermarking and DRM)
- Secure distribution mechanisms for Avatar data

**Current Status in Specifications**

**TS 26.264 Gaps:**
- No dedicated security clause exists
- Clause 5.6.2.2 NOTE 2 identifies content protection aspects as FFS

**TR 26.813 Coverage:**
- Clause 8 describes Access Protection mechanisms for the BAR API
- Clause 9 addresses security and privacy aspects
- However, there is no exploration of how these methods apply to Avatar calls in 3GPP systems
- The conclusion acknowledges the need for robust authentication, encryption, and DRM mechanisms with further SA3 collaboration

**TS 33.328 Limitations:**
- The new Annex R (Rel-19) specifies security for IMS avatar communication
- Covers procedures to prevent a UE from providing unauthorized Avatar IDs
- Covers authorization for avatar downloads from the BAR
- Does not cover security controls to prevent the sending UE from using fake avatar representations not belonging to the user

**Key Observations**
**Technical Proposal**

The contribution proposes adding a new sub-clause (suggested as 8.3.4) to the base CR for TR 26.813, specifically under Clause 8 (Avatar integration into 3GPP services and enablers). This new sub-clause should:
|
|
| Nokia |
[FS_Avatar_Ph2_MED] Authentication for avatar data
|
**Summary of S4-260192: Authentication for Avatar Data**

**Document Overview**

This contribution from Nokia proposes authentication mechanisms for avatar data in IMS-based avatar calls as part of the FS_Avatar_Ph2_MED study item (Rel-20). The document addresses security gaps identified in Rel-19 TS 26.264, specifically focusing on authentication schemes for avatar-related APIs.

**Background and Motivation**

The Rel-20 SID FS_Avatar_Ph2_MED (approved at SA#110, Dec 2025) includes an objective to study security implications in collaboration with SA3, covering:
- Identification and authentication (including schemes for Avatar-related APIs)
- Privacy preservation
- Content protection (watermarking and DRM)
- Secure distribution mechanisms for Avatar data

Currently, TR 26.813 and TS 33.328 do not address these security aspects.

**Main Technical Contributions**

**Proposed Security Framework for IMS-based Avatar Calls**

The contribution proposes adding a new sub-clause 8.3.4 covering security considerations for IMS-based avatar calls.

**Authentication Mechanism**

Core Concept:
- Introduces a Digital Credential-based solution using a Base Avatar Assertion (BAA)
- The BAA cryptographically binds the Base Avatar Representation to the avatar owner
- Ensures that a base avatar represents the actual user of the avatar

Architecture Components:
**Base Avatar Assertion (BAA) Structure:**
- A Digital Credential proving that a Base Avatar represents a user owning a specific private/public key pair
- Generic structure shown in Figure Y (referenced but not detailed in the text)

**Operational Procedures**

BAA Issuance Procedure (Steps 1-7):
Avatar Authentication Procedure (Step 8):
**Implementation Example**

Figure Z provides an example implementation of the authenticator and issuer in the current system architecture (specific details not provided in the text).

**Proposal**

The contribution proposes to add the above content as a base CR to address authentication requirements for avatar data in IMS-based avatar calls. |
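As an illustration of the binding concept only (the contribution's normative BAA structure is shown in its Figure Y and is not reproduced here), a minimal sketch in which an HMAC stands in for the issuer's digital signature; all names and the claim layout are assumptions:

```python
import hashlib
import hmac
import json

def issue_baa(avatar_bytes, user_public_key, issuer_secret):
    """Issue a toy Base Avatar Assertion binding an avatar to a user key.

    The assertion covers a hash of the Base Avatar Representation and the
    user's public key, so neither can be swapped without invalidating it.
    A real issuer would use an asymmetric signature, not an HMAC.
    """
    claims = {
        "avatar_hash": hashlib.sha256(avatar_bytes).hexdigest(),
        "user_public_key": user_public_key,
    }
    payload = json.dumps(claims, sort_keys=True).encode()
    tag = hmac.new(issuer_secret, payload, hashlib.sha256).hexdigest()
    return {"claims": claims, "signature": tag}

def verify_baa(baa, avatar_bytes, issuer_secret):
    """Check both the issuer signature and the avatar binding."""
    payload = json.dumps(baa["claims"], sort_keys=True).encode()
    expected = hmac.new(issuer_secret, payload, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(expected, baa["signature"])
            and baa["claims"]["avatar_hash"]
                == hashlib.sha256(avatar_bytes).hexdigest())

secret = b"issuer-demo-secret"
baa = issue_baa(b"arf-container-bytes", "user-pubkey-pem", secret)
```

The point of the sketch is the security control the contribution targets: a receiver that verifies the assertion rejects an avatar representation that does not belong to the claimed user.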
|
| InterDigital Pennsylvania |
[FS_Avatar_Ph2_MED] Media Configuration for Avatar Calls
|
**Summary of S4-260226: Media Configuration for Avatar Calls**

**Introduction**

This discussion paper addresses media configuration requirements for AR-MTSI clients supporting Avatar communication within the context of the Study on Avatar Communication Phase 2. The Phase 2 study focuses on enabling additional Avatar use cases and enhancing Avatar-based RTC services, with emphasis on quality of experience and advanced animation features for photo-realistic and immersive user experiences.

**Background on Existing AR-MTSI Media Configuration**

**Current AR Support Parameters (TS 26.264 Clause 7)**

The document reviews existing media configuration requirements defined in TS 26.264 for AR-MTSI clients:
**Current Avatar Support Parameters (TS 26.264 Clause 7.3.1)**
**Network-Assisted Avatar Rendering (TS 26.264 Clause 7.3.2)**

When network animation and rendering is requested, an AR AS shall:
- Allocate an MF capable of real-time avatar rendering
- Configure the MF with appropriate rendering parameters based on the receiving UE's video capabilities
- Modify the SDP to route avatar animation data to the MF instead of the receiving UE, inserting the MF into the media path

**Identified Gap**

The document identifies a critical gap: TS 26.264 has not yet documented the media configuration details for an AR-MTSI client in terminal that intends to participate in an avatar call. When media configuration details were proposed in S4-251845 at the SA4#134 Dallas meeting, feedback indicated that the behavior of IMS network elements is unspecified in the IMS architecture when an MTSI client sends the new Contact header field parameters "+sip.3gpp-ar-support" and/or "+sip.3gpp-avatar-support" in a SIP REGISTER message.

**Proposal**

The document proposes to send a Liaison Statement to SA2 requesting:
- Definition of IMS network behavior when an AR-MTSI client registers with Contact header field "+sip.3gpp-ar-support" and/or "+sip.3gpp-avatar-support"
- Specification of how to provide a suitable MF capable of providing AR rendering and/or avatar rendering support in an IMS session

A draft LS is provided in companion document S4-260227. |
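For illustration, the registration step in question can be sketched as follows. The two feature-tag names are those quoted from TS 26.264 above; the SIP URI and helper names are invented for the example:

```python
def build_contact_header(uri, feature_tags):
    """Assemble a SIP Contact header value carrying media feature tags,
    as an AR-MTSI client would in its REGISTER request."""
    return ";".join([f"<{uri}>"] + feature_tags)

def supports_avatar(contact_value):
    """Naive capability check an IMS element might perform on the value --
    exactly the behavior the contribution says is currently unspecified."""
    return "+sip.3gpp-avatar-support" in contact_value

contact = build_contact_header(
    "sip:user@example.com",  # example URI, not from the contribution
    ["+sip.3gpp-ar-support", "+sip.3gpp-avatar-support"],
)
```

The open question raised for SA2 is what a registrar or AS should do once such a Contact value is received, e.g. whether it should trigger selection of an MF capable of avatar rendering.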
|
| InterDigital Pennsylvania |
[FS_Avatar_Ph2_MED] LS on IMS network behaviour for new Contact header parameters
|
**LS on IMS Network Behaviour for New Contact Header Parameters**

**Document Information**
**Overall Description**

**Context and Background**

SA4 has introduced new media configuration requirements in TS 26.264 for Augmented Reality (AR) and avatar-based MTSI clients. These developments have architectural implications for IMS network behavior that require SA2's attention and clarification.

**AR-MTSI Client Requirements (Clause 7 of TS 26.264)**

New Contact Header Parameter:
**Avatar Support Requirements (Clause 7.3 of TS 26.264)**

New Contact Header Parameter:
**Network-Based Avatar Rendering (Clause 7.3.2):**

When network-based avatar animation/rendering is requested, an AR Application Server shall:
- Allocate a Media Function (MF) capable of real-time avatar rendering
- Configure the MF based on the receiving UE's video capabilities
- Modify the SDP to insert the MF into the media path

**Identified Architectural Gap**

SA4 has identified that the IMS architecture specifications do not currently define the behavior of IMS network elements when MTSI clients include these new Contact header field parameters ("+sip.3gpp-ar-support" and/or "+sip.3gpp-avatar-support").

**Question to SA2**

Is there any architectural guidance on:
- Whether or how IMS entities should interpret these parameters?
- How suitable Media Functions providing AR rendering and/or avatar rendering support should be selected and invoked?

**Request to SA2**

If guidance is not currently available, SA4 requests SA2 to consider studying, in the IMS architecture specifications, the expected behavior of IMS network elements when an AR-MTSI client registers its capabilities via Contact header field parameters. This includes (but is not limited to):
- Mechanisms for recognizing terminal capabilities during registration
- Enabling the provisioning or insertion of appropriate Media Functions to support:
  - AR rendering
  - Avatar rendering
  - Future similar services within an IMS session

**Rationale**

SA4 believes such clarification would:
- Ensure architectural consistency
- Facilitate interoperable deployment of AR and avatar-based services in IMS

**Action Requested**

SA4 kindly asks SA2 to review the above information and provide guidance on the way forward. |
|
| InterDigital New York |
Avatar-update to section 6.3.4
|
**Technical Summary: AVATAR - Update to Section 6.3.4**

**Overview**

This document provides a comprehensive update to section 6.3.4 concerning the MPEG Avatar Representation Format (ARF), now standardized as ISO/IEC 23090-39. The document reflects the progression of the standard from its initial development phase to reaching Committee Draft International Standard (CDIS) stage.

**MPEG Avatar Representation Format Development**

**Scope and Objectives**

The MPEG WG03 (Systems) workgroup is developing a new standard for avatar representation format with the following scope:
**Requirements and Priorities**

The Phase 1 requirements are categorized with three priority levels (High, Medium, Low) across multiple categories:

**High Priority Requirements:**
- Suitable exchange format for conversion between avatar representation formats
- Mesh-based format for representation and animation
- Signal coding format
- Semantic and signal representation
- Multiple levels of detail for geometry
- Facial and body animation
- Delay-sensitive animation streams
- Partial transport of the base avatar
- Various storage and transport capabilities

**Medium Priority Requirements:**
- DRM protection support
- Integration into scene description
- Avatar authenticity and user association protection

**Low Priority Requirements:**
- Avatar-avatar, user-avatar, and avatar-scene interactions
- Storage and replay of animation streams

**ARF Data Model and Structure**

**Core Components**

The ARF data model (Figure 12) includes the following components:

**Preamble Section:**
- Signature string for unique document identification
- Version string tied to a specific ARF revision
- Optional authenticationFeatures (encrypted facial and voice feature vectors with a public key URI)
- supportedAnimations object specifying compatible animation frameworks (facial, body, hand, landmark, texture)
- Optional proprietaryAnimations for vendor-specific schemes

**Metadata Object:**
- Avatar-level descriptive information (name, unique identifier, age, gender)
- Used for experience adaptation and policy/access control

**Components Section:**
- Skeleton: defines joints with inverse bind matrices, optional animationInfo
- Node: scene graph objects with names, IDs, parent/child relations, semantic mappings, TRS or 4×4 matrix transformations
- Skin: links a mesh to the skeleton, optional blendshape/landmark/texture sets, per-vertex joint weights
- Mesh: geometric primitives with name, ID, optional path, geometry data items
- BlendshapeSets: shape targets for the base mesh with optional animationInfo
- LandmarkSets: vertex/face indices with barycentric weights for tracked landmarks
- TextureSets: material resources with texture targets and animation links

**Container Formats**

Two container formats are supported:
Both formats support partial access to avatar components.

**Integration with MPEG Scene Description**

**Scene Description Integration**
**Reference Client Architecture**

The reference architecture (Figure 13) includes:
**Animation Bitstream Format**

**Avatar Animation Units (AAUs)**

The animation stream format uses AAUs as the fundamental structure (Figure 14):

**AAU Structure:**
- Header:
  - AAU type (7-bit code)
  - AAU payload length (bytes)
- Payload:
  - 32-bit timestamp in "ticks"
  - Type-specific data
  - Optional padding for byte alignment

**AAU Types:**
- AAU_CONFIG: configuration unit
- AAU_BLENDSHAPE: facial animation sample
- AAU_JOINT: body/hand joint animation sample
- AAU_LANDMARK: landmark animation sample
- AAU_TEXTURE: texture animation sample

**Configuration Units**

Configuration AAUs communicate stream-level parameters:
- Animation profile string (UTF-8 encoded)
- Timescale value (32-bit float, ticks per second)

**Facial Animation Samples (AAU_BLENDSHAPE)**

Structure includes:
- Target blendshape set identifier
- Per-blendshape confidence flag
- Number of blendshape entries
- For each entry: blendshape index, weight (32-bit float), optional confidence (32-bit float)

Deformation Formula:
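The formula itself is not reproduced in this summary. The conventional blendshape deformation it refers to, stated here as an assumption based on standard practice rather than as the normative ARF definition, is:

```latex
v' \;=\; v \;+\; \sum_{k=1}^{K} w_k \,\delta_k
```

where \(v\) is a base-mesh vertex, \(\delta_k\) the per-vertex offset of the \(k\)-th shape target in the BlendshapeSet, and \(w_k\) the 32-bit weights carried in the AAU_BLENDSHAPE sample.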
**Joint Animation Samples (AAU_JOINT)**

Structure includes:
- Target joint set identifier
- Per-joint velocity flag
- Number of joint entries
- For each entry: joint index, 4×4 transformation matrix (16 floats), optional 4×4 velocity matrix

Linear Blend Skinning (LBS) Formula:
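The LBS formula is likewise omitted in this summary. Its standard form, given as an assumption consistent with the Skin component described above (per-vertex joint weights and inverse bind matrices), is:

```latex
v' \;=\; \sum_{j=1}^{J} \omega_j \, T_j \, B_j^{-1} \, v
```

where \(\omega_j\) are the per-vertex skin weights, \(T_j\) the streamed 4×4 joint transforms from the AAU_JOINT sample, and \(B_j^{-1}\) the inverse bind matrices carried in the Skeleton component.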
**Landmark Animation Samples (AAU_LANDMARK)**

Structure includes:
- Landmark set ID
- Velocity and confidence flags
- Dimensionality flag (2D vs. 3D)
- Number of landmarks
- For each landmark: index, coordinates (2D or 3D), optional velocity and confidence

Use cases: facial tracking overlays, sensor-mesh registration, animation data calibration

**Texture Animation Samples (AAU_TEXTURE)**

Structure analogous to blendshape samples but applied to texture targets:
- Controls parametric texture effects (micro-geometry patterns, makeup, dynamic material variations)

**Animation Stream Delivery**

Dual delivery modes:
1. Live transmission: sequences of AAUs for real-time avatar driving
2. Stored format: avatar animation tracks in an ISOBMFF-based ARF container with sample grouping for pre-recorded sequences ("smile," "wave," "dance")

**Ongoing Exploration Experiments**

The group continues exploration on:
|
|
| InterDigital Canada |
[FS_Avatar_Ph2_MED] Procedures for BAR API Operations
|
**Comprehensive Summary: Procedures for BAR API Operations**

**1. Introduction and Context**

This contribution addresses procedures for Base Avatar Repository (BAR) APIs that were defined in Rel-19 as part of the AvCall-MED work item and integrated into TS 26.264 Annex B. The document was originally presented as S4-251909 at the SA4#134 meeting but was redirected to the Rel-20 FS_Avatar_Ph2_MED study. The contribution provides detailed operational procedures for BAR APIs enabling UE or MF interaction with the Base Avatar Repository.

**2. Base Avatar Models API Procedures**

**2.1 Create Base Avatar Model**

Procedure Flow:
- Requestor (DC AS or MF) invokes

Request Information Elements:
- Security credentials (M)
- Binary ARF container (M)

Response Information Elements:
- Avatar resource entity (CM) - present on successful creation

Note: The DC AS or BAR may apply restrictions to the created avatar container (location access, User ID authentication, etc.)

2.2 Get Base Avatar Model

Procedure Flow:
- Requestor invokes

Request Information Elements:
- Security credentials (M)

Response Information Elements:
- Avatar resource entity (M)
- Binary container (M)

2.3 Update Base Avatar Model

Procedure Flow:
- Requestor invokes

Request Information Elements:
- Security credentials (M)
- Binary container (CM) - PUT only
- AssetIds (CM) - PATCH only
- Assets (CM) - PATCH only

Response Information Elements:
- Avatar resource entity (CM) - present on successful update

2.4 Delete Base Avatar Model

Procedure Flow:
- Requestor invokes

Request Information Elements:
- Security credentials (M)

Response: No payload

3. Assets API Procedures

3.1 Create Asset

Procedure Flow:
- Requestor invokes

Request Information Elements:
- Security credentials (M)
- Binary asset (M)

Response Information Elements:
- Avatar resource entity (M) - updated container with new asset

3.2 Retrieve Asset

Procedure Flow:
- Requestor invokes

Request Information Elements:
- Security credentials (M)

Response Information Elements:
- Binary asset (M)

3.3 Update Asset

Procedure Flow:
- Requestor invokes

Request Information Elements:
- Requestor identifier (M)
- Security credentials (M)
- Asset (CM)

Response Information Elements:
- Avatar resource entity (CM) - present on successful update

3.4 Delete Asset

Procedure Flow:
- Requestor invokes

Request Information Elements:
- Security credentials (M)

Response: No payload

4. Avatar Representations API Procedures

4.1 Create Avatar Representation

Procedure Flow:
- Requestor invokes

Request Information Elements:
- Security credentials (M)
- Avatar representation (M) - with avatarId and assetIds properties set

Response Information Elements:
- AvatarRepresentation resource entity (CM) - present on successful creation

4.2 Retrieve Avatar Representation

Procedure Flow:
- Requestor invokes

Request Information Elements:
- Security credentials (M)

Response Information Elements:
- AvatarRepresentation resource entity (M)
- Binary container (M)

4.3 Update Avatar Representation

Procedure Flow:
- Requestor invokes

Note: Only the avatar representation owner is allowed to modify the representation

Request Information Elements:
- Security credentials (M)
- Avatar Representation (CM) - PUT only
- Source Asset Ids (CM) - PATCH only
- New Asset Ids (CM) - PATCH only

Response Information Elements:
- AvatarRepresentation resource entity (CM) - present on successful update

4.4 Destroy Avatar Representation

Procedure Flow:
- Requestor invokes

Request Information Elements:
- Security credentials (M)

Response: No payload

5. Associated Information API Procedures

5.1 Retrieve Associated Information

Procedure Flow:
- Requestor invokes

Request Information Elements:
- Security credentials (M)

Response Information Elements:
- AssociatedInfo object (M)

6. Proposal

The contribution proposes to:
- Document the contents of section 2 as a new clause 8.3.3.4 in the aggregated CR to TR 26.813
- Add an editor's note to clause 8.3.3.2 indicating the need for updates to reflect the BAR APIs defined in TS 26.264

7. References

IETF RFC 2046: "Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types"
|
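The mandatory (M) and conditional-mandatory (CM) information elements tabulated in the procedures above can be modeled as a small validation helper. This is a minimal sketch, not the normative TS 26.264 API: the operation and field names below are illustrative and are not taken from the specification.

```python
# Illustrative model of the BAR request IE tables (M = mandatory,
# CM = conditional-mandatory, e.g. PUT-only or PATCH-only fields).
# Operation and field names are assumptions, not from TS 26.264.
REQUEST_IES = {
    "create_base_avatar_model": {
        "security_credentials": "M",
        "binary_arf_container": "M",
    },
    "get_base_avatar_model": {"security_credentials": "M"},
    "update_base_avatar_model": {
        "security_credentials": "M",
        "binary_container": "CM",  # PUT only
        "asset_ids": "CM",         # PATCH only
        "assets": "CM",            # PATCH only
    },
    "delete_base_avatar_model": {"security_credentials": "M"},
}

def validate_request(operation: str, ies: dict) -> list:
    """Return the list of missing mandatory IEs for the given operation."""
    spec = REQUEST_IES[operation]
    return [name for name, presence in spec.items()
            if presence == "M" and name not in ies]
```

For example, a Create Base Avatar Model request carrying only credentials would be reported as missing its binary ARF container, while CM elements are never flagged because their presence depends on the HTTP method used.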
|
| InterDigital New York |
Avatar-update to section 6.3.4
|
Summary of S4-260285: AVATAR-Update to section 6.3.4

Document Overview

This contribution updates section 6.3.4 of the AVATAR specification to align with the current status of the MPEG Avatar Representation Format (ARF) work. The document reflects the progression of ISO/IEC 23090-39 from early development to the Committee Draft International Standard (CDIS) stage.

Main Technical Changes

MPEG ARF Specification Status Update
Avatar Data Model and Representation Format

Restructured Data Model Description

The document significantly restructures how the ARF data model components are described:

High-level Avatar Information (Metadata Object):
- Name, Identifier, Age, Gender
- Holds avatar-level descriptive information for system adaptation and policy control

Preamble Section (new addition):
- Signature string for unique document identification
- Version string tied to a specific ARF revision
- Optional authenticationFeatures with encrypted facial/voice feature vectors and a public key URI
- supportedAnimations object specifying compatible facial, body, hand, landmark, and texture animation frameworks using URNs
- Optional proprietaryAnimations for vendor-specific schemes (e.g., ML-based reconstruction models)

Components Section (detailed expansion):
- Skeleton: Defines joints as a subset of scene graph nodes, references an inverse bind matrices data item (Nx16 tensor), optional animationInfo
- Node: Scene graph objects with names, IDs, parent/child relations, semantic mappings, and TRS or 4x4 matrix transformations
- Skin: Links a mesh to the skeleton, optional blendshape/landmark/texture sets, per-vertex joint weights tensor (NxM)
- Mesh: Geometric primitives with name, ID, optional path, and data items containing geometry
- BlendshapeSets: Shape targets for the base mesh, referencing geometry-only shapes (GLB files), optional animationInfo
- LandmarkSets: Vertex/face indices with barycentric weights for landmark positioning
- TextureSets: Material resources linked to texture targets and animation frameworks

Container Format
Scene Description Integration
Reference Software (ISO/IEC 23090-43)

Major update from "under development" to a defined implementation:

arfref Module (C++ and Python):
- Parsing of ARF containers
- Helper functions for asset decoding
- Partial glTF 2.0 encoding/decoding support for meshes
- Animation mapping (AnimationLink objects)
- Animation stream decoding
- Available through Python bindings

arfviewer Module:
- Support for Avatar Animation Units (AAUs)
- Time-sequenced blendshape weights with optional confidence metrics
- Joint transformations for skeletal animation
- AAU format with chronological data blocks
- Inverse kinematics system for missing joint information
- Blendshape animator managing neutral mesh vertices and deltas with weighted summation

Reference Client Architecture
Animation Bitstream Format

Comprehensive new section detailing the AAU-based animation stream format:

Avatar Animation Units (AAUs) Structure
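The AAU layout summarized earlier in this listing (a header carrying a 7-bit type code and a payload length, followed by a payload that begins with a 32-bit timestamp in ticks, with optional padding) can be sketched as a serializer/parser pair. The exact field widths, endianness, padding rule, and type-code values below are assumptions for illustration; the summary does not specify them.

```python
import struct

# Illustrative AAU pack/parse based on the summarized structure.
# Assumptions: 1-byte type field (7-bit code), little-endian 32-bit
# payload length and timestamp, payload padded to a 4-byte boundary.
AAU_CONFIG, AAU_BLENDSHAPE, AAU_JOINT, AAU_LANDMARK, AAU_TEXTURE = range(5)

def pack_aau(aau_type: int, timestamp_ticks: int, body: bytes) -> bytes:
    assert 0 <= aau_type < 128  # type code must fit in 7 bits
    payload = struct.pack("<I", timestamp_ticks) + body
    if len(payload) % 4:        # optional padding for alignment
        payload += b"\x00" * (4 - len(payload) % 4)
    header = struct.pack("<BI", aau_type, len(payload))
    return header + payload

def parse_aau(data: bytes):
    """Return (type, timestamp_ticks, type-specific body incl. padding)."""
    aau_type, length = struct.unpack_from("<BI", data, 0)
    payload = data[5:5 + length]
    (timestamp,) = struct.unpack_from("<I", payload, 0)
    return aau_type, timestamp, payload[4:]
```

A blendshape sample body would then carry the per-entry index/weight pairs described above, e.g. `struct.pack("<Hf", index, weight)` per entry, and round-trip through `pack_aau`/`parse_aau`.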
AAU Types Defined
Configuration Units
Facial Animation Samples (AAU_BLENDSHAPE)
Joint Animation Samples (AAU_JOINT)
Landmark Animation Samples (AAU_LANDMARK)
Texture Animation Samples (AAU_TEXTURE)
Animation Stream Delivery
Exploration Experiments

Status changed from "initiated" to "continues":
Editorial Corrections
|
Total Summaries: 11 | PDFs Available: 11