[FS_Avatar_Ph2_MED] 3D Gaussian Splatting Avatar Methods for Real-Time Communication
This contribution surveys 3D Gaussian Splatting (3DGS) methods for avatar representation in the context of the Avatar Communication Phase 2 study (FS_Avatar_Ph2_MED, SP-251663), specifically addressing Objective 3 on animation techniques for avatar reconstruction and rendering. The document evaluates 3DGS methods for real-time communication scenarios and their compatibility with the MPEG Avatar Representation Format (ARF, ISO/IEC 23090-39).
Key Technical Background:
- 3DGS represents objects as sets of anisotropic 3D Gaussians (splats)
- Each Gaussian stores: 3D mean position, oriented covariance (ellipsoidal footprint), opacity, and appearance parameters (RGB or spherical harmonics coefficients)
- Rendering projects 3D Gaussians into screen space as 2D Gaussians with depth-ordered alpha compositing
- Achieves real-time rendering at 100-370 FPS on desktop GPUs with quality comparable to neural radiance fields
- Maps well to GPU compute and graphics pipelines
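The projection step described above can be sketched numerically. The following is an illustrative NumPy fragment (function and variable names are ours, not from any particular implementation) using the standard EWA approximation: the 3D covariance is mapped through the camera rotation W and the local affine Jacobian J of the perspective projection, linearized at the Gaussian mean.

```python
import numpy as np

def project_gaussian(mean3d, cov3d, W, t, fx, fy):
    """Project one 3D Gaussian to a 2D screen-space Gaussian (EWA approximation)."""
    # Transform the mean into camera coordinates.
    p = W @ mean3d + t
    x, y, z = p
    # Jacobian of the perspective projection, linearized at the mean.
    J = np.array([[fx / z, 0.0, -fx * x / z**2],
                  [0.0, fy / z, -fy * y / z**2]])
    # 2D covariance: Sigma' = J W Sigma W^T J^T
    cov2d = J @ W @ cov3d @ W.T @ J.T
    center2d = np.array([fx * x / z, fy * y / z])
    return center2d, cov2d
```

The resulting 2D Gaussians are then sorted by depth and alpha-composited front to back, which is what maps so well to GPU rasterization.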
Critical Question for Avatar Communication: How do Gaussians deform under animation? Methods either bind Gaussians to parametric meshes (FLAME for faces, SMPL/SMPL-X for bodies) or use small neural networks to predict residual motion.
Head/face methods differ along three practical dimensions: which parametric model supplies the driving signals, whether a neural network runs at render time, and how Gaussians are bound to the driving geometry.
The most interoperable approaches use a fully explicit runtime, with Gaussians driven by the same blendshape and skeletal parameters as mesh renderers.
| Method | FPS | Gaussians | Parametric Model | Runtime MLP | Key Feature |
|--------|-----|-----------|------------------|-------------|-------------|
| GaussianBlendshape | 370 | 70K | Custom blendshapes | No | Linear blending identical to mesh blendshapes, 32-39 dB PSNR |
| SplattingAvatar | 300+ | ~100K | FLAME mesh | No | Mesh-embedded via barycentric coords, 30 FPS on iPhone 13 |
| FlashAvatar | 300 | 10-50K | FLAME | Small MLP | UV-based init on FLAME, small MLPs for expression offsets |
| GaussianAvatars | 90-100 | ~100K | FLAME | No | FLAME-rigged, multi-view training, explicit binding |
| HHAvatar | ~100 | ~150K | FLAME | Temporal modules | First method for dynamic hair physics modeling |
| MeGA | ~90 | ~200K | FLAME (face) + 3DGS (hair) | No | Hybrid mesh+Gaussian, occlusion-aware blending, editable |
Standout Methods for Real-Time Communication: GaussianBlendshape and SplattingAvatar use purely explicit representations with no runtime neural networks, enabling deterministic rendering and direct ARF compatibility.
Technical Approach (mesh-embedded Gaussians, e.g. SplattingAvatar):
- Each Gaussian anchored to animatable mesh supporting standard blendshapes and skeletal skinning
- Parameterization: triangle index + barycentric coordinates + optional offset vector in local tangent-normal frame
- Runtime: receiver deforms mesh using joint transforms and blendshape weights, then reconstructs Gaussian center from animated triangle vertices using barycentric weights
- No per-frame neural inference required - purely algebraic reconstruction ensures deterministic motion
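The runtime reconstruction described above reduces to a few lines of algebra. The following is a minimal sketch (names are illustrative, not from any cited codebase), assuming the optional offset is stored in a tangent/bitangent/normal frame built from the animated triangle:

```python
import numpy as np

def gaussian_center(tri_verts, bary, offset_tnb):
    """Reconstruct a Gaussian center from an animated triangle.

    tri_verts : (3, 3) animated vertex positions of the anchor triangle
    bary      : (3,) barycentric coordinates (sum to 1)
    offset_tnb: (3,) offset in the triangle's tangent/bitangent/normal frame
    """
    v0, v1, v2 = tri_verts
    # Point on the animated surface via barycentric interpolation.
    base = bary[0] * v0 + bary[1] * v1 + bary[2] * v2
    # Build a local frame from the animated triangle.
    e1, e2 = v1 - v0, v2 - v0
    n = np.cross(e1, e2)
    n /= np.linalg.norm(n)
    t = e1 / np.linalg.norm(e1)
    b = np.cross(n, t)
    # Transport the stored local offset into world space.
    return base + offset_tnb[0] * t + offset_tnb[1] * b + offset_tnb[2] * n
```

Because the same joint transforms and blendshape weights that animate the mesh fully determine `tri_verts`, the Gaussian motion is deterministic with no per-frame inference.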
Orientation and Footprint Handling:
- Gaussians stored in local frame aligned to triangle (per-axis scales + local rotation)
- Local-to-world transform from animated triangle frame transports covariance
- Keeps projected splat stable under motion, avoids jitter
- Appearance parameters (opacity, color coefficients) remain static unless dynamic effects explicitly modeled
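Covariance transport follows the same pattern. An illustrative sketch (assumed names), given per-axis scales and a bind-time local rotation, with the animated triangle frame taken from the surface reconstruction step:

```python
import numpy as np

def world_covariance(scales, R_local, R_frame):
    """Build the world-space covariance of a mesh-anchored Gaussian.

    scales : (3,) per-axis standard deviations in the local frame
    R_local: (3, 3) bind-time rotation within the triangle frame
    R_frame: (3, 3) animated triangle frame (columns: tangent, bitangent, normal)
    """
    R = R_frame @ R_local          # local-to-world rotation
    S = np.diag(scales)
    # Sigma = R S S^T R^T: the splat footprint rotates rigidly with the
    # triangle frame, which keeps the projected shape stable under motion.
    return R @ S @ S.T @ R.T
```

Because the frame motion is rigid, the covariance eigenvalues (the splat's footprint) are invariant, which is what avoids jitter.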
Standardization Advantages:
- Reuses same animation signals as mesh avatar
- Enables graceful fallback: mesh-only renderers can ignore Gaussian extension and still animate
- 3DGS-capable renderers can render Gaussians alone or hybrid mesh+Gaussians composition
Limitation: Coarse driving mesh can restrict fine-scale effects (lip roll, eyelid thickness, hair motion). Addressed by higher resolution parametric meshes, local offsets, or dedicated Gaussian subsets for non-mesh components.
Technical Approach (blendshape deltas, e.g. GaussianBlendshape):
- Mirrors classical mesh blendshape animation
- Each Gaussian has neutral parameters + per-expression deltas for center position, scale, opacity
- Runtime computes linear combination identical to mesh blendshape pipeline
- Key advantage: Determinism and ARF-friendly control - same blendshape weight stream drives both mesh vertices and Gaussian deltas
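The linear combination mirrors mesh blendshapes exactly. A sketch, with all array names illustrative:

```python
import numpy as np

def blend_gaussians(neutral, deltas, weights):
    """Apply expression blendshape weights to Gaussian attributes.

    neutral: (N, D) neutral attributes per Gaussian (e.g., center, scale, opacity)
    deltas : (E, N, D) per-expression attribute deltas
    weights: (E,) blendshape weights, the same stream that drives the mesh
    """
    # Identical math to mesh blendshapes: attr = neutral + sum_i w_i * delta_i
    return neutral + np.tensordot(weights, deltas, axes=1)
```

A single weight vector per frame drives both the mesh vertices and the Gaussian deltas, so no second animation stream is needed.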
Technical Approach (neural residual offsets, e.g. FlashAvatar):
- Parametric model for global control + small neural modules outputting residual offsets conditioned on expression, pose, or time
- Improves fine detail and handles effects difficult to capture with purely linear blendshapes
Tradeoff: Runtime inference and model distribution become part of the interoperability surface (model versioning, determinism guarantees, platform-specific performance)
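As an illustration of the hybrid pattern (not any specific method's architecture), a tiny MLP maps the expression vector to per-Gaussian residual offsets that are added on top of the explicit reconstruction; the weights would ship once with the base avatar:

```python
import numpy as np

def residual_offsets(expr, W1, b1, W2, b2):
    """Predict per-Gaussian residual position offsets from expression parameters.

    expr          : (E,) expression/blendshape weights
    W1, b1, W2, b2: MLP weights distributed as part of the base avatar
    Returns (N, 3) offsets, where N is implied by the output layer size.
    """
    h = np.maximum(W1 @ expr + b1, 0.0)   # one hidden layer, ReLU
    out = W2 @ h + b2                     # (3 * N,) flat output
    return out.reshape(-1, 3)
```

Even at this size, conformance would need to pin down the operator set and numerical tolerances for the inference to be reproducible across platforms.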
Full-body methods have converged on SMPL/SMPL-X parametric body models, enabling compatibility with standard skeletal animation systems.
| Method | FPS | Gaussians | Body Model | Training | Key Feature |
|--------|-----|-----------|------------|----------|-------------|
| GauHuman | 189 | ~13K | SMPL | 1-2 min | Fastest training, ~3.5 MB storage, KL divergence split/clone |
| HUGS | 60 | ~200K | SMPL | 30 min | Disentangles human/scene |
| ASH | ~60 | ~100K | SMPL | ~1 hour | 2D texture-space parameterization, Dual Quaternion skinning, motion retargeting |
| GART | >150 | ~50K | SMPL | seconds-minutes | Latent bones for non-rigid deformations (dresses, loose clothing) |
| ExAvatar | ~60 | ~150K | SMPL-X | ~2 hours | Only SMPL-X method with unified body/face/hand animation |
Standout Methods:
- GauHuman: Best combination of minimal storage (~3.5 MB) and fast training (1-2 min)
- ExAvatar: Only method providing unified body/face/hand animation through SMPL-X - critical for immersive communication
Animation Architecture:
- Body model provides compact, standardized animation interface
- Base avatar: static set of Gaussians + binding metadata
- Runtime: joint transforms from SMPL/SMPL-X pose parameters deform body via skinning
- Gaussian propagation: surface anchoring (barycentric/UV coordinates) or direct skinning weights per Gaussian
- Enables motion retargeting by sending only pose stream while keeping high-fidelity Gaussian appearance fixed
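Direct per-Gaussian skinning can be sketched as standard linear blend skinning applied to Gaussian centers (illustrative names; a real pipeline would also transport orientations and covariances):

```python
import numpy as np

def skin_centers(centers, skin_weights, joint_transforms):
    """Deform Gaussian centers with linear blend skinning.

    centers         : (N, 3) rest-pose Gaussian centers
    skin_weights    : (N, J) per-Gaussian skinning weights (rows sum to 1)
    joint_transforms: (J, 4, 4) world transforms derived from SMPL/SMPL-X pose
    """
    # Homogeneous rest positions.
    homo = np.concatenate([centers, np.ones((len(centers), 1))], axis=1)  # (N, 4)
    # Blend the joint transforms per Gaussian, then apply: x' = (sum_j w_j T_j) x
    blended = np.einsum('nj,jab->nab', skin_weights, joint_transforms)    # (N, 4, 4)
    out = np.einsum('nab,nb->na', blended, homo)
    return out[:, :3]
```

Only the compact pose stream changes per frame; the high-fidelity Gaussian appearance stays fixed, which is what makes motion retargeting cheap.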
Non-Rigid Effects Challenge:
- Clothing, long hair, accessories don't follow body surface with rigid skinning
- Solutions: latent bones or local deformation modules (additional control points beyond SMPL skeleton)
- ARF integration consideration: Distinguish between body-locked Gaussians (fully driven by standardized skeleton) and secondary Gaussians (may require optional control signals or local simulation)
Distribution Size Considerations:
- Full-body avatars require tens to hundreds of thousands of Gaussians
- Each Gaussian includes geometry and appearance attributes
- Compression and level-of-detail essential for real deployments
- Practical ARF profile should specify default Gaussian count budget and allow progressive refinement layers for high-end devices
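The storage math motivating a count budget is simple. An illustrative calculation, assuming an uncompressed float32 layout of position (3), rotation quaternion (4), scale (3), opacity (1), and degree-1 spherical harmonics color (12); actual attribute sets vary by method:

```python
def avatar_size_mb(num_gaussians, floats_per_gaussian=23, bytes_per_float=4):
    """Rough uncompressed size of a Gaussian avatar in megabytes.

    23 floats: 3 position + 4 rotation + 3 scale + 1 opacity + 12 SH color.
    Real deployments rely on compression and quantization well below this.
    """
    return num_gaussians * floats_per_gaussian * bytes_per_float / 1e6
```

At 200K Gaussians this is roughly 18 MB uncompressed, versus GauHuman's reported ~3.5 MB, which illustrates why compression and level-of-detail belong in the profile.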
Methods classified into three categories based on runtime architecture:
Category 1: Fully Explicit
Methods: SplattingAvatar, GaussianBlendshape, GaussianAvatars
- Performance: 300-370 FPS
- ARF Compatibility: Direct mapping
- Animation: Driven entirely by standard skeletal joints and blendshape weights
- Fully compatible with ARF Animation Stream Format
Category 2: Hybrid (Small Runtime MLPs)
Methods: 3DGS-Avatar, FlashAvatar, HUGS
- Performance: 50-100 FPS (near-real-time)
- Architecture: Small MLPs add expression-dependent offsets without fundamentally changing animation interface
- ARF Integration: Can still be driven by blendshape parameters with MLP weights distributed as part of base avatar
Category 3: Fully Neural
Methods: Gaussian Head Avatar, GaussianHead
- Training: 1-2 days
- Latency: Higher
- ARF Integration: May be integrated into ARF containers as proprietary customized models
Key Interoperability Question: Not whether an MLP exists, but whether the animation interface remains the same. If the avatar is driven solely by joints and blendshape weights, the ARF Animation Stream Format remains sufficient and the decoder only needs to select a renderer.
Determinism Considerations:
- Explicit methods: Naturally deterministic given fixed floating-point rules, no platform-specific neural inference dependency
- Hybrid methods: Viable if MLP is small and shipped as part of base avatar, but conformance should define fixed operator sets and numerical tolerances
- Fully neural pipelines: Better treated as optional proprietary components inside ARF container rather than baseline interoperable tool
Recommended ARF Integration Path:
Step 1: Storage
- Store mesh-embedded Gaussians as auxiliary data within glTF/ARF containers
- Parameterization: relative to mesh surface using barycentric coordinates (SplattingAvatar) or linear blendshape offsets (GaussianBlendshape)
- Preserves backward compatibility with mesh-only renderers
Step 2: Animation
- Animate via standard skeletal and blendshape parameters already defined in ARF Animation Stream Format
- No changes to animation stream required - Gaussian positions derived from same joint transforms and blendshape weights used for mesh animation
Step 3: Compression
- Apply GS compression for Gaussian attributes within base avatar to minimize distribution size
Step 4: Streaming
- Stream only AAUs at approximately 40 KB/s for real-time animation
- Base avatar (including compressed Gaussian data) distributed once at session establishment
- Enables high-quality Gaussian splatting rendering on capable devices while maintaining mesh-based rendering compatibility on constrained devices
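The quoted stream rate is consistent with simple arithmetic. An illustrative estimate (the AAU layout itself is not specified in this document), assuming uncompressed float32 blendshape weights plus per-joint quaternions at 30 fps, with an ARKit-like face set and an SMPL-X-scale skeleton as assumed sizes:

```python
def animation_rate_kbps(num_blendshapes=52, num_joints=55, fps=30, bytes_per_float=4):
    """Rough uncompressed animation stream rate in kilobytes per second.

    Per frame: blendshape weights + one quaternion (4 floats) per joint.
    52 blendshapes and 55 joints are illustrative assumptions, not ARF values.
    """
    floats_per_frame = num_blendshapes + 4 * num_joints
    return floats_per_frame * bytes_per_float * fps / 1e3
```

This lands at roughly 33 KB/s, the same tens-of-KB/s range as the approximately 40 KB/s figure above, and orders of magnitude below streaming per-frame Gaussian attributes.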
Capability Exchange:
- Endpoints signal support for 3DGS rendering, supported attribute sets, and supported Gaussian count budgets
- Fallback to mesh rendering if 3DGS not supported or resources constrained
- Avoids ecosystem fragmentation and maintains backward compatibility
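A capability exchange could be as simple as a small structured payload. The field names below are purely illustrative (no such schema is defined in ARF or SA4 today); the sketch shows the fallback behavior described above:

```python
# Hypothetical capability payload an endpoint might advertise at session setup.
FULL_CAPS = {
    "rendering": ["mesh", "3dgs"],   # supported primitives; mesh is always present
    "gaussian_budget": 100_000,      # max Gaussian count this device will render
    "attributes": ["position", "rotation", "scale", "opacity", "sh1"],
}

MESH_ONLY_CAPS = {
    "rendering": ["mesh"],
    "gaussian_budget": 0,
    "attributes": [],
}

def negotiate(local, remote):
    """Pick the rendering mode both endpoints support, falling back to mesh."""
    common = [m for m in local["rendering"] if m in remote["rendering"]]
    return "3dgs" if "3dgs" in common else "mesh"
```

Because mesh rendering is always in the common set, negotiation can never fail, which is exactly the fragmentation-avoidance property the list above calls for.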
The document proposes that SA4 consider the following for the FS_Avatar_Ph2_MED study:
1. Acknowledge 3D Gaussian Splatting as a viable rendering primitive for avatar communication
2. Coordinate with MPEG on integration of Gaussian splatting data within the ARF Base Avatar Format (ISO/IEC 23090-39)
3. Evaluate compression techniques (SPZ, L-GSC, HAC++, Compact3D) for inclusion in the study of static and animation data compression (Objective 7)
4. Define capability signaling and conformance points for 3DGS avatar rendering, including the numerical tolerances required for determinism
5. Study hybrid approaches with small MLPs: whether they warrant an optional ARF profile, and, if so, constrain operator sets and model sizes to preserve portability