3GPP Technical Document Summary: CR 0003 Rev 1 for TS 26.813 (Avatar Services)
Document Overview
This is a Category F (correction) Change Request for TS 26.813 Release 19, version 19.1.0, addressing editorial corrections and reference updates across multiple sections of the Avatar specification.
Main Technical Contributions
Section 2: References Updates
- Corrected normative and informative references, including:
  - Updated 3GPP TR 22.856 to V19.2.0 (Localized Mobile Metaverse Services)
  - Updated 3GPP TS 23.228 to V19.2.0 (IMS Stage 2)
  - Corrected formatting and accessibility information for external references (AFLW, LFPW, WFLW datasets)
- Added new reference [34]: I.-J. Hirsh and C. E. Sherrick, on perceived order across different sense modalities
- Added new reference [35]: Hyojin Park et al., on lip movements and brain oscillations
Section 5: Use Cases and Potential Requirements
UC1: Avatar Communication
Editorial corrections:
- Removed duplicate requirement regarding bidirectional transitioning between video and avatar media
- Clarified voice-driven avatar animation requirements
- Enhanced description of audio-visual synchronization for accessibility
Key technical aspects maintained:
- Real-time facial expression and body movement metadata extraction based on ASR/NLP
- Voice-driven 2D/3D avatar animation generation and encoding/decoding
- Avatar sharing, modification, and real-time composing capabilities
- Lip movement synchronization for enhanced intelligibility in noisy environments
UC2: Multi-party Shared Experiences
Corrections:
- Refined description of collaborative working scenarios
- Added specific use case for classroom environments with differential avatar expression control
- Added teenage social gathering scenario with selective expression filtering
Technical requirements:
- Secure end-to-end avatar expression delivery for specific audiences
- Network-based capability for filtering/replacing avatar expressions for selected audiences
- Support for filtering or replacing selected subsets of face and body expressions during avatar animation generation
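The filtering/replacement requirement above can be sketched in a few lines. This is a hypothetical illustration, not the normative mechanism in TS 26.813: the function name, the weight-map representation of expressions, and the zero-replacement policy are all assumptions.

```python
# Illustrative per-audience expression filtering during avatar animation
# generation (UC2). Expressions are modeled as a name->weight map; blocked
# expressions are replaced with a neutral weight for the selected audience.
def filter_expressions(frame: dict, blocked: set, replacement: float = 0.0) -> dict:
    """Replace a selected subset of face/body expression weights before delivery."""
    return {k: (replacement if k in blocked else v) for k, v in frame.items()}

# A teacher-facing stream keeps all expressions; a classroom-facing stream
# suppresses, for example, boredom cues.
frame = {"smile": 0.8, "yawn": 0.6, "frown": 0.1}
student_view = filter_expressions(frame, blocked={"yawn"})
```

In the network-based variant described above, the same operation would run in a network function between animation-data generation and delivery rather than on the UE.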
UC3: Multi-user Gaming
Editorial improvements:
- Clarified synchronization requirements between different modalities
- Updated KPI requirements with proper formatting
Key synchronization thresholds maintained:
- Audio-tactile: 12 ms (audio first), 25 ms (tactile first)
- Visual-tactile: 30 ms (video first), 20 ms (tactile first)
- Audio-visual: 20 ms (audio first), 20 ms (video first)
KPI requirements:
- End-to-end latency: 5-20 ms
- Service bit rate: 1-1000 Mbit/s
- Positioning accuracy: <1 m
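The synchronization thresholds above are direction-dependent, which makes them easy to misapply. A minimal sketch of a budget check, using the values from this summary (the table layout and function names are assumptions, not from the specification):

```python
# Maximum tolerated inter-modality skew in milliseconds, keyed by
# (leading modality, trailing modality), per the UC3 thresholds above.
SYNC_THRESHOLDS_MS = {
    ("audio", "tactile"): 12,   # audio first
    ("tactile", "audio"): 25,   # tactile first
    ("video", "tactile"): 30,   # video first
    ("tactile", "video"): 20,   # tactile first
    ("audio", "video"): 20,     # audio first
    ("video", "audio"): 20,     # video first
}

def within_sync_budget(first: str, second: str, skew_ms: float) -> bool:
    """True if `second` arriving `skew_ms` after `first` meets the budget."""
    return skew_ms <= SYNC_THRESHOLDS_MS[(first, second)]
```

Note the asymmetry: audio leading tactile tolerates only 12 ms, while tactile leading audio tolerates 25 ms.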
UC4: Avatar Generation, Storage, and Access
Corrections:
- Reformatted requirements section for clarity
- Consolidated requirements from TR 22.856
- Enhanced description of avatar lifecycle management
Technical requirements:
- Avatar generation, update, and animation support (UE and/or network-based)
- Avatar metadata for parental control and access management
- Discovery and negotiation of network functions for avatar processing
- Upload, storage, search, transmission, and update capabilities
- Avatar identification and mapping to subscribers
- Authorization and usage rights management
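The authorization and access-management requirements above can be illustrated with a toy check. This is a hedged sketch only; the function name, the grant model, and the parental-control flag are hypothetical, not taken from the specification:

```python
# Illustrative access check for base avatar retrieval (UC4): the owner
# always has access; others need an explicit grant, and parental control
# can veto access regardless of grants.
def authorize_avatar_access(requester: str, owner: str, granted: set,
                            parental_block: bool = False) -> bool:
    """Allow retrieval if requester owns the avatar or holds a grant."""
    if parental_block:
        return False
    return requester == owner or requester in granted
```

In the specification's terms, such a check would sit in front of the avatar storage function and consume the avatar metadata listed above.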
UC5: AI-Based Avatar
Editorial updates:
- Clarified the concept of the AI avatar as an autonomous virtual alter ego
- Enhanced description of AI avatar capabilities
Technical requirements:
- AI/ML model transfer per TR 23.700-80 framework
- Secure registration, storage, and updating of AI-based avatars
- Third-party authentication assistance for avatar usage
- Network storage for application-specific user data
- Multi-modal instruction transmission for real-time interaction
- Charging information collection for avatar-associated communications
Section 6.3.4: MPEG Avatar Representation Format
Corrections:
- Updated reference to ISO/IEC 23090-39 (ARF specification)
- Clarified ARF data model components
- Enhanced description of ARF container formats
Technical content maintained:
- ARF data model including skeletons, meshes, blendshapes, skins, landmarks, and nodes
- Support for ISOBMFF and Zip-based containers
- Integration with MPEG Scene Description (ISO/IEC 23090-14)
- Reference client architecture with Avatar pipeline in Media Access Function
- Ongoing exploration experiments for compression, geometry integration, animation formats, content discovery, and animation controllers
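The ARF data model components listed above can be sketched as a simple container type. Field types and defaults here are assumptions for illustration; the normative model is defined in ISO/IEC 23090-39:

```python
# Minimal sketch of the ARF data model components named above: skeletons,
# meshes, blendshapes, skins, landmarks, and nodes, packaged in one of the
# two supported container formats.
from dataclasses import dataclass, field

@dataclass
class ArfAvatar:
    skeletons: list = field(default_factory=list)
    meshes: list = field(default_factory=list)
    blendshapes: list = field(default_factory=list)
    skins: list = field(default_factory=list)
    landmarks: list = field(default_factory=list)
    nodes: list = field(default_factory=list)
    container: str = "ISOBMFF"  # or "Zip", per the supported containers
```

In the reference client architecture, an object like this would be resolved by the Avatar pipeline in the Media Access Function and composed into an MPEG Scene Description scene.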
Section 7.1: General Architecture and Call Flows
Corrections:
- Removed leftover editorial brackets in text
- Clarified avatar function descriptions
Key architectural components:
- Avatar Storage: Base avatar storage with access control and authorization
- Avatar Animation: Retrieves base avatar and performs animation using format-specific data streams
- Scene Management: Creates and composes shared 3D scenes with avatar integration
- Animation Data Generation: Converts raw signals (camera, microphone, sensors) to animation data
- Base Avatar Generation: Creates base avatar from captured inputs
Workflow distribution:
- Multiple workflow examples showing different distributions of avatar functions between sending UE, network, and receiving UE
- Workflow selection depends on service requirements, UE capabilities, and network configuration
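The workflow-selection point above can be sketched as a small decision function. This is purely illustrative; the decision inputs and the placement policy are assumptions, not the procedures defined in TS 26.813:

```python
# Illustrative placement of the Avatar Animation function for a session,
# driven by the factors named above: UE capability and network configuration.
def place_animation_function(receiver_can_animate: bool,
                             network_offload_allowed: bool) -> str:
    """Decide where Avatar Animation runs: receiving UE, network, or sending UE."""
    if receiver_can_animate:
        return "receiving-ue"
    if network_offload_allowed:
        return "network"
    return "sending-ue"
```

The same kind of decision applies independently to the other functions (animation data generation, scene management), which is what produces the multiple workflow distributions described above.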
Impact Assessment
The CR addresses editorial errors and incorrect references that would otherwise remain in the specification. No functional changes are introduced, only corrections and clarifications to existing content.