Summary of 3GPP Change Request S4-260168
Document Information
- Source: Tencent
- Title: Pseudo-CR on 3DGS renderer and performance benchmarking
- Specification: 3GPP TR 26.958 v0.1.1
- Study: FS_3DGS_MED (3D Gaussian Splats for mobile)
Main Objective
This change request proposes adding technical content to TR 26.958 regarding a reference implementation of a 3DGS player for mobile platforms, including mobile renderer features and preliminary experimental benchmark results obtained on commercial mobile devices.
Technical Contributions
1. Mobile Renderer Architecture (Section 12.4.1)
The document proposes a hybrid architecture for the 3DGS mobile player.
2. Rendering Process Details (Section 12.4.1 - second subsection)
Key technical aspects of the mobile rendering pipeline:
- Depth Sorting: Gaussians are sorted back to front on the CPU every frame, which correct alpha blending requires (unlike mesh rendering, which relies on the Z-buffer)
- Sorting Implementation: A CPU-based Radix Sort is preferred over GPU Compute Shaders on mobile for thermal balance and driver compatibility
- Data Management:
  - Gaussian attributes are loaded into VRAM once at startup
  - FP32 textures/buffers are used for precision in covariance and color calculations
  - Only the sorted indices are transferred CPU→GPU per frame
  - The vertex shader uses texelFetch for direct reads from the persistent buffers
  - This design minimizes CPU-GPU bandwidth while maintaining visual fidelity
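The sorting step described above can be sketched as follows. This is a minimal illustration, not the player's actual code: the function name, the 16-bit depth quantization and the byte-wise passes are all assumptions chosen for clarity. It shows an LSD radix sort that returns only an index permutation, which corresponds to the per-frame data sent to the GPU.

```python
from typing import List, Sequence


def sort_back_to_front(depths: Sequence[float], bits: int = 16) -> List[int]:
    """Return splat indices ordered farthest-first using an LSD radix sort.

    depths: distance of each Gaussian from the camera (larger = farther).
    bits:   key width after quantization (assumed 16; must be a multiple of 8).
    """
    n = len(depths)
    if n == 0:
        return []
    lo, hi = min(depths), max(depths)
    max_key = (1 << bits) - 1
    scale = max_key / (hi - lo) if hi > lo else 0.0
    # Quantize so that the farthest splat gets the smallest key; an ascending
    # sort on the keys then yields back-to-front order.
    keys = [max_key - int((d - lo) * scale) for d in depths]

    order = list(range(n))
    for shift in range(0, bits, 8):  # one pass per byte, least significant first
        counts = [0] * 257
        for i in order:
            counts[((keys[i] >> shift) & 0xFF) + 1] += 1
        for b in range(256):  # prefix sum turns counts into bucket start offsets
            counts[b + 1] += counts[b]
        out = [0] * n
        for i in order:  # stable scatter keeps the earlier passes' ordering
            digit = (keys[i] >> shift) & 0xFF
            out[counts[digit]] = i
            counts[digit] += 1
        order = out
    return order


# Hypothetical depths for four splats; the result lists indices farthest-first.
print(sort_back_to_front([2.0, 9.5, 0.5, 4.0]))
```

Because the Gaussian attributes never move, only this index list changes from frame to frame, which is why the upload cost scales with the number of indices rather than with the full attribute set.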
3. Benchmark Methodology (Section 12.4.2)
Proposed benchmarking approach:
- Dynamic parameter modification during runtime
- Thermal management API usage for consistent clock speeds
- AR runtime disabled during benchmarking for fair comparison
- Variable parameters:
  - Number of Gaussians: 5,000 to 485,436 points
  - Spherical Harmonics degree: 0 (diffuse only) to 3 (full view-dependence)
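To make the cost of the SH degree parameter concrete, the sketch below counts color coefficients per Gaussian. It relies on the standard fact that a degree-d spherical harmonics expansion has (d + 1)^2 coefficients per color channel. The function names are hypothetical, and the byte estimate assumes three color channels with every coefficient stored as FP32.

```python
def sh_coefficients_per_channel(degree: int) -> int:
    """Number of SH basis functions up to and including the given degree."""
    if degree < 0:
        raise ValueError("SH degree must be non-negative")
    return (degree + 1) ** 2


def sh_bytes_per_gaussian(degree: int, channels: int = 3, bytes_per_value: int = 4) -> int:
    """Color storage per Gaussian, assuming RGB and FP32 (4-byte) coefficients."""
    return channels * sh_coefficients_per_channel(degree) * bytes_per_value


for degree in range(4):
    per_gaussian = sh_bytes_per_gaussian(degree)
    total_mb = per_gaussian * 485_436 / 1e6  # full benchmark model
    print(f"degree {degree}: {sh_coefficients_per_channel(degree):2d} coeffs/channel, "
          f"{per_gaussian:3d} B/Gaussian, {total_mb:6.1f} MB for 485,436 points")
```

Under these assumptions, going from degree 0 to degree 3 multiplies the color data per Gaussian by 16.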
4. Experimental Results (Section 12.4.3)
Test Configuration
- Device: Google Pixel 9a (Tensor G4, mid-range, March 2025)
- Application: Tencent 3DGS mobile player
- Build: Release mode with optimizations
- Test duration: 30 seconds per configuration for thermal stability
- Model: bicycle.ply (485,436 points)
- Power measurement: Android Battery Manager API
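The summary states only that power was measured through the Android Battery Manager API; it does not say how the watt figures were derived. One plausible approach, shown here purely as an assumption, is to multiply the instantaneous battery current (BatteryManager.BATTERY_PROPERTY_CURRENT_NOW, documented in microamperes) by the battery voltage (BatteryManager.EXTRA_VOLTAGE, typically reported in millivolts) and average over the test window. The function names and sample values below are hypothetical, and sign conventions and units can differ between devices.

```python
from typing import Iterable, Tuple


def battery_power_watts(current_microamps: int, voltage_millivolts: int) -> float:
    """Instantaneous power in watts from one current/voltage sample.

    The absolute value is taken because the sign of the current indicates
    charge versus discharge and is not consistent across devices.
    """
    return abs(current_microamps) * 1e-6 * voltage_millivolts * 1e-3


def average_power_watts(samples: Iterable[Tuple[int, int]]) -> float:
    """Mean power over (current_uA, voltage_mV) samples taken during one run."""
    samples = list(samples)
    if not samples:
        raise ValueError("at least one sample is required")
    return sum(battery_power_watts(i, v) for i, v in samples) / len(samples)


# Hypothetical samples, e.g. one reading per second during a 30-second run.
readings = [(-350_000, 4_000), (-360_000, 3_990), (-340_000, 4_010)]
print(f"{average_power_watts(readings):.2f} W")
```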
Impact of Number of Points (SH degree=3)
Key findings from Table 1 and Figure 2:
- 5,000 points: 355 FPS, 24% CPU, 6% GPU, 1.45 W
- 150,000 points: 56 FPS, 47% CPU, 88% GPU, 1.47 W (approaching GPU saturation)
- 200,000 points: 45 FPS, 48% CPU, 99% GPU, 1.33 W (GPU saturated)
- 485,436 points: 19 FPS, 55% CPU, 100% GPU, 1.22 W
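Conclusion: GPU load approaches saturation at ~150k points (about 88%) and reaches it at 200k points (99%). Beyond saturation, frame rate falls roughly in inverse proportion to point count (45 FPS at 200k points versus 19 FPS at 485k), i.e. frame time grows approximately linearly.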
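The scaling behaviour can be checked from the four data points listed above. The short sketch below converts each frame rate into a frame time and into a throughput (Gaussians rendered per second); the helper names are illustrative.

```python
# (points, FPS) pairs taken from the results listed above (SH degree 3).
RESULTS = [(5_000, 355), (150_000, 56), (200_000, 45), (485_436, 19)]


def frame_time_ms(fps: float) -> float:
    """Average time spent on one frame, in milliseconds."""
    return 1000.0 / fps


def gaussians_per_second(points: int, fps: float) -> float:
    """Rendering throughput: Gaussians processed per second."""
    return points * fps


for points, fps in RESULTS:
    print(f"{points:>7} points: {frame_time_ms(fps):6.2f} ms/frame, "
          f"{gaussians_per_second(points, fps) / 1e6:5.2f} M Gaussians/s")
```

The throughput stays close to 9 million Gaussians per second at both 200k and 485k points, which is what a GPU-bound renderer would show: frame time grows roughly in proportion to point count.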
Impact of Spherical Harmonics Degree (485k points)
Key findings from Table 2 and Figure 3:
- SH Degree 0: 20.41 FPS, 55% CPU, 100% GPU, 1.45 W
- SH Degree 3: 18.05 FPS, 55% CPU, 100% GPU, 0.99 W
- Performance impact: ~11.6% FPS reduction from degree 0 to 3 (20.41 to 18.05 FPS)
Conclusion: Moderate frame rate impact when increasing SH degree from 0 to 3.
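The relative impact can be computed directly from the two frame rates listed above. The helper below is a small illustration with a hypothetical name; the result is sensitive to rounding and to which runs are compared.

```python
def relative_drop_percent(baseline: float, value: float) -> float:
    """Percentage by which `value` falls below `baseline`."""
    return (baseline - value) / baseline * 100.0


fps_sh0, fps_sh3 = 20.41, 18.05  # frame rates listed above (485k points)

drop = relative_drop_percent(fps_sh0, fps_sh3)
extra_ms = 1000.0 / fps_sh3 - 1000.0 / fps_sh0  # added time per frame

print(f"FPS reduction: {drop:.1f}%")
print(f"Extra frame time: {extra_ms:.2f} ms")
```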
5. Overall Analysis (Section 12.4.2.3)
Key conclusions:
- Real-time rendering of complex 3DGS scenes is feasible on current-generation mobile hardware
- Scene complexity must be managed (fewer than 200k visible points recommended)
- Performance varied between identical experiments because of:
  - Background processes
  - Dynamic power management
- Results should be read as trends rather than fixed values
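The run-to-run variation is visible in the figures reported above: the same configuration (485,436 points, SH degree 3) appears once at 19 FPS in the point-count results and once at 18.05 FPS in the SH-degree results. The sketch below quantifies that spread. The helper name is hypothetical, and the 19 FPS value may be rounded.

```python
from statistics import mean
from typing import Iterable


def relative_spread_percent(values: Iterable[float]) -> float:
    """Range of the measurements as a percentage of their mean."""
    vals = list(values)
    return (max(vals) - min(vals)) / mean(vals) * 100.0


# Two reported runs of the same configuration (485,436 points, SH degree 3).
same_config_fps = [19.0, 18.05]
print(f"Run-to-run spread: {relative_spread_percent(same_config_fps):.1f}%")
```

A spread of this size is not negligible next to the effect of changing the SH degree, which supports treating the measurements as trends.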
Editor's note: Additional benchmarks planned to evaluate impact of other improvements (memory optimization, quantization, sorting algorithms, etc.)
Rationale for Change
- Provides concrete data to validate real-time 3DGS feasibility on mobile hardware
- Identifies performance bottlenecks (CPU sorting, memory transfer, GPU rasterization, power consumption)
- Supports study objectives for reference implementations and performance characteristics
- Guides future specification work with empirical evidence