# Summary of 3GPP Change Request S4-260168

## Document Information
- **Source:** Tencent
- **Title:** Pseudo-CR on 3DGS renderer and performance benchmarking
- **Specification:** 3GPP TR 26.958 v0.1.1
- **Study:** FS_3DGS_MED (3D Gaussian Splats for mobile)

## Main Objective

This change request proposes adding technical content to TR 26.958 regarding a reference implementation of a 3DGS player for mobile platforms, including mobile renderer features and preliminary experimental benchmark results obtained on commercial mobile devices.

## Technical Contributions

### 1. Mobile Renderer Architecture (Section 12.4.1)

The document proposes a hybrid architecture for the 3DGS mobile player:

- **Native Layer (C++):** 
  - Implements core rendering using OpenGL ES 3.2
  - Tile-based rasterizer inspired by original 3DGS method
  - CPU sorting or Compute Shaders for parallel sorting (e.g., Radix sort)
  - Vertex and Fragment shaders for rendering

- **Application Layer (Java/Kotlin):**
  - UI management
  - AR runtime lifecycle for camera tracking
  - Resource management

- **Capabilities:**
  - Supports standard .ply file loading
  - Real-time interaction (rotation, translation, scaling)
  - Benchmarking mode with dynamic parameter variation
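
As an illustration of the `.ply` loading capability, the following minimal sketch parses a PLY header to recover the point count and per-vertex property names. It is not the player's loader: the function name `parse_ply_header` is hypothetical, and the property names in the sample header (`f_dc_0`, `opacity`) follow the layout commonly exported by the original 3DGS reference code, which is an assumption about the files used here.

```python
import io

def parse_ply_header(stream):
    """Read a PLY header from a binary stream.

    Returns (format_name, vertex_count, vertex_property_names).
    """
    if stream.readline().strip() != b"ply":
        raise ValueError("not a PLY file")
    fmt, count, props, in_vertex = None, 0, [], False
    while True:
        line = stream.readline()
        if not line:
            raise ValueError("unexpected end of file inside header")
        tokens = line.decode("ascii").split()
        if not tokens or tokens[0] == "comment":
            continue
        if tokens[0] == "end_header":
            break
        if tokens[0] == "format":
            fmt = tokens[1]
        elif tokens[0] == "element":
            # Only properties of the "vertex" element are collected.
            in_vertex = tokens[1] == "vertex"
            if in_vertex:
                count = int(tokens[2])
        elif tokens[0] == "property" and in_vertex:
            props.append(tokens[-1])  # the property name is the last token
    return fmt, count, props

# Hypothetical, shortened header; real 3DGS files list many more properties.
SAMPLE_HEADER = (
    b"ply\n"
    b"format binary_little_endian 1.0\n"
    b"element vertex 485436\n"
    b"property float x\n"
    b"property float y\n"
    b"property float z\n"
    b"property float f_dc_0\n"
    b"property float opacity\n"
    b"end_header\n"
)

header = parse_ply_header(io.BytesIO(SAMPLE_HEADER))
```

After the header, the stream is positioned at the start of the binary vertex data, so a loader could read `vertex_count` fixed-size records from that point.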

### 2. Rendering Process Details (Section 12.4.1, second subsection)

Key technical aspects of the mobile rendering pipeline:

- **Depth Sorting:** Gaussians are sorted back to front on the CPU every frame; correct alpha blending depends on this order, whereas opaque mesh rendering can rely on the Z-buffer instead
- **Sorting Implementation:** CPU-based Radix Sort is preferred over GPU Compute Shaders on mobile, for thermal balance and driver compatibility
- **Data Management:**
  - Gaussian attributes loaded into VRAM at startup
  - FP32 textures/buffers for precision in covariance and color calculations
  - Only sorted indices transferred CPU→GPU per frame
  - Vertex shader uses texelFetch for direct reads from persistent buffers
  - Net effect: CPU-GPU bandwidth is minimized while visual fidelity is maintained
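
The per-frame sort described above can be sketched as a stable LSD radix sort over quantized depth keys that returns only an index order, which is the data that would be uploaded to the GPU each frame. This is a minimal illustration, not the player's C++ code: the 16-bit key width, the 8-bit digit size, and the names `quantize_depth`, `radix_argsort`, and `back_to_front_indices` are all assumptions made for the example.

```python
KEY_BITS = 16  # assumed key width; the source does not state one

def quantize_depth(depth, near, far, bits=KEY_BITS):
    """Map a view-space depth in [near, far] to an integer key of `bits` bits."""
    t = (depth - near) / (far - near)
    t = min(max(t, 0.0), 1.0)  # clamp depths outside the range
    return int(t * ((1 << bits) - 1))

def radix_argsort(keys, bits=KEY_BITS, radix_bits=8):
    """Stable LSD radix sort; returns indices that order `keys` ascending."""
    order = list(range(len(keys)))
    mask = (1 << radix_bits) - 1
    for shift in range(0, bits, radix_bits):
        # Histogram of the current digit.
        counts = [0] * (mask + 1)
        for i in order:
            counts[(keys[i] >> shift) & mask] += 1
        # Exclusive prefix sum turns counts into bucket start offsets.
        total = 0
        for d in range(mask + 1):
            counts[d], total = total, total + counts[d]
        # Scatter indices into their buckets, preserving relative order.
        out = [0] * len(order)
        for i in order:
            d = (keys[i] >> shift) & mask
            out[counts[d]] = i
            counts[d] += 1
        order = out
    return order

def back_to_front_indices(depths, near, far):
    """Indices of Gaussians ordered farthest first, for alpha blending."""
    max_key = (1 << KEY_BITS) - 1
    # Inverting the key makes an ascending sort yield descending depth.
    keys = [max_key - quantize_depth(d, near, far) for d in depths]
    return radix_argsort(keys)

order = back_to_front_indices([2.0, 9.5, 0.5, 5.0], near=0.1, far=10.0)
```

Because the Gaussian attributes stay resident on the GPU, only `order` changes from frame to frame; at 4 bytes per index that is roughly 1.9 MB per frame for 485,436 points.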

### 3. Benchmark Methodology (Section 12.4.2)

Proposed benchmarking approach:

- **Dynamic parameter modification** during runtime
- **Thermal management API** usage for consistent clock speeds
- **AR runtime disabled** during benchmarking for fair comparison
- **Variable parameters:**
  - Number of Gaussians: 5,000 to 485,436 points
  - Spherical Harmonics degree: 0 (diffuse only) to 3 (full view-dependence)
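
To show why the SH degree is a meaningful benchmark parameter, the sketch below counts the coefficients involved. A spherical-harmonics expansion of degree d has (d + 1)^2 coefficients per color channel, so degree 0 needs 1 and degree 3 needs 16. The storage estimate assumes three color channels and 4-byte FP32 values; the function names are hypothetical and the result is not a measurement of the player's actual memory use.

```python
def sh_coeffs_per_channel(degree):
    """Number of SH basis functions up to and including `degree`."""
    return (degree + 1) ** 2

def sh_floats_per_gaussian(degree, channels=3):
    """SH values stored per Gaussian, assuming one set per color channel."""
    return channels * sh_coeffs_per_channel(degree)

def sh_bytes_total(num_gaussians, degree, bytes_per_float=4):
    """Total SH storage, assuming FP32 values (an assumption, see lead-in)."""
    return num_gaussians * sh_floats_per_gaussian(degree) * bytes_per_float

for degree in range(4):
    mib = sh_bytes_total(485_436, degree) / (1024 * 1024)
    print(f"degree {degree}: {sh_floats_per_gaussian(degree)} floats per Gaussian, "
          f"{mib:.1f} MiB for 485,436 Gaussians")
```

Under these assumptions, going from degree 0 to degree 3 multiplies the color data per Gaussian by 16, from 3 to 48 floats.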

### 4. Experimental Results (Section 12.4.3)

#### Test Configuration
- **Device:** Google Pixel 9a (Tensor G4, mid-range, March 2025)
- **Application:** Tencent 3DGS mobile player
- **Build:** Release mode with optimizations
- **Test duration:** 30 seconds per configuration for thermal stability
- **Model:** bicycle.ply (485,436 points)
- **Power measurement:** Android Battery Manager API
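
The source does not say how the wattage figures were derived from the Battery Manager API. One plausible approach, sketched here purely as an assumption, is to sample the instantaneous battery current (Android's `BatteryManager.BATTERY_PROPERTY_CURRENT_NOW`, reported in microamperes) together with the battery voltage (commonly reported in millivolts) and average the product over the 30-second window. The function names and the sample values are hypothetical.

```python
def power_watts(current_microamps, voltage_millivolts):
    """Instantaneous power in watts from one battery reading.

    The absolute value is taken because the sign of the current
    indicates charge versus discharge.
    """
    amps = abs(current_microamps) * 1e-6
    volts = voltage_millivolts * 1e-3
    return amps * volts

def mean_power_watts(samples):
    """Average power over a list of (current_uA, voltage_mV) samples."""
    if not samples:
        raise ValueError("no samples collected")
    return sum(power_watts(i, v) for i, v in samples) / len(samples)

# Hypothetical readings, not data from the contribution.
readings = [(-350_000, 4_000), (-360_000, 4_000), (-340_000, 4_000)]
average = mean_power_watts(readings)
```

Readings of this kind include the display and every other component drawing from the battery, which is one reason to treat the reported wattages as indicative.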

#### Impact of Number of Points (SH degree=3)

Key findings from Table 1 and Figure 2:

- **5,000 points:** 355 FPS, 24% CPU, 6% GPU, 1.45W
- **150,000 points:** 56 FPS, 47% CPU, 88% GPU, 1.47W (approaching GPU saturation)
- **200,000 points:** 45 FPS, 48% CPU, 99% GPU, 1.33W (GPU saturated)
- **485,436 points:** 19 FPS, 55% CPU, 100% GPU, 1.22W

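**Conclusion:** The GPU nears saturation at ~150k points (just under 90% load) and is fully saturated from 200k points. Beyond saturation, frame rate falls roughly in inverse proportion to point count: the 150k, 200k, and 485k configurations all sustain about 8.4 to 9.2 million points per second, so frame time grows approximately linearly with point count.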
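
The scaling behavior can be checked directly from the four data points listed above by computing the frame time and the number of points rendered per second. This is a small arithmetic sketch; the helper names are hypothetical.

```python
# (points, FPS) pairs from the list above, SH degree 3.
TABLE_1 = [(5_000, 355), (150_000, 56), (200_000, 45), (485_436, 19)]

def frame_time_ms(fps):
    """Average time per frame in milliseconds."""
    return 1000.0 / fps

def points_per_second(points, fps):
    """Rendered points per second, a rough throughput measure."""
    return points * fps

for points, fps in TABLE_1:
    print(f"{points:>7} points: {frame_time_ms(fps):5.1f} ms/frame, "
          f"{points_per_second(points, fps) / 1e6:.2f} M points/s")

throughputs = [points_per_second(p, f) for p, f in TABLE_1]
```

At 5,000 points the throughput is only about 1.8 million points per second because the GPU is mostly idle; from 150,000 points upward it stays between 8.4 and 9.2 million, which is the signature of a GPU-bound workload.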

#### Impact of Spherical Harmonics Degree (485k points)

Key findings from Table 2 and Figure 3:

- **SH Degree 0:** 20.41 FPS, 55% CPU, 100% GPU, 1.45W
- **SH Degree 3:** 18.05 FPS, 55% CPU, 100% GPU, 0.99W
- **Performance impact:** ~11.6% FPS reduction from degree 0 to 3 (20.41 → 18.05 FPS)

**Conclusion:** Moderate frame rate impact when increasing SH degree from 0 to 3.
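
The relative reduction can be recomputed from the two FPS values listed above, along with the corresponding per-frame cost. The helper names below are hypothetical.

```python
def relative_drop_percent(baseline, value):
    """Percentage decrease of `value` relative to `baseline`."""
    return (baseline - value) / baseline * 100.0

def frame_time_ms(fps):
    """Average time per frame in milliseconds."""
    return 1000.0 / fps

FPS_SH0 = 20.41  # SH degree 0, 485k points
FPS_SH3 = 18.05  # SH degree 3, 485k points

drop = relative_drop_percent(FPS_SH0, FPS_SH3)           # about 11.6 %
extra_ms = frame_time_ms(FPS_SH3) - frame_time_ms(FPS_SH0)  # about 6.4 ms
```

Expressed as frame time, full view-dependent color adds roughly 6.4 ms to a frame of about 49 ms on this scene.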

### 5. Overall Analysis (Section 12.4.2.3)

**Key conclusions:**

- Real-time rendering of complex 3DGS scenes is **feasible on current-generation mobile hardware**
- Scene complexity must be managed (fewer than 200k visible points recommended)
- Performance varies between identical experiments because of:
  - Background processes
  - Dynamic power management
- Results should therefore be read as trends rather than fixed values

**Editor's note:** Additional benchmarks are planned to evaluate the impact of other improvements (memory optimization, quantization, sorting algorithms, etc.)

## Rationale for Change

- Provides concrete data to validate real-time 3DGS feasibility on mobile hardware
- Identifies performance bottlenecks (CPU sorting, memory transfer, GPU rasterization, power consumption)
- Supports study objectives for reference implementations and performance characteristics
- Guides future specification work with empirical evidence