S4-260147

[FS_3DGS_MED] Pseudo-CR on Enhanced Scenario for Avatar Communication Use Case

Source: Pengcheng Laboratory, China Mobile Com. Corporation
Meeting: TSGS4_135_India
Agenda Item: 9.6

All Metadata

Agenda item description	FS_3DGS_MED (Study on 3D Gaussian splats)
Doc type	pCR
For action	Agreement
Release	Rel-20
Specification	26.958
Version	0.1.1
Related WIs	FS_3DGS_MED
Contact	chaofan he
Uploaded	2026-02-03T12:25:40.917000
Contact ID	107635
TDoc Status	noted
Reservation date	03/02/2026 12:20:56
Agenda item sort order	41

Review Comments

manager - 2026-02-09 04:37

[Technical] The proposal introduces a “static 3DGS representation” that “follows mesh deformation” at the receiver, but it does not specify a deformation model for Gaussians (e.g., per-Gaussian skinning weights, attachment to mesh surface, or a learned deformation field), making interoperability and feasibility unclear.
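
As an illustration of one of the options named above (per-Gaussian skinning weights), the following is a minimal sketch of linear blend skinning applied to a Gaussian center. It is not taken from the contribution: the function names, the 3x4 transform layout, and the sample values are all assumptions made for the example, and only the center is deformed (rotation and scaling of the covariance are omitted).

```python
# Hypothetical sketch: per-Gaussian linear blend skinning (LBS).
# All names and values are illustrative; none come from the contribution.

def transform_point(m, p):
    """Apply a 3x4 transform (three rows of [r0, r1, r2, t]) to a 3D point."""
    return tuple(
        m[r][0] * p[0] + m[r][1] * p[1] + m[r][2] * p[2] + m[r][3]
        for r in range(3)
    )

def skin_gaussian_center(center, weights, bone_transforms):
    """Blend the bone-transformed positions of one Gaussian center.

    weights maps a bone index to its weight; the weights must sum to 1.
    """
    total = sum(weights.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError("skinning weights must sum to 1")
    out = [0.0, 0.0, 0.0]
    for bone, w in weights.items():
        moved = transform_point(bone_transforms[bone], center)
        for i in range(3):
            out[i] += w * moved[i]
    return tuple(out)

# Two assumed bones: one at rest, one translated by +2 along x.
IDENTITY = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
SHIFT_X = [[1, 0, 0, 2.0], [0, 1, 0, 0], [0, 0, 1, 0]]

# A Gaussian influenced equally by both bones moves halfway between them.
skinned = skin_gaussian_center((1.0, 0.0, 0.0), {0: 0.5, 1: 0.5}, [IDENTITY, SHIFT_X])
```

A scenario description would still need to state where such weights come from (for example, inherited from the nearest mesh vertices) and whether they are part of the base avatar payload.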

[Technical] “Spatial alignment” between the deformable mesh and 3DGS is asserted without defining the coordinate frames, the calibration requirements, or how alignment is maintained under pose/expression changes; this is a core missing element of the scenario description.

[Technical] The transmission strategy lacks a concrete definition of what constitutes the “base avatar” payload versus “animation parameters” (parameter sets, units, ranges, timing model), so the claimed bandwidth/latency benefits cannot be evaluated or compared to other TR 26.958 scenarios.

[Technical] The document assumes SMPL‑X/FLAME parameter extraction in real time but does not address model licensing/IP, standardization suitability, or whether the scenario is intended to be model-agnostic; referencing specific proprietary/de facto models may conflict with 3GPP’s technology-neutral TR positioning.

[Technical] “3DGS updated at lower frequency than animation parameters” is underspecified: no triggers (appearance change, lighting change, topology change), update granularity (full set vs patches), or drift/consistency handling is described, all of which are critical for interactive bidirectional use.

[Technical] The receiver rendering is described as “composite” (mesh shading + 3DGS appearance) but no compositing rules are given (occlusion, depth ordering, alpha blending, shadowing), risking ambiguous visual results and undermining the scenario’s reproducibility.
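
To make the missing rule concrete, here is a minimal sketch of one possible compositing rule: depth-sorted, front-to-back "over" blending for a single pixel, with the mesh treated as an opaque fragment. This is an assumed rule for illustration only, not the method described in the contribution; the fragment layout `(depth, color, alpha)` and the grayscale colors are hypothetical simplifications.

```python
# Hypothetical sketch: front-to-back alpha compositing of one pixel.
# Fragments are (depth, grayscale color, alpha); smaller depth is closer.

def composite(fragments):
    """Return (color, coverage) after 'over' blending in depth order."""
    color = 0.0
    transmittance = 1.0  # fraction of light still passing through
    for _depth, c, a in sorted(fragments, key=lambda f: f[0]):
        color += transmittance * a * c
        transmittance *= 1.0 - a
        if transmittance <= 0.0:
            break  # an opaque fragment occludes everything behind it
    return color, 1.0 - transmittance

pixel = composite([
    (3.0, 1.0, 0.5),  # Gaussian behind the mesh: fully occluded
    (1.0, 1.0, 0.5),  # Gaussian in front of the mesh
    (2.0, 0.2, 1.0),  # opaque mesh surface
])
```

With this rule, the Gaussian behind the opaque mesh contributes nothing. Other rules (for example, blending Gaussians only within a shell around the mesh surface) would give different results, which is why the rule needs to be stated.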

[Technical] “Viewpoint adaptation supported within application-defined constraints” is too vague for a TR scenario; it should at least state whether free-viewpoint is expected, what baseline view range is assumed, and how artifacts are handled when extrapolating beyond capture coverage.

[Technical] The capture assumptions (“one or more cameras”) omit key constraints that drive feasibility (mono vs multi-view, depth availability, required resolution/frame rate, lighting), which are necessary to justify real-time parameter extraction and 3DGS generation.

[Technical] The proposal does not discuss error resilience and synchronization between the low-latency animation stream and the lower-rate 3DGS updates (e.g., timestamping, buffering, late/early update handling), which is essential for interactive communication scenarios.
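
One way to picture the timestamping aspect is a receiver-side lookup that maps each animation frame to the newest 3DGS update whose timestamp is not later than that frame. The sketch below assumes a shared media timeline in milliseconds; the class name, method names, and update identifiers are hypothetical and do not appear in the contribution.

```python
import bisect

# Hypothetical sketch: selecting the 3DGS update that applies to a frame.

class AppearanceTimeline:
    """Holds lower-rate 3DGS updates keyed by media timestamp (ms)."""

    def __init__(self):
        self._times = []
        self._updates = []

    def add_update(self, timestamp_ms, update_id):
        # Insert in timestamp order so out-of-order arrivals are tolerated.
        i = bisect.bisect_right(self._times, timestamp_ms)
        self._times.insert(i, timestamp_ms)
        self._updates.insert(i, update_id)

    def active_update(self, frame_timestamp_ms):
        """Return the newest update not later than the frame, or None."""
        i = bisect.bisect_right(self._times, frame_timestamp_ms)
        return self._updates[i - 1] if i > 0 else None

timeline = AppearanceTimeline()
timeline.add_update(0, "base")
timeline.add_update(2000, "refresh-2")
timeline.add_update(1000, "refresh-1")  # arrives after a newer update
```

This covers only the selection step. Buffering depth, what to render before the first update arrives, and whether a late update should be dropped rather than applied are policy choices the scenario would still need to describe.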

[Technical] There is no discussion of how identity personalization is handled (e.g., per-user mesh/3DGS creation, enrollment time, update cadence), yet “base avatar transmitted once” implies a prior creation pipeline that should be captured in the scenario.

[Editorial] As a “Pseudo-CR,” the contribution summary does not indicate the exact TR 26.958 clause(s) to be updated, nor does it provide proposed text; without clause-level changes, SA4 cannot efficiently assess consistency with existing scenarios and terminology.

[Editorial] Several terms are introduced without definition or alignment to TR terminology (e.g., “deformation propagation,” “appearance contributions,” “application-defined constraints”), which should be tightened to avoid multiple interpretations across implementers.

[Editorial] The summary claims “efficient bandwidth utilization” but provides no qualitative comparison point (e.g., versus full 3DGS streaming or mesh+texture video), making the motivation read as aspirational rather than supported by scenario requirements.
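
A comparison point of the kind requested could be as simple as the arithmetic sketched below, which contrasts a "base avatar once plus per-frame parameters" strategy with sending a complete, uncompressed Gaussian set every frame. The formulas are a rough assumption (no compression, no transport overhead), and every input value is a placeholder chosen only to exercise the functions; none of them is a measurement or a figure from the contribution.

```python
# Hypothetical sketch: back-of-the-envelope bitrate formulas.
# Inputs below are placeholders, not measured or proposed values.

def parametric_bitrate_bps(params_per_frame, bytes_per_param, fps,
                           base_bytes, session_s):
    """Average bitrate when the base avatar is sent once per session."""
    animation = params_per_frame * bytes_per_param * fps * 8
    base_amortized = base_bytes * 8 / session_s
    return animation + base_amortized

def full_stream_bitrate_bps(num_gaussians, bytes_per_gaussian, fps):
    """Bitrate when every frame carries a complete Gaussian set."""
    return num_gaussians * bytes_per_gaussian * fps * 8

parametric = parametric_bitrate_bps(
    params_per_frame=100, bytes_per_param=4, fps=30,
    base_bytes=1_000_000, session_s=100,
)
full = full_stream_bitrate_bps(num_gaussians=1000, bytes_per_gaussian=100, fps=30)
```

Replacing the placeholders with values agreed for TR 26.958 scenarios would turn the efficiency claim into something that can be checked.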
