S4-260164

[FS_3DGS_MED] Pseudo-CR on objective metrics for 3DGS

Source: Tencent Cloud
Meeting: TSGS4_135_India
Agenda Item: 9.6

All Metadata

Agenda item description	FS_3DGS_MED (Study on 3D Gaussian splats)
Doc type	pCR
For action	Agreement
Release	Rel-20
Specification	26.958
Version	0.1.1
Related WIs	FS_3DGS_MED
Contact	Julien Ricard
Uploaded	2026-02-03T21:41:18.920000
Contact ID	109076
TDoc Status	noted
Reservation date	03/02/2026 14:22:19
Agenda item sort order	41

Review Comments

manager - 2026-02-09 04:37

[Technical] TR 26.958 is a Study Item TR; introducing “recommended for normative results” and “bit-exact rendering” language (CPU rasterizer) risks implying normative conformance where none exists. The wording should be strictly informative and aligned with the TR scope.

[Technical] The proposal relies on a fork of MPEG’s mpeg-gsc-metrics but provides no evidence of licensing/IPR compatibility, long-term maintenance plan, or reproducibility guarantees within 3GPP Git (e.g., pinned commit, dependency versions), which is critical if it becomes the de facto reference.

[Technical] “Bit-exact rendering regardless of hardware/OS” for a CPU rasterizer is a strong claim that is typically false without strict control of floating-point behavior, compiler flags, SIMD paths, and math libraries; the CR should specify determinism constraints and validation methodology.
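One possible validation methodology for the bit-exactness claim is to hash every rendered frame buffer on each target platform and compare the digests. The sketch below assumes frames are available as raw 8-bit RGB bytes; `frame_digest` and the toy buffers are hypothetical names for illustration and are not part of the contributed tool.

```python
import hashlib

def frame_digest(pixels: bytes, width: int, height: int, channels: int = 3) -> str:
    """Return a SHA-256 digest of one raw frame, including its dimensions."""
    expected = width * height * channels
    if len(pixels) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(pixels)}")
    # Prefix the dimensions so that equal byte strings with different
    # layouts (e.g. 2x8 vs 4x4) do not produce the same digest.
    header = f"{width}x{height}x{channels}:".encode("ascii")
    return hashlib.sha256(header + pixels).hexdigest()

# Toy 2x2 RGB frames standing in for renders from two platforms.
frame_a = bytes([10, 20, 30] * 4)
frame_b = bytes(frame_a)                      # identical render
frame_c = bytes([10, 20, 31]) + frame_a[3:]   # one sample differs by 1

print(frame_digest(frame_a, 2, 2) == frame_digest(frame_b, 2, 2))  # True
print(frame_digest(frame_a, 2, 2) == frame_digest(frame_c, 2, 2))  # False
```

Publishing such digests alongside the tool commit and build flags would let other companies check determinism without exchanging the rendered frames themselves.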

[Technical] The metric definitions are underspecified: PSNR/MSE “in RGB and YUV with weighted averages” needs explicit color conversion (matrix, range, primaries, transfer), bit depth, rounding, and weighting (e.g., 4:2:0 vs 4:4:4, luma/chroma weights) to ensure cross-implementation comparability.
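To illustrate how many parameters must be fixed before a “weighted YUV PSNR” is comparable, here is a minimal sketch under explicitly assumed settings: BT.709 coefficients, full-range 8-bit samples, 4:4:4 sampling, no intermediate rounding, and 6:1:1 Y:U:V weights applied to the MSE. None of these choices is taken from the pCR; they are placeholders for whatever the TR would specify.

```python
import math

# Assumptions (not stated in the pCR): BT.709 luma coefficients, full-range
# 8-bit samples, 4:4:4 sampling, no rounding, 6:1:1 weights applied to MSE.
KR, KB = 0.2126, 0.0722
KG = 1.0 - KR - KB
PEAK = 255.0

def rgb_to_yuv(r, g, b):
    """Convert one full-range RGB pixel to full-range YCbCr (float values)."""
    y = KR * r + KG * g + KB * b
    u = (b - y) / (2.0 * (1.0 - KB)) + 128.0
    v = (r - y) / (2.0 * (1.0 - KR)) + 128.0
    return (y, u, v)

def channel_mse(ref, test, channel):
    """Mean squared error of one channel over two equal-length pixel lists."""
    return sum((p[channel] - q[channel]) ** 2 for p, q in zip(ref, test)) / len(ref)

def psnr_from_mse(mse):
    """PSNR in dB for the assumed peak value; infinite for identical inputs."""
    return math.inf if mse == 0.0 else 10.0 * math.log10(PEAK * PEAK / mse)

def weighted_yuv_psnr(ref_rgb, test_rgb, weights=(6.0, 1.0, 1.0)):
    """PSNR computed from the weighted average of the Y, U and V MSEs."""
    if len(ref_rgb) != len(test_rgb) or not ref_rgb:
        raise ValueError("pixel lists must be non-empty and of equal length")
    ref = [rgb_to_yuv(*p) for p in ref_rgb]
    test = [rgb_to_yuv(*p) for p in test_rgb]
    mses = [channel_mse(ref, test, c) for c in range(3)]
    weighted_mse = sum(w * m for w, m in zip(weights, mses)) / sum(weights)
    return psnr_from_mse(weighted_mse)

reference = [(100, 100, 100)] * 4
decoded = [(110, 110, 110)] * 4   # uniform +10 brightness error
print(f"{weighted_yuv_psnr(reference, decoded):.2f} dB")  # 29.38 dB
```

Averaging per-channel PSNR values instead of MSE values, or switching to limited range or BT.601, would give a different number for the same pair of images, which is why the definition needs to be locked.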

[Technical] “Object Masked (OM) metrics” based on “union of object masks” is ambiguous (union of source+decoded masks vs per-view mask generation); this choice materially affects scores and should be precisely defined, including how masks are generated from splats and how occlusions/background are handled.
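The effect of the mask choice can be shown on a toy example. The sketch below compares a masked MSE under three candidate interpretations (union of source and decoded masks, their intersection, and the source mask only); the function name and the four-pixel data are hypothetical, and the pCR does not confirm which interpretation the tool uses.

```python
def masked_mse(ref, test, mask):
    """MSE over the samples selected by mask (flat lists of equal length)."""
    selected = [(r - t) ** 2 for r, t, m in zip(ref, test, mask) if m]
    if not selected:
        raise ValueError("mask selects no pixels")
    return sum(selected) / len(selected)

# Four-pixel toy view: the decoded render loses one object pixel
# (index 1) and adds a stray splat on the background (index 2).
ref = [200, 200, 0, 0]
test = [200, 0, 200, 0]
source_mask = [True, True, False, False]
decoded_mask = [True, False, True, False]

union_mask = [s or d for s, d in zip(source_mask, decoded_mask)]
intersection_mask = [s and d for s, d in zip(source_mask, decoded_mask)]

print(round(masked_mse(ref, test, union_mask), 2))         # 26666.67
print(round(masked_mse(ref, test, intersection_mask), 2))  # 0.0
print(round(masked_mse(ref, test, source_mask), 2))        # 20000.0
```

The same pair of renders scores anywhere from a perfect match to a large error depending only on the mask definition.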

[Technical] Viewpoint generation “from original PLY or explicit definition” is not sufficiently constrained; without a standardized view sampling strategy (count, distribution, near/far planes, FOV, resolution), results across companies will remain non-comparable despite a common tool.
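As an example of what a standardized view sampling strategy could look like, the sketch below generates a deterministic set of camera positions on a sphere using a Fibonacci lattice. This is an illustrative assumption, not the strategy used by the contributed tool; the view count, radius, and the idea that all cameras look at the object center would each need to be agreed separately.

```python
import math

def fibonacci_sphere(count, radius=1.0):
    """Return `count` near-uniform camera positions on a sphere of `radius`."""
    if count < 1:
        raise ValueError("count must be at least 1")
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    positions = []
    for i in range(count):
        # z runs from just below +1 to just above -1 in equal steps.
        z = 1.0 - 2.0 * (i + 0.5) / count
        ring_radius = math.sqrt(1.0 - z * z)
        theta = golden_angle * i
        positions.append((
            radius * ring_radius * math.cos(theta),
            radius * ring_radius * math.sin(theta),
            radius * z,
        ))
    return positions

cameras = fibonacci_sphere(32, radius=2.5)
print(len(cameras))  # 32
```

Because the positions depend only on `count` and `radius`, two companies using the same values obtain the same viewpoints without exchanging camera files.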

[Technical] The --useCameraPosition approach depends on non-standard PLY header comments “typically inserted by tools”; without a defined schema/grammar and required fields (intrinsics/extrinsics conventions, coordinate system, units), interoperability will be poor and results non-repeatable.

[Technical] The CR evaluates “source vs decoded PLY” but does not clarify whether “source” is the original capture point cloud, the encoder input 3DGS, or a rendered reference; for codec evaluation, the reference should be the encoder input representation, not necessarily the original reconstruction.

[Technical] The GPU rasterizer based on OpenGL is inherently non-deterministic across drivers and platforms; the CR should clearly restrict GPU mode to non-comparative visualization only and prevent accidental use in reported results (e.g., tool default, output labeling).

[Technical] IVSSIM is mentioned but not defined (version, parameters, viewing conditions); perceptual metrics often have multiple variants, and without parameter locking, reported numbers will not be comparable.

[Technical] “Occupancy rate = valid pixel coverage percentage” needs a precise definition of “valid pixel” (alpha threshold? depth test? mask generation?) and whether it is computed on source, decoded, or combined; otherwise it can be gamed and misinterpreted.
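The sensitivity of the occupancy rate to the “valid pixel” definition can be sketched as follows, assuming validity is decided by an alpha threshold on the rendered view. Both the threshold rule and the function name are assumptions for illustration; the pCR does not state how validity is determined.

```python
def occupancy_rate(alpha, threshold=0.0):
    """Percentage of pixels whose accumulated alpha exceeds `threshold`."""
    if not alpha:
        raise ValueError("alpha buffer is empty")
    valid = sum(1 for a in alpha if a > threshold)
    return 100.0 * valid / len(alpha)

# Toy alpha buffer for one rendered view (values in [0, 1]).
alpha = [0.0, 0.004, 0.5, 1.0]

print(occupancy_rate(alpha))                  # 75.0
print(occupancy_rate(alpha, threshold=0.01))  # 50.0
```

A nearly transparent pixel counts as valid under one threshold and not under another, so a reported occupancy figure is only meaningful together with the rule that produced it.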

[Editorial] Section numbering (new 6.4.1 and 12.4) may conflict with existing TR 26.958 structure; the CR should show exact insertion points and ensure consistent cross-references, rather than describing “new sections” abstractly.

[Editorial] The contribution reads like a “pseudo-CR” and tool user guide; it should clearly separate (a) metric definitions, (b) reference implementation description, and (c) example commands/outputs, and avoid promotional phrasing (“comprehensive”, “ensures exact”) unless substantiated.

[Editorial] Example results (Bartender, 1920×1080, PSNR/SSIM, 100% occupancy) are not traceable without specifying dataset version, rendering settings, and tool version/commit; examples should be labeled as illustrative and reproducible inputs should be referenced.
