S4-260285

Avatar-update to section 6.3.4

Source: InterDigital New York
Meeting: TSGS4_135_India
Agenda Item: 9.8

All Metadata

Agenda item description	FS_Avatar_Ph2_MED (Study on Avatar communication Phase 2)
Doc type	discussion
For action	Agreement
Release	Rel-20
Specification	26.813
Contact	Gaelle Martin-Cocher
Uploaded	2026-02-04T14:56:21.570000
Contact ID	91571
TDoc Status	agreed
Is revision of	S4-260251
Reservation date	04/02/2026 14:53:40
Agenda item sort order	43

Review Comments

manager - 2026-02-09 04:58

[Technical] The contribution appears to introduce a fully specified AAU animation bitstream format (types, headers, payload syntax, formulas) into AVATAR §6.3.4, but it is unclear whether AVATAR is intended to normatively define MPEG ARF bitstream syntax versus only referencing ISO/IEC 23090-39/43; this risks duplicating or conflicting with MPEG normative text and creating maintenance divergence.

[Technical] The AAU header definition (“7-bit AAU type code” + “payload length in bytes”) is underspecified for interoperability: no endianness, alignment/padding rules, length field size, and no explicit framing/escape mechanism are stated, making parsing ambiguous across transports and files.
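
To illustrate the parsing ambiguity, the sketch below reads the same five bytes under two byte orders. The layout (one byte whose low 7 bits carry the type code, followed by a 32-bit length), the `parse_header` helper, and the sample bytes are assumptions made for illustration only; they are not quoted from the contribution or from ISO/IEC 23090-39.

```python
import struct

# Hypothetical header layout (an assumption, not the contribution's syntax):
#   byte 0     : top bit ignored, low 7 bits = AAU type code
#   bytes 1..4 : payload length in bytes, 32-bit unsigned
def parse_header(buf, byte_order):
    """Return (type_code, payload_length) for the assumed 5-byte header.

    byte_order is "<" (little-endian) or ">" (big-endian); both select
    standard sizes with no padding in the struct module.
    """
    first, length = struct.unpack_from(byte_order + "BI", buf, 0)
    return first & 0x7F, length

header = bytes([0x01, 0x00, 0x00, 0x01, 0x00])
big = parse_header(header, ">")     # length read as 0x00000100
little = parse_header(header, "<")  # length read as 0x00010000
```

Under these assumptions the same bytes yield a payload length of 256 when read big-endian and 65536 when read little-endian, which is why the length field size and byte order need to be stated.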

[Technical] The timestamp/timescale design is inconsistent and potentially incorrect: payload uses a 32-bit timestamp in “ticks” while configuration uses a 32-bit float “timescale” for ticks-per-second; using float for a clock rate is unusual and can cause rounding/non-determinism—an integer timescale (and/or rational) is typically required for exact sync.
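
The rounding concern can be made concrete with a small sketch. The 30000/1001 ticks-per-second rate and the helper names are assumptions chosen for illustration; the contribution does not mandate any particular rate.

```python
import struct
from fractions import Fraction

def to_float32(x):
    """Round a Python float to the nearest IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Assumed example rate: 30000/1001 ticks per second.
exact_rate = Fraction(30000, 1001)
# The value a 32-bit float timescale field would actually carry.
float_rate = Fraction(to_float32(30000 / 1001))

def seconds(ticks, rate):
    """Convert a tick count to seconds using exact rational arithmetic."""
    return Fraction(ticks) / rate

ticks = 2**32 - 1  # largest value of a 32-bit timestamp
drift = abs(seconds(ticks, float_rate) - seconds(ticks, exact_rate))
```

Because 30000/1001 has no exact binary representation, `float_rate` differs from `exact_rate` and `drift` is non-zero, whereas a rational or integer timescale keeps the tick-to-seconds conversion exact.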

[Technical] The AAU_CONFIG “animation profile string” is too open-ended to ensure interoperability; without a registry, versioning rules, and normative constraints tied to each profile, receivers cannot reliably validate or negotiate capabilities.

[Technical] The joint animation sample uses a per-joint 4×4 matrix (and optional velocity matrix) but does not specify coordinate system conventions (right/left-handed, units), matrix layout (row/column-major), composition order, or whether transforms are local-to-parent vs model-space—these are critical to avoid mismatched animation playback.
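
As an illustration of the layout issue alone, the sketch below interprets the same 16 values as column-major and as row-major. The helper names and the sample matrix are assumptions for illustration, not part of the contribution.

```python
def as_rows(flat, layout):
    """Build a 4x4 matrix (list of rows) from 16 values in the given layout."""
    if layout == "row":
        return [list(flat[4 * r:4 * r + 4]) for r in range(4)]
    if layout == "column":
        return [[flat[4 * c + r] for c in range(4)] for r in range(4)]
    raise ValueError(layout)

def transform_point(m, p):
    """Apply m to the point p = (x, y, z) with w = 1, without a perspective divide."""
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(3))

# Sixteen values that encode a translation by (5, 6, 7) if read column-major.
flat = [1, 0, 0, 0,
        0, 1, 0, 0,
        0, 0, 1, 0,
        5, 6, 7, 1]

point = (1.0, 1.0, 1.0)
moved_column_major = transform_point(as_rows(flat, "column"), point)
moved_row_major = transform_point(as_rows(flat, "row"), point)
```

Read column-major, the point moves to (6, 7, 8); read row-major, the translation lands in the bottom row and the point stays at (1, 1, 1). A receiver that guesses the wrong layout would therefore silently drop or corrupt joint translations.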

[Technical] The LBS formula provided (vᵢ = Σⱼ wᵢⱼ · Mⱼ · vᵢ⁰) omits the standard bind-pose correction (inverse bind matrices) and does not clarify whether Mⱼ already includes that; given the earlier mention of inverse bind matrices, the text should align or it will mislead implementers.
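
The effect of the missing bind-pose correction can be sketched as follows. This assumes the common formulation in which each joint matrix is the joint's current global transform multiplied by its inverse bind matrix; the translation-only joints, helper names, and numeric values are illustrative assumptions.

```python
def translation(t):
    """4x4 matrix (list of rows) that translates by t = (tx, ty, tz)."""
    return [[1, 0, 0, t[0]], [0, 1, 0, t[1]], [0, 0, 1, t[2]], [0, 0, 0, 1]]

def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def apply(m, p):
    v = (p[0], p[1], p[2], 1)
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(3))

def skin(vertex, weights, joint_mats):
    """Linear blend skinning: weighted sum of per-joint transformed positions."""
    out = [0.0, 0.0, 0.0]
    for w, m in zip(weights, joint_mats):
        p = apply(m, vertex)
        for k in range(3):
            out[k] += w * p[k]
    return tuple(out)

# Two joints whose bind-pose global transforms are pure translations (assumed).
bind = [translation((0, 1, 0)), translation((0, 2, 0))]
inverse_bind = [translation((0, -1, 0)), translation((0, -2, 0))]
current = bind                      # skeleton has not moved from its bind pose
rest_vertex = (0.0, 1.5, 0.0)
weights = [0.5, 0.5]

without_correction = skin(rest_vertex, weights, current)
with_correction = skin(
    rest_vertex, weights,
    [matmul(g, ib) for g, ib in zip(current, inverse_bind)],
)
```

With the skeleton still in its bind pose, the corrected form leaves the vertex at its rest position (0, 1.5, 0), while applying the global joint transforms directly moves it to (0, 3, 0). The text therefore needs to say which of the two Mⱼ denotes.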

[Technical] Blendshape sample semantics are incomplete: no constraints on weight ranges, normalization, additive vs absolute interpretation, and no definition of how “confidence” affects rendering/estimation; this can lead to incompatible behavior across clients.
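
A minimal sketch of the additive-versus-absolute ambiguity, assuming a single blendshape target and hypothetical helper names; neither interpretation is claimed to be the one intended by the contribution.

```python
def blend_as_deltas(base, targets, weights):
    """Targets hold per-vertex offsets: v = base + sum(w_k * d_k)."""
    out = []
    for i, b in enumerate(base):
        out.append(tuple(
            b[a] + sum(w * t[i][a] for w, t in zip(weights, targets))
            for a in range(3)
        ))
    return out

def blend_as_absolute(base, targets, weights):
    """Targets hold full positions: v = base + sum(w_k * (t_k - base))."""
    out = []
    for i, b in enumerate(base):
        out.append(tuple(
            b[a] + sum(w * (t[i][a] - b[a]) for w, t in zip(weights, targets))
            for a in range(3)
        ))
    return out

base = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
targets = [[(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]]  # one target, two vertices
weights = [0.5]

as_deltas = blend_as_deltas(base, targets, weights)
as_absolute = blend_as_absolute(base, targets, weights)
```

The same target data and the same weight of 0.5 place the second vertex at x = 1.5 under the offset interpretation and at x = 1.0 under the absolute interpretation, so the semantics of the stored target data need to be fixed along with the weight range.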

[Technical] Landmark AAU dimensionality (2D vs 3D) is defined but not the reference frame (image plane vs normalized device coords vs mesh local/world), nor units and origin; without this, landmarks cannot be used consistently for overlays/registration.

[Technical] Texture animation samples are described as “parametric texture weights” controlling “micro-geometry patterns, makeup, dynamic material variations,” but there is no normative mapping to material models (e.g., glTF PBR parameters) or to ARF TextureSet targets, so different implementations will interpret the same stream differently.

[Technical] The container statements (“ARF document in MetaBox item, may include animation tracks”; zip-based with relative references) lack the key identifiers needed for interoperability (item types/brands, track handler/sample entry, MIME/URN mapping, and security considerations for zip path traversal), and may conflict with existing ISOBMFF conventions if not aligned.
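
The path traversal concern for the zip-based container can be illustrated with a receiver-side check. The member names (`arf.json`, `meshes/head.glb`) and the `unsafe_members` helper are hypothetical, and the check is a sketch rather than a complete validation policy (for example, it does not handle Windows drive letters or symbolic links).

```python
import io
import zipfile
from pathlib import PurePosixPath

def unsafe_members(zf):
    """Return member names that are absolute or climb out of the archive root."""
    bad = []
    for name in zf.namelist():
        path = PurePosixPath(name.replace("\\", "/"))
        if path.is_absolute() or ".." in path.parts:
            bad.append(name)
    return bad

# Build a small in-memory archive with one member that escapes the root.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("arf.json", "{}")            # hypothetical document name
    zf.writestr("meshes/head.glb", b"")      # hypothetical relative reference
    zf.writestr("../outside.txt", "x")       # traversal attempt

with zipfile.ZipFile(buffer) as zf:
    flagged = unsafe_members(zf)
```

Here only `../outside.txt` is flagged. A container definition that allows relative references inside a zip archive would benefit from stating that such members must be rejected.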

[Technical] The “preamble” addition includes “authenticationFeatures with encrypted facial/voice feature vectors and public key URI,” which raises privacy/security and regulatory implications; without specifying threat model, key management, encryption scheme, and consent/usage constraints, this is risky to introduce even as descriptive text.

[Technical] The data model details (e.g., tensors “Nx16”, “NxM”, GLB references for blendshapes) read like normative schema requirements but do not define encoding (binary layout, precision, indexing, limits) or how these map to glTF/23090-14 constructs, risking inconsistent implementations.

[Editorial] The update of the MPEG ARF reference from WG03N1316 to WG03N1693 and the claim that 23090-39 is at CDIS stage should be verified against the exact cited document and date; AVATAR text should avoid hard-coding maturity statements that will quickly become outdated.

[Editorial] Terminology is inconsistent between “ARF document,” “preamble,” “metadata object,” “components section,” and “MetaBox item”; if these are intended as formal objects/boxes, they should be capitalized/defined consistently, otherwise phrasing should be clearly informative-only.

[Editorial] Several parts read like implementation notes (“available through Python language,” “inverse kinematics system for missing joint information”) rather than specification text; AVATAR §6.3.4 should separate normative requirements from informative examples to avoid implying mandatory behavior.
