S4-260177 - AI Summary

[FS_Avatar_Ph2_MED] Interoperability guidance for ARF


Interoperability Guidance for ARF

Introduction

This contribution addresses the FFS noted in TS 26.264 clause 5.6.1 regarding evaluation of MPEG ARF and interoperability aspects. The key interoperability challenge is mapping: receivers can only animate an avatar if they can correctly map incoming animation parameters to the appropriate Skeleton, BlendshapeSet, and LandmarkSet in the ARF container.

ISO/IEC 23090-39 defines signalling to declare supported animation frameworks and provide mapping tables. This contribution proposes concrete interoperability guidance with detailed examples for both linear and non-linear mappings.

Interoperability Framework

Interoperability Principles

The proposed guidance is based on four core principles:

  1. Single source of truth: The ARF document is the normative description for interpreting an animation stream for a given avatar

  2. Explicit profile identification: Each animation stream identifies its animation profile, and the ARF document lists supported profile URNs

  3. Deterministic mapping: When the stream profile doesn't directly match stored assets, the ARF document provides mapping tables from the stream profile to the target asset set

  4. Sender responsibility: The sender (who owns the ARF container) ensures either direct matching identifiers are used or mappings are present. The receiver is not expected to guess.

Mapping Signalling in ARF

ARF provides three signalling layers for mapping between animation frameworks:

SupportedAnimations: Lists supported face, body, hand, landmark, and texture animation profiles as URNs. Each URN identifies a framework and specific parameter set (e.g., blendshape set or joint set).

AnimationInfo and AnimationLink: Each animatable asset in components (Skeleton, BlendshapeSet, LandmarkSet) includes animationInfo. Each AnimationLink points to one SupportedAnimations entry as the target for that asset.

Mapping Objects: When additional frameworks are used for capture or streaming, animationInfo can include Mapping objects that map from a source SupportedAnimations entry to the target entry. Two mapping types are supported:
- LinearAssociation: Expresses a weighted sum from multiple source parameters to one target parameter
- NonLinearAssociation: Expresses non-linear transforms using one or more channels with lookup tables and interpolation

Mapping indices refer to parameter identifiers in the animation stream (ShapeKey.id for blendshapes, target joint index for joint animation, target landmark index for landmark animation).
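As an illustration, the three signalling layers above could be modelled roughly as follows. This is a minimal Python sketch: the class and field names are illustrative assumptions, not the normative ISO/IEC 23090-39 syntax.

```python
from dataclasses import dataclass, field

@dataclass
class SupportedAnimation:
    # URN identifying a framework and parameter set (hypothetical example value)
    urn: str  # e.g. "urn:example:blendshapes:set52"

@dataclass
class LinearAssociation:
    # Weighted sum from several source parameters to one target parameter
    target_index: int           # target parameter identifier (e.g. ShapeKey.id)
    source_indices: list[int]   # source parameter identifiers in the stream
    weights: list[float]        # one weight per source index

@dataclass
class Mapping:
    # Maps from a source SupportedAnimations entry to the asset's target entry
    source: int
    linear: list[LinearAssociation] = field(default_factory=list)

@dataclass
class AnimationLink:
    # Points to one SupportedAnimations entry as the target for an asset;
    # optional Mapping objects cover additional capture/streaming frameworks
    target: int
    mappings: list[Mapping] = field(default_factory=list)
```

The nesting mirrors the text: each animatable asset carries an AnimationLink, and Mapping objects are only needed when the stream profile differs from the asset's target profile.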

Receiver Processing Procedure

The receiver applies the following procedure:

  1. Parse preamble.supportedAnimations and build an index-to-URN map for each animation type

  2. Determine the animation profile used by the received stream and find its index in the corresponding SupportedAnimations list

  3. Select the target Skeleton, BlendshapeSet, or LandmarkSet to animate and find the matching AnimationLink entry

  4. If the stream profile index equals AnimationLink.target, apply stream parameters directly to target assets

  5. Otherwise, find a Mapping entry where Mapping.source equals the stream profile index, then apply the LinearAssociation or NonLinearAssociation rules to compute target parameters

  6. For any target indices not produced by mapping, use neutral defaults (0.0 for blendshape weights, bind pose for joints, neutral position for landmarks)
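The steps above can be sketched for the blendshape case as follows. This is an assumed, self-contained illustration (dictionary keys such as "target", "mappings", "targetIndex" are hypothetical stand-ins for the ARF fields, and profile resolution from steps 1-2 is taken as an input):

```python
def animate_blendshapes(profile_index, stream_weights, link, num_targets):
    """Apply steps 4-6 for one BlendshapeSet.

    profile_index : index of the stream's profile in SupportedAnimations (steps 1-2)
    stream_weights: {source parameter id: weight} parsed from the animation stream
    link          : {"target": int, "mappings": [{"source": int, "linear": [...]}]}
    """
    # Step 6 baseline: neutral defaults (0.0) for every unmapped blendshape weight
    out = [0.0] * num_targets

    # Step 4: direct match -> apply stream parameters directly
    if profile_index == link["target"]:
        for idx, w in stream_weights.items():
            if 0 <= idx < num_targets:
                out[idx] = w
        return out

    # Step 5: find the Mapping whose source equals the stream profile index,
    # then apply the LinearAssociation rule (weighted sum of source parameters)
    mapping = next(m for m in link["mappings"] if m["source"] == profile_index)
    for assoc in mapping["linear"]:
        out[assoc["targetIndex"]] = sum(
            w * stream_weights.get(s, 0.0)
            for s, w in zip(assoc["sourceIndices"], assoc["weights"]))
    return out
```

Joint and landmark animation would follow the same control flow, with bind pose and neutral position as the respective step-6 defaults.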

Mapping Mechanisms

Direct Match and Identifier Spaces

The simplest case occurs when the sender generates the animation stream using the same framework and parameter set as the target asset in ARF.

| Scenario | Typical Issue | ARF Signalling and Behaviour |
|----------|---------------|------------------------------|
| Direct match | Stream profile and parameter identifiers match target assets in ARF container | No mapping needed. Receiver applies parameters directly. ARF document declares profile in SupportedAnimations and links target assets with AnimationLink.target |
| Subset | Source and target use same semantics but target has fewer parameters | Unmapped target parameters default to neutral values |

Linear Mappings

Linear mappings are suitable when a target parameter can be expressed as a weighted sum of one or more source parameters. Typical use cases include mirroring left/right shapes, splitting or merging parameters, and simple scaling. In ARF, a linear mapping is represented by a LinearAssociation carrying targetIndex, sourceIndices, and weights.

Examples:

| Target Parameter (ARF) | Source Parameters (Stream) | Linear Association |
|------------------------|----------------------------|--------------------|
| Smile (targetIndex 12) | mouthSmileLeft (5), mouthSmileRight (6) | w12 = 0.5*w5 + 0.5*w6 |
| JawOpen (targetIndex 3) | jawOpen (13) | w3 = 1.0*w13 |
| MouthCornerPull (targetIndex 20) | mouthSmileLeft (5), mouthSmileRight (6), cheekSquintLeft (26), cheekSquintRight (27) | w20 = 0.4*w5 + 0.4*w6 + 0.1*w26 + 0.1*w27 |
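The LinearAssociation rule is a plain weighted sum. A minimal sketch, reproducing the Smile row above (the stream weight values 0.8 and 0.6 are made-up example inputs):

```python
def apply_linear(source_indices, weights, stream_weights):
    """LinearAssociation: w_target = sum(weight_i * w_source_i)."""
    return sum(w * stream_weights.get(s, 0.0)
               for s, w in zip(source_indices, weights))

# Smile (targetIndex 12) from mouthSmileLeft (5) and mouthSmileRight (6)
stream = {5: 0.8, 6: 0.6}                         # example stream weights
smile = apply_linear([5, 6], [0.5, 0.5], stream)  # -> 0.7
```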

Non-linear Mappings

Non-linear mappings are needed when linear blending is insufficient. Typical cases include dead zones, saturation, perceptual calibration curves, and gating where one parameter modulates another. In ARF, a non-linear mapping is represented by a NonLinearAssociation: each channel maps one source parameter through a lookup table defined by Data items, and channel outputs are combined using COMBINATION_SUM or COMBINATION_MUL.

Examples:

| Target Parameter (ARF) | Source Parameter(s) | Non-linear Mapping |
|------------------------|---------------------|-------------------|
| JawOpen (targetIndex 3) | jawOpen (13) | Piecewise curve with deadzone and saturation. Example input [0.0,0.1,0.4,1.0] maps to output [0.0,0.0,0.7,1.0] with INTERPOLATION_LINEAR |
| Blink (targetIndex 7) | eyeBlinkLeft (1), eyeBlinkRight (2) | Each eye uses threshold curve. INTERPOLATION_STEP to convert soft signal into binary blink. Combine with COMBINATION_SUM and clamp to [0,1] |
| MouthOpenSmile (targetIndex 30) | jawOpen (13) and Smile (12 after linear mapping) | Use COMBINATION_MUL to gate smile by jaw opening. Channel 1 maps jawOpen through deadzone curve. Channel 2 maps smile through S curve. Multiply channel outputs |
| BrowRaise (targetIndex 15) | browInnerUp (9) | Gamma curve to better match target rig. Example output = pow(input, 0.5). Approximated with LUT and INTERPOLATION_CUBICSPLINE |
| Landmark mouthMidTop (targetIndex 18) | landmarks 50 and 52 | Non-linear only if needed for stabilization or bias compensation. Example: apply LUT to compress extreme motion before writing 2D or 3D coordinate |
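A single NonLinearAssociation channel with INTERPOLATION_LINEAR amounts to a clamped piecewise-linear lookup table. The sketch below, an assumed illustration rather than normative behaviour, uses the JawOpen deadzone/saturation curve from the table above:

```python
import bisect

def lut_linear(xs, ys, x):
    """Piecewise-linear LUT lookup (xs ascending), clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect.bisect_right(xs, x)
    t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + t * (ys[i] - ys[i - 1])

# JawOpen curve: dead zone below 0.1, then a ramp saturating toward 1.0
xs = [0.0, 0.1, 0.4, 1.0]
ys = [0.0, 0.0, 0.7, 1.0]
jaw_open = lut_linear(xs, ys, 0.05)   # inside the dead zone -> 0.0
jaw_open = lut_linear(xs, ys, 0.25)   # halfway along the ramp -> 0.35
```

COMBINATION_MUL gating (the MouthOpenSmile row) would then simply multiply the outputs of two such channels, and COMBINATION_SUM would add them.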

Proposal

The contribution proposes:

  1. Document the content of sections 2 and 3 in TR 26.813

  2. Add explicit text to TS 26.264 clause 5.6.1 stating that the avatar owner shall ensure that identifiers used in avatar animation streams either:

     - directly match the identifiers of the target Skeleton, BlendshapeSet, and LandmarkSet stored in ARF, or
     - are covered by mapping tables in the ARF document that convert from the stream profile to the target assets

  3. Remove the corresponding note from TS 26.264 and declare it as resolved

Document Information

Title: [FS_Avatar_Ph2_MED] Interoperability guidance for ARF
Source: Qualcomm Atheros, Inc.
Doc type: discussion
Agenda item: 9.8 (FS_Avatar_Ph2_MED, Study on Avatar communication Phase 2)
Contact: Imed Bouazizi (Contact ID: 84417)
Uploaded: 2026-02-03T21:49:01.107000
Reservation date: 03/02/2026 15:42:46
TDoc status: agreed