S4-260121 - AI Summary

[FS_Avatar_Ph2_MED] Avatar Evaluation Framework and Objective Metrics


Summary of S4-260121: Avatar Evaluation Framework and Objective Metrics

Introduction

This contribution addresses Objectives 2 and 3 of the Avatar Communication Phase 2 SID (SP-251663), which concern QoE metrics, evaluation frameworks, and evaluation criteria for animation techniques. The document proposes a practical evaluation methodology designed to deliver repeatable, automated, and vendor-neutral results based on a core principle: evaluate what the user actually sees by measuring quality from rendered video output rather than internal system parameters.

Evaluation Framework

Design Principles

The framework is stated to rest on four key principles, of which three are explicitly detailed in the contribution:

  1. Black-box evaluation: Metrics are computed from the rendered output video, not internal system states, ensuring cross-vendor comparability
  2. Reproducibility: Fixed test content, deterministic rendering conditions, and standardized capture workflows yield consistent results
  3. Automation: All metrics are computable without human intervention, enabling large-scale testing

Testbed Architecture

The proposed testbed comprises five key components:

  • Stimulus player: Feeds avatar system with animation streams (blendshape weights, landmarks, joint poses)
  • Render configuration: Locks camera intrinsics, lighting, background, and resolution to eliminate variability
  • Capture module: Records rendered frames using lossless/visually lossless compression with frame-accurate timestamps
  • Network emulator: Applies controlled latency, jitter, bandwidth limits, and packet loss for transport testing
  • Metrics engine: Computes frame-level and clip-level objective metrics from captured assets
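As an illustration of the render-configuration component, the locked rendering conditions could be captured in a small immutable structure; the field names and default values below are hypothetical, not taken from the contribution:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RenderConfig:
    """Locked rendering conditions for reproducible capture (illustrative fields)."""
    width: int = 1920
    height: int = 1080
    fps: int = 60
    focal_length_px: float = 1400.0      # camera intrinsics, fixed per test run
    lighting_preset: str = "studio_neutral"
    background: str = "gray_18pct"
```

Freezing the dataclass makes accidental mid-run changes to camera, lighting, or resolution raise an error, which is one way to enforce the "eliminate variability" requirement in code.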

Objective Metrics for Avatar Evaluation

The contribution proposes metrics across three quality dimensions:

Visual Quality Metrics

  • PSNR (dB): Peak signal-to-noise ratio between reference and test frames
  • SSIM (0-1): Structural similarity index
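As a minimal sketch of the frame-level computation, PSNR between a reference and a test frame follows directly from the mean squared pixel error; this pure-Python version assumes flattened 8-bit frames (a real metrics engine would operate on decoded frame buffers and add SSIM via an image-processing library):

```python
import math

def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio (dB) between two equal-length pixel sequences."""
    if len(ref) != len(test):
        raise ValueError("frames must have the same number of pixels")
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    if mse == 0:
        return float("inf")  # identical frames: PSNR is unbounded
    return 10.0 * math.log10(max_val ** 2 / mse)
```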

Animation Quality Metrics

These metrics are computed from video by extracting landmarks and skeletons from the rendered output:

  • Lip Vertex Error (LVE) (pixels/mm): RMS error of mouth landmarks; critical for lip sync evaluation
  • Facial Distance Deviation (FDD) (pixels/mm): Deviation of expression-related landmark distances; measures facial expression accuracy
  • Motion Vertex Error (MVE) (pixels/mm): RMS error of body joint positions; evaluates full-body animation fidelity
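LVE is described as the RMS error of mouth landmarks. Assuming corresponding 2-D landmark lists extracted from the reference and test videos, the computation might look like the following sketch (MVE is the same calculation applied to body joint positions):

```python
import math

def lip_vertex_error(ref_pts, test_pts):
    """RMS Euclidean error over corresponding mouth landmarks, given as (x, y) pairs."""
    if len(ref_pts) != len(test_pts) or not ref_pts:
        raise ValueError("landmark lists must be non-empty and equal-length")
    sq_errors = [(rx - tx) ** 2 + (ry - ty) ** 2
                 for (rx, ry), (tx, ty) in zip(ref_pts, test_pts)]
    return math.sqrt(sum(sq_errors) / len(sq_errors))
```

Whether the result is in pixels or millimetres depends on whether the extracted landmarks are left in image coordinates or back-projected to metric space, which matches the dual units listed above.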

Temporal and Synchronization Metrics

Proposed for a second evaluation phase due to their complexity:

  • Rendering Frame Rate (FPS): Computed from frame timestamp deltas
  • Dropped Frame Ratio (%): Percentage of missing or repeated frame indices
  • Motion-to-Photon Latency (ms): Time from input motion event to visible response
  • End-to-End Latency (ms): Total delay from sender capture to receiver presentation
  • Audio-Visual Sync Offset (ms): Offset between mouth motion and corresponding audio via cross-correlation
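Three of these metrics can be sketched directly from captured frame timestamps, frame indices, and a pair of per-frame signals; the function names and signal representations here are illustrative, not defined by the contribution:

```python
def frame_rate(timestamps_ms):
    """Mean rendering FPS from deltas between consecutive frame timestamps (ms)."""
    deltas = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    return 1000.0 / (sum(deltas) / len(deltas))

def dropped_frame_ratio(indices):
    """Fraction of frame indices that are missing or repeated in the capture."""
    expected = indices[-1] - indices[0] + 1
    unique = len(set(indices))
    missing = expected - unique
    repeated = len(indices) - unique
    return (missing + repeated) / expected

def av_sync_offset(mouth_open, audio_env, step_ms):
    """Lag (ms) maximizing cross-correlation of mouth openness vs. audio envelope.

    Positive result: mouth motion lags the audio by that many milliseconds.
    """
    n = len(mouth_open)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-n + 1, n):
        score = sum(mouth_open[i] * audio_env[i - lag]
                    for i in range(max(0, lag), min(n, n + lag)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag * step_ms
```

Motion-to-photon and end-to-end latency need instrumented event timestamps on both sender and receiver, which is why they are harder to compute from captured video alone.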

Test Content

Standardized animation streams should cover:

  • Neutral speech: Clear visemes and steady head motion for baseline lip sync
  • Expressive speech: Emotions (happiness, surprise, concern) for facial expression testing
  • Conversational turn-taking: Gaze shifts, nods, backchannel gestures
  • Non-verbal body motion: Pointing, waving, posture changes, idle animation

Each test set should contain reference audio, reference animation streams, and reference rendered video from both a high-quality reference pipeline and the source capture.

Proposals

The contribution proposes to:

  1. Adopt an objective evaluation approach based on rendered video output as the primary evaluation method, for reproducibility
  2. Include the proposed metric set (visual quality, animation fidelity, temporal performance) in TR 26.813
  3. Define a normative capture workflow using lossless recording, timecode embedding, and reference alignment for consistent metric computation across implementations

Document Information

Source: Qualcomm Atheros, Inc.
Type: discussion
Title: [FS_Avatar_Ph2_MED] Avatar Evaluation Framework and Objective Metrics
Agenda item: 9.8 (FS_Avatar_Ph2_MED, Study on Avatar communication Phase 2)
Contact: Imed Bouazizi (Contact ID: 84417)
Uploaded: 2026-02-03T21:49:01.090000
Reservation date: 03/02/2026 05:48:54
Revised to: S4-260355
TDoc Status: revised