S4-260100 - AI Summary

Network, QoS and UE Considerations for client side inferencing AIML/IMS


1. Introduction

This contribution addresses network-related issues in the previously discussed call flow for client/UE side inferencing (S4aR260004a). The main concerns relate to steps 12-16 of the draft call flow, which involve model download and deployment for UE-based AI inferencing.

2. Network Related Issues

2.1 Model Size

Problem Identification:
- TR 26.927 indicates models are approximately 40 MB (Table 6.6.2-1)
- Current publicly available models for practical use cases are significantly larger (100+ GB)
- Example: Hunyuan Image generation model set is 169 GB (available on Hugging Face)
- Simple language models (e.g., single language translation) are approximately 100 MB

Required Action:
Details on supported model sizes and required response times need to be defined.

2.2 Network QoS Support

Problem Identification:
- For real-time request-response targets (500 ms or even 1000 ms), current mobile networks cannot support the bit-rates required to download a model within the response time
- Example calculation: 100 GB model with 1000 ms response time requires ~800 Gbps
- Such bit-rates are not realistic in current mobile networks
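
The example calculation above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming decimal units (1 GB = 10^9 bytes) and ignoring all protocol overhead; the function name `required_bitrate_gbps` is illustrative and does not come from the contribution.

```python
def required_bitrate_gbps(model_size_gb: float, transfer_time_s: float) -> float:
    """Bit-rate in Gbit/s needed to move model_size_gb (decimal GB) in transfer_time_s seconds."""
    return model_size_gb * 8 / transfer_time_s


# Model sizes quoted in this summary; 0.5 s and 1.0 s are the response times it mentions.
sizes_gb = {
    "40 MB (TR 26.927)": 0.04,
    "100 MB (simple language model)": 0.1,
    "100 GB": 100.0,
    "169 GB (Hunyuan model set)": 169.0,
}

for label, size_gb in sizes_gb.items():
    for budget_s in (0.5, 1.0):
        rate = required_bitrate_gbps(size_gb, budget_s)
        print(f"{label}: {rate:g} Gbit/s for a {budget_s} s transfer")
```

Under these assumptions the 100 GB case at 1000 ms gives the 800 Gbit/s figure quoted above, while even the 40 MB model from TR 26.927 needs 0.32 Gbit/s sustained for a full second.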

Required Actions:
- Define supported model size and transfer time requirements
- Identify appropriate QoS profile (5QI)
- If no suitable 5QI exists, request SA2 to update 5QI specifications for this use case

2.3 Compression and UE Support

Problem Identification:
- TR 26.927 details NN compression with 2-20% compression ratios
- Even with compression, resulting bit-rates remain infeasible for mobile networks
- No UE capabilities for NN codec support have been defined
- Cannot assume UE support for such capabilities
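
To illustrate why compression alone does not close the gap, the sketch below applies the 2-20% range to the 100 GB example. It assumes that "compression ratio" means the compressed size as a fraction of the original size, which is an interpretation and not confirmed by this summary; the function name is hypothetical and decode time on the UE is ignored.

```python
def compressed_bitrate_gbps(model_size_gb: float, size_fraction: float,
                            transfer_time_s: float) -> float:
    """Gbit/s needed after compression; size_fraction = compressed size / original size (assumed)."""
    return model_size_gb * size_fraction * 8 / transfer_time_s


# 100 GB model, 1 s transfer, at both ends of the assumed 2-20% range.
for fraction in (0.02, 0.20):
    rate = compressed_bitrate_gbps(100.0, fraction, 1.0)
    print(f"compressed to {fraction:.0%} of original: {rate:g} Gbit/s")
```

With this reading, the 100 GB model still needs roughly 16 to 160 Gbit/s for a one-second transfer.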

Required Action:
Clarify whether NNC is required for client-side inferencing and document related requirements.

2.4 Protocol Support Issue

Problem Identification:
- S4aR260004a mentions HTTP for download
- HTTP/TCP is suboptimal for large, quick data downloads due to:
  - TCP slow start
  - Congestion control introducing additional latency
  - Tail latency from head-of-line blocking
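
The cost of TCP slow start can be estimated with an idealized model. The sketch below counts round trips for a fresh connection whose congestion window doubles every RTT, assuming an initial window of 10 segments, a 1460-byte MSS, no loss, and no bandwidth or receive-window cap. All of these values, and the 50 ms RTT, are assumptions chosen for illustration; a real link would cap the window and take longer.

```python
import math


def slow_start_rtts(transfer_bytes: int, mss: int = 1460,
                    init_cwnd_segments: int = 10) -> int:
    """Round trips needed if cwnd doubles every RTT (idealized, loss-free slow start)."""
    segments_left = math.ceil(transfer_bytes / mss)
    cwnd = init_cwnd_segments
    rtts = 0
    while segments_left > 0:
        segments_left -= cwnd  # one window of segments delivered per round trip
        cwnd *= 2              # exponential growth during slow start
        rtts += 1
    return rtts


rtt_s = 0.05  # assumed 50 ms round-trip time
rounds = slow_start_rtts(100_000_000)  # the ~100 MB simple language model
print(f"{rounds} round trips, about {rounds * rtt_s * 1000:.0f} ms of ramp-up")
```

Even in this best case a 100 MB transfer takes 13 round trips, so at an assumed 50 ms RTT the window ramp-up alone exceeds a 500 ms response budget.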

Proposed Solutions:
- Consider alternative protocols:
  - RTP protocol with 3GPP burst QoS
  - QUIC (has bindings to 5G XRM framework for improved QoS support)
- Leverage 3GPP XRM QoS support for bursty data transfer (HTTP/3 with QUIC or RTP)

2.5 Caching and Bandwidth Wastage

Problem Identification:
- Current call flow indicates model download for every request
- No explicit caching or model update mechanism
- Results in:
  - Huge bandwidth wastage
  - Impossible network bit-rate requirements in current mobile networks

Required Action:
Include model update and caching mechanisms in the call flow rather than requesting a new model from the network each time.
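
One possible shape for such a mechanism is a UE-side cache keyed by model identifier and version, so that a download happens only when the network advertises a version the UE does not hold. The sketch below is purely illustrative: `ModelCache`, `CachedModel`, the `fetch` callback and the version strings are hypothetical and are not defined in S4aR260004a or in this contribution.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CachedModel:
    version: str
    data: bytes


class ModelCache:
    """Hypothetical UE-side cache: re-download only when the advertised version changes."""

    def __init__(self, fetch: Callable[[str], bytes]):
        self._fetch = fetch  # stand-in for the actual model download step
        self._store: Dict[str, CachedModel] = {}
        self.downloads = 0

    def get(self, model_id: str, current_version: str) -> bytes:
        entry = self._store.get(model_id)
        if entry is not None and entry.version == current_version:
            return entry.data  # cache hit: no network transfer
        data = self._fetch(model_id)  # cache miss or stale version
        self.downloads += 1
        self._store[model_id] = CachedModel(current_version, data)
        return data


cache = ModelCache(fetch=lambda model_id: b"weights-for-" + model_id.encode())
cache.get("translator", "v1")  # first request: download
cache.get("translator", "v1")  # same version: served from cache
cache.get("translator", "v2")  # model update: download again
print(cache.downloads)
```

In this sketch three inference requests trigger only two downloads; how the current version would be signalled to the UE is left open here.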

3. Suggested Way Forward

The contribution emphasizes that the intention is not to exclude UE inferencing (as agreed for the work item), but to clarify limitations and requirements before agreeing to a CR detailing such call flows.

Proposed Actions:

  1. Scope Limitation: Add note that client-side inferencing only works for simple cases:
     - Explicitly exclude complex VLM/LLM
     - Define maximum model size limits
     - Specify applicable use cases for smaller models

  2. Latency Requirements: Clarify end-to-end latency requirements and derive required bit-rate/latency and loss profiles

  3. Protocol Clarification: Clarify correct protocol usage (typically not HTTP/TCP) to support the use case with required latency

  4. SA2 Coordination: Ask SA2:
     - How such bursts can be supported
     - Whether new QoS profile is needed or if existing profiles suffice

  5. Codec Support: Clarify required neural network codec support (if any) for the UE

  6. Caching Mechanism: Add caching and model update mechanisms in call flow to avoid downloading model for each task

Document Information
Source: Huawei Tech.(UK) Co.. Ltd
Type: discussion
For: Agreement
Title: Network, QoS and UE Considerations for client side inferencing AIML/IMS
Agenda item: 10.5
Agenda item description: AI_IMS-MED (Media aspects for AI/ML in IMS services)
Release: Rel-20
Contact: Rufail Mekuria
Uploaded: 2026-02-02T16:28:29.453000
Contact ID: 104180
Revised to: S4-260421
TDoc Status: revised
Reservation date: 02/02/2026 16:15:36
Agenda item sort order: 52