Network, QoS and UE Considerations for Client-Side Inferencing (AI/ML for IMS)
This contribution addresses network-related issues in the previously discussed call flow for client/UE-side inferencing (S4aR260004a). The main concerns relate to steps 12-16 of the draft call flow, which cover model download and deployment for UE-based AI inferencing.
Problem Identification: Model Size
- TR 26.927 indicates models are approximately 40 MB (Table 6.6.2-1)
- Current publicly available models for practical use cases are significantly larger (100+ GB)
- Example: Hunyuan Image generation model set is 169 GB (available on Hugging Face)
- Even simple language models (e.g., single-language translation) are approximately 100 MB
Required Action:
Details on supported model sizes and required response times need to be defined.
Problem Identification: Bit-Rate and QoS
- For real-time request-response operation (500 ms or even 1000 ms), current mobile networks cannot support the required bit-rates
- Example calculation: 100 GB model with 1000 ms response time requires ~800 Gbps
- Such bit-rates are not realistic in current mobile networks
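The calculation above can be sketched as a small helper (an illustrative sketch; the function name and the 40 MB / 500 ms data point from TR 26.927 are chosen here for illustration):

```python
# Sustained bit-rate needed to transfer a model of a given size
# within a target response time (decimal units: 1 GB = 8e9 bits).
def required_bitrate_gbps(model_size_gb: float, transfer_time_ms: float) -> float:
    bits = model_size_gb * 8e9          # GB -> bits
    seconds = transfer_time_ms / 1000.0
    return bits / seconds / 1e9         # bits/s -> Gbps

# The example from the contribution: 100 GB in 1000 ms
print(required_bitrate_gbps(100, 1000))   # 800.0 Gbps
# The ~40 MB model size of TR 26.927 in 500 ms
print(required_bitrate_gbps(0.04, 500))   # ~0.64 Gbps, which is feasible
```

This makes the gap explicit: the TR's assumed model size is deliverable over existing radio bearers, while current publicly available model sizes are three to four orders of magnitude beyond them.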
Required Actions:
- Define supported model size and transfer time requirements
- Identify appropriate QoS profile (5QI)
- If no suitable 5QI exists, request SA2 to update 5QI specifications for this use case
Problem Identification: Neural Network Compression
- TR 26.927 details NN compression with 2-20% compression ratios
- Even with compression, resulting bit-rates remain infeasible for mobile networks
- No UE capabilities for NN codec support have been defined
- Cannot assume UE support for such capabilities
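The infeasibility claim can be checked numerically. A brief sketch (the 100 GB model size and 1000 ms transfer time are carried over from the earlier example; the 2-20% ratios are those cited from TR 26.927):

```python
# Apply the 2-20% compression ratios to a 100 GB model and compute the
# bit-rate still needed for a 1000 ms transfer.
MODEL_SIZE_GB = 100.0
TRANSFER_TIME_S = 1.0
for ratio in (0.02, 0.20):              # compressed size = ratio * original
    compressed_gb = MODEL_SIZE_GB * ratio
    gbps = compressed_gb * 8 / TRANSFER_TIME_S
    print(f"{ratio:.0%} -> {compressed_gb:.0f} GB, {gbps:.0f} Gbps")
# 2%  ->  2 GB ->  16 Gbps
# 20% -> 20 GB -> 160 Gbps
```

Even at the best-case 2% ratio, the required 16 Gbps remains far beyond typical mobile link rates, which supports the point that compression alone does not solve the problem.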
Required Action:
Clarify whether Neural Network Coding (NNC) is required for client-side inferencing and document the related requirements.
Problem Identification: Transport Protocol
- S4aR260004a mentions HTTP for download
- HTTP/TCP is suboptimal for large, time-constrained downloads due to:
  - TCP slow start
  - Congestion control introducing additional latency
  - Tail latency from head-of-line blocking
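The slow-start cost can be estimated with a simplified model (an illustrative sketch only: it assumes an initial congestion window of 10 segments, a 1460-byte MSS, window doubling every RTT, no loss, and ignores ssthresh):

```python
# Count the RTTs during which the congestion window, rather than the
# link rate, caps a transfer of `total_bytes`.
def slow_start_rtts(total_bytes: int, init_cwnd_segments: int = 10,
                    mss: int = 1460) -> int:
    cwnd = init_cwnd_segments * mss     # bytes sendable in the first RTT
    sent, rtts = 0, 0
    while sent < total_bytes:
        sent += cwnd
        cwnd *= 2                       # exponential growth per RTT
        rtts += 1
    return rtts

print(slow_start_rtts(100_000_000))     # 13 RTTs for a 100 MB model
```

Under these assumptions, even a modest 100 MB model spends 13 round trips ramping up; at a 50 ms RTT that is roughly 650 ms of latency before the link is fully utilized, independent of the available bit-rate.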
Proposed Solutions:
- Consider alternative protocols:
  - RTP with 3GPP burst QoS
  - QUIC (which has bindings to the 5G XRM framework for improved QoS support)
- Leverage 3GPP XRM QoS support for bursty data transfer (HTTP/3 over QUIC, or RTP)
Problem Identification: Caching and Model Updates
- The current call flow indicates a model download for every request
- No explicit caching or model update mechanism is defined
- This results in:
  - Huge bandwidth wastage
  - Network bit-rate requirements that are impossible to meet in current mobile networks
Required Action:
Include model update and caching mechanisms in the call flow rather than requesting a new model from the network each time.
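The intended behavior can be sketched as a version check on the UE side (a hypothetical sketch: the class names, the model ID "asr-en" and the version fields are assumptions for illustration, not part of the draft call flow):

```python
# Hypothetical UE-side model cache: download only when the model is
# absent or the network advertises a different version.
from dataclasses import dataclass

@dataclass
class CachedModel:
    model_id: str
    version: str
    path: str                            # location of the model blob on the UE

class ModelCache:
    def __init__(self) -> None:
        self._models: dict[str, CachedModel] = {}

    def needs_download(self, model_id: str, advertised_version: str) -> bool:
        cached = self._models.get(model_id)
        return cached is None or cached.version != advertised_version

    def store(self, model: CachedModel) -> None:
        self._models[model.model_id] = model

cache = ModelCache()
print(cache.needs_download("asr-en", "v2"))   # True: nothing cached yet
cache.store(CachedModel("asr-en", "v2", "/models/asr-en.bin"))
print(cache.needs_download("asr-en", "v2"))   # False: cache hit, no download
print(cache.needs_download("asr-en", "v3"))   # True: model update needed
```

Only the version-mismatch case triggers a transfer, so the impossible per-request bit-rate requirement collapses to an occasional update download.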
The contribution emphasizes that the intention is not to exclude UE inferencing (as agreed for the work item), but to clarify limitations and requirements before agreeing to a CR detailing such call flows.
Proposed Way Forward:
- Model Size: Specify applicable use cases for smaller models
- Latency Requirements: Clarify end-to-end latency requirements and derive the required bit-rate, latency and loss profiles
- Protocol Clarification: Clarify correct protocol usage (typically not HTTP/TCP) to support the use case with the required latency
- SA2 Coordination: Ask SA2 whether a new QoS profile is needed or whether existing profiles suffice
- Codec Support: Clarify required neural network codec support (if any) for the UE
- Caching Mechanism: Add caching and model update mechanisms to the call flow to avoid downloading the model for each task