S4-260100

Network, QoS and UE Considerations for client side inferencing AIML/IMS

Source: Huawei Tech. (UK) Co., Ltd
Meeting: TSGS4_135_India
Agenda Item: 10.5

All Metadata
Agenda item description: AI_IMS-MED (Media aspects for AI/ML in IMS services)
Doc type: discussion
For action: Agreement
Release: Rel-20
Contact: Rufail Mekuria (Contact ID 104180)
Uploaded: 2026-02-02T16:28:29.453000
Revised to: S4-260421
TDoc status: revised
Reservation date: 02/02/2026 16:15:36
Agenda item sort order: 52
Review Comments
manager - 2026-02-09 04:01


  1. [Technical] The core premise in §2.2 (downloading a 100 GB model within 500–1000 ms) is not a realistic or relevant requirement for UE-side inferencing; the call flow should instead assume pre-provisioned/on-device models or background download over minutes/hours, otherwise the derived “800 Gbps” conclusion is a strawman that will derail the discussion.

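For reference, the arithmetic behind the "800 Gbps" figure in point 1 can be checked in a few lines (the 100 GB size and 500–1000 ms window are the contribution's own numbers, quoted here only to verify the derivation):

```python
# Back-of-envelope check of the throughput implied by downloading a
# model of a given size within a given window (per point 1 above).

def required_gbps(size_gb: float, window_ms: float) -> float:
    """Average downlink throughput, in Gbit/s, needed to move
    size_gb gigabytes (decimal GB) within window_ms milliseconds."""
    bits = size_gb * 8e9                  # GB -> bits
    return bits / (window_ms / 1e3) / 1e9

# 100 GB in 1000 ms -> 800.0 Gbit/s; in 500 ms -> 1600.0 Gbit/s.
# By contrast, a background download of 100 GB at 1 Gbit/s takes
# 800 s (~13 min), which is the regime point 1 argues for.
```
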
  2. [Technical] §2.1 cites TR 26.927 Table 6.6.2-1 (~40 MB) but then jumps to “public models 100+ GB” without mapping to the IMS/AIML use cases under discussion; the contribution needs to distinguish between (i) UE inference models intended for mobile deployment and (ii) data-center class generative models, otherwise the “required action” to define supported sizes is ungrounded.

  3. [Technical] The text conflates “real-time request-response latency” with “model transfer time” (§2.2); in most architectures the model download is not on the critical path of a single inference transaction, so QoS/latency requirements should be split into (a) inference transaction latency and (b) model acquisition/update latency.

  4. [Technical] Requesting SA2 to “update 5QI specifications” (§2.2) is premature and underspecified: the contribution does not identify whether the model transfer is best treated as GBR/non-GBR, what packet delay budget/jitter/loss are needed, or whether existing 5QIs (e.g., for TCP-based data) are insufficient; without concrete QoS characteristics, SA2 cannot act.

  5. [Technical] §2.4’s protocol critique is internally inconsistent: proposing RTP for “large, quick data downloads” is atypical and ignores reliability, congestion control, and content integrity needs; if the issue is TCP behavior, the more relevant comparison is HTTP/2 vs HTTP/3 (QUIC) and/or segmented download with application-layer pacing rather than RTP.

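To make the alternative in point 5 concrete, a segmented HTTP/2 or HTTP/3 download with application-layer pacing can be sketched as below; the helper names, segment size, and pacing target are illustrative assumptions, not values from the contribution:

```python
# Sketch: plan byte-range segments and a per-segment pacing delay for a
# large model download over HTTP, rather than RTP (see point 5 above).

def plan_segments(total_bytes: int, segment_bytes: int) -> list:
    """Return HTTP Range header values covering the whole object."""
    headers = []
    for start in range(0, total_bytes, segment_bytes):
        end = min(start + segment_bytes, total_bytes) - 1
        headers.append(f"bytes={start}-{end}")
    return headers

def pacing_delay_s(segment_bytes: int, target_bps: float) -> float:
    """Sleep this long after each segment so the average rate
    stays near target_bps (application-layer pacing)."""
    return (segment_bytes * 8) / target_bps

# Example: a 100-byte object in 40-byte segments needs three requests
# with Range: bytes=0-39, bytes=40-79, bytes=80-99.
```

Each segment is an ordinary ranged GET, so reliability, congestion control, and content integrity come from the HTTP/TLS stack rather than needing RTP-style media transport.
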
  6. [Technical] The claim that QUIC “has bindings to 5G XRM framework for improved QoS support” (§2.4) is vague and risks being incorrect/misleading in 3GPP terms; if the intent is to leverage 5G QoS (5QI/ARP/reflective QoS) or ATSSS, the contribution should reference the specific 3GPP mechanisms and how they apply to QUIC flows.

  7. [Technical] §2.3 cites “2–20% compression ratios” from TR 26.927 but does not clarify whether this refers to bitrate reduction, model size reduction, or accuracy trade-offs; without specifying the compression target and acceptable quality loss, the conclusion “still infeasible” is not technically supported.

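If the 2–20% figure in point 7 is read as model-size reduction (compressed size over original size, which is itself the interpretation the comment asks to have confirmed), the implied transfer numbers are easy to tabulate:

```python
# Transfer feasibility under a given compression ratio, assuming
# "ratio" means compressed size / original size (see point 7 above).

def transfer_gbps(size_gb: float, ratio: float, window_s: float) -> float:
    """Throughput in Gbit/s to move the compressed model in window_s."""
    return size_gb * ratio * 8 / window_s

# 100 GB at ratio 0.02 delivered in 1 s still needs 16.0 Gbit/s,
# while the same payload over a 10-minute background window needs
# only ~0.027 Gbit/s -- i.e. feasibility hinges on the time budget,
# not the compression ratio alone.
```
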
  8. [Technical] The “No UE capabilities for NN codec support have been defined” point (§2.3) is valid, but the required action is incomplete: it should propose where UE capability signaling would live (e.g., NAS/IMS/UE capability exchange) and what minimum interoperability baseline is assumed if NNC is optional.

  9. [Technical] §2.5 asserts the call flow “indicates model download for every request,” but does not quote the exact steps 12–16 behavior; if the original flow already implies model reuse or versioning, this criticism may be inaccurate—please pinpoint the exact normative/diagram text that mandates per-request download.

  10. [Technical] The proposed “scope limitation” to exclude “complex VLM/LLM” (§3.1) is not actionable without objective criteria (parameter count, model size on disk, compute class, or use-case categories); otherwise it becomes a subjective exclusion that is hard to standardize.

  11. [Technical] The contribution focuses almost entirely on downlink throughput but omits other critical feasibility constraints for UE inferencing (compute, memory footprint, thermal/power, storage, and model integrity/attestation); these are central to whether UE-side inferencing is viable and should be at least acknowledged if the goal is “network, QoS and UE considerations.”

  12. [Editorial] Several “Required Action(s)” are phrased as open-ended requests (“need to be defined”, “clarify correct protocol usage”) without proposing concrete spec text, assumptions, or a target spec/TR clause; as written it reads more like a discussion note than a contribution ready to drive a CR.

  13. [Editorial] Terminology is inconsistent/unclear (e.g., “client/UE side inferencing”, “client-side inferencing”, “UE-based AI inferencing”, “AIML/IMS”) and should be aligned with the agreed WI terminology and the referenced flow (S4aR260004a) to avoid ambiguity about the architecture being critiqued.
