Meeting: TSGS4_135_India | Agenda Item: 10.5
| Field | Value |
| --- | --- |
| TDoc | S4-260100 |
| Title | Network, QoS and UE Considerations for client side inferencing AIML/IMS |
| Source | Huawei Tech. (UK) Co., Ltd |
| Agenda item | 10.5 |
| Agenda item description | AI_IMS-MED (Media aspects for AI/ML in IMS services) |
| Doc type | discussion |
| For action | Agreement |
| Release | Rel-20 |
| download_url | https://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_135_India/Docs/S4-260100.zip |
| Contact | Rufael Mekuria |
| Uploaded | 2026-02-02T16:28:29.453000 |
| Contact ID | 104180 |
| Revised to | S4-260421 |
| TDoc Status | revised |
| Reservation date | 02/02/2026 16:15:36 |
| Agenda item sort order | 52 |
[Technical] The core premise in §2.2 (downloading a 100 GB model within 500–1000 ms) is not a realistic or relevant requirement for UE-side inferencing; the call flow should instead assume pre-provisioned/on-device models or background download over minutes/hours, otherwise the derived “800 Gbps” conclusion is a strawman that will derail the discussion.
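To make the throughput objection concrete, a back-of-envelope check (illustrative only; `required_throughput_gbps` is a hypothetical helper, not from the TDoc) shows how the "800 Gbps" figure arises and why it points to background download rather than real-time transfer:

```python
# Illustrative arithmetic, not from the TDoc: sustained link rate
# needed to move a model of a given size within a given deadline.
def required_throughput_gbps(model_size_gb: float, deadline_s: float) -> float:
    """Gbit/s needed to transfer model_size_gb gigabytes in deadline_s
    seconds (decimal units throughout)."""
    bits = model_size_gb * 8e9          # GB -> bits
    return bits / deadline_s / 1e9      # bits/s -> Gbit/s

# A 100 GB model in 1 s needs 800 Gbps; in 0.5 s, 1600 Gbps.
print(required_throughput_gbps(100, 1.0))   # 800.0
print(required_throughput_gbps(100, 0.5))   # 1600.0
# By contrast, a ~40 MB model (the TR 26.927 Table 6.6.2-1 ballpark)
# over a 100 Mbps bearer takes ~3.2 s, i.e. ordinary download territory.
print(40 * 8 / 100)                          # 3.2
```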
[Technical] §2.1 cites TR 26.927 Table 6.6.2-1 (~40 MB) but then jumps to “public models 100+ GB” without mapping to the IMS/AIML use cases under discussion; the contribution needs to distinguish between (i) UE inference models intended for mobile deployment and (ii) data-center class generative models, otherwise the “required action” to define supported sizes is ungrounded.
[Technical] The text conflates “real-time request-response latency” with “model transfer time” (§2.2); in most architectures the model download is not on the critical path of a single inference transaction, so QoS/latency requirements should be split into (a) inference transaction latency and (b) model acquisition/update latency.
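The proposed split could be captured as two independent budgets; the sketch below uses hypothetical field names (nothing here is 3GPP-defined) purely to illustrate that model acquisition should never be bounded by the per-transaction budget:

```python
# Hypothetical illustration of keeping inference-transaction latency
# and model-acquisition latency as separate QoS budgets.
from dataclasses import dataclass

@dataclass
class InferenceQoS:
    transaction_budget_ms: float   # request -> response, model already resident
    acquisition_budget_s: float    # background model download/update window

    def model_on_critical_path(self) -> bool:
        # If acquisition must complete within the transaction budget,
        # the call flow has (incorrectly) coupled the two latencies.
        return self.acquisition_budget_s * 1000 <= self.transaction_budget_ms

decoupled = InferenceQoS(transaction_budget_ms=200, acquisition_budget_s=3600)
print(decoupled.model_on_critical_path())  # False: download stays off the critical path
```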
[Technical] Requesting SA2 to “update 5QI specifications” (§2.2) is premature and underspecified: the contribution does not identify whether the model transfer is best treated as GBR/non-GBR, what packet delay budget/jitter/loss are needed, or whether existing 5QIs (e.g., for TCP-based data) are insufficient; without concrete QoS characteristics, SA2 cannot act.
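As an illustration of what "concrete QoS characteristics" would mean here, the sketch below (hypothetical structure and field names, not an SA2 template) lists the minimum parameter set an LS to SA2 would need to be actionable:

```python
# Hypothetical minimum parameter set for a model-transfer QoS request;
# values and names are illustrative, not proposed normative content.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelTransferQoSRequest:
    resource_type: str                         # "GBR" or "non-GBR"
    packet_delay_budget_ms: int                # per-packet PDB, not total transfer time
    packet_error_rate: float                   # e.g. 1e-6
    guaranteed_bitrate_mbps: Optional[float]   # only meaningful for GBR

    def actionable(self) -> bool:
        # GBR requests must state a bitrate; non-GBR requests must not.
        if self.resource_type == "GBR":
            return self.guaranteed_bitrate_mbps is not None
        return self.guaranteed_bitrate_mbps is None

# Background model download plausibly maps to non-GBR with a relaxed PDB.
req = ModelTransferQoSRequest("non-GBR", 300, 1e-6, None)
print(req.actionable())  # True
```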
[Technical] §2.4’s protocol critique is internally inconsistent: proposing RTP for “large, quick data downloads” is atypical and ignores reliability, congestion control, and content integrity needs; if the issue is TCP behavior, the more relevant comparison is HTTP/2 vs HTTP/3 (QUIC) and/or segmented download with application-layer pacing rather than RTP.
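To illustrate the "segmented download with application-layer pacing" alternative mentioned above, a minimal sketch (assumed chunk sizes and rates; network I/O elided) computes a rate-capped schedule of HTTP byte ranges:

```python
# Sketch only: schedule fixed-size byte-range fetches so the average
# download rate stays at or below a target, without resorting to RTP.
def segment_schedule(total_bytes: int, segment_bytes: int, target_mbps: float):
    """Return (offset, length, start_s) tuples: byte ranges plus the
    earliest request time for each, capping the average rate."""
    per_segment_s = segment_bytes * 8 / (target_mbps * 1e6)
    schedule, offset, t = [], 0, 0.0
    while offset < total_bytes:
        length = min(segment_bytes, total_bytes - offset)
        schedule.append((offset, length, t))  # maps to Range: bytes=offset-(offset+length-1)
        offset += length
        t += per_segment_s
    return schedule

# 10 MB in 1 MB chunks paced to 100 Mbps -> one segment every 80 ms.
sched = segment_schedule(10_000_000, 1_000_000, 100.0)
print(len(sched))  # 10
```

The same pacing applies unchanged whether the ranges are fetched over HTTP/2 or HTTP/3 (QUIC), which keeps reliability, congestion control, and integrity in the transport where they belong.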
[Technical] The claim that QUIC “has bindings to 5G XRM framework for improved QoS support” (§2.4) is vague and risks being incorrect/misleading in 3GPP terms; if the intent is to leverage 5G QoS (5QI/ARP/reflective QoS) or ATSSS, the contribution should reference the specific 3GPP mechanisms and how they apply to QUIC flows.
[Technical] §2.3 cites “2–20% compression ratios” from TR 26.927 but does not clarify whether this refers to bitrate reduction, model size reduction, or accuracy trade-offs; without specifying the compression target and acceptable quality loss, the conclusion “still infeasible” is not technically supported.
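Assuming "ratio" means compressed size over original size (the TDoc does not say), a quick illustrative calculation shows why the compression target matters to the feasibility claim:

```python
# Illustrative only: even best-case compression leaves a large model
# as gigabytes on the wire, so the target ratio must be stated.
def compressed_transfer_s(model_gb: float, ratio: float, link_gbps: float) -> float:
    """Seconds to transfer a model compressed to `ratio` of its
    original size over a link of link_gbps (decimal units)."""
    return model_gb * ratio * 8 / link_gbps

# 100 GB model over a 1 Gbps link:
print(round(compressed_transfer_s(100, 0.02, 1.0), 1))  # 16.0 s at the best-case 2%
print(round(compressed_transfer_s(100, 0.20, 1.0), 1))  # 160.0 s at 20%
# Fine for background acquisition, clearly not for a sub-second transaction.
```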
[Technical] The “No UE capabilities for NN codec support have been defined” point (§2.3) is valid, but the required action is incomplete: it should propose where UE capability signaling would live (e.g., NAS/IMS/UE capability exchange) and what minimum interoperability baseline is assumed if NNC is optional.
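A minimal sketch of what such capability signaling plus an interoperability baseline could look like; all names below are illustrative (no such capability IE exists today), and the uncompressed fallback format is an assumption, not an agreed baseline:

```python
# Hypothetical UE capability record for NN codec (NNC) support and a
# format-negotiation fallback; field names are illustrative only.
from dataclasses import dataclass

@dataclass
class UENNCapability:
    supports_nnc: bool        # can the UE decode NNC-compressed models?
    supported_formats: tuple  # e.g. ("NNC", "ONNX")

def negotiate_model_format(ue: UENNCapability) -> str:
    # Assumed baseline: an uncompressed exchange format is mandatory,
    # so UEs without optional NNC support still interoperate.
    if ue.supports_nnc and "NNC" in ue.supported_formats:
        return "NNC"
    return "ONNX"

print(negotiate_model_format(UENNCapability(True, ("NNC", "ONNX"))))  # NNC
print(negotiate_model_format(UENNCapability(False, ("ONNX",))))       # ONNX
```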
[Technical] §2.5 asserts the call flow “indicates model download for every request,” but does not quote the exact steps 12–16 behavior; if the original flow already implies model reuse or versioning, this criticism may be inaccurate—please pinpoint the exact normative/diagram text that mandates per-request download.
[Technical] The proposed “scope limitation” to exclude “complex VLM/LLM” (§3.1) is not actionable without objective criteria (parameter count, model size on disk, compute class, or use-case categories); otherwise it becomes a subjective exclusion that is hard to standardize.
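To illustrate what objective scope criteria might look like, the sketch below tests parameter count and on-disk size against placeholder thresholds; the thresholds and function are hypothetical, not proposed normative values:

```python
# Hypothetical scope test: objective size/compute criteria instead of a
# subjective "complex VLM/LLM" label. Thresholds are placeholders only.
def in_ue_inference_scope(params_billions: float, size_on_disk_gb: float,
                          max_params_b: float = 3.0, max_size_gb: float = 4.0) -> bool:
    """True if the model fits a UE-deployable class under the
    placeholder thresholds."""
    return params_billions <= max_params_b and size_on_disk_gb <= max_size_gb

print(in_ue_inference_scope(1.5, 2.0))     # True: small on-device model
print(in_ue_inference_scope(70.0, 140.0))  # False: data-center class LLM
```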
[Technical] The contribution focuses almost entirely on downlink throughput but omits other critical feasibility constraints for UE inferencing (compute, memory footprint, thermal/power, storage, and model integrity/attestation); these are central to whether UE-side inferencing is viable and should be at least acknowledged if the goal is “network, QoS and UE considerations.”
[Editorial] Several “Required Action(s)” are phrased as open-ended requests (“need to be defined”, “clarify correct protocol usage”) without proposing concrete spec text, assumptions, or a target spec/TR clause; as written it reads more like a discussion note than a contribution ready to drive a CR.
[Editorial] Terminology is inconsistent/unclear (e.g., “client/UE side inferencing”, “client-side inferencing”, “UE-based AI inferencing”, “AIML/IMS”) and should be aligned with the agreed WI terminology and the referenced flow (S4aR260004a) to avoid ambiguity about the architecture being critiqued.