S4-260100

Network, QoS and UE Considerations for client side inferencing AIML/IMS

Source: Huawei Tech. (UK) Co., Ltd
Meeting: TSGS4_135_India
Agenda Item: 10.5

All Metadata
Agenda item description: AI_IMS-MED (Media aspects for AI/ML in IMS services)
Doc type: discussion
For action: Agreement
Release: Rel-20
Contact: Rufail Mekuria (Contact ID 104180)
Uploaded: 2026-02-02T16:28:29.453000
Revised to: S4-260421
TDoc status: revised
Reservation date: 02/02/2026 16:15:36
Agenda item sort order: 52
Review Comments
manager - 2026-02-09 04:01


  1. [Technical] The core premise in §2.2 (downloading a 100 GB model within 500–1000 ms) is not a realistic or relevant requirement for UE-side inferencing; the call flow should instead assume pre-provisioned/on-device models or background download over minutes/hours, otherwise the derived “800 Gbps” conclusion is a strawman that will derail the discussion.

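For reference, the arithmetic behind the "800 Gbps" figure in point 1 can be checked in a few lines (the 100 GB size and 500–1000 ms window are the contribution's own numbers, quoted here only to verify the derivation):

```python
# Back-of-envelope check of the throughput implied by downloading a
# model of a given size within a given window (per point 1 above).

def required_gbps(size_gb: float, window_ms: float) -> float:
    """Average downlink throughput, in Gbit/s, needed to move
    size_gb gigabytes (decimal GB) within window_ms milliseconds."""
    bits = size_gb * 8e9                  # GB -> bits
    return bits / (window_ms / 1e3) / 1e9

# 100 GB in 1000 ms -> 800.0 Gbit/s; in 500 ms -> 1600.0 Gbit/s.
# By contrast, a background download of 100 GB at 1 Gbit/s takes
# 800 s (~13 min), which is the regime point 1 argues for.
```
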
  2. [Technical] §2.1 cites TR 26.927 Table 6.6.2-1 (~40 MB) but then jumps to “public models 100+ GB” without mapping to the IMS/AIML use cases under discussion; the contribution needs to distinguish between (i) UE inference models intended for mobile deployment and (ii) data-center class generative models, otherwise the “required action” to define supported sizes is ungrounded.

  3. [Technical] The text conflates “real-time request-response latency” with “model transfer time” (§2.2); in most architectures the model download is not on the critical path of a single inference transaction, so QoS/latency requirements should be split into (a) inference transaction latency and (b) model acquisition/update latency.

  4. [Technical] Requesting SA2 to “update 5QI specifications” (§2.2) is premature and underspecified: the contribution does not identify whether the model transfer is best treated as GBR/non-GBR, what packet delay budget/jitter/loss are needed, or whether existing 5QIs (e.g., for TCP-based data) are insufficient; without concrete QoS characteristics, SA2 cannot act.

  5. [Technical] §2.4’s protocol critique is internally inconsistent: proposing RTP for “large, quick data downloads” is atypical and ignores reliability, congestion control, and content integrity needs; if the issue is TCP behavior, the more relevant comparison is HTTP/2 vs HTTP/3 (QUIC) and/or segmented download with application-layer pacing rather than RTP.

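To make the alternative in point 5 concrete, a segmented HTTP/2 or HTTP/3 download with application-layer pacing can be sketched as below; the helper names, segment size, and pacing target are illustrative assumptions, not values from the contribution:

```python
# Sketch: plan byte-range segments and a per-segment pacing delay for a
# large model download over HTTP, rather than RTP (see point 5 above).

def plan_segments(total_bytes: int, segment_bytes: int) -> list:
    """Return HTTP Range header values covering the whole object."""
    headers = []
    for start in range(0, total_bytes, segment_bytes):
        end = min(start + segment_bytes, total_bytes) - 1
        headers.append(f"bytes={start}-{end}")
    return headers

def pacing_delay_s(segment_bytes: int, target_bps: float) -> float:
    """Sleep this long after each segment so the average rate
    stays near target_bps (application-layer pacing)."""
    return (segment_bytes * 8) / target_bps

# Example: a 100-byte object in 40-byte segments needs three requests
# with Range: bytes=0-39, bytes=40-79, bytes=80-99.
```

Each segment is an ordinary ranged GET, so reliability, congestion control, and content integrity come from the HTTP/TLS stack rather than needing RTP-style media transport.
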
  6. [Technical] The claim that QUIC “has bindings to 5G XRM framework for improved QoS support” (§2.4) is vague and risks being incorrect/misleading in 3GPP terms; if the intent is to leverage 5G QoS (5QI/ARP/reflective QoS) or ATSSS, the contribution should reference the specific 3GPP mechanisms and how they apply to QUIC flows.

  7. [Technical] §2.3 cites “2–20% compression ratios” from TR 26.927 but does not clarify whether this refers to bitrate reduction, model size reduction, or accuracy trade-offs; without specifying the compression target and acceptable quality loss, the conclusion “still infeasible” is not technically supported.

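If the 2–20% figure in point 7 is read as model-size reduction (compressed size over original size, which is itself the interpretation the comment asks to have confirmed), the implied transfer numbers are easy to tabulate:

```python
# Transfer feasibility under a given compression ratio, assuming
# "ratio" means compressed size / original size (see point 7 above).

def transfer_gbps(size_gb: float, ratio: float, window_s: float) -> float:
    """Throughput in Gbit/s to move the compressed model in window_s."""
    return size_gb * ratio * 8 / window_s

# 100 GB at ratio 0.02 delivered in 1 s still needs 16.0 Gbit/s,
# while the same payload over a 10-minute background window needs
# only ~0.027 Gbit/s -- i.e. feasibility hinges on the time budget,
# not the compression ratio alone.
```
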
  8. [Technical] The “No UE capabilities for NN codec support have been defined” point (§2.3) is valid, but the required action is incomplete: it should propose where UE capability signaling would live (e.g., NAS/IMS/UE capability exchange) and what minimum interoperability baseline is assumed if NNC is optional.

  9. [Technical] §2.5 asserts the call flow “indicates model download for every request,” but does not quote the exact steps 12–16 behavior; if the original flow already implies model reuse or versioning, this criticism may be inaccurate—please pinpoint the exact normative/diagram text that mandates per-request download.

  10. [Technical] The proposed “scope limitation” to exclude “complex VLM/LLM” (§3.1) is not actionable without objective criteria (parameter count, model size on disk, compute class, or use-case categories); otherwise it becomes a subjective exclusion that is hard to standardize.

  11. [Technical] The contribution focuses almost entirely on downlink throughput but omits other critical feasibility constraints for UE inferencing (compute, memory footprint, thermal/power, storage, and model integrity/attestation); these are central to whether UE-side inferencing is viable and should be at least acknowledged if the goal is “network, QoS and UE considerations.”

  12. [Editorial] Several “Required Action(s)” are phrased as open-ended requests (“need to be defined”, “clarify correct protocol usage”) without proposing concrete spec text, assumptions, or a target spec/TR clause; as written it reads more like a discussion note than a contribution ready to drive a CR.

  13. [Editorial] Terminology is inconsistent/unclear (e.g., “client/UE side inferencing”, “client-side inferencing”, “UE-based AI inferencing”, “AIML/IMS”) and should be aligned with the agreed WI terminology and the referenced flow (S4aR260004a) to avoid ambiguity about the architecture being critiqued.
