Network, QoS and UE Considerations for Client-Side Inferencing in AI/ML over IMS
Proposal 1: Add a note in the text that this approach only works for very simple cases, explicitly excluding complex VLMs/LLMs, limit it to a maximum model size, and identify the use cases that such smaller models can serve.
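As a rough illustration of what such a size limit implies (parameter counts and precisions below are assumptions for illustration, not values from this proposal), the raw download payload scales with parameter count times bytes per weight, which separates model classes that could plausibly be delivered to a UE from those that cannot:

    # Illustrative only: assumed parameter counts and precisions, not normative values.
    BYTES_PER_WEIGHT = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

    def model_size_mb(num_params: float, precision: str) -> float:
        """Raw weight payload in megabytes for a given parameter count and precision."""
        return num_params * BYTES_PER_WEIGHT[precision] / 1e6

    for name, params in [("keyword spotting", 1e6),
                         ("small vision model", 25e6),
                         ("small LLM (excluded)", 1e9)]:
        print(f"{name:22s} int8: {model_size_mb(params, 'int8'):8.1f} MB   "
              f"fp16: {model_size_mb(params, 'fp16'):8.1f} MB")

Even a small LLM at int8 precision is on the order of a gigabyte, which supports excluding such models from this use case.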
Proposal 2: Clarify the end-to-end latency requirements and derive the required bit-rate, latency and packet-loss profiles.
Proposal 3: Clarify which protocol should be used to support this use case within the required latency; HTTP over TCP is typically not suitable.
Proposal 4: Ask SA2 how such a traffic burst can be supported, and whether a new QoS profile is needed or an existing one suffices.
Proposal 5: Clarify what support for a neural-network model codec, if any, is required on the UE.
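If a model codec is mandated, the UE must implement the corresponding decoder. As a minimal sketch of what that entails, the decode side of simple linear weight quantization is shown below; this is illustrative only and not a specific standardized codec:

    import numpy as np

    def encode_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
        """Sender side: linear symmetric quantization of float weights to int8."""
        scale = float(np.max(np.abs(weights))) / 127.0
        return np.round(weights / scale).astype(np.int8), scale

    def decode_int8(q: np.ndarray, scale: float) -> np.ndarray:
        """UE side: the decoder needed to reconstruct usable weights."""
        return q.astype(np.float32) * scale

    w = np.random.randn(1000).astype(np.float32)
    q, s = encode_int8(w)
    print(f"payload shrinks 4x, max error {np.max(np.abs(w - decode_int8(q, s))):.4f}")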
Proposal 6: Consider adding model caching and incremental model updates to the call flow, to avoid downloading the full model for each task.
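A minimal sketch of the cache step in the call flow; the helper names fetch_model and fetch_delta are hypothetical placeholders for whatever delivery mechanism is agreed:

    import hashlib
    from pathlib import Path

    CACHE_DIR = Path("model_cache")  # assumed on-UE cache location

    def get_model(model_id: str, expected_sha256: str, fetch_model, fetch_delta=None):
        """Return cached model bytes if fresh; otherwise download (or patch) and cache.

        fetch_model/fetch_delta are placeholders for the actual delivery protocol.
        """
        CACHE_DIR.mkdir(exist_ok=True)
        path = CACHE_DIR / model_id
        if path.exists():
            blob = path.read_bytes()
            if hashlib.sha256(blob).hexdigest() == expected_sha256:
                return blob                       # cache hit: no download at all
            if fetch_delta is not None:
                blob = fetch_delta(blob)          # stale copy: apply incremental update
                path.write_bytes(blob)
                return blob
        blob = fetch_model(model_id)              # cold start: full download once
        path.write_bytes(blob)
        return blob

With such a step, the burst analysed under Proposals 2-4 occurs only on the first use of a model, rather than for every task.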