[FS_6G_MED] LLM-based AI services
This contribution addresses Work Task 2 objective (d) of FS_6G_MED, which focuses on media communication for emerging AI services.
The contribution notes that SA1 TR 22.870 contains over 60 AI-related use cases, many of which reference "tokens" as the basic units processed by generative AI models. While tokenized traffic over networks is not yet widely deployed, the fast pace of research warrants SA4 attention to elaborating these terms and establishing a framework.
The document proposes a more generic architecture than the voice-translation-specific model in TR 26.847, together with key definitions.
The contribution presents a generic (M)LLM architecture (Figure X.1) with the following components:
Input Processing:
- Tokenizer: Function that converts data of a particular modality into tokens (e.g., words, image patches)
- Modality Encoder: AI/ML model that encodes tokens into token embeddings (e.g., OpenAI's CLIP for images and text)
Processing:
- Combination Layer: Combines input token embeddings with contextual token embeddings, potentially using techniques like RAG for context window management
Output Processing:
- Media Decoder/Generator: Processes LLM output token embeddings into desired format (e.g., natural language)
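The processing chain above can be sketched end to end. This is an illustrative sketch only; the function names, the toy vocabulary, and the deterministic embedding are assumptions for illustration, not part of the proposed architecture.

```python
# Toy sketch of the generic (M)LLM pipeline: Tokenizer -> Modality Encoder
# -> Combination Layer -> Media Decoder/Generator. All values are invented.
from typing import List

VOCAB = {"hello": 0, "world": 1, "<unk>": 2}  # toy vocabulary (assumption)
EMBED_DIM = 4

def tokenize(text: str) -> List[str]:
    """Tokenizer: convert text-modality data into discrete tokens (words)."""
    return text.lower().split()

def encode(tokens: List[str]) -> List[List[float]]:
    """Modality Encoder: map each token to a dense token embedding."""
    def embed(token_id: int) -> List[float]:
        # Deterministic toy embedding; a real encoder is a trained AI/ML model.
        return [(token_id + 1) * (i + 1) / 10.0 for i in range(EMBED_DIM)]
    return [embed(VOCAB.get(t, VOCAB["<unk>"])) for t in tokens]

def combine(input_emb: List[List[float]],
            context_emb: List[List[float]]) -> List[List[float]]:
    """Combination Layer: prepend contextual embeddings (e.g., RAG results)."""
    return context_emb + input_emb

def decode(output_emb: List[List[float]]) -> str:
    """Media Decoder/Generator: render output embeddings in the target format."""
    return f"<generated text from {len(output_emb)} output embeddings>"

tokens = tokenize("Hello world")
combined = combine(encode(tokens), encode(tokenize("context")))
print(decode(combined))
```

The sketch only shows the data flow between the four components; the LLM itself, which maps combined input embeddings to output embeddings, is omitted.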
Tokens: Discrete units of information in a given modality (words in text, audio frames, image patches) representing meaningful components of AI/ML data with clearly defined boundaries.
Token embeddings (or embeddings): Dense numerical tensors encoding semantic properties, relationships, and contextual meaning of tokens. They transform discrete tokens into continuous mathematical spaces where semantic relationships can be computed through vector operations.
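The "vector operations" in the definition above can be made concrete with cosine similarity; the three-dimensional vectors below are invented toy values, not the output of any real model.

```python
# Semantically related tokens lie closer together in the continuous
# embedding space, which can be measured with cosine similarity.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

king = [0.9, 0.8, 0.1]    # toy embeddings (assumption)
queen = [0.85, 0.75, 0.2]
apple = [0.1, 0.2, 0.9]

print(cosine_similarity(king, queen))  # high: semantically related
print(cosine_similarity(king, apple))  # low: semantically distant
```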
NOTE 1: Current popular AI applications do not generate network traffic composed of token embeddings. Feasibility of such transport using existing protocols is FFS.
NOTE 2: Modern LLM services charge based on number of tokens processed (outcome of modality encoding and combination layers), but user input consists of traditional media (text, images, audio) in the form of prompts.
NOTE 3: In current AI applications, all components shown to the right of the dashed line in the architecture figure are executed on the server.
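The per-token billing model mentioned in NOTE 2 can be sketched as follows. The whitespace tokenizer and the price constant are assumptions: real services use subword tokenizers (e.g., BPE), so actual token counts and costs differ.

```python
# Rough sketch of per-token billing; numbers and tokenization are assumptions.
PRICE_PER_1K_TOKENS = 0.002  # hypothetical price in USD

def count_tokens(prompt: str) -> int:
    # Naive whitespace approximation of tokenization.
    return len(prompt.split())

def billed_cost(prompt: str) -> float:
    return count_tokens(prompt) / 1000 * PRICE_PER_1K_TOKENS

prompt = "Translate the following sentence into French"
print(count_tokens(prompt), billed_cost(prompt))
```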
The document proposes to discuss and agree on the generic architecture and definitions for LLM-based AI applications in Clause 3 as a basis for further work in the study.