S4-260108 - AI Summary

[FS_6G_MED] LLM-based AI services


LLM-based AI Services for 6G Media Study

Introduction

This contribution addresses Work Task 2 objective (d) of FS_6G_MED, which focuses on media communication for emerging AI services. The objective aims to:

  • Collect and study AI representation formats and traffic characteristics for AI-related media services
  • Examine use cases including agents, multi-modal large language models, and diffusion models
  • Identify gaps in 3GPP specifications (e.g., QoS requirements, dynamic traffic characteristics, AI-representation formats)

The contribution notes that SA1 TR 22.870 contains over 60 AI-related use cases, many referencing "tokens" as basic units for Gen-AI models. While tokenized traffic is not yet widely deployed over networks, the fast pace of research warrants SA4 attention to elaborate these terms and establish a framework.

Generic Workflow and Architecture for LLMs

Background on LLMs and MLLMs

The document proposes a more generic architecture than the voice translation-specific model in TR 26.847. Key definitions:

  • Large Language Models (LLM): AI systems capable of processing and generating natural language, based on transformer architecture with self-attention
  • Generative Pre-trained Transformers (GPT): Type of LLM forming the basis of modern AI systems (ChatGPT, Gemini, DeepSeek, Claude)
  • Multimodal Large Language Models (MLLM): Models processing multiple input/output modalities (text, images, audio, video) with learned cross-modal alignment

Proposed Generic Architecture

The contribution presents a generic (M)LLM architecture (Figure X.1) with the following components:

Input Processing:
- Tokenizer: Function that converts data of a particular modality into tokens (e.g., words, image patches)
- Modality Encoder: AI/ML model that encodes tokens into token embeddings (e.g., OpenAI's CLIP for images and text)
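The tokenizer/encoder split above can be illustrated with a minimal sketch. The whitespace tokenizer and hash-derived pseudo-embeddings below are purely illustrative assumptions, not part of the contribution; real systems use subword tokenizers (e.g., BPE) and learned neural encoders such as CLIP.

```python
# Illustrative sketch of the input-processing stage:
# tokenizer (data -> tokens) followed by a modality encoder
# (tokens -> token embeddings). All logic here is a toy stand-in.

def tokenize(text: str) -> list[str]:
    """Split text into tokens (here: lowercase words)."""
    return text.lower().split()

def encode(tokens: list[str], dim: int = 4) -> list[list[float]]:
    """Map each token to a pseudo-embedding vector in [0, 1]^dim.

    A real modality encoder would output learned, semantically
    meaningful vectors; this just derives values from a hash.
    """
    embeddings = []
    for token in tokens:
        h = abs(hash(token))
        embeddings.append([((h >> (8 * i)) % 256) / 255.0 for i in range(dim)])
    return embeddings

tokens = tokenize("Tokens are discrete units")
embeddings = encode(tokens)
print(tokens)           # ['tokens', 'are', 'discrete', 'units']
print(len(embeddings))  # one embedding vector per token
```

The two stages are deliberately separate: tokenization fixes the unit boundaries, while encoding assigns each unit a point in a continuous space.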

Processing:
- Combination Layer: Combines input token embeddings with contextual token embeddings, potentially using techniques like RAG for context window management
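A RAG-style combination layer can be sketched as follows, under heavy simplification: stored context chunks are ranked against the prompt by naive word overlap, and the best-matching chunks are prepended subject to a token budget (the context window). The function names, scoring heuristic, and budget handling are all illustrative assumptions, not the contribution's design.

```python
# Hedged sketch of a combination layer with RAG-style context-window
# management: retrieve relevant stored chunks and prepend them to the
# prompt without exceeding a fixed token budget.

def overlap(a: str, b: str) -> int:
    """Toy relevance score: number of shared words."""
    return len(set(a.lower().split()) & set(b.lower().split()))

def combine(prompt: str, context_store: list[str], budget: int = 16) -> list[str]:
    """Return the combined token sequence: retrieved context + prompt."""
    prompt_tokens = prompt.lower().split()
    # Keep only chunks with some relevance, best matches first.
    ranked = sorted((c for c in context_store if overlap(prompt, c) > 0),
                    key=lambda c: overlap(prompt, c), reverse=True)
    context_tokens = []
    for chunk in ranked:
        chunk_tokens = chunk.lower().split()
        if len(context_tokens) + len(chunk_tokens) + len(prompt_tokens) > budget:
            break  # context window full
        context_tokens += chunk_tokens
    return context_tokens + prompt_tokens

store = ["6G targets sub-ms latency",
         "LLMs process tokens",
         "media codecs compress video"]
print(combine("how do LLMs handle tokens", store))
```

In a real RAG pipeline, relevance is computed over embedding vectors rather than word overlap, but the budget constraint plays the same role.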

Output Processing:
- Media Decoder/Generator: Processes LLM output token embeddings into desired format (e.g., natural language)

Key Definitions

Tokens: Discrete units of information in a given modality (words in text, audio frames, image patches) representing meaningful components of AI/ML data with clearly defined boundaries.

Token embeddings (or embeddings): Dense numerical tensors encoding semantic properties, relationships, and contextual meaning of tokens. They transform discrete tokens into continuous mathematical spaces where semantic relationships can be computed through vector operations.
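The phrase "semantic relationships computed through vector operations" can be made concrete with cosine similarity. The three vectors below are hand-picked for illustration and are not the output of any real encoder.

```python
# Cosine similarity between toy embedding vectors: semantically
# related tokens should score higher than unrelated ones.
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

cat = [0.9, 0.8, 0.1]  # hypothetical embedding for "cat"
dog = [0.8, 0.9, 0.2]  # hypothetical embedding for "dog"
car = [0.1, 0.2, 0.9]  # hypothetical embedding for "car"

print(cosine(cat, dog) > cosine(cat, car))  # → True: "cat" is closer to "dog"
```

This is the basic operation behind embedding-based retrieval: nearest neighbors in the embedding space stand in for semantic relatedness.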

Important Notes

NOTE 1: Current popular AI applications do not generate network traffic composed of token embeddings. The feasibility of transporting token embeddings over existing protocols is for further study (FFS).

NOTE 2: Modern LLM services charge based on number of tokens processed (outcome of modality encoding and combination layers), but user input consists of traditional media (text, images, audio) in the form of prompts.

NOTE 3: In current AI applications, all components shown to the right of the dashed line in the architecture figure run on the server.

Abbreviations

  • LLM: Large Language Model
  • MLLM: Multimodal Large Language Model
  • RAG: Retrieval-Augmented Generation

Proposal

The document proposes to discuss and agree on the generic architecture and definitions for LLM-based AI applications in Clause 3 as a basis for further work in the study.

Document Information
Source:
Nokia
Type:
discussion
Original Document:
View on 3GPP
Title: [FS_6G_MED] LLM-based AI services
Agenda item: 11.1
Agenda item description: FS_6G_MED (Study on Media aspects for 6G System)
Doc type: discussion
Contact: Saba Ahsan
Uploaded: 2026-02-03T19:37:43.990000
Contact ID: 81411
Revised to: S4aP260018
TDoc Status: noted
Reservation date: 02/02/2026 17:47:14
Agenda item sort order: 60