Meeting: TSGS4_135_India | Agenda Item: 11.1
| TDoc | S4-260273 |
| Title | 6GMedia - AI terminology |
| Source | InterDigital New York |
| Agenda item | 11.1 |
| Agenda item description | FS_6G_MED (Study on Media aspects for 6G System) |
| Doc type | discussion |
| For action | Agreement |
| Release | Rel-20 |
| Specification | 26.87 |
| download_url | https://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_135_India/Docs/S4-260273.zip |
| Contact | Gaelle Martin-Cocher |
| Uploaded | 2026-02-03T22:05:34.897000 |
| Contact ID | 91571 |
| TDoc Status | noted |
| Reservation date | 03/02/2026 21:55:32 |
| Agenda item sort order | 60 |
[Technical] The proposal is not framed as normative 3GPP terminology: there is no alignment with the TR 21.905 definition style and no indication of scope or authority. Inserting these definitions into TR 26.870 therefore risks creating “official” definitions that conflict with existing SA4/SA2 terms (e.g., “feature”, “descriptor”, “intermediate data”) without a clear governance statement.
[Technical] “Soft token” is defined as a continuous vector that “replaces or augments a hard token” and is “processed similarly”. In many architectures, however, tokens remain discrete indices while only the embedding is continuous; the current text blurs the token/embedding distinction and will confuse traffic-characterization discussions, since the bits on the wire are typically hard-token indices or coded latents, not “soft tokens”.
[Technical] The definition “Embedding… Not inherently part of a token sequence” is incorrect, or at least incomplete, for common transformer pipelines, where embeddings are exactly the per-token continuous representations that form the input sequence. This contradiction undermines the intended distinction between “token”, “embedding”, and “latent”.
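To make the token-vs-embedding distinction concrete, a minimal sketch (the vocabulary, embedding values, and function names are all illustrative, not taken from the contribution):

```python
# Toy illustration of the hard-token vs embedding distinction.
# Hard tokens: discrete integer indices into a vocabulary -- this is what is
# typically serialized "on the wire".
# Embeddings: the continuous per-token vectors looked up from those indices,
# which do form the model's input sequence.

vocab = {"media": 0, "over": 1, "6g": 2}   # hypothetical vocabulary
embedding_table = [                         # one continuous vector per token ID
    [0.10, -0.30],
    [0.25, 0.05],
    [-0.40, 0.80],
]

def tokenize(words):
    """Map words to discrete hard-token IDs."""
    return [vocab[w] for w in words]

def embed(token_ids):
    """Look up the continuous embedding for each discrete token ID."""
    return [embedding_table[t] for t in token_ids]

token_ids = tokenize(["media", "over", "6g"])   # discrete: [0, 1, 2]
vectors = embed(token_ids)                      # continuous per-token sequence
```

The point for traffic characterization: `token_ids` is the compact, discrete object exchanged between entities, while `vectors` is the continuous per-token sequence inside the model.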
[Technical] “Learned based media compression (representation)” is described as a “syntax-defined coded form derived from latent representation after quantization and entropy coding”. This excludes important learned codecs that do not use explicit entropy coding, or that use arithmetic coding over discrete tokens; the definition should be generalized, or explicitly scoped, so that it is not wrong for major classes of neural codecs.
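As a toy illustration of why the definition needs generalizing: some learned codecs quantize a continuous latent and then entropy-code the symbols, while token-based codecs transmit discrete codebook indices with no explicit latent entropy-coding stage. A hedged sketch (the latent values, quantization step, and codebook are invented for illustration):

```python
# Two simplified "coded form" paths for learned media compression.
# Path A: continuous latent -> scalar quantization -> (optional) entropy coding.
# Path B: discrete tokens from a learned codebook, transmitted as indices.

latent = [0.93, -1.42, 0.07, 2.61]    # hypothetical continuous latent

def quantize(values, step=0.5):
    """Path A: uniform scalar quantization to integer symbols."""
    return [round(v / step) for v in values]

symbols = quantize(latent)            # integer symbols, ready for entropy coding

# Path B: a token-based codec sends nearest-codebook-entry indices instead.
codebook = [-1.0, 0.0, 1.0, 2.5]      # hypothetical learned codebook

def to_tokens(values):
    """Map each latent value to the index of the nearest codebook entry."""
    return [min(range(len(codebook)), key=lambda i: abs(codebook[i] - v))
            for v in values]

tokens = to_tokens(latent)            # discrete indices, no latent entropy model
```

Both paths yield a transmissible coded form, but only Path A matches the contribution's “quantization and entropy coding” wording, which is the gap the comment above flags.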
[Technical] The “Model exchange representation” examples include GGUF, which is primarily a model file/container format for specific inference stacks rather than an interoperable operator-graph exchange format like ONNX/NNEF; mixing these may mislead SA4 on what is realistically exchangeable across vendors.
[Technical] The “Internal vs external representation” matrix is largely FFS and asserts “Model exchange representation: Not internal”, yet model formats can be internal to a system component boundary (e.g., between an orchestrator and an accelerator runtime). The internal/external dichotomy needs a 3GPP entity/interface context (UE, AF, AS, NEF, etc.) to be meaningful.
[Technical] “Intermediate data” is said to “include intermediate coded representation, feature representation or descriptors” and references TR 26.927, but the contribution neither quotes the exact TR 26.927 definition nor ensures consistency with it. This risks redefining an existing SA4 term; the wording should be cross-checked verbatim against TR 26.927.
[Technical] The applicability matrix makes strong modality claims (e.g., “Text: prevalent method hard tokens”, “Audio: prevalent method latents + embeddings”) that are architecture-dependent and not stable enough for a TR unless clearly labeled as informative examples; otherwise it may bias later requirements/traffic models incorrectly.
[Technical] “Inference results” includes “W3C Media Annotations” as an example, but that is a metadata framework rather than an AI inference output format; the example set should be constrained to representations relevant to 3GPP media workflows (e.g., bounding boxes, masks, captions) and to what is exchanged over 3GPP interfaces.
[Editorial] Several terms are introduced without consistent naming/grammar (“Learned based…” vs “Learned-based…”, “latent representation (latent)”, “exchangeable/external representation”), which will read poorly in TR 26.870 and complicate cross-referencing.
[Editorial] The contribution proposes “include sections 1 to 3” but does not specify the exact target clause/subclause in TR 26.870, nor provide proposed text with numbering and definitions formatting; this makes it hard to assess integration impact and creates editorial ambiguity for rapporteurs.
[Editorial] Examples mix standards and non-standards inconsistently (JPEG AI, MPEG AI-PCC, ONNX, NNEF, GGUF, NNC) without citations; TR text should either cite stable references or avoid listing volatile ecosystem artifacts that may date quickly.