Review: 11.1

FS_6G_MED (Study on Media aspects for 6G System)

[FS_6G_MED] Work Plan for Media Aspects for 6G System Qualcomm Incorporated (Rapporteur)
TR skeleton for FS_6G_MED VODAFONE Group Plc
[FS_6G_MED] Some considerations on ways of working Qualcomm Incorporated (Rapporteur)
Previous Reviews:
manager
2026-02-09 04:23:03
  1. [Technical] The proposal that SA4 should “define AI formats (tokens, embeddings, latents…)” risks stepping outside SA4’s remit and duplicating/contradicting work in external SDOs (IETF/MPEG) and other 3GPP groups; it should be reframed as characterizing traffic/QoS/QoE requirements and media-relevant payload properties rather than defining formats.

  2. [Technical] “6G Media is 5G Media unless agreed differently” is too strong as a baseline for a feasibility study and may pre-empt legitimate 6G-driven requirements (e.g., compute-in-the-loop, sensing-media fusion, new security/privacy models); it should be qualified to avoid constraining the study outcome before gap analysis.

  3. [Technical] The AI traffic work item lacks clear boundaries vs SA1 (service requirements) and SA2 (architecture), and the document does not specify concrete SA4 deliverables (e.g., traffic models, QoE metrics, codec/transport implications) or interfaces to other WGs, making it hard to avoid overlap and ensure timely, actionable output.

  4. [Technical] The statement that traffic characteristics should be developed “independent of access network” conflicts with the intent to identify “opportunities for future networks,” since key AI-media KPIs (latency/jitter/loss, uplink/downlink asymmetry, edge compute placement) are inherently coupled to access and system architecture; at minimum, the study should define a set of reference network profiles rather than claiming independence.

  5. [Technical] WT#3 monitoring of SA2 “Key Issues #20-#22 in TR 23.801-01” is underspecified: it does not identify what SA4-specific outputs are expected (e.g., media/sensing data models, timing constraints, synchronization, compute offload impacts on media pipelines), so “minimal priority” risks missing early architectural hooks that later become hard to change.

  6. [Technical] WT#4 “Media for ubiquitous access” focuses on data rates/scheduling over NTN but omits core SA4 media aspects that typically dominate feasibility (buffering strategies, playout adaptation, FEC/repair, long RTT impacts on interactive media, multicast/broadcast applicability), so the scope as written is not sufficient for meaningful SA4 conclusions.
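The long-RTT concern in point 6 can be grounded with a rough propagation-delay calculation (bent-pipe geometry, nadir approximation, ignoring processing and queuing delay; the LEO altitude is an illustrative assumption, not a figure from the contribution):

```python
# Rough NTN propagation delays motivating the "long RTT impacts on
# interactive media" point. Bent-pipe, nadir geometry; processing and
# queuing delays are ignored, so real figures are higher.

C_KM_PER_S = 299_792   # speed of light in vacuum
GEO_ALT_KM = 35_786    # GEO altitude
LEO_ALT_KM = 550       # illustrative LEO shell altitude (assumption)

def one_way_ms(altitude_km: float) -> float:
    """Ground -> satellite -> ground propagation time in milliseconds."""
    return 2 * altitude_km / C_KM_PER_S * 1000

print(f"GEO one-way ~{one_way_ms(GEO_ALT_KM):.0f} ms, "
      f"RTT ~{2 * one_way_ms(GEO_ALT_KM):.0f} ms")
print(f"LEO one-way ~{one_way_ms(LEO_ALT_KM):.1f} ms")
```

A GEO RTT near half a second dwarfs typical conversational-media budgets, which is why buffering, playout adaptation, and FEC/repair strategies need explicit treatment in WT#4 rather than data rates alone.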

  7. [Technical] WT#5 “Trusted and private media communication” is framed as potentially separate but provides no linkage to existing 3GPP security/privacy mechanisms (e.g., identity, key management, E2E media protection, lawful intercept constraints), risking a vague annex with no actionable recommendations.

  8. [Technical] The proposed TR structure (“each work topic may use a dedicated Annex”) combined with “opportunistic” collection risks producing a set of disconnected notes; the TR needs an explicit consolidation mechanism (common terminology, KPI set, cross-topic dependency mapping, and a single recommendations clause with traceability to use cases/requirements).

  9. [Technical] The plan to accept “slide decks and workshop-style contributions” without stating how they will be normalized into TR text (definitions, assumptions, evaluation methodology) risks non-reproducible conclusions, especially for QoE/AI measurements where methodology consistency is critical.

  10. [Technical] The document references reuse of “other 5G media studies” and “FS_AMD_Ph2” but does not identify which outputs are intended to be baselined into TR 26.870 (e.g., specific clauses, findings, KPIs), making it unclear how reuse will be operationalized and how re-litigation of settled findings will be avoided.

  11. [Editorial] Several priorities are expressed qualitatively (“low to medium,” “minimal,” “near-separate study”) without criteria or decision gates; for a ways-of-working paper, add explicit triggers (e.g., SA1 requirement availability, SA2 dependency milestones) to justify reprioritization.

  12. [Editorial] The document repeatedly mixes “6G media,” “media delivery architecture,” and “AI traffic” without defining terms or scope boundaries (e.g., what counts as “media” vs generic data/AI payload), which will cause inconsistent interpretation when drafting TR 26.870.

  13. [Editorial] References are informal (e.g., “TR 23.801-01,” “ULBC work”) and should be cited with correct identifiers/titles and versioning expectations to avoid ambiguity when the TR is drafted and maintained over multiple releases.

[FS_6G_MED] Preliminaries: assumptions and requirements Qualcomm Korea
Previous Reviews:
manager
2026-02-09 04:23:29
  1. [Technical] The “Single Core Network / standalone architecture” assumption (Clause 4.1, item 2) is too strong and risks contradicting ongoing 6G architecture exploration (e.g., interworking/multi-core evolution paths); it should be framed as a working assumption with explicit scope/limitations or aligned verbatim to TR 23.801-01 wording.

  2. [Technical] The “No duplication between 6G RAN and 6G CN” assumption (Clause 4.1, item 4) is not generally valid for media delivery, where edge/RAN functions (e.g., caching, adaptation, compute offload) may intentionally duplicate or complement CN functions; the text should clarify what “duplication” means and whether it excludes edge media functions.

  3. [Technical] “IMS for real-time services / MMTel voice/video services provided by IMS” (Clause 4.1, item 5) is overly prescriptive for a 6G media delivery study and conflicts with the later emphasis on “Real-time Communication beyond IMS”; it should be rephrased to avoid constraining non-IMS RTC service models.

  4. [Technical] The mapping “TS 26.114 as baseline for Voice Services” (Clause 4.1, item 11) is conceptually off: TS 26.114 specifies media handling for IMS conversational services, not “voice services” in general; if the intent is conversational media, say “IMS conversational services,” and separately address non-IMS RTC media.

  5. [Technical] The “Native TN/NTN support” assumption (Clause 4.1, item 6) is not actionable for SA4 without identifying media-specific NTN implications (latency/jitter, forward error correction, buffering, bitrate adaptation, multicast/broadcast, QoE); the contribution should at least list the media delivery impacts that motivate the assumption.

  6. [Technical] The “Key Issues baseline” list (Clause 4.1) appears selectively copied and may be incomplete/misaligned (e.g., missing security/privacy, energy efficiency, mobility continuity, multicast/broadcast evolution) for media delivery; the document should justify why these specific SA2 key issues are the baseline for FS_6G_MED.

  7. [Technical] Clause 4.2 “Requirements” is a placeholder with no normative structure (e.g., requirement identifiers, categories, traceability to SA1 use cases); for a study TR, at least a requirements framework and initial high-level requirements should be included or the clause deferred until inputs exist.

  8. [Technical] The “Existing Media Services” classification into “full media services” vs “media service enablers” is useful but currently ambiguous: several items (e.g., 5GMS, 5G RTC) can be seen as both service and enabler depending on deployment; the text should define criteria (e.g., Stage-1/2/3 completeness, interoperability scope) to avoid inconsistent categorization.

  9. [Technical] The XR section (Clause 4.3.2) is described at a very high level and misses key SA4-relevant baselines (e.g., which XR specs/TS/TR are the starting point, what traffic models/codecs/formats are assumed); without concrete references, it is hard to use as a baseline for 6G media work.

  10. [Technical] The list of “5G media specifications as starting points” (Clause 4.1, items 8–12) omits important adjacent baselines for media delivery (e.g., QoE/QoS measurement/reporting, media synchronization, accessibility, security/DRM considerations where applicable); the baseline set should be checked for completeness relative to the intended 6G media scope.

  11. [Editorial] Several references look non-existent or placeholder (e.g., “TS 22.ABC”, “TR 23.801-01” naming) and should be validated against official 3GPP document identifiers; incorrect references will block approval.

  12. [Editorial] The contribution contains many editor’s notes and “to be added” statements (Clauses 4.1–4.3), which weakens it as a CR; either convert to a discussion paper or provide concrete text with clear scope and remove open-ended notes.

  13. [Editorial] Terminology is inconsistent (“6G CN”, “single 6G CN type”, “5GC SBA as starting point”, “6G RAN connects to single core”) and may confuse readers; align terms with SA2 conventions and define abbreviations (TN/NTN, RTC, MSE, MBS, XR) at first use.

  14. [Editorial] The “References section adds comprehensive normative and informative references” claim is not substantiated by the summary and mixes normative/informative without rationale; the CR should clearly separate normative vs informative references and justify why each is needed for TR 26.870.

[FS_6G_MED] Considerations on Work Topic 4: Ubiquitous access Qualcomm Korea
[FS_6G_MED] Requirements and associated use cases IIT Bombay, Free Stream Technologies, One media 3.0
Previous Reviews:
manager
2026-02-09 05:02:48
  1. [Technical] The proposed “new normative references” include documents that are not stable or may not exist (e.g., TS 22.ABC, TR 23.801-01), which is not acceptable for normative referencing and will block approval unless replaced with valid 3GPP identifiers and correct release/stage.

  2. [Technical] Making TR 22.870 a normative reference in a TR is questionable because TRs typically avoid normative dependencies; if requirements are being derived, TR 22.870 should generally be informative and the text should avoid “shall”-style normative language.

  3. [Technical] Clause 4.2 requirements are largely copied from SA1 “PR” statements (e.g., [PR 5.9.8.2-1], [PR 6.50.6-2]) without translating them into media-system-specific requirements for TR 26.870 (e.g., what 26-series functions, interfaces, or media KPIs are impacted), so the content risks being non-actionable for SA4.

  4. [Technical] The “Non-3GPP access support” text introduces ATSC/DVB as non‑3GPP access for multicast/broadcast, but it does not specify the assumed integration model (trusted/untrusted non‑3GPP access, service layer vs access layer, UE capabilities), nor how this aligns with 3GPP multicast/broadcast enablers—leaving a major architectural ambiguity.

  5. [Technical] Requirement 4 (“UEs determine appropriate access technology when congestion detected, initial access only”) is underspecified and potentially conflicts with 3GPP access selection policy control (ANDSP/URSP/PCF-type mechanisms); it needs clarity on who decides (UE vs network), what inputs are available, and how operator policy is enforced.

  6. [Technical] “Interworking with legacy systems” lists IMS/MMTel, broadcast/multicast, and MPS but does not identify the media continuity requirements (codecs, service layer APIs, QoS/QCI/5QI mapping, session continuity, emergency/priority handling), so it reads as a generic service list rather than SA4 requirements.

  7. [Technical] The “Enhanced Network Service Awareness” requirements propose per-component service characteristics and differentiated charging, but do not define the granularity (flow, sub-flow, media component), identifiers, or mapping to existing 5GMS/RTM mechanisms (e.g., provisioning, metrics reporting, QoS signaling), risking inconsistency with TS 26.501/26.506 concepts.

  8. [Technical] AI-related requirements (e.g., “image to video”, “2D to 3D/avatar”, “text to video”) are framed as network-supported transformations but lack constraints on latency, privacy, content authenticity, and user consent enforcement; without these, the requirements are incomplete and may conflict with regulatory/security expectations.

  9. [Technical] Several AI requirements mix responsibilities across “network”, “application enablement layer”, “Service Hosting Environment”, and “IMS” without defining which 3GPP entity provides the function (e.g., 5GMS AF/MDF, RTM AS, edge hosting), creating scope creep and unclear ownership for SA4.

  10. [Technical] The “video-based AI inference” requirement uses unusual media KPIs (“packet error rate per video frame”) that do not map cleanly to 3GPP QoS models; it should be re-expressed in terms of standardized QoS/QoE metrics (latency, jitter, loss, throughput, reliability) and media-layer metrics where applicable.
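The mismatch in point 10 can be illustrated with a simple model (an independent-loss assumption, which is itself a simplification since radio losses are often bursty; the packets-per-frame figure is hypothetical): a per-frame error rate is a derived quantity that depends on how many packets carry a frame, so it cannot be mapped 1:1 onto a packet-level QoS target.

```python
# Why "packet error rate per video frame" is not a standard QoS metric:
# under independent packet loss, a frame carried in n packets is
# corrupted if any one packet is lost, so the frame error rate depends
# on the (codec- and MTU-dependent) packetization, not just on the
# network's packet loss rate.

def frame_error_rate(packet_loss: float, packets_per_frame: int) -> float:
    """P(frame corrupted) when any single packet loss corrupts the frame."""
    return 1 - (1 - packet_loss) ** packets_per_frame

# Hypothetical figure: one video frame split across 30 packets.
for p in (1e-4, 1e-3, 1e-2):
    print(f"packet loss {p:.0e} -> frame error rate "
          f"{frame_error_rate(p, 30):.4f}")
```

Even a 1% packet loss rate yields roughly a 26% frame error rate at 30 packets per frame, so expressing the requirement in standardized packet-level terms (loss, latency, reliability) plus media-layer packetization assumptions is both clearer and testable.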

  11. [Technical] The “guarantee user experience” requirement for combined AI+communication services is not testable as written; it needs measurable targets (e.g., end-to-end latency bounds, resolution/bitrate targets, inference accuracy reporting intervals) and conditions (mobility, congestion, handover).

  12. [Editorial] The contribution is described as a “pseudo-CR” and includes an editor’s note that content is pending SA1 consolidation; this weakens change control discipline—either provide concrete draft spec text with stable references or keep it as a discussion paper rather than CR-like material.

  13. [Editorial] Terminology is inconsistent and sometimes non-3GPP (e.g., “6G network”, “application enablement layer”, “authorized 3rd party”); the text should align with 3GPP-defined terms and abbreviations and avoid introducing new layers without definition.

  14. [Editorial] The requirements are labeled with SA1-style PR identifiers (e.g., “[PR 6.42.6-1]”) but placed into TR 26.870 Clause 4.2 without a clear numbering/traceability scheme for SA4; add a consistent SA4 requirement ID format and explicit traceability to TR 22.870 clauses.

[FS_6G_MED] Considerations on Work Topic 1: Media Delivery Architecture Qualcomm Korea
pCR [FS_6G_MED] Considerations on Work Topic 1: Media Delivery requirements for intelligent immersive calling Huawei Technologies Co., Ltd
Previous Reviews:
manager
2026-02-09 04:24:05
  1. [Technical] The contribution proposes adding “IMS extension support” and “review protocol for extensions” as requirements, but TR 26.870 clause 4.2 should state concrete media/system requirements (e.g., session setup, media negotiation, synchronization) rather than vague process statements; specify what IMS/SIP/SDP capabilities are actually needed (new SDP attributes, new media types, MSRP/data channel, etc.).

  2. [Technical] Several requirements assume “network performs transcoding from video to immersive media codecs” and “network-based rendering/face rendering,” which is a strong architectural choice; TR 26.870 requirements should be phrased technology/placement-neutral (edge/device/network) unless SA4 has agreed on normative functional split assumptions.

  3. [Technical] “Support for 4K + HDR uplink video” is underspecified and potentially unrealistic for uplink in many deployments; it needs frame rate, chroma, bit depth, HDR format (HDR10/HLG/Dolby Vision), latency/jitter targets, and whether it is mandatory or optional/tiered.

  4. [Technical] Eye tracking is introduced as a requirement but no performance constraints are given (sampling rate, end-to-end latency budget, accuracy, coordinate system, time-stamping); without these, it is not actionable for codec/transport/synchronization work in SA4.

  5. [Technical] “User intention understanding capability (via voice/gesture)” is not a media delivery requirement unless mapped to explicit interfaces and data exchange (event streams, metadata formats, privacy controls, timing alignment with media); otherwise it reads as an AI feature requirement outside SA4 scope.

  6. [Technical] The use case mixes classic conversational media with sensor/biometric data (blood pressure, heart rate) but does not specify how such data is transported (RTP/RTCP extensions, data channel, HTTP-based side channel) nor how it is synchronized with audio/video/immersive rendering.

  7. [Technical] “Tiered QoE support taking device capabilities into account” needs definition of tiers and the adaptation mechanisms (scalable coding, simulcast, layered rendering, viewport-dependent delivery, bitrate/latency trade-offs); otherwise it duplicates generic QoE statements already present in many TRs.

  8. [Technical] The multi-device scenario (smart TV + cameras + watches + sensors) implies multi-stream capture and tight synchronization, but no requirement is stated for common time base, clock sync, or inter-stream lip-sync/scene-sync tolerances—key for immersive calling.

  9. [Technical] The “eye-contact enablement” story implies sending “facial picture” plus eye tracking to render a face; this raises identity/spoofing and consent requirements (authenticity of rendered representation, user control, watermarking/indication) that are not addressed but are critical for a calling service.

  10. [Technical] The text repeatedly references “6G system” and “AI technologies (e.g., LLM)” in a way that may conflict with TR 26.870’s 5G/3GPP system framing; requirements should be expressed in 3GPP-system-agnostic terms or aligned to the TR’s agreed scope and terminology.

  11. [Editorial] The contribution claims “updates to clause 4.2” but provides no proposed change text, no exact requirement wording, and no indication of how the new items integrate with existing 4.2 structure; reviewers cannot assess consistency or redundancy without the actual delta.

  12. [Editorial] Terminology is inconsistent/unclear (“immersive media codecs,” “split-rendering,” “spatial computing rendering,” “multi-media transporting”); these should be aligned with existing SA4 terms (e.g., MIV, V3C, RTP-based immersive media, split rendering definitions) or explicitly defined.

  13. [Editorial] Several statements are ambiguous or non-testable (e.g., “protocol extensions should be possible,” “review protocol for extensions,” “system fulfills user intention”); requirements in TR 26.870 should be phrased as verifiable capabilities with clear conditions and outcomes.

Media related real-time AI traffic Characteristics Huawei Tech. (UK) Co., Ltd
Previous Reviews:
manager
2026-02-09 04:24:36
  1. [Technical] The proposal introduces “native AI data units” as a new media format but does not define their syntax/semantics, timing model, or decoder interoperability requirements, making the subsequent packetization and KPI claims non-actionable and hard to align with existing 3GPP media frameworks.

  2. [Technical] The end-to-end architecture (UE AI encoder, AS AI decoder) implicitly assumes application-layer processing but does not map to any 3GPP service-based architecture elements (e.g., AF/NEF, edge hosting, QoS flows) or clarify whether this is OTT-only; this weakens consistency with a “media-related TR” and limits how network implications can be derived.

  3. [Technical] The “compatibility handling” statement (“AI decoder at AS may be needed if UE’s AI encoder is not compatible with AS’s AI model”) is conceptually inverted/unclear: if the AS model cannot consume the UE representation, adding a decoder alone may not resolve feature-space/model mismatch without a defined common representation or negotiated model/versioning.

  4. [Technical] The basic procedure step “UE provides supported AI encoder information” lacks a defined signaling mechanism (SIP/SDP, HTTP APIs, 5G NAS, application protocol), negotiation parameters (model ID, version, quantization, modality set), and fallback behavior, so the call flow is incomplete for reproducible traffic characterization.

  5. [Technical] The content delivery model reuses “NALU” terminology and H.26x-like aggregation/fragmentation for latent chunks, but does not specify an RTP payload format, header fields, fragmentation rules, or congestion control behavior; without a defined payload format, the traffic model cannot be consistently implemented or measured.

  6. [Technical] The KPI table is internally inconsistent: e.g., for “Image GenAI” a 15 KB burst at the stated “service bit rate 8 Mbps” already takes the full “max latency 15 ms” just to serialize, leaving no budget for propagation, queuing, or inference, and for “Video GenAI” a 1.5 MB burst at 120 Mbps likewise consumes the entire 100 ms; both rows need clarification of the averaging window, burst periodicity, and whether uplink or downlink is meant.
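The inconsistency in point 6 can be checked with a quick back-of-envelope calculation (figures as quoted in the review; decimal units, i.e. 1 KB = 8,000 bits, are an assumption since the contribution does not state them):

```python
# Back-of-envelope check of the quoted KPI rows: time to serialize one
# burst at the stated service bit rate vs the stated max latency.
# Decimal units (1 KB = 8,000 bits) assumed.

def serialization_ms(burst_bytes: float, rate_bps: float) -> float:
    """Time in ms to push one burst onto the wire at the given bit rate."""
    return burst_bytes * 8 / rate_bps * 1000

image_tx = serialization_ms(15_000, 8e6)       # "Image GenAI": 15 KB at 8 Mbps
video_tx = serialization_ms(1_500_000, 120e6)  # "Video GenAI": 1.5 MB at 120 Mbps

print(f"Image GenAI burst: {image_tx:.1f} ms vs 15 ms latency budget")
print(f"Video GenAI burst: {video_tx:.1f} ms vs 100 ms latency budget")
# In both rows, serialization alone consumes the full latency budget, so
# any propagation, queuing, or inference delay forces a higher peak rate.
```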

  7. [Technical] The latency discussion mixes “max latency” and “delay” columns (15 ms vs 20 ms, etc.) without defining one-way vs RTT, E2E vs network-only budget, or inclusion of AS inference time; this undermines the stated conclusion that network latency is “constrained by AS processing time.”

  8. [Technical] The claim that ≤20% payload error rate is tolerable for “GenAI applications” is overly broad and not tied to a specific loss model (random vs burst), concealment method, modality, or task metric; for many token/feature-streaming systems, loss can be catastrophic without retransmission/FEC, so the tolerance needs qualification and evidence.

  9. [Technical] The “differentiated importance” assertions (e.g., “preceding image data units more critical”) are plausible for some autoregressive tokenizations but not generally true for VQ/VAE-style codebooks or spatial token layouts; the document should specify which encoder families exhibit this property and how importance is signaled for scheduling.

  10. [Technical] The evaluation methodology relies on deriving P-traces from RTP header fields, but for non-media AI payloads the timestamp/marker semantics are undefined; without a defined clock rate, frame boundary indication, and packetization rules, the trace extraction method is not robust.

  11. [Technical] The proposal recommends RTP/UDP universally, but does not address real-time congestion control (e.g., RTP over QUIC, WebRTC congestion control, or application-layer rate adaptation) which materially affects burstiness, jitter, and loss—key characteristics the clause aims to model.

  12. [Technical] The GRACE resilience description (“lost chunks set to zeros, graceful degradation”) is codec-specific and may not generalize; presenting it as a representative mechanism risks misleading conclusions about error propagation and HARQ/FEC needs across AI encoders.

  13. [Editorial] Clause numbering placeholders (6.2.6.X.1 … X.7) suggest an insertion but the contribution does not indicate exact placement, dependencies, or whether it modifies existing clauses; this makes it hard to assess consistency with surrounding text and avoid duplication with TR 26.926 methodology already referenced.

  14. [Editorial] Several terms are used without definition or with overloaded meaning (“MLM” vs common “multimodal LLM,” “AI data unit,” “native/customized packet format,” “service bit rate”), and the document would benefit from a short terminology subclause to prevent ambiguity.

  15. [Editorial] The added references include academic papers and RP material, but it is unclear which are intended as normative vs informative and whether they meet 3GPP referencing rules; the contribution should justify why each reference is required for the TR text rather than background reading.

Neural Network Based Video Codec Architecture and Support for Error Resilience Huawei Tech. (UK) Co., Ltd
Previous Reviews:
manager
2026-02-09 04:25:07
  1. [Technical] The proposal places DVC/GRACE under “AI Traffic Characteristics” (Work topic #2d), but most of the added material is codec architecture and performance benchmarking; it would fit better under a media codec/format or “AI-based media processing” clause, otherwise the TR risks mixing traffic characterization with implementation details.

  2. [Technical] Claims like “competitive with H.264/H.265” and “MOS up to 38% better than H.264/H.265 with AL-FEC and error concealment” are not framed with the necessary test conditions (bitrate points, resolution, latency budget, encoder presets, FEC overhead, concealment method, GOP structure), making the conclusions non-reproducible and potentially misleading for 3GPP documentation.

  3. [Technical] The GRACE “channel-aware training” description lacks clarity on the assumed loss model (random vs burst, packetization unit, reordering, RTT/jitter) and how it maps to 3GPP radio behaviors (e.g., RLC AM/UM, PDCP reordering, HARQ), so the stated resilience benefits may not translate to 3GPP deployments as written.

  4. [Technical] The text implies “arithmetic coding mapped to packets” and “independently decodable sub-tensors,” but does not specify the resynchronization strategy (start codes, partition headers, state reset frequency) needed to make arithmetic-coded partitions independently decodable after loss; without this, the error-resilience mechanism is underspecified.

  5. [Technical] The proposal does not discuss how NNC bitstreams would be carried in 3GPP systems (RTP payload format, ISO BMFF sample entry, OMAF, DASH/CMAF signaling, SDP/offer-answer), which is essential if the TR is to document applicability to 6G media rather than just summarize papers.

  6. [Technical] “Exceptional reduction in tail latency” is asserted but the causal mechanism is not tied to system-level latency contributors (encoder lookahead, buffering, retransmissions, playout delay); additionally, using Google GCC/WebRTC traces is not equivalent to 3GPP QoS flows and scheduler behavior, so the latency claim needs careful qualification.

  7. [Technical] The “reconstruction failures due to non-bit-exact arithmetic operations in GPU frameworks” point is important but incomplete: it should explicitly distinguish training-time nondeterminism from inference-time decoder determinism requirements, and identify what must be standardized (fixed-point ops, deterministic kernels, rounding modes) to ensure interoperable decoding.

  8. [Technical] The document mentions “Deep Render codec in FFMPEG and VLC” as evidence of industry adoption, but it is unclear whether this is the same DVC lineage, whether it is interoperable, and whether it is actually deployed; this risks overstating maturity without verifiable references.

  9. [Technical] The proposal does not address complexity/power trade-offs in a 3GPP-relevant way (device classes, thermal limits, uplink vs downlink split, encoder/decoder placement), and the hardware statements (A40 GPU fps, “real-time on mobile”) are too vague to inform 6G feasibility.

  10. [Editorial] Adding the DVC and GRACE papers to “normative references” is likely incorrect for a TR-style descriptive clause; these should be informative references unless the text normatively depends on them, otherwise it creates an unintended compliance implication.

  11. [Editorial] The new clause numbering “6.2.4.X” is a placeholder and should be resolved to an actual number consistent with the TR structure; leaving “X” is not acceptable in a contribution proposing spec text.

  12. [Editorial] Several statements are promotional or absolute (“exceptional,” “key enabler,” “realistic conditions”) and should be rewritten in neutral 3GPP style with quantified qualifiers and explicit assumptions.

  13. [Editorial] The summary references “clauses 2 and 3” as the basis for the documentation request, but the proposed insertion is in clause 6.2.4; the contribution should align the rationale with the actual target clause(s) and ensure cross-references are consistent.

  14. [Technical] The “content-specific due to training data dependencies” limitation is noted but not connected to operational mitigations relevant to 3GPP (model update cadence, on-device adaptation, signaling of model/version, backward compatibility), leaving a major deployment issue unaddressed.

  15. [Editorial] If architecture diagrams are included, the contribution should ensure they are either original or properly licensed/cited and that figure captions and terminology match 3GPP conventions (e.g., avoid paper-specific module names without definitions).

Survey of Native AI formats for multi-modal AI Huawei Tech. (UK) Co., Ltd
Previous Reviews:
manager
2026-02-09 04:25:32
  1. [Technical] The contribution proposes adding a broad “Native AI Formats” clause but provides no 3GPP-relevant characterization (e.g., bitrate ranges, token rates, latency/jitter sensitivity, burstiness, uplink/downlink asymmetry), so it is unclear how it concretely supports “AI traffic characteristics” work in FS_6G_MED.

  2. [Technical] The “General AI Processing Architecture” (Input→Encoder→Latent z→Quantization→Decoder→Output) is presented as generic, but many cited formats used for comprehension/IR/recommendation do not include a decoder or reconstruction objective; the clause should distinguish generative tokenizers/codecs vs embedding-only representations to avoid misleading conclusions.

  3. [Technical] The stated “Alternative split inference approach” (“AI native format generation and AI pre-training instead of model-splitting”) is not technically substantiated: pre-training is offline and not a split-inference partitioning method, and the document does not define where the split occurs (UE, edge, network) nor the standardized interface implications.

  4. [Technical] Several items in Table 1 are not “native AI formats” in the sense of a transferable discrete representation (e.g., CLIP is primarily continuous embeddings; many “N/A quantization” entries), so the proposed clause risks conflating embeddings, tokenizers, and codecs without defining the format properties that matter for transport and interoperability.

  5. [Technical] The “Reasons for AI split processing” list mixes privacy, compute offload, and “LLM compatibility” but does not address key constraints for 3GPP (security of intermediate representations, reversibility/leakage, integrity, model/version coupling), which are central if SA4 is to consider such formats for media services.

  6. [Technical] Quantization technique descriptions are inaccurate/unclear: “Level Wise Quantization (RQ)” is described as value-magnitude dependent error, whereas residual quantization is typically multi-stage codebooks on residuals; “FSQ projects vector to few dimensions” is not generally correct (FSQ is scalar quantization with finite levels per dimension).

  7. [Technical] The proposal to add JPEG AI as an “AI-based codec” example is plausible, but the document does not explain how JPEG AI bitstreams relate to “native AI formats” for LLMs (tokens/latents) versus conventional decoded pixels, which affects whether it belongs in the same clause and what traffic characteristics apply.

  8. [Technical] The contribution does not specify whether the “native AI format” is intended to be standardized as a bitstream, a feature tensor, or an application-layer payload; without a clear abstraction boundary, it is hard to align with SA4 scope and existing 3GPP media frameworks.

  9. [Technical] The table includes many proprietary or poorly specified items (“Cosmos [NVIDIA, 2025]”, “Deep Render codec”, “Ming-univision”) without stable normative references, which undermines the feasibility of adding them as TR references and risks rapid obsolescence.

  10. [Editorial] The document claims a “comprehensive survey table (Table 1)” but the actual table is not included (only bullet lists), making it impossible to review completeness, columns, definitions, or consistency with the proposed TR insertion.

  11. [Editorial] Reference placeholders “[x1] through [x10]” are not provided with full bibliographic details, and several in-text citations are incomplete or inconsistent (e.g., “O’Shea 2015” for CNNs, “ISO/IEC 6048-1” for JPEG AI), which would block TR integration.

  12. [Editorial] Terminology is inconsistent and sometimes incorrect for the target audience: “MLM compatibility” is used where “multimodal LLM” is intended, and “native AI format” vs “tokenizer” vs “codec” are used interchangeably without definitions.

  13. [Technical] The “Supervision” subsection implies reconstruction loss (L2) as a general mechanism, but many modern tokenizers use perceptual/adversarial losses and many comprehension/IR tokenizers are trained with contrastive or task losses; the oversimplification could mislead SA4 conclusions about QoS/traffic (e.g., token stability vs task accuracy).

  14. [Editorial] Several modality descriptions are overly generic or contain questionable statements (e.g., “Transformers handle large parameter sizes efficiently”), which reads like tutorial material rather than TR-ready text tied to 3GPP study objectives.


Embodied AI use case and related requirements Huawei Tech. (UK) Co., Ltd
Previous Reviews:
manager
2026-02-09 04:27:19
  1. [Technical] The claimed uplink “peak data rates: 20–100 Mbit for 6–8 cameras using 3GPP codecs (e.g., HEVC)” is not substantiated with camera resolution/FPS/bitrate assumptions and may be inconsistent with TR 22.870 unless the exact referenced clause/text is quoted; add explicit parameterization (e.g., 1080p/4K, 30/60 fps, number of concurrent streams) and clarify whether this is per-UE peak, per-robot, or per-session aggregate.
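
For reference, the missing parameterization reduces to a one-line calculation; the per-stream bitrates below are illustrative HEVC operating points chosen for the sketch, not figures from the contribution:

```python
def aggregate_uplink_mbps(num_cameras, per_stream_mbps):
    """Aggregate uplink rate for concurrent camera streams (simple sum;
    ignores RTP/IP overhead and statistical multiplexing gains)."""
    return num_cameras * per_stream_mbps

# Illustrative assumptions, not from the contribution:
low  = aggregate_uplink_mbps(6, 3.0)    # 6x 1080p30 HEVC at ~3 Mbit/s each
high = aggregate_uplink_mbps(8, 12.0)   # 8x 4K30 HEVC at ~12 Mbit/s each
```

Whether the claimed 20–100 Mbit/s range holds thus depends entirely on the resolution, frame-rate, and encoder operating-point assumptions that the contribution omits.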

  2. [Technical] “Ultra-low latency” and “error resilience” are repeatedly asserted for real-time navigation, but no concrete latency/jitter/reliability targets (e.g., E2E, UL one-way, packet loss tolerance) are proposed, making the requirement non-actionable for FS_6G_MED and inconsistent with how SA requirements are normally expressed.

  3. [Technical] The proposed clause mixes AI research task descriptions/metrics (surprisal, IoU+/IoU-, SPL, etc.) with 3GPP service requirements, but it never maps these metrics to communication KPIs (latency, throughput, reliability, synchronization), so the added text risks being non-normative narrative rather than requirements usable by RAN/CT.

  4. [Technical] The document assumes cloud/server inference as the dominant architecture (“AI processing may occur at cloud/server”), but does not address split inference options (on-device, edge, hybrid) and the resulting different UL/DL traffic patterns (e.g., downlink control/trajectory updates), which is critical for embodied control loops.

  5. [Technical] Traffic characterization is incomplete: it states “bursty” uplink but omits key properties needed for system design (packet sizes, periodicity, concurrency across sensors, synchronization between multi-modal streams, and whether traffic is constant bitrate video vs event-driven keyframes/point clouds).

  6. [Technical] Multi-modal data is mentioned (video, point clouds, embeddings), but the proposal does not specify whether point clouds are LiDAR-like, depth maps, or reconstructed meshes, nor their typical rates; this omission can materially change the bandwidth/latency conclusions.

  7. [Technical] The “Transmission format” table introduces MPEG VCM/FCM and JPEG AI, but it does not explain how these would be carried in 3GPP (e.g., media framework, application layer, QoS handling) or what network feature is actually required (e.g., generic support for opaque payloads vs specific codec awareness).

  8. [Technical] The statement “efficient transmission support needed” for proprietary embeddings/tokenizers is vague and risks implying new 3GPP standardization of AI feature codecs without scoping; it should instead identify concrete network enablers (e.g., QoS, prioritization, segmentation/reassembly, loss protection) independent of payload semantics.

  9. [Technical] The rationale for cloud offloading (“keep robots simple/light”, “centralize AI for multiple robots”) is plausible but one-sided; it ignores privacy/safety/regulatory constraints and local fallback requirements, which are often decisive for medical/industrial deployments and should be reflected as requirements (e.g., local autonomy under connectivity loss).

  10. [Editorial] The proposed new clause numbering is inconsistent in the summary (“new clause 4.2.2.X” vs “based on the proposed text in clause 8”); the contribution should clearly identify the target TR, exact clause location, and provide the actual proposed text with proper numbering.

  11. [Editorial] References to prior work are too loose (“TR 22.870 clause 6.28”, “SA4#134 (S4-251826)”) without quoting the baseline text being extended; for a change proposal, the delta versus existing TR wording should be explicit to avoid duplication or contradiction.

  12. [Editorial] Several terms are undefined or used inconsistently for 3GPP context (“mobile embodied sensors”, “ultra-low latency”, “error resilience”, “cloud/server”, “gateway”), and the clause should add definitions or align to existing 3GPP terminology (UE, edge, DN, application server, URLLC-like requirements).

  13. [Editorial] The document includes vendor/industry examples (e.g., NVIDIA Isaac GR00T, ITU-T SG21 workshop) that are not necessary for TR requirements text and may be inappropriate in 3GPP specifications; keep background in the contribution but avoid embedding such references in proposed TR clause text.

  14. [Technical] The proposal focuses almost exclusively on uplink, but embodied AI control typically requires timely downlink (commands, maps, model updates) and possibly sidelink/robot-to-robot coordination; omitting DL/bi-directional requirements may lead to an incomplete requirement set for FS_6G_MED.

Demonstration of real-time AI codec transmission in WebRTC Huawei Tech. (UK) Co., Ltd
Previous Reviews:
manager
2026-02-09 04:27:45
  1. [Technical] The contribution does not map the demo to any concrete Rel-20 normative work item deliverable (e.g., RTP payload format specification, SDP signaling, WebRTC integration requirements), so it is unclear what SA4 action is requested beyond a general “take into account.”

  2. [Technical] The “Custom RTP Payload Format Design” is underspecified and not aligned with 3GPP/RTC practice: no RTP payload type name, no clock rate, no timestamping rules, no marker-bit semantics, no fragmentation/reassembly rules, and no handling of packet loss/out-of-order beyond “aiortc buffers,” which is not a spec.

  3. [Technical] The proposed payload header fields (“Latent Shape | Hyperprior Byte Length | Latent Byte Length”) lack bit-level definition (field sizes, endianness, allowed ranges) and do not address how “Latent Shape” is encoded or negotiated, making interoperability impossible.
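
As an illustration of the level of detail a payload format specification must pin down, here is a hypothetical fixed-size encoding of the three listed fields; the field widths and the use of network byte order are assumptions for the sketch, not taken from the contribution:

```python
import struct

# Hypothetical bit-level definition (not from the contribution):
#   Latent Shape:           3 x uint16 (C, H, W), network byte order
#   Hyperprior Byte Length: uint32
#   Latent Byte Length:     uint32
HDR = struct.Struct("!HHHII")  # "!" = big-endian (network byte order), 14 bytes

def pack_header(c, h, w, hyperprior_len, latent_len):
    """Serialize the payload header with fixed widths and byte order."""
    return HDR.pack(c, h, w, hyperprior_len, latent_len)

def unpack_header(buf):
    """Parse the header from the start of an RTP payload."""
    return HDR.unpack_from(buf, 0)
```

A normative payload format would additionally have to state allowed value ranges, extension rules, and how (or whether) the shape is negotiated out-of-band rather than carried per packet.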

  4. [Technical] Fragmentation is described as “large payloads fragmented due to MTU limitations” with aiortc appending RTP headers, but there is no defined fragmentation unit, no FU indicator/header, and no recovery behavior; relying on library behavior is not acceptable for a payload format intended for standardization.

  5. [Technical] The document claims “RTP retransmission enabled” but does not specify whether this is RTX (RFC 4588), NACK (RFC 4585), or WebRTC-specific mechanisms, nor how the AI codec payload interacts with retransmission, FEC, or congestion control—key for real-time media feasibility.

  6. [Technical] Congestion control is mentioned generically (“RTP packets transmitted with congestion control”) without stating which algorithm (e.g., GCC/SCReAM), what bitrate adaptation hooks exist for the AI codec, or how encoder rate control reacts to loss/jitter—critical for “real-time” claims.

  7. [Technical] The demo’s “error resilient codec compensates for potential packet loss” contradicts the later statement “error recovery not yet implemented”; this inconsistency undermines conclusions about robustness and should be clarified with exact mechanisms implemented (if any).

  8. [Technical] SDP negotiation support is asserted (“Enabled codec recognition during SDP negotiation”) but no SDP offer/answer examples, fmtp parameters, or MIME subtype registration approach are provided; without this, WebRTC interoperability and signaling feasibility cannot be evaluated.
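
For comparison, the kind of SDP offer fragment the contribution would need to provide might look as follows; the `AIC` subtype name and all fmtp parameters are hypothetical placeholders for illustration, not a registered media type:

```
m=video 9 UDP/TLS/RTP/SAVPF 96
a=rtpmap:96 AIC/90000
a=fmtp:96 model=bmshj2018-factorized;latent-shape-max=192,64,64;profile=0
a=rtcp-fb:96 nack
a=rtcp-fb:96 nack pli
```

Without something of this shape, plus a MIME subtype registration plan, offer/answer interoperability cannot be assessed.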

  9. [Technical] There is no discussion of packetization timing and RTP timestamp derivation for frame-by-frame neural codec output (e.g., variable frame sizes, variable encode time), which impacts jitter buffering, playout, and marker-bit usage.
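
The missing rule is simple to state: per RFC 3550, the RTP timestamp reflects the sampling (capture) instant, not the send time, so variable encode durations must not perturb it. A sketch using the conventional 90 kHz video clock (the offset handling is illustrative):

```python
RTP_VIDEO_CLOCK_HZ = 90_000  # 90 kHz clock conventionally used for video payloads

def rtp_timestamp(capture_time_s, ts_offset=0):
    """Derive the RTP timestamp from the capture instant. Frames that take
    variable time to encode still get evenly spaced timestamps, which is
    what the jitter buffer and playout logic rely on."""
    return (ts_offset + round(capture_time_s * RTP_VIDEO_CLOCK_HZ)) & 0xFFFFFFFF

# Two frames captured 1/30 s apart differ by exactly 3000 ticks,
# regardless of per-frame encode latency.
t0 = rtp_timestamp(0.0)
t1 = rtp_timestamp(1 / 30)
```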

  10. [Technical] The trace analysis focuses on RTP header fields only, but does not provide quantitative results (loss rate vs. quality, latency/jitter distributions, bitrate, frame rate, retransmission overhead), so the “feasibility proven” claim is not substantiated for SA4 evaluation.

  11. [Technical] Using bmshj2018_factorized (an image compression model) as a stand-in for a video AI codec raises questions about temporal prediction, inter-frame dependencies, and real-time constraints; the document should explain how video is handled (intra-only vs inter) and implications for packet loss and bitrate.

  12. [Editorial] The contribution reads like an implementation report rather than a standards contribution: it lacks section references to any 3GPP spec, does not identify gaps in current specs, and does not propose specific normative text or study conclusions.

  13. [Editorial] Terminology is inconsistent/vague (“AI codec,” “AI traffic,” “AI media delivery,” “real-time AI codec-based traffic”) and should be aligned with FS_6G_MED definitions to avoid ambiguity about whether this targets conversational video, XR, or a new media type.

  14. [Editorial] The payload format diagram is informal and missing a figure number, field definitions, and alignment with RTP payload format conventions (e.g., “payload header,” “payload data,” optional extensions), making it hard to review or compare with existing SA4 payload formats.

[FS_6G_MED] Discussion on AI traffic trends Nokia
Previous Reviews:
manager
2026-02-09 04:31:38
  1. [Technical] The proposal to “Add Clause 2 content to TR 26.870 clause 6.2” is not actionable because the contribution does not map its material to the existing clause structure/terminology of TR 26.870 (e.g., what subclauses, tables, KPIs, or definitions are to be inserted), risking inconsistency with the TR’s established traffic-characterization framework.

  2. [Technical] Several key assertions are presented without sufficient methodological detail (e.g., “AI traffic constitutes 0.06%… 74% DL/26% UL”, latency non-linearity at 0.5 s/1.5 s, “bursts in uplink traffic”), but no information is given on measurement point (RAN/UPF/PGW), sampling period, app identification method, device mix, or confidence—making the results hard to validate or reuse in a 3GPP TR.

  3. [Technical] The statement “Text and images are base-64 encoded and encapsulated in JSON (OpenAI API, Gemini API)” is over-generalized and may be misleading for traffic characterization, since many deployments use multipart/form-data, binary uploads, HTTP/2/3, gRPC, or streaming APIs; the TR should describe variability and typical patterns rather than a single encoding approach.

  4. [Technical] The claim that AI apps “use existing web-based protocols (e.g., WebRTC for live audio/video)” is not representative for most conversational AI services today (often HTTPS-based streaming, WebSockets, or proprietary transports), and the document does not distinguish between real-time interactive media sessions vs request/response inference sessions.

  5. [Technical] The codec statement “Existing codecs (AVC, HEVC) used for encoding before transport” is incomplete and potentially incorrect for many AI interactions (e.g., Opus for voice, AAC, AV1, JPEG/PNG/HEIF for images, and pre-encoded camera captures), and it conflates user media capture formats with network transport payload formats.

  6. [Technical] The UL/DL trend discussion (“uplink data growing faster… driven by conditioning inputs”) lacks quantification by use case (chat vs image/video conditioning vs agentic workflows) and does not address compression/resolution effects, caching, or on-device pre-processing—key to credible UL/DL ratio conclusions for 6G planning.

  7. [Technical] The latency sensitivity section cites “inserted latency” effects but does not specify whether this is RTT, one-way delay, added at IP layer vs application layer, nor whether server processing time is separated from network delay; without this, the “non-linear” behavior cannot be translated into 3GPP QoS/QoE requirements.

  8. [Technical] The “Agentic AI opportunities” claim that agents can shift traffic off-peak is speculative and not tied to concrete traffic models (background scheduling constraints, user tolerance, deadlines, notification patterns), and it ignores that many agentic tasks are user-triggered and time-sensitive, limiting applicability as a general traffic assumption.

  9. [Technical] The agentic protocol discussion (MCP, A2A) is not clearly relevant to 3GPP traffic characterization unless it is tied to observable network behaviors (session duration, concurrency, message sizes, polling vs push, keep-alives); currently it reads as architecture background rather than traffic-impact analysis.

  10. [Technical] The contribution does not address encryption and traffic classification implications (most AI traffic over TLS/QUIC), which is central to any realistic operator-side characterization and to how TR 26.870 can describe identification/measurement approaches.

  11. [Editorial] References to “Figure 1” and “Figure 2” are included but the figures are missing, leaving key quantitative claims unsupported and making the text unsuitable for direct insertion into a TR clause.

  12. [Editorial] Terminology is inconsistent and sometimes non-3GPP (e.g., “AI inference factories”, “AIML traffic”, “agentic apps”), and should be aligned with agreed study terminology and defined once (including what is meant by “AI media traffic” vs general AI application traffic).

  13. [Editorial] The proposal items are phrased as meeting agreements (“Agree to prioritize… by June 2026”) rather than concrete spec text or study deliverables, and they do not identify responsible rapporteurs, target clauses, or expected outputs, reducing usefulness as a formal 3GPP contribution.

[FS_6G_MED] LLM-based AI services Nokia
Previous Reviews:
manager
2026-02-09 04:31:02
  1. [Technical] The proposed “Tokenizer” definition as converting any modality into tokens (including “audio frames” and “image patches”) conflates model-internal tokenization with generic media segmentation; in practice audio/image tokenization is highly model- and codec-dependent and not a stable unit suitable for SA4 normative terminology without tighter scoping.

  2. [Technical] “Tokens … with clearly defined boundaries” is not generally true for common LLM tokenizers (e.g., BPE/WordPiece) where boundaries are algorithmic subword units and vary by vocabulary/version; the definition risks being misleading for traffic characterization and charging discussions.
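
A toy greedy longest-match segmenter (a simplified stand-in for BPE/WordPiece; both vocabularies are invented for the example) shows why boundaries are vocabulary- and version-dependent rather than "clearly defined":

```python
def greedy_subword(text, vocab):
    """Greedy longest-match subword segmentation: token boundaries come
    from the vocabulary, not from any property of the text itself."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):   # try longest candidate first
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])          # fall back to a single character
            i += 1
    return tokens

# The same string segments differently under two hypothetical vocabularies:
a = greedy_subword("tokenization", {"token", "ization"})
b = greedy_subword("tokenization", {"tok", "en", "ization"})
```

Because real services version their vocabularies, any traffic or charging model keyed to "token" counts inherits this model dependence.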

  3. [Technical] The architecture mixes functional blocks in a way that is inconsistent with typical MLLM pipelines: CLIP is cited as a “Modality Encoder” producing “token embeddings,” but CLIP commonly produces fixed-length embeddings (or patch embeddings internally) and is not representative of many current MLLMs; the document should avoid naming specific models or clarify the abstraction level.

  4. [Technical] The “Combination Layer” description (“combines input token embeddings with contextual token embeddings, potentially using RAG for context window management”) is conceptually muddled: RAG is an external retrieval + prompt construction mechanism, not a layer combining embeddings inside the model; this should be separated into “context retrieval/augmentation” vs “model inference.”

  5. [Technical] NOTE 2 claims token charging is based on “outcome of modality encoding and combination layers,” which is inaccurate for most services (charging is typically based on input/output token counts at the text tokenization interface, not internal embeddings); this undermines the traffic/charging motivation.

  6. [Technical] The proposal does not connect the architecture/definitions to SA4-relevant study outputs (e.g., concrete traffic models, latency/jitter constraints, conversational turn-taking, uplink/downlink asymmetry, streaming token generation), so it’s unclear how Clause 3 definitions will enable gap analysis in QoS or media formats.

  7. [Technical] NOTE 1 introduces “transport of token embeddings” as FFS, but the contribution does not justify why embedding transport is in scope for 3GPP media work versus transporting conventional media + text; without a clear use case, this risks steering the study toward non-deployable assumptions.

  8. [Technical] The statement in NOTE 3 that “all components on the server side run on the server” is already not universally true (on-device encoders, hybrid edge inference, split computing); if kept, it should be framed as “typical today” and include split/edge variants relevant to 6G.

  9. [Editorial] The document references “Figure X.1” and a “dashed line” but provides no figure; key architectural claims cannot be reviewed or agreed without the actual diagram and its boundary conditions.

  10. [Editorial] The contribution proposes adding content to “Clause 3” but does not specify which 3GPP document/TS/TR and which clause numbering (FS_6G_MED deliverable vs TR 26.847 vs a new SA4 TR), making the proposal non-actionable.

  11. [Editorial] Several terms are introduced without alignment to existing 3GPP terminology (e.g., “Media Decoder/Generator” vs codec/renderer concepts, “token embeddings” vs feature vectors), and no mapping is provided to existing SA4 definitions, risking inconsistent vocabulary across the study.

  12. [Editorial] The introduction cites TR 22.870 and “over 60 AI-related use cases” but does not cite specific use cases relevant to media communication nor extract requirements; adding a small set of representative use cases and their media/traffic implications would strengthen the contribution.

[FS_6G_MED] Testbed for AI Media Services traffic characterization Qualcomm Atheros, Inc.
[FS_6G_MED] Test scenarios for AI traffic characterization Qualcomm Atheros, Inc.
pCR [FS_6G_MED] Considerations on Work Topic 1 Media Delivery requirements for intelligent immersive calling Huawei Technologies Co., Ltd
Previous Reviews:
manager
2026-02-10 03:44:01
  1. [Technical] The proposed “requirements” (e.g., “4K + HDR uplink video capability”, “eye tracking”, “user intention understanding”) are largely service/AI feature statements and not clearly mapped to media delivery functions in TR 26.870 clause 4.2; they need to be reframed as measurable media delivery requirements (latency, bitrate ranges, synchronization, reliability, metadata carriage) or moved to a more appropriate section/spec.

  2. [Technical] “4K + HDR uplink video capability” is underspecified and potentially unrealistic without constraints (frame rate, codec profiles/levels, HDR format such as HDR10/HLG, color depth, end-to-end latency, and uplink bandwidth assumptions); as written it is not testable and risks conflicting with existing 5G/IMS deployment realities.

  3. [Technical] Eye tracking support is introduced without defining what the network/media system must transport (e.g., gaze vectors, timestamps, coordinate systems, sampling rates) and without addressing synchronization with audio/video; this omission makes it unclear whether the requirement impacts RTP payloads, metadata frameworks, or application-layer signaling.

  4. [Technical] “User intention understanding through voice and gesture inputs” is not a media delivery requirement and implies AI inference in the network/operator domain; the CR should clarify whether the requirement is about transporting multimodal sensor streams/metadata, or about network exposure/compute, otherwise it exceeds SA4 media scope.

  5. [Technical] “Device-Aware QoE” is vague: it does not specify the mechanism (adaptation sets, scalable coding, multi-stream, per-device rendering constraints) nor the KPIs (stall rate, motion-to-photon latency, audio-video sync); it should align with existing 26-series QoE/QoS adaptation concepts rather than introducing a new undefined tiering concept.

  6. [Technical] “IMS Extensions: Extensibility of IMS to support these new capabilities” is problematic in TR 26.870 unless it identifies concrete IMS/SIP/SDP capability negotiation needs; otherwise it becomes an open-ended requirement that overlaps with SA2/CT and lacks normative direction.

  7. [Technical] “Multi-Media Protocol Extensions: Review of protocol extensions for multi-media transporting” is not a requirement but an action item; it should be converted into specific protocol needs (e.g., RTP/RTCP extensions, SDP attributes, metadata carriage, multi-stream synchronization) or removed from requirements text.

  8. [Technical] The service definition claims the service “can be natively provided by operators” but does not specify architectural implications (edge compute, media processing functions, exposure APIs) or how this interacts with existing 3GPP media frameworks; this risks creating an ungrounded requirement without feasibility analysis.

  9. [Technical] The contribution references SA1 TR 22.870 use cases for aging populations but does not trace each new requirement back to a specific SA1 requirement/use-case element; lack of traceability weakens justification and may introduce requirements not endorsed by SA1.

  10. [Editorial] The document metadata is inconsistent: it is labeled “Document: S4-260133” but “Document Number: S4-260080”; this needs correction to avoid confusion in CR tracking and meeting records.

  11. [Editorial] Terminology is inconsistent/unclear (“multi-media transporting”, “intelligent immersive calling”, “tiered QoE”); the CR should introduce definitions and use consistent 3GPP wording (e.g., “multimedia transport”, “immersive communication”) aligned with TR 26.870 style.

  12. [Editorial] The summary states “updates to clause 4.2” and “New Clause 4.2.1” but does not show the exact tracked changes text; for a CR for approval, the absence of precise change markup (additions/deletions) makes it impossible to verify consistency with the current clause structure and numbering.

[FS_6G_MED] pCR on Embodied Video for 6G Media China Mobile Com. Corporation
Previous Reviews:
manager
2026-02-10 04:17:09
  1. [Technical] The proposed new use case “Embodied Video Internet (EVI)” is not clearly mapped to the existing TR 26.870 study objectives and terminology; it reads like a new umbrella concept rather than a media-centric use case, and the CR should explicitly justify why it belongs in TR 26.870 (SA4 media study) versus remaining in SA1/SA2 domain.

  2. [Technical] Several KPI values appear internally inconsistent or insufficiently specified for media work: e.g., “6x 1080p @ 15Hz → 20 Mbps” and “compression ratio 240:1” are asserted without stating codec, chroma format, bit depth, target quality, or whether “Hz” means fps, making the derived bitrates non-reproducible and potentially misleading for SA4 conclusions.
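
A quick reproduction attempt under one common set of assumptions (4:2:0 chroma, 8-bit depth, "Hz" read as fps; all three are assumptions the contribution fails to state) illustrates the inconsistency:

```python
def raw_mbps(width, height, fps, bits_per_pixel=12):
    """Uncompressed rate for 4:2:0 8-bit video (12 bits/pixel on average).
    Chroma format and bit depth are assumed; the contribution omits both."""
    return width * height * bits_per_pixel * fps / 1e6

raw_total = 6 * raw_mbps(1920, 1080, 15)   # six 1080p streams at 15 fps: ~2239 Mbps
at_240_to_1 = raw_total / 240              # the contribution's stated ratio: ~9.3 Mbps
```

Under these assumptions a 240:1 ratio yields roughly 9.3 Mbps, not the stated 20 Mbps (which would correspond to roughly 112:1), so at least one of the two figures, or an unstated assumption, must change.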

  3. [Technical] The latency requirement “E2E RTT 100–300 ms” for “real-time” embodied control/offloading is not reconciled with the tighter control-loop needs implied elsewhere (e.g., 10 ms sensor intervals, motion control), and the text should distinguish clearly between (a) media transport latency for perception streams and (b) closed-loop control latency/reliability requirements.

  4. [Technical] The contribution mixes “video” requirements with non-media payloads (LiDAR, point clouds, sensor data) but does not define the scope boundary for TR 26.870 (media codecs/protocols/QoE); without scoping, the clause risks driving requirements that are more appropriate for generic data transport or edge computing studies.

  5. [Technical] “AI codec with error-tolerant capabilities (Grace method)” is introduced as a requirement but is not defined, referenced, or aligned with ongoing 3GPP/MPEG terminology (e.g., neural codecs, feature/latent compression, ROI coding); as written it is not actionable and could conflict with existing codec evaluation frameworks.

  6. [Technical] “AI-native Video Protocol” is proposed as a key requirement without identifying what is missing in existing protocol stacks (RTP/RTCP, QUIC, DASH/CMAF, WebRTC, 5G media streaming) or what protocol functions are uniquely required (e.g., semantic prioritization, multi-stream synchronization, in-network adaptation), so it reads as a vague solution statement rather than a requirement.

  7. [Technical] Reliability targets such as “>99.99%” are stated for UAV inspection and robot sensor/LiDAR traffic without defining the reliability metric (packet success probability, frame delivery, application-level inference success, within what time bound), which is critical for translating to media-layer mechanisms.
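
The difference between candidate metrics is material, as a two-line calculation shows; the i.i.d. loss model and the packets-per-frame figure are illustrative assumptions:

```python
def frame_delivery_prob(pkt_loss, pkts_per_frame):
    """Probability a frame arrives intact when packet loss is i.i.d. and a
    single lost packet discards the frame (no FEC or retransmission).
    Shows why '>99.99%' means different things at packet vs frame level."""
    return (1.0 - pkt_loss) ** pkts_per_frame

# 99.99% per-packet success is only ~99.0% per-frame at 100 packets/frame.
per_frame = frame_delivery_prob(1e-4, 100)
```

Without stating the metric, the unit (packet, frame, or inference result), and the time bound, the ">99.99%" target cannot be translated into media-layer mechanisms.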

  8. [Technical] The UAV “event security” latency “≤10 ms” for 1K/4K video at ≥5/≥25 Mbps is extremely stringent and likely infeasible end-to-end for typical video pipelines (capture/encode/packetize/decode/render), unless it refers to one-way transport only; the clause should clarify the latency definition and include processing components or explicitly exclude them.
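
An illustrative one-way budget makes the point; every figure below is an assumption chosen only to show the orders of magnitude involved, not a measured value:

```python
# Hypothetical one-way budget for a 30 fps video pipeline (all values assumed).
budget_ms = {
    "capture (1 frame @ 30 fps)": 33.3,
    "encode": 10.0,
    "packetize + send": 1.0,
    "transport (one-way)": 5.0,
    "jitter buffer + decode": 15.0,
    "render": 8.0,
}
total_ms = sum(budget_ms.values())  # ~72 ms end-to-end under these assumptions
```

Even with aggressive values, the full capture-to-render chain sits far above 10 ms, so the requirement is only plausible if it refers to one-way transport delay alone, and the clause must say which.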

  9. [Technical] Multi-camera scenarios (6–8 cameras, mixed 1080p/4K, 15/30/60 fps) are listed, but there is no requirement discussion on synchronization (inter-camera time alignment), multi-stream correlation, or joint encoding/transport, which are central media issues for embodied perception and 3D reconstruction.

  10. [Technical] The “QoE model” section is too generic and user-centric for a machine-consumer scenario; embodied AI often optimizes task success (e.g., mAP, tracking stability, control error) rather than human QoE, so the clause should introduce task-oriented QoS/QoE (QoTask) metrics and how they relate to media impairments.

  11. [Technical] The contribution cites SA1 TR 22.870 use cases but does not ensure consistent numbering/traceability (e.g., “Use Case 6.28/6.19/6.48/6.11”) to the exact clauses/tables in TR 22.870; without precise references, the extracted KPIs risk being challenged as non-authoritative.

  12. [Editorial] Clause/table numbering appears inconsistent in the summary (“Table 2.1.3-1/2.1.3-2” under “Clause 6.1.3”), suggesting the CR may introduce numbering conflicts or incorrect cross-references in TR 26.870; numbering should follow the target document’s clause structure.

  13. [Editorial] Terms are used inconsistently or non-standardly (“15Hz” for frame rate, “E2E RTT” vs “E2E latency”, “1K” resolution), and should be normalized to 3GPP style (fps, one-way latency vs RTT, explicit pixel dimensions).

  14. [Editorial] The text frequently shifts from requirements to solution proposals (“new protocol design”, “AI codec technology”) without using normative/requirements language appropriate for a TR study clause (e.g., “may need”, “is expected to”), which could be seen as over-prescriptive for a study item.

[FS_DCTC_eQoS_MED] Description of experimental approach and test setup for media transmission for AI inferencing InterDigital Pennsylvania
6GMedia - work topic 2- Characteristics of AI-enabled applications InterDigital New York
Previous Reviews:
manager
2026-02-09 04:30:16
  1. [Technical] Several proposals (e.g., “SA4 should develop generic QoS and QoE mechanisms”, “enhance QoS framework granularity/context awareness”, “procedures for real-time QoE-based adaptation”) are largely SA2/SA5 scope and risk duplicating 5GS QoS work; the contribution should clearly delimit SA4’s role (media adaptation signaling, codecs, application-layer metrics) vs system QoS control.

  2. [Technical] The document treats “AI/ML data” (tokens, embeddings, latent, intent, prompts, model parameters) as if SA4 should standardize representation formats/codecs, but it does not justify why this belongs in 3GPP SA4 rather than external SDOs (IETF/W3C/ISO/IEC JTC1/SC29) or 3GPP SA6; without a concrete interoperability gap and target interface, the proposal is not actionable.

  3. [Technical] Table 1 claims/assumptions around specific formats (ONNX, GGUF, MPEG NNC, MPEG-7, W3C Media Annotations, “MPEG FCM”, “upcoming MPEG avatar”) are problematic: some are not stable standards, not widely interoperable, or not clearly relevant to 3GPP media specs, and the document does not state selection criteria or normative references.

  4. [Technical] The “spatial computing off-device processing” propositions are underspecified: no functional split options, latency budgets, synchronization requirements, or mapping to 3GPP enablers (e.g., edge computing, uplink media formats, timing) are provided, making it hard to translate into TR text beyond generic statements.

  5. [Technical] The QoS characterization in Table 2 (“mid reliability”, “real-time latency”, “high need for QoE-based adaptation”) is qualitative and inconsistent with 3GPP practice; it should be tied to measurable KPIs (e.g., packet delay budget, loss rate, jitter, sync error) and to existing 5QI/GBR/non-GBR concepts if it is to influence SA4 study conclusions.

  6. [Technical] The contribution asserts “current QoS frameworks lack application/context awareness, granularity, and adaptability” without citing specific limitations in existing 3GPP mechanisms (e.g., QoS flows, reflective QoS, NWDAF, TS 26.114/26.501/26.522 mechanisms); this weakens the rationale for new work and risks overstating gaps.

  7. [Technical] The QUIC/MRI discussion mixes layers and responsibilities: SA4’s RTP header extension MRI solution (TS 26.522) is not directly comparable to SA2 N6 relaying of MRI for encrypted QUIC traffic, and the document does not explain what “integration” would concretely mean (new application-layer metadata objects, gateways, or mapping rules).

  8. [Technical] The statement that “Rel-19 SA2 specified techniques for delivering MRI when XRM traffic is end-to-end encrypted (QUIC)” needs precision (exact WI/spec references and what was standardized vs studied); otherwise it risks being factually incorrect or misleading.

  9. [Technical] Multi-device/tethering observations are valid but the proposal again drifts into system-level territory (“UE-centric assumptions”, “traffic correlation across UEs”) without identifying SA4-specific deliverables (e.g., cross-device media synchronization signaling, multi-stream adaptation coordination, per-device capture/render timing).

  10. [Editorial] Numbering is inconsistent and duplicated (e.g., “Observation 8/9” and “Proposal 8” appear in multiple sections with different meanings), which will cause confusion when transcribing into TR clauses.

  11. [Editorial] Terminology is not controlled: “XR”, “XRM”, “AI-enabled”, “spatial computing”, “MRI”, “AI data”, “intermediate data (embeddings)” are used without definitions or alignment to existing 3GPP terms, undermining Proposition 2’s stated goal.

  12. [Editorial] Several codec/format mentions are vague or potentially incorrect (“dynamic mesh/gaussian splat codecs”, “MPEG haptics”, “MPEG FCM”) and should be replaced with precise standard names, part numbers, and maturity status, or moved to informative examples.

  13. [Editorial] The conclusion proposes adding content to “a new section 6.X of the TR” but does not specify which TR (presumably the 6GMedia TR) nor the intended clause structure and exact text changes, making it hard for the group to adopt as-is.

On SA4 work on AI traffic characteristics Apple Inc.
Previous Reviews: manager, 2026-02-09 04:29:45
  1. [Technical] The contribution argues SA4 should avoid “normative work on AI formats,” but it does not clearly distinguish between (a) normative specification of application payload schemas (likely out of scope) and (b) normative traffic descriptors/traffic models needed by RAN2/SA2; this risks leaving RAN2/SA2 without actionable, comparable parameters.

  2. [Technical] The paper claims “no clear interoperability requirement exists today,” yet AI services already rely on interoperable transport/application behaviors (e.g., HTTP/2/3, TLS, streaming responses); the contribution should clarify what “interoperability” means in 3GPP terms and why that precludes any normative assumptions for modeling.

  3. [Technical] The proposed “focus on traffic characteristics” is not operationalized: no concrete set of traffic model parameters (e.g., request size distributions, token/segment inter-arrival, response truncation, concurrency, session duration) is provided, so SA4 cannot translate the guidance into a usable model.

  4. [Technical] The characterization “bursty and unpredictable” / “event-driven rather than steady-state streaming” is too generic and potentially misleading given common AI response streaming (server-sent events / chunked transfer / WebSocket-like patterns) that can resemble quasi-streaming; the document should explicitly cover both non-streaming and streaming inference modes.
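The distinction comments 3 and 4 ask for (explicit parameters, both streaming and non-streaming modes) can be made concrete with a toy arrival model. All parameter values below are illustrative assumptions, not measurements.

```python
import random

# Toy inference-response traffic sketch; parameters are illustrative assumptions.
def response_arrivals(mode, n_tokens=100, bytes_per_token=4,
                      token_interval_ms=30.0, rng=random.Random(0)):
    """Yield (time_ms, bytes) arrival events for one inference response.

    'batch' models a non-streaming response (one burst after a think time);
    'stream' models token-by-token delivery (SSE/chunked-like quasi-streaming).
    """
    if mode == "batch":
        yield (rng.expovariate(1 / 500.0), n_tokens * bytes_per_token)
    elif mode == "stream":
        t = 0.0
        for _ in range(n_tokens):
            t += rng.expovariate(1 / token_interval_ms)
            yield (t, bytes_per_token)

batch = list(response_arrivals("batch"))
stream = list(response_arrivals("stream"))
print(len(batch), len(stream))  # 1 100
```

The same byte volume arrives as one burst or as a hundred small events; any SA4 traffic model that covers only one mode will mischaracterize the other.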

  5. [Technical] The “Text-to-Text small prompt, token-based responses” framing omits key drivers of traffic: context window growth, retrieval-augmented generation (RAG) document fetches, tool-calling/agent loops, and multi-turn conversation history, all of which can dominate uplink/downlink volumes and burst patterns.

  6. [Technical] Multimodal is reduced to “photo uploads,” but emerging patterns include continuous audio (speech) and video understanding with sustained uplink plus low-latency downlink; excluding these biases any future traffic model toward sporadic UL bursts only.

  7. [Technical] The document references REST/JSON examples (OpenAI/Claude/Gemini) but does not address that payload encoding (JSON vs protobuf), compression, and HTTP streaming materially affect packetization and burstiness; if SA4 avoids format standardization, it should still bound these factors for modeling.

  8. [Technical] The “5+ years in the future / 6G deployment” argument is not aligned with SA4’s immediate remit to support Rel-19/Rel-20 studies; the contribution should reconcile near-term modeling needs with long-term uncertainty rather than using long-term uncertainty to defer specifics.

  9. [Technical] “Continuous review” is proposed without a mechanism (trigger conditions, cadence, ownership, or liaison process with RAN2/SA2), making it unlikely to be actionable within 3GPP work planning.

  10. [Technical] The paper does not map its recommendations to existing 3GPP traffic model frameworks (e.g., TS 26.234/26.247 style service requirements, TR-based traffic models, or SA2 service exposure assumptions), risking inconsistency with established methodology.

  11. [Editorial] The contribution reads as a positioning note but lacks explicit asks to SA4 (e.g., “agree to treat AI payload formats as non-normative in TR X,” “provide parameter set Y to RAN2 by date Z”), which makes it hard to conclude or capture in meeting notes.

  12. [Editorial] Terms like “AI formats,” “data packet structure,” “model formats,” and “traffic patterns” are used interchangeably; tighter terminology (application payload schema vs transport protocol vs traffic model parameters) is needed to avoid misinterpretation.

  13. [Editorial] The summary cites external proprietary API references as evidence of variability, but does not provide stable citations or extract the relevant commonalities/differences; for a 3GPP contribution, the argument should be supported by a more systematic comparison or at least a table of observed behaviors.

6GMedia - AI terminology InterDigital New York
Previous Reviews: manager, 2026-02-09 04:29:05
  1. [Technical] The proposal is not framed as normative 3GPP terminology (no alignment to TR 21.905 style, no indication of scope/authority), so inserting these definitions into TR 26.870 risks creating conflicting “official” definitions versus existing SA4/SA2 terms (e.g., “feature”, “descriptor”, “intermediate data”) without a clear governance statement.

  2. [Technical] “Soft token” is defined as a continuous vector that “replaces or augments a hard token” and is “processed similarly,” but in many architectures tokens remain discrete indices while the embedding is continuous; the current text blurs token vs embedding and will confuse traffic characterization discussions (bits on the wire are typically hard-token indices or coded latents, not “soft tokens”).

  3. [Technical] The definition “Embedding… Not inherently part of a token sequence” is incorrect/incomplete for common transformer pipelines where embeddings are exactly the per-token continuous representations forming the input sequence; this contradiction undermines the intended clarity between “token”, “embedding”, and “latent”.
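The token/embedding confusion flagged in comments 2 and 3 can be shown in a few lines: what typically crosses the wire is a sequence of discrete token indices, while embeddings are the continuous per-token vectors produced by a lookup inside the model. Vocabulary and vector values below are made up for illustration.

```python
# Hard tokens vs embeddings, minimal sketch (values are illustrative).
VOCAB = {"hello": 0, "world": 1}
EMBEDDING_TABLE = [          # one continuous vector per vocabulary entry
    [0.12, -0.40, 0.77],
    [0.05, 0.91, -0.33],
]

def tokenize(text):
    # hard tokens: discrete indices -- this is what is serialized on the wire
    return [VOCAB[w] for w in text.split()]

def embed(token_ids):
    # embeddings: the continuous per-token representations forming the
    # model's input sequence (contrary to "not inherently part of a
    # token sequence")
    return [EMBEDDING_TABLE[i] for i in token_ids]

ids = tokenize("hello world")
vectors = embed(ids)
print(ids)              # [0, 1]
print(len(vectors[0]))  # 3
```

For traffic characterization, the bits transmitted correspond to `ids` (or coded latents), not to "soft tokens".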

  4. [Technical] “Learned based media compression (representation)” is described as “syntax-defined coded form derived from latent representation after quantization and entropy coding,” which excludes important learned codecs that do not use explicit entropy coding or use arithmetic coding over discrete tokens; the definition should be generalized or explicitly scoped to avoid being wrong for major classes of neural codecs.

  5. [Technical] The “Model exchange representation” examples include GGUF, which is primarily a model file/container format for specific inference stacks rather than an interoperable operator-graph exchange format like ONNX/NNEF; mixing these may mislead SA4 on what is realistically exchangeable across vendors.

  6. [Technical] The “Internal vs external representation” matrix is largely FFS and asserts “Model exchange representation: Not internal,” but model formats can be internal to a system component boundary (e.g., between orchestrator and accelerator runtime); the internal/external dichotomy needs a 3GPP-entity/interface context (UE/AF/AS/NEF, etc.) to be meaningful.

  7. [Technical] “Intermediate data” is said to “include intermediate coded representation, feature representation or descriptors” and references TR 26.927, but the contribution does not quote or ensure consistency with the exact TR 26.927 definition; this risks redefining an existing SA4 term and should be cross-checked verbatim.

  8. [Technical] The applicability matrix makes strong modality claims (e.g., “Text: prevalent method hard tokens”, “Audio: prevalent method latents + embeddings”) that are architecture-dependent and not stable enough for a TR unless clearly labeled as informative examples; otherwise it may bias later requirements/traffic models incorrectly.

  9. [Technical] “Inference results” includes “W3C Media Annotations” as an example, but that is a metadata framework rather than an AI inference output format; the example set should be constrained to representations relevant to 3GPP media workflows (e.g., bounding boxes, masks, captions) and to what is exchanged over 3GPP interfaces.

  10. [Editorial] Several terms are introduced without consistent naming/grammar (“Learned based…” vs “Learned-based…”, “latent representation (latent)”, “exchangeable/external representation”), which will read poorly in TR 26.870 and complicate cross-referencing.

  11. [Editorial] The contribution proposes “include sections 1 to 3” but does not specify the exact target clause/subclause in TR 26.870, nor provide proposed text with numbering and definitions formatting; this makes it hard to assess integration impact and creates editorial ambiguity for rapporteurs.

  12. [Editorial] Examples mix standards and non-standards inconsistently (JPEG AI, MPEG AI-PCC, ONNX, NNEF, GGUF, NNC) without citations; TR text should either cite stable references or avoid listing volatile ecosystem artifacts that may date quickly.

[FS_6G_MED] Consideration on Media Delivery Architecture LG Electronics Inc.
Overview of inputs to RAN2#133 on AI traffic characteristics Huawei Tech. (UK) Co., Ltd
Previous Reviews: manager, 2026-02-09 04:28:26
  1. [Technical] The document repeatedly assumes “token” semantics (importance, dependency, token-to-PDU mapping, visibility to RAN) are relevant to RAN handling, but it does not justify how tokens would be observable given end-to-end encryption and application-layer framing; without a concrete exposure mechanism (e.g., explicit QoS marking/metadata), most token-level proposals are not actionable in 3GPP RAN.

  2. [Technical] Several items imply new RAN behavior based on “importance differentiation” and “error-tolerant token transmission,” but there is no linkage to existing 5QI/QFI/QoS Flow constructs or a proposal for how importance maps to standardized QoS parameters (PDB, PER, priority level, MDBV, etc.), risking duplication or conflict with the 5GS QoS model.

  3. [Technical] The recommendation to reuse XR “PDU Set” mechanisms for AI traffic is underspecified: it does not state which layer/function (PDCP/RLC/MAC) would bind/annotate sets, nor how this interacts with segmentation/retransmissions, making the feasibility of “PDU Set binding/annotation for AI traffic” unclear.
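As a sketch of what "PDU Set binding/annotation" would at minimum have to specify, the following groups the PDUs of one application unit and attaches an importance-derived priority. The field names and the convention "lower importance value = more important" are hypothetical, not taken from TS 23.501 or TS 38.300.

```python
from dataclasses import dataclass, field

# Hypothetical PDU Set annotation sketch; field names are assumptions,
# not 3GPP-defined information elements.
@dataclass
class PduSet:
    set_id: int
    importance: int              # lower = more important (assumed convention)
    pdus: list = field(default_factory=list)

def bind_pdu_sets(app_units):
    """app_units: list of (importance, list_of_pdu_payloads).

    Binds each application unit (e.g. one token burst) into one PDU Set.
    """
    return [PduSet(set_id=i, importance=imp, pdus=list(pdus))
            for i, (imp, pdus) in enumerate(app_units)]

sets = bind_pdu_sets([(0, [b"t0", b"t1"]), (2, [b"t2"])])
print([s.importance for s in sets])  # [0, 2]
```

Even this toy version exposes the open questions the comment raises: which layer performs the binding, and what happens to the annotation under segmentation and retransmission.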

  4. [Technical] The Rel-20 vs 6G split is asserted (“Rel-20 uplink/non-real-time; 6G real-time bidirectional”) without evidence from the summarized inputs that downlink real-time AI is out of Rel-20 scope; this may prematurely constrain RAN2 study/work item scope and misalign with SA4/SA1 service requirements.

  5. [Technical] “Small packet sizes” and “uplink-heavy” are presented as near-universal characteristics, but the document also includes training/federated learning “bulk, synchronized uploads” and split inference intermediate data, which can be large and periodic; the summary should explicitly separate these regimes to avoid misleading one-size-fits-all conclusions.
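Separating the regimes the comment says are conflated takes only a per-regime parameterization; the payload sizes and period below are illustrative assumptions, chosen to show how far apart the regimes sit.

```python
# Illustrative regime separation: sporadic small uplink inference requests
# vs bulk, synchronized federated-learning uploads. Numbers are assumptions.
REGIMES = {
    "inference_request": {"payload_bytes": 2_000, "period_s": None},       # sporadic
    "fl_model_upload":   {"payload_bytes": 50_000_000, "period_s": 3600},  # periodic bulk
}

def mean_rate_bps(regime):
    r = REGIMES[regime]
    if r["period_s"] is None:
        return None  # sporadic: no steady-state rate without an arrival model
    return 8 * r["payload_bytes"] / r["period_s"]

print(round(mean_rate_bps("fl_model_upload")))  # 111111
```

A "small packets, uplink-heavy" characterization fits the first row and badly mischaracterizes the second, which is exactly why the summary should keep the regimes distinct.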

  6. [Technical] The text suggests “small packet transmission in RRC inactive” as an uplink enhancement, but does not reference existing mechanisms (e.g., RRC Inactive data transmission solutions) or identify what gap remains for AI traffic, making it hard to assess whether new work is needed.

  7. [Technical] “Multi-modal synchronization beyond MMSID for QoS control” is mentioned, but MMSID is not defined in this document and the required synchronization primitive (timestamping, cross-flow dependency signaling, common deadline) is not articulated; this risks creating vague requirements that SA4 cannot answer.

  8. [Technical] The dependency list asks SA4 for “packet delay budget, PER tolerance, packet importance variability,” but SA4 typically specifies codecs/formats and RTP/transport behavior rather than 5GS QoS targets; the document should clarify which items are expected from SA4 vs SA1/SA2 (service requirements/QoS) vs RAN2 (radio handling).

  9. [Technical] The “AI codec vs non-AI codec” categorization (Type 1/2/3) is introduced without defining what constitutes an “AI codec” in 3GPP terms or how it maps to TR 26.847/26.927 scope; this ambiguity undermines the proposed coordination and could lead to inconsistent terminology across groups.

  10. [Technical] “Error tolerance determination” contains conflicting directions (SA4 to define vs RAN2 to proactively determine based on task/source data), but the document does not propose a resolution path (e.g., normative parameters, profiles, or measurement methodology), leaving a key design input unresolved.

  11. [Editorial] The document reads as a narrative summary but lacks traceability: many “strong consensus/nearly universal” statements are not backed by explicit references to which contributions support them, making it difficult for reviewers to validate the characterization.

  12. [Editorial] Several terms are used inconsistently or without definition (e.g., “token-based communication,” “AI-native communication,” “service awareness at L2,” “context-aware traffic flow”), and the document would benefit from a short terminology section to prevent misinterpretation across RAN2/SA4.

  13. [Editorial] The “Specific Questions to SA4” list is long and partially overlapping; consolidating into a smaller set of well-scoped questions (with clear expected output: definition, traffic model, or timeline) would improve the likelihood of actionable SA4 feedback.

  14. [Editorial] The section “Divergent Views and Open Issues” mentions scope items like “suggests RAN1 scope” and “include RedCap,” but does not state the implication for RAN2 next steps (e.g., whether to forward to RAN1/SA1 or exclude), leaving the reader without a clear decision-oriented outcome.

[FS_6G_MED] Work Plan for Media Aspects for 6G System Qualcomm Incorporated (Rapporteur)
use cases and observations VODAFONE Group Plc
[FS_6G_MED] Summary of AI Traffic related Documents and Proposed Way Forward Qualcomm Incorporated (Rapporteur)
[FS_6G_MED] Preliminaries: assumptions and requirements Qualcomm Korea
[FS_6G_MED] Considerations on Work Topic 4: Ubiquitous access Qualcomm Korea
[FS_6G_MED] Considerations on Work Topic 1: Media Delivery Architecture Qualcomm Korea
[FS_6G_MED] Some considerations on ways of working Qualcomm Incorporated (Rapporteur)
[FS_6G_MED] pCR on network emulator and AI test bed Qualcomm Atheros, Inc.
[FS_6G_MED] Summary of AI Traffic related Documents and Proposed Way Forward Qualcomm Incorporated (Rapporteur)

Total TDocs: 34 | PDFs: 33 | AI Summaries: 25 | AI Proposals: 25
