DaCAS-2: Test methodologies and requirements v0.6
This document defines test methodologies and requirements for Device-Assisted Capture Audio Systems (DaCAS) as part of the DaCAS Work Item. It is structured around four main objectives: defining minimum performance requirements for raw microphone signals, evaluating immersive audio capture example solutions, verifying/revising requirements based on example solution performance, and potential alignment with TS 26.260 and TS 26.261.
The document proposes requirements and recommendations for raw microphone signals (Table 1), covering:
Editor's Note: Decision needed on normative vs. informative requirements.
Compensation is considered optional for example solution development. If proponents provide compensated signals, they shall provide:
- Compensation filter specifications with relevant data
- Instructions on filter application to raw microphone signals
Table 2 defines requirements for compensated signals:
- Compensated Frequency Response: Must be within required/recommended masks (Tables 3-4) when applied to signals used for filter design
- Phase Properties: Should compensate phase differences that are independent of the sound source direction
- Masks defined for frequencies 100 Hz to 16 kHz with tighter tolerances in recommended vs. required masks
Integrated from S4aA250054, this method includes:
Recording Environment Requirements:
- Quiet room with low reverberation
- 7.1.4 surround loudspeaker layout (or known layout with front/rear/side/height differentiation)
- Diffuse uncorrelated pink noise signals (omitting subwoofer)
- Device positioned in landscape orientation, camera facing the center speaker
- 10-second noise recording plus 15-second silence recording for noise floor estimation
Processing Steps:
1. Gain Matching: Derive broadband relative gain G(ch) per channel to align outlier microphones
2. Equalization Estimation:
- Model theoretical impulse responses from device geometry
- Create simulated device microphone signals
- Compare simulated vs. real recordings to estimate port resonance equalization
- Calculate: port(ch,f) = sim(ch,f) – dev(ch,f)
- Convert to banded spectrum and model resonances parametrically
3. Noise Floor Estimation: Generate banded per-channel noise floor estimate NF(ch,b) from silence recording
Compensation Processing (DFT domain):
- Convert input DFT to banded spectrum spec(ch,b)
- Calculate noise floor compensation gains: g(ch,b) = (spec(ch,b) – NF(ch,b))/(spec(ch,b) + eps)
- Apply combined gain: G(ch) * EQ(ch,f) * g(ch,f)
Editor's Note: Clarification needed on linear vs. log scale for applied gain and method status.
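The compensation-gain computation above can be sketched as follows. The zero-flooring of the gains, the band edges, and all numerical values are illustrative assumptions; the mapping of banded gains g(ch,b) back to per-bin gains g(ch,f) is the open point noted in the Editor's Note and is left out here.

```python
import numpy as np

def banded_spectrum(dft_mag, band_edges):
    """Average DFT magnitude bins into coarser bands spec(ch, b)."""
    return np.array([dft_mag[lo:hi].mean() for lo, hi in band_edges])

def noise_comp_gains(spec, nf, eps=1e-9):
    """g(ch, b) = (spec(ch, b) - NF(ch, b)) / (spec(ch, b) + eps),
    floored at zero so the gain never goes negative (assumption)."""
    return np.maximum(spec - nf, 0.0) / (spec + eps)

# Convert an input DFT magnitude to a banded spectrum (hypothetical bands)
dft_mag = np.abs(np.fft.rfft(np.sin(2 * np.pi * np.arange(256) / 16)))
bands = [(0, 32), (32, 64), (64, 129)]
spec_1ch = banded_spectrum(dft_mag, bands)

# Hypothetical values: 2 channels x 4 bands, linear magnitude
spec = np.array([[1.0, 0.5, 0.2, 0.1],
                 [0.9, 0.6, 0.3, 0.05]])
nf = np.full_like(spec, 0.05)   # banded noise floor estimate NF(ch, b)
g = noise_comp_gains(spec, nf)  # noise floor compensation gains
```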
Integrated from S4aA260008, based on the Integrated Microphone Pressure frequency response (IMPro) measurement:
Integrated microphone pressure frequency response calculated by:
Equation (1): The integrated microphone pressure frequency response is calculated by dividing the measured integrated microphone output signal response by the probe signal response at a reference point at the sound port inlet, with probe microphone calibration applied
For M microphones, the compensated output is defined as the convolution of the raw signal with an equalization filter. The target equalization filter compensates the integrated microphone response, within the target frequency response mask, so that the compensated signal corresponds to a delayed pressure signal at the sound inlet.
Equation (2): Target frequency response in frequency domain within target mask range
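Under assumed notation (H for transfer functions, m a microphone index, tau a delay, f_min/f_max the mask limits; none of these symbols come from the source), Equations (1) and (2) might take a form such as:

```latex
% Eq. (1): integrated microphone pressure frequency response,
% with probe microphone calibration applied
H_{\mathrm{IMPro},m}(f) = \frac{H_{\mathrm{mic},m}(f)}{H_{\mathrm{probe}}(f)}

% Eq. (2): within the target mask, the equalized response approximates
% a pure delay of the pressure signal at the sound inlet
H_{\mathrm{IMPro},m}(f)\,H_{\mathrm{EQ},m}(f) \approx e^{-j 2\pi f \tau},
\qquad f \in [f_{\min},\, f_{\max}]
```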
Steps:
1. Perform IMPro measurements for all device microphones
2. Prepare UE software for raw microphone recording
3. Setup loudspeaker and device (0.5-2m distance)
4. Prepare sine sweep stimulus (~30 dB above background noise)
5. Calibrate probe microphone(s)
6. Perform IMPro measurement (pressure at sound inlet + DUT recording)
7. Time-align signals
8. Calculate integrated microphone pressure frequency response per microphone
9. Design linear equalization filters to align responses within masks
10. Implement equalization filters in UE software
11. Process raw signals with equalization filters
12. Verify compensated signals satisfy frequency response mask requirements
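Steps 9 to 11 (filter design and application) could be prototyped as below. The frequency-sampling design, tap count, and mask band are illustrative assumptions, not the normative method.

```python
import numpy as np

def design_eq_fir(measured_mag, fs, n_taps=255, f_lo=100.0, f_hi=16000.0):
    """Linear-phase FIR that inverts the measured magnitude response
    inside the mask band [f_lo, f_hi] (frequency-sampling sketch).
    measured_mag: magnitude on a uniform rfft grid from 0 to fs/2."""
    n_bins = len(measured_mag)
    freqs = np.linspace(0.0, fs / 2, n_bins)
    target = np.ones(n_bins)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    target[band] = 1.0 / np.maximum(measured_mag[band], 1e-6)
    delay = (n_taps - 1) // 2                    # group delay in samples
    phase = np.exp(-2j * np.pi * freqs / fs * delay)
    h = np.fft.irfft(target * phase, n=2 * (n_bins - 1))[:n_taps]
    return h * np.hanning(n_taps)

# A flat measured response yields (approximately) a pure delay
h = design_eq_fir(np.ones(513), fs=48000)
raw = np.zeros(64)
raw[0] = 1.0
compensated = np.convolve(raw, h)                # step 11: apply the EQ filter
```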
Integrated from S4aA260008, the evaluation procedure includes:
Editor's Note: Clarification needed on cross-evaluation scope, documentation in specification, and minimum dataset.
Evaluation reports shall include:
- Target device(s) used
- Description of example solution input signals
- Details on output signals including IVAS input format(s)
- Example solution output signals (provided)
- Evaluation considering IVAS input format characteristics
- Evaluation results and observations
For self-evaluation, additional tests for realistic scenarios not covered in TS 26.260 may be included with full documentation and recording availability.
Single Source Scenario (Table 1):
- Sound Source: High-quality loudspeaker compliant with TS 26.260 clause 4.0.2
- Source Signal: British English single talk (ITU-T P.501), Male/Female, 20 Hz-20 kHz, 35.4 s, -27 dB RMS, 48 kHz, 16-bit
- Calibration: 75 dB SPL playback, equalized spectrum within ±1 dB (100-200 Hz) and ±0.5 dB (200 Hz-20 kHz)
- Acoustic Environment: Anechoic chamber OR acoustically treated room (ETSI TS 103 224 or ITU-T BS.1116 compliant)
- Positioning:
- Hand-held/Headset: 1-1.5m distance, elevation 0°
- Table-mounted: per TS 26.260 clause 5.4.2.5, elevation 26.6°
- Azimuth angles: 0°, ±30°, ±60°, ±90°
Multi-Source Scenario for ISM Evaluation (Table 2):
Recording procedure:
1. Record sound sources individually (reference signals)
2. Sum individual recordings to obtain final input signals
Scenario X-1 (Table-mounted):
- UE lying flat on table, screen up
- Source distance: 0.5-1m (equal for both sources)
- Source height: 0.4m relative to UE
- Azimuth angle combinations: [-90°, 90°], [-110°, 70°], [-110°, 90°]
- Overlap pattern: source 1 only (25%) → source 2 only (25%) → both sources (50%)
- Applicable only for smartphone-type devices
Scenario X-2 (Hand-held):
- UE in hand-held landscape orientation, screen toward sources
- Source distance: 0.3-0.5m (equal for both sources)
- Source height: 0m relative to UE
- Azimuth angle combinations: [-30°, 30°], [-45°, 45°], [-30°, 45°]
- Same overlap pattern as X-1
- Applicable only for smartphone-type devices
Delay:
- Assess algorithmic delay only (input to example solution → output signal)
- Mitigates testing inaccuracies and the impact of the acoustic path
- Dependencies on the platform where the example solution runs are recognized
Loudness:
- Use recordings from single source scenario (azimuth=0°, elevation=0°)
- Process with example solution
- Analyze according to TS 26.260 clause 5.6.2
- Editor's Note: ITU-R BS.1770/P.700 via binaural rendering under consideration (pending ATIAS discussion)
Frequency Response:
For Stereo, SBA, MASA capture:
- Use recordings from single source scenario (azimuth=0°, elevation=0°)
- Process with example solution
- Analyze according to TS 26.260 clause 5.6.3
For ISM capture:
- Use recordings from scenarios X-1 and X-2
- Process with example solution
- For each object, calculate frequency response per TS 26.260 clause 5.6.3.2 using corresponding individual sound source recording as reference
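For the ISM case, the per-object relative frequency response might be computed roughly as follows. The 1/12-octave band edges and the FFT-based level estimate are simplifying assumptions, not the TS 26.260 clause 5.6.3.2 procedure itself.

```python
import numpy as np

def band_centers(f_lo=100.0, f_hi=12000.0):
    """1/12-octave band center frequencies referenced to 1 kHz."""
    n_lo = int(np.ceil(12 * np.log2(f_lo / 1000.0)))
    n_hi = int(np.floor(12 * np.log2(f_hi / 1000.0)))
    return 1000.0 * 2.0 ** (np.arange(n_lo, n_hi + 1) / 12.0)

def band_levels(x, fs, centers):
    """Per-band power level in dB from the rfft power spectrum."""
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    lo = centers * 2.0 ** (-1.0 / 24.0)
    hi = centers * 2.0 ** (1.0 / 24.0)
    p = np.array([spec[(freqs >= a) & (freqs < b)].sum()
                  for a, b in zip(lo, hi)])
    return 10.0 * np.log10(p + 1e-20)

def relative_response(output, reference, fs):
    """Response of an object output relative to the corresponding
    individual sound source recording used as reference."""
    c = band_centers()
    return c, band_levels(output, fs, c) - band_levels(reference, fs, c)

rng = np.random.default_rng(0)
ref = rng.standard_normal(48000)
c, rel = relative_response(2.0 * ref, ref, fs=48000)
```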
Directional Information:
- Based on TS 26.260 clause 5.6.4 for Stereo, SBA, MASA formats
- Assessment directly on example solution output (excluding transmission assumptions)
- Use recordings from single source scenario for all defined sound source directions
- Process with example solution
- Compute directional measurement and metric per TS 26.260 clause 5.6.4
- Editor's Note: For other formats, intermediate rendering to supported format via IVAS reference renderer could be considered
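For the stereo case, a minimal broadband panorama estimate based on an energy ratio is sketched below; the actual TS 26.260 clause 5.6.4 measurement and metric may differ, so treat this only as an illustration of the idea.

```python
import numpy as np

def stereo_panorama(left, right, eps=1e-12):
    """Broadband panorama estimate in [-1, 1] from channel energies
    (-1 = fully left, 0 = center, +1 = fully right). Assumed definition."""
    el = float(np.sum(left ** 2))
    er = float(np.sum(right ** 2))
    return (er - el) / (er + el + eps)

sig = np.ones(100)
pan_left = stereo_panorama(sig, np.zeros(100))    # energy only in left
pan_center = stereo_panorama(sig, sig)            # equal energies
pan_right_bias = stereo_panorama(sig, 2.0 * sig)  # right 6 dB louder
```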
General:
- Programming language: Python
- Available at: forge.3gpp.org/rep/sa4/audio/dacas
- Editor's Notes:
- Licensing, requirements, environment to be added
- Missing components (database reading, loudness test rendering, final report generation, reference signal handling, format support) to be added after DaCAS-2 details finalized
- Updated version to be uploaded to Audio subgroup repo
Core Functions:
- read_wav_file: Read WAV files (16-, 24-, or 32-bit depth); PCM support may be added
- estimate_delay_whole: Calculate delay between channels across whole signal (based on TS 26.260 Annex C)
- p56_active_level: Estimate speech active level (ITU-T P.56)
- compute_panorama: Estimate stereo panorama (TS 26.260 methodology)
- frequency_response: Compute 1/12-octave bandwidth spectrum (ISO 3 R40 series, 100 Hz-12 kHz)
- p79_slr: Compute Send Loudness Rating (TS 26.260)
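The idea behind estimate_delay_whole (whole-signal delay estimation, cf. TS 26.260 Annex C) can be illustrated with a plain cross-correlation sketch; this is not the repository function's actual signature.

```python
import numpy as np

def estimate_delay(x, y):
    """Estimate the delay of y relative to x (in samples) from the peak
    of the full cross-correlation over the whole signals."""
    corr = np.correlate(y, x, mode="full")
    return int(np.argmax(corr)) - (len(x) - 1)

rng = np.random.default_rng(1)
x = rng.standard_normal(1000)
y = np.concatenate((np.zeros(7), x))  # y is x delayed by 7 samples
d = estimate_delay(x, y)
```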
Editor's Notes:
- Format support to be added
- IVAS reference renderer to be added
- Decision needed on ITU-R BS.1770/P.700 via binaural rendering
Processing Approach:
- Example solutions process recording database into IVAS input format files with optional metadata
- Scripts read entire audio signal, convert to floating-point, perform offline evaluations
- Current version includes stereo directional information analysis support
Section placeholder - content TBD for:
- Test conditions
- Recording setups and scenarios
- Recording database
- Test methods
Placeholders for requirements on:
- Delay
- Loudness
- Frequency response
- Directional information
Content TBD