Main Content

acousticLoudness

Perceived loudness of acoustic signal

Description

example

loudness = acousticLoudness(audioIn,fs) returns loudness in sones according to ISO 532-1 (Zwicker).

example

loudness = acousticLoudness(audioIn,fs,calibrationFactor) specifies a nondefault microphone calibration factor used to compute loudness.

example

loudness = acousticLoudness(SPLIn) computes loudness using one-third-octave-band sound pressure levels (SPL).

example

loudness = acousticLoudness(___,Name,Value) specifies options using one or more Name,Value pair arguments.

Example: loudness = acousticLoudness(audioIn,fs,'Method','ISO 532-2') returns loudness according to ISO 532-2 (Moore-Glasberg).

example

[loudness,specificLoudness] = acousticLoudness(___) also returns the specific loudness.

example

[loudness,specificLoudness,perc] = acousticLoudness(___,'TimeVarying',true) also returns percentile loudness.

[loudness,specificLoudness,perc] = acousticLoudness(___,'TimeVarying',true,'Percentiles',p) specifies nondefault percentiles to return.

example

acousticLoudness(___) with no output arguments plots specific loudness and displays loudness textually. If TimeVarying is true, both loudness and specific loudness are plotted, with the latter in 3-D.

Examples

collapse all

Measure the ISO 532-1 stationary free-field loudness. Assume the recording level is calibrated such that a 1 kHz tone registers as 100 dB on a SPL meter.

[audioIn,fs] = audioread('WashingMachine-16-44p1-stereo-10secs.wav');

loudness = acousticLoudness(audioIn,fs)
loudness = 1×2

   28.2688   27.7643

Create two stationary signals with equivalent power: a pink noise signal and a white noise signal.

fs = 48e3;
dur = 5;
pnoise = 2*pinknoise(dur*fs);
wnoise = rand(dur*fs,1) - 0.5;
wnoise = wnoise*sqrt(var(pnoise)/var(wnoise));

Call acousticLoudness using the default ISO 532-1 (Zwicker) method and no output arguments to plot the loudness of the pink noise. Call acousticLoudness again, this time with output arguments, to get the specific loudness.

figure
acousticLoudness(pnoise,fs)

[~,pSpecificLoudness] = acousticLoudness(pnoise,fs);

Plot the loudness for the white noise signal and then get the specific loudness values.

figure
acousticLoudness(wnoise,fs)

[~,wSpecificLoudness] = acousticLoudness(wnoise,fs);

Call the acousticSharpness function to compare the sharpness of the pink noise and white noise.

pSharpness = acousticSharpness(pSpecificLoudness);
wSharpness = acousticSharpness(wSpecificLoudness);
fprintf('Sharpness of pink noise = %0.2f acum\n',pSharpness)
Sharpness of pink noise = 2.00 acum
fprintf('Sharpness of white noise = %0.2f acum\n',wSharpness)
Sharpness of white noise = 2.62 acum

Read in an audio file.

[audioIn,fs] = audioread('JetAirplane-16-11p025-mono-16secs.wav');

Plot the time-varying acoustic loudness in accordance with ISO 532-1 and get the percentiles. Listen to the audio signal.

acousticLoudness(audioIn,fs,'SoundField','diffuse','TimeVarying',true)

sound(audioIn,fs)

Call acousticLoudness again with the same inputs and get the percentiles. Print the Nmax and N5 percentiles. The Nmax percentile is the maximum loudness reported. The N5 percentile is the loudness below which is 95% of the reported loudness.

[~,~,perc] = acousticLoudness(audioIn,fs,'SoundField','diffuse','TimeVarying',true);
fprintf('Max loudness = %0.2f sones\n',perc(1))
Max loudness = 89.48 sones
fprintf('N5 loudness = %0.2f sones\n',perc(2))
N5 loudness = 81.77 sones

Read in an audio file.

[audioIn,fs] = audioread('Turbine-16-44p1-mono-22secs.wav');

Call acousticLoudness with no output arguments to plot the specific loudness. Assume a calibration factor of 0.15 and a reference pressure of 21 micropascals. To determine the calibration factor specific to your audio system, use the calibrateMicrophone function.

calibrationFactor = 0.15;
refPressure = 21e-6;
acousticLoudness(audioIn,fs,calibrationFactor,'PressureReference',refPressure)

acousticLoudness enables you to specify an intermediate representation, sound pressure levels, instead of a time-domain input. This enables you to reuse intermediate SPL calculations. Another advantage is that if your physical SPL meter does not report loudness in accordance to ISO 532-1 or ISO 531-2, you can use the reported 1/3-octave SPLs to calculate standard-compliant loudness.

To calculate sound pressure levels from an audio signal, first create an splMeter object. Call the splMeter object with the audio input.

spl = splMeter("SampleRate",fs,"Bandwidth","1/3 octave", ...
    "CalibrationFactor",calibrationFactor,"PressureReference",refPressure, ...
    "FrequencyWeighting","Z-weighting","OctaveFilterOrder",6);

splMeasurement = spl(audioIn);

Compute the mean SPL level, skipping the first 0.2 seconds. Only keep the bands from 25 Hz to 12.5 kHz (the first 28 bands).

SPLIn = mean(splMeasurement(ceil(0.2*fs):end,1:28));

Using the SPL input, call acousticLoudness with no output arguments to plot the specific loudness.

acousticLoudness(SPLIn)

Set up an experiment as indicated by the diagram.

Create an audioDeviceReader object to read from the microphone and an audioDeviceWriter object to write to your speaker.

fs = 48e3;
deviceReader = audioDeviceReader(fs);
deviceWriter = audioDeviceWriter(fs);

Create an audioOscillator object to generate a 1 kHz sinusoid.

osc = audioOscillator("sine",1e3,"SampleRate",fs);

Create a dsp.AsyncBuffer object to buffer data acquired from the microphone.

dur = 5;
buff = dsp.AsyncBuffer(dur*fs);

For five seconds, play the sinusoid through your speaker and record using your microphone. While the audio streams, note the loudness as reported by your SPL meter. Once complete, read the contents of the buffer object.

numFrames = dur*(fs/osc.SamplesPerFrame);
for ii = 1:numFrames
    audioOut = osc();
    deviceWriter(audioOut);
    
    audioIn = deviceReader();
    write(buff,audioIn);
end

SPLreading = 60.4;

micRecording = read(buff);

To compute the calibration factor for the microphone, use the calibrateMicrophone function.

calibrationFactor = calibrateMicrophone(micRecording,deviceReader.SampleRate,SPLreading);

Call acousticLoudness with the microphone recording, sample rate, and calibration factor. The loudness reported from acousticLoudness is the true acoustic loudness measurement as specified by 532-1.

loudness = acousticLoudness(micRecording,deviceReader.SampleRate,calibrationFactor)
loudness = 14.7902

You can now use the calibration factor you determined to measure the loudness of any sound that is acquired through the same microphone recording chain.

Read in an audio signal.

[audioIn,fs] = audioread('TrainWhistle-16-44p1-mono-9secs.wav');

ISO 532-1

Determine the time-varying specific loudness according to the default method (ISO 532-1).

[~,specificLoudness] = acousticLoudness(audioIn,fs,'TimeVarying',true);

ISO 532-1 reports specific loudness over Bark, where the Bark bins are 0.1:0.1:24. Convert the Bark bins to Hz and then plot the specific loudness over Hz across time.

barkBins = 0.1:0.1:24;
hzBins = bark2hz(barkBins);

t = 0:2e-3:2e-3*(size(specificLoudness,1)-1);
surf(t,hzBins,sum(specificLoudness,3).','EdgeColor','interp')
set(gca,'YScale','log')
view([0 90])
axis tight
xlabel('Time (s)')
ylabel('Frequency (Hz)')
colorbar
title('Specific Loudness (sones/Bark)')

ISO 532-2

Determine the stationary specific loudness according to the Moore-Glasberg method (ISO 532-2).

[~,specificLoudness] = acousticLoudness(audioIn,fs,'Method','ISO 532-2');

ISO 532-2 reports specific loudness over the ERB scale, where the ERB bins are 1.8:0.1:38.9. The unit of the ERB scale is sometimes referred to as Cam. Convert the ERB bins to Hz and then plot the specific loudness.

erbBins = 1.8:0.1:38.9;
hzBins = erb2hz(erbBins);

semilogx(hzBins,specificLoudness)
xlabel('Frequency (Hz)')
ylabel('Loudness (sones)')
title('Specific Loudness')
grid on

Read in an audio file.

[x,fs] = audioread('WashingMachine-16-44p1-stereo-10secs.wav');

ISO 532-2 enables you to specify a custom earphone response when calculating loudness. Create a 30-by-2 matrix where the first column is the frequency and the second column is the earphone's deviation from a flat response.

tdh = [   0,    80,   100,  200,  500,  574,  660,  758,  871, 1000, 1149,  1320,  1516,  1741,  2000, ...
       2297,  2639,  3031, 3482, 4000, 4500, 5000, 5743, 6598, 7579, 8706, 10000, 12000, 16000, 20000; ...
        -50, -15.3, -13.8, -8.1, -0.5,  0.4,  0.8,  0.9,  0.5,  0.1, -0.8,  -1.5,  -2.3,  -3.2,  -3.9, ...
       -4.2,  -4.3,  -4.3, -3.9, -3.2, -2.3, -1.1, -0.3,   -2, -5.4,   -9, -12.1, -15.2,   -30,   -50 ].';

Calculate the loudness using ISO 532-2. Specify SoundField as earphones and the earphone response as the matrix you just created.

acousticLoudness(x,fs,'Method','ISO 532-2','SoundField','earphones','EarphoneResponse',tdh)

Create a dsp.AudioFileReader object to read in an audio signal frame-by-frame. Specify a frame duration of 50 ms. This will be the frame duration over which you calculate stationary loudness.

fileReader = dsp.AudioFileReader('Engine-16-44p1-stereo-20sec.wav');

frameDur = 0.05;
fileReader.SamplesPerFrame = round(fileReader.SampleRate*frameDur);

Create an audioDeviceWriter object to write audio to your default output device.

deviceWriter = audioDeviceWriter('SampleRate',fileReader.SampleRate);

Create a timescope object to display stationary loudness over time.

scope = timescope( ...
    'SampleRate',1/frameDur, ...
    'YLabel','Loudness (sones)', ...
    'ShowGrid',true, ...
    'PlotType','Stairs', ...
    'TimeSpanSource','property', ...
    'TimeSpan',20, ...
    'AxesScaling','Auto', ...
    'ShowLegend',true);

In a loop:

  1. Read a frame from the audio file.

  2. Calculate the stationary loudness of that frame.

  3. Play the sound through your output device.

  4. Write the loudness to the scope.

while ~isDone(fileReader)
    audioIn = fileReader();
    loudness = acousticLoudness(audioIn,fileReader.SampleRate);
    deviceWriter(audioIn);
    scope(loudness)
end
release(fileReader)
release(deviceWriter)
release(scope)

Input Arguments

collapse all

Audio input, specified as a column vector (mono) or matrix with two columns (stereo).

Data Types: single | double

Sample rate in Hz, specified as a positive scalar. The recommended sample rate for new recordings is 48 kHz.

Data Types: single | double

Microphone calibration factor, specified as a positive scalar. The default calibration factor corresponds to a full-scale 1 kHz sine wave with a sound pressure level of 100 dB (SPL). To compute the calibration factor specific to your system, use the calibrateMicrophone function.

Data Types: single | double

Sound pressure level (SPL) in dB, specified as a 1-by-28-by-C array or a 1-by-29-by-C array, depending on the Method:

  • If Method is set to 'ISO 532-1', specify SPLIn as a 1-by-28-by-C array, where 28 corresponds to one-third-octave bands between 25 Hz and 12.5 kHz, and C is the number of channels.

  • If Method is set to 'ISO 532-2', specify SPLIn as a 1-by-29-by-C array, where 29 corresponds to one-third-octave bands between 25 Hz and 16 kHz, and C is the number of channels.

For both methods, the SPL input should be measured with a flat frequency weighting (Z-weighting).

Data Types: single | double

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: acousticLoudness(audioIn,fs,'Method','ISO 532-2')

Loudness calculation method, specified as 'ISO 532-1' [1] or 'ISO 532-2' [2].

Note

Only in the ISO 532-1 method, output is reported for each channel independently, and stationary signals are processed after discarding up to the first 0.2 seconds of the signal at the output of the internal 1/3-octave filters.

Data Types: char | string

Input is time-varying, specified as true or false.

Dependencies

To set TimeVarying to true, you must set Method to 'ISO 532-1'.

Data Types: logical

Sound field of audio recording, specified as a character vector or scalar string. The possible values for SoundField depend on the Method:

  • 'ISO 532-1' –– 'free', 'diffuse'

  • 'ISO 532-2' –– 'free', 'diffuse', 'eardrum', 'earphones'

Data Types: char | string

Earphone response, specified as an M-by-2 matrix containing M frequency-amplitude pairs that describe the earphone's deviations from a flat response. The form is as specified in an ISO 11904-1:2002 earphone correction file. Specify the frequency in increasing order in Hz. Specify the amplitude deviation in decibels. Intermediate values are computed by linear interpolation. Values out of the given range are set to the nearest frequency-amplitude pair. The default value corresponds to a flat response.

Dependencies

To specify EarphoneResponse, you must set SoundField to 'earphones'.

Data Types: single | double

Reference pressure for dB calculation in pascals, specified as a positive scalar. The default value, 20 micropascals, is the common value for air.

Dependencies

PressureReference is only used for time-domain input signals.

Data Types: single | double

Percentiles at which to calculate percentile loudness, specified as a vector with values in the range [0, 100]. The defaults, 0 and 5, correspond to the Nmax and N5 percentiles, respectively [1].

Percentile loudness refers to the loudness that is reached or exceeded in X% of the measured time intervals, where X is the specified percentile.

Data Types: single | double

Output Arguments

collapse all

Loudness in sones, returned as a K-by-1 column vector or K-by-2 matrix of independent channels. If TimeVarying is set to false, K is equal to 1. If TimeVarying is set to true, then loudness is computed every 2 ms. If Method is set to 'ISO 532-2', then loudness is computed using a binaural model and always returned as a K-by-1 column vector.

Specific loudness, returned as a K-by-240-by-C array or a K-by-372-by-C array. The first dimension of specific loudness, K, matches the first dimension of loudness. The third dimension of specific loudness, C, matches the second dimension of loudness. The second dimension of specific loudness depends on the Method used to calculate loudness:

  • If Method is set to 'ISO 532-1', then specific loudness is reported in sones/Bark on a scale from 0.1 to 24, inclusive, in 0.1 increments.

  • If Method is set to 'ISO 532-2', then specific loudness is reported in sones/Cam on a scale from 1.8 to 38.9, inclusive, in 0.1 increments.

Percentile loudness in sones, returned as a p-by-1 vector or p-by-2 matrix. The number of rows, p, is equal to the number of Percentiles.

Percentile loudness refers to the loudness that is reached or exceeded in X% of the measured time intervals, where X is the specified percentile.

Dependencies

The percentiles output argument is valid only if TimeVarying is set to true. If TimeVarying is set to false, the perc output is empty.

Algorithms

collapse all

Loudness and loudness level are perceptual attributes of sound. Due to differences among people, measurements of loudness and loudness level should be considered statistical estimators. The ISO 532 series specifies procedures for estimating loudness and loudness level as perceived by persons with ontologically normal hearing under specific listening conditions.

ISO 532-1 and ISO 532-2 specify two different methods for calculating loudness, but leave it to the user to select the appropriate method for a given situation.

ISO 532-1:2017(E) – Zwicker Method

ISO 532-1:2017(E) describes methods for calculating acoustic loudness of stationary and time-varying signals.

Stationary Signals

This method is based on DIN 45631:1991. The algorithm differs from ISO 532:1975, method B, by specifying corrections for low frequencies.

Audio passes through the following stages: level calibration, conversion to 1/3-Octave SPL, low-frequency deemphasis and power summation, level correction and conversion to core loudness, conversion to Bark bins, frequency spreading correction, and finally integration over the specific loudness.

The diagram and the steps provide a high-level overview of the sequence of the method. For details, see [1].

  1. The time-domain signal level is adjusted according to the CalibrationFactor. The following steps of the algorithm assume a true known signal level.

  2. The signal is transformed to a 1/3 octave SPL representation using fractional octave band filtering. The filter bank consists of 28 filters between 25 Hz to 12.5 kHz. The output from this stage is in dB and normalized by the reference pressure.

  3. Low frequency 1/3 octave bands are de-emphasized according to a fixed weighting table. Some of the low-frequency bands are combined to form a total of 20 critical bands.

  4. The levels of the critical bands are corrected for filter bandwidth and the critical band level at the threshold of quiet, and then transformed to core loudness.

  5. Core loudness is mapped to Bark bins.

  6. Frequency spreading is computed using a table of level- and frequency-dependent slopes.

  7. Loudness is calculated as the integral of specific loudness, taking into account the frequency-spreading slopes.

Time-Varying Signals

This method is based on DIN 45631/A1:2010, and is designed to properly simulate the duration-dependent behavior of loudness perception for short impulses. The method for time-varying sounds is a generalization of the Zwicker approach to stationary signals. If the generalized version is applied to stationary sounds, it gives the same loudness values as the non-generalized form for stationary signals.

Audio passes through the following stages: level calibration, conversion to 1/3-Octave SPL, Filtering of SPL variation, low-frequency deemphasis and power summation, level correction and conversion to core loudness, modeling of nonlinear temporal decay, conversion to Bark bins, frequency spreading correction, temporal weighting, and finally integration over the specific loudness.

The diagram and the steps provide a high-level overview of the sequence of the method. For details, see [1].

  1. The time-domain signal level is adjusted according to the CalibrationFactor. The following steps of the algorithm assume a true known signal level.

  2. The signal is transformed to a 1/3 octave SPL representation using fractional octave band filtering. The filter bank consists of 28 filters between 25 Hz to 12.5 kHz. The output from this stage is in dB and normalized by the reference pressure.

  3. The SPL bands are smoothed along time according to band-dependent filters.

  4. Low frequency 1/3 octave bands are de-emphasized according to a fixed weighting table. Some of the low-frequency bands are combined to form a total of 20 critical bands.

  5. The levels of the critical bands are corrected for filter bandwidth and the critical band level at the threshold of quiet, and then transformed to core loudness.

  6. Nonlinear temporal decay is simulated using a diode-capacitor-resistor network. This models the steep perceptual drop after short signals when compared to long signals.

  7. Core loudness is mapped to Bark bins.

  8. Frequency spreading is computed using a table of level- and frequency-dependent slopes.

  9. Temporal weighting is applied to simulate the duration-dependence of loudness perception.

  10. Loudness is calculated as the integral of specific loudness, taking into account the frequency-spreading slopes.

ISO 532-2:2017(E) – Moore-Glasberg Method

ISO 532-2:2017(E) describes a binaural model for calculating acoustic loudness of stationary signals. The method in ISO 523-2 differs from those in ISO 532:1975: it improves the calculated loudness in the low frequency range and the binaural model allows for different sounds for each ear. ISO 532-2 provides a good match to the equal loudness level contours defined in ISO 226:2003, and the threshold of hearing defined in ISO 389-7:2005.

Audio passes through level calibration, and then the following stages are applied to each channel in the binaural model: a model of the tympanic membrane, a model of the middle ear, and then a model of the cochlea. Excitation is then mapped to specific loudness. Finally, the channels are combined using a binaural inhibition model, and the specific loudness is integrated to output loudness.

The diagram and the steps provide a high-level overview of the sequence of the method. For details, see [2].

  1. The time-domain signal level is adjusted according to the CalibrationFactor. The following steps of the algorithm assume a true known signal level.

  2. The signal is transformed to a spectral representation. The spectral representation is transformed according to fixed filters representing the transfer of sound through the tympanic membrane (eardrum). The spectrum is scaled according to the reference pressure.

  3. The signal is transformed using a model of the inner ear. Again, the transfer function is given by a fixed filter specified in the standard. The filter choice depends on the specified sound field.

  4. The signal is transformed from the sound spectrum to an excitation pattern at the basilar membrane. The transformation is accomplished using a series of rounded-exponential filters spread on the ERB scale.

  5. The excitation pattern is converted to specific loudness.

  6. The specific loudness is passed through a model of binary inhibition, where a signal at one ear inhibits the loudness evoked by a signal at the other ear. The output from this stage is the specific loudness in sones/ERB.

  7. The specific loudness is integrated over the ERB scale to give the loudness in sones.

References

[1] ISO 532-1:2017(E). "Acoustics – Methods for calculating loudness – Part 1: Zwicker method." International Organization for Standardization.

[2] ISO 532-2:2017(E). "Acoustics – Methods for calculating loudness – Part 2: Moore-Glasberg method. International Organization for Standardization.

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

Introduced in R2020a