AUC between different size curves

3 visualizzazioni (ultimi 30 giorni)
Suzuki
Suzuki il 7 Set 2021
Modificato: Suzuki il 11 Set 2021
How can I calculate the area between 2 curves of unequal data size, as in the photo?

Risposta accettata

Julius Muschaweck
Julius Muschaweck il 7 Set 2021
I would use interp1 to interpolate both functions so they have the same high resolution x support. You can easily replace the 0.01 step by a 1e-6 step size in the code below.
Look for interp1 options, to control the interpolation algorithm (you might prefer 'linear' to avoid overshoots), and to control extrapolation (in your probability distribution case, you might want to set the value for extrapolation to 1 if one of your distributions has shorter x range).
Then use trapz to compute the signed integral of the difference or unsigned area between the curves. See code.
x1 = 0:0.2:1.4;
y1 = x1.^2;
x2 = 0:0.14:1.4;
y2 = sqrt(x2);
figure();
hold on;
plot(x1,y1,'-x');
plot(x2,y2,'-x');
xq = 0:0.01:1.4;
y1q = interp1(x1,y1,xq);
y2q = interp1(x2,y2,xq);
my_signed_integral = trapz(xq,y2q-y1q)
my_signed_integral = 0.1701
my_unsigned_area = trapz(xq,abs(y2q-y1q))
my_unsigned_area = 0.4631
test_signed = trapz(xq,sqrt(xq) - xq.^2)
test_signed = 0.1894
test_unsigned= trapz(xq,abs(sqrt(xq) - xq.^2))
test_unsigned = 0.4768
  11 Commenti
Julius Muschaweck
Julius Muschaweck il 11 Set 2021
The trouble in your example is that you extrapolate y1i linearly to the right, which makes the extrapolated values grow to almost 4. This is not what you want. You have cumulative distributions, which are inherently zero on the left, all the way to minus infinity, and one to the right, all the way to plus infinity. So you should extrapolate to the right with constant values of 1, and to the left with constant values of zero.
You run into this problem no matter if you use my initial idea or Walter's actually superior suggestion of using union().
Like this:
clear;
load("bins.mat")
load("cdist.mat")
load("Zbins.mat")
load("Zcdist.mat")
%%
% I would try to control the step size in your xq array
dx = 0.01; % some small step size to decrease discretization error
xmin = min([Zbins';bins']);
xmax = max([Zbins';bins']);
nsteps = ceil((xmax-xmin) / dx) + 1;
xq = linspace(xmin, xmax, nsteps);
% but now that we're at it, let's try also what Walter Roberson suggested:
xq = union(Zbins, bins, "sorted");
% xq = linspace(min([Zbins';bins']), max([Zbins';bins']))'; % New %x Vector For Interpolation
% i check max and min in both and get the max and min among the 2 sets, so I thought this covers the
% whole data range
% extrapolation does a bad job for you here. Look at the values in y1i: They go up to almost 3.9 !
%y1i = interp1(Zbins', Zcdist', xq, 'linear','extrap'); % Interpolate To %xqT
%y2i = interp1(bins', cdist', xq, 'linear','extrap'); % Interpolate To %xq%
%my_unsigned_area = trapz(xq,abs(y2i-y1i))
% without 'extrap', interp1 "extrapolates" to NaN (not a number)
y1i = interp1(Zbins', Zcdist', xq, 'linear'); % Interpolate To %xqT
y2i = interp1(bins', cdist', xq, 'linear'); % Interpolate To %xq%
%now set "outliers" on the left to 0, and on the right to 1
% logical indexing is the non-for-loop way to do this:
idx1_left = xq < Zbins(1); % assuming Zbins and bins are strictly ascending
y1i(idx1_left) = 0;
idx1_right = xq > Zbins(end);
y1i(idx1_right) = 1;
idx2_left = xq < bins(1); % assuming Zbins and bins are strictly ascending
y2i(idx2_left) = 0;
idx2_right = xq > bins(end);
y2i(idx2_right) = 1;
% check if we caught all outliers
assert(~any(isnan(y1i)));
assert(~any(isnan(y2i)));
my_unsigned_area = trapz(xq,abs(y2i-y1i))
my_unsigned_area = 17.8458
Suzuki
Suzuki il 11 Set 2021
Modificato: Suzuki il 11 Set 2021
thank you very much for the kind and detailed explanation.
@Julius Muschaweck I really appreciate your help, I would have not done it without your generous help.

Accedi per commentare.

Più risposte (0)

Tag

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by