MATLAB Answers

Blue
0

How to replace values above a given percentile by nans

Asked by Blue
on 14 Aug 2019
Latest activity Commented on by Blue
on 16 Aug 2019
Hi,
I am trying to replace values above the 99th percentile in a table by nans by using Star Strider excellent little function (https://www.mathworks.com/matlabcentral/answers/127944-how-to-pick-the-j-th-percentile-of-a-vector). An example of my code looks like so:
% Table with nan
t = array2table(vertcat(horzcat([1:1000]', [1001:2000]'), NaN(10, 2)));
% Replace values above 99th percentile by nan
pct = @(v,p) interp1(linspace(0.5/length(v), 1-0.5/length(v), length(v))', sort(v), p*0.01, 'spline');
t.Var1(t.Var1 > pct(t.Var1, 99)) = nan;
t.Var2(t.Var2 > pct(t.Var2, 99)) = nan;
The problem of course is that there are nan values scattered across multiple variables in the table and I therefore receive the following error message: Warning: Columns of data containing NaN values have been ignored during interpolation.
Does anyone has an idea as to how I could replace values above the 99th percentile in a table by nans in a table where there are already multiple nans ? Please note that I do not have the Statistic toolbox.

  0 Comments

Sign in to comment.

2 Answers

Answer by Adam Danz
on 15 Aug 2019
Edited by Adam Danz
on 15 Aug 2019
 Accepted Answer

Here is an alternative function that ignores NaN values. It's not a nice one-liner like Star Striders but it does produce the same output as the prctile function in the stats toolbox.
function pctl = percentile(v,p)
% v is a vector of data; example v = [8 9 nan 13 15 11 nan 3 5 7];
% p is a percentile; example p = 75;
% pctl is the exact p_th percentile of v.
vsort = reshape(sort(v(~isnan(v))),1,[]); %row vector
pd = p/100 * numel(vsort);
idx = floor(pd + 0.5) + [0,1];
md = pd - idx(1);
idx(idx<1) = 1;
idx(idx>numel(vsort)) = numel(vsort);
pctl = sum(vsort(idx) .* (0.5+[-md,md]));
end % <-- only needed if this function is within a script
Comparison
v = [8 9 nan 13 15 11 nan 3 5 7];
p = 82;
percentile(v,p) % = 13.120000 this function
prctile(v,p) % = 13.120000 stat's toolbox
pct(v(~isnan(v)),p) % = 13.115995 the one-liner approximation
To apply it to your data,
t = array2table(vertcat(horzcat((1:1000)', (1001:2000)'), NaN(10, 2)));
t.Var1(t.Var1 > percentile(t.Var1, 99)) = nan;
t.Var2(t.Var2 > percentile(t.Var2, 99)) = nan;

  1 Comment

Thank you Adam

Sign in to comment.


Answer by per isakson
on 15 Aug 2019
Edited by per isakson
on 15 Aug 2019

Ignore the NaNs explicitely when calculating the percentile. Try
%%
% Table with nan
t = array2table(vertcat(horzcat([1:1000]', [1001:2000]'), NaN(10, 2)));
% Replace values above 99th percentile by nan
t.Var1(t.Var1 > pct(t.Var1, 99)) = nan;
t.Var2(t.Var2 > pct(t.Var2, 99)) = nan;
function z = pct(v,p)
v( isnan(v) ) = [];
z = interp1(linspace(0.5/length(v), 1-0.5/length(v), length(v))' ...
, sort(v), p*0.01, 'spline');
end

  1 Comment

If you're going to use the one-liner appoximation, you can keep the function handle and explicitly ignore the NaNs from within the call to the function.
t = array2table(vertcat(horzcat((1:1000)', (1001:2000)'), NaN(10, 2)));
pct = @(v,p) interp1(linspace(0.5/length(v), 1-0.5/length(v), length(v))', sort(v), p*0.01, 'spline');
t.Var1(t.Var1 > pct(t.Var1(~isnan(t.Var1)), 99)) = nan;
t.Var2(t.Var2 > pct(t.Var2(~isnan(t.Var2)), 99)) = nan;
% {_____here____}

Sign in to comment.