Repeat for-loop from the beginning after recalculating values
Mostra commenti meno recenti
Ok, I am having a hard time figuring this out. I have a 6 X 6 matrix of values, in this case, pressure measurements in psi (P31.mat attached). I'm trying to apply rejection criteria for the outliers in this set of pressures. Each column of P31 is from a separate pressure transducer (6 measurements from each). I calculate the mean, standard deviation, and the number of elements (not including NaN) in each column using the first loop. In the second loop I calculate the deviation of each measurement (delta_31) from the mean (of the column).
In the third loop I use if-statements to determine a value for tau, according to the number of elements in each column. Then I multiply that tau value by the standard deviation of that column and compare each deviation value (delta_31). If delta_31(i,j) is greater than the calculated value for tau*stddev, then replace that delta value with a NaN, as well as the corresponding (i,j) value in P31. I then recalculate the stdev, mean, and number of elements. The problem is I'm trying to repeat the loop again to determine if there is another outlier in that column and I can't seem to get it. I thought that by stating i = 1, the loop should begin again, but it doesn't. Can anyone help? I apologize if this is confusing.
for k = 1:size(P31,2)
x_31(k) = mean(P31(:,k),'omitnan');
SDP_31(k) = std(P31(:,k),'omitnan');
n_31(k) = numel(P31(:,k)) - sum(isnan(P31(:,k)));
nu_31(k) = n_31(k) - 1;
end
for j = 3:size(P31,2)
for i = 1:size(P31,1)
delta_31(i,j) = abs(P31(i,j) - x_31(j));
end
for j = 3:size(P31,2)
for i = 1:size(P31,1)
% Conditional if-statements to determine the values for tau
if n_31(j) == 3
tau_31 = 1.150;
elseif n_31(j) == 4
tau_31 = 1.393;
elseif n_31(j) == 5
tau_31 = 1.572;
elseif n_31(j) ==6
tau_31 = 1.656;
elseif n_31(j) == 7
tau_31 = 1.711;
end
% Calculate tau*S to compare with the delta's
tauS_31(j) = tau_31*SDP_31(j);
% if-statement to replace outliers with NaN and recalculate
% everything. This actually works, but only for eliminating one
% outlier from the column.
if delta_31(i,j) > tauS_31(j)
delta_31(i,j) = NaN;
P31(i,j) = NaN;
x_31(j) = mean(P31(:,j),'omitnan');
SDP_31(j) = std(P31(:,j),'omitnan');
n_31(j) = numel(P31(:,j)) - sum(isnan(P31(:,j)));
for k = 1:size(P31,1)
delta_31(k,j) = abs(P31(k,j) - x_31(j));
end
i = 1; % this is supposed to start the i-loop again, but doesn't
else
i = i + 1;
end
end
end
3 Commenti
PATRICK WAYNE
il 6 Set 2018
dpb
il 6 Set 2018
"I thought that by stating i = 1, the loop should begin again..."
No, that is pretty-much a standard implementation in (most) all (modern) programming languages--the loop limit is stored before the loop begins and in Matlab any change to the loop index variable is overwritten by for for the next iteration from a local copy.
dpb
il 6 Set 2018
You could use one of those routines simply by saving the original index order to rearrange the order back to original when done.
The Thompson tau is one-at-a-time so it's sufficient on each iteration to only test the largest absolute difference; the ordering is some algorithm's way to look at the two end points each iteration rather than find max(abs(difference)) each pass.
Risposta accettata
Più risposte (2)
The first loop can be eliminated entirely--use the Matlab array syntax:
x_31=mean(P31,'omitnan');
SDP_31=std(P31,'omitnan');
n_31=sum(isfinite(P31));
nu_31=n_31-1;
I edited your code above to clean up the indenting...looks to me there's a missing end at some point; I didn't try to figure out where that might belong; the trivial assumption that may or may not be right is it's the last one for the big for...need to check on that.
Moving right along...
for j = 3:size(P31,2)
for i = 1:size(P31,1)
delta_31(i,j) = abs(P31(i,j) - x_31(j));
end
for j = 3:size(P31,2)
for i = 1:size(P31,1)
...
Oh. You've got nested loops on j...perhaps the missing end belongs here before the second j loop?
Like
for j = 3:size(P31,2)
for i = 1:size(P31,1)
delta_31(i,j) = abs(P31(i,j) - x_31(j));
end
end
for j = 3:size(P31,2)
for i = 1:size(P31,1)
...
maybe? If that's so then again use vector syntax--
delta_31(:,3:end)=abs(P31(:,3:end)-x_31);
Now you're down to the removal criteria, but what's the deal with only using the 3:end sensors instead of all six? I don't see what that's all about??? If you have outliers, seems like all sensors would be wanted to be checked.
I'm running out of time/energy here this evening but one last little detail to make code a little easier --
if n_31(j) == 3
tau_31 = 1.150;
elseif n_31(j) == 4
tau_31 = 1.393;
elseif n_31(j) == 5
tau_31 = 1.572;
elseif n_31(j) ==6
tau_31 = 1.656;
elseif n_31(j) == 7
tau_31 = 1.711;
end
% put this at very beginning of your script/function...
tau=[1.150, 1.393, 1.572, 1.656, 1.711]; % lookup table values
Then to use, simply write
tau_31(j)=tau(n_31(j)-2);
I'm sure most if not all of the remainder can also be vectorized with, probably, one loop on the outside to do the multiple comparisons.
Take a look at the way the preceding loops were reduced and give the rest a shot; I'll try to look in again in the morning.
1 Commento
dpb
il 6 Set 2018
For tau, go back to definition...
tcrit=@(n) tinv(1-0.05/2,n-2); % t distribution t(alpha/2,n-2)
fntau=@(n) tcrit(n)./sqrt(n)./sqrt(n-2+tcrit(n).*tcrit(n)).*(n-1);
Then when you need tau, just write
tau=fntau(n);
This has been vectorized so can pass a vector of N
Steven Lord
il 7 Set 2018
0 voti
I'm not familiar with that particular method. Is it close enough to one of the methods supported by the isoutlier function that you can use that function?
If it is not but is used often in a particular application area to detect outliers in data associated with that area, consider filing an enhancement request through Technical Support (using the Contact Us link in the upper-right corner of this page) describing that application area, giving more information and/or a reference for that algorithm, and asking for it to be added to isoutlier.
3 Commenti
This goes back into my "wayback" machine, Steven. I had to go dig through the "old black notebook of stuff" I collected over working career before there was such a thing as online way to go find things and we saved paper copies of papers! :)
Thompson, W.R., 1935, On a criterion for the rejection of observations and the distribution of the ratio of the deviation to the sample standard deviation, Ann. Math. Stat., 6:214-219.
Now I find the reference dissertation I found through Dr. Clark, utmost gentleman and mentor, who was staff Senior Mathematician at B&W NPGD back in the late 60s/early 70s is available via google search!!!
This is all quite dated by now, of course, and I've made no attempt to keep up with modern literature, but the basis for the Thompson test.
And, I forget now just what the conclusions were altho I do know and should have pointed out that repetitive application blindly is fraught with difficulty despite the cuteness of being able to write the algorithm. One issue Grubbs pointed out is that the probability of an arbitrary point being an outlier isn't the same as the probability that the largest observation (say) is an outlier with the respect to the remaining observations, the former of which is the Thompson test, not the latter.
ADDENDUM
Ain't science wunnerful!!! :)
Thompson(1935) full text pdf document of the original paper. I do not seem to have a record of the communications after.
The subsequent modification as used above and that is fairly widely spread on various university sites I'm not positive about the origin, my quick algebra on (4) p36 didn't quite duplicate the above although I won't claim I didn't blunder.
ADDENDUM 2
I didn't have a copy but I did find a scribbled note where I jotted down the Grubbs paper reference and managed to now find it online, too... Grubbs(1950).
While I think in general some additional tools might be incorporated in ML, it's not clear to me what is currently considered "state of the art" but I doubt Thompson is the last word on the subject (altho I wouldn't say it shouldn't be an available option).
I generally when trying to catch up on something haven't looked at in quite some time see what NIST currently says; I see they have the MAD (interesting historical use of that acronym exists :) ), as well as Grubbs (one outlier) and and extended Grubbs for multiple as well as another ESD for unknown number with "just" an expected upper limit.
PATRICK WAYNE
il 10 Set 2018
Modificato: PATRICK WAYNE
il 10 Set 2018
dpb
il 11 Set 2018
I never ran across that text; the similar subject-matter I used was Bevington, Data Reduction and Error Analysis for the Physical Sciences, but it didn't touch on the subject. I see there is a revised edition now, but it seems to be virtually identical in content with just some references to there now being such a thing as a personal computer and available software.
Categorie
Scopri di più su Uniform Distribution (Continuous) in Centro assistenza e File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!