Do the boxplot stats without boxplot

Alex on 13 Aug 2011
Dear all, I have a very large amount of data that I want to show as a box plot.
I have 7 sets of 125.000.000 numbers each (luckily I am running this on a system with a huge amount of RAM). Since boxplot with 7 inputs takes so long, I was wondering whether there is a way to calculate, for every set, all the statistics that boxplot calculates, and then feed them into a boxplot-type function that only does the plotting.
Is there something like that in MATLAB?
Thank you for your help.
Best Regards, Alex

Answers (4)

Oleg Komarov on 13 Aug 2011
Time boxplot:
tic
boxplot(...)
toc
against:
tic
prctile(...)
max(...)
min(...)
toc
and also against:
tic
sort()
toc
for one of your datasets. If one of the alternatives is significantly faster, then yes, you could speed up the box plotting, but I doubt it.
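The comparison above can be sketched as a runnable script. This is only an illustration under assumptions: the synthetic column `x` and its size `n` stand in for one real dataset, and the timing variable names are made up for the example. `prctile` and `boxplot` require the Statistics Toolbox.

```matlab
% Timing sketch on a synthetic column (n is an assumption; scale it up
% to match the real data once the script works).
n = 1e6;
x = rand(n,1);

tic
xs = sort(x);               % a single sort yields min, max and order statistics
t_sort = toc;

tic
q  = prctile(x,[25 50 75]); % quartiles and median
lo = min(x);
hi = max(x);
t_stats = toc;

tic
hfig = figure('Visible','off');   % keep the figure off screen while timing
boxplot(x,'symbol','');
t_box = toc;
close(hfig);

fprintf('sort: %.3f s, prctile/min/max: %.3f s, boxplot: %.3f s\n', ...
        t_sort, t_stats, t_box);
```

If `t_box` is far larger than the other two, computing the statistics separately is likely to pay off.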
EDIT
To avoid calling boxplot on the big dataset, I create a fake boxplot and adjust it with the stats calculated from the real dataset:
% Suppose truedata is your dataset A
truedata = rand(1e6 + 123423,7);
sz = size(truedata);
% Create fakedata and boxplot it
fakedata = rand(10,sz(2));
h = boxplot(fakedata,'labels',10:10:10*sz(2));
% Now sort your truedata and calculate min,max,25,50,75 percentile
truedata = sort(truedata);
s.mins = truedata(1,:);
s.maxs = truedata(end,:);
xi = bsxfun(@plus,sz(1).*[0.25; 0.5; 0.75], sz(1) * (0:sz(2)-1));
% unique() avoids duplicate sample points when an index is already an integer
x = unique([floor(xi(:)); ceil(xi(:))]);
s.ptiles = reshape(interp1(x,truedata(x),xi(:)),3,sz(2));
% Readapt the fake boxplot:
% 1.Adjust upper whisker
set(h(1,:),{'Ydata'}, num2cell([s.ptiles(3,:); s.maxs].',2));
set(h(3,:),{'Ydata'}, num2cell(repmat(s.maxs.',1,2),2));
% 2. Adjust lower whisker
set(h(2,:),{'Ydata'}, num2cell([s.mins; s.ptiles(1,:);].',2));
set(h(4,:),{'Ydata'}, num2cell(repmat(s.mins.',1,2),2));
% 3. Adjust body and median
set(h(5,:),{'Ydata'}, num2cell(s.ptiles([1 3 3 1 1],:).',2));
set(h(6,:),{'Ydata'}, num2cell(repmat(s.ptiles(2,:).',1,2),2));
% 4. Delete outlier marking
delete(h(end,:))
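The code above draws the whiskers out to the minimum and maximum of each set. By default, boxplot instead extends the whiskers to the most extreme observations within 1.5 times the interquartile range of the box. A minimal sketch of how those values could be obtained from already sorted data follows; the names `w`, `iqrng`, `loAdj` and `hiAdj` are hypothetical, and the random matrix is only a stand-in for the real data.

```matlab
% Sketch: whisker ends (adjacent values) per column of a sorted matrix.
truedata = sort(randn(1e5,3));        % synthetic stand-in for the real data
w     = 1.5;                          % boxplot's default 'whisker' value
q     = prctile(truedata,[25 50 75]); % 3-by-ncols (Statistics Toolbox)
iqrng = q(3,:) - q(1,:);              % interquartile range per column
ncols = size(truedata,2);
loAdj = zeros(1,ncols);
hiAdj = zeros(1,ncols);
for k = 1:ncols
    col = truedata(:,k);
    % Most extreme observations that still lie inside the fences
    loAdj(k) = col(find(col >= q(1,k) - w*iqrng(k), 1, 'first'));
    hiAdj(k) = col(find(col <= q(3,k) + w*iqrng(k), 1, 'last'));
end
```

Using `hiAdj` and `loAdj` in place of `s.maxs` and `s.mins` in the whisker adjustments should give whiskers closer to the boxplot default, with the outlier markers still omitted.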
  7 Comments
Alex on 16 Aug 2011
Just to make it more "formal":
tic;
A=sort(A);
boxplot(A,'symbol','','labels',[1.14 3.43 5.72 8.01 10.3 12.6 14.9]);
set(gca,'YScale','log');
toc;
Elapsed time is 1123.973379 seconds.
Then I comment out the sort:
tic;
%A=sort(A);
boxplot(A,'symbol','','labels',[1.14 3.43 5.72 8.01 10.3 12.6 14.9]);
set(gca,'YScale','log');
toc;
Elapsed time is 1502.582405 seconds.
Why does it take more time in the second case?
Oleg Komarov on 16 Aug 2011
Use the profiler to check the differences.
Have you tried my other solution?
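A minimal sketch of such a profiling run, assuming a small synthetic matrix in place of the real data (the variable names are illustrative only):

```matlab
% Sketch: profile a boxplot call and list the most expensive functions.
A = rand(1e5,7);                 % stand-in for the real data
hfig = figure('Visible','off');
profile on
boxplot(sort(A),'symbol','');
profile off
p = profile('info');             % struct with a FunctionTable field
close(hfig);

% Rank functions by total time and print the five most expensive ones
[~,idx] = sort([p.FunctionTable.TotalTime],'descend');
top = p.FunctionTable(idx(1:min(5,numel(idx))));
for k = 1:numel(top)
    fprintf('%-40s %8.3f s\n', top(k).FunctionName, top(k).TotalTime);
end
% profile viewer   % opens the interactive report instead
```

Running it once with and once without the `sort` call should show which internal function accounts for the difference.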


Alex on 13 Aug 2011
Hmm... what I tried is the following: I ran boxplot on the first set only (1*125.000.000 sets) and MATLAB replied after 15 minutes. Then I tried boxplot with all 7 sets (7*125.000.000 sets), which unfortunately has not returned after 3 hours.
So far MATLAB has consumed 110 GB of memory and keeps going.
If there were a way to run boxplot 7 times (once per set) and then put the results together, I think I would be finished by now. Is there any way to do that?
B.R. Alex
  1 Comment
Oleg Komarov on 13 Aug 2011
You said 1*125.000.000 sets, what do you mean?
Sorting 1e8 elements takes 11 seconds on my laptop; after that, most of the calculations can be done almost instantly. I really don't see how boxplot is taking so much time.
Please post the code you're using.


Alex on 14 Aug 2011
Thank you for your answer.
size(A)
ans =
131072000 7
So I have a 131072000-by-7 matrix and I want a box plot. This call:
boxplot(A','symbol','');
crashes after 6 hours, having consumed 300 GB (three hundred gigabytes) of RAM!
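One thing worth checking in the call above: boxplot draws one box per column of a matrix input, so passing the transpose `A'` (7 rows by 131072000 columns) asks for 131072000 boxes rather than 7. That may explain the memory use, although the thread does not confirm it; the untransposed call `boxplot(A,...)` posted elsewhere in the thread does finish. A small sketch of the difference, using a synthetic 50-by-7 matrix as a stand-in:

```matlab
% Sketch: boxplot draws one box per COLUMN of a matrix input.
A = rand(50,7);                  % small stand-in: 50 observations, 7 sets
hfig = figure('Visible','off');
h1 = boxplot(A,'symbol','');     % 7 columns  -> 7 boxes
n1 = size(h1,2);                 % the handle matrix has one column per box
clf(hfig);
h2 = boxplot(A','symbol','');    % 50 columns -> 50 boxes
n2 = size(h2,2);
close(hfig);
fprintf('boxplot(A): %d boxes, boxplot(A''): %d boxes\n', n1, n2);
```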
  3 Comments
Alex on 15 Aug 2011
(I hope I replied correctly this time!)
tic; test=sort(A); toc
Elapsed time is 35.341047 seconds.
As you can see, that was extremely fast. There is no more swap space, and the system administrator is not here for me to ask for more!
One solution would be to make 7 separate box plots and then somehow combine them into one. As you can read below, I explained yesterday what I tried, but all the box plots end up drawn on top of one another.
B.R.
Alex
Oleg Komarov on 15 Aug 2011
Give the EDIT in my answer a try.


Alex on 14 Aug 2011
I then tried the following:
>> boxplot(A(:,1)','symbol','','labels',[10]); hold on; boxplot(A(:,2)','symbol','','labels',[20]); boxplot(A(:,3)','symbol','','labels',[30]); boxplot(A(:,4)','symbol','','labels',[40]); boxplot(A(:,5)','symbol','','labels',[50]); boxplot(A(:,6)','symbol','','labels',[60]); boxplot(A(:,7)','symbol','','labels',[70]); hold off;
which runs, but unfortunately all 7 boxplots are plotted on top of one another. As you might have guessed already, these 7 sets are for the cases 10, 20, 30, 40, 50, 60 and 70.
Thank you for your answers.
Best Regards, Alex
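The boxes overlap because each call places its single box at the default x position 1. A hedged sketch of one way around this is to give every call its own `'positions'` value. The synthetic matrix stands in for the real data, and the axis clean-up after the loop is an assumption about how recent releases handle ticks; older releases draw the labels differently and may not need it.

```matlab
% Sketch: one box per call, placed side by side with 'positions'.
A = rand(1e4,7);                 % stand-in for the real data
labels = 10:10:70;               % the seven cases
hfig = figure('Visible','off');
hold on
for k = 1:size(A,2)
    boxplot(A(:,k),'positions',k,'symbol','');
end
hold off
% Each call may reset the x-axis, so restore limits and ticks afterwards
set(gca,'XLim',[0.5 size(A,2)+0.5], ...
        'XTick',1:size(A,2), ...
        'XTickLabel',cellstr(num2str(labels(:))));
nboxes = numel(findobj(gca,'Tag','Box'));   % count the boxes actually drawn
```

Processing one column per call also keeps the peak memory use closer to that of a single set.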
  1 Comment
Oleg Komarov on 14 Aug 2011
Please see the edit on my answer and stop creating additional answers; use comments instead.
