How do I use histcounts with overlapping bins?

First off, the only relevant post I found is this one, although its comments suggested that overlapping bins may not work with histcounts.
My question is this: is there a way to create bin edges by giving the number of bins (which histcounts does) and the percentage overlap between bins, so as to generate a set of overlapping bins that can be used with accumarray later on?
More specifically, I have vectors x, y and z covering a spatial volume. I need to "discretize" this volume and bin the vector V (which is when I found the answer on 3D binning). I am looking for a way to extend this by adding overlapping bins.
Is there a way to achieve this? Any help is appreciated. Thanks!

4 Comments

Only by the method Walter outlines (or variations thereof) -- all of the MATLAB histogram routines require a monotonically increasing vector of bin edges. They simply error otherwise.
What is the use case for having the bins overlap? Why do you want that? How will you interpret it? I deal with 3-D imagery all the time and I've never needed that.
The main use in this case is to generate a more "filled" data set. In principle this could be achieved by making the bins smaller, but if the data to be binned is somewhat sparse, then collecting those points into bigger overlapping bins gives a well-averaged effect. This is my understanding of it, which may not be the best reason out there.
Do you need to visualize the overlapping bins (histogram) or just compute with them (histcounts)?


Accepted Answer

Walter Roberson
Walter Roberson on 28 Mar 2019
Discretize three times per dimension: once with the bins exactly where you want them, once with the bins shifted [overlap] earlier, and once with the bins shifted [overlap] later. Do the 27 different 3D binnings (each possible combination of early, middle, late per dimension), taking lists of indices. Then take the union of all of the indices in corresponding bins.
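In one dimension the scheme looks like this. This is only a sketch; the edge values, the shift `ovl`, and the sample data are made up, not taken from the question:

```matlab
% 1-D sketch of the three-shift scheme: three shifted edge sets, then a union per bin.
edges  = 0:2:10;            % bins exactly where you want them
ovl    = 0.5;               % the [overlap] shift (illustrative value)
x      = rand(1000,1)*10;   % some sample data
shifts = [-ovl, 0, ovl];
idx = cell(3,1);
for s = 1:3
    [~,~,idx{s}] = histcounts(x, edges + shifts(s)); % bin index of every point
end
nb = numel(edges) - 1;      % number of bins = number of edges - 1
members = cell(nb,1);
for b = 1:nb
    % union of the point lists that land in bin b for any of the three shifts
    members{b} = union(union(find(idx{1}==b), find(idx{2}==b)), find(idx{3}==b));
end
```

In 3D the same thing is done per dimension, and all 3^3 = 27 shift combinations are unioned per bin.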

6 Comments

Thanks Walter, specifying the overlapping bin edges explicitly and binning them for each case seems to have worked.
This is a sample of what I've ended up using -
frac = 0.5;       % shift between successive bin sets; with a bin width of 2 this gives 75% overlap
init_shift = 3.5; % extends the last edge so no value falls outside the bins (histcounts would return index 0)
xbins = min(x):2:max(x)+init_shift;
ybins = min(y):2:max(y)+init_shift;
zbins = min(z):2:max(z)+init_shift;
for f = 1:4
    [~,~,cx] = histcounts(x, xbins); % bin index of every point
    [~,~,cy] = histcounts(y, ybins);
    [~,~,cz] = histcounts(z, zbins);
    X{f} = accumarray(cx(:), x(:), [], @nanmean); % mean coordinate per bin
    Y{f} = accumarray(cy(:), y(:), [], @nanmean);
    Z{f} = accumarray(cz(:), z(:), [], @nanmean);
    Um{f} = accumarray([cx(:), cy(:), cz(:)], U(:), [], @nanmean); % 3-D binned means
    Vm{f} = accumarray([cx(:), cy(:), cz(:)], V(:), [], @nanmean);
    Wm{f} = accumarray([cx(:), cy(:), cz(:)], W(:), [], @nanmean);
    xbins = xbins - frac; ybins = ybins - frac; zbins = zbins - frac; % shift all bins left
end
I end up with 4 cells of 3D data (accumulated over 4 sets of bins). Not all 4 of these cells have the same size, however.
I do have another question regarding how to collate this data in the same sequence as the bins. Should I post a separate query?
Thanks a lot !
This does not give possibilities such as low x, regular y, high z.
At the moment I do not understand why you are using nanmean.
I do not understand the point of collating the data in the same sequence of the bins when you are using nanmean and so destroying the individual identities.
Prodip Das
Prodip Das on 29 Mar 2019
Edited: Prodip Das on 29 Mar 2019
I agree this doesn't run through all the possible combinations, but for the moment it seems enough for my purposes.
I'm using nanmean mainly to get a single point value from all the data points that fall into each bin. Note: x, y, z, U, V, W are very large vectors.
So now I end up with 4 sets of linearly increasing x, y, z values, albeit each set is shifted a little to the left with respect to the previous one, and the U, V, W values for each set are 3D matrices. This is the data I need to collate into a proper linear sequence of x, y, z values.
I hope this explanation makes some sense (it's kind of hard to put it down properly).
Thanks again,
No, you lose all order information when you take the mean. It does not make sense to use the original order.
shifts = [-3.5 0 3.5];
whichpoints = cell(3,3,3);
cx = cell(3,1);
cy = cell(3,1);
cz = cell(3,1);
for idx = 1:3
[~,~,cx{idx}] = histcounts(x, xbins+shifts(idx));
[~,~,cy{idx}] = histcounts(y, ybins+shifts(idx));
[~,~,cz{idx}] = histcounts(z, zbins+shifts(idx));
end
npoint = length(x);
nbx = length(xbins) - 1; % number of bins is one less than the number of edges
nby = length(ybins) - 1;
nbz = length(zbins) - 1;
pidx = (1:npoint).';
bs = [nbx, nby, nbz];
for xsi = 1:3
for ysi = 1:3
for zsi = 1:3
whichpoints{xsi,ysi,zsi} = accumarray([cx{xsi}, cy{ysi}, cz{zsi}], pidx, bs, @(idx) {idx} );
end
end
end
allpoints = cell(nbx,nby,nbz);
for K = 1 : numel(whichpoints)
    % 'UniformOutput',false is needed because each union is a vector, not a scalar
    allpoints = cellfun(@union, allpoints, whichpoints{K}, 'UniformOutput', false);
end
Now allpoints should be a cell array over x, y, z, with each location holding the linear indices of all of the points that fall into that bin, taking overlaps into account. Each cell will have its indices in sorted order, and any one index will appear only once in a given cell. You can use the indices for whatever purpose you want, such as
cellfun(@(idx) nanmean(x(idx)), allpoints)
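The same pattern can be reused for the field variables from the question (this assumes U, V, and W are vectors of the same length as x, as described earlier in the thread):

```matlab
Um = cellfun(@(idx) nanmean(U(idx)), allpoints); % empty bins come out as NaN
Vm = cellfun(@(idx) nanmean(V(idx)), allpoints);
Wm = cellfun(@(idx) nanmean(W(idx)), allpoints);
```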
Thanks Walter.
This is going to take me a while to completely get my head around, as it's not immediately clear to me.
I'll post the matrix collating bit as a separate question.
I think I might have the union loop wrong, possibly.


More Answers (1)

Matt J
Matt J on 28 Mar 2019
Edited: Matt J on 28 Mar 2019
If you're willing to make some approximations in the interest of speed, this method will do the whole 3D accumarray operation. It uses two File Exchange contributions that you must download, namely KronProd and ndSparse. It first histograms the x, y, z data normally into super-thin, non-overlapping bins, then consolidates those into overlapping bins by separable convolution.
%% simulated data
vmin=0; vmax=10; %integer min and max assumed here
x=rand(1,10000)*(vmax-vmin)+vmin;
y=rand(1,10000)*(vmax-vmin)+vmin;
z=rand(1,10000)*(vmax-vmin)+vmin;
%% binning parameter selections
binShift=0.5; binWidth=1;
%% Set-up computations
lowerEdges=vmin:binShift:vmax-binWidth;
upperEdges=lowerEdges+binWidth;
Nbins=numel(lowerEdges);
delta=vmax-vmin;
N=1000*delta;
L=(lowerEdges.')*N/delta+1;
U=(upperEdges.')*N/delta+1;
T=cumsum(sparse(1:Nbins,L,1,Nbins,N+1)-sparse(1:Nbins,U,1,Nbins,N+1),2);
C=KronProd({T(:,1:N)},[1,1,1]); %separable convolution operator
%% Do computation
tic;
e=linspace(vmin,vmax,N);
I=discretize(x,e).';
J=discretize(y,e).';
K=discretize(z,e).';
H=ndSparse.build([I,J,K],1,[N,N,N]);
A=full(C*H); %The "accumarray" result
toc; %Elapsed time is 1.182683 seconds.
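As a sanity check (my addition, not part of the answer above), a single bin of the result can be compared against a direct count over the same edges; the two should agree up to the thin-bin discretization error:

```matlab
k = 3; % any bin index
direct = nnz(x >= lowerEdges(k) & x < upperEdges(k) & ...
             y >= lowerEdges(k) & y < upperEdges(k) & ...
             z >= lowerEdges(k) & z < upperEdges(k));
fprintf('A(%d,%d,%d) = %g, direct count = %d\n', k, k, k, A(k,k,k), direct)
```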

1 Comment

Thanks for the answer, Matt! I wasn't certain where the approximations lay, and I'm not very well versed in separable convolution. I needed a quicker fix for now, but will hopefully come back to this later to understand it better.
