How do I use histcounts with overlapping bins?

First off, the only relevant post I found is this one, although its comments suggested that overlapping bins may not work with histcounts.
My question is this: is there a way to create bin edges by giving the number of bins (which histcounts does) and the percentage overlap between bins, so as to generate a set of overlapping bins that can be used with accumarray later on?
More specifically, I have vectors x, y and z covering a spatial volume. I need to "discretize" this volume and bin the vector V (which is when I found the answer on 3D binning). I am looking for a way to extend this by adding overlapping bins.
Is there a way to achieve this? Any help is appreciated. Thanks!

4 Comments

Only by the method Walter outlines (or variations thereof) -- all of the MATLAB histogram routines require a monotonically increasing vector of bin edges. They simply error otherwise.
What is the use case for having the bins overlap? Why do you want that? How will you interpret it? I deal with 3-D imagery all the time and I've never needed that.
The main use in this case is to generate a more "filled" data set. In principle this could be achieved by making the bins smaller, but if the data to be binned is somewhat sparse, then collecting those points into bigger overlapping bins gives a well-averaged effect. This is my understanding of it, which may not be the best reason out there.
Do you need to visualize the overlapping bins (histogram) or just compute with them (histcounts)?


Accepted Answer

Walter Roberson
Walter Roberson on 28 Mar 2019
Discretize three times per dimension: once with the bins exactly where you want them, once with the bins shifted [overlap] earlier, and once with the bins shifted [overlap] later. Do the 27 different 3D binnings (each possible combination of early, middle, late per dimension), taking lists of indices. Then take the union of all of the indices in corresponding bins.
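In one dimension the scheme looks like this. This is only a sketch; the edge values, the shift `ovl`, and the sample data are made up, not taken from the question:

```matlab
% 1-D sketch of the three-shift scheme: three shifted edge sets, then a union per bin.
edges  = 0:2:10;            % bins exactly where you want them
ovl    = 0.5;               % the [overlap] shift (illustrative value)
x      = rand(1000,1)*10;   % some sample data
shifts = [-ovl, 0, ovl];
idx = cell(3,1);
for s = 1:3
    [~,~,idx{s}] = histcounts(x, edges + shifts(s)); % bin index of every point
end
nb = numel(edges) - 1;      % number of bins = number of edges - 1
members = cell(nb,1);
for b = 1:nb
    % union of the point lists that land in bin b for any of the three shifts
    members{b} = union(union(find(idx{1}==b), find(idx{2}==b)), find(idx{3}==b));
end
```

In 3D the same thing is done per dimension, and all 3^3 = 27 shift combinations are unioned per bin.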

6 Comments

Thanks Walter, specifying the overlapping bin edges explicitly and binning them for each case seems to have worked.
This is a sample of what I've ended up using -
frac = 0.5;       % shift between successive bin sets; with a bin width of 2 this gives 75% overlap
init_shift = 3.5; % extends the last edge so no value falls outside the bins (histcounts would return index 0)
xbins = min(x):2:max(x)+init_shift;
ybins = min(y):2:max(y)+init_shift;
zbins = min(z):2:max(z)+init_shift;
for f = 1:4
    [~,~,cx] = histcounts(x, xbins); % bin index of every point
    [~,~,cy] = histcounts(y, ybins);
    [~,~,cz] = histcounts(z, zbins);
    X{f} = accumarray(cx(:), x(:), [], @nanmean); % mean coordinate per bin
    Y{f} = accumarray(cy(:), y(:), [], @nanmean);
    Z{f} = accumarray(cz(:), z(:), [], @nanmean);
    Um{f} = accumarray([cx(:), cy(:), cz(:)], U(:), [], @nanmean); % 3-D binned means
    Vm{f} = accumarray([cx(:), cy(:), cz(:)], V(:), [], @nanmean);
    Wm{f} = accumarray([cx(:), cy(:), cz(:)], W(:), [], @nanmean);
    xbins = xbins - frac; ybins = ybins - frac; zbins = zbins - frac; % shift all bins left
end
I end up with 4 cells of 3D data (accumulated over 4 sets of bins). Not all 4 of these cells have the same size, however.
I do have another question regarding how to collate this data in the same sequence as the bins. Should I post a separate query?
Thanks a lot !
This does not give possibilities such as low x, regular y, high z.
At the moment I do not understand why you are using nanmean.
I do not understand the point of collating the data in the same sequence of the bins when you are using nanmean and so destroying the individual identities.
Prodip Das
Prodip Das on 29 Mar 2019
Edited: Prodip Das on 29 Mar 2019
I agree this doesn't run through all the possible combinations, but for the moment it seems enough for my purposes.
I'm using nanmean mainly to get a single point value from all the data points that fall into each bin. Note: x, y, z, U, V, W are very large vectors.
So now I end up with 4 sets of linearly increasing x, y, z values, albeit each set is shifted a little to the left with respect to the previous one, and the U, V, W values for each set are 3D matrices. This is the data I need to collate into a proper linear sequence of x, y, z values.
I hope this explanation makes some sense (it's kind of hard to put it down properly).
Thanks again,
No, you lose all order information when you take the mean. It does not make sense to use the original order.
shifts = [-3.5 0 3.5];
whichpoints = cell(3,3,3);
cx = cell(3,1);
cy = cell(3,1);
cz = cell(3,1);
for idx = 1:3
[~,~,cx{idx}] = histcounts(x, xbins+shifts(idx));
[~,~,cy{idx}] = histcounts(y, ybins+shifts(idx));
[~,~,cz{idx}] = histcounts(z, zbins+shifts(idx));
end
npoint = length(x);
nbx = length(xbins) - 1; % number of bins is one less than the number of edges
nby = length(ybins) - 1;
nbz = length(zbins) - 1;
pidx = (1:npoint).';
bs = [nbx, nby, nbz];
for xsi = 1:3
for ysi = 1:3
for zsi = 1:3
whichpoints{xsi,ysi,zsi} = accumarray([cx{xsi}, cy{ysi}, cz{zsi}], pidx, bs, @(idx) {idx} );
end
end
end
allpoints = cell(nbx,nby,nbz);
for K = 1 : numel(whichpoints)
    % 'UniformOutput',false is needed because each union is a vector, not a scalar
    allpoints = cellfun(@union, allpoints, whichpoints{K}, 'UniformOutput', false);
end
Now allpoints should be a cell array over x, y, z, with each location holding the linear indices of all of the points that fall into that bin, taking overlaps into account. Each cell will have its indices in sorted order, and any one index will appear only once in a given cell. You can use the indices for whatever purpose you want, such as
cellfun(@(idx) nanmean(x(idx)), allpoints)
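The same pattern can be reused for the field variables from the question (this assumes U, V, and W are vectors of the same length as x, as described earlier in the thread):

```matlab
Um = cellfun(@(idx) nanmean(U(idx)), allpoints); % empty bins come out as NaN
Vm = cellfun(@(idx) nanmean(V(idx)), allpoints);
Wm = cellfun(@(idx) nanmean(W(idx)), allpoints);
```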
Thanks Walter.
This is going to take me a while to completely get my head around, as it's not immediately clear to me.
I'll post the matrix collating bit as a separate question.
I think I might have the union loop wrong, possibly.


More Answers (1)

Matt J
Matt J on 28 Mar 2019
Edited: Matt J on 28 Mar 2019
If you're willing to make some approximations in the interest of speed, this method will do the whole 3D accumarray operation. It uses two File Exchange contributions that you must download, namely KronProd and ndSparse. It first histograms the x, y, z data normally into super-thin, non-overlapping bins, then consolidates those into overlapping bins by separable convolution.
%% simulated data
vmin=0; vmax=10; %integer min and max assumed here
x=rand(1,10000)*(vmax-vmin)+vmin;
y=rand(1,10000)*(vmax-vmin)+vmin;
z=rand(1,10000)*(vmax-vmin)+vmin;
%% binning parameter selections
binShift=0.5; binWidth=1;
%% Set-up computations
lowerEdges=vmin:binShift:vmax-binWidth;
upperEdges=lowerEdges+binWidth;
Nbins=numel(lowerEdges);
delta=vmax-vmin;
N=1000*delta;
L=(lowerEdges.')*N/delta+1;
U=(upperEdges.')*N/delta+1;
T=cumsum(sparse(1:Nbins,L,1,Nbins,N+1)-sparse(1:Nbins,U,1,Nbins,N+1),2);
C=KronProd({T(:,1:N)},[1,1,1]); %separable convolution operator
%% Do computation
tic;
e=linspace(vmin,vmax,N);
I=discretize(x,e).';
J=discretize(y,e).';
K=discretize(z,e).';
H=ndSparse.build([I,J,K],1,[N,N,N]);
A=full(C*H); %The "accumarray" result
toc; %Elapsed time is 1.182683 seconds.
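As a sanity check (my addition, not part of the answer above), a single bin of the result can be compared against a direct count over the same edges; the two should agree up to the thin-bin discretization error:

```matlab
k = 3; % any bin index
direct = nnz(x >= lowerEdges(k) & x < upperEdges(k) & ...
             y >= lowerEdges(k) & y < upperEdges(k) & ...
             z >= lowerEdges(k) & z < upperEdges(k));
fprintf('A(%d,%d,%d) = %g, direct count = %d\n', k, k, k, A(k,k,k), direct)
```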

1 Comment

Thanks for the answer, Matt! I wasn't certain where the approximations lay, and I'm not very well versed in separable convolution. I needed a quicker fix for now, but will hopefully come back to this later to understand it better.
