sum along data with different steps

Dear all,
I have three data sets; this is just an example:
data1=[1 5 3 4 2 0 1 2 8 2 10 2 1];
data2=[1 10 ....................]
data3=[4 3 .....................]
I would like to sum in steps of 2, 3, 4, etc. along the data1 file. That means data1 now becomes:
data1_2 = [6 7 2 3 10 12] % sum of 2 numbers; the last number 1 is not included
data1_3 = [9 6 11 14]     % sum of 3 numbers; the last number 1 is not included
data1_4 = [13 5 22]       % the last number 1 is again not included
data1_5 = [15 13]         % the last numbers 10, 2 and 1 are not included
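The truncate-and-sum step described above can be sketched like this (a minimal sketch, assuming data1 is a row vector; reshape fills column-wise, so each column is one group):

```matlab
data1 = [1 5 3 4 2 0 1 2 8 2 10 2 1];
n = 3;                                       % group size
m = floor(numel(data1)/n);                   % number of complete groups; trailing elements are dropped
sums = sum(reshape(data1(1:n*m), n, m), 1)   % -> [9 6 11 14]
```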
I did it and it works.
But I want to improve it to do the following:
Continue the sum calculation until the length of the original data is reached. That means:
data1_2 = [6 7 2 3 10 12 12 10 3 2 7 6] % sum from left to right, then from right to left, until the desired length is reached
In the case of summing 3 numbers, for example, the result should be:
data1_3 = [9 6 11 14 13 12 3 12 9 6 1 14]
Here is my code:
##########################################
for i = 1:length(data1)                    % sum of events
    step = floor(length(data1)/i);         % number of complete groups
    for j = 1:step
        start_idx = i*(j-1) + 1;
        end_idx   = i + start_idx - 1;
        Sum_data1(i,j) = sum(data1(start_idx:end_idx));
    end
end
#################################
Thank you, gentlemen, for your help…
Cheers

Accepted Answer

Cedric
Cedric on 28 Oct 2017
Edited: Cedric on 28 Oct 2017
Interestingly, the following seems to produce what you are looking for:
>> expandSum = @(x,n) sum(reshape(cell2mat(arrayfun(@(k) rot90(x(1:n*floor(numel(x)/n)), 2*(k-1)), 1:n, 'Unif', 0)), n, [])) ;
and with this function defined:
>> expandSum(data1, 2)
ans =
6 7 2 3 10 12 12 10 3 2 7 6
>> expandSum(data1, 3)
ans =
9 6 11 14 14 11 6 9 9 6 11 14
>> expandSum(data1, 4)
ans =
13 5 22 22 5 13 13 5 22 22 5 13
Of course, it may not be that useful presented like this as a big ugly one-liner, so here was the thought process:
  • Q: Are we able to flip/merge the data automatically based on the number of repetitions? A: Yes, using ROT90 and setting the number of rotations to twice the repetition index minus one (a multiple of 180 degrees, so the first copy is not rotated).
  • Q: Are we able to build a cell array of these rotated vectors? A: Yes, using ARRAYFUN, implicitly iterating from 1 to the number of repetitions (which should be the size of the groups for summation).
  • Q: Are we able to concatenate all rotated vectors horizontally without expanding them as a comma-separated list (necessary for a call to HORZCAT)? A: Yes, using CELL2MAT.
  • Q: Are we able to sum over groups of size n? A: Yes, by first reshaping the whole thing into an n x m array and summing over dim 1.
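Unpacked into named steps, the same one-liner reads as follows (a sketch using the example data; rot90 with an even rotation count keeps a row vector a row vector, just flipped):

```matlab
data1 = [1 5 3 4 2 0 1 2 8 2 10 2 1];
n = 2;
trunc  = data1(1:n*floor(numel(data1)/n));                     % drop trailing elements
parts  = arrayfun(@(k) rot90(trunc, 2*(k-1)), 1:n, 'Unif', 0); % every other copy rotated 180 degrees (flipped)
buffer = cell2mat(parts);                                      % concatenate the copies horizontally
result = sum(reshape(buffer, n, []))                           % -> [6 7 2 3 10 12 12 10 3 2 7 6]
```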
PS: I don't guarantee that it really works in all situations; you'll have to understand and test it if you want to follow this approach.

16 Comments

Thanks for your answer, but it does not work as I want. What I want is just to continue the sum calculation of the numbers until reaching the length of the original data.
Is there really a difference between cases where the size of data1 is a multiple of n and other cases? In your example there is a "re-bounce" for n=3:
1 5 3 4 2 0 1 2 8 2 10 2 1 2 10 2 8 2 1 0 2 4 3 5 1 5 3 4 2 0 1 2 8 2 10 2
(the 1s at the turning points are not duplicated), and in the case n=2 there isn't:
1 5 3 4 2 0 1 2 8 2 10 2 1 1 2 10 2 8 2 1 0 2 4 3 5 1
(here the 1s at the turning points are duplicated), so it's not that you "continue" like in the case n=3 here.
No difference; both cases are OK regardless of the duplicated ones.
In this simple example it works, but for other examples it does not. In similar examples the only difference is the length of the data, which is very large (e.g. 500000).
Cedric
Cedric on 2 Nov 2017
Edited: Cedric on 2 Nov 2017
There is a difference; it is not the same to implement a circular buffer that alternates the direction (like with n=3) or to append the original data vector alternating the direction. You give an example that uses both approaches. If you had the same logic/approach for all values of n as for n=3, the case n=2 would be:
1 5 3 4 2 0 1 2 8 2 10 2 1 2 10 2 8 2 1 0 2 4 3 5 1 5
and sums would be:
6 7 2 3 10 12 3 12 10 1 6 8 6
instead of:
6 7 2 3 10 12 12 10 3 2 7 6
If this difference is what you need, we can find a way to account for it that depends on the value of n, but you have to explain the logic.
I also thought that maybe you wanted to skip the end number(s) before building the circular/permuted buffer, the same way you skip them in the first case, but it is not the case as you account for the final 1.
So the logic is unclear.
Ok. The difference is clear, but in my calculation the final result will be the same in both cases. Let us assume I do not want to skip the last one. By the way, I tested it again using my original data and it works nicely, but it is very slow! I have a huge amount of data, and performing such a calculation using your one-liner is unfortunately very slow… Thank you for your help; I greatly appreciate it.
Cedric
Cedric on 3 Nov 2017
Edited: Cedric on 3 Nov 2017
My pleasure! Could you attach a MAT-file with your data, or a large part of it? How large is your vector or data set?
The one-liner was for testing the approach, not knowing how large your data set is. We can certainly make this quite a bit faster once I understand what you can use (skip vs. no skip, etc.).
Cedric
Cedric on 3 Nov 2017
Edited: Cedric on 3 Nov 2017
Assuming that you don't want to skip anything, and that you care more for the efficiency than for the number of lines, I wrote the version attached. You can profile it by setting N large and see in the profiler where the time is spent.
Setting N=1 and looking at the result, you can see how it differs from the numbers that you give in the example.
Then we can adapt it or find something more efficient first. One candidate may be a convolution and a permutation of terms.
PS1: for running the code in the profiler (for large Ns), type
profile viewer
in the command window, then type main_01 in the field "Run this code", and click on [Start profiling]. Clicking on main_01 in the report will give you the details.
PS2: to illustrate the convolution, set e.g. groupSize=3 and execute:
conv( data1, ones( 1, groupSize ), 'same' )
You will recognize most or all of the terms of the final sum (the one with no skip, at least) in a weird order. You should also observe that it is about 10 times faster than what is implemented in the file, which should already be faster than the one-liner. What is missing is a reordering operation that could be complex and time-consuming, but it is worth investigating if you really need to go fast.
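One illustration of the convolution idea (a sketch of the principle, not the attached file): the 'valid' option sums every sliding window, so taking every groupSize-th window recovers the non-overlapping group sums without any reordering:

```matlab
data1 = [1 5 3 4 2 0 1 2 8 2 10 2 1];
groupSize = 3;
windowSums = conv(data1, ones(1, groupSize), 'valid');  % sum of every sliding window of length groupSize
blockSums  = windowSums(1:groupSize:end)                % every groupSize-th window = non-overlapping groups -> [9 6 11 14]
```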
Thanks.... Please find the data attached. The sums of the data go from 2 to 350.
Cedric
Cedric on 3 Nov 2017
Edited: Cedric on 3 Nov 2017
This is a vector of 5e5 elements. What sums are you talking about? Is it the group sizes? The code from my previous comment takes 0.8 s on a laptop for a group size of 350. That is not a lot, and if you have to do a one-shot pass over all sizes between 2 and 350, the computation time will still be quite reasonable (maybe 3 minutes).
Adam
Adam on 3 Nov 2017
Edited: Adam on 3 Nov 2017
expandSum(my_data1,2): here we get another vector of 2.5e5 elements, but we continue the sum calculation until reaching the size 5e5, and we do the same with the other sums:
expandSum(my_data1,3)
...
expandSum(my_data1,350)
Cedric
Cedric on 3 Nov 2017
Edited: Cedric on 3 Nov 2017
The file main_01.m that I attached above does this. You can wrap it into a function if you want:
function result = expandSum( data, groupSize )
    % - Build alternating "circular buffer".
    data_flr = fliplr( data ) ;
    buffer   = repmat( [data, data_flr], 1, floor( groupSize/2 )) ;
    if mod( groupSize, 2 )
        buffer = [buffer, data_flr] ;
    end
    % - Truncate to appropriate length for group size.
    len    = numel( data ) * groupSize ;
    buffer = buffer(1:len) ;
    % - Reshape for summing along dim 1 and sum.
    result = sum( reshape( buffer, groupSize, [] )) ;
end
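For instance, with the example vector from the question (note that since nothing is skipped here, the boundary pair at the turning point is summed too, so the output has 13 elements rather than 12):

```matlab
% Assumes the expandSum function defined above is on the path.
data1  = [1 5 3 4 2 0 1 2 8 2 10 2 1];
result = expandSum(data1, 2)   % -> [6 7 2 3 10 12 2 12 10 3 2 7 6]
```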
The first question is: does this produce an adequate output? The second is: is it efficient enough? As I explained in my previous comment, if you have only a one-time series of computations to perform with group sizes varying between 2 and 350, it is probably efficient enough, because it would take ~3 minutes to go through the whole process. If not, there may be faster alternatives, but you'd have to invest quite a bit of your time in building them (e.g. by studying whether it is possible to reorder the output of CONV adequately).
Adam
Adam on 15 Nov 2017
Thanks. On my computer it also takes 3 minutes to perform the calculation from 2 to 350. I greatly appreciate it...
Jan
Jan on 15 Nov 2017
@Adam: What is the answer to "does this produce an adequate output"?
Yes, it does... The last thing I have to test now is randomness. I want to do the same calculation, but in a random way. For example, from the previous calculation we get for 2: data1=[1 5 3 4 2 0 1 2 8 2 10 2 1];
>> expandSum(data1, 2)
ans =
6 7 2 3 10 12 12 10 3 2 7 6
Now, instead of summing 2 (3, 4, ..., 350) numbers successively, the sum should be performed in a random way... Any hints will be greatly welcome... Thanks again.
Adam
Adam on 16 Nov 2017
I solved it using randperm before performing the sum calculation... I now have to run my calculation on the large amount of data and check the outcome...
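A minimal sketch of that idea (an assumption about Adam's approach: shuffle the samples once with randperm, then apply the same grouped sum as before):

```matlab
data1    = [1 5 3 4 2 0 1 2 8 2 10 2 1];
shuffled = data1(randperm(numel(data1)));  % random permutation of the samples
n = 2;                                     % group size, as before
m = floor(numel(shuffled)/n);              % complete groups; trailing element dropped
randomSums = sum(reshape(shuffled(1:n*m), n, m), 1);  % grouped sums, now in random composition
```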
