Creating a Matrix with for loop
Hi all,
I have a question about creating a matrix. For me, at the moment, it is impossible to solve, I just don't get it.
I've got a 2x9 double matrix X:
X = [2 1 2 10 3 0 0 0 0;
2 2 3 20 5 0 0 0 0];
and I have a vector z = [2 3];
Now I would like to create a 5x9 double matrix T, which looks like this:
T = [2 1 2 10 3 0 0 0 0;
2 2 3 10 3 0 0 0 0;
2 3 4 20 5 0 0 0 0;
2 4 5 20 5 0 0 0 0;
2 5 6 20 5 0 0 0 0];
In other words, the first row of X is repeated z(1) = 2 times and the second row of X is repeated z(2) = 3 times.
In practice I don't know the final size or content of X and z in advance; this is just a small example.
Hope anybody can help. Thank you.
Cheers,
Philipp
Accepted Answer
More Answers (1)
Stephen23
on 30 Apr 2020
Simpler, with no data duplication, fewer intermediate variables, and a smaller memory footprint:
>> X = [2,1,2,10,3,0,0,0,0;2,2,3,20,5,0,0,0,0]
X =
2 1 2 10 3 0 0 0 0
2 2 3 20 5 0 0 0 0
>> z = [2,3];
>> fun = @(r,n) r*ones(1,n);
>> idx = cell2mat(arrayfun(fun,1:numel(z),z,'uni',0));
>> T = X(idx,:)
T =
2 1 2 10 3 0 0 0 0
2 1 2 10 3 0 0 0 0
2 2 3 20 5 0 0 0 0
2 2 3 20 5 0 0 0 0
2 2 3 20 5 0 0 0 0
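For comparison, the same index vector could be sketched in two other ways: with repelem applied to the row numbers (assuming MATLAB R2015a or later, where repelem is available), or with an explicit for loop, as the thread title suggests. The names idx2, idx3, T2, T3, pos and k are only illustrative and are not part of the answer above.

```matlab
X = [2,1,2,10,3,0,0,0,0; 2,2,3,20,5,0,0,0,0];
z = [2,3];

% Variant 1: repeat each row number z(k) times (needs repelem, R2015a+).
idx2 = repelem(1:numel(z), z);
T2 = X(idx2,:);

% Variant 2: explicit loop that fills a preallocated index vector.
idx3 = zeros(1, sum(z));
pos = 0;
for k = 1:numel(z)
    idx3(pos+1:pos+z(k)) = k;   % row number k appears z(k) times
    pos = pos + z(k);
end
T3 = X(idx3,:);
```

Both variants only build row subscripts; the data in X is copied once, when the index is applied.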
4 Comments
Rik
on 30 Apr 2020
Just to clarify: I used multiple lines and multiple variables to show the intermediate steps my code was taking. Apart from how the rows are duplicated (repelem versus implicit expansion plus indexing) it seems our solutions are fairly similar. Am I missing something where a condensed version of my code would have a larger memory footprint than yours?
I also fail to see how this is simpler, but simplicity is of course (partially) in the eye of the beholder.
Stephen23
on 30 Apr 2020
"...it seems our solutions are fairly similar."
Ummmm... apart from the fact that they work in quite different ways:
- Your answer splits up the data array into a cell array (thus duplicating the data in memory), applies repelem to those data arrays (thus duplicating the data again), before finally joining it all back together via cell2mat. The output of the anonymous function is the duplicated data. You used mat2cell and num2cell and cellfun.
- My answer generates an index vector which is applied to the original data array (without any intermediate data duplication). As the index is just a vector (of row subscripts) it has a much smaller memory footprint. The data is not duplicated anywhere, nor is repelem used. The output of the anonymous function is a vector of subscripts. I used arrayfun.
I cannot see a lot of similarity between them, except perhaps on a very abstract level: "create some things of the right size, then concatenate them together". However what they create, how they create it, and what they do with it afterwards are all very different. Also note that creating the index (as well as having a smaller memory footprint) means that the index can be repeatedly reapplied to other arrays, whereas your approach requires re-running the entire code for each data array.
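A minimal sketch of that reuse: once idx exists, it can index any array that has the same number of rows as X. The arrays Y and names below are made up purely for illustration.

```matlab
z = [2,3];
fun = @(r,n) r*ones(1,n);
idx = cell2mat(arrayfun(fun, 1:numel(z), z, 'UniformOutput', false));

% Hypothetical second data set with one row per entry of z.
Y = [100 200; 300 400];
names = {'first'; 'second'};

Y_rep = Y(idx,:);        % rows repeated in the same pattern as T = X(idx,:)
names_rep = names(idx);  % the same index also works on a cell array
```

The index is computed once, and each further array costs only one indexing operation.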
"Am I missing something where a condensed version of my code would have a larger memory footprint than yours?"
Let's have a look at the sizes of the intermediate arrays of your code:
>> X2=mat2cell(X,ones(size(X,1),1),size(X,2));
>> whos
Name Size Bytes Class Attributes
X 2x9 144 double
X2 2x1 368 cell
z 1x2 16 double
>> z2=num2cell(z);z2=reshape(z2,size(X2));
>> X2=cellfun(@(data,sz) repelem(data,sz,1),X2,z2,'UniformOutput',false);
>> whos
Name Size Bytes Class Attributes
X 2x9 144 double
X2 2x1 584 cell
z 1x2 16 double
z2 2x1 240 cell
Note that the cell array X2 contains the duplicated data (and so its bytes will scale with the number of elements of the input data array). Because the numeric arrays inside X2 are replaced with other numeric arrays, there will also be a point where all of those arrays need to be in memory at once (this is not shown here; you would need to use your OS or perhaps the profiler to see it).
Vs. my code:
>> fun = @(r,n) r*ones(1,n);
>> idx = cell2mat(arrayfun(fun,1:numel(z),z,'uni',0));
>> whos
Name Size Bytes Class Attributes
X 2x9 144 double
fun 1x1 32 function_handle
idx 1x5 40 double
z 1x2 16 double
Note that the size of the array idx (and also of the arrays in the temporary cell array) will scale with the number of rows of the input data array (i.e. bytes proportional to rows). It is clear that with the original two-row input array my code has around 10% of the memory footprint of your code (32+40 bytes vs. 584+240 bytes). However, we can easily try it with a larger input array, and you will see how your variables' memory consumption increases faster than for my code. For example, simply replicating the input arrays 1000 times:
>> X = repmat(X,1000,1);
>> z = repmat([2,3],1,1000);
>> whos X
Name Size Bytes Class Attributes
X 2000x9 144000 double
>> X2=mat2cell(X,ones(size(X,1),1),size(X,2));
>> whos X2
Name Size Bytes Class Attributes
X2 2000x1 368000 cell
>> z2=num2cell(z);z2=reshape(z2,size(X2));
>> X2=cellfun(@(data,sz) repelem(data,sz,1),X2,z2,'UniformOutput',false);
>> whos X2 z2
Name Size Bytes Class Attributes
X2 2000x1 584000 cell
z2 2000x1 240000 cell
Vs. my code (variable bytes scaling with the number of data rows):
>> idx = cell2mat(arrayfun(fun,1:numel(z),z,'uni',0));
>> whos idx fun
Name Size Bytes Class Attributes
fun 1x1 32 function_handle
idx 1x5000 40000 double
The difference is now that my code requires just 5% of the memory that your code does... and so it continues :)
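The idx figures above follow from 8 bytes per double: the index holds sum(z) elements, one per output row. A small sketch of that check, reading the byte count back from whos (the names info and expected_bytes are illustrative only):

```matlab
z = repmat([2,3], 1, 1000);
fun = @(r,n) r*ones(1,n);
idx = cell2mat(arrayfun(fun, 1:numel(z), z, 'UniformOutput', false));

info = whos('idx');              % struct describing idx, with a .bytes field
expected_bytes = 8 * sum(z);     % one 8-byte double per output row
fprintf('idx: %d elements, %d bytes (expected %d)\n', ...
        numel(idx), info.bytes, expected_bytes);
```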
"Apart from how the rows are duplicated..."
The original question is entirely about how to duplicate rows. That is rather the point of it.
Rik
on 30 Apr 2020
Thanks for your thorough reply.
It does seem a bit unfair to count X2 in the memory footprint, but not count arrayfun(fun,1:numel(z),z,'uni',0) (not that it is going to matter a lot), but I see what you mean now.
"It does seem a bit unfair to count X2 in the memory footprint, but not count arrayfun(fun,1:numel(z),z,'uni',0)..."
The intermediate cell array occupies 264 bytes for the original data; its size scales with the size and contents of z.
Due to the transient nature of these intermediate arrays, and the unknown sequence in which the JIT compiler might create and destroy them, the only way to really know the actual memory consumption is to measure it: please feel free to do some tests and post the results here.
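As a starting point for such a test, the temporary cell array can be stored in a variable so that whos can report it. This is only a sketch: C and infoC are illustrative names, and whos reports the size of stored variables, not the peak memory used while an expression is being evaluated.

```matlab
z = [2,3];
fun = @(r,n) r*ones(1,n);

% Keep the intermediate cell array instead of passing it straight to cell2mat.
C = arrayfun(fun, 1:numel(z), z, 'UniformOutput', false);
infoC = whos('C');
fprintf('intermediate cell: %d cells, %d bytes\n', numel(C), infoC.bytes);

idx = cell2mat(C);   % same index vector as in the answer
```

The reported byte count may differ between releases, so it is best compared on a single installation.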