matrix pre-allocation

5 views (last 30 days)
Chien-Chia Huang on 13 Apr 2011
What's actually the difference between the following commands
sparse(zeros(10000));
sparse([],[],[],10000,10000,0);
a = sparse([]); for i =1:10000,for j = 1:10000, a(i,j) = 0,end,end
Oftentimes the first line leads to out-of-memory problem while the rest do not when the dimension acretes.
I consulted to some tutorial about memory allocation that says "loop" is at times useful to speed up and avoid memory outflow.
When should I use loop to generate a really "big" matrix? or the second line just dominates?
Thanks

Andrew Newell on 13 Apr 2011
The first command creates the full array zeros(10000) before turning it into a sparse array. The simplest way of creating the matrix you want is
sparse(10000,10000)
This is not a problem where using a loop is a good idea! You can time the commands using tic and toc:
tic
a = sparse([]); for i =1:100,for j = 1:100, a(i,j) = 0; end,end
toc
tic
b = sparse(100,100);
toc
On my computer, the output is
Elapsed time is 0.010886 seconds.
Elapsed time is 0.000298 seconds.
In other words, for an array that is only 100 x 100, the loops are 35 times slower. If you increase the array size, the loops increase in time as the total number of elements, while the second command always takes about the same time. So with your array, the loops would to it 350,000 times slower! And that's assuming that you change the comma after a(i,j) = 0 to a semicolon so the output isn't displayed after each element is changed.
Chien-Chia Huang on 13 Apr 2011
Thanks, Andrew.
I did time the commands and got similar result like yours.
But I was always confounded by my first command which led to out-of-memory problem, while using loop could solve the problem, even though the matrix size is the same.

Matt Fig on 13 Apr 2011
There are cases where loops are much faster than other alternatives. This can sometimes be difficult to predict ahead of time, however, when a loop is not a good idea is often easier to identify. In general:
1. Growing an array in a loop will always be your slowest option. The case you show above where a starts out empty and grows in a loop is going to be the slowest option.
2. The more complicated the indexing maneuvers in a loop, the slower it will be. Simple looping over consecutive elements in existing arrays will be fast.
3. A loop which makes many M-file function calls will be slow compared to a vectorized code. For speed, look for only a minimum of built-in calls within a loop.
4. These guidelines stand out more as the size of the arrays involved grows. For example, not pre-allocating a 1-by-5 array is no big deal, but a 1-by-5e7 array will slow down the code.
In your three examples above, the first example is a good demonstration of how not to pre-allocate a sparse array, as you found out. Your second example generates the ultimate sparse array - 0% density. I don't know enough about sparse arrays to know if this actually pre-allocates any memory at all. The third example is bad whether dealing with sparse arrays or not because you are growing and array in a loop.
Wolfgang Schwanghart on 13 Apr 2011
You can preallocate sparse arrays using the sixth input argument of sparse or the function spalloc.