Why is the sparse function slow?

I recently generated a sparse matrix using the sparse function. When I profiled my code, I found that the vast majority of the runtime is spent in the call to sparse, which was pretty shocking to me.
To find out whether generating a sparse matrix is slow in every programming language, I used scipy.sparse.coo_matrix in Python to perform the same task. What surprised me is that scipy.sparse.coo_matrix is about 10 times faster than MATLAB's sparse function.
MATLAB demo code:
RowInd = repmat(randperm(262144),81,1);
RowInd = RowInd(1:260100*81) ;
ColInd = repmat(randperm(262144),81,1);
ColInd = ColInd(1:260100*81);
Val = randn(260100*81,1);
tStart = tic;
L=sparse(RowInd,ColInd,Val, 262144, 262144 ,260100*81);
tEnd = toc(tStart);
disp(['Runtime of generating a sparse matrix in Matlab:', num2str(tEnd), ' second.']);
Python demo Code:
import numpy as np
import scipy.sparse
import scipy.sparse.linalg
from time import time
if __name__ == "__main__":
    nz_indsRow = np.tile(np.random.permutation(262144), 81)
    nz_indsRow = nz_indsRow[:260100*81]
    nz_indsCol = np.tile(np.random.permutation(262144), 81)
    nz_indsCol = nz_indsCol[:260100*81]
    nz_indsVal = np.random.rand(260100*81)
    print(nz_indsRow.shape, nz_indsCol.shape, nz_indsVal.shape)
    t0 = time()
    L = scipy.sparse.coo_matrix(
        (nz_indsVal, (nz_indsRow, nz_indsCol)), shape=(262144, 262144))
    t1 = time()
    print('Runtime of generating a sparse matrix via SciPy:', t1-t0, 'second.')
On my desktop, the runtime is 1.2399 s (MATLAB) vs 0.12721 s (SciPy).
Can someone explain why the sparse function in MATLAB is so slow? Is there a more efficient way to generate a sparse matrix in MATLAB?

15 Comments

Bruno Luong on 21 Sep 2020
Edited: Bruno Luong on 21 Sep 2020
I didn't read your code (I don't open unknown links), but do you happen to call sparse within a loop?
If yes, that is a bad workflow for building the matrix. You should build the I, J, S arrays in the loop (with preallocation), then call SPARSE once, when I, J, S are ready.
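For readers following along in Python, the same workflow can be sketched as below. This is only an illustration under assumptions: the function name build_sparse_once and the banded fill pattern are hypothetical and do not come from the original code. The point is that the loop only fills preallocated triplet arrays and the sparse constructor is called a single time afterwards.

```python
import numpy as np
import scipy.sparse


def build_sparse_once(n, entries_per_row):
    """Fill preallocated (row, col, value) arrays in a loop, then build once."""
    nnz = n * entries_per_row
    rows = np.empty(nnz, dtype=np.int64)
    cols = np.empty(nnz, dtype=np.int64)
    vals = np.empty(nnz, dtype=np.float64)
    k = 0
    for i in range(n):
        for j in range(entries_per_row):
            rows[k] = i
            cols[k] = (i + j) % n  # hypothetical banded pattern, for illustration
            vals[k] = 1.0
            k += 1
    # Single constructor call after the loop, then one conversion to CSC.
    coo = scipy.sparse.coo_matrix((vals, (rows, cols)), shape=(n, n))
    return coo.tocsc()


A = build_sparse_once(5, 2)
print(A.shape, A.nnz)
```

Calling the constructor inside the loop instead would rebuild the sparse storage on every iteration, which is the pattern the comment above warns against.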
Lantao Yu on 21 Sep 2020
No, I called the sparse function only once, after precomputing the row indices, the column indices and all the nonzero entries.
Bruno Luong on 21 Sep 2020
Edited: Bruno Luong on 21 Sep 2020
Please post your data (a MAT-file) containing the input parameters of the sparse() command.
Lantao Yu on 21 Sep 2020
The files for reproducing the results are found in the attachment.
Bruno Luong on 21 Sep 2020
Edited: Bruno Luong on 21 Sep 2020
I don't have the Image Processing Toolbox, so I can't run your code.
I need you to save
RowInd, ColInd, Vals, NumPixels, wins_number, WinCardinality in a MAT-file and attach it here.
Lantao Yu on 21 Sep 2020
Hi, Bruno. Please check out the image.mat file, which contains the image variable. I cannot upload Vals.mat because it is much larger than 5 MB.
Bruno Luong on 21 Sep 2020
Forget it, useless for me.
Lantao Yu on 21 Sep 2020
Are you able to reproduce the results? If not, how can I help? The website only allows me to upload 10 files per day, each no larger than 5 MB.
Bruno Luong on 21 Sep 2020
Edited: Bruno Luong on 21 Sep 2020
No, I don't have the Image Processing Toolbox (I can't run the im2col command).
But I guess you generate a sparse matrix of size 262144 x 262144 with 21068100 nonzero elements.
I generated a random sparse matrix with similar input sizes, and it takes 0.72605 seconds on my PC. How long does it take on yours?
EDIT: I just saw that you posted the time in the question.
Bruno Luong on 21 Sep 2020
I don't see anything wrong with your MATLAB sparse command, so it seems that Python is much more efficient than MATLAB at building a sparse matrix. A factor of 10 is huge, though.
Lantao Yu on 21 Sep 2020
Edited: Lantao Yu on 21 Sep 2020
Let me cut it short:
Matlab code:
RowInd = repmat(randperm(262144),81,1);
RowInd = RowInd(1:260100*81) ;
ColInd = repmat(randperm(262144),81,1);
ColInd = ColInd(1:260100*81);
Val = randn(260100*81,1);
tStart = tic;
L=sparse(RowInd,ColInd,Val, 262144, 262144 ,260100*81);
tEnd = toc(tStart);
disp(['Runtime of generating a sparse matrix in Matlab:', num2str(tEnd), ' second.']);
Python Code:
import numpy as np
import scipy.sparse
import scipy.sparse.linalg
from time import time
if __name__ == "__main__":
    nz_indsRow = np.tile(np.random.permutation(262144), 81)
    nz_indsRow = nz_indsRow[:260100*81]
    nz_indsCol = np.tile(np.random.permutation(262144), 81)
    nz_indsCol = nz_indsCol[:260100*81]
    nz_indsVal = np.random.rand(260100*81)
    print(nz_indsRow.shape, nz_indsCol.shape, nz_indsVal.shape)
    t0 = time()
    L = scipy.sparse.coo_matrix(
        (nz_indsVal, (nz_indsRow, nz_indsCol)), shape=(262144, 262144))
    t1 = time()
    print('Runtime of generating a sparse matrix via SciPy:', t1-t0, 'second.')
On my desktop, the runtime is 1.2399 s (MATLAB) vs 0.12721 s (SciPy).
Lantao Yu on 21 Sep 2020
Edited: Lantao Yu on 21 Sep 2020
It seems ironic that a paid programming language runs at 1/10 the speed of a free one.
the cyclist on 21 Sep 2020
I'm not familiar with the COO format, but I'm wary of the fact (stated in this documentation) that one cannot do arithmetic operations directly on it. One has to convert to CSR or CSC format first.
It seems possible to me that this is not a completely fair comparison, as a result. But I really don't know.
What I do know is that cherry-picking one speed test, and then saying that a paid language is "running at 1/10 the speed" is definitely not a particularly useful exercise. Python has many strengths, but I wouldn't base the choice on this one excruciatingly small detail (unless of course that is the single dominant factor for you, for some reason).
Bruno Luong on 21 Sep 2020
Good point, cyclist. For a fair comparison, one must build a CSC matrix, which is the format MATLAB uses.
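To make the COO vs CSC distinction concrete, here is a rough pure-Python sketch of the extra work a conversion to CSC involves. It is not SciPy's or MATLAB's actual implementation, and the function name coo_to_csc is made up for illustration. A COO matrix can simply keep the three input arrays as given, whereas CSC needs the entries ordered by column and duplicate (row, col) pairs summed.

```python
def coo_to_csc(rows, cols, vals, n_cols):
    """Convert COO triplets to CSC arrays, summing duplicate (row, col) entries."""
    # Sort entry positions by (column, row): this is the work COO storage skips.
    order = sorted(range(len(vals)), key=lambda k: (cols[k], rows[k]))
    indptr = [0] * (n_cols + 1)  # per-column counts first, offsets later
    indices, data = [], []
    prev = None
    for k in order:
        key = (cols[k], rows[k])
        if key == prev:
            data[-1] += vals[k]  # duplicate entry: accumulate into the last slot
        else:
            indices.append(rows[k])
            data.append(vals[k])
            indptr[cols[k] + 1] += 1
            prev = key
    # A prefix sum turns per-column counts into column start offsets.
    for c in range(n_cols):
        indptr[c + 1] += indptr[c]
    return indptr, indices, data


# Entry (row 0, col 1) appears twice, with values 1.0 and 3.0.
indptr, indices, data = coo_to_csc([0, 1, 0, 1], [1, 0, 1, 1], [1.0, 2.0, 3.0, 4.0], 2)
print(indptr, indices, data)
```

The sort and the duplicate handling scale with the number of input triplets, which is one plausible reason why building a CSC matrix costs noticeably more than just wrapping the triplets in a COO container.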
Thank you for your point, cyclist.
I ran the following code, which also converts the COO matrix to a CSC/CSR matrix. The output is:
Runtime of generating a sparse CSC matrix via SciPy: 1.3742189407348633 second.
Runtime of generating a sparse CSR matrix via SciPy: 1.3034861087799072 second.
Now the runtime is close to that of MATLAB. I apologize for not conducting a fair comparison.
import numpy as np
import scipy.sparse
import scipy.sparse.linalg
from time import time
if __name__ == "__main__":
    nz_indsRow = np.tile(np.random.permutation(262144), 81)
    nz_indsRow = nz_indsRow[:260100*81]
    nz_indsCol = np.tile(np.random.permutation(262144), 81)
    nz_indsCol = nz_indsCol[:260100*81]
    nz_indsVal = np.random.rand(260100*81)
    print(nz_indsRow.shape, nz_indsCol.shape, nz_indsVal.shape)
    # Time COO construction plus conversion to CSC (the format MATLAB uses).
    t0 = time()
    L = scipy.sparse.coo_matrix(
        (nz_indsVal, (nz_indsRow, nz_indsCol)), shape=(262144, 262144))
    LL = L.tocsc()
    t1 = time()
    print('Runtime of generating a sparse CSC matrix via SciPy:', t1-t0, 'second.')
    # Time COO construction plus conversion to CSR.
    t0 = time()
    L = scipy.sparse.coo_matrix(
        (nz_indsVal, (nz_indsRow, nz_indsCol)), shape=(262144, 262144))
    LL = L.tocsr()
    t1 = time()
    print('Runtime of generating a sparse CSR matrix via SciPy:', t1-t0, 'second.')
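For anyone who wants to see where the time goes, the two steps can also be timed separately. The snippet below is only a sketch: the helper name time_coo_and_csc, the seed and the scaled-down problem size are my own assumptions, chosen so that it runs quickly, and the numbers it prints will differ from machine to machine.

```python
import numpy as np
import scipy.sparse
from time import perf_counter


def time_coo_and_csc(n, nnz, seed=0):
    """Time COO construction and the COO -> CSC conversion separately."""
    rng = np.random.default_rng(seed)
    rows = rng.integers(0, n, size=nnz)
    cols = rng.integers(0, n, size=nnz)
    vals = rng.random(nnz)
    t0 = perf_counter()
    coo = scipy.sparse.coo_matrix((vals, (rows, cols)), shape=(n, n))
    t1 = perf_counter()
    csc = coo.tocsc()  # orders entries by column and sums duplicates
    t2 = perf_counter()
    return csc, t1 - t0, t2 - t1


# Scaled-down sizes (assumed for illustration), not the 262144 x 262144 case.
csc, t_coo, t_csc = time_coo_and_csc(n=2048, nnz=200_000)
print(f"COO construction: {t_coo:.4f} s, COO -> CSC conversion: {t_csc:.4f} s")
```

perf_counter is used here instead of time because it is intended for measuring short intervals.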

Asked: 21 Sep 2020

Commented: 21 Sep 2020
