Generate a 100000x100000 matrix that takes less time and memory

Question

Jay Vaidya on 19 Oct 2020

https://it.mathworks.com/matlabcentral/answers/619113-generate-a-100000x100000-matrix-that-takes-less-time-and-memory

Commented: Jay Vaidya on 20 Oct 2020

I have written random matrix generator code that generates an adjacency matrix of any size. I am targeting larger sizes like 100k x 100k, but the problem I face is the time it takes to generate the matrix (which is related to RAM usage). It needs ~60 GB to do this.

I presume there has to be a smarter way to do this, such as using a small integer type instead of double, or something similar with the datatype. Any help would be appreciated. Thanks

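function [a,ed] = Random_graph_genar_function(nodes, connectivity)
for it=1:3                  % repeats the whole construction; only the last pass is returned
ni = nodes;
ac = connectivity;          % fraction of the possible edges to keep
mi = (ni*(ni-1))/2;         % number of possible undirected edges (i < j)
no = round(mi*ac);          % number of edges to draw
a = zeros(ni,ni);           % full double matrix: 8 bytes per entry
in = randperm(mi,no); p=1;  % indices (1..mi) of the chosen pairs
% Walk the strict upper triangle row by row; p is the index of the current pair
for i=1:ni
    for j=i+1:ni
        if (any(in(:)==p))  % scans all of 'in' for every pair, which is slow
            a(i,j)=1;
            a(j,i)=1;       % keep the matrix symmetric
        end
        p=p+1;
    end
end
% Edge list: one column [i; j] per edge
p=0;
for i=1:ni
    for j=i+1:ni
        if (a(i,j)==1)
            p=p+1;
            ed(1,p)=i;
            ed(2,p)=j;
        end
    end
end
s = sum(a);                 % degree of each node
mx = max(s);                % maximum degree
bc = mx - s;                % degree deficit of each node relative to mx
tbc = sum(bc);              % total deficit
end
end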

4 Comments (2 older comments not shown)

Walter Roberson on 20 Oct 2020

"The percentage of the matrix that is non-zero is random"

Is it going to be 0.000378% in one run but 82.19% in another run? You have no idea what the percentage will be, other than more than 0% and less than 100%?

Jay Vaidya on 20 Oct 2020

No, I meant that I would like to control the connectivity of the graph (percentage of non-zero elements) using the connectivity parameter in the above function.

Answer 1 (accepted answer)

Matt J on 20 Oct 2020

https://it.mathworks.com/matlabcentral/answers/619113-generate-a-100000x100000-matrix-that-takes-less-time-and-memory#answer_518698

Edited: Matt J on 20 Oct 2020

It seems to me the whole code can be replaced by:

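function [a,ed] = Random_graph_genar_function(nodes, connectivity)
    % Random symmetric sparse matrix with roughly connectivity*nodes^2 nonzeros,
    % converted to a sparse logical adjacency matrix
    a = logical(sprandsym(nodes,connectivity));
    a(1:nodes+1:end) = false;   % remove self-loops; a stays sparse logical

    G = graph(a);               % logical adjacency, so G has no edge weights

    % Edge list: one column [i; j] per edge
    ed = G.Edges.EndNodes.';
end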

Although, instead of having the function return a and ed, I suspect that everything you are trying to do could be done more easily with the graph object G.
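For example, the degree bookkeeping at the end of the original function (s, mx, bc and tbc) maps directly onto graph functions. A minimal sketch, using a small hypothetical graph in place of the one built from sprandsym; note that degree returns a column vector, whereas sum(a) in the original returns a row:

```matlab
% Hypothetical small graph for illustration: path 1-2-3-4 plus the edge 2-4
G = graph([1 2 3 2], [2 3 4 4]);

s   = degree(G);             % degree of each node (column vector)
mx  = max(s);                % maximum degree
bc  = mx - s;                % how far each node is below the maximum degree
tbc = sum(bc);               % total deficit
ed  = G.Edges.EndNodes.';    % 2-by-numedges(G) edge list, one column per edge
disp(tbc)                    % displays 4
```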

8 Comments (6 older comments not shown)

Matt J on 20 Oct 2020

Edited: Matt J on 20 Oct 2020

"But actually the adjacency matrix 'a' should have diagonal elements equal to 1. That is not the case in the code that you sent."

It's not the case in the code you posted either, as far as I can tell. Your code fills the upper triangle strictly above the diagonal of a and never touches the diagonal. Regardless, you can use either

a(1:nodes+1:end)=1;

or

a=a|speye(nodes);
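A quick check that the two forms give the same sparse logical matrix, on a small hypothetical size (500 nodes, density 0.01):

```matlab
nodes = 500;                                 % hypothetical small size
a = logical(sprandsym(nodes, 0.01));         % sparse logical, as in the answer

a1 = a;
a1(1:nodes+1:end) = 1;                       % set the diagonal by linear indexing
a2 = a | speye(nodes);                       % set the diagonal with a logical OR

fprintf('same result: %d, class: %s, sparse: %d\n', ...
    isequal(a1, a2), class(a2), issparse(a2));
```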

"Also, could you tell me the reason for the for loop you wrote, with nPasses = 10?"

This is to reduce the density of the temporary double sparse matrix created by sprandsym(). Ultimately, we want a to be of type logical, but unfortunately there is no way to create a random sparse logical matrix directly. I doubt it will save you much RAM, but it is something. Compare:

>> A=sprandsym(1e5,.0001);
>> L=logical(A);
>> whos A L
  Name   Size              Kilobytes     Class     Attributes
                                                             
  A      100000x100000         16406     double    sparse    
  L      100000x100000          9571     logical   sparse    
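The nPasses loop itself does not appear in the answer above. The following is a hypothetical reconstruction of the idea as described, not Matt's code: OR together nPasses logical matrices, each drawn at density connectivity/nPasses, so that each temporary double matrix from sprandsym holds only about 1/nPasses of the nonzeros. Overlaps between passes make the final density slightly lower than connectivity.

```matlab
nodes = 2000;          % hypothetical size for a quick test
connectivity = 0.01;   % target fraction of nonzero entries
nPasses = 10;          % number of lower-density passes

a = logical(sparse(nodes, nodes));   % empty sparse logical accumulator
for k = 1:nPasses
    % Each temporary double matrix holds only about 1/nPasses of the nonzeros
    a = a | logical(sprandsym(nodes, connectivity/nPasses));
end
a(1:nodes+1:end) = false;            % drop self-loops

fprintf('achieved density: %.4f\n', nnz(a)/nodes^2);
```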

Walter Roberson on 20 Oct 2020

Edited: Walter Roberson on 20 Oct 2020

If you have a fixed number of iterations to work with, then you will need to proceed in one of these ways:

1. grow the graph step by step so that at no point are there disconnected points;
2. impose a maximum distance from existing points on new points, and "reserve" a number of iterations to do nothing but "fix up" the disconnected subtrees by connecting them to other subtrees; or
3. after the initial iterations, find all disconnected subtrees and move them so they are attached to the main tree. If you have a constrained geometry, it might take some searching to find attachment points that satisfy the constraints.

I suspect that the first option, growing step by step, is the easiest.
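A minimal sketch of the first option, assuming there is no geometric constraint on where nodes may attach: each new node k is connected to a uniformly chosen earlier node, so the graph is a connected random tree at every step. Further random edges could then be added with addedge to approach a target connectivity, since adding edges cannot disconnect the graph.

```matlab
nodes = 1000;                                   % hypothetical size

% Node k (k = 2..nodes) attaches to a uniformly chosen earlier node in 1..k-1.
% rand is in the open interval (0,1), so ceil(rand*(k-1)) lies in 1..k-1.
parent = ceil(rand(1, nodes-1) .* (1:nodes-1));

G = graph(2:nodes, parent);                     % undirected edges k -- parent
fprintf('nodes: %d, edges: %d, components: %d\n', ...
    numnodes(G), numedges(G), max(conncomp(G)));
% nodes: 1000, edges: 999, components: 1
```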

Jay Vaidya on 20 Oct 2020

Thanks, Walter and Matt. I agree that growing step by step would be easier. I have posted another question about this; it would be great if you have some time to look at it. It is here. Thanks in advance.

Answer 2

Ameer Hamza on 19 Oct 2020

https://it.mathworks.com/matlabcentral/answers/619113-generate-a-100000x100000-matrix-that-takes-less-time-and-memory#answer_518348

If most of the elements are equal to zero, then use a sparse array: https://www.mathworks.com/help/matlab/ref/sparse.html. You can also try creating a uint8 array, which uses only 1/8 of the memory of a double array:

a=zeros(ni,ni,'uint8');
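The 1/8 figure can be checked with whos. A minimal sketch on a hypothetical 1000-by-1000 matrix; a logical array also uses one byte per entry. At 1e5 nodes, a full uint8 or logical matrix still needs 1e10 bytes.

```matlab
ni = 1000;                          % hypothetical size
aDouble  = zeros(ni, ni);           % 8 bytes per entry
aUint8   = zeros(ni, ni, 'uint8');  % 1 byte per entry
aLogical = false(ni, ni);           % also 1 byte per entry

infoD = whos('aDouble');
infoU = whos('aUint8');
infoL = whos('aLogical');
fprintf('double: %d, uint8: %d, logical: %d bytes\n', ...
    infoD.bytes, infoU.bytes, infoL.bytes);
% double: 8000000, uint8: 1000000, logical: 1000000 bytes
```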

7 Comments (5 older comments not shown)

Jay Vaidya on 20 Oct 2020

That gives a matrix that is quite sparse. I need a matrix whose connectivity I can choose. Can we control the connectivity/density with this?
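The density can be controlled exactly when the sparse matrix is built from a list of edges. A minimal sketch that keeps the edge-count logic of the original function (randperm(mi,no) over the mi possible pairs) but writes the result straight into a sparse logical matrix. The closed-form mapping from pair index to (i,j) is an assumption of this sketch: it enumerates the upper triangle by column instead of by row as the original loop does, which does not change the result's distribution because the pairs are chosen uniformly at random. At 1e5 nodes and connectivity 0.1 this still means about 5e8 edges, i.e. about 1e9 stored nonzeros.

```matlab
nodes = 1000;          % hypothetical test size
connectivity = 0.1;    % fraction of the nodes*(nodes-1)/2 possible edges to keep

mi = nodes*(nodes-1)/2;          % number of possible undirected edges (i < j)
no = round(mi*connectivity);     % exact number of edges to keep
p  = randperm(mi, no);           % distinct random pair indices in 1..mi

% Map each pair index p to (i,j), enumerating the strict upper triangle by
% column: (1,2), (1,3), (2,3), (1,4), ... Column j holds the indices
% (j-1)*(j-2)/2 + 1 through j*(j-1)/2.
j = ceil((1 + sqrt(1 + 8*p)) / 2);
i = p - (j-1).*(j-2)/2;

% Store both (i,j) and (j,i) so the matrix is symmetric; the diagonal stays empty
a = sparse([i j], [j i], true, nodes, nodes);
fprintf('edges: %d, density: %.4f\n', nnz(a)/2, nnz(a)/nodes^2);
```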

Jay Vaidya on 20 Oct 2020

My entire code used the double datatype, and I don't know whether using sparse would change other things. At the end of the day, I need an adjacency matrix with a connectivity of 0.1 and 100k nodes (100k rows and 100k columns).

I changed

a=zeros(ni,ni);

to

a=zeros(ni,ni,'uint8');

But it is not making a big difference to the above code for n = 1e3 (1000 nodes).
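One likely reason the change of type makes little difference to the run time is that the loop spends most of its time in any(in(:)==p), which compares p against all no entries of in for each of the mi pairs, so the work grows like mi*no. A minimal sketch of a suggested change (not from the thread) that replaces that search with a logical lookup table built once; inside the loop, the test would become if isEdge(p). The comparison at the end checks that it marks exactly the same pairs:

```matlab
ni = 200;  ac = 0.1;                 % hypothetical small test values
mi = (ni*(ni-1))/2;                  % number of possible pairs
no = round(mi*ac);                   % number of edges to draw
in = randperm(mi, no);               % chosen pair indices, as in the question

isEdge = false(1, mi);               % lookup table, built once
isEdge(in) = true;                   % isEdge(p) replaces any(in(:)==p)

% Compare with the original per-pair search (slow, so only for small ni)
slow = arrayfun(@(p) any(in(:)==p), 1:mi);
fprintf('identical: %d\n', isequal(slow, isEdge));
```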

Answer 3

Walter Roberson on 19 Oct 2020

https://it.mathworks.com/matlabcentral/answers/619113-generate-a-100000x100000-matrix-that-takes-less-time-and-memory#answer_518543

a=zeros(ni,ni,'uint8');

You are already using only one byte per entry.

If you were to write logic that packed 8 adjacent entries into one byte, you could potentially get 8:1 compression... and would still need about 1.164 gigabytes (1.25e9 bytes) of memory.
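The arithmetic behind these figures for a 1e5-by-1e5 matrix, taking 1 GiB = 2^30 bytes; the packed figure assumes hand-written bit packing as described above:

```matlab
n = 1e5;
entries = n^2;                 % 1e10 entries
bytesDouble = 8 * entries;     % full double: 8 bytes per entry
bytesUint8  = entries;         % full uint8 or logical: 1 byte per entry
bytesPacked = entries / 8;     % hypothetical bit packing: 8 entries per byte

fprintf('double: %.2f GiB, uint8: %.2f GiB, packed: %.2f GiB\n', ...
    bytesDouble/2^30, bytesUint8/2^30, bytesPacked/2^30);
% double: 74.51 GiB, uint8: 9.31 GiB, packed: 1.16 GiB
```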

Your only hope would be if you could use a sparse() array. See https://www.mathworks.com/matlabcentral/answers/100287-how-much-memory-a-sparse-matrix-created-using-sprand-with-given-number-of-rows-columns-and-density for a guideline to the amount of memory a sparse array uses. I suspect the 16 in that guideline is 8 bytes for an offset (row index) plus 8 bytes for the stored double value, so a sparse logical array possibly takes only 8+1 = 9 bytes per nonzero entry (this could be tested).

That is to say, if your occupancy is more than about 1/20, sparse would be less efficient. If you had a target of (say) 16 gigabytes, then at 16 bytes per nonzero out of 1e10 entries you would need to be only about 1/10 occupied.
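The bytes-per-nonzero estimate can be tested with whos on a smaller matrix. A minimal sketch; the size and density are arbitrary test values, and the reported figure also includes the per-column overhead of the sparse format, so it comes out slightly above the pure per-nonzero cost:

```matlab
n = 1e4;                 % hypothetical test size, small enough to run quickly
density = 1e-3;          % hypothetical density (about 1e5 nonzeros)

A = sprand(n, n, density);   % sparse double
L = logical(A);              % sparse logical with the same sparsity pattern

infoA = whos('A');
infoL = whos('L');
bytesPerNnzA = infoA.bytes / nnz(A);
bytesPerNnzL = infoL.bytes / nnz(L);

fprintf('double sparse:  %.2f bytes per nonzero\n', bytesPerNnzA);
fprintf('logical sparse: %.2f bytes per nonzero\n', bytesPerNnzL);
```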
