Why does using graphminspantree() result in large memory use?
I'm trying to generate a minimum spanning tree with graphminspantree(). The input is a complete graph, i.e. a full distance matrix. My full dataset has ~210,000 rows/columns, but so far I have been unable to produce any usable result beyond small examples (a few hundred to a few thousand rows/columns) because memory consumption is enormous. I have access to a machine with 768 GB of RAM (not a typo). On that machine, graphminspantree() spent about 20 minutes accumulating memory before I had to terminate MATLAB at 95% memory use, when it began swapping. The input for that run was an 80k-row subset of my full 212k rows of data.
Some benchmarks:
X = load('mydatafile.csv');
rows = 1600;  % varied per run: 1600, 3200, 6400, 12800, 25600
D = pdist(X(1:rows,:));
tic; [t,p] = graphminspantree(sparse(squareform(D))); toc
rows      time    memory
1600      2s      <1GB
3200      9s      ~2GB
6400      36s     ~8GB
12800     160s    ~34GB
25600     727s    ~107GB
extrapolation:
212000    ~13h    >>2TB
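To put the memory figures in context, the size of the inputs alone can be estimated before anything is handed to graphminspantree(). The sketch below is only back-of-the-envelope arithmetic. It assumes double precision, 64-bit MATLAB, and roughly 16 bytes per stored nonzero in a sparse matrix (8 for the value, 8 for the row index). It says nothing about the working memory graphminspantree() allocates internally.

```matlab
% Rough storage estimate for the arrays built in the benchmark.
% Assumptions: doubles, 64-bit MATLAB, ~16 bytes per sparse nonzero.
n = [1600 3200 6400 12800 25600 212000];   % row counts from the benchmark

pairsBytes  = 8 * n .* (n - 1) / 2;   % pdist output: one double per pair
denseBytes  = 8 * n .^ 2;             % squareform output: full n-by-n matrix
sparseBytes = 16 * n .* (n - 1);      % sparse copy, all off-diagonal entries stored

GB = 1e9;
fprintf('%8s %12s %12s %12s\n', 'rows', 'pdist GB', 'dense GB', 'sparse GB');
fprintf('%8d %12.1f %12.1f %12.1f\n', ...
        [n; pairsBytes / GB; denseBytes / GB; sparseBytes / GB]);
```

Under these assumptions the full n-by-n matrix for 212000 rows is already about 360 GB, and converting a matrix with no zero entries off the diagonal to sparse form roughly doubles that rather than saving anything.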
While I could tolerate ~13 hours of runtime, multiple TB of RAM is a bit much to ask for. Are there any alternative/more efficient ways to generate a minimum spanning tree in MATLAB?
For reference, a quick comparison with an implementation in R indicated no runaway memory use, but an extrapolated runtime of about 4 years for the full data set. Also not exactly amazing.
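One possible alternative, sketched here under stated assumptions rather than taken from the thread, is to skip the distance matrix entirely and run Prim's algorithm directly on the data points. For a complete graph the dense form of Prim's algorithm needs O(n^2) distance evaluations but only O(n) working memory, because the distances from the newly added vertex are recomputed one row at a time. The function name primDense is hypothetical, and Euclidean distance is assumed (the pdist default). Runtime on the full data set has not been measured.

```matlab
function [parent, weight] = primDense(X)
% PRIMDENSE  Minimum spanning tree of the complete Euclidean graph on rows of X.
%   Hypothetical helper: dense Prim's algorithm, no n-by-n matrix is stored.
%   parent(i) is the tree neighbour of vertex i (0 for the root, vertex 1).
%   weight(i) is the length of the edge between i and parent(i).
n      = size(X, 1);
inTree = false(n, 1);
best   = inf(n, 1);     % cheapest known edge from each vertex into the tree
parent = zeros(n, 1);
weight = zeros(n, 1);

u = 1;                  % arbitrary root
for k = 1:n-1
    inTree(u) = true;
    best(u)   = inf;    % tree vertices are never selected again

    % Distances from the newly added vertex u to every point (one row only).
    d = sqrt(sum(bsxfun(@minus, X, X(u, :)).^2, 2));

    % Relax: keep the shorter connection for every vertex outside the tree.
    upd         = ~inTree & (d < best);
    best(upd)   = d(upd);
    parent(upd) = u;

    % Next vertex to add is the one with the cheapest connecting edge.
    [w, u]    = min(best);
    weight(u) = w;
end
end
```

A call such as [parent, weight] = primDense(X) returns the tree as an edge list: for every i other than the root, the edge joins i and parent(i) with length weight(i), and sum(weight) is the total tree length. Checking it against graphminspantree() on one of the small subsets would be a sensible first step before trusting it on the full data.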