Why does graphminspantree() use so much memory?

I'm attempting to generate a minimum spanning tree with graphminspantree(). The input is a complete graph, i.e. a distance matrix. My full dataset has ~212,000 rows/columns, but so far I have been unable to produce any usable result beyond small examples (a few hundred to a few thousand rows/columns) because memory consumption is enormous. I have access to a machine with 768 GB of RAM (not a typo). On that machine, graphminspantree() spent about 20 minutes accumulating RAM before I had to terminate MATLAB at 95% memory use, when it began swapping. The input for that run was an 80k-row subset of my full 212k rows of data.
Some benchmarks:

X = load('mydatafile.csv');
D = pdist(X(1:rows,:));   % rows = number of data rows used in this run
tic; [t,p] = graphminspantree(sparse(squareform(D))); toc

  rows      time     memory
  1600       2 s     <1 GB
  3200       9 s     ~2 GB
  6400      36 s     ~8 GB
 12800     160 s    ~34 GB
 25600     727 s   ~107 GB

Extrapolation:
212000     ~13 h    >>2 TB
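
For anyone who wants to check the extrapolation, here is a minimal sketch that fits a power law to the timings listed above with a log-log polyfit. The variable names are only illustrative, and the result depends on the fit, so treat it as a rough estimate rather than a measurement.

```matlab
% Benchmark figures copied from the table above.
n = [1600 3200 6400 12800 25600];   % number of rows
t = [2 9 36 160 727];               % runtime in seconds

% Fit log(t) = k*log(n) + c, i.e. t is roughly proportional to n^k.
p = polyfit(log(n), log(t), 1);
k = p(1);                           % estimated scaling exponent

% Extrapolate the fitted curve to the full data set.
nFull = 212000;
tFull = exp(polyval(p, log(nFull)));

fprintf('scaling exponent: %.2f\n', k);
fprintf('estimated runtime for %d rows: %.1f hours\n', nFull, tFull/3600);
```

An exponent close to 2 is what one would expect for a complete graph, since the number of edges grows with the square of the number of rows.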
While I could tolerate ~13 hours of runtime, multiple TB of RAM is a bit much to ask for. Are there any alternative/more efficient ways to generate a minimum spanning tree in MATLAB?
For reference, a quick comparison with an implementation in R indicated no runaway memory use, but an extrapolated runtime of about 4 years for the full data set. Also not exactly amazing.

Accepted Answer

The recommendation is to use the 'minspantree' function on a MATLAB graph object instead. The graph functionality in base MATLAB is intended to receive more support than the Bioinformatics Toolbox. Unless your workflow requires specific API interaction with the Bioinformatics Toolbox, the base MATLAB graph functions will likely be more efficient.
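
A minimal sketch of that approach, using a small random matrix as a stand-in for the real data (the data and variable names here are assumptions, not taken from the question). Whether the full 212,000-row problem fits in memory this way is untested: the dense matrix returned by squareform alone needs 8*n^2 bytes, which is roughly 360 GB for n = 212000.

```matlab
% Stand-in data; replace with X = load('mydatafile.csv') for real use.
rng(0);
X = rand(200, 3);
n = size(X, 1);

% Pairwise distances as a full symmetric matrix with a zero diagonal.
D = squareform(pdist(X));

% Build an undirected weighted graph from the adjacency matrix.
% Zero entries are treated as "no edge", so the diagonal adds no
% self-loops. Duplicate points (distance exactly 0) would also lose
% their edge, which is worth checking for in real data.
G = graph(D);

% Minimum spanning tree, returned as a graph object.
T = minspantree(G);

fprintf('tree has %d edges, total weight %.4f\n', ...
        numedges(T), sum(T.Edges.Weight));
```

T.Edges then holds the endpoints and weight of each tree edge, which takes the place of the tree matrix returned by graphminspantree.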

Release

R2019a
