computing SVD of very large matrix

20 visualizzazioni (ultimi 30 giorni)
Saurabh
Saurabh il 25 Mar 2012
Commentato: Fernando Zigunov il 26 Mag 2021
Hi, I have a very large matrix 1.1 million rows and 1100 columns. I want to compute SVD of this matrix. It is not possible to load the entire matrix into memory. So how do I do this ?
  3 Commenti
mohammad master
mohammad master il 21 Lug 2017
Hi, I have your problem too. I want to reduce dimension of my data which is a 350000 * 800000 matrix. Each row is a sample. The matrix itself does not fit to memory and I don't know how to do PCA or SVD on my data. Can any body help please?
John D'Errico
John D'Errico il 21 Lug 2017
Sorry. It matters not how big and fast computers get. There will always be somebody who wants to compute the SVD of a matrix orders of magnitude larger than anything they can compute. (Other computations too are common here.) It is trivial to create immensely huge problems to solve. That does not make them solvable.
In Mohamed's case, the matrix requires something like
350000 * 800000*8/1024/1024/1024/1024
ans =
2.03726813197136
So 2 terabytes of memory, just to store the matrix. Computing the SVD will take more.
Just wanting to solve a huge problem does not make it solvable. Your computer does not have infinite capability. Yes, the problem can be solved. You just need to find a seriously large computer. It would help if you work for the NSA, or some similar organization. Find someone who won't care if you use their hardware for your personal research tasks. Or you can always pay for time, to Amazon or Microsoft.

Accedi per commentare.

Risposte (2)

Christine Tobler
Christine Tobler il 21 Lug 2017
The simplest way may be to use a computer with more memory - a matrix of size 1100000 x 1100 takes about 9GB of memory, which is doable nowadays.
If this is not an option, since the matrix has many more rows than columns, you may be able to read the matrix into memory one block at a time, compute the QR decomposition of each individual block, and then do the SVD of the combination of these. Do you need the singular vectors too, or only the singular values?
Here's an example: Let's say you have matrices A1 and A2 (which both have many more rows than columns), and you want to compute the SVD of [A1; A2].
% Load A1
[Q1, R1] = qr(A1, 0);
% Discard Q1 and A1
% Load A2
[Q2, R2] = qr(A2, 0);
% Discard Q2 and A2
% Just singular values
s = svd([R1; R2]);
% Right singular vectors too
[~, S, V] = svd([R1; R2], 'econ');
% Left singular vectors of A would have same size as A, so they do not fit in memory
I'm afraid this trick would not work for mohammad master's case, since that matrix has both many rows and many columns. Since that matrix would take about 2 terabytes of memory, it is also too large to fit into memory of any reasonably available machine, right now.
  1 Commento
Fernando Zigunov
Fernando Zigunov il 26 Mag 2021
Christine, I'm not sure this is correct. I tried implementing your code and it does not match against the ground truth. Try the code below:
clear; clc; close all;
A=rand(300,20); %"Large" matrix to perform SVD on
A1=A(:,1:10); %Splits large matrix into two matrices
A2=A(:,11:20);
[Q1, R1]=qr(A1,0); %Performs QR on each matrix
[Q2, R2]=qr(A2,0);
R=[R1; R2]; %Joins the R matrices
[~, s, v] = svd(R, 'econ'); %SVD of the R matrix only
[U, S, V]=svd(A, 'econ'); %full SVD of the "large matrix"
%Compare eigenvalues
figure;
plot(diag(s),'k-'); hold; plot(diag(S),'b-')
%Compare eigenvectors
figure;
subplot(1,2,1); imagesc(v);
subplot(1,2,2); imagesc(V(1:10,1:10));
Neither the eigenvalues, nor the right eigenvectors are matching against the ground truth.

Accedi per commentare.


Tom Wenseleers
Tom Wenseleers il 25 Giu 2018
In R I think it's possible to calculate the truncated SVD from a very large matrix stored out of memory (ie stored as a disk backed file) using the irlba and the bigmemory and bigalgebra packages, see https://stackoverflow.com/questions/17352771/svd-of-very-large-matrix-in-r-program Not sure it will run in reasonable time for such huge problems though...

Categorie

Scopri di più su Linear Algebra in Help Center e File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by