Why are tall arrays producing different results for Principal Component Analysis in underdetermined systems?

Question

MathWorks Support Team on 17 Mar 2023


Answered: MathWorks Support Team on 17 Mar 2023

Accepted Answer: MathWorks Support Team

I am using MATLAB R2022b and I am getting different results using the "pca" function on my data depending on whether or not I am casting my data to a tall array first.

This is the output if my array is not "tall":

>> test = [1 2 3; 2 3 4];
>> pca(test)
ans =
    0.5774
    0.5774
    0.5774

This is the output if my array is "tall":

>> gather(pca(tall(test)))
Evaluating tall expression using the Parallel Pool 'Processes':
- Pass 1 of 1: Completed in 0.32 sec
Evaluation completed in 0.53 sec
ans =
    0.5774         0    0.8165
    0.5774   -0.7071   -0.4082
    0.5774    0.7071   -0.4082

The results, shown by 'ans', are different. What is causing this?


Answer 1

MathWorks Support Team on 17 Mar 2023


For tall arrays, "pca" cannot compute the principal components directly. Instead, it first creates the full covariance matrix, and then uses "pcacov" on the covariance matrix.

For overdetermined systems (i.e. "n x m" matrices where "n > m", representing n observations of m variables), the two methods produce the same result.

For underdetermined systems, on the other hand, where "m > n" (and also in the square case "m = n"), the covariance matrix, and therefore the result of "pcacov", is an "m x m" matrix. However, the result of "pca" for ordinary (i.e. not tall) "n x m" arrays is an "m x (n - 1)" matrix, because centering the data leaves only n - 1 degrees of freedom and "pca" returns only that many components by default.
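
As an illustration of where the n - 1 comes from, the following sketch (base MATLAB only, reusing the "test" matrix from the question) checks the rank of the covariance matrix. After the column means are subtracted, n observations can span at most n - 1 directions, so the "m x m" covariance matrix has at most n - 1 nonzero eigenvalues. The variable names are chosen for illustration only:

```matlab
% Sketch: the covariance matrix of n observations has rank at most n - 1.
test = [1 2 3; 2 3 4];          % n = 2 observations, m = 3 variables
[n, m] = size(test);

C = cov(test);                  % m-by-m sample covariance matrix
r = rank(C);                    % number of directions with nonzero variance

fprintf('cov is %d-by-%d with rank %d (n - 1 = %d)\n', ...
    size(C, 1), size(C, 2), r, n - 1);
```

For the matrix in the question, this is expected to report a 3-by-3 covariance matrix of rank 1, which matches the single column that "pca" returns for the ordinary array.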

To reproduce the behaviour of the tall arrays on an ordinary array, you can use "pcacov(cov(test))" instead of "pca(test)".
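
A minimal sketch of that comparison on the matrix from the question is shown below. It assumes Statistics and Machine Learning Toolbox is installed (for "pca" and "pcacov"), and the variable names are illustrative. Because principal directions are only defined up to sign, the first columns are compared through the absolute value of their inner product rather than element by element:

```matlab
% Sketch: compare in-memory pca with pcacov applied to the covariance matrix.
test = [1 2 3; 2 3 4];              % n = 2 observations, m = 3 variables

coeffPca = pca(test);               % default output: m-by-(n - 1)
coeffCov = pcacov(cov(test));       % full output: m-by-m, like the tall result

disp(size(coeffPca));               % size of the in-memory pca result
disp(size(coeffCov));               % size of the covariance-based result

% Both first columns are unit vectors, so an inner product of +1 or -1
% means they describe the same principal direction.
alignment = abs(coeffPca(:, 1)' * coeffCov(:, 1));
sameDirection = abs(alignment - 1) < 1e-8;
disp(sameDirection);
```

The remaining columns of "coeffCov" correspond to directions with zero variance in this data set, so they only complete the orthonormal basis and carry no information about the data.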

To reproduce the behaviour of the ordinary arrays in the tall arrays, you can use the following code:

>> pcaTall = gather(pca(tall(test)));
>> pcaNotTall = pcaTall(:, 1:size(test, 1) - 1);  % all m rows, first n - 1 columns
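
If the same code has to handle both overdetermined and underdetermined data, the number of columns to keep can be derived from the size of the data instead. The sketch below is one possible way to do this and then check the result against the in-memory "pca". It assumes the default "pca" options (centered data, economy-size output) and Statistics and Machine Learning Toolbox; the variable names are illustrative:

```matlab
% Sketch: trim the tall-array coefficients to the size that in-memory pca
% returns by default, then compare the two results.
test = [1 2 3; 2 3 4];                  % n = 2 observations, m = 3 variables
[n, m] = size(test);

coeffTall = gather(pca(tall(test)));    % m-by-m, computed via the covariance matrix
k = min(m, n - 1);                      % assumed default: centered, economy-size output
coeffTrimmed = coeffTall(:, 1:k);       % keep all m rows and the first k columns

coeffInMemory = pca(test);              % m-by-k result for comparison

% Principal directions are only defined up to sign, so compare each pair of
% unit-length columns through the absolute value of their inner product.
alignment = abs(sum(coeffTrimmed .* coeffInMemory, 1));
allMatch = all(abs(alignment - 1) < 1e-8);
disp(allMatch);
```

Using "min(m, n - 1)" leaves the tall result untouched when n > m, so the trimming step only has an effect in the cases where the two outputs differ in size.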
 

For more information on "pcacov" and "cov", see the following documentation pages:

https://www.mathworks.com/help/releases/R2022b/matlab/ref/cov.html

https://www.mathworks.com/help/releases/R2022b/stats/pcacov.html
