Fast Principal Component Analysis for high dimensional data
Version 2.2 (2.39 KB) by dpblum
An implementation of PCA that is much faster for high-dimensional data than MATLAB's or Python's built-in functions.
[COEFF,SCORE,LATENT,EXPLAINED] = fastpca(data)
Fast Principal Component Analysis for very high-dimensional data (e.g. voxel-level analysis of neuroimaging data), implemented according to C. Bishop's book "Pattern Recognition and Machine Learning", p. 570. For high-dimensional data, fastpca.m is substantially faster than MATLAB's built-in function pca.m.
Following MATLAB's pca terminology, fastpca.m expects an input matrix in which each of the N rows represents an observation (e.g. a subject) and each of the p columns a dimension (e.g. a voxel). fastpca.m returns the principal component (PC) loadings (COEFF), the PC scores (SCORE), and the variances explained by the PCs, cumulatively, in absolute values (LATENT) and in percent (EXPLAINED). Additionally, fastpca returns the PC loadings of the small covariance matrix (COEFFs).
The decrease in computation time results from first calculating the PCs of the (usually much smaller) NxN covariance matrix of the transposed input matrix "data" and then projecting them onto the observations, in order to obtain the PCs of the large pxp covariance matrix, where p is the number of columns (dimensions).
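The reasoning behind this step can be written out as a short derivation, following the passage in Bishop's book cited above. The symbols X (the centered N-by-p data matrix), S, u_i, v_i and lambda_i are notation introduced here for illustration and are not names used in fastpca.m. Bishop normalizes by 1/N; MATLAB's pca uses 1/(N-1), which rescales the eigenvalues but leaves the eigenvectors unchanged.

```latex
\begin{align*}
% Small N x N eigenproblem (cheap when N << p), with unit-length v_i:
\frac{1}{N}\, X X^{\mathsf T} v_i &= \lambda_i\, v_i,
  \qquad \lVert v_i \rVert = 1 \\
% Left-multiply both sides by X^T:
\frac{1}{N}\, X^{\mathsf T} X \,\bigl(X^{\mathsf T} v_i\bigr)
  &= \lambda_i \,\bigl(X^{\mathsf T} v_i\bigr) \\
% So X^T v_i is an eigenvector of the large p x p covariance matrix
% S = (1/N) X^T X with the same eigenvalue. Its squared length is
\lVert X^{\mathsf T} v_i \rVert^2
  &= v_i^{\mathsf T} X X^{\mathsf T} v_i = N \lambda_i \\
% which gives the unit-length principal directions:
u_i &= \frac{1}{\sqrt{N \lambda_i}}\, X^{\mathsf T} v_i
\end{align*}
```

Only eigenvectors with lambda_i > 0 can be recovered this way; after centering there are at most N-1 of them.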
By default, fastpca removes the mean of each observation. In this implementation of fastpca, I skipped calculation of Hotelling’s T-Squared Statistic.
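For readers who want to see the idea in code, the following is a minimal sketch of the small-covariance-matrix approach described above. It is not the author's fastpca.m: the function name fastpca_sketch and all variable names are hypothetical, centering is assumed to mean subtracting each column's mean across observations (as MATLAB's pca does), and EXPLAINED is returned per component rather than cumulatively.

```matlab
% Hypothetical sketch, NOT the author's fastpca.m. Save as fastpca_sketch.m.
function [coeff, score, latent, explained] = fastpca_sketch(data)
    N = size(data, 1);                          % number of observations (rows)

    % Assumption: center each column (dimension), as MATLAB's pca does.
    X = bsxfun(@minus, data, mean(data, 1));

    % Small N-by-N matrix; the 1/(N-1) factor matches the scaling of pca's LATENT.
    S = (X * X') / (N - 1);
    S = (S + S') / 2;                           % enforce exact symmetry

    % Eigendecomposition, sorted by decreasing eigenvalue.
    [V, D] = eig(S);
    [latent, order] = sort(diag(D), 'descend');
    V = V(:, order);

    % Discard numerically zero eigenvalues (centering costs one rank).
    keep = latent > max(latent) * N * eps;
    latent = latent(keep);
    V = V(:, keep);

    % Project back to p dimensions and rescale each column to unit length.
    coeff = X' * V;
    coeff = bsxfun(@rdivide, coeff, sqrt(sum(coeff .^ 2, 1)));

    score = X * coeff;                          % PC scores, one row per observation
    explained = 100 * latent / sum(latent);     % percent of variance per PC
end
```

A call such as [coeff, score, latent, explained] = fastpca_sketch(rand(30, 2000)) returns at most N-1 components. The sign of each column of coeff is arbitrary, so it may differ from the sign returned by pca.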
Example:
In medical image analysis, datasets often have a few to several hundred observations (subjects) and hundreds of thousands of dimensions (voxels). As an example, I compare MATLAB's pca and fastpca.m using a random matrix with 300 rows and 500000 columns:
data = rand(300,500000);
tic; [COEFF,SCORE,LATENT,~,EXPLAINED] = pca(data); toc
>> Elapsed time is 37.295108 seconds.
tic; [COEFF,SCORE,LATENT,EXPLAINED] = fastpca(data); toc
>> Elapsed time is 4.853614 seconds.
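Beyond timing, it can be useful to check that both functions return the same loadings. The snippet below is a sketch of such a check. It assumes that fastpca.m is on the MATLAB path, that the Statistics and Machine Learning Toolbox (pca) is installed, and that fastpca returns unit-length loadings ordered by decreasing variance, as pca does; none of this is confirmed by the description above. The variable names are illustrative.

```matlab
% Smaller matrix than in the timing example, to keep the check quick.
data = rand(50, 2000);

coeffRef  = pca(data);        % reference loadings from MATLAB's pca
coeffFast = fastpca(data);    % loadings from fastpca (first output, COEFF)

% Compare absolute values, since each PC is only defined up to its sign.
k = min(size(coeffRef, 2), size(coeffFast, 2));
diffLoadings = abs(abs(coeffRef(:, 1:k)) - abs(coeffFast(:, 1:k)));
fprintf('Largest loading difference: %g\n', max(diffLoadings(:)));
```

If the assumptions hold, the printed difference should be close to floating-point precision; a large value would point to a difference in ordering, scaling, or centering.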
Version 2.2 from 02/08/2021: fastpca is now implemented in Python and available on GitHub: https://github.com/dpblum/fastpca.git
Version 1.21 from 12/07/2021.
Version 1.0 from 08/08/2019.
Implemented by Dominik Blum.
E-Mail: dominikblum1987@gmail.com
Cite as
dpblum (2024). Fast Principal Component Analysis for high dimensional data (https://www.mathworks.com/matlabcentral/fileexchange/72396-fast-principal-component-analysis-for-high-dimensional-data), MATLAB Central File Exchange. Retrieved .
MATLAB Release Compatibility
Created with
R2017a
Compatible with any release
Platform Compatibility
Windows macOS Linux
Categories
- AI, Data Science, and Statistics > Statistics and Machine Learning Toolbox > Dimensionality Reduction and Feature Extraction
Version | Published | Release Notes |
---|---|---|
2.2 | | fastpca is now implemented in Python and available on GitHub: https://github.com/dpblum/fastpca.git |
2.1 | | fastpca is now implemented in Python and available on GitHub: https://github.com/dpblum/fastpca.git |
1.211 | | . |
1.2 | | . |
1.1 | | . |
1.0 | | |