Problem 2043. Six Steps to PCA - Step 1: Centre and Standardize

Introduction

Principal Component Analysis (PCA) is a classic among the many methods of multivariate data analysis. Invented in 1901 by Karl Pearson, it is used today mainly as a tool for exploratory data analysis and dimensionality reduction, and also for building predictive models in machine learning.

Step 1: Centre and Standardize

The first step of many multivariate methods is to remove the influence of location and scale from the variables in the raw data. The result, Z, commonly known as the z-scores of X, is a transformation of X in which each column is centred to have mean 0 and scaled to have standard deviation 1 (unless a column of X is constant, in which case the corresponding column of Z is all zeros). Strictly speaking, z-scores are based on population parameters; the analogous calculation using the sample mean and sample standard deviation gives studentized scores, which are closely related to Student's t-statistic.
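
For an n-by-p matrix X with entries x_ij (rows are observations, columns are variables), the column-wise transformation can be written as below. This assumes the sample standard deviation with divisor n − 1, which is the default normalization of MATLAB's std; a population version would divide by n instead.

$$
\mu_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad
\sigma_j = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} \left(x_{ij} - \mu_j\right)^2}, \qquad
z_{ij} = \begin{cases} \dfrac{x_{ij} - \mu_j}{\sigma_j}, & \sigma_j > 0 \\[4pt] 0, & \sigma_j = 0 \end{cases}
$$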

Task

Write a function to centre and standardize the input matrix X, returning as the output a structure with the following fields:

  • Z: the centred and standardized matrix corresponding to the input X
  • Mu: a vector of the original means of the columns of X
  • Sigma: a vector of the original standard deviations of the columns of X
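
The task can be sketched outside MATLAB as follows. This is a minimal Python illustration, not the reference Cody solution: the function name centre_and_standardize is hypothetical, the MATLAB structure is modelled as a dict with keys "Z", "Mu" and "Sigma", X is a list of rows, and Sigma is assumed to use the sample standard deviation (divisor n − 1). An invariant column is detected exactly (all entries equal) and mapped to zeros, which avoids the division by zero mentioned in the tips.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]


def centre_and_standardize(X: Matrix) -> Dict[str, object]:
    """Column-wise z-scores of X (rows = observations, columns = variables).

    Returns a dict with keys "Z", "Mu", "Sigma", standing in for the
    MATLAB structure the task asks for. Sigma uses the sample standard
    deviation (divisor n - 1); an invariant column gets Sigma = 0.
    """
    n = len(X)
    p = len(X[0]) if n else 0
    mu: List[float] = []
    sigma: List[float] = []
    for j in range(p):
        col = [row[j] for row in X]
        if max(col) == min(col):
            # Invariant column (also covers n == 1): no spread to scale by.
            mu.append(col[0])
            sigma.append(0.0)
            continue
        m = math.fsum(col) / n
        ss = math.fsum((x - m) ** 2 for x in col)
        mu.append(m)
        sigma.append(math.sqrt(ss / (n - 1)))
    # Centre and scale; columns with zero spread become all zeros.
    Z = [
        [(row[j] - mu[j]) / sigma[j] if sigma[j] > 0 else 0.0 for j in range(p)]
        for row in X
    ]
    return {"Z": Z, "Mu": mu, "Sigma": sigma}


result = centre_and_standardize([[1.0, 0.1, 2.0], [2.0, 0.1, 4.0], [3.0, 0.1, 9.0]])
print(result["Mu"])     # [2.0, 0.1, 5.0]
print(result["Sigma"])  # first entry 1.0, second 0.0, third sqrt(13)
```

Checking for equal entries rather than testing whether the computed standard deviation is exactly zero keeps constant columns at 0 even when floating-point rounding would otherwise leave a tiny nonzero spread.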

Tips

  • MATLAB's zscore function is part of the Statistics Toolbox (now the Statistics and Machine Learning Toolbox), which is not available in Cody, so you'll have to write your own.
  • You should take care to avoid division by zero when a column is invariant.
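
One subtlety behind the division-by-zero tip: in floating point, the computed standard deviation of a constant column is not guaranteed to be exactly 0, so a test like sigma == 0 can miss an invariant column and produce meaningless values instead of zeros. The short Python sketch below illustrates this; the helper names naive_sample_std and is_invariant are hypothetical.

```python
import math
from typing import List


def naive_sample_std(col: List[float]) -> float:
    """Sample standard deviation (divisor n - 1) in plain float arithmetic."""
    n = len(col)
    m = sum(col) / n
    return math.sqrt(sum((x - m) ** 2 for x in col) / (n - 1))


def is_invariant(col: List[float]) -> bool:
    """Exact test for a constant column: every entry equals the first."""
    return all(x == col[0] for x in col)


col = [0.1, 0.1, 0.1]
print(naive_sample_std(col) == 0.0)  # False: rounding in the mean leaves a tiny spread
print(is_invariant(col))             # True
```

Here the computed mean is one unit in the last place away from 0.1, so every deviation is nonzero; testing the column values directly avoids depending on that rounding.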

Solution Stats

13.79% Correct | 86.21% Incorrect
Last Solution submitted on Jan 14, 2024
