I am trying to use the Pearson correlation coefficient for feature selection. My data is a 21392x1974 table, with the 1974 columns as variables/features and the 21392 rows as observations. I have looked at the MathWorks documentation for corrcoef(), but most of the examples use small data sets, and I am confused about how to apply it to a dataset this large. I am also not sure whether the Pearson correlation coefficient can be applied to the 1974th column of my data, which holds string labels (Apple, Ball, Cat, etc.; 14 different classes in total). My aim is to:
1. Calculate the Pearson correlation coefficient between the 7th column and every column of my data. The 7th column will show a perfect correlation (1) with itself; what I want to find is how strongly every other feature correlates with it. I would also like to display the column indices in the original data for which the Pearson correlation coefficient is >= 0.70.
2. As a second scenario, find out whether it is possible to compute the Pearson correlation coefficient between the 1974th column (labels/classes) and every other column of my data.
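For the first scenario, this is roughly what I have in mind. It is only a sketch on a small made-up table: the names T, Xnum, r and idx are my own placeholders, and I am assuming that every column except the last one is numeric and that the release supports implicit expansion (R2016b or later). It computes only the correlations against column 7 rather than the full 1974x1974 matrix.

```matlab
% Hypothetical stand-in for the real 21392x1974 table (last column = labels).
rng(0);
X = randn(200, 9);
X(:, 3) = X(:, 7) + 0.1 * randn(200, 1);   % make column 3 track column 7
labels = categorical(randi(3, 200, 1));    % stand-in for the string labels
T = [array2table(X), table(labels)];

% Assumption: every column except the last (label) column is numeric.
Xnum = T{:, 1:end-1};

% Pearson r of column 7 against every numeric column.
Xc = Xnum - mean(Xnum, 1);                 % centre each column
colNorms = sqrt(sum(Xc.^2, 1));            % 1-by-p vector of column norms
r = (Xc(:, 7)' * Xc) ./ (colNorms(7) * colNorms);   % 1-by-p, r(7) is 1

% Column indices in the original table with r >= 0.70.
% Note: a constant column has zero norm and would give NaN here.
idx = find(r >= 0.70);
disp(idx);
```

If the Statistics and Machine Learning Toolbox is available, I believe corr(Xnum, Xnum(:, 7)) should give the same values as a column vector, but I have not checked that on my data. Whether thresholding should use r or abs(r) (to also catch strong negative correlations) is something I am unsure about as well.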
I have looked at resources such as http://matlab.izmiran.ru/help/techdoc/ref/corrcoef.html and https://uk.mathworks.com/help/matlab/ref/corrcoef.html , but I am still confused about how to do this for my data. Any help would be really appreciated. Cheers and thanks!