Answering your queries:
1. Which feature selection method should be used? The best method depends on your data and your goal. For regression problems (i.e., when the target is numeric), consider:
- ReliefF (for regression),
- F-test (filter method),
- LASSO regression (embedded method),
- Sequential Feature Selection (wrapper method).
For classification problems (i.e., when the target is categorical), consider:
- ReliefF (for classification),
- F-test/ANOVA,
- Sequential Feature Selection.
Sequential Feature Selection (using sequentialfs) is a good general-purpose choice: it works for both regression and classification, and it can be paired with any model because you supply the criterion function that fits and evaluates the model.
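For the classification case, a minimal sketch could look like the following. It uses the fisheriris sample data set that ships with the Statistics and Machine Learning Toolbox and a linear discriminant model (fitcdiscr); both are illustrative choices rather than a recommendation, and critFun is just a name picked for this example. The criterion function returns the number of misclassified test observations, which sequentialfs then sums over folds and divides by the total number of test observations.

```matlab
load fisheriris                            % meas: 150x4 predictors, species: 150x1 labels
rng(1);                                    % make the fold assignment reproducible
cvp = cvpartition(species, 'KFold', 5);    % stratified 5-fold partition

% Criterion: count of misclassified test observations for a discriminant model
critFun = @(Xtrain, Ytrain, Xtest, Ytest) ...
    sum(~strcmp(Ytest, predict(fitcdiscr(Xtrain, Ytrain), Xtest)));

[fs, history] = sequentialfs(critFun, meas, species, 'cv', cvp);

disp('Selected feature columns:');
disp(find(fs));
```

Any other classifier (for example fitctree or fitcknn) can be swapped in by changing only the criterion function.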
2. Which function should be used for feature selection?
MATLAB R2021a provides several functions for feature selection. The most general and commonly used are:
- sequentialfs: Sequential feature selection for regression or classification.
- relieff: Ranks features using the ReliefF algorithm.
- lasso: Performs LASSO regression and selects features by shrinking coefficients to zero.
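As a rough illustration of relieff and lasso on a regression problem, here is a short sketch. The data are synthetic (an assumption made only so the example is self-contained), with the target depending on columns 1 and 3; the neighbor count of 10 and the 5-fold cross-validation are likewise arbitrary choices.

```matlab
% Synthetic data (illustrative assumption): only columns 1 and 3 drive y
rng(0);                                   % reproducible random numbers
X = randn(200, 6);
y = 3*X(:,1) - 2*X(:,3) + 0.1*randn(200, 1);

% ReliefF: rank predictors using 10 nearest neighbors
% (a numeric y makes relieff run its regression variant)
[rankedIdx, weights] = relieff(X, y, 10);

% LASSO with 5-fold cross-validation; keep predictors whose coefficient is
% nonzero at the sparsest fit within one standard error of the minimum MSE
[B, FitInfo] = lasso(X, y, 'CV', 5);
lassoSelected = find(B(:, FitInfo.Index1SE) ~= 0)';

disp(rankedIdx(1:2));     % two highest-ranked columns according to ReliefF
disp(lassoSelected);      % columns retained by LASSO
```

Note that relieff only ranks the features, so you still have to choose how many to keep, whereas lasso gives a selected subset directly for a given regularization strength.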
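3. Below is sample code for sequential feature selection on a regression problem in MATLAB R2021a. It assumes the target is in the last column of the file and the predictors are in the remaining columns; adjust the indexing to match your data.
data = readmatrix('yourfile.csv');
X = data(:, 1:end-1);   % predictors (assumed: all columns except the last)
Y = data(:, end);       % target (assumed: last column)
% Return the SUM of squared errors; sequentialfs divides by the number of test observations itself
fun = @(Xtrain, Ytrain, Xtest, Ytest) ...
    sum((Ytest - predict(fitlm(Xtrain, Ytrain), Xtest)).^2);
opts = statset('display','iter');
[fs, history] = sequentialfs(fun, X, Y, 'cv', 5, 'options', opts);
disp('Selected feature columns:');
disp(find(fs));         % fs is a logical mask; find converts it to column indices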
plot(history.Crit, 'o-');
xlabel('Number of features');
ylabel('Cross-validated MSE');
title('Feature selection history');
Hope this helps!