Exported Regression Model is outputting NaN's when given new data.

I built a model using the Regression Learner app, exported the model, and called it with new data, but the model outputs roughly 50% NaNs when given new data.
I'm using the carbig.mat sample data that ships with the Statistics and Machine Learning Toolbox, and trying to build a regression model to predict car Acceleration.
load carbig.mat;
featurenames = ["Acceleration","Cylinders","Displacement","Horsepower","MPG","Mfg","Model","Model_Year","Origin","Weight","cyl4","org","when"];
data = table(Acceleration,Cylinders,Displacement,Horsepower,MPG,Mfg,Model,Model_Year,Origin,Weight,cyl4,org,when,'VariableNames',featurenames);
data = rmmissing(data); %Remove any NaNs
% Convert text to categorical
data.Mfg = categorical(cellstr(data.Mfg));
data.Model = categorical(cellstr(data.Model));
data.Origin = categorical(cellstr(data.Origin));
data.cyl4 = categorical(cellstr(data.cyl4));
data.org = categorical(cellstr(data.org));
data.when = categorical(cellstr(data.when));
% Split data into training and testing datasets
PercentTrain = 90;
TestLogic = rand(height(data),1)>(PercentTrain/100);
dataTrain = data(~TestLogic,:);
dataTest = data(TestLogic,:);
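The split above draws a fresh random mask on every run, so the train/test sizes and the NaN count change between runs. A minimal sketch of a repeatable split, assuming a fixed seed is acceptable. The synthetic table is only a stand-in for the carbig table, and `cvp` is an illustrative variable name:

```matlab
% Synthetic stand-in for the carbig table (assumption: any table splits the same way).
rng(0);                                   % fix the seed so the split is repeatable
data = table((1:100)', rand(100,1), 'VariableNames', ["Weight","Acceleration"]);

% Hold out roughly 10% of the rows for testing.
cvp = cvpartition(height(data), 'HoldOut', 0.10);
dataTrain = data(training(cvp), :);
dataTest  = data(test(cvp), :);

fprintf('%d training rows, %d test rows\n', height(dataTrain), height(dataTest));
```

With the seed fixed, the same rows land in dataTest each time, which makes a NaN count easier to compare across experiments.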
I then:
1) Load the Regression Learner app > New Session > Data Set Variable is dataTrain > Response is Acceleration > Start
2) Select the Medium Gaussian SVM model > Leave Everything Default > Run. I get an RMSE of about 1.5.
3) Generate Function to export the creation of this model as a function, save the function as a .m (I've attached this function below)
Then I can run the following to train the model on my training data (90% of the original set) and test it on my test data (10% of the original set), to get Pred, which is the model's predicted Acceleration.
[trainedModel, validationRMSE] = trainRegressionModel(dataTrain);
Pred = trainedModel.predictFcn(dataTest);
The problem is that Pred contains about 56% NaNs and I don't know why. Am I undersampling the data or something?
I also tried using Quick Train and Train All to train lots of different models. All seem to have the same issue of producing lots of NaNs when given new data.
Thank you for your help.
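One way to narrow down where the NaNs come from is to check whether the test rows contain categorical values (for example a Model name) that never occur in the training rows. This is only a plausible hypothesis, not something the thread confirms. The sketch below uses toy stand-ins for dataTrain and dataTest, and `unseenRow` is an illustrative name:

```matlab
% Toy stand-ins for dataTrain/dataTest (assumption: the real tables are checked the same way).
dataTrain = table(categorical(["ford";"ford";"audi"]), [10;12;14], ...
    'VariableNames', ["Mfg","Acceleration"]);
dataTest = table(categorical(["ford";"saab";"fiat"]), [11;15;16], ...
    'VariableNames', ["Mfg","Acceleration"]);

% Find the categorical variables, then flag test rows whose value is absent from training.
isCat = varfun(@iscategorical, dataTest, 'OutputFormat', 'uniform');
catNames = dataTest.Properties.VariableNames(isCat);
unseenRow = false(height(dataTest), 1);
for k = 1:numel(catNames)
    name = catNames{k};
    seen = string(unique(dataTrain.(name)));      % values that actually occur in training
    unseenRow = unseenRow | ~ismember(string(dataTest.(name)), seen);
end
fprintf('%.0f%% of test rows contain an unseen category\n', 100*mean(unseenRow));
```

On the real data, comparing `unseenRow` with `isnan(Pred)` would show whether the two line up.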
function [trainedModel, validationRMSE] = trainRegressionModel(trainingData)
% [trainedModel, validationRMSE] = trainRegressionModel(trainingData)
% Returns a trained regression model and its RMSE. This code recreates the
% model trained in Regression Learner app. Use the generated code to
% automate training the same model with new data, or to learn how to
% programmatically train models.
%
% Input:
% trainingData: A table containing the same predictor and response
% columns as those imported into the app.
%
% Output:
% trainedModel: A struct containing the trained regression model. The
% struct contains various fields with information about the trained
% model.
%
% trainedModel.predictFcn: A function to make predictions on new data.
%
% validationRMSE: A double containing the RMSE. In the app, the Models
% pane displays the RMSE for each model.
%
% Use the code to train the model with new data. To retrain your model,
% call the function from the command line with your original data or new
% data as the input argument trainingData.
%
% For example, to retrain a regression model trained with the original data
% set T, enter:
% [trainedModel, validationRMSE] = trainRegressionModel(T)
%
% To make predictions with the returned 'trainedModel' on new data T2, use
% yfit = trainedModel.predictFcn(T2)
%
% T2 must be a table containing at least the same predictor columns as used
% during training. For details, enter:
% trainedModel.HowToPredict
% Auto-generated by MATLAB on 04-Mar-2023 11:24:28
% Extract predictors and response
% This code processes the data into the right shape for training the
% model.
inputTable = trainingData;
predictorNames = {'Cylinders', 'Displacement', 'Horsepower', 'MPG', 'Mfg', 'Model', 'Model_Year', 'Origin', 'Weight', 'cyl4', 'org', 'when'};
predictors = inputTable(:, predictorNames);
response = inputTable.Acceleration;
isCategoricalPredictor = [false, false, false, false, true, true, false, true, false, true, true, true];
% Train a regression model
% This code specifies all the model options and trains the model.
responseScale = iqr(response);
if ~isfinite(responseScale) || responseScale == 0.0
responseScale = 1.0;
end
boxConstraint = responseScale/1.349;
epsilon = responseScale/13.49;
regressionSVM = fitrsvm(...
predictors, ...
response, ...
'KernelFunction', 'gaussian', ...
'PolynomialOrder', [], ...
'KernelScale', 3.5, ...
'BoxConstraint', boxConstraint, ...
'Epsilon', epsilon, ...
'Standardize', true);
% Create the result struct with predict function
predictorExtractionFcn = @(t) t(:, predictorNames);
svmPredictFcn = @(x) predict(regressionSVM, x);
trainedModel.predictFcn = @(x) svmPredictFcn(predictorExtractionFcn(x));
% Add additional fields to the result struct
trainedModel.RequiredVariables = {'Cylinders', 'Displacement', 'Horsepower', 'MPG', 'Mfg', 'Model', 'Model_Year', 'Origin', 'Weight', 'cyl4', 'org', 'when'};
trainedModel.RegressionSVM = regressionSVM;
trainedModel.About = 'This struct is a trained model exported from Regression Learner R2022b.';
trainedModel.HowToPredict = sprintf('To make predictions on a new table, T, use: \n yfit = c.predictFcn(T) \nreplacing ''c'' with the name of the variable that is this struct, e.g. ''trainedModel''. \n \nThe table, T, must contain the variables returned by: \n c.RequiredVariables \nVariable formats (e.g. matrix/vector, datatype) must match the original training data. \nAdditional variables are ignored. \n \nFor more information, see <a href="matlab:helpview(fullfile(docroot, ''stats'', ''stats.map''), ''appregression_exportmodeltoworkspace'')">How to predict using an exported model</a>.');
% Extract predictors and response
% This code processes the data into the right shape for training the
% model.
inputTable = trainingData;
predictorNames = {'Cylinders', 'Displacement', 'Horsepower', 'MPG', 'Mfg', 'Model', 'Model_Year', 'Origin', 'Weight', 'cyl4', 'org', 'when'};
predictors = inputTable(:, predictorNames);
response = inputTable.Acceleration;
isCategoricalPredictor = [false, false, false, false, true, true, false, true, false, true, true, true];
% Perform cross-validation
KFolds = 5;
cvp = cvpartition(size(response, 1), 'KFold', KFolds);
% Initialize the predictions to the proper sizes
validationPredictions = response;
for fold = 1:KFolds
trainingPredictors = predictors(cvp.training(fold), :);
trainingResponse = response(cvp.training(fold), :);
foldIsCategoricalPredictor = isCategoricalPredictor;
% Train a regression model
% This code specifies all the model options and trains the model.
responseScale = iqr(trainingResponse);
if ~isfinite(responseScale) || responseScale == 0.0
responseScale = 1.0;
end
boxConstraint = responseScale/1.349;
epsilon = responseScale/13.49;
regressionSVM = fitrsvm(...
trainingPredictors, ...
trainingResponse, ...
'KernelFunction', 'gaussian', ...
'PolynomialOrder', [], ...
'KernelScale', 3.5, ...
'BoxConstraint', boxConstraint, ...
'Epsilon', epsilon, ...
'Standardize', true);
% Create the result struct with predict function
svmPredictFcn = @(x) predict(regressionSVM, x);
validationPredictFcn = @(x) svmPredictFcn(x);
% Add additional fields to the result struct
% Compute validation predictions
validationPredictors = predictors(cvp.test(fold), :);
foldPredictions = validationPredictFcn(validationPredictors);
% Store predictions in the original order
validationPredictions(cvp.test(fold), :) = foldPredictions;
end
% Compute validation RMSE
isNotMissing = ~isnan(validationPredictions) & ~isnan(response);
validationRMSE = sqrt(nansum(( validationPredictions - response ).^2) / numel(response(isNotMissing) ));

Accepted Answer

Sulaymon Eshkabilov on 4 Mar 2023
Edited: Sulaymon Eshkabilov on 4 Mar 2023
There is one critical point: the input data for the support vector machine regression model needs to be numerical values, not categorical. Here is the corrected code:
load carbig.mat;
featurenames = ["Acceleration","Cylinders","Displacement","Horsepower","MPG","Model_Year","Weight"];
data = table(Acceleration,Cylinders,Displacement,Horsepower,MPG,Model_Year,Weight,'VariableNames',featurenames);
data = rmmissing(data); % Remove any NaNs
PercentTrain = 90; % Use 90% of the data for model training
TestLogic = rand(height(data),1)>(PercentTrain/100);
dataTrain = data(~TestLogic,:);
dataTest = data(TestLogic,:);
%% Run
[trainedModel, validationRMSE] = trainRegressionModel(dataTrain)
trainedModel = struct with fields:
           predictFcn: @(x)svmPredictFcn(predictorExtractionFcn(x))
    RequiredVariables: ["Cylinders" "Displacement" "Horsepower" "MPG" "Model_Year" "Weight"]
        RegressionSVM: [1×1 RegressionSVM]
                About: 'This struct is a trained model exported from Regression Learner R2022b.'
         HowToPredict: 'To make predictions on a new table, T, use: ↵ yfit = c.predictFcn(T) ↵replacing 'c' with the name of the variable that is this struct, e.g. 'trainedModel'. ↵ ↵The table, T, must contain the variables returned by: ↵ c.RequiredVariables ↵Variable formats (e.g. matrix/vector, datatype) must match the original training data. ↵Additional variables are ignored. ↵ ↵For more information, see How to predict using an exported model.'
validationRMSE = 1.5789
Pred = trainedModel.predictFcn(dataTest);
function [trainedModel, validationRMSE] = trainRegressionModel(trainingData)
inputTable = trainingData;
predictorNames = ["Cylinders","Displacement","Horsepower","MPG","Model_Year","Weight"];
predictors = inputTable(:, predictorNames);
response = inputTable.Acceleration;
isCategoricalPredictor = [false, false, false, false, false, false]; % one entry per predictor; all six are numeric
% Train a regression model
% This code specifies all the model options and trains the model.
responseScale = iqr(response);
if ~isfinite(responseScale) || responseScale == 0.0
responseScale = 1.0;
end
boxConstraint = responseScale/1.349;
epsilon = responseScale/13.49;
regressionSVM = fitrsvm(...
predictors, ...
response, ...
'KernelFunction', 'gaussian', ...
'PolynomialOrder', [], ...
'KernelScale', 3.5, ...
'BoxConstraint', boxConstraint, ...
'Epsilon', epsilon, ...
'Standardize', true);
% Create the result struct with predict function
predictorExtractionFcn = @(t) t(:, predictorNames);
svmPredictFcn = @(x) predict(regressionSVM, x);
trainedModel.predictFcn = @(x) svmPredictFcn(predictorExtractionFcn(x));
% Add additional fields to the result struct
trainedModel.RequiredVariables = ["Cylinders","Displacement","Horsepower","MPG","Model_Year","Weight"];
trainedModel.RegressionSVM = regressionSVM;
trainedModel.About = 'This struct is a trained model exported from Regression Learner R2022b.';
trainedModel.HowToPredict = sprintf('To make predictions on a new table, T, use: \n yfit = c.predictFcn(T) \nreplacing ''c'' with the name of the variable that is this struct, e.g. ''trainedModel''. \n \nThe table, T, must contain the variables returned by: \n c.RequiredVariables \nVariable formats (e.g. matrix/vector, datatype) must match the original training data. \nAdditional variables are ignored. \n \nFor more information, see <a href="matlab:helpview(fullfile(docroot, ''stats'', ''stats.map''), ''appregression_exportmodeltoworkspace'')">How to predict using an exported model</a>.');
% Extract predictors and response
% This code processes the data into the right shape for training the model.
inputTable = trainingData;
predictorNames = ["Cylinders","Displacement","Horsepower","MPG","Model_Year","Weight"];
predictors = inputTable(:, predictorNames);
response = inputTable.Acceleration;
% Perform cross-validation
KFolds = 5;
cvp = cvpartition(size(response, 1), 'KFold', KFolds);
% Initialize the predictions to the proper sizes
validationPredictions = response;
for fold = 1:KFolds
trainingPredictors = predictors(cvp.training(fold), :);
trainingResponse = response(cvp.training(fold), :);
foldIsCategoricalPredictor = isCategoricalPredictor;
% Train a regression model
% This code specifies all the model options and trains the model.
responseScale = iqr(trainingResponse);
if ~isfinite(responseScale) || responseScale == 0.0
responseScale = 1.0;
end
boxConstraint = responseScale/1.349;
epsilon = responseScale/13.49;
regressionSVM = fitrsvm(...
trainingPredictors, ...
trainingResponse, ...
'KernelFunction', 'gaussian', ...
'PolynomialOrder', [], ...
'KernelScale', 3.5, ...
'BoxConstraint', boxConstraint, ...
'Epsilon', epsilon, ...
'Standardize', true);
% Create the result struct with predict function
svmPredictFcn = @(x) predict(regressionSVM, x);
validationPredictFcn = @(x) svmPredictFcn(x);
% Add additional fields to the result struct
% Compute validation predictions
validationPredictors = predictors(cvp.test(fold), :);
foldPredictions = validationPredictFcn(validationPredictors);
% Store predictions in the original order
validationPredictions(cvp.test(fold), :) = foldPredictions;
end
% Compute validation RMSE
isNotMissing = ~isnan(validationPredictions) & ~isnan(response);
validationRMSE = sqrt(nansum(( validationPredictions - response ).^2) / numel(response(isNotMissing) ));
end
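After rerunning with the numeric-only predictors, a quick check is to confirm that Pred contains no NaNs and to compute the RMSE on the held-out rows. A minimal sketch with toy numbers; in the thread, `Pred` and `actual` would be trainedModel.predictFcn(dataTest) and dataTest.Acceleration, and `nanFraction`, `actual`, and `testRMSE` are illustrative names:

```matlab
% Toy stand-ins (assumption): replace with Pred and dataTest.Acceleration from the code above.
Pred   = [15.2; 14.1; 16.8; 13.5];
actual = [15.0; 14.5; 16.0; 13.0];

nanFraction = mean(isnan(Pred));                        % fraction of missing predictions
testRMSE = sqrt(mean((Pred - actual).^2, 'omitnan'));   % RMSE over the non-NaN rows
fprintf('NaN fraction: %.2f, test RMSE: %.4f\n', nanFraction, testRMSE);
```

A test RMSE close to the cross-validated validationRMSE suggests the exported model generalizes about as well as the app reported.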
2 Comments
Gregory Smith on 6 Mar 2023
Perfect, that was the issue. Thank you.
I'm surprised MATLAB doesn't give a warning for that.
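Since no warning is raised automatically, a small check after each prediction call can surface the problem early. A minimal sketch, assuming a hand-rolled check is acceptable. The stub `predictFcn` below is a hypothetical stand-in for the exported model, and the warning identifier `predictCheck:nanPredictions` is made up for illustration:

```matlab
% Hypothetical stand-in for an exported model whose predictFcn returns one NaN.
trainedModel.predictFcn = @(T) [15.2; NaN; 13.9];
T = table([3000; 2500; 2800], 'VariableNames', "Weight");

% Predict, then warn if any prediction is missing.
yfit = trainedModel.predictFcn(T);
nBad = sum(isnan(yfit));
if nBad > 0
    warning('predictCheck:nanPredictions', ...
        '%d of %d predictions are NaN.', nBad, numel(yfit));
end
```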

Release

R2022b
