fitrsvm fails if epsilon is generated using a for loop

I wanted to run a grid search to find suitable parameters for my SVM model, but I have discovered that fitrsvm gives inconsistent error values if the epsilon parameter is generated using a 'for' loop. For example, the RMSE for my model with epsilon = 0.8 is different if I use the loop
for epsilon = 0.8:.1:1.2
compared with the loop
for epsilon = 0.1:.1:1.2
The RMSEs are 2.6868 and 2.7020 respectively.
I thought this might be some floating-point error, so I tried to ensure that the epsilon value passed to fitrsvm was exactly 0.8. I did this by creating the variable d_epsilon (line 17) and passing its value to fitrsvm (i.e. by changing line 26 to 'Epsilon', d_epsilon), but this did not work. By contrast, using c_epsilon, which is completely independent of the for loop (line 16), does work.
In my real project I use nested loops to search for values of Epsilon, BoxConstraint, and KernelScale. The inconsistencies in my results are about 10%. (I am using a grid search because the parameters returned by OptimizeHyperparameters perform worse than some of the parameters cited in journal articles for my dataset, UCI's auto-mpg.)
clear all
%%read in auto-mpg.csv. This is a cleaned version of UCI dataset auto-mpg
data = readtable('auto-mpg.csv','ReadVariableNames',false);
VarNames = {'mpg','cylinders' 'displacement' 'horsepower' 'weight' 'acceleration' ...
'modelYear' 'origin' 'carName'};
data.Properties.VariableNames = VarNames;
data = [data(:,2:9) data(:,1)];
data.carName=[];
%%carry out 10 fold cross-validation with different epsilon values
testResults_SVM=[];
testActual_SVM=[];
rng('default')
c = cvpartition(data.mpg,'KFold',10);
for epsilon = 0.1:0.1:1.2
%c_epsilon= 0.80000;
%d_epsilon = str2double(string(round(epsilon,2)))
for fold = 1:10
cv_trainingData = data(c.training(fold), :);
cv_testData = data(c.test(fold), :);
AutoSVM = fitrsvm(cv_trainingData,'mpg',...
'KernelFunction', 'gaussian', ...
'PolynomialOrder', [], ...
'KernelScale', 5.5, ...
'BoxConstraint', 100, ...
'Epsilon', epsilon, ...
'Standardize', true);
convergenceChk(fold)=AutoSVM.ConvergenceInfo.Converged;
testResults_SVM=[testResults_SVM;predict(AutoSVM,cv_testData)];
testActual_SVM=[testActual_SVM;cv_testData.mpg];
end
%%generate summary statistics and plots
residual_SVM = testResults_SVM-testActual_SVM;
AutoMSE_SVM=((sum((residual_SVM).^2))/size(testResults_SVM,1));
AutoRMSE_SVM = sqrt(AutoMSE_SVM);
if round(epsilon,4) == 0.8
AutoRMSE_SVM
end
end
A copy of my dataset and code is attached or can be accessed via: https://drive.google.com/open?id=1ph1KwdGgFbmNVSwI63LREcXEDN3hkP_Q
Does anyone know a workaround for this? I am using MATLAB R2017b.

Accepted Answer

Don Mathis on 28 Mar 2018
Can we simplify things a bit? Here's a version of your code that uses built-in validation instead of explicit loops.
The first loop below uses the range .1:.1:1.2. The second uses .8:.1:1.2, and the third uses the values .1,.2,...,1.2 individually.
In all three cases the cross-validation loss of the SVM is exactly the same. Notice that this is true even though there is round-off error in the epsilons calculated in the first loop compared with the individual values in the last loop. So the SVM fitting is robust to tiny differences in epsilon (on the order of 1e-15 here).
So it doesn't look like fitrsvm has a problem with epsilons generated in a loop. (What does differ in the original code: testResults_SVM and testActual_SVM are initialized before the epsilon loop and never reset inside it, so the RMSE reported at epsilon = 0.8 pools the predictions from every earlier epsilon value. That is why starting the loop at 0.1 rather than 0.8 changes the result.)
clear all
%%read in auto-mpg.csv. This is a cleaned version of UCI dataset auto-mpg
data = readtable('auto-mpg.csv','ReadVariableNames',false);
VarNames = {'mpg','cylinders' 'displacement' 'horsepower' 'weight' 'acceleration' ...
'modelYear' 'origin' 'carName'};
data.Properties.VariableNames = VarNames;
data = [data(:,2:9) data(:,1)];
data.carName=[];
rng('default')
c = cvpartition(data.mpg,'KFold',10);
LossesLoop1 = [];
LossesLoop2 = zeros(1,7);  % pad with 7 zeros so epsilon = 0.8 lands at index 8, aligned with the other loops
LossesIndividual = [];
for epsilon = 0.1:0.1:1.2
AutoSVM = fitrsvm(data,'mpg',...
'CVPartition', c,...
'KernelFunction', 'gaussian', ...
'PolynomialOrder', [], ...
'KernelScale', 5.5, ...
'BoxConstraint', 100, ...
'Epsilon', epsilon, ...
'Standardize', true);
LossesLoop1(end+1) = kfoldLoss(AutoSVM);
end
for epsilon = 0.8:0.1:1.2
AutoSVM = fitrsvm(data,'mpg',...
'CVPartition', c,...
'KernelFunction', 'gaussian', ...
'PolynomialOrder', [], ...
'KernelScale', 5.5, ...
'BoxConstraint', 100, ...
'Epsilon', epsilon, ...
'Standardize', true);
LossesLoop2(end+1) = kfoldLoss(AutoSVM);
end
for epsilon = [.1 .2 .3 .4 .5 .6 .7 .8 .9 1 1.1 1.2]
AutoSVM = fitrsvm(data,'mpg',...
'CVPartition', c,...
'KernelFunction', 'gaussian', ...
'PolynomialOrder', [], ...
'KernelScale', 5.5, ...
'BoxConstraint', 100, ...
'Epsilon', epsilon, ...
'Standardize', true);
LossesIndividual(end+1) = kfoldLoss(AutoSVM);
end
LossesLoop1
LossesLoop2
LossesIndividual
isequal(LossesLoop1, LossesIndividual)
isequal(LossesLoop2(8:end), LossesIndividual(8:end))
isequal(.1:.1:1.2, [.1 .2 .3 .4 .5 .6 .7 .8 .9 1 1.1 1.2])
[.1:.1:1.2] - [.1 .2 .3 .4 .5 .6 .7 .8 .9 1 1.1 1.2]
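The same kfoldLoss pattern extends naturally to the nested grid search described in the question. The following is a minimal sketch reusing the data table and cvpartition c from above; the candidate grids for Epsilon, BoxConstraint, and KernelScale are illustrative placeholders, not tuned values:

```matlab
% Sketch: nested grid search over three fitrsvm hyperparameters.
% Candidate grids below are placeholders -- substitute your own ranges.
epsilons = 0.1:0.1:1.2;
boxes    = [1 10 100];
scales   = [1 5.5 10];
bestLoss = inf;
for ep = epsilons
    for box = boxes
        for ks = scales
            mdl = fitrsvm(data,'mpg', ...
                'CVPartition', c, ...            % reuse the same partition for every fit
                'KernelFunction', 'gaussian', ...
                'KernelScale', ks, ...
                'BoxConstraint', box, ...
                'Epsilon', ep, ...
                'Standardize', true);
            L = kfoldLoss(mdl);                  % cross-validated MSE (the default loss)
            if L < bestLoss
                bestLoss = L;
                best = [ep box ks];
            end
        end
    end
end
fprintf('Best RMSE %.4f at Epsilon=%.2f, BoxConstraint=%g, KernelScale=%g\n', ...
    sqrt(bestLoss), best(1), best(2), best(3))
```

Reusing the same CVPartition for every combination keeps the comparison between parameter settings fair, since each model is evaluated on identical folds.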
  1 Comment
Adam White on 28 Mar 2018
Great! Thanks for your answer. The problem disappears when I do the cross-validation within fitrsvm. I still do not see why there is a problem with my original code or with using d_epsilon, but I now have a way of doing my grid searches, so I am happy.


More Answers (1)

Walter Roberson on 24 Mar 2018
Observe:
>> V1 = 0:.1:1
V1 =
Columns 1 through 5
0 0.1 0.2 0.3 0.4
Columns 6 through 10
0.5 0.6 0.7 0.8 0.9
Column 11
1
>> V2 = (0:10)/10
V2 =
Columns 1 through 5
0 0.1 0.2 0.3 0.4
Columns 6 through 10
0.5 0.6 0.7 0.8 0.9
Column 11
1
>> V1-V2
ans =
Columns 1 through 5
0 0 0 5.55111512312578e-17 0
Columns 6 through 10
0 0 0 0 0
Column 11
0
The colon operator works by starting at the lowest value and repeatedly adding the closest floating-point representation of the increment. If the increment is not exactly representable in floating point, or the range spans powers of 2, you will get cumulative floating-point round-off problems.
Except... if you check carefully, the bit pattern does not exactly match this description, and I am not sure precisely how colon is currently implemented.
The take-away lesson here is that numbers generated by the colon operator are subject to floating-point round-off and should not be compared for equality.
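If you do need to pick out a particular epsilon inside a loop, a tolerance-based comparison sidesteps the round-off problem. A minimal sketch (the tolerance 1e-10 here is an arbitrary choice, not a recommended constant):

```matlab
target = 0.8;
for epsilon = 0.1:0.1:1.2
    % Robust: compare against a tolerance, never with ==
    if abs(epsilon - target) < 1e-10
        fprintf('epsilon = %.17g treated as %.1f\n', epsilon, target)
    end
end
```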
  1 Comment
Adam White on 24 Mar 2018 (edited 24 Mar 2018)
Unfortunately the inconsistent results are not caused by floating-point round-off. This can be illustrated by removing the % from line 17 and changing line 26 so that:
d_epsilon = str2double(string(round(epsilon,2)))
for fold = 1:10
cv_trainingData = data(c.training(fold), :);
cv_testData = data(c.test(fold), :);
AutoSVM = fitrsvm(cv_trainingData,'mpg',...
'KernelFunction', 'gaussian', ...
'PolynomialOrder', [], ...
'KernelScale', 5.5, ...
'BoxConstraint', 100, ...
'Epsilon', d_epsilon, ...
'Standardize', true);
d_epsilon is now equal to exactly 0.8. Yet the RMSE is still incorrectly calculated as being 2.7020. The problem appears to be with fitrsvm accepting any calculated value. If you then change the code so that:
AutoSVM = fitrsvm(cv_trainingData,'mpg',...
'KernelFunction', 'gaussian', ...
'PolynomialOrder', [], ...
'KernelScale', 5.5, ...
'BoxConstraint', 100, ...
'Epsilon', 0.8, ...
'Standardize', true);
the RMSE becomes 2.6868. So it seems that there is a bug in fitrsvm that prevents grid searches from being performed.

Accedi per commentare.
