Score transform for RUSBoost in fitcensemble

Question

the cyclist on 12 Apr 2024

https://it.mathworks.com/matlabcentral/answers/2106231-score-transform-for-rusboost-in-fitcensemble

Commented: the cyclist on 11 May 2024

The documentation for the predict function of fitcensemble models lists the score transforms (used to convert scores to probabilities) for the following ensemble methods:

Bag (none)
AdaBoostM1 (doublelogit)
GentleBoost (doublelogit)
LogitBoost (doublelogit)

But it does not list a score transform for several other possible fitcensemble methods. Are these documented somewhere else?

I'm currently most interested in RUSBoost, because that is the method I am using.


Accepted Answer

Sarthak on 24 Apr 2024

https://it.mathworks.com/matlabcentral/answers/2106231-score-transform-for-rusboost-in-fitcensemble#answer_1446986

Edited: Sarthak on 2 May 2024

Hey,

RUSBoost is a boosting algorithm that uses random undersampling, which makes it effective when the classes are imbalanced. Under the hood, however, the boosting mechanism is still multiclass AdaBoost, as described in the following MathWorks documentation:

https://www.mathworks.com/help/stats/ensemble-algorithms.html

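If the problem is not a binary classification problem, the logit and doublelogit score transforms do not convert the ensemble scores into probabilities: each transformed score lies between 0 and 1, but the transformed scores of the K classes for an observation do not, in general, sum to 1. These transforms are defined as

out = 1./(1+exp(-in));    % logit
out = 1./(1+exp(-2*in));  % doublelogit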
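To see this concretely, here is a small sketch that applies the doublelogit formula elementwise to a made-up 3-class score row (the values in S are hypothetical, not output from a real ensemble). Every entry lands in (0,1), yet the row does not sum to 1, so the result is not a probability distribution over the classes.

```matlab
% Hypothetical raw ensemble scores: 1 observation, K = 3 classes
S = [1.2 -0.4 -0.8];

% doublelogit transform, applied elementwise
P = 1./(1+exp(-2*S));

disp(P)        % every entry lies in (0,1)
disp(sum(P))   % but the row does not sum to 1
```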

That is why the documentation mentions doublelogit as the ScoreTransform only for binary classification models.

Now, if you look at the multiclass AdaBoost (SAMME) paper (https://hastie.su.domains/Papers/samme.pdf), in particular Eq. 4 and the equation right below it, you can see how the scores can be converted to probabilities: we can use the softmax function. Specifically, we pass (1/(K-1))*s_i into the softmax function, where K is the number of classes and s_i is the score for the ith class, with i = 1,...,K. For your convenience, I've written this function:

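function out = mysoftmax(in)
% MYSOFTMAX  Convert multiclass ensemble scores to class probabilities.
% Each row of in holds one observation's scores; the number of columns
% of in is the number of classes K.
K = size(in,2);
in = (1/(K-1)).*in;                 % scale the scores by 1/(K-1)
inmax = max(in,[],2);               % row-wise maximum
in = in-inmax;                      % subtract it so exp cannot overflow
% in = bsxfun(@minus,in,inmax);     % pre-R2016b equivalent of the line above
numerator = exp(in);
denominator = sum(numerator,2);
denominator(denominator == 0) = 1;  % defensive guard against division by zero
out = numerator./denominator;       % each row now sums to 1
end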
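As a quick sanity check, the same computation can be written inline on a hypothetical score matrix S (the values are made up for illustration). Following the description of Eq. 4 above, the probability for class k is exp(s_k/(K-1)) divided by the sum of exp(s_j/(K-1)) over all K classes. The sketch confirms that each row sums to 1 and that the class with the largest score is also the class with the largest probability. It relies on implicit expansion, so it assumes MATLAB R2016b or later.

```matlab
% Hypothetical raw scores: 2 observations (rows), K = 3 classes (columns)
S = [ 2.0 -0.5 -1.5;
      0.3  0.1 -0.4];
K = size(S,2);

Z = S/(K-1);                 % scale by 1/(K-1), as in the SAMME formulation
Z = Z - max(Z,[],2);         % subtract the row-wise maximum for stability
P = exp(Z)./sum(exp(Z),2);   % softmax across the classes of each row

disp(P)
disp(sum(P,2))               % each row sums to 1

% Softmax is monotone within a row, so the argmax class is unchanged
[~, kScore] = max(S,[],2);
[~, kProb]  = max(P,[],2);
disp(isequal(kScore, kProb))
```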

After training the ensemble, save this function as mysoftmax.m in a folder on the MATLAB path, then set the model's ScoreTransform property to it (ScoreTransform can be set to a function handle):

mdl.ScoreTransform = @mysoftmax;

Once you do that and call predict, the scores will be normalized to lie between 0 and 1 (with each row summing to 1), using the conditional probability formulation from the paper cited above.
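A minimal end-to-end sketch of that workflow for RUSBoost, assuming the Statistics and Machine Learning Toolbox and using the fisheriris sample data (3 classes) purely as a convenient example. It also assumes, as stated above, that RUSBoost's raw scores follow the multiclass AdaBoost formulation. To keep the snippet self-contained, it uses a pair of anonymous functions (stableSoftmax and sammeSoftmax, hypothetical names) in place of the mysoftmax.m file; both are function handles, so either can be assigned to ScoreTransform.

```matlab
% Train a RUSBoost ensemble on a 3-class example data set
load fisheriris                       % meas: 150x4 numeric, species: 150x1 cellstr
mdl = fitcensemble(meas, species, 'Method', 'RUSBoost');

% Raw (untransformed) scores, one column per class
mdl.ScoreTransform = 'none';
[~, rawScores] = predict(mdl, meas);

% SAMME-style softmax as anonymous functions (stand-ins for mysoftmax.m):
% stableSoftmax: row-wise softmax after subtracting the row maximum
% sammeSoftmax:  scales the scores by 1/(K-1) first
stableSoftmax = @(z) exp(z - max(z,[],2)) ./ sum(exp(z - max(z,[],2)), 2);
sammeSoftmax  = @(s) stableSoftmax(s / (size(s,2) - 1));

% Attach the transform and predict again
mdl.ScoreTransform = sammeSoftmax;
[~, probs] = predict(mdl, meas);

disp(probs(1:5,:))                    % first few rows of class probabilities
disp(max(abs(sum(probs,2) - 1)))      % rows sum to 1 up to rounding
```

If you save the model and reload it in a later session, a handle to a function file on the path (such as @mysoftmax) is easier to keep available than an anonymous function defined in a script.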

I hope this helps!

3 Comments

Sarthak on 2 May 2024

Hi,

First of all, you're correct: those two lines (in = in-inmax; and the bsxfun call) do the same thing, so one of them is redundant. The function should be written this way:

function out = mysoftmax(in)
% number of columns of in is the number of classes
K = size(in,2); 
in = (1/(K-1)).*in;
inmax = max(in,[],2);
in = in-inmax;
numerator = exp(in);
denominator = sum(numerator,2);
denominator(denominator == 0) = 1;
out = numerator./denominator;
end

Note that this code can potentially be written in a more efficient way; it's not meant to be production code.

The softmax formula is given at https://en.wikipedia.org/wiki/Softmax_function. For binary classification, the scores for the two classes are negatives of each other (i.e., if f_first is the score for the first class and f_second is the score for the second class, then f_second = -f_first). Also, K-1 = 1, so mysoftmax applies no extra scaling. We can therefore write the softmax output as follows:

softmax(f_first) = e^(f_first)/(e^(f_first)+e^(f_second)) = e^(f_first)/(e^(f_first)+e^(-f_first))

Now, multiply the numerator and the denominator by e^(-f_first):

softmax(f_first) = 1/(1+e^(-2*f_first)).

We can do the same for the second class, or simply subtract the first-class probability from 1, since the two probabilities add up to 1.
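The identity can also be checked numerically without training a model. This sketch uses a hypothetical vector of first-class scores f and builds the binary score matrix [f, -f]; because K-1 = 1 for two classes, the softmax needs no extra scaling.

```matlab
% Hypothetical first-class scores; the second-class scores are their negatives
f = linspace(-5, 5, 11)';
S = [f, -f];

% Softmax across the two columns (K - 1 = 1, so no scaling is needed)
E = exp(S - max(S,[],2));
P = E./sum(E,2);

% doublelogit applied to the first-class score
D = 1./(1+exp(-2*f));

disp(max(abs(P(:,1) - D)))       % differs only by floating-point rounding
disp(max(abs(P(:,2) - (1-D))))   % second-class probability equals 1 - doublelogit
```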

So, yes: softmax reduces to doublelogit for binary classification. I ran this code to check whether the results match:

>> load ionosphere;
>> obj = fitcensemble(X,Y,Method = 'AdaBoostM1',ScoreTransform = 'doublelogit');
>> [l1,s1] = predict(obj,X);
% Using the updated softmax function
>> obj.ScoreTransform = @mysoftmax;
>> [l2,s2] = predict(obj,X);
>> max(abs(s1-s2))

1.0e-86 *
0.0503    0.8043

So the difference between the scores from the model that uses doublelogit and the one that uses softmax is negligible, as expected.

For binary problems, the softmax function above does nothing different from doublelogit; after removing the redundant line, I believe they compute the same thing.

Note that this function subtracts the row-wise maximum before calculating the softmax values, to avoid overflow:

https://stats.stackexchange.com/questions/304758/softmax-overflow
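A small illustration of why that step matters, using a hypothetical 3-class score row with very large magnitudes. Without the subtraction, exp overflows to Inf and Inf/Inf produces NaN; with it, the result is well defined. Mathematically both compute the same softmax, because adding the same constant to every score in a row does not change the softmax output.

```matlab
% Hypothetical extreme scores for one observation, K = 3 classes
S = [2000 0 -2000];
K = size(S,2);
Z = S/(K-1);                        % [1000 0 -1000]

% Naive softmax: exp(1000) overflows to Inf, and Inf/Inf gives NaN
naive = exp(Z)./sum(exp(Z),2);

% Stabilized softmax: shift so the largest exponent is exp(0) = 1
Zs = Z - max(Z,[],2);
stable = exp(Zs)./sum(exp(Zs),2);

disp(naive)    % first entry is NaN
disp(stable)   % finite probabilities summing to 1
```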

Hope this helps!

the cyclist on 11 May 2024

Thanks for the additional comments, which are very helpful.
