Under-sample an imbalanced dataset (data preprocessing)

yasaman on 24 Mar 2023
Answered: Sai Pavan on 17 Apr 2024
I have an imbalanced dataset with 8528 signals in total (four classes of bio-signals). These are the numbers of signals in each class:
A: 5050 - B: 2456 - C: 738 - D: 284. (As you can see, the distribution across the classes is not balanced.)
How can I under-sample my imbalanced dataset in order to achieve a higher F1 score when training it with different machine learning methods?
clear all
close all
clc

Data = importdata('REFERENCE-original.csv'); % labels of signals, from signal 1 to signal 8528

%% Feature extraction
num_data = length(Data);
DATA  = cell(1, num_data);   % preallocate storage for the signals
label = zeros(1, num_data);  % preallocate the label vector

for number_data = 1:num_data
    fprintf('Processing signal %d of %d\n', number_data, num_data);
    name    = Data{number_data,1}(1:6);  % record name
    N_label = Data{number_data,1}(8);    % class character: 'A', 'B', 'C' or 'D'
    data    = load(['D:\dataset\', name, '.mat']);
    signal  = data.val;
    DATA{number_data} = signal;

    % normal = 0, AF = 1, other = 2, noise = 3
    switch N_label
        case 'A'
            label(number_data) = 0;
        case 'B'
            label(number_data) = 1;
        case 'C'
            label(number_data) = 2;
        case 'D'
            label(number_data) = 3;
    end
end

Answers (1)

Sai Pavan on 17 Apr 2024
Hello Yasaman,
I understand that you want to achieve a higher F1 score while training on an imbalanced dataset. The class-imbalance problem can be countered by using class weights to modify the training.
Class weights define the relative importance of each class to the training process. To prevent the network from being biased towards the more prevalent classes, we can calculate class weights that are inversely proportional to the class frequencies. Please refer to the following section of the example, which illustrates how to calculate the class weights: https://www.mathworks.com/help/deeplearning/ug/sequence-classification-using-inverse-frequency-class-weights.html#:~:text=TTest%20%3D%20labelsImbalanced(idxTest)%3B-,Determine%20Inverse%2DFrequency%20Class%20Weights,-For%20typical%20classification
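For instance, a minimal sketch of inverse-frequency weights computed directly from the class counts given in the question (the variable names below are illustrative):

% Inverse-frequency class weights from the counts A:5050, B:2456, C:738, D:284
classCounts  = [5050 2456 738 284];                 % signals per class
numClasses   = numel(classCounts);
numObs       = sum(classCounts);                    % 8528 signals in total
classWeights = numObs ./ (numClasses * classCounts) % rarer classes get larger weights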
We can then create a custom loss function that takes the predictions Y and the targets T and returns the weighted cross-entropy loss for training a classification network, as shown in the code snippet below:
lossFcn = @(Y,T) crossentropy(Y,T,NormalizationFactor="all-elements",Weights=classWeights, ...
    WeightsFormat="C")*numClasses;
Please refer to the following example, which uses class weights to counter the class-imbalance problem: https://www.mathworks.com/help/deeplearning/ug/sequence-classification-using-inverse-frequency-class-weights.html
Hope it helps!
