Highly imbalanced class weights

2 views (last 30 days)
Ève
Ève on 5 Aug 2025
Commented: Ève on 14 Aug 2025
I'm building a semantic segmentation network with 8 different classes. The occurrence of elements in each class varies greatly, so I've tried to balance the weights associated with each class. Since this is a common problem, I've followed some examples from the MATLAB Help Center in which the inverse of the frequency of each class is used as its weight.
I must specify that one of these classes is highly represented (around 90% of the samples) and is the least important one, since I want my network to focus on correctly identifying the other 7.
In my case, to calculate my class weights, I tried the following (freq is the count of elements in the data for each class):
weights = sum(freq) ./ freq;
norm_weights = weights ./ max(weights);
The resulting weight vector for each class is
norm_weights = [0.0001 0.0199 0.0380 0.1118 1.0000 0.0995 0.0576 0.0464]
where norm_weights(1) corresponds to my highly represented class and norm_weights(5) to the least represented one.
When I train my network (on a smaller dataset than the one I will use once this problem is resolved), my training accuracy seems stagnant, my validation accuracy looks a bit unstable, and the training loss doesn't go down much. I should also mention that I'm trying to reproduce published results: the research paper states that the class weight coefficients were calculated using the inverse of the population sample sizes and that they range between 0 and 1, so unless there is a mistake elsewhere in my code, this should be possible.
What I would like to know is: are my training results normal considering there were only 300 iterations in my training***, or is there a similar but better way to calculate my class weights that avoids the unstable training results I seem to be observing?
***in the research paper I'm trying to reproduce, it took around 2000 iterations to reach a training accuracy of 90%, while my training accuracy began at ~10% and finished at 11-12% after 300 iterations
Any help is greatly appreciated, thank you for your time!
  6 Comments
Meg Noah
Meg Noah on 7 Aug 2025
Are these 3-band or 3-color satellite Earth image windows/chips? If so, are the data from someplace like https://earthexplorer.usgs.gov/ - if so, which satellite?
Ève
Ève on 7 Aug 2025
The data is from https://search.earthdata.nasa.gov/search/granules?p=C2667982885-LARC_ASDC (it's CALIOP), and the 3 refers to 3 different spectral quantities derived from the satellite data.


Answers (1)

saish
saish on 14 Aug 2025
Hey Eve,
Yes, it is quite normal for training to be slow and accuracy to be low after only 300 iterations, especially with a highly imbalanced dataset. Whether that remains true will depend on the results you get after 2000 or more iterations, and once you train on the entire dataset.
Also note that accuracy can be misleading in imbalanced settings, since the network can simply predict the majority class everywhere and still score high. The main issue here is that your normalized weights span four orders of magnitude (0.0001 to 1.0), which can destabilize training.
It is better to try any of the following and check the results:
  1. Log inverse frequency, which reduces the spread between the class weights.
weights = log(sum(freq) ./ freq);       % log compresses the dynamic range
norm_weights = weights ./ max(weights);
2. Median frequency balancing, which is commonly used for image segmentation tasks.
median_freq = median(freq);             % frequency of the median class
weights = median_freq ./ freq;          % weights near 1 for typical classes
norm_weights = weights ./ max(weights);
3. Focal loss, which down-weights easy (majority-class) examples and focuses learning on the minority classes.
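Whichever scheme you choose, the weights still have to be wired into the network. A minimal sketch, assuming you build the network with pixelClassificationLayer from the Computer Vision Toolbox, that norm_weights is computed as above, and that the class names below are placeholders for your own:

```matlab
% Options 1/2: weighted cross-entropy via the pixel classification layer
classes = ["background" "c2" "c3" "c4" "c5" "c6" "c7" "c8"];  % placeholder names
pxLayer = pixelClassificationLayer('Name','labels', ...
    'Classes',classes,'ClassWeights',norm_weights);

% Option 3: focal loss instead of weighted cross-entropy
% (defaults: Alpha = 0.25, Gamma = 2; check the doc page for your release)
flLayer = focalLossLayer;
```

Replace the final classification layer of your layer graph with one of these before calling trainNetwork.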
As I can see in the comments, you have transformed the data, which can worsen the imbalance. Instead of random transformations, consider augmenting only those image patches that contain a significant portion of the rare classes. Apply augmentations to these patches and then reinsert them into the dataset, or use them as additional training samples. This ensures that the rare classes are not diluted during augmentation.
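That targeted augmentation can be sketched as follows, assuming hypothetical cell arrays images and labels holding your patches and their label matrices, with class 1 being the dominant class:

```matlab
rareClasses = 2:8;      % every class except the dominant class 1
minFraction = 0.05;     % keep patches where rare classes cover >= 5% of pixels

augImages = {};  augLabels = {};
for k = 1:numel(labels)
    L = labels{k};                                 % label matrix of patch k
    rareFrac = mean(ismember(L(:), rareClasses));  % fraction of rare-class pixels
    if rareFrac >= minFraction
        augImages{end+1} = fliplr(images{k});      % simple horizontal flip
        augLabels{end+1} = fliplr(L);              % flip labels the same way
    end
end
images = [images, augImages];   % append augmented patches as extra samples
labels = [labels, augLabels];
```

The threshold and the flip are just examples; any label-preserving transform works, as long as the image and its labels are transformed together.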
  1 Comment
Ève
Ève on 14 Aug 2025
Thank you for your response! I will first try training with my imbalanced weight values on my whole dataset, and if that doesn't work I'll try one of your suggestions. Currently I'm having problems with my server, which is blocking my progress.

