Fit Gaussian mixture model with weighted observations
Hi everyone, looking at the documentation for fitgmdist, I cannot find any option to weight observations. Is there a reason for that? Many functions in the Statistics and Machine Learning Toolbox support weights. Does anyone have an idea how to include weights, or can anyone point me to an alternative?
Kaashyap Pappu on 26 Nov 2019
The function fitgmdist fits a distribution to a given data set. The points in this data set generally belong to the same class, so a ‘weight’ parameter is not needed: you are essentially just fitting a distribution model to the given data.
Functions such as fitcknn and fitcsvm accept weights because they fit classification models. Weights become essential when the training data contains several classes but there is a class imbalance, that is, the classes are not represented in equal proportion. Weights are used to account for this imbalance.
Hope this helps!
Jeff Miller on 26 Nov 2019
It's not exactly clear (to me either) what it means to weight the different observations in this context, but maybe you have something like this in mind:
You have observations X(1:n) with weights W(1:n). Let sumW = sum(W).
Make a new dataset Y with (say) 10000 observations consisting of
round(W(1)/sumW*10000) copies of X(1)
round(W(2)/sumW*10000) copies of X(2)
etc--that is, round(W(i)/sumW*10000) copies of X(i)
Now use fitgmdist with Y. Every Y value will be weighted equally, but the different X's will have weights approximately proportional to their original W values--because their numbers will be in those proportions.
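The replication scheme described above can be sketched in MATLAB roughly as follows. This is only an illustration under stated assumptions: the data X, the weights W, and the choice of 2 components are made up for the example, and N = 10000 is the target size suggested above. It relies on repelem to repeat rows and on fitgmdist from the Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical example data: n observations (rows of X) with positive weights W.
rng(0);                                  % fix the seed so the sketch is repeatable
X = [randn(200,1); 5 + randn(100,1)];    % made-up sample drawn from two clusters
W = 0.5 + rand(size(X,1), 1);            % made-up weights, one per row of X

N = 10000;                               % target size of the replicated data set
counts = round(W / sum(W) * N);          % number of copies of each observation

% Repeat row i of X counts(i) times, stacking the copies row-wise.
Y = repelem(X, counts, 1);

% Fit a 2-component Gaussian mixture to the replicated data.
% The number of components is an assumption made for this example.
gm = fitgmdist(Y, 2);
```

Two caveats apply to this sketch. Because of the rounding, Y will usually not contain exactly N rows, and an observation whose weight is very small relative to sumW can round to zero copies and drop out entirely; a larger N reduces both effects. Also, the fit treats Y as if it contained that many independent observations, so quantities that depend on the sample size (for example the log-likelihood, AIC and BIC) reflect the replicated size rather than the original n.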
I hope that is clear.