Generating PDF and Overplotting from Subset of Data Using Gaussian Mixture Model
Assume there is a data set that contains n measurements (in this case n = 2) that overlap. I wish to isolate one subset of the total set by identifying the component with the greater proportion and then fitting a probability density function to that distribution. I tried doing this by applying the fitgmdist function to the data set with n = 2 components, then choosing the component with the higher "ComponentProportion" as the true subset whose fit I want to keep. My plan was to apply makedist using the mu and sigma from the distribution I chose (TDist), then create a probability density function using pdf so I can overplot it on the histogram of the data.
clc; clear;
DataSize = 10000;
linLength = 1000;
Data1 = normrnd(-2,1,[1,DataSize]);
Data2 = normrnd(3,2,[1,DataSize]);
Data = [Data1,Data2]';
Dist = fitgmdist(Data,2); %Create fits for both distributions
DistNdex = find(Dist.ComponentProportion==max(Dist.ComponentProportion)); %Find distribution with greater contribution
TDist.Mu = Dist.Mu(DistNdex); %Average
TDist.Sigma = Dist.Sigma(DistNdex); %Std. Dev
PD = makedist('Normal','mu',TDist.Mu,'sigma',TDist.Sigma); %Create normal distribution using mu/sigma
xPD = linspace(TDist.Mu - 3*TDist.Sigma,TDist.Mu + 3*TDist.Sigma,linLength); %Create linspace that spans 3 sigma
pdfValues = pdf('Normal',xPD,TDist.Mu,TDist.Sigma); %Create non-normalized pdf over defined linspace
NormPdfValues = normpdf(xPD,TDist.Mu,TDist.Sigma); %Create normalized pdf over defined linspace
%Plotting
figure
histogram(Data)
hold on
plot(xPD,pdfValues,'r','LineWidth',5)
plot(xPD,NormPdfValues,'g','LineWidth',5)
hold off
But my issue is that the maximum y-value of the fit is incredibly small relative to the data (regardless of whether it is normalized or not). Why is this, and how do I specify what the maximum y-value should be for its associated component distribution? I suspect I'm lacking a piece of stats knowledge here rather than having a MATLAB issue.
PS - I know this code gives an error when I run it within the browser. I'm not sure why it doesn't recognize the substructure "Mu", but it runs on my local machine without issues.
Accepted Answer
Chris
on 1 Dec 2022
Edited: Chris
on 1 Dec 2022
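I believe the area under the curve of each of these PDFs is 1, whereas the histogram shows raw bin counts, so the two are on very different vertical scales.
One way to work around that (though probably not the most correct way) is to rescale the pdf so that its maximum matches the tallest bin of the histogram.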
DataSize = 10000;
linLength = 1000;
Data1 = normrnd(-2,1,[1,DataSize]);
Data2 = normrnd(3,2,[1,DataSize]);
Data = [Data1,Data2]';
Dist = fitgmdist(Data,2); %Fit a two-component Gaussian mixture
[~,DistNdex] = max(Dist.ComponentProportion); %Index of the component with the greater proportion
TDist.Mu = Dist.mu(DistNdex); %Mean (the gmdistribution property is lowercase "mu")
TDist.Sigma = sqrt(Dist.Sigma(DistNdex)); %Std. dev. (Dist.Sigma stores variances, so take the square root)
PD = makedist('Normal','mu',TDist.Mu,'sigma',TDist.Sigma); %Create normal distribution using mu/sigma
xPD = linspace(TDist.Mu - 3*TDist.Sigma,TDist.Mu + 3*TDist.Sigma,linLength); %Create linspace that spans 3 sigma
pdfValues = pdf('Normal',xPD,TDist.Mu,TDist.Sigma); %Normal pdf over xPD; normpdf(xPD,TDist.Mu,TDist.Sigma) returns the same values
%Plotting
figure
h = histogram(Data);
hold on
mult = max(h.BinCounts)/max(pdfValues); %Factor that maps the pdf peak onto the tallest histogram bin
plot(xPD, mult*pdfValues,'k--','LineWidth',2)
hold off
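A possible alternative to peak matching, offered only as a sketch: because a pdf integrates to 1 and a count histogram has total area equal to the number of samples times the bin width, the expected bin count for one mixture component is roughly (number of samples) x (bin width) x (component proportion) x (component pdf). The variable names below (w, k, scale) and the fixed random seed are assumptions made for illustration and are not part of the original code.

```matlab
% Sketch: scale one fitted mixture component to a count histogram.
rng(0);                                    % fixed seed (assumption, for repeatability only)
DataSize = 10000;
Data = [normrnd(-2,1,[DataSize,1]); normrnd(3,2,[DataSize,1])];

Dist = fitgmdist(Data,2);                  % two-component Gaussian mixture fit
[w,k] = max(Dist.ComponentProportion);     % weight and index of the dominant component
mu    = Dist.mu(k);                        % component mean
sigma = sqrt(Dist.Sigma(:,:,k));           % Sigma holds variances, so take the square root

figure
h = histogram(Data);                       % default normalization: raw counts
hold on
x = linspace(mu - 3*sigma, mu + 3*sigma, 1000);
scale = numel(Data) * h.BinWidth * w;      % converts density to expected counts per bin
plot(x, scale*normpdf(x,mu,sigma), 'k--', 'LineWidth', 2)
hold off
```

The same comparison can be made the other way around by calling histogram(Data,'Normalization','pdf') and plotting w*normpdf(x,mu,sigma) without any extra count scaling.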