Solving for the parameters of a distribution

Question

RP il 14 Feb 2023

0
Link

Link diretto a questa domanda

https://it.mathworks.com/matlabcentral/answers/1911955-solving-for-the-parameters-of-a-distribution

Commentato: the cyclist il 14 Feb 2023

If I know 2 percentiles, say the 1/40 and the 1/100, then how do I solve for the parameters of a distribution e.g. lognormal. Is there an in built function in matlab that does this?

e.g. say the 1/40 percentile is 217,800 and the 1/100 percentile is 1,250,000. What are the parameters of the lognormal distribution that fits through these points.

Thanks

0 Commenti
Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

Accedi per commentare.

Accedi per rispondere a questa domanda.

Answer 1

the cyclist il 14 Feb 2023

0
Link

Link diretto a questa risposta

https://it.mathworks.com/matlabcentral/answers/1911955-solving-for-the-parameters-of-a-distribution#answer_1170865

Modificato: the cyclist il 14 Feb 2023

Apri in MATLAB Online

Using the formula for the CDF, from the Wikipedia page that @Torsten referenced:

syms mu sig
x1 = 217800;
x2 = 1250000;
eqn1 = erfc(-(log(x1)-mu)/(sig*sqrt(2)))/2 == 1/100;
eqn2 = erfc(-(log(x2)-mu)/(sig*sqrt(2)))/2 == 1/40;
sol = vpasolve(eqn1,eqn2,mu,sig); % solve function did not manage this symbolically, and reported vpasolve result
sol.mu
ans = 23.385919484067596489312881405711
sol.sig
ans = 4.7691005796632736971218010173524
% Verify
double(logncdf(x1,sol.mu,sol.sig))
ans = 0.0100
double(logncdf(x2,sol.mu,sol.sig))
ans = 0.0250

1 Commento
Mostra -1 commenti meno recentiNascondi -1 commenti meno recenti

the cyclist il 14 Feb 2023

Modificato: the cyclist il 14 Feb 2023

The 1/100 quantile must be smaller than the 1/40, by definition of the cumulative distribution function. So, I swapped the values.

Instead, maybe you meant the 99th and 97.5th percentile?

Accedi per commentare.

Answer 2

John D'Errico il 14 Feb 2023

1
Link

Link diretto a questa risposta

https://it.mathworks.com/matlabcentral/answers/1911955-solving-for-the-parameters-of-a-distribution#answer_1170870

Modificato: John D'Errico il 14 Feb 2023

Apri in MATLAB Online

It is rarely a good idea to try to infer distribution parameters from deep tail percentiles. Those points deep in the tails would be determined from only a few data points. As such, they are very poorly known. And that means your estimates of the distribution parameters will be highly variable, highly sensitive to noise.

Even worse,by the use of two quantiles very near to each other, you are effectively using only a very few points way into the fringes, and they are the same data points used for both quantiles.

Instead you will be far better off using a quantiles that would be closer to the mode of the distribution. For example, you might have chosen the 50 and 75% points.

Is it a great idea to use only two quantiles to try to infer the distribution parameters? Probably not. But if you are going to do something, you want to choose a scheme that is robust to noise.

As an example, I'll choose a random sampling of 1000 numbers from a lognormal distribution. Then compute several sets of quantiles, and see how well I can recover the original distribution parameters for the lognormal. Next, I'll do this operation multiple times, and we can see how robust is the operation, to infer distribution parameters from two quantiles, and see how well the computation performs, depending on the actual quantiles chosen. Surely this is a fair test.

The actual distribution parameters for the lognormal are irrelevant, since that does nothing to change such a test. As such, I'll choose a standard lognormal for my computations. A standard lognormal has distribution parameters of mu=0, sigma=1. Again, all that matters is how well I can predict those numbers.

For the computational engine, I used the code written by @the cyclist, which seems reasonable (though really slow, since it uses vpasolve). fsolve would be faster, but vpasolve will do entirely well for my purposes. First, does the code work?

X = lognrnd(0,1,[1,1000]);
P = [50 75];
X = prctile(X,P)
X = 1×2
    0.9591    1.8413
[muest,sigest] = lognest(P,X)
muest = -0.0417
sigest = 0.9669

As you can see, this code seems to work. It reproduces the population parameters reasonably well.

Next, a simulation, to test the performance of the scheme, using various quantile sets. I'll choose two quantile sets, one deep in the tails, and the second using quantiles that will live where the distribution has significant mass.

Qset1 = [50 75];
Qset2 = [97.5 99]; % The set chosen by RP
Nsim = 1000; % Total number of sims to be performed. relatively small, but adequate
SimSize = 1000; % the number of sampels taken within each simulated run
muest = zeros(2,Nsim);
sigest = zeros(2,Nsim);
for i = 1:Nsim
  Xsim = lognrnd(0,1,[1,1000]);
  
  Xset1 = prctile(Xsim,Qset1);
  Xset2 = prctile(Xsim,Qset2);
  
  [muest(1,i),sigest(1,i)] = lognest(Qset1,Xset1);
  [muest(2,i),sigest(2,i)] = lognest(Qset2,Xset2);
end
% Next, plot histograms of the population parameter estimates
hist(muest',100)
title('Histogram of estimates for mu')

hist(sigest',100)
title('Histogram of estimates for sigma')

In each histogram, you can see the two sets of estimates for the parameters mu and sigma. In blue are the estimates when the 50% and 75% quantiles were used to infer the lognormal parameters. In yellow, you see the estimates of those same two parameters, but this time using the 97.5% and the 99% quantiles from the data.

function [muest,sigest] = lognest(perc,x)
  % Given two quantile amounts, and the locations for x of those measured
  % quantiles from a sample, return the lognormal parameters mu, sigma
  % perc is assumed to be scaled as a percentile, so in the interval
  % [0,100].
  
  syms m s real
  eqn1 = erfc(-(log(x(1))-m)/(s*sqrt(2)))/2 == perc(1)/100;
  eqn2 = erfc(-(log(x(2))-m)/(s*sqrt(2)))/2 == perc(2)/100;
  [muest,sigest] = vpasolve(eqn1,eqn2,[m,s]); 
  muest = double(muest);
  sigest = double(sigest);
end

You need ot understand that the scheme you want to use to infer the lognormal parameters using deep tail quantiles is a poor one. It is highly sensitive to random perturbations from only a few data points. As such, it is not at all robust. Worse, it will also be highly sensitive to deviations from a true lognormal distritribution, since it is deep in the tails that a lognormal distribution will be most different from some other distribution.

Finally, using only two quantiles to infer the population parameters is itself arguably a poor scheme, since you are not really using the entire set of data. A better method would be maximum likelihood estimation, where all of the data can be employed, instead of effectively only a few data points.

1 Commento
Mostra -1 commenti meno recentiNascondi -1 commenti meno recenti

the cyclist il 14 Feb 2023

Very nice analysis, @John D'Errico. For me, it brought back not-so-fond memories of when I tried to explain to finance execs why their estimates of 95th percentile were garbage, because of the small sample (typically less than N=100) used to estimate it.

FYI, I agree that fsolve was the better tool here. But I don't have the Optimization Toolbox, and didn't want to try to do the coding here instead of my local machine. :-)

Accedi per commentare.

Answer 3

Torsten il 14 Feb 2023

0
Link

Link diretto a questa risposta

https://it.mathworks.com/matlabcentral/answers/1911955-solving-for-the-parameters-of-a-distribution#answer_1170830

The formula for the quantiles of the lognormal distribution as a function of p, mu and sigma is given here:

https://en.wikipedia.org/wiki/Log-normal_distribution

Thus 2 equations in 2 unknowns.

0 Commenti
Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

Accedi per commentare.

Solving for the parameters of a distribution

0 Commenti
Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

Risposta accettata

1 Commento
Mostra -1 commenti meno recentiNascondi -1 commenti meno recenti

Più risposte (2)

1 Commento
Mostra -1 commenti meno recentiNascondi -1 commenti meno recenti

0 Commenti
Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

Vedere anche

Categorie

Tag

Prodotti

Release

Community Treasure Hunt

Solving for the parameters of a distribution

0 Commenti Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

Risposta accettata

1 Commento Mostra -1 commenti meno recentiNascondi -1 commenti meno recenti

Più risposte (2)

1 Commento Mostra -1 commenti meno recentiNascondi -1 commenti meno recenti

0 Commenti Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

Vedere anche

Categorie

Tag

Prodotti

Release

Community Treasure Hunt

0 Commenti
Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

1 Commento
Mostra -1 commenti meno recentiNascondi -1 commenti meno recenti

1 Commento
Mostra -1 commenti meno recentiNascondi -1 commenti meno recenti

0 Commenti
Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti