Asked by Clarisha Nijman
on 22 Oct 2018

Hello,

Given a series of values x, I want to estimate the probabilities of a range of numbers U, using the probability distribution of the given series x. My code works for one value, but I need the probabilities for a whole range. Can somebody give me some feedback, please?

Thank you in advance.

This is the code:

%%Generate some data/series

x=randi([-2 50],25,1);

%Values/ranges of interest

U=[-100:100];

%define histogram and probability distribution of x

h = histogram(x);

h.Normalization = 'probability'; % convert counts to probabilities

h.Values(U); %finding probabilities of range U

Answer by Bruno Luong
on 22 Oct 2018

Edited by Bruno Luong
on 22 Oct 2018

Accepted Answer

Use HISTCOUNTS then

N = histcounts(x, [-Inf, U, Inf]);

P = N(2:end) / sum(N)

Bruno Luong
on 22 Oct 2018

No, that is not a problem. There are 200 intervals from 201 edges.

- Interval 1, [-100,-99)
- Interval 2, [-99,-98)
- ...
- Interval 200, [99,100)

However, if you want to include the "tail" [100,Inf) as well, then index through to the last element:

N = N(2:end) / sum(N)

I just wonder why you keep the tail but not the head.

For such details, maybe you ought to read the doc of HISTCOUNTS carefully and adapt the code to your needs rather than take my code literally.
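The interval count discussed above can be checked with a small sketch. The names `Nfinite` and `Ntails` are illustrative and do not come from the thread; the data line reuses the generator from the question.

```matlab
% Sketch: count the bins produced by 201 edges, with and without tails.
x = randi([-2 50], 25, 1);                % sample data, as in the question
U = -100:100;                             % 201 edges

Nfinite = histcounts(x, U);               % finite bins only
Ntails  = histcounts(x, [-Inf, U, Inf]);  % adds a left and a right tail bin

disp(numel(Nfinite))                      % 200 intervals from 201 edges
disp(numel(Ntails))                       % 202 intervals with both tails

% Dropping the left tail leaves one probability per element of U.
P = Ntails(2:end) / sum(Ntails);
disp(numel(P) == numel(U))                % true
```

Whether to keep the left tail, the right tail, both, or neither depends on what the probabilities are meant to represent; the indexing into `Ntails` is the only thing that changes.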

Clarisha Nijman
on 22 Oct 2018

Ok, that is a good idea to study this topic again in Matlab, with this new insight you gave me today!

Thanks a lot!

Clarisha Nijman
on 23 Oct 2018

x=randi([-3 3],10,1);

U=[-5:5];

N = histcounts(x, [-Inf, U, Inf ])

prob = N(2:end) / sum(N)

%alternative code

f=hist(x,U);

prob=f/sum(f);

Now I fully understand your answer. With this small example it is clear. With the tails you get 2 extra intervals. An arbitrary value in U, let's say 2, is associated with the interval [2,3). That leaves eleven intervals, one per element of U: the left tail does not belong to any element of U, so it is excluded, and that is why (2:end) is used in the code. Thanks a lot!


Answer by Torsten
on 22 Oct 2018

%%Generate some data/series

X=randi([-2 50],25,1);

%Values/ranges of interest

U=[-100:100];

X = sort(X)

[countsX, binsX] = hist(X)

cdfX = cumsum(countsX) / sum(countsX)

extrap_left = (min(U) > max(X));

extrap_right = (max(U) > max(X));

p_U_left = interp1(binsX,cdfX,min(U),'linear',extrap_left)

p_U_right = interp1(binsX,cdfX,max(U),'linear',extrap_right)

p_U = p_U_right - p_U_left

Torsten
on 22 Oct 2018

- Sort X.

- Count the number of occurrences of each distinct element in X and divide by the number of elements of X. This gives you the empirical probability of the elements in X.

- For each u in U, check whether it is also an element of X. If not, assign probability 0; if so, assign the empirical probability of u as an element of X.
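The three steps above can be sketched as follows. The data in `x` and the range `U` are hypothetical, and the use of `unique`, `accumarray`, and `ismember` is one possible way to implement the steps, not necessarily what the poster had in mind.

```matlab
% Sketch of the three steps: sort, count distinct values, look up each u.
x = [1 2 4 5 6 2 4 4];                    % hypothetical observations
U = 0:7;                                  % hypothetical values of interest

xs = sort(x(:));                          % step 1: sort
[vals, ~, idx] = unique(xs);              % distinct elements of x
pvals = accumarray(idx, 1) / numel(xs);   % step 2: empirical probabilities

[tf, loc] = ismember(U, vals);            % step 3: is u an element of x?
pU = zeros(size(U));                      % probability 0 if never observed
pU(tf) = pvals(loc(tf));                  % empirical probability otherwise

disp([U; pU])                             % first row U, second row p(u)
```

Values of U that never occur in x, such as 3 here, get probability 0, and the probabilities of the observed values sum to 1.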

Clarisha Nijman
on 22 Oct 2018

Torsten
on 22 Oct 2018

If you get discrete values from a random variable, say [ 1 2 4 5 6 ], how could you possibly tell p({3})? (Hint: It's impossible.)

In my opinion, the most reasonable estimate would be p=0 since it does not appear in the list.

If you know the distribution the values stem from, you can get a Maximum Likelihood Estimate (MLE) of the parameters describing the distribution. Having calculated these parameters, you can give estimates of probabilities for elements of your choice.
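As a minimal sketch of the MLE route, assume (purely for illustration) that the values come from a normal distribution. The range `U`, the helper `Phi`, and the choice to assign each integer u the mass of the interval (u-0.5, u+0.5] are all assumptions, not part of the thread.

```matlab
% Sketch: maximum likelihood fit of a normal distribution (an assumption),
% then probabilities for integer values from the fitted CDF.
x = [1 2 4 5 6];                          % observations from the example above
U = 0:7;                                  % hypothetical values of interest

mu    = mean(x);                          % MLE of the mean
sigma = sqrt(mean((x - mu).^2));          % MLE of the std. deviation (1/n form)

% Normal CDF written with erf, so no toolbox is required.
Phi = @(t) 0.5 * (1 + erf((t - mu) / (sigma * sqrt(2))));

% Probability mass assigned to each integer u: P(u-0.5 < X <= u+0.5).
pU = Phi(U + 0.5) - Phi(U - 0.5);

disp([U; pU])                             % p(3) is now positive
```

With a fitted model, unobserved values such as 3 receive a positive probability, but the result is only as good as the assumed distribution family.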


Answer by Bruno Luong
on 22 Oct 2018

Edited by Bruno Luong
on 22 Oct 2018

Not sure, is this what you want?

x=randi([-2 50],10000,1);

U=[-100:100];

h = histogram(x, U);

Clarisha Nijman
on 22 Oct 2018

Let's say x is the profit of a shop observed 20 times, and the values are: 2,5,7,2,20,25,35,15,6,-2,15,27,2,20,15,5,7,2,20,25

This can be associated with a probability distribution. And you can plot it.

Now the task is to estimate the probability of the values in between, and also in the tails. U=[-5 -4 -3 -2 -1 0 1 2 .... 40]
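For these profit values, the empirical probabilities can be sketched with `histcounts` as in the accepted answer. Reading the elided range as the integers from -5 to 40 is an assumption.

```matlab
% Sketch: empirical probabilities for the profit data, one per element of U.
x = [2 5 7 2 20 25 35 15 6 -2 15 27 2 20 15 5 7 2 20 25];
U = -5:40;                                % assumed reading of the elided range

N = histcounts(x, [-Inf, U, Inf]);        % left tail, unit bins, right tail
P = N(2:end) / sum(N);                    % drop the left tail; P(k) goes with U(k)

disp(P(U == 2))                           % 4 of the 20 observations equal 2
disp(P(U == 3))                           % 3 was never observed, so this is 0
```

This purely empirical estimate gives probability 0 to every value that was not observed, both in between and in the tails; non-zero estimates there require a model for the distribution.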
