How to best determine the probability of a distribution given an outlying observation?

Question

Tim on 12 Sep 2012

Direct link to this question

https://it.mathworks.com/matlabcentral/answers/47940-how-to-best-determine-the-probability-of-a-distribution-given-an-outlying-observation

Hi,

I have a classification problem. I have a set of data from a reference process (let's call that "known") and a set of data from a second process (let's call that "test").

Hypothesis 0 is that the test sample came from an identical process as the "known", and will therefore have the same distribution.

Hypothesis 1 is that the test sample came from a different process. However, here is the catch: for all but one sample, this process has an identical distribution to the "known". Just one sample will be "suspiciously" low.

I will add a picture to better explain:

Here, the red histogram is the reference "known" distribution and the blue histogram is the questioned "test" distribution. I already know that this test set came from a different process. The overlay may make it hard to see, but the two distributions match closely except for a single blue sample, which is suspiciously low.

What I need now is to take each distribution and work out some method of returning a probability that the extremely low blue value would be observed given the distribution is the "known" distribution. I know how to calculate the probability of a particular single observation, but how do I properly balance this with the number of observations? Would just a KS test be appropriate? It strikes me as stats 101, but it's been a while, and I don't want to get this wrong.

Thanks in advance.

Answer 1

Ilya on 12 Sep 2012

Direct link to this answer

https://it.mathworks.com/matlabcentral/answers/47940-how-to-best-determine-the-probability-of-a-distribution-given-an-outlying-observation#answer_58632

Edited: Ilya on 12 Sep 2012

If you know the reference distribution analytically, you can compute its cdf at the smallest observed value. Suppose this cdf value is p. The p-value for your test is then one minus the binomial probability of observing no successes in N trials, where N is the sample size, p is the success probability, and a "success" is a draw at or below that smallest value. That is, the p-value is 1-(1-p)^N.
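As a rough illustration of the formula above, here is a minimal sketch that assumes the reference distribution is a standard normal. That choice, and the names stdNormCdf and minPValue, are assumptions for the example and not part of the answer. The cdf is written with erfc so that no toolbox is needed.

```matlab
% Standard normal cdf via erfc (base MATLAB). The standard normal is an
% assumed reference distribution for this sketch.
stdNormCdf = @(x) 0.5 * erfc(-x ./ sqrt(2));

% p-value for the sample minimum: P(min of N draws <= xMin) = 1 - (1 - p)^N.
% Written as -expm1(N*log1p(-p)), which is algebraically the same but keeps
% precision when p is tiny.
minPValue = @(xMin, N) -expm1(N .* log1p(-stdNormCdf(xMin)));

pOutlier = minPValue(-4, 100);    % smallest of 100 values is -4
pTypical = minPValue(-2.5, 100);  % smallest of 100 values is -2.5

fprintf('p-value for min = -4.0: %.4g\n', pOutlier);
fprintf('p-value for min = -2.5: %.4g\n', pTypical);
```

With 100 observations, a minimum of -4 gives a p-value of roughly 0.003, while a minimum of -2.5 gives a value near 0.46, so only the first would be rejected at the usual levels.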

Tim on 19 Sep 2012

Oh, so obvious now! Thank you. I was over-thinking it with the variance of the variance and all that jazz. My only excuses are lack of sleep and rusty stats - honestly, I avoid them when I can.

Answer 2

per isakson on 12 Sep 2012

Direct link to this answer

https://it.mathworks.com/matlabcentral/answers/47940-how-to-best-determine-the-probability-of-a-distribution-given-an-outlying-observation#answer_58549

See: FBD - "Find the Best Distribution" tool in the File Exchange

Tim on 12 Sep 2012

Thanks for your answer, per, but I'm not sure that this is what I'm looking for. I'll try and clarify with a simple code example.

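KnownSet = randn(1000,1);        % reference ("known") sample
TestSet1 = randn(100,1);         % test sample from the same distribution
TestSet2 = [randn(99,1); -4];    % same distribution plus one outlier at -4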

In this case, I know all three sets of data are mostly drawn from the same Gaussian distribution. However, TestSet2 has an outlier. The value -4 is very unlikely, and I'm hoping to use that single outlying value to provide a probability that each TestSet is purely from the same distribution as KnownSet. In this case, TestSet1 should have a high 'p-value', and TestSet2 should have a low 'p-value' and be rejected. I use the term p-value, but there might be something else.

FBD would help me determine the distribution of KnownSet (which I can assume is at least for the most part the same as that of the TestSets), but that is only the first step. How do I go from there to determining how likely/unlikely the set of observations is, given the distribution, and given the outlier?
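One possible way to get from the fitted distribution to a number, sketched here under the assumption that KnownSet is well described by a Gaussian, is to apply the minimum-based formula from Answer 1 to each test set. The fixed random seed and the names refCdf and minPValue are assumptions added for the example. A fitted cdf is used instead of the empirical one because the empirical cdf of 1000 points would simply be zero at -4.

```matlab
rng(0);  % fixed seed for repeatability (an assumption, not in the post)
KnownSet = randn(1000,1);
TestSet1 = randn(100,1);
TestSet2 = [randn(99,1); -4];

% Fit a Gaussian to the reference data (assumes KnownSet is roughly normal).
mu    = mean(KnownSet);
sigma = std(KnownSet);
refCdf = @(x) 0.5 * erfc(-(x - mu) ./ (sigma * sqrt(2)));

% P(minimum of numel(x) reference draws <= min(x)) = 1 - (1 - p)^N,
% with p the fitted cdf evaluated at the smallest observation.
minPValue = @(x) 1 - (1 - refCdf(min(x))).^numel(x);

p1 = minPValue(TestSet1);
p2 = minPValue(TestSet2);

fprintf('TestSet1: p = %.4g\n', p1);
fprintf('TestSet2: p = %.4g\n', p2);
```

TestSet2 should come out with a small p-value because of the -4. For a set like TestSet1 that really is drawn from the reference distribution, the p-value is approximately uniform on [0, 1], so it is not guaranteed to be large, only unlikely to be very small.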
