Fixing biased random number generation

Question

Jose David on 3 Nov 2024 at 8:16

Direct link to this question

https://it.mathworks.com/matlabcentral/answers/2163685-fixing-biased-random-number-generation

Commented: Jose David on 4 Nov 2024 at 20:53

Hello. I am currently generating a dataset for a machine learning model; however, I am having trouble with one of the variables.

First I generate 3000 values of a variable H between 100 and 1000.

Then I need a variable a that can take any value between 10% and 90% of the corresponding value of H. The problem is that when I look at the histogram of a, it is clearly biased, and I can't find out why.

How could I generate an unbiased a?

Here is a piece of the code I am currently using:

N = 3000;
H = 100 + (1000 - 100) * rand(N, 1);
a_min = 0.1 * H; 
a_max = 0.9 * H;  
a = a_min + (a_max - a_min) .* rand(N, 1);

This is the histogram I am getting:

Some additional context:

The goal is to develop a machine learning (ML) model that is able to predict stress concentration factors (SCF) in a sheet with a hole in it. This has already been solved analytically, but I am doing this as an exercise. The SCF depends on H and a, which are geometrical constraints: they define the width of the plate and the position of the hole, respectively. So you can see why a is dependent on the values of H. The range 0.1*H to 0.9*H is just to make sure that the hole won't be located outside of the plate and that the hole has a reasonable size. Of course there are other geometrical constraints and additional steps before getting to the ML part, but I believe this is enough for this post.

2 Comments

Bruno Luong on 4 Nov 2024 at 7:15

Edited: Bruno Luong on 4 Nov 2024 at 8:02

The constraint

% 0.1 H <= a <= 0.9 H

is a conditional probability, so a must be "biased". It is simply an inevitable fact.

This is shown by the following rejection-sampling code, which starts from a uniform unconditional probability:

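N = 3000;
H = 100 + (1000 - 100) * rand(N, 1);

clear a
% Loop backwards so that a gets its full size on the first assignment
for k = N:-1:1
    a(k) = randone(H(k), 1000);
end

histogram(a)

function a = randone(H, maxa)
% Draw uniformly on (0, maxa) and reject until 0.1*H < a < 0.9*H
L = 0.1*H;
U = 0.9*H;
while true
    a10 = maxa*rand(1,10);   % 10 candidates per pass
    k = find(a10 > L & a10 < U, 1, 'first');
    if ~isempty(k)
        a = a10(k);
        return
    end
end
end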
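For comparison, here is a minimal sketch (not part of the original comment) that draws a in two ways: directly with the formula from the question, and with a vectorized variant of the rejection loop above. The names a_direct, a_rej, and todo are illustrative assumptions. If both methods sample the same conditional distribution, their empirical quantiles should agree closely, which would suggest that the skew comes from the constraint itself and not from the formula.

```matlab
rng(1);                                  % fixed seed, for reproducibility only
N = 3000;
H = 100 + (1000 - 100) * rand(N, 1);

% Method 1: direct formula from the question
a_direct = 0.1*H + (0.9*H - 0.1*H) .* rand(N, 1);

% Method 2: vectorized rejection from a uniform proposal on (0, 1000)
a_rej = nan(N, 1);
todo  = true(N, 1);                      % entries still waiting for an accepted draw
while any(todo)
    cand = 1000 * rand(N, 1);
    ok = todo & cand > 0.1*H & cand < 0.9*H;
    a_rej(ok) = cand(ok);
    todo(ok)  = false;
end

% Compare a few empirical quantiles via sorted-sample lookup
q   = [0.10 0.25 0.50 0.75 0.90];
idx = round(q * N);
sd  = sort(a_direct);
sr  = sort(a_rej);
disp([q; reshape(sd(idx), 1, []); reshape(sr(idx), 1, [])])
```

The rows printed are the quantile levels, the direct-formula quantiles, and the rejection quantiles.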

Jose David on 4 Nov 2024 at 20:53

Very interesting, I didn't think about it that way. Thank you very much.


Answer 1

埃博拉酱 on 3 Nov 2024 at 8:59

Direct link to this answer

https://it.mathworks.com/matlabcentral/answers/2163685-fixing-biased-random-number-generation#answer_1540200

Edited: 埃博拉酱 on 4 Nov 2024 at 1:38

It's a math problem. If you use a uniform random variable H as an upper bound on another uniform random variable a, then it is mathematically impossible for a to remain uniformly distributed. You have to rethink your whole problem.
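To make this concrete, here is a sketch under the assumption that a = H.*R, with H uniform on [100, 1000] and R uniform on [0.1, 0.9], drawn independently, as in the question's code. Under that assumption the density of a works out to log(min(1000, 10*x) ./ max(100, x/0.9)) / 720 on [10, 900], which is not constant. The function handle pdf_a, the sample size, and the bin count are illustrative choices.

```matlab
rng(0);
N = 1e5;                                 % larger than the question's 3000, for a smoother histogram
H = 100 + (1000 - 100) * rand(N, 1);
a = 0.1*H + 0.8*H .* rand(N, 1);         % same as a_min + (a_max - a_min).*rand

% Density of a derived under the independence assumption stated above
pdf_a = @(x) log(min(1000, 10*x) ./ max(100, x/0.9)) / 720;

histogram(a, 60, 'Normalization', 'pdf')
hold on
x = linspace(10, 900, 500);
plot(x, pdf_a(x), 'LineWidth', 1.5)
hold off
xlabel('a'), ylabel('density')
legend('sampled a', 'derived density')
```

The curve rises from zero at a = 10, is flat between 90 and 100, and then decays toward zero at a = 900, so most of the mass sits at small a.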

What on earth are you really trying to do?

Update 20241104

Based on your description, I think it is likely that H should not be uniformly distributed. Since your H seems to correspond to a certain type of item that exists in reality, the probability distribution of H should also reflect that item in reality (usually it should be a beta or other normal-like distribution).

Secondly, judging from your question, there is no need to insist that a be uniformly distributed. Since a cannot be greater than H, the probability distribution of a should be skewed toward smaller values.

1 Comment

Jose David on 4 Nov 2024 at 20:52

Yes, you are right. I understand now. When H takes a value closer to 100 (its lower limit), the range of values that a can take is also smaller and sits near the global lower limit of a. So this is a kind of "forcing" of a toward smaller values: the smaller H is, the smaller the largest value a can take. So it is, as you said, skewed to the smaller side.

Thank you very much.
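The mechanism described in this comment can be checked numerically. The sketch below (the names r and band are illustrative assumptions) looks at the ratio a./H, which should stay uniform on [0.1, 0.9], and at a restricted to a narrow band of H, where the histogram should also look roughly flat. Only the pooled a, mixed over all values of H, is expected to be skewed.

```matlab
rng(2);
N = 3000;
H = 100 + (1000 - 100) * rand(N, 1);
a = 0.1*H + 0.8*H .* rand(N, 1);

r    = a ./ H;                           % relative position of the hole
band = H > 480 & H < 520;                % samples with H close to 500

subplot(3, 1, 1), histogram(a),       title('a pooled over all H')
subplot(3, 1, 2), histogram(r),       title('a ./ H')
subplot(3, 1, 3), histogram(a(band)), title('a for 480 < H < 520')

fprintf('mean(a) = %.1f, midpoint of [10, 900] = %.1f\n', mean(a), (10 + 900)/2);
```

If the geometry is better described by the relative position a./H than by a alone, the ratio may be the more natural quantity to inspect for uniformity; that is a modelling choice and not something the thread settles.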


Answer 2

Bruno Luong on 3 Nov 2024 at 12:34

Direct link to this answer

https://it.mathworks.com/matlabcentral/answers/2163685-fixing-biased-random-number-generation#answer_1540230

% a_max .* rand(N, 1)

is a product of two independent uniform variables. It is NOT a uniform random variable, contrary to what you expect, because a_max is not constant.

x1 = rand(1,10000);
x2 = rand(1,10000);
x1Xx2 = x1.*x2;
histogram(x1Xx2)

This alone makes your "intuition" fall apart.
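As a sketch of why that histogram piles up near zero: the product of two independent uniform variables on (0, 1) has density -log(x) on (0, 1), which is a standard result. The overlay below assumes the same setup as the snippet above; the seed, the variable name p, and the bin count are arbitrary choices.

```matlab
rng(3);
x1 = rand(1, 10000);
x2 = rand(1, 10000);
p  = x1 .* x2;

histogram(p, 50, 'Normalization', 'pdf')
hold on
x = linspace(0.005, 1, 400);             % start above 0, where -log(x) diverges
plot(x, -log(x), 'LineWidth', 1.5)
hold off
legend('x1 .* x2', '-log(x)')

% The CDF of the product is x - x*log(x); compare at x = 0.1
fprintf('empirical %.3f vs theoretical %.3f\n', mean(p < 0.1), 0.1 - 0.1*log(0.1));
```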

3 Comments

Jose David on 3 Nov 2024 at 20:33

Hello, I added some additional information to my original question.

Thank you, I understand.

I can't use max(H) and min(H) because there may be cases where a could be bigger than its corresponding H, which means the hole would be located outside of the plate. H(i) must be used as a parameter to establish the range of values that a(i) can take. Is there another way to do this?
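To illustrate the concern in this comment, the following sketch draws a from one global range built from min(H) and max(H) and counts how often the geometric constraint is broken. The names lo, hi, a_fixed, violates, and outside are assumptions chosen for illustration; this is not a method proposed in the thread.

```matlab
rng(4);
N = 3000;
H = 100 + (1000 - 100) * rand(N, 1);

% Hypothetical alternative: one global range for a, independent of H(i)
lo = 0.1 * min(H);
hi = 0.9 * max(H);
a_fixed = lo + (hi - lo) * rand(N, 1);   % uniform, but ignores the constraint

violates = a_fixed < 0.1*H | a_fixed > 0.9*H;   % outside the allowed band
outside  = a_fixed > H;                          % hole beyond the plate width

fprintf('%.1f%% of samples violate 0.1*H <= a <= 0.9*H\n', 100*mean(violates));
fprintf('%.1f%% of samples have a > H\n', 100*mean(outside));
```

A uniform marginal for a is easy to obtain this way, but a large share of the samples would be geometrically invalid, which is the trade-off the comment describes.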

Bruno Luong on 3 Nov 2024 at 20:59

Edited: Bruno Luong on 3 Nov 2024 at 21:17

In what way do you need a to be uniform in your ML problem? Does it really matter? Why?

It looks like what you want is not possible ("What on earth are you really trying to do?", as some of us keep asking).

Maybe what you call "bias" is simply a wrong expectation.

One thing people always want is for the conditional probability to have the same distribution as the unconditional one. This assumption is generally wrong.


Answer 3

John D'Errico on 3 Nov 2024 at 12:42

Direct link to this answer

https://it.mathworks.com/matlabcentral/answers/2163685-fixing-biased-random-number-generation#answer_1540240

Edited: John D'Errico on 3 Nov 2024 at 15:02

What is the goal of this? That is a good question, as asked by @埃博拉酱.

You should recognize that the result of this operation will not be uniformly distributed. Does that matter? Or is it just a surprise to you, that starting with uniform random variables, you expect the result to also be uniform? Is there a reason why you need the result to be uniform?

1 Comment

Jose David on 3 Nov 2024 at 20:23

Hello, I added some additional information to the post.

I understand. In order to have an unbiased ML model I need unbiased variables. I guess I could begin the training with a slightly biased dataset; however, I would like to have one that is as unbiased as possible. Is there really no other way to do this?
