Find the column most similar to a uniform random 138 x 10000 matrix, count how often it occurs, and show which columns it occurs in

Question

Suwicha Sokul on 11 Jun 2019

Direct link to this question

https://it.mathworks.com/matlabcentral/answers/466578-find-the-column-that-most-similar-uniform-random-matrix-138-x-10000-and-calculate-the-number-that-oc

Closed: MATLAB Answer Bot on 20 Aug 2021

Could you help me? I want to use a = randi([0,1],138,10000) and find the probability, calculated as a percentage, that similar columns occur. Moreover, I want to know which columns contain the most similar data.

1 Comment

Eric Paul on 16 Nov 2019

Did the code work?

This question is closed.

Answer 1

James Browne on 12 Jun 2019

Direct link to this answer

https://it.mathworks.com/matlabcentral/answers/466578-find-the-column-that-most-similar-uniform-random-matrix-138-x-10000-and-calculate-the-number-that-oc#answer_378806

Greetings,

I am not sure I fully understand your question, but I think you want to find the column of a matrix of randomly generated 1s and 0s that most closely matches a uniform distribution?

Additionally, I am not an expert in statistics, and it has been a long time since I took a statistics class, but I think I can at least point you in the right direction. If you consider each column of the matrix as an experiment and each value in a given column as a trial, then you are looking at a series of binomial trials.
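To make the "each column is an experiment" framing concrete, here is a minimal sketch, assuming p = 0.5 because randi([0,1],...) draws 0 and 1 with equal probability. It compares the sample mean and variance of the column sums with the binomial values n*p and n*p*(1-p). The variable names n and colSums are placeholders chosen for this illustration.

```matlab
% Sketch: treat every column of "a" as one binomial experiment.
a = randi([0,1],138,10000);      % same matrix as in the question
p = 0.5;                         % assumed chance of drawing a 1
n = size(a,1);                   % trials per experiment (number of rows)
colSums = sum(a,1);              % number of 1s (successes) in each column
% Compare sample statistics with the binomial mean n*p and variance n*p*(1-p)
fprintf('Mean of column sums: %.3f (binomial mean %.1f)\n', mean(colSums), n*p)
fprintf('Variance of column sums: %.3f (binomial variance %.2f)\n', var(colSums), n*p*(1-p))
```

If the two pairs of numbers are close, the binomial model is a reasonable description of the column sums.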

When working on this solution, I referenced the Wikipedia page for the binomial distribution, which can be found here:

https://en.wikipedia.org/wiki/Binomial_distribution
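For reference, the probability mass function of the binomial distribution can be written as follows, with n trials, k successes and success probability p (symbols chosen here for illustration):

```latex
P(X = k) = \binom{n}{k}\, p^{k} (1 - p)^{n - k}, \qquad k = 0, 1, \dots, n
```

With n = 138 and p = 0.5, this expression is largest at k = 69.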

If each trial is independent, then I believe the number of successes (1s) in a column should follow the binomial distribution. If we take the binomial distribution as the ideal case, then to determine which column of the matrix best matches it, we can count the number of successes in each column and calculate the probability of that count occurring with the binomial probability mass function (binopdf in MATLAB), which is described here:

https://www.mathworks.com/help/stats/binopdf.html
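Note that binopdf is part of the Statistics and Machine Learning Toolbox. If that toolbox is not available, the same probabilities can be computed directly from the binomial formula. The sketch below does this with gammaln so that the large binomial coefficient never overflows. The helper name binomPmf is hypothetical, not a MATLAB built-in, and the expression assumes 0 < p < 1.

```matlab
% Toolbox-free sketch of the binomial probability mass function.
% binomPmf is a hypothetical helper name; valid for 0 < p < 1.
binomPmf = @(k,n,p) exp(gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1) ...
                        + k.*log(p) + (n-k).*log(1-p));
n = 138;                         % number of trials (rows of the matrix)
p = 0.5;                         % assumed probability of a 1
k = 0:n;                         % every possible number of successes
y = binomPmf(k,n,p);             % probability of each possible count
[peakProb,idx] = max(y);
fprintf('Most likely count: %d, probability %.4f\n', k(idx), peakProb)
fprintf('Sum of all probabilities: %.6f\n', sum(y))
```

The probabilities should sum to 1, which is a quick sanity check on the helper.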

The solution to the problem, as I see it (and I could be wrong, so please double check), is to first count the number of successes in each column and then use the binomial probability mass function to calculate the probability of each of those counts occurring. The column whose number of successes corresponds to the highest probability of occurrence (as determined by the probability mass function) is then the best match to the binomial distribution expected from uniformly random 0s and 1s.

The following is a script that I wrote to accomplish the previously described solution:

a = randi([0,1],138,10000);
%Set theoretical probability of success for one trial
p = 0.5;
%Determine the number of trials in each column vector (number of rows)
nTrials = size(a,1);
%Determine the number of experiments in the matrix (number of columns)
nExperiments = size(a,2);
%Preallocate memory for storing the number of successes in each column
nSuccesses = zeros(1,nExperiments);
%Preallocate memory for storing the probability of total successes for each
%column
successProbabilities = zeros(1,nExperiments);
%Calculate the number of successes in each column, each iteration of i
%represents the evaluation of a column
for i = 1:nExperiments
    nSuccesses(i) = sum(a(:,i));
end
%Calculate the probability that the total number of successes should occur
%for each column, using the binomial probability mass function (binopdf),
%each iteration of i represents the calculation of the probability of the
%occurrence of a measured number of successes in an experiment (column)
for i = 1:nExperiments
    successProbabilities(i) = binopdf(nSuccesses(i),nTrials,p);
end
%Determine the theoretically most likely number of successes to occur for
%the given number of trials and plot the theoretical evaluation so that the
%theoretical results can be double-checked
possiblesuccesses = 0:nTrials;
y = binopdf(possiblesuccesses,nTrials,p);
plot(possiblesuccesses,y)
xlabel('Number of Successes')
ylabel({'Probability of Occurrence in a Binomial Distribution','With 138 Trials and a Probability of Success of 50%'})
title('Plot of Binomial Probability Values')
[highestTheoProb,idxT] = max(y);
mostLikelyNSuccess = possiblesuccesses(idxT);
%Print results of theoretical analysis to command window
fprintf('The theoretically most likely number of successes to occur is: %2i\n',mostLikelyNSuccess)
fprintf('The probability that this number of successes will occur is: %5.4f (%5.4f percent)\n\n',highestTheoProb,100*highestTheoProb)
%Determine the highest probability of occurrence among the measured
%successes
[highestMeasuredProb,idxM] = max(successProbabilities);
bestMatchNSuccesses = nSuccesses(idxM);
%Search for duplicate occurrences of the best match of measured number of
%successes to the theoretically most likely number of successes to occur.
%Using find avoids growing the array inside a loop and avoids stale entries
%if "bestMatchLocations" already exists in the workspace from an earlier run
bestMatchLocations = find(nSuccesses == bestMatchNSuccesses);
%Determine the number of occurrences (columns) of the best matched number of
%successes
nOccurences = length(bestMatchLocations);
%Display results
if (bestMatchNSuccesses == mostLikelyNSuccess)
    fprintf('The theoretical number of successes occurred %5i times, the column numbers (for matrix "a")\n',nOccurences)
    fprintf('in which the best match successes occurred are stored in the variable "bestMatchLocations"\n')
else
    fprintf('Theoretical most likely number of successes not found in measured data\n')
    fprintf('The closest match was %2i successes.\n',bestMatchNSuccesses)
    fprintf('The probability of occurrence of the best match successes was %5.4f\n',highestMeasuredProb)
    fprintf('The closest match occurred %5i times, the column numbers (for matrix "a")\n',nOccurences)
    fprintf('in which the best match successes occurred are stored in the variable "bestMatchLocations"\n')
end

Note that the figure this script generates plots only the theoretical probabilities, so it will not change unless the dimensions of the matrix "a" change. As described in the output of the script, the column numbers where the measured number of successes had the highest probability of occurrence (according to the binomial probability mass function and the assumption that the random number generator has a 50% chance of assigning a 1 to any given location in the matrix "a") are stored in the variable "bestMatchLocations".
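Since the question asks for a percentage, one possible way to express the result (an interpretation, not something the script above prints) is the share of columns whose number of successes equals the most likely count, compared with the binomial prediction. The sketch below is self-contained, regenerates the matrix, and assumes p = 0.5; the names matchCols, observedPct and theoPct are placeholders. For 138 trials the prediction is about 6.8 percent.

```matlab
% Sketch: observed vs. predicted percentage of columns at the most likely count.
a = randi([0,1],138,10000);
nTrials = size(a,1);
nExperiments = size(a,2);
nSuccesses = sum(a,1);                        % 1s per column
mostLikely = nTrials/2;                       % 69: the mode for p = 0.5 and even nTrials
matchCols = find(nSuccesses == mostLikely);   % columns that hit the most likely count
observedPct = 100*numel(matchCols)/nExperiments;
% Binomial probability of exactly nTrials/2 successes with p = 0.5,
% computed with gammaln so that no toolbox is needed
theoPct = 100*exp(gammaln(nTrials+1) - 2*gammaln(mostLikely+1) - nTrials*log(2));
fprintf('Observed: %.2f percent of columns, predicted: %.2f percent\n', observedPct, theoPct)
```

The observed value varies from run to run because the matrix is random, but it should stay close to the predicted one.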

As such, the values stored in the variable "bestMatchLocations" should correspond to the columns of the matrix "a" that are the closest fit to a nominal binomial distribution.
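For comparison, the counting and searching steps can also be written without loops. The following is a condensed sketch of the same idea rather than a replacement for the script: it reuses the script's variable names, assumes p = 0.5, and swaps binopdf for a gammaln expression (logPmf is a hypothetical helper name) so that it runs without the Statistics and Machine Learning Toolbox.

```matlab
% Vectorized sketch of the same approach (assumes 0 < p < 1).
a = randi([0,1],138,10000);
p = 0.5;
nTrials = size(a,1);
nSuccesses = sum(a,1);                                  % successes per column, no loop
% Hypothetical toolbox-free stand-in for log(binopdf(k,nTrials,p))
logPmf = @(k) gammaln(nTrials+1) - gammaln(k+1) - gammaln(nTrials-k+1) ...
              + k.*log(p) + (nTrials-k).*log(1-p);
successProbabilities = exp(logPmf(nSuccesses));         % probability of each measured count
[highestMeasuredProb,idxM] = max(successProbabilities);
bestMatchNSuccesses = nSuccesses(idxM);
bestMatchLocations = find(nSuccesses == bestMatchNSuccesses);  % all columns with that count
fprintf('%d columns have %d successes (probability %.4f)\n', ...
    numel(bestMatchLocations), bestMatchNSuccesses, highestMeasuredProb)
```

The columns themselves can then be extracted with a(:,bestMatchLocations).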

I hope this at least points you in the right direction.

1 Comment

Suwicha Sokul on 3 Oct 2019

Thank you very much
