I wanted to apply the chi-squared test and get the p-value back, but MATLAB's chi2cdf function only returns zero.

Question

% Example data matrix (2000 rows and 9 columns)
data_matrix = randi([0, 10], 2000, 9); % Replace this with your actual data
% Example empirical frequency (a vector with 9 elements)
empirical_frequency = [10, 20, 30, 40, 50, 60, 70, 80, 90]; % Replace this with your actual empirical frequency
% Initialize vectors to store results
chi_squared_results = zeros(2000, 1);
p_values = zeros(2000, 1);
for i = 1:2000
    % Select the data for row i
    row_i = data_matrix(i, :);
    
    % Calculate the chi-squared statistic manually
    chi_squared = sum((row_i - empirical_frequency).^2 ./ empirical_frequency);
    
    % Determine the degrees of freedom (df)
    df = length(row_i) - 1;
    
    % Calculate the p-value using the chi-squared distribution
    p = 1 - chi2cdf(chi_squared, df);
    
    % Store the results in vectors
    chi_squared_results(i) = chi_squared;
    p_values(i) = p;
end
unique(p_values)
ans = 0

The problem is that every p-value computed from chi2cdf comes out as 0.


PEDRO ALEXANDRE Fernandes on 2 Nov 2023

Yes, that is the problem.


Answer 1

dpb on 2 Nov 2023

Edited: dpb on 2 Nov 2023


% Example data matrix (2000 rows and 9 columns)
data_matrix = randi([0, 10], 2000, 9); % Replace this with your actual data
% Example empirical frequency (a vector with 9 elements)
empirical_frequency = [10, 20, 30, 40, 50, 60, 70, 80, 90]; % Replace this with your actual empirical frequency
% Initialize vectors to store results
chi_squared_results = zeros(2000, 1);
p_values = zeros(2000, 1);
for i = 1:2000
    % Select the data for row i
    row_i = data_matrix(i, :);

    % Calculate the chi-squared statistic manually
    chi_squared = sum((row_i - empirical_frequency).^2 ./ empirical_frequency);

    % Determine the degrees of freedom (df)
    df = length(row_i) - 1;

    % Calculate the p-value using the chi-squared distribution
    p = 1 - chi2cdf(chi_squared, df);

    % Store the results in vectors
    chi_squared_results(i) = chi_squared;
    p_values(i) = p;
end
histogram(chi_squared_results)
%unique(p_values)
[min(chi_squared_results) max(chi_squared_results)]
ans = 1×2
  321.3518  421.4471
chi2cdf(ans, df)
ans = 1×2
     1     1

What would you expect when you compare a random vector with values in 0:10 against an expected cumulative frequency distribution of 10:10:90?

As the output above shows, the smallest chi-square statistic calculated was about 321. That is so far outside the range of a realistic test statistic that the amount by which the CDF falls short of 1 is below the precision of a double, so chi2cdf returns exactly 1 and the p-value is exactly 0. Try something more like

row_i=randi([0, 100], 1, 9)  % test vector between 0-100 instead of 0-10
row_i = 1×9
    97    27     9    47    15    34    43    54    38
chi_squared = sum((row_i - empirical_frequency).^2 ./ empirical_frequency)
chi_squared = 859.9504
p = 1 - chi2cdf(chi_squared, df)
p = 0

That's still way out of reason. By chance, the value that belongs at the top of the cdf (97) landed in the first element, where only 10 is expected, so it is not surprising that the p-value is again identically zero.
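A side note on the exact zero: `1 - chi2cdf(x, df)` loses the tail to cancellation once the CDF rounds to 1. If the tiny tail probability itself is wanted, a minimal sketch is to ask for the upper tail directly. This assumes a Statistics and Machine Learning Toolbox release whose chi2cdf accepts the 'upper' option; the gammainc line is a base-MATLAB cross-check, and the names p_naive, p_upper and p_gamma are only illustrative.

```matlab
% Sketch: compute the upper tail directly instead of 1 - cdf.
% Assumes chi2cdf supports the 'upper' option (Statistics and Machine Learning Toolbox).
empirical_frequency = 10:10:90;
row_i = [97 27 9 47 15 34 43 54 38];         % the unsorted test vector from above
df = numel(row_i) - 1;

chi_squared = sum((row_i - empirical_frequency).^2 ./ empirical_frequency);

p_naive = 1 - chi2cdf(chi_squared, df);      % subtraction from a cdf that rounded to 1
p_upper = chi2cdf(chi_squared, df, 'upper'); % upper tail without the subtraction

% Base-MATLAB cross-check: the chi-square upper tail equals the regularized
% upper incomplete gamma function evaluated at (x/2, df/2).
p_gamma = gammainc(chi_squared/2, df/2, 'upper');

fprintf('naive = %g, upper = %g, gammainc = %g\n', p_naive, p_upper, p_gamma);
```

A p-value that small says the same thing as 0 for any practical test: the row is nowhere near the expected frequencies.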

Now, keep the same vector but sort it to get what could be an approximation to a cdf...

row_i=sort(row_i)
row_i = 1×9
     9    15    27    34    38    43    47    54    97
chi_squared = sum((row_i - empirical_frequency).^2 ./ empirical_frequency)
chi_squared = 26.7983
p = 1 - chi2cdf(chi_squared, df)
p = 7.6598e-04

Now the sorted vector starts out not too bad in comparison to the expected values, but several values in the 50:80 range are quite low, so it does not fit all that well. At least the p-value is computable.

figure
plot(empirical_frequency,sort(row_i))
xlabel('Expected'), ylabel('Observed')
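To see where the misfit in the 50:80 range comes from, the statistic can be split into its per-bin terms. This is a small base-MATLAB sketch using the sorted vector shown above; the names row_sorted, contrib and share_50_80 are illustrative only.

```matlab
% Sketch: per-bin contributions to the chi-squared statistic (base MATLAB only).
empirical_frequency = 10:10:90;
row_sorted = [9 15 27 34 38 43 47 54 97];    % the sorted test vector from above

% Each term of the sum that makes up the statistic
contrib = (row_sorted - empirical_frequency).^2 ./ empirical_frequency;

% One row per bin: expected, observed, contribution
disp([empirical_frequency(:), row_sorted(:), contrib(:)])

% Fraction of the statistic that comes from the bins with expected 50..80
share_50_80 = sum(contrib(5:8)) / sum(contrib);
fprintf('total = %.4f, share from bins 50-80 = %.2f\n', sum(contrib), share_50_80);
```

For this vector the four bins from 50 through 80 account for roughly 88% of the total of 26.7983, which is why adjusting only those values changes the result so much.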


dpb on 2 Nov 2023

Edited: dpb on 3 Nov 2023

The random vector from the last run above illustrated the point so well that I didn't want to lose it, so I didn't rerun the code to plot observed versus expected. Here it is hard-coded instead:

row_i=[97 27 9 47 15 34 43 54 38];
empirical_frequency=[10:10:90];
subplot(3,1,1)
hold on
plot([0 empirical_frequency 100],[0 empirical_frequency 100],'k-')
plot(empirical_frequency,row_i,'b*-')
xlim([0 100]), ylim([0 100]), box on
legend('reference','random','location','north')
subplot(3,1,2)
hold on
plot([0 empirical_frequency 100],[0 empirical_frequency 100],'k-')
plot(empirical_frequency,sort(row_i),'r*-')
xlim([0 100]), ylim([0 100]), box on
legend('reference','sorted','location','northwest')
subplot(3,1,3)
hold on
row_i=[9 15 27 34 53 57 78 78 97 ];
chi_squared = sum((row_i - empirical_frequency).^2 ./ empirical_frequency)
chi_squared = 4.3887
df=numel(row_i)-1;
p = 1 - chi2cdf(chi_squared, df)
p = 0.8205
plot([0 empirical_frequency 100],[0 empirical_frequency 100],'k-')
plot(empirical_frequency,row_i,'g*-')
xlim([0 100]), ylim([0 100]), box on
legend('reference','adjusted','location','northwest')

Now, if in the end we take a set of data that actually does roughly follow the empirical cdf, then, by golly, we get a chi-square statistic indicating that this set of observations can't be ruled out as having come from the parent distribution. As noted, the "corrections" made to the sorted random vector were to raise the 5th through 8th values to be roughly in line with the expected ones; the deviations from the empirical values are then not nearly so large.
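For the original 2000-row problem, the loop can also be replaced by one vectorized expression. The following is only a sketch under stated assumptions: it needs R2016b or later for implicit expansion and a chi2cdf that accepts 'upper', and the data_matrix generated here is hypothetical data scattered around the expected values, not the poster's data.

```matlab
% Sketch: vectorized chi-squared statistic and p-value for every row.
% Assumes R2016b+ (implicit expansion) and chi2cdf with the 'upper' option.
rng(0);                                  % fixed seed so the example is repeatable
expected = 10:10:90;                     % 1x9 expected values

% Hypothetical observations: expected values plus noise, rounded, floored at 0
data_matrix = max(0, round(expected + 5*randn(2000, numel(expected))));

df = size(data_matrix, 2) - 1;

% Sum the per-bin terms along each row (dimension 2) -> 2000x1 column
chi_squared_results = sum((data_matrix - expected).^2 ./ expected, 2);

% Upper-tail probability for all rows at once -> 2000x1 column
p_values = chi2cdf(chi_squared_results, df, 'upper');

fprintf('p-values range from %g to %g\n', min(p_values), max(p_values));
```

Swapping in the real data only requires replacing data_matrix and expected; the rest works on any matrix with one observation set per row.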
