# Attempt on k-nearest neighbor pdf estimate in 1D

Jonne Klockars on 31 Oct 2022
Answered: Akshat on 1 Sep 2023
I'm trying to write a function that estimates a pdf in one dimension with the k-nearest-neighbours method. I've gone through it several times already and can't figure out what is wrong. The plot shows that my 'pdf' is clearly not what it should be: there is a peak on top of one sample, and a region where the samples are denser comes out flat. Any advice and corrections appreciated! Here is my code; the test data 't122' is a 1x10 vector, i.e. ten 1D samples:
x = [0.553766713954610,0.683388501459509,0.274115313899635,0.586217332036812,0.531876523985898,0.369231170369473,0.456640797769432,0.534262446653865,0.857839693972576,0.776943702988488];
d = size(x,1);
d2 = size(x,2);
% k samples inside the Parzen window
k = 3; % sqrt(N) is a good guess for optimal k
% plotting the samples and the estimated pdf
xAxis = linspace(0,1,100);
plot(xAxis,nnPdf(xAxis,x,k));
title('t122 on the real line with nn-estimated pdf');
hold on;
plot(x,0,'o','MarkerSize',25);
legend(sprintf('%d nearest neighbours pdf',k),'t122');
And here is the function:
% k nearest neighbours 1D pdf-estimator function nnPdf()
% inputs:
% x0 = interval for the pdf
% x = data for which the pdf is estimated
% k = number of samples in every Parzen window
% output:
% V = 1D-pdf estimated with k nearest neighbours
function V = nnPdf(x0,x,k)
    v = zeros(length(x0),size(x,2)); % for distances to all samples
    V = zeros(length(x0),1); % for distance needed to include k samples
    if k > size(x,2)
        disp('*Invalid value for k: not so many samples in the data.');
        return
    end
    standardize(x);
    for i = 1:length(x0)
        for j = 1:size(x,2)
            % distance from interval point to all samples
            v(i,j) = abs(x0(i)-x(j));
        end
        % sorted distances so v_ik is the distance for reaching to the
        % kth sample from the point x0_i
        sort(v,2);
        % window size V at point x0_i based on the distance (volume in 1D)
        V(i) = (k/size(x,2)) * 1/v(i,k);
    end
end
And the outcome:

### Answers (1)

Akshat on 1 Sep 2023
Hi Jonne,
I have reproduced your code at my end using MATLAB R2023a. Kindly note the following differences; I will paste the full code below as well. I have attached the required PDF graph here.
1. In the "nnPdf" function, you have used "standardize", but it isn't defined in MATLAB. The function "zscore" (Statistics and Machine Learning Toolbox) can standardize data. Note, however, that a bare call such as "zscore(x);" discards its result, and rescaling x would put the samples on a different scale from the evaluation points x0. The code below therefore leaves x as it is.
2. While sorting, you haven't assigned the values back to v, so v stays unsorted. As a result "v(i,k)" is just the distance to the k-th sample in the data, which is why the curve peaks on top of one sample. Code:
[v(i,:), ~] = sort(v(i,:));
3. The line where you calculate the window size "V(i)" is missing a factor of 2. After the ascending sort, "v(i,k)" is already the distance to the k-th nearest neighbour (MATLAB indexing starts from 1, so no offset is needed). In 1D, the window centred at x0(i) that reaches this neighbour has length 2*v(i,k), so the estimate is k/(N*2*v(i,k)).
Finally, the full code with these changes is:
x = [0.553766713954610,0.683388501459509,0.274115313899635,0.586217332036812,0.531876523985898,0.369231170369473,0.456640797769432,0.534262446653865,0.857839693972576,0.776943702988488];
d = size(x,1);
d2 = size(x,2);
% k samples inside the Parzen window
k = 3; % sqrt(N) is a good guess for optimal k
% plotting the samples and the estimated pdf
xAxis = linspace(0,1,100);
plot(xAxis,nnPdf(xAxis,x,k));
title('t122 on the real line with nn-estimated pdf');
hold on;
plot(x,0,'o','MarkerSize',25);
legend(sprintf('%d nearest neighbours pdf',k),'t122');
% k nearest neighbours 1D pdf-estimator function nnPdf()
% inputs:
% x0 = interval for the pdf
% x = data for which the pdf is estimated
% k = number of samples in every Parzen window
% output:
% V = 1D-pdf estimated with k nearest neighbours
function V = nnPdf(x0,x,k)
    N = size(x,2); % number of samples
    v = zeros(length(x0),N); % for distances to all samples
    V = zeros(length(x0),1); % estimated pdf at each point of x0
    if k > N
        disp('*Invalid value for k: not so many samples in the data.');
        return
    end
    % x is not standardized, so it stays on the same scale as x0
    for i = 1:length(x0)
        for j = 1:N
            % distance from interval point to all samples
            v(i, j) = abs(x0(i) - x(j));
        end
        % sort row i so that v(i,k) is the distance from x0(i) to its
        % k-th nearest sample
        [v(i, :), ~] = sort(v(i, :));
        % the window centred at x0(i) has length 2*v(i,k) (volume in 1D)
        V(i) = k / (N * 2 * v(i, k));
    end
end
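For comparison, here is a minimal loop-free sketch of a 1D k-NN density estimate on the same data. It assumes the textbook form p(x0) = k / (N * 2 * r_k), where r_k is the distance from x0 to its k-th nearest sample and 2*r_k is the window length. The names D, r and p are illustrative and not part of the code in this thread, and the subtraction relies on implicit expansion, so it needs R2016b or later.

```matlab
% Sample data from the question (ten 1D samples)
x = [0.553766713954610,0.683388501459509,0.274115313899635,0.586217332036812,0.531876523985898,0.369231170369473,0.456640797769432,0.534262446653865,0.857839693972576,0.776943702988488];
k  = 3;                      % number of neighbours per window
N  = numel(x);               % number of samples
x0 = linspace(0,1,100);      % evaluation points

% Row i of D holds |x0(i) - x(j)| for every sample j
D = abs(x0(:) - x(:).');     % 100-by-10 via implicit expansion
D = sort(D, 2);              % ascending within each row, result assigned back
r = D(:, k);                 % distance to the k-th nearest sample
p = k ./ (N * 2 * r);        % assumed estimate: k / (N * window length)
```

Plotting with plot(x0,p) should give a curve that is higher where the samples are denser. Keep in mind that a k-NN estimate is not guaranteed to integrate to exactly one.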
Hope it helps!

Release: R2022a