KNN classification on a dataset

Hi, I have this dataset and I want to run KNN classification on it. I also want to find the accuracy of the classification and show the number of correct and wrong classifications with a confusion matrix. Can anyone help me with this, please?

3 Comments

Explain what each triplet of csv files represents.
Mary Gh on 31 Dec 2020
Well, they are classes A to I, with two training data sets (two labels) and one test data set.
Mary Gh on 31 Dec 2020
Actually, I don't think they represent anything; the question didn't say.


Accepted Answer

Image Analyst on 31 Dec 2020

0 votes

So one is for training, the other "training" one is for validation (run it through and see how accurate the predictions are compared to the known values), and the third one is a test set (for which you do not know the correct answers).
I have attached a KNN demo. See if you can adapt that to your homework problem. If not, come back for more hints.

21 Comments

Thanks for your help, it did work, but I had to transpose my training coords and unknown coords to make it work. Is there any problem with doing that? And I need another hint, please: I also want to find the accuracy of this classification and show the number of wrong and correct classifications with a confusion matrix. Also, how can I make a validation set? Thanks in advance.
You have to have a set of observations for which you know the absolute true answer: what class they are. If you don't, how can you assess its accuracy? So then you take a portion of the known "ground truth" data, like 80% or something, and train with that. Then the remaining 20% is your validation set. You run those through knnsearch() and see what class the nearest neighbors give each observation. Since you know the truth for each of those, you can determine the percent it got correct.
Because I'm new to this field, is there any demo like the last one to help me understand what is happening? I would appreciate it.
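The 80/20 hold-out idea described above can be sketched as follows. This is a minimal sketch, not the attached demo: the data `X` and the labels `trueClass` are made-up placeholders, `k = 5` is an arbitrary choice, and the functions `cvpartition`, `knnsearch`, and `confusionmat` assume the Statistics and Machine Learning Toolbox is installed.

```matlab
% Hypothetical data: two well-separated 2-D classes with known labels.
rng(0);                                   % reproducible random numbers
X = [randn(50,2); randn(50,2) + 4];       % 100 observations, 2 features
trueClass = [ones(50,1); 2*ones(50,1)];   % known "ground truth" classes

% Hold out 20% of the labeled data for validation.
cv = cvpartition(size(X,1), 'HoldOut', 0.2);
trainIdx = training(cv);                  % logical mask of training rows
valIdx = test(cv);                        % logical mask of validation rows

% Find the k nearest training points for every validation point.
k = 5;
neighborIdx = knnsearch(X(trainIdx,:), X(valIdx,:), 'K', k);

% Majority vote among the neighbors' known classes.
trainLabels = trueClass(trainIdx);
predicted = mode(trainLabels(neighborIdx), 2);

% Compare with the known answers.
known = trueClass(valIdx);
C = confusionmat(known, predicted);       % rows: known, columns: predicted
numCorrect = sum(diag(C));
numWrong = sum(C(:)) - numCorrect;
accuracy = numCorrect / numel(known);
fprintf('Correct: %d, wrong: %d, accuracy: %.1f%%\n', numCorrect, numWrong, 100*accuracy);
```

The diagonal of `C` counts the correct classifications and everything off the diagonal counts the wrong ones, which is how the sketch derives both numbers from the confusion matrix.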
Sir, I have a question about KNN too. Can you help me, please?
@Frisda Sianipar, I'll try. What is the question?
This is the code that I have,
but I get an error when I run it:
x=readtable("datatraining.xlsx");
latih=x;
group=latih(:,3);
latih = [latih(:,1) latih(:,2)];
for i = 1 : 80
y=readtable("datatesting.xlsx");
sampel = y;
test = [sampel(:,1) sampel(:,2)];
%sampel = [2.6136 0.1284 1.3017 -0.8089 0.0441 -0,2084];
hasil=knnclassify(test,latih,group);
end
nama = "hasil KNN.xlsx";
hasil = [sampel(:,1) sampel(:,2) sampel(:,3) hasil];
xlswrite(nama,hasil);
Image Analyst on 1 May 2021
Edited: Image Analyst on 1 May 2021
@Frisda Sianipar, you forgot to read the posting guidelines,
which means that you forgot to attach your "datatraining.xlsx", thus delaying an answer. So, what can I do with no data to work with? Start a new question and this time attach your code, screenshot, AND xlsx workbook.
I forgot, sir. These are "datatraining.xlsx" and "datatesting.xlsx".
Thank you in advance.
Yes, this is the same question, sir. Please help me to answer it.
@Image Analyst Sir, I'm sorry to take up your time. This task is due soon, but I still have not found a solution to the error that I posted. Could you help me find the solution, sir? Thank you in advance.
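The error message itself is not shown in the thread, so its cause can only be guessed at. Two likely candidates are that `readtable` returns tables rather than numeric arrays, and that `knnclassify` is not available in recent MATLAB releases. The sketch below shows one possible workaround under those assumptions: it converts the tables with `table2array` and classifies with `fitcknn` and `predict` (Statistics and Machine Learning Toolbox) in place of `knnclassify`. The small in-memory tables, their column names, and `NumNeighbors = 3` are hypothetical stand-ins for the spreadsheets, which are not reproduced here.

```matlab
% Hypothetical stand-ins for readtable("datatraining.xlsx") and
% readtable("datatesting.xlsx"): two feature columns plus a class column.
latihTable = array2table([1.0 1.0 1; 1.2 0.9 1; 5.0 5.0 2; 5.1 4.8 2], ...
    'VariableNames', {'f1', 'f2', 'kelas'});
sampelTable = array2table([1.1 1.0; 4.9 5.2], ...
    'VariableNames', {'f1', 'f2'});

% Convert the table columns to plain numeric arrays.
latih = table2array(latihTable(:, 1:2));    % training features
group = table2array(latihTable(:, 3));      % known training classes
testData = table2array(sampelTable(:, 1:2)); % features to classify

% Train a KNN model and classify the test rows in one call (no loop needed).
mdl = fitcknn(latih, group, 'NumNeighbors', 3);
hasil = predict(mdl, testData);
disp(hasil);
```

Because `predict` classifies every test row at once, the `for i = 1 : 80` loop in the posted code would not be needed in this variant.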
Hi,
Please, can anyone help me with this problem? I want to calculate the Euclidean distance for my dataset using KNN.
For example, I have the training and test data as well as the classes:
x_Train = [4 6 7 5 8; 5.2 6.3 9 11 10];
y_Train = [3 7 8 5 8;4.5 1.3 6 7 9.1];
x_test=[0.8 14 2 5 4.3; 7.2 6.5 4.1 18 3.6]
y_test=[1 4.8 5.9 14 3.4 ;9 17 12 16 2.9]
trainingclass=[1 2 2 1 1 ]
Thank you.
I don't know what that means. How many classes do you have? What are the "true" classes of the points in your training set? Once we know that, plus what value of K you want, we can assign class numbers to the test set.
merlin toche on 3 Dec 2022
Edited: Image Analyst on 3 Dec 2022
Thank you sir for the feedback.
For my previous question,
x_Train = [4 6 7 5 8; 5.2 6.3 9 11 10];
y_Train = [3 7 8 5 8;4.5 1.3 6 7 9.1];
x_test=[0.8 14 2 5 4.3; 7.2 6.5 4.1 18 3.6]
y_test=[1 4.8 5.9 14 3.4 ;9 17 12 16 2.9]
I have six classes, namely
trainclass=['None','OCF','SCF','P','SBD','OCI']
and the k value is 5. Please, sir, I don't understand what you mean by real classes.
Sir, another concern: I have partitioned my data into training and test data using the cvpartition command.
For example:
% sample data (150x4)
mydata1=rand(150,4);
% cross-validation (Train: 80%, Test: 20%)
cv=cvpartition(size(mydata1,1),'holdout',0.2);
idx=cv.test;
% separate training and test data
mydata1_train=mydata1(~idx,:);
mydata1_test=mydata1(idx,:);
% evaluation of data
I would like to know how to build a benchmark in which the test and training points appear.
Thank you for your support.
I don't know what this means:
x_Train = [4 6 7 5 8; 5.2 6.3 9 11 10]
x_Train = 2×5
    4.0000    6.0000    7.0000    5.0000    8.0000
    5.2000    6.3000    9.0000   11.0000   10.0000
y_Train = [3 7 8 5 8;4.5 1.3 6 7 9.1]
y_Train = 2×5
    3.0000    7.0000    8.0000    5.0000    8.0000
    4.5000    1.3000    6.0000    7.0000    9.1000
x_test=[0.8 14 2 5 4.3; 7.2 6.5 4.1 18 3.6]
x_test = 2×5
    0.8000   14.0000    2.0000    5.0000    4.3000
    7.2000    6.5000    4.1000   18.0000    3.6000
y_test=[1 4.8 5.9 14 3.4 ;9 17 12 16 2.9]
y_test = 2×5
    1.0000    4.8000    5.9000   14.0000    3.4000
    9.0000   17.0000   12.0000   16.0000    2.9000
trainclass=['None','OCF','SCF','P','SBD','OCI']
trainclass = 'NoneOCFSCFPSBDOCI'
Why are there two rows in your x and y values?
Do you think you're getting kmeans confused with KNN?
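The `trainclass` output above shows that square brackets concatenate the six names into a single character vector, so the class names are lost. A minimal sketch of one way to keep them separate is to use a cell array and, if class numbers are needed, a categorical array. The per-point labels in `trainLabelNames` are invented for illustration and are not from the posted data.

```matlab
% Square brackets join character vectors into one long char vector.
joined = ['None','OCF','SCF','P','SBD','OCI'];       % 'NoneOCFSCFPSBDOCI'

% Curly braces keep the six class names as separate elements.
classNames = {'None','OCF','SCF','P','SBD','OCI'};   % 1x6 cell array

% Hypothetical true classes for five training points, given by name.
trainLabelNames = classNames([1 2 2 4 6]);

% A categorical array stores the names and maps them to class numbers.
trainClass = categorical(trainLabelNames, classNames);
classNumbers = double(trainClass);                   % position in classNames
disp(classNumbers);
```

Note that six class names alone are still not a training label vector: KNN needs one label per training point, as the following comments explain.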
The way KNN works is you specify K, the data, and the class number that each data point belongs to (its "TRUE" class assignment). For example, if you had 100 training data points (100 x and 100 y in vectors) and 5 classes, then you must have a training vector that says what class number each of those 100 training points really is. For example, if trainingClasses = [1,3,5,2,.......3,2] then it says training point 1 is defined to be a member of class 1, and training point 2 is defined to be a member of class 3, and training point 3 is defined to be a member of class 5, and training point 4 is defined to be a member of class 2, and ... training point 99 is defined to be a member of class 3, and training point 100 is defined to be a member of class 2.
Now you can call knnsearch and it will tell you the K indexes in your training set that each point in your test set is closest to:
%-----------------------------------------------------------------------------------------------------------------
% Now do a K Nearest Neighbor Search.
% Get the classes of the unknown data.
% First collect all the training data into one tall array
trainingCoords = [x_Train(:), y_Train(:)]
trainingCoords = 10×2
    4.0000    3.0000
    5.2000    4.5000
    6.0000    7.0000
    6.3000    1.3000
    7.0000    8.0000
    9.0000    6.0000
    5.0000    5.0000
   11.0000    7.0000
    8.0000    8.0000
   10.0000    9.1000
unknownCoords = [x_test(:), y_test(:)]
unknownCoords = 10×2
    0.8000    1.0000
    7.2000    9.0000
   14.0000    4.8000
    6.5000   17.0000
    2.0000    5.9000
    4.1000   12.0000
    5.0000   14.0000
   18.0000   16.0000
    4.3000    3.4000
    3.6000    2.9000
[indexes, distancesOfTheIndexes] = knnsearch(trainingCoords, unknownCoords, ...
'NSMethod', 'exhaustive',...
'k', 5,... % Get indexes of the 5 nearest points
'distance', 'euclidean') % Regular Pythagorean formula for distance
indexes = 10×5
     1     4     2     7     3
     5     9     3    10     6
     8     6    10     9     5
    10     5     9     3     8
     7     2     1     3     5
     5     3     9    10     7
     5     9    10     3     6
    10     8     9     6     5
     1     2     7     4     3
     1     2     7     4     3
distancesOfTheIndexes = 10×5
    3.7736    5.5082    5.6223    5.8000    7.9398
    1.0198    1.2806    2.3324    2.8018    3.4986
    3.7202    5.1420    5.8728    6.8000    7.6968
    8.6406    9.0139    9.1241   10.0125   10.9659
    3.1321    3.4928    3.5228    4.1485    5.4231
    4.9406    5.3488    5.5866    6.5742    7.0576
    6.3246    6.7082    7.0007    7.0711    8.9443
   10.5646   11.4018   12.8062   13.4536   13.6015
    0.5000    1.4213    1.7464    2.9000    3.9812
    0.4123    2.2627    2.5239    3.1385    4.7508
% Extract the classes
Now you can see that point 1 of the test set is closest to points 1, 4, 2, 7, and 3 of the training set, in order of decreasing closeness (increasing distance). If I wanted to determine the classes of those 5 training points, I need to know what class they are in -- their "true" class that they are known to be without any doubt. For example if points 1, 4, and 2 are all in class 3, then the majority of points near test point 1 are class 3 so we're now going to define test point 1 as being in class 3 also.
That's KNN. Does that explain it?
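The "extract the classes" step described above can be sketched as a majority vote over the neighbor indexes. The coordinates are the ones posted in this thread, but `trainClass` is a made-up label vector (the thread never supplies the true classes of the ten training points), so the predicted classes are illustrative only. `knnsearch` assumes the Statistics and Machine Learning Toolbox.

```matlab
% Coordinates from the thread.
x_Train = [4 6 7 5 8; 5.2 6.3 9 11 10];
y_Train = [3 7 8 5 8; 4.5 1.3 6 7 9.1];
x_test = [0.8 14 2 5 4.3; 7.2 6.5 4.1 18 3.6];
y_test = [1 4.8 5.9 14 3.4; 9 17 12 16 2.9];
trainingCoords = [x_Train(:), y_Train(:)];   % 10 training points
unknownCoords = [x_test(:), y_test(:)];      % 10 test points

% HYPOTHETICAL true class of each of the 10 training points.
trainClass = [1; 1; 2; 1; 2; 2; 1; 3; 2; 3];

% Indexes and Euclidean distances of the 5 nearest training points.
[indexes, distances] = knnsearch(trainingCoords, unknownCoords, ...
    'NSMethod', 'exhaustive', 'K', 5, 'Distance', 'euclidean');

% Look up the class of every neighbor, then take the most common one per row.
neighborClasses = trainClass(indexes);       % 10x5, same shape as indexes
predictedClass = mode(neighborClasses, 2);   % 10x1 majority vote
disp(predictedClass(1));
```

With these invented labels, the five neighbors of test point 1 (training points 1, 4, 2, 7, and 3) have classes 1, 1, 1, 1, and 2, so the vote assigns class 1. When two classes tie, `mode` returns the smaller class number, which may or may not be the tie-breaking rule you want.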
Now with kmeans, you don't have a training set because it's an unsupervised classification. All you have is a data set and a known/desired number of clusters to force the points into. So it will try to figure out where the clusters are and assign each point to one of the k clusters. But there is no ground truth, or training set, where you know the true class for any points.
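For contrast, a minimal kmeans sketch is shown here. The points are random placeholders and `k = 2` is an assumed number of clusters; `kmeans` assumes the Statistics and Machine Learning Toolbox. Notice that no class labels are passed in at any point.

```matlab
% Hypothetical unlabeled data: two blobs of 2-D points.
rng(0);                                    % reproducible random numbers
points = [randn(20,2); randn(20,2) + 5];   % 40 points, no labels

% Ask kmeans to force the points into k clusters.
k = 2;
[clusterIdx, centroids] = kmeans(points, k);

% clusterIdx(i) is the cluster number (1..k) assigned to point i.
% The numbering is arbitrary: cluster 1 is not "class 1" in any true sense.
fprintf('Cluster sizes: %d and %d\n', nnz(clusterIdx == 1), nnz(clusterIdx == 2));
```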
merlin toche on 5 Jan 2023
Edited: merlin toche on 5 Jan 2023
Hello,
I need help, please. When I run my code, the error below appears. I tried to solve the problem without success.
The error I get is below:
Index exceeds matrix dimensions.
Error in Untitled5 (line 26)
plot(x_train(trainClass == 1),y_train(trainClass == 1), 'b.', 'MarkerSize', 30);
I can't run an image, but it's saying that you don't have the same number of elements in trainClass as you have elements in x_train. How many are in each vector? They must have the same number of elements.
Add this after you assign trainClass and before you call plot():
if numel(trainClass) ~= numel(x_train)
    errorMessage = sprintf('Wrong number of training classes.\ntrainClass should have %d elements (like x_train),\nbut it has %d elements.\n',...
        numel(x_train), numel(trainClass));
    uiwait(errordlg(errorMessage));
    return;
end
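Why the size mismatch produces an indexing error can be shown with a small sketch. The vectors below are invented: `trainClass` deliberately has one element more than `x_train`, and that extra element belongs to class 1, so the logical index reaches past the end of `x_train`. The exact wording of the error message differs between MATLAB releases.

```matlab
x_train = [4 6 7 5 8];            % 5 training points
trainClass = [1 2 2 1 1 1];       % HYPOTHETICAL: 6 labels, one too many

% The mask trainClass == 1 is true at position 6, which x_train lacks.
try
    selected = x_train(trainClass == 1);
    caught = false;
catch err
    caught = true;
    disp(err.message);            % indexing error, wording varies by release
end

% Once both vectors have the same length, the same line works.
trainClass = trainClass(1:numel(x_train));
selected = x_train(trainClass == 1);
disp(selected);
```

If `trainClass` were shorter than `x_train` instead, MATLAB would raise no error and the unlabeled points would simply be left out, which is another reason to keep the explicit `numel` check.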
Thank you very much, sir. I reduced it and it works.
Please, sir, I have been trying to replace my data with another data set, but my program is still running with the old one. What should I do?
@merlin toche call "clearvars" before running your program.
If you have any more questions, then attach your data and code to read it in with the paperclip icon after you read this:
We should discuss this in your own discussion thread rather than continually bugging the original poster, @Mary Gh, with emails about your problem.
Thank you, sir! I'm very happy.
Please, sir, I have posted other questions via my profile, as you suggested, concerning your previous answers.
Best regards
@merlin toche Not sure what that means, but in your profile, it says you've submitted only one question, and one answer. If you have asked other questions in the comments to someone else's question, then that won't show up. That's why it's best to ask your own questions in your own thread rather than try to post the questions in someone else's thread, like here in @Mary Gh's thread. She'll get emails every time there is activity in this thread.


More Answers (1)

merlin toche on 9 Dec 2022

0 votes

Thank you, sir, for all you do for me. It was well explained and well understood. Excuse me for continuing to bother you; I'm still learning machine learning, and you are a good teacher for me. I have two concerns, sir.
My first question: after partitioning my data, which I did using cvpartition:
mydata1=rand(150,4);
% cross-validation (Train: 80%, Test: 20%)
cv=cvpartition(size(mydata1,1),'holdout',0.2);
idx=cv.test;
% separate training and test data
mydata1_train=mydata1(~idx,:);
mydata1_test=mydata1(idx,:);
% data evaluation
which command is used to do the job you just explained to me? For example, with my 150 data points partitioned, would I still need to declare the vectors below before doing the work you explained? If not, how do I have to call this data in order to build it?
x_Train = [4 6 7 5 8; 5.2 6.3 9 11 10];
y_Train = [3 7 8 5 8;4.5 1.3 6 7 9.1];
x_test=[0.8 14 2 5 4.3; 7.2 6.5 4.1 18 3.6];
y_test=[1 4.8 5.9 14 3.4; 9 17 12 16 2.9];
['None', 'OCF', 'SCF', 'P', 'SBD', 'OCI']
Best regards

4 Comments

Image Analyst on 9 Dec 2022
Edited: Image Analyst on 9 Dec 2022
Instead of that last part you'd do
x_Train = mydata1_train(:, 1);
y_Train = mydata1_train(:, 2);
etc.
But you still need to know the "true" class for the training data if you're going to use KNN.
Hi!
Please, can anyone help me? I need code to sort and evaluate accuracy in KNN.
Thank you.
You have this:
x_train = mydata1_train(:,1); %[4 6 7 5 8];
y_train = mydata1_train(:,2); % [3 7 8 5 8];
% Now you say your classes are c=['open','short','short','open','open']
% so let's make those class numbers.
trainClass =[1,2,2,1,1];
However your x_train and y_train have 120 elements. So you need to define trainClass to have 120 elements also. You need to know the "true" class for every one of your training points. Right now you have only 5, not 120.
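One way to end up with exactly one label per training point is to split the label vector with the same partition mask as the data. In this sketch, `allClasses` is a made-up vector of true class numbers for all 150 observations (the thread never supplies real ones), and `cvpartition` assumes the Statistics and Machine Learning Toolbox.

```matlab
rng(0);                                    % reproducible random numbers
mydata1 = rand(150, 4);                    % sample data (150x4)
allClasses = randi(6, 150, 1);             % HYPOTHETICAL true class per row

% Hold out 20% for testing (Train: 80%, Test: 20%).
cv = cvpartition(size(mydata1,1), 'HoldOut', 0.2);
idx = test(cv);                            % logical mask of test rows

% Split the data AND the labels with the same mask.
mydata1_train = mydata1(~idx,:);
mydata1_test = mydata1(idx,:);
trainClass = allClasses(~idx);             % one label per training row
testClass = allClasses(idx);               % known answers for scoring

x_train = mydata1_train(:,1);
y_train = mydata1_train(:,2);
fprintf('%d training points, %d training labels\n', numel(x_train), numel(trainClass));
```

Because the labels are indexed with `~idx` just like the rows of `mydata1`, `trainClass` always has the same number of elements as `x_train` and `y_train`.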
Please, can someone help me?
I want to detect a series of faults using the fuzzy-KNN algorithm. For this I have 5 named data classes. I wrote some code, but errors appear; I would like your help to review it and make the necessary corrections.
My code and dataset are attached.
Thanks.

