I have a dataset and want to do cross-validation. How can I divide it into training and testing data?

Hi, I have 100 folders, 50 male and 50 female, and each folder contains 6 images. I want to implement cross-validation. I know how to do it when I have just 100 distinct images, but here I have 100 different folders. How do I divide them for cross-validation? Any help is appreciated.

Answers (1)

dpb on 22 Aug 2016
Just randomize over the overall list...
  2 Comments
chinnurocks on 22 Aug 2016
When I had just 100 images, one from each of 100 subjects, I was able to randomise. But I am stuck on how to randomise folders. For your reference, I am attaching my code.
clc;
clear;

pngFiles = dir('*.png');        % List all PNG files in the current folder
numFiles = length(pngFiles);
mydata = cell(1, numFiles);     % Cell array to store the images
data = cell(numFiles, 1);       % Cell array to store the features

% Read every image and compute its LBP histogram.
% Note: lbp is not a built-in MATLAB function; it must be on the path.
for k = 1:numFiles
    mydata{k} = imread(pngFiles(k).name);
    img = mydata{k};
    data{k} = lbp(img, 1, 8, 0, 'hist');
end

% Stack the feature vectors into one matrix 'c', one row per image
c = vertcat(data{:});

% Create the label vector 'a': the first 50 samples are male, the rest female
a = cell(100, 1);
for n = 1:100
    if n < 51
        a{n} = 'male';
    else
        a{n} = 'female';
    end
end
groups = ismember(a, 'male');   % logical 1 for 'male', 0 otherwise

%# number of cross-validation folds:
%# If you have 50 samples, divide them into 10 groups of 5 samples each,
%# then train with 9 groups (45 samples) and test with 1 group (5 samples).
%# This is repeated ten times, with each group used exactly once as a test set.
%# Finally the 10 results from the folds are averaged to produce a single
%# performance estimation.
p = 10;
cvFolds = crossvalind('Kfold', groups, p);  %# get indices of 10-fold CV of "groups" observations
cp = classperf(groups);                     %# init performance tracker
for i = 1:p                                 %# for each fold
    testIdx = (cvFolds == i);               %# get indices of test instances
    trainIdx = ~testIdx;                    %# get indices of training instances
    %# train an SVM model over training instances
    svmModel = svmtrain(c(trainIdx,:), groups(trainIdx), ...
        'Autoscale', true, 'Showplot', false, 'Method', 'QP', ...
        'BoxConstraint', 2e-1, 'Kernel_Function', 'rbf', 'RBF_Sigma', 1000);
    %# test using test instances
    pred = svmclassify(svmModel, c(testIdx,:));
    %# evaluate and update performance object
    cp = classperf(cp, pred, testIdx);
end
%# get accuracy
cp.CorrectRate
dpb on 22 Aug 2016
First return the list of subdirectories in an array, then select randomly from that array. If traversing the subdirectories is the issue, there are quite a number of threads on Answers with code that shows how to do it.
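The suggestion above could be sketched roughly as follows. This is only an illustration under assumptions that are not in the thread: the subject folders are assumed to sit directly under `rootDir`, each holding its `*.png` images, and the names `rootDir`, `subjects`, `subjFold`, `imgFile`, `imgSubj` and `imgFold` are made up for this example. The idea is to randomise at the folder level, so that all 6 images of one subject always land in the same fold.

```matlab
% Sketch: assign whole subject folders (not single images) to CV folds.
% Assumed layout (hypothetical): rootDir/<subject folder>/*.png
rootDir = pwd;                          % assumption: subject folders live here

% 1. List the subject folders, skipping '.' and '..' and plain files
entries  = dir(rootDir);
isSubj   = [entries.isdir] & ~ismember({entries.name}, {'.', '..'});
subjects = {entries(isSubj).name};
numSubj  = numel(subjects);

% 2. Randomly assign each folder to one of p folds
p = 10;
order = randperm(numSubj);                  % random order of the folders
subjFold = zeros(1, numSubj);
subjFold(order) = mod(0:numSubj-1, p) + 1;  % fold sizes differ by at most one

% 3. Expand to image level: every image inherits the fold of its folder
imgFile = {};                           % full path of each image
imgSubj = [];                           % index into 'subjects' for each image
for s = 1:numSubj
    files = dir(fullfile(rootDir, subjects{s}, '*.png'));
    for f = 1:numel(files)
        imgFile{end+1} = fullfile(rootDir, subjects{s}, files(f).name);
        imgSubj(end+1) = s;
    end
end
imgFold = subjFold(imgSubj);            % fold index of each image
```

In the loop from the posted code, `testIdx = (imgFold == i)` would then take the place of `cvFolds == i`, with the feature matrix and the labels built per image in the same order as `imgFile`. The plain `randperm` assignment does not balance male and female folders across folds; if that matters, passing a per-folder label vector to `crossvalind('Kfold', ...)`, as the posted code already does per image, may be a better way to produce `subjFold`.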
