I have just found a solution to my problem.
To split my data into testSet and otherSet, I will use the following code, which I found here and modified slightly:
function [ X, y, partition ] = generar_sets( X, y, k )
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Author: Pree Thiengburanathum
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Description:
% Ensures that the training, testing, and validation datasets have similar
% proportions of classes (e.g., 20 classes). This stratified sampling
% technique gives the analyst more control over the sampling process.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Input:
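% X - dataset (one sample per row)
% y - class label of each sample
% k - number of folds (foldSize below defines only two: test and other)
%
% Output:
% X - shuffled dataset
% y - shuffled class labels
% partition - fold index of each sample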
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
n = size(X, 1);
partition = zeros(n, 1);
% shuffle the dataset
[~, idx] = sort(rand(1, n));
X = X(idx, :);
y = y(idx);
% find the unique class
group = unique(y);
nGroup = numel(group);
% find the largest number of samples in any class
nmax = 0;
for i=1:nGroup
    nmax = max(nmax, sum(y == group(i)));
end
% create fold indices: row i holds the sample indices of class i, zero-padded
foldIndices = zeros(nGroup, nmax);
for i=1:nGroup
    idx = find(y == group(i));
    foldIndices(i, 1:numel(idx)) = idx;
end
% compute fold size for each fold
% foldSize layout (rows = sets, columns = classes):
%            | class 1 | class 2 |
%   testSet  |         |         |
%   otherSet |         |         |
foldSize = zeros(2, nGroup);
for i=1:nGroup
    % find the number of elements of the class
    numElement = nnz(foldIndices(i,:));
    % calculate the number of elements for each fold
    foldSize(1,i) = floor(numElement*0.25); % testSet: 25% of the class
    foldSize(2,i) = floor(numElement*0.75); % otherSet: 75% of the class
end
ptr = ones(nGroup, 1);
for i=1:k % choose which set to build (test or other)
    for j=1:nGroup % choose which class to process
        % take the next foldSize(i,j) sample indices of class j
        idx = foldIndices(j, ptr(j):(ptr(j)+foldSize(i,j)-1));
        partition(idx) = i;
        ptr(j) = ptr(j)+foldSize(i,j);
    end
end
% dump the rest of index to the last fold
partition(partition == 0) = k;
% report the size and class proportions of each fold
for i=1:k
    fold = y(partition == i);
    disp(['fold# ', int2str(i), ' has ', int2str( numel(fold) ) ]);
    for j=1:nGroup
        percentage = (sum(fold == group(j))/numel(fold)) * 100;
        disp(['class# ', int2str(j), ' = ', num2str(percentage), '%']);
    end
    disp(' ');
end
end
To divide otherSet into validation and training sets and apply k-fold cross-validation, I will use the cvpartition function.
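As a rough sketch of that second step, assuming the Statistics and Machine Learning Toolbox is available: the snippet below builds a stratified k-fold partition over the labels of otherSet with cvpartition and loops over the folds. The names otherX and otherY, the synthetic data, and the choice of 5 folds are placeholders of mine, not part of the code above.

```matlab
% Placeholder data standing in for otherSet (hypothetical: 60 samples, 2 classes).
rng(0);                                   % fix the seed so the split is repeatable
otherX = rand(60, 3);                     % features, one sample per row
otherY = [ones(30, 1); 2 * ones(30, 1)];  % class labels

numFolds = 5;                             % assumed number of folds
% Passing the label vector makes cvpartition stratify the folds by class.
cvp = cvpartition(otherY, 'KFold', numFolds);

for f = 1:cvp.NumTestSets
    trainMask = training(cvp, f);         % logical index of the training rows
    valMask   = test(cvp, f);             % logical index of the validation rows

    Xtrain = otherX(trainMask, :);
    ytrain = otherY(trainMask);
    Xval   = otherX(valMask, :);
    yval   = otherY(valMask);

    % Train on (Xtrain, ytrain) and evaluate on (Xval, yval) here.
    fprintf('fold %d: %d training / %d validation samples\n', ...
        f, numel(ytrain), numel(yval));
end
```

In real use, otherX and otherY would be the rows of X and y whose partition value marks them as otherSet (fold 2 in the function above).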
I am fairly confident this will work as expected, but if it does not, I am still interested in your answers.
Thank you.