Split dataset into three different size sets without overlapping
I am working on image processing in MATLAB. I need to split a large dataset into three non-overlapping subsets (25%, 25% and 50%). The dataset (say, 1,000 images) has 10 classes of 100 images each. For each class, 25% of the images should go into the training set, another 25% into the validation set, and the remaining 50% into the test set. There should be no repetition: if an image from a class has been stored in one subset, it must not be stored in either of the other subsets. How do I do that in MATLAB?
My code is as follows:
load ('data.mat')
for i = 1:size(data, 1)
    for j = 1:78
        if mod(i,2)==0
            trainingset(i/2,j) = data(i,j);
        else
            remainset((i-1)/2+1,j) = data(i,j);
        end
    end
end
for i = 1:size(remainset, 1)
    for j = 1:78
        if mod(i,2)==0
            testset(i/2,j) = remainset(i,j);
        else
            validationset((i-1)/2+1,j) = remainset(i,j);
        end
    end
end
Although it somehow works, I am looking for a better algorithm as some parts of data are lost.
Answers (1)
Frank B.
on 8 May 2018
Here is a quick answer using datasample, for a single vector named data. Loop over your classes, or work with the sampled indices instead of the values if the same split has to be shared across several variables.
load ('data.mat')
% Declaring data division ratio
% 25% for training, 25% for validation, 50% for test
dataset_div=[0.25 0.25 0.5];
% Number of data in each set
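% (rounded, because datasample needs integer sample counts)
nb_train=round((dataset_div(1)/sum(dataset_div))*length(data));
nb_valid=round((dataset_div(2)/sum(dataset_div))*length(data));
% The test set takes whatever is left, so the three counts always add up
nb_test=length(data)-nb_train-nb_valid;
% Splitting data into 3 non-overlapping vectors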
% Training data
[data_train,idx_sample]=datasample(data,nb_train,'Replace',false);
% Removing used values
idx_left=1:length(data);
idx_left(idx_sample)=[];
val_left=data(idx_left);
% Validation data
[data_valid,idx_sample]=datasample(val_left,nb_valid,'Replace',false);
% Removing used values
% (idx_sample refers to positions in val_left, so index val_left, not data)
idx_left=1:length(val_left);
idx_left(idx_sample)=[];
val_left=val_left(idx_left);
% Test data
[data_test,idx_sample]=datasample(val_left,nb_test,'Replace',false);
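If the split has to be done per class, as in the question, one possible sketch is to shuffle the indices of each class once with randperm and cut the shuffled list into three consecutive pieces. This swaps datasample for randperm, so it is a variation on the code above rather than the same method. The names labels, ratios, idx_train, idx_valid and idx_test are assumptions made for illustration, and the labels vector is built synthetically here because the contents of data.mat are not known.

```matlab
% Hypothetical setup: one class id per image, 10 classes of 100 images each.
% In practice "labels" would come from your own data, not from repelem.
labels = repelem((1:10)', 100);
ratios = [0.25 0.25 0.5];            % train, validation, test

idx_train = []; idx_valid = []; idx_test = [];
classes = unique(labels);
for c = 1:numel(classes)
    idx_c = find(labels == classes(c));       % row indices of this class
    idx_c = idx_c(randperm(numel(idx_c)));    % shuffle them once
    n_train = round(ratios(1) * numel(idx_c));
    n_valid = round(ratios(2) * numel(idx_c));
    % Consecutive slices of one shuffled list cannot overlap
    idx_train = [idx_train; idx_c(1:n_train)];
    idx_valid = [idx_valid; idx_c(n_train+1:n_train+n_valid)];
    idx_test  = [idx_test;  idx_c(n_train+n_valid+1:end)];   % the rest
end
```

The three index vectors can then be applied to every variable that shares the same row order, for example data(idx_train,:) and labels(idx_train), assuming data stores one image per row.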
Cheers