Loading and splitting data (double + string) into train and test data

Hi
I have the code below for importing data and splitting it into 80% training and 20% test sets.
Unfortunately, I can only split the data when it contains just the double values in columns 1 to 8; when I include column 9, which holds string data, I get the error shown below.
I want to split columns 1 to 9, containing both double and string data, so that I can track which rows belong to which experiments in both the training and test data.
Can someone please help me with this?
clc
clear all
close all
set(0,'DefaultAxesFontName','Times New Roman')
% Specify the file name and sheet name
filename = 'Data.xlsx';
sheet = 'Sheet1';
%Import data from Excel
data = readtable(filename, 'Sheet', sheet);
%Convert the table to a matrix
matrix_data = table2array(data);
%Display the size of the matrix (should be 101x7)
disp('datasize=');
disp(size(matrix_data));
%Randomly shuffle the data
rng('default'); % Set random seed for reproducibility
shuffled_data = matrix_data(randperm(size(matrix_data, 1)), :);
%Calculate the number of rows for training (80%) and testing (20%)
num_rows = size(shuffled_data, 1);
num_train = round(0.8 * num_rows);
num_test = num_rows - num_train;
%Split the data into training and testing sets
train_data = shuffled_data(1:num_train, :);
test_data = shuffled_data(num_train+1:end, :);
% Specify the file name for the Excel sheet
train_filename = 'train_data.xlsx';
test_filename = 'test_data.xlsx';
% Get column names from the original data
column_names = data.Properties.VariableNames;
% Write train_data to Excel with column names
writetable(array2table(train_data, 'VariableNames', column_names), train_filename);
disp('traindata=');
disp(size(train_data));
% Write test_data to Excel with column names
writetable(array2table(test_data, 'VariableNames', column_names), test_filename);
disp('testdata=');
disp(size(test_data));
disp('Data split>> Train & Test data completed.');
Error:
Error using {}
Unable to concatenate the table variables 'Temp' and 'Codes', because their types are double and cell.
Error in table2array (line 37)
a = t{:,:};
Error in Random_codes (line 14)
matrix_data = table2array(data);

Accepted Answer

Drew on 13 Apr 2024
You can leave your data in the table, and use cvpartition or randsample to split your data. If you are aiming to do Classification or Regression, you could also load the data into Classification Learner or Regression Learner, and let the app split your data.
t = array2table(randn(101,7));
cvp = cvpartition(size(t,1),"HoldOut",0.2); % Select 20% of the data for testing
train_data = t( training(cvp) ,:);
test_data = t( test(cvp) ,:);
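Applied to a table that mixes numeric and text columns, the same idea might look like the sketch below. The variable names Temp and Codes are taken from the error message, but the values are made up here as a stand-in for Data.xlsx; with the real file, the table returned by readtable would be used instead. Because the rows are indexed directly in the table, table2array is never called and the text column stays attached to its rows.

```matlab
% Hypothetical stand-in for the spreadsheet: one numeric and one text column.
% Temp and Codes are named after the error message; the values are invented.
rng('default');                                   % reproducible split
Temp  = randn(101, 1);
Codes = cellstr(compose("Exp%03d", (1:101)'));    % cell array of labels, one per row
data  = table(Temp, Codes);

% Hold out 20% of the rows (cvpartition requires Statistics and Machine Learning Toolbox)
cvp = cvpartition(height(data), "HoldOut", 0.2);
train_data = data(training(cvp), :);              % still a table, Codes preserved
test_data  = data(test(cvp), :);

% writetable accepts tables directly, so no array2table conversion is needed
writetable(train_data, 'train_data.xlsx');
writetable(test_data,  'test_data.xlsx');
```

If the toolbox is not available, indexing the table with a shuffled row order, for example data(randperm(height(data)), :), and then slicing the first 80% of rows should give an equivalent split.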
If this answer helps you, please remember to accept the answer.

More Answers (0)



Release

R2022b
