How to split a string and make multiple rows from the original data for each split?

6 views (last 30 days)
Hi everyone,
I have an excel sheet with some students data and it has 11 columns and about 500 rows. I want to manipulate data in the grade and class columns.
The current format is the grade column is written in A,B,D,C format and class is written Chem, Math,Eng, Sci format.
Assuming the order is correct,
I want to split the grades and the classes at the comma and make each grade and class in a new row and duplicating the rest of the data like the name, count etc.
For example: the first row has Smith and his grades A,B,D,C as shown below
I want it to make 4 rows for Smith, each row represent one grade and one class like this
My code:
data = readtable("New Microsoft Excel Worksheet.xlsx")
Conv =string(data.Grades);
convList = [];
for i = 1:length(Conv)
convList = [convList; split(Conv(i), ",")];
end
Conv2 =string(data.Class);
convList2 = [];
for i = 1:length(Conv2)
convList2 = [convList2; split(Conv2(i), ",")];
end
convList2
This code will split string at the comma and I'm stuck trying to duplicate the data for each run. I would really appreciate any help

Accepted Answer

Karim
Karim on 16 Aug 2022
Edited: Karim on 16 Aug 2022
Hi, there are several methods to do this. One way is to create an index vector and use this to copy the variable. See below :)
data = readtable("https://www.mathworks.com/matlabcentral/answers/uploaded_files/1099480/New%20Microsoft%20Excel%20Worksheet.xlsx");
% split the grades and class into columns
Grades_split = split(string(data.Grades),',');
Class_split = split(string(data.Class),',');
% detele spaces in the 'class' strings
Class_split = strtrim(Class_split);
% reshape into a row vector
Grades_split = reshape(pagetranspose(Grades_split),[],1);
Class_split = reshape(pagetranspose(Class_split),[],1);
% create an index vector
MyIndex = repmat(1:height(data),4,1);
MyIndex = reshape(MyIndex,[],1);
% create the new table based on the index
NewTable = data(MyIndex,:);
% insert splitted data
NewTable.Grades = Grades_split;
NewTable.Class = Class_split;
NewTable
NewTable = 1992×11 table
Name Count White Black API ASIAN Two_or_More_Races Hispanic Grades Notes Class ____________ __________ ______ ______ ______ ______ _________________ ________ ______ ______________ ______ {'SMITH' } 2.443e+06 0.709 0.2311 0.005 0.0089 0.0219 0.024 "A" {'Some Notes'} "Chem" {'SMITH' } 2.443e+06 0.709 0.2311 0.005 0.0089 0.0219 0.024 "B" {'Some Notes'} "Math" {'SMITH' } 2.443e+06 0.709 0.2311 0.005 0.0089 0.0219 0.024 "D" {'Some Notes'} "Eng" {'SMITH' } 2.443e+06 0.709 0.2311 0.005 0.0089 0.0219 0.024 "C" {'Some Notes'} "Sci" {'JOHNSON' } 1.9328e+06 0.5897 0.3463 0.0054 0.0094 0.0256 0.0236 "B" {'Some Notes'} "Chem" {'JOHNSON' } 1.9328e+06 0.5897 0.3463 0.0054 0.0094 0.0256 0.0236 "C" {'Some Notes'} "Math" {'JOHNSON' } 1.9328e+06 0.5897 0.3463 0.0054 0.0094 0.0256 0.0236 "A" {'Some Notes'} "Eng" {'JOHNSON' } 1.9328e+06 0.5897 0.3463 0.0054 0.0094 0.0256 0.0236 "D" {'Some Notes'} "Sci" {'WILLIAMS'} 1.6253e+06 0.4575 0.4768 0.0046 0.0082 0.0281 0.0249 "A" {'Some Notes'} "Chem" {'WILLIAMS'} 1.6253e+06 0.4575 0.4768 0.0046 0.0082 0.0281 0.0249 "A" {'Some Notes'} "Math" {'WILLIAMS'} 1.6253e+06 0.4575 0.4768 0.0046 0.0082 0.0281 0.0249 "A" {'Some Notes'} "Eng" {'WILLIAMS'} 1.6253e+06 0.4575 0.4768 0.0046 0.0082 0.0281 0.0249 "B" {'Some Notes'} "Sci" {'BROWN' } 1.437e+06 0.5795 0.356 0.0051 0.0087 0.0255 0.0252 "C" {'Some Notes'} "Chem" {'BROWN' } 1.437e+06 0.5795 0.356 0.0051 0.0087 0.0255 0.0252 "D" {'Some Notes'} "Math" {'BROWN' } 1.437e+06 0.5795 0.356 0.0051 0.0087 0.0255 0.0252 "C" {'Some Notes'} "Eng" {'BROWN' } 1.437e+06 0.5795 0.356 0.0051 0.0087 0.0255 0.0252 "D" {'Some Notes'} "Sci"

More Answers (1)

Cris LaPierre
Cris LaPierre on 16 Aug 2022
I think you could get there using the stack function. There is some preprocessing needed first, namely splitting the Grades into their own variables, but that can be done using split and splitvars. To work with strings, set your 'TextType' property to 'string' in readtable.
Here is a brief example.
data = readtable("https://www.mathworks.com/matlabcentral/answers/uploaded_files/1099480/New%20Microsoft%20Excel%20Worksheet.xlsx",...
'TextType','string')
data = 498×11 table
Name Count White Black API ASIAN Two_or_More_Races Hispanic Grades Notes Class ___________ __________ ______ ______ ______ ______ _________________ ________ _________ ____________ _____________________ "SMITH" 2.443e+06 0.709 0.2311 0.005 0.0089 0.0219 0.024 "A,B,D,C" "Some Notes" "Chem, Math,Eng, Sci" "JOHNSON" 1.9328e+06 0.5897 0.3463 0.0054 0.0094 0.0256 0.0236 "B,C,A,D" "Some Notes" "Chem, Math,Eng, Sci" "WILLIAMS" 1.6253e+06 0.4575 0.4768 0.0046 0.0082 0.0281 0.0249 "A,A,A,B" "Some Notes" "Chem, Math,Eng, Sci" "BROWN" 1.437e+06 0.5795 0.356 0.0051 0.0087 0.0255 0.0252 "C,D,C,D" "Some Notes" "Chem, Math,Eng, Sci" "JONES" 1.4255e+06 0.5519 0.3848 0.0044 0.01 0.0261 0.0229 "A,B,D,C" "Some Notes" "Chem, Math,Eng, Sci" "GARCIA" 1.1661e+06 0.0538 0.0045 0.0141 0.0047 0.0026 0.9203 "B,C,A,D" "Some Notes" "Chem, Math,Eng, Sci" "MILLER" 1.1614e+06 0.8411 0.1076 0.0054 0.0066 0.0177 0.0217 "A,A,A,B" "Some Notes" "Chem, Math,Eng, Sci" "DAVIS" 1.1164e+06 0.622 0.316 0.0049 0.0082 0.0245 0.0244 "B,D,C,D" "Some Notes" "Chem, Math,Eng, Sci" "RODRIGUEZ" 1.0949e+06 0.0475 0.0054 0.0057 0.0018 0.0018 0.9377 "A,B,D,C" "Some Notes" "Chem, Math,Eng, Sci" "MARTINEZ" 1.0602e+06 0.0528 0.0049 0.006 0.0051 0.0022 0.9291 "B,C,A,D" "Some Notes" "Chem, Math,Eng, Sci" "HERNANDEZ" 1.0433e+06 0.0379 0.0036 0.006 0.0019 0.0016 0.9489 "A,A,A,B" "Some Notes" "Chem, Math,Eng, Sci" "LOPEZ" 8.7452e+05 0.0486 0.0057 0.0102 0.0038 0.0025 0.9292 "C,D,C,D" "Some Notes" "Chem, Math,Eng, Sci" "GONZALEZ" 8.4102e+05 0.0403 0.0035 0.0038 0.0014 0.0013 0.9497 "A,B,D,C" "Some Notes" "Chem, Math,Eng, Sci" "WILSON" 8.0188e+05 0.6736 0.2599 0.006 0.0106 0.0238 0.0261 "B,C,A,D" "Some Notes" "Chem, Math,Eng, Sci" "ANDERSON" 7.844e+05 0.7517 0.1893 0.0061 0.0074 0.0211 0.0244 "A,A,A,B" "Some Notes" "Chem, Math,Eng, Sci" "THOMAS" 7.5614e+05 0.5263 0.3875 0.0244 0.0107 0.0258 0.0254 "C,D,C,D" "Some Notes" "Chem, Math,Eng, Sci"
data.Grades = split(data.Grades,",")
data = 498×11 table
Name Count White Black API ASIAN Two_or_More_Races Hispanic Grades Notes Class ___________ __________ ______ ______ ______ ______ _________________ ________ ________________________ ____________ _____________________ "SMITH" 2.443e+06 0.709 0.2311 0.005 0.0089 0.0219 0.024 "A" "B" "D" "C" "Some Notes" "Chem, Math,Eng, Sci" "JOHNSON" 1.9328e+06 0.5897 0.3463 0.0054 0.0094 0.0256 0.0236 "B" "C" "A" "D" "Some Notes" "Chem, Math,Eng, Sci" "WILLIAMS" 1.6253e+06 0.4575 0.4768 0.0046 0.0082 0.0281 0.0249 "A" "A" "A" "B" "Some Notes" "Chem, Math,Eng, Sci" "BROWN" 1.437e+06 0.5795 0.356 0.0051 0.0087 0.0255 0.0252 "C" "D" "C" "D" "Some Notes" "Chem, Math,Eng, Sci" "JONES" 1.4255e+06 0.5519 0.3848 0.0044 0.01 0.0261 0.0229 "A" "B" "D" "C" "Some Notes" "Chem, Math,Eng, Sci" "GARCIA" 1.1661e+06 0.0538 0.0045 0.0141 0.0047 0.0026 0.9203 "B" "C" "A" "D" "Some Notes" "Chem, Math,Eng, Sci" "MILLER" 1.1614e+06 0.8411 0.1076 0.0054 0.0066 0.0177 0.0217 "A" "A" "A" "B" "Some Notes" "Chem, Math,Eng, Sci" "DAVIS" 1.1164e+06 0.622 0.316 0.0049 0.0082 0.0245 0.0244 "B" "D" "C" "D" "Some Notes" "Chem, Math,Eng, Sci" "RODRIGUEZ" 1.0949e+06 0.0475 0.0054 0.0057 0.0018 0.0018 0.9377 "A" "B" "D" "C" "Some Notes" "Chem, Math,Eng, Sci" "MARTINEZ" 1.0602e+06 0.0528 0.0049 0.006 0.0051 0.0022 0.9291 "B" "C" "A" "D" "Some Notes" "Chem, Math,Eng, Sci" "HERNANDEZ" 1.0433e+06 0.0379 0.0036 0.006 0.0019 0.0016 0.9489 "A" "A" "A" "B" "Some Notes" "Chem, Math,Eng, Sci" "LOPEZ" 8.7452e+05 0.0486 0.0057 0.0102 0.0038 0.0025 0.9292 "C" "D" "C" "D" "Some Notes" "Chem, Math,Eng, Sci" "GONZALEZ" 8.4102e+05 0.0403 0.0035 0.0038 0.0014 0.0013 0.9497 "A" "B" "D" "C" "Some Notes" "Chem, Math,Eng, Sci" "WILSON" 8.0188e+05 0.6736 0.2599 0.006 0.0106 0.0238 0.0261 "B" "C" "A" "D" "Some Notes" "Chem, Math,Eng, Sci" "ANDERSON" 7.844e+05 0.7517 0.1893 0.0061 0.0074 0.0211 0.0244 "A" "A" "A" "B" "Some Notes" "Chem, Math,Eng, Sci" "THOMAS" 7.5614e+05 0.5263 0.3875 0.0244 0.0107 0.0258 0.0254 "C" "D" "C" "D" "Some Notes" "Chem, Math,Eng, Sci"
dataNew = splitvars(data,'Grades')
dataNew = 498×14 table
Name Count White Black API ASIAN Two_or_More_Races Hispanic Grades_1 Grades_2 Grades_3 Grades_4 Notes Class ___________ __________ ______ ______ ______ ______ _________________ ________ ________ ________ ________ ________ ____________ _____________________ "SMITH" 2.443e+06 0.709 0.2311 0.005 0.0089 0.0219 0.024 "A" "B" "D" "C" "Some Notes" "Chem, Math,Eng, Sci" "JOHNSON" 1.9328e+06 0.5897 0.3463 0.0054 0.0094 0.0256 0.0236 "B" "C" "A" "D" "Some Notes" "Chem, Math,Eng, Sci" "WILLIAMS" 1.6253e+06 0.4575 0.4768 0.0046 0.0082 0.0281 0.0249 "A" "A" "A" "B" "Some Notes" "Chem, Math,Eng, Sci" "BROWN" 1.437e+06 0.5795 0.356 0.0051 0.0087 0.0255 0.0252 "C" "D" "C" "D" "Some Notes" "Chem, Math,Eng, Sci" "JONES" 1.4255e+06 0.5519 0.3848 0.0044 0.01 0.0261 0.0229 "A" "B" "D" "C" "Some Notes" "Chem, Math,Eng, Sci" "GARCIA" 1.1661e+06 0.0538 0.0045 0.0141 0.0047 0.0026 0.9203 "B" "C" "A" "D" "Some Notes" "Chem, Math,Eng, Sci" "MILLER" 1.1614e+06 0.8411 0.1076 0.0054 0.0066 0.0177 0.0217 "A" "A" "A" "B" "Some Notes" "Chem, Math,Eng, Sci" "DAVIS" 1.1164e+06 0.622 0.316 0.0049 0.0082 0.0245 0.0244 "B" "D" "C" "D" "Some Notes" "Chem, Math,Eng, Sci" "RODRIGUEZ" 1.0949e+06 0.0475 0.0054 0.0057 0.0018 0.0018 0.9377 "A" "B" "D" "C" "Some Notes" "Chem, Math,Eng, Sci" "MARTINEZ" 1.0602e+06 0.0528 0.0049 0.006 0.0051 0.0022 0.9291 "B" "C" "A" "D" "Some Notes" "Chem, Math,Eng, Sci" "HERNANDEZ" 1.0433e+06 0.0379 0.0036 0.006 0.0019 0.0016 0.9489 "A" "A" "A" "B" "Some Notes" "Chem, Math,Eng, Sci" "LOPEZ" 8.7452e+05 0.0486 0.0057 0.0102 0.0038 0.0025 0.9292 "C" "D" "C" "D" "Some Notes" "Chem, Math,Eng, Sci" "GONZALEZ" 8.4102e+05 0.0403 0.0035 0.0038 0.0014 0.0013 0.9497 "A" "B" "D" "C" "Some Notes" "Chem, Math,Eng, Sci" "WILSON" 8.0188e+05 0.6736 0.2599 0.006 0.0106 0.0238 0.0261 "B" "C" "A" "D" "Some Notes" "Chem, Math,Eng, Sci" "ANDERSON" 7.844e+05 0.7517 0.1893 0.0061 0.0074 0.0211 0.0244 "A" "A" "A" "B" "Some Notes" "Chem, Math,Eng, Sci" "THOMAS" 7.5614e+05 0.5263 0.3875 0.0244 0.0107 0.0258 0.0254 "C" "D" "C" "D" "Some Notes" "Chem, Math,Eng, Sci"
dataFinal = stack(dataNew,["Grades_"+(1:4)])
dataFinal = 1992×12 table
Name Count White Black API ASIAN Two_or_More_Races Hispanic Notes Class Grades_1_Grades_2_Grades_3_Grades_4_Indicator Grades_1_Grades_2_Grades_3_Grades_4 __________ __________ ______ ______ ______ ______ _________________ ________ ____________ _____________________ _____________________________________________ ___________________________________ "SMITH" 2.443e+06 0.709 0.2311 0.005 0.0089 0.0219 0.024 "Some Notes" "Chem, Math,Eng, Sci" Grades_1 "A" "SMITH" 2.443e+06 0.709 0.2311 0.005 0.0089 0.0219 0.024 "Some Notes" "Chem, Math,Eng, Sci" Grades_2 "B" "SMITH" 2.443e+06 0.709 0.2311 0.005 0.0089 0.0219 0.024 "Some Notes" "Chem, Math,Eng, Sci" Grades_3 "D" "SMITH" 2.443e+06 0.709 0.2311 0.005 0.0089 0.0219 0.024 "Some Notes" "Chem, Math,Eng, Sci" Grades_4 "C" "JOHNSON" 1.9328e+06 0.5897 0.3463 0.0054 0.0094 0.0256 0.0236 "Some Notes" "Chem, Math,Eng, Sci" Grades_1 "B" "JOHNSON" 1.9328e+06 0.5897 0.3463 0.0054 0.0094 0.0256 0.0236 "Some Notes" "Chem, Math,Eng, Sci" Grades_2 "C" "JOHNSON" 1.9328e+06 0.5897 0.3463 0.0054 0.0094 0.0256 0.0236 "Some Notes" "Chem, Math,Eng, Sci" Grades_3 "A" "JOHNSON" 1.9328e+06 0.5897 0.3463 0.0054 0.0094 0.0256 0.0236 "Some Notes" "Chem, Math,Eng, Sci" Grades_4 "D" "WILLIAMS" 1.6253e+06 0.4575 0.4768 0.0046 0.0082 0.0281 0.0249 "Some Notes" "Chem, Math,Eng, Sci" Grades_1 "A" "WILLIAMS" 1.6253e+06 0.4575 0.4768 0.0046 0.0082 0.0281 0.0249 "Some Notes" "Chem, Math,Eng, Sci" Grades_2 "A" "WILLIAMS" 1.6253e+06 0.4575 0.4768 0.0046 0.0082 0.0281 0.0249 "Some Notes" "Chem, Math,Eng, Sci" Grades_3 "A" "WILLIAMS" 1.6253e+06 0.4575 0.4768 0.0046 0.0082 0.0281 0.0249 "Some Notes" "Chem, Math,Eng, Sci" Grades_4 "B" "BROWN" 1.437e+06 0.5795 0.356 0.0051 0.0087 0.0255 0.0252 "Some Notes" "Chem, Math,Eng, Sci" Grades_1 "C" "BROWN" 1.437e+06 0.5795 0.356 0.0051 0.0087 0.0255 0.0252 "Some Notes" "Chem, Math,Eng, Sci" Grades_2 "D" "BROWN" 1.437e+06 0.5795 0.356 0.0051 0.0087 0.0255 0.0252 "Some Notes" "Chem, Math,Eng, Sci" Grades_3 "C" "BROWN" 1.437e+06 0.5795 0.356 0.0051 0.0087 0.0255 0.0252 "Some Notes" "Chem, Math,Eng, Sci" Grades_4 "D"

Products


Release

R2022a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by