Merging CSV files with similar data structure into table

2 visualizzazioni (ultimi 30 giorni)
Merging CSV files into table
I am trying to find a way to read each csv file with read table and merge into a big table without “heavy” coding. I am having problems merging the tables due to a somewhat non-uniform data structure and wonder if the process of making the table uniform can be done in Matlab (instead of copy pasting in excel).
Questions:
How to I check if variable exist in csv file?
How do I concatenate specific variables to existing table?
Problem: The original data are in csv files and organized as follows:
There is one csv file for each year
The data structure is not the same in each csv file. Eg file2001.csv may have variables “height” only; file 2005.csv may have variable “height” and “colour” and 20015.csv may have “height”, “size” and “colour”.
Generic Code:
Table2001=readtable(file2001.csv); % contains subset of variables
Table2005=readtable(file205.csv); % contains subset of variables
Table2015=readtable(file2015.csv); % contains all variables)
Create big table
T = Table2015; % starting table as it contains all variables
Check if “height” is a variable in Table2001;
if yes, concatenate column below T.height
if no, add column of NAN below T.height
continue process for all variables and files
  2 Commenti
Miyuru K
Miyuru K il 22 Ago 2016
Modificato: Walter Roberson il 22 Ago 2016
Do you want to store the year as well? Would your final table look like this?
Year height colour size
2015 1 1 1
...
2001 1 NAN NAN
Do you want to uniquely identify each row in the table using a key variable? If not, just load them into arrays.
Björn
Björn il 23 Ago 2016
There will be more than one observation for each calendar year.

Accedi per commentare.

Risposta accettata

Peter Perkins
Peter Perkins il 23 Ago 2016
Bjorn, you can create a table with three variables, full of NaNs, and then just assign the variables you have read in from each csv. Something like this:
Create an example of a table that might have been read in using readtable:
>> t2 = array2table(randn(5,2),'VariableNames',{'height' 'color'})
t2 =
height color
________ _______
-0.20497 0.6715
-0.12414 -1.2075
1.4897 0.71724
1.409 1.6302
1.4172 0.48889
Preallocate a table of missing values, with the right size:
>> t1 = array2table(nan(height(t2),3),'VariableNames',{'height' 'size' 'color'})
t1 =
height size color
______ ____ _____
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
Move the data that was read in:
>> t1(:,t2.Properties.VariableNames) = t2
t1 =
height size color
________ ____ _______
-0.20497 NaN 0.6715
-0.12414 NaN -1.2075
1.4897 NaN 0.71724
1.409 NaN 1.6302
1.4172 NaN 0.48889
Do that three times, and the three results can be vertically concatenated, Another possibility:
>> t2 = array2table(randn(5,2),'VariableNames',{'height' 'color'})
t2 =
height color
________ _______
1.0347 0.8884
0.72689 -1.1471
-0.30344 -1.0689
0.29387 -0.8095
-0.78728 -2.9443
>> t2{:,setdiff({'height' 'size' 'color'},t2.Properties.VariableNames)} = NaN
t2 =
height color size
________ ________ ____
1.4384 -0.10224 NaN
0.32519 -0.24145 NaN
-0.75493 0.31921 NaN
1.3703 0.31286 NaN
-1.7115 -0.86488 NaN

Più risposte (0)

Categorie

Scopri di più su Tables in Help Center e File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by