Collating data from different sources
4 visualizzazioni (ultimi 30 giorni)
Mostra commenti meno recenti
I have two csv datasets t1 and t2:
t1: t2:
A B C D A B C D
1 5 4.1 "a" 4 3 4.2 "e"
2 3 3.8 "g" 7 5 2.5 "c"
3 2 3.6 "d" 9 4 4.3 "b"
4 NaN 4.2 "e"
5 6 4.2 "h"
6 2 3.6 "d"
7 NaN 2.5 "c"
8 5 1.8 "e"
9 NaN 4.3 "b"
10 6 2.6 "a"
So t1 is missing some values but is the 'master' table. t2 has only the rows with missing values from t1, but with the data filled (from another source).
I am trying to fill the missing values in t1 using the values from t2, where variable 'A' is like an index (having unique values for each observation).
I have tried using outerjoin() but the result is:
t3:
tleft_A tleft_B tleft_C tleft_D tright_A tright_B tright_C tright_D
1 5 4.1 "a" NaN NaN NaN NaN
2 3 3.8 "g" NaN NaN NaN NaN
3 2 3.6 "d" NaN NaN NaN NaN
4 NaN 4.2 "e" 4 3 2.5 "e"
5 6 4.2 "h" NaN NaN NaN NaN
6 2 3.6 "d" NaN NaN NaN NaN
7 NaN 2.5 "c" 7 5 2.5 "c"
8 5 1.8 "e" NaN NaN NaN NaN
9 NaN 4.3 "b" 9 4 4.3 "b"
10 6 2.6 "a" NaN NaN NaN NaN
What I am trying to achieve, however, is:
t3:
A B C D
1 5 4.1 "a"
2 3 3.8 "g"
3 2 3.6 "d"
4 3 4.2 "e"
5 6 4.2 "h"
6 2 3.6 "d"
7 5 2.5 "c"
8 5 1.8 "e"
9 4 4.3 "b"
10 6 2.6 "a"
Is there a simple way to accomplish this, or do I need to extract, index and combine the variables, or approach it programmatically via a loop or something similar? None of the interpolation-type fill methods seem to be appropriate either.
I'm fairly new to MatLab, so if there is any other information I am missing which would help clarify the problem, I'll be happy to update the question with it.
Thanks in advance for any insights you can provide.
2 Commenti
Risposte (1)
Jon
il 4 Ott 2023
You could use this approach
c1 = {
'A' 'B' 'C' 'D'
1 5 4.1 "a"
2 3 3.8 "g"
3 2 3.6 "d"
4 NaN 4.2 "e"
5 6 4.2 "h"
6 2 3.6 "d"
7 NaN 2.5 "c"
8 5 1.8 "e"
9 NaN 4.3 "b"
10 6 2.6 "a"};
c2 = {
'A' 'B' 'C' 'D'
4 3 4.2 "e"
7 5 2.5 "c"
9 4 4.3 "b"};
t1 = cell2table(c1(2:end,:));
t1.Properties.VariableNames = c1(1,:)
t2 = cell2table(c2(2:end,:));
t2.Properties.VariableNames = c2(1,:)
% Find the indices of the missing data in t1
idxMissing = find(isnan(t1.B))
% Fill the values from t2
t3 = t1;
[~,idx2] = ismember(idxMissing,t2.A)
t3(idxMissing,:) = t2(idx2,:)
7 Commenti
Jon
il 5 Ott 2023
Modificato: Jon
il 5 Ott 2023
Here is another approach, that utilizes operations with timetables, rather than indexing. Still a little arcane, but maybe you prefer this. You may want to remove a few more semicolons at end of statements to understand better what each step is doing
c1 = {
'A' 'B' 'C' 'D'
1 5 4.1 "a"
2 3 3.8 "g"
3 2 3.6 "d"
4 NaN 4.2 "e"
5 6 4.2 "h"
6 2 3.6 "d"
7 NaN 2.5 "c"
8 5 1.8 "e"
9 NaN 4.3 "b"
10 6 2.6 "a"};
c2 = {
'A' 'B' 'C' 'D'
4 3 4.2 "e"
7 5 2.5 "c"
8 5 1.8 "e"
9 4 4.3 "b"
15 8 6.2 "g"};
% Put data into timetables, using A as the time variable
t1 = cell2table(c1(2:end,:),'VariableNames',c1(1,:));
t1.A = days(t1.A); % convert first column to a duration
t1 = table2timetable(t1);
t2 = cell2table(c2(2:end,:),'VariableNames',c2(1,:));
t2.A = days(t2.A); % convert first column to a duration
t2 = table2timetable(t2);
% Combine the tables (makes union of timetables) putting missing data
% wherever data is not found for a given time
t3 = synchronize(t1,t2)
% Replace the missing values using values from t2
% note by defualt, min omits missing, e.g. NaN
t3.B_t1 = min(t3.B_t1,t3.B_t2);
% Remove columns from t2
t3 = t3(:,1:3);
% Remove any rows that were in t2 but not t1
% These will have missing data from the synchronization
t3 = rmmissing(t3);
% Optionally clean up the variable names
t3.Properties.VariableNames = c1(1,2:end)
Vedere anche
Categorie
Scopri di più su Data Type Conversion in Help Center e File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!