Finding discrepancies between two spreadsheets in MATLAB

Question

0 voti

Good evening! I am currently trying to work on writing a script that will allow me to find discrepancies between two spreadsheets (.xls, .csv, .xlsm).

Currently, the two spreadsheets have thousands of rows but a set number of columns. In spreadsheet 1, there is a column called "Issue Key" that corresponds to another column in spreadsheet 2 that is labeled "Cross-reference." Is there a way to import the two spreadsheets and have MATLAB re-order the tables based on this similarity?

My current code reads the XLS and CVS files into two separate variables data1 and data2. One of the issues I notice is that I don't know how to declare the first row of each variable as the variable name. I'm not exactly sure if I'm even on the right path at this point so I'm very eager to listen to suggestions. I'm very new to this coding thing, aha!

This is my current code:

clc;
clear;
close all;
[baseName, folder] = uigetfile({'*.xls'; '*.csv'}, 'Select the RTC Export:');
filename = fullfile(folder, baseName);
% Reads data in from selected file and loads them into num (only numeric),
% txt (only string) and raw (all data)
[num, txt, raw] = xlsread(filename);
data = [raw(:,1) raw(:,2) raw(:,5) raw(:,11) raw(:,12)];
tabledata = cell2table(data);
[baseName, folder] = uigetfile({'*.xls'; '*.csv'}, 'Select the JIRA Export:');
filename = fullfile(folder, baseName);
[num, txt, raw] = xlsread(filename);
data2 = [raw(:,3) raw(:,5) raw(:,2) raw(:,1)];
tabledata2 = cell2table(data2);

9 Commenti
Mostra 7 commenti meno recenti Nascondi 7 commenti meno recenti

Thanh Nguyen il 21 Gen 2020

Well we know that for each Issue ID of table 1 is just Cross Reference of table 2. So whatever information is associated along the row of the ticket in reference should be associated with the row in the other table with the same cross reference number. Does that make sense?

Example:

Cross reference # 316216 belongs to testtable1 Issue-Key B-2 because of the matching IssueID.

I don't know if MATLAB has a function that finds matches on a per-row basis.

Adam Danz il 21 Gen 2020

Modificato: Adam Danz il 21 Gen 2020

Ah, I see. This comment in your quesiton was misleading.

"In spreadsheet 1, there is a column called "Issue Key" that corresponds to another column in spreadsheet 2 that is labeled "Cross-reference."

Accedi per commentare.

Accedi per rispondere a questa domanda.

Follow Question

Answer 1

Adam Danz il 22 Gen 2020

Modificato: Adam Danz il 22 Gen 2020

Apri in MATLAB Online

1 voto

This uses Walter's innerjoin suggestion and then sorts the combined table back into the IssueKey order. If you'd rather use the IssueID order, remove the last two lines.

% Read data in as tables
folder = 'C:\......'; 
baseName1 = 'testtable1.xlsx'; 
baseName2 = 'testtable2.xlsx'; 
filename1 = fullfile(folder, baseName1);
filename2 = fullfile(folder, baseName2);
T1 = readtable(filename1); % you'll get a warning that column names were modified
T2 = readtable(filename2); % you'll get a warning that column names were modified
% Look at the tables
head(T1)
head(T2)
% Change CrossReference to IssueID
T2.Properties.VariableNames(ismember(T2.Properties.VariableNames,'CrossReference')) = {'IssueID'}; 
% Join tables
T = innerjoin(T1,T2);
% or 
% T = outerjoin(T1,T2);
% Resort new table based on original order in T1
% assumes that all rows of T1 are in T.
[~, sortIdx] = ismember(T1.IssueKey,T.IssueKey);
T = T(sortIdx,:); 

Result

T =
  15×6 table
    IssueKey              Summary               IssueID        Status         DueDate       OrderNo_
    ________    ___________________________    __________    __________    _____________    ________
    {'B-1' }    {'Bike chain broken.'     }    6.5413e+05    {'Open'  }    {'1/22/2020'}      5492  
    {'B-2' }    {'Bike light broken.'     }    3.1622e+05    {'Open'  }    {'1/22/2020'}      8135  
    {'B-3' }    {'Bike seat torn.'        }    8.4162e+05    {'Open'  }    {'1/22/2020'}      4613  
    {'B-4' }    {'Bike in good condition.'}    9.5322e+05    {'Closed'}    {'Closed'   }      1256  
    {'B-5' }    {'Bike in good condition.'}    1.2319e+05    {'Closed'}    {'Closed'   }      1617  
    {'B-6' }    {'Bike in good condition.'}    7.8551e+05    {'Closed'}    {'Closed'   }      6772  
    {'B-7' }    {'Bike chain broken.'     }    3.1549e+05    {'Open'  }    {'1/29/2020'}      4381  
    {'B-8' }    {'Rusty bike chain.'      }    9.6345e+05    {'Open'  }    {'1/22/2020'}      1269  
    {'B-9' }    {'Handlebar bent.'        }    4.5642e+05    {'Open'  }    {'1/21/2020'}      3028  
    {'B-10'}    {'Bike seat torn.'        }    2.3312e+05    {'Open'  }    {'1/22/2020'}      8684  
    {'B-11'}    {'Bike chain broken.'     }    1.7899e+05    {'Open'  }    {'1/24/2020'}      3432  
    {'B-12'}    {'Bike in good condition.'}    5.1087e+05    {'Closed'}    {'Closed'   }      1251  
    {'B-13'}    {'Bike in good condition.'}    5.4675e+05    {'Closed'}    {'Closed'   }      8155  
    {'B-14'}    {'Bike chain broken.'     }     1.012e+05    {'Open'  }    {'1/20/2020'}      9493  
    {'B-15'}    {'Bike seat torn.'        }    3.0126e+05    {'Open'  }    {'1/25/2020'}      5810  

4 Commenti
Mostra 2 commenti meno recenti Nascondi 2 commenti meno recenti

Thanh Nguyen il 22 Gen 2020

First off, thank you again for your elegant answer. I think I'm about 50% there in terms of the vision of what I'd like. However, I failed to list a couple of things that may or may not mess up this code.

Let me rephrase this question:

Lets say I give you those same tables, except in some fields, there are missing data. For example, what if B-12 did not have the description "Bike in good condition"? Would MATLAB be able to account for entries like these? And what happens if I didn't have straight up numerical values that were similar?

Second question: What if the similar values were in the format ABCDE-12345? Example, what if the Cross Reference number and the Issue ID were in the ABCDE-12345 format instead? What is the workaround for such a case?

Adam Danz il 22 Gen 2020

Apri in MATLAB Online

"For example, what if B-12 did not have the description "Bike in good condition"? Would MATLAB be able to account for entries like these?"

You could simulate that to see what would happen.

T1.Summary{12} = '';

If you use outerjoin you'll see an empty indicator for that value. If you use innerjoin you'll see that the entire row is missing.

I'm not sure I undertand your second question. If there was no way to pair the rows of each table, then each row would be considered a unique row.

Accedi per commentare.

Finding discrepancies between two spreadsheets in MATLAB

9 Commenti
Mostra 7 commenti meno recenti Nascondi 7 commenti meno recenti

Risposta accettata

4 Commenti
Mostra 2 commenti meno recenti Nascondi 2 commenti meno recenti

Più risposte (0)

Categorie

Prodotti

Release

Tag

Community Treasure Hunt

Finding discrepancies between two spreadsheets in MATLAB

9 Commenti Mostra 7 commenti meno recenti Nascondi 7 commenti meno recenti

Risposta accettata

4 Commenti Mostra 2 commenti meno recenti Nascondi 2 commenti meno recenti

Più risposte (0)

Categorie

Prodotti

Release

Tag

Vedere anche

Community Treasure Hunt

9 Commenti
Mostra 7 commenti meno recenti Nascondi 7 commenti meno recenti

4 Commenti
Mostra 2 commenti meno recenti Nascondi 2 commenti meno recenti