Finding discrepancies between two spreadsheets in MATLAB

Good evening! I am currently trying to work on writing a script that will allow me to find discrepancies between two spreadsheets (.xls, .csv, .xlsm).
Currently, the two spreadsheets have thousands of rows but a set number of columns. In spreadsheet 1, there is a column called "Issue Key" that corresponds to another column in spreadsheet 2 that is labeled "Cross-reference." Is there a way to import the two spreadsheets and have MATLAB re-order the tables based on this similarity?
My current code reads the XLS and CVS files into two separate variables data1 and data2. One of the issues I notice is that I don't know how to declare the first row of each variable as the variable name. I'm not exactly sure if I'm even on the right path at this point so I'm very eager to listen to suggestions. I'm very new to this coding thing, aha!
This is my current code:
clc;
clear;
close all;
[baseName, folder] = uigetfile({'*.xls'; '*.csv'}, 'Select the RTC Export:');
filename = fullfile(folder, baseName);
% Reads data in from selected file and loads them into num (only numeric),
% txt (only string) and raw (all data)
[num, txt, raw] = xlsread(filename);
data = [raw(:,1) raw(:,2) raw(:,5) raw(:,11) raw(:,12)];
tabledata = cell2table(data);
[baseName, folder] = uigetfile({'*.xls'; '*.csv'}, 'Select the JIRA Export:');
filename = fullfile(folder, baseName);
[num, txt, raw] = xlsread(filename);
data2 = [raw(:,3) raw(:,5) raw(:,2) raw(:,1)];
tabledata2 = cell2table(data2);

9 Commenti

Can you provide small sample files? You can include them as a zip. And can you provide an example of the intended output?
You can use the readtable function to import your data. You can use ismember function to compare the two columns. Use the index output from ismember to reorder your data in the same order, or perhaps use the sort function to sort both tables by the desired variable.
Attached are two spreadsheets that I have made to kind of simplify the larger file.
As you can see, there's different variables in each spreadsheet, however, the common thing they have is that Cross Reference == Issue ID. I would like to be able to create a code that matches the information of the corresponding row to one another and maybe even reorganize the output table to numerical order.
I can't seem to find an example on this, nor can I figure out how to delcare the first row to be the variable name.
I could not use innerjoin or outerjoin when trying to merge the two datasets together. Part of the reason for this is because one of the tables came from a .xls and the other from a .csv file. At least I think that's what the reason was.
T1 = readtable('testtable1.xlsx');
T2 = readtable('testtable2.xlsx');
T3 = innerjoin(T1, T2, 'leftkeys', 'IssueID', 'rightkeys', 'CrossReference');
Adam Danz
Adam Danz il 21 Gen 2020
Modificato: Adam Danz il 21 Gen 2020
"IssueKey" contains character arrays such as 'B-1', 'B-2' while CrossReference are numeric interger values such as 316216, 301258. Since there are no repeats within either of the columns from either spreadsheet that you attached, it's unclear how they pair with each other.
If the rows of both spread sheets are not in the same order, how do we pair the B-1 values with the 316216 values?
Well we know that for each Issue ID of table 1 is just Cross Reference of table 2. So whatever information is associated along the row of the ticket in reference should be associated with the row in the other table with the same cross reference number. Does that make sense?
Example:
Cross reference # 316216 belongs to testtable1 Issue-Key B-2 because of the matching IssueID.
I don't know if MATLAB has a function that finds matches on a per-row basis.
Adam Danz
Adam Danz il 21 Gen 2020
Modificato: Adam Danz il 21 Gen 2020
Ah, I see. This comment in your quesiton was misleading.
"In spreadsheet 1, there is a column called "Issue Key" that corresponds to another column in spreadsheet 2 that is labeled "Cross-reference."

Accedi per commentare.

 Risposta accettata

Adam Danz
Adam Danz il 22 Gen 2020
Modificato: Adam Danz il 22 Gen 2020
This uses Walter's innerjoin suggestion and then sorts the combined table back into the IssueKey order. If you'd rather use the IssueID order, remove the last two lines.
% Read data in as tables
folder = 'C:\......';
baseName1 = 'testtable1.xlsx';
baseName2 = 'testtable2.xlsx';
filename1 = fullfile(folder, baseName1);
filename2 = fullfile(folder, baseName2);
T1 = readtable(filename1); % you'll get a warning that column names were modified
T2 = readtable(filename2); % you'll get a warning that column names were modified
% Look at the tables
head(T1)
head(T2)
% Change CrossReference to IssueID
T2.Properties.VariableNames(ismember(T2.Properties.VariableNames,'CrossReference')) = {'IssueID'};
% Join tables
T = innerjoin(T1,T2);
% or
% T = outerjoin(T1,T2);
% Resort new table based on original order in T1
% assumes that all rows of T1 are in T.
[~, sortIdx] = ismember(T1.IssueKey,T.IssueKey);
T = T(sortIdx,:);
Result
T =
15×6 table
IssueKey Summary IssueID Status DueDate OrderNo_
________ ___________________________ __________ __________ _____________ ________
{'B-1' } {'Bike chain broken.' } 6.5413e+05 {'Open' } {'1/22/2020'} 5492
{'B-2' } {'Bike light broken.' } 3.1622e+05 {'Open' } {'1/22/2020'} 8135
{'B-3' } {'Bike seat torn.' } 8.4162e+05 {'Open' } {'1/22/2020'} 4613
{'B-4' } {'Bike in good condition.'} 9.5322e+05 {'Closed'} {'Closed' } 1256
{'B-5' } {'Bike in good condition.'} 1.2319e+05 {'Closed'} {'Closed' } 1617
{'B-6' } {'Bike in good condition.'} 7.8551e+05 {'Closed'} {'Closed' } 6772
{'B-7' } {'Bike chain broken.' } 3.1549e+05 {'Open' } {'1/29/2020'} 4381
{'B-8' } {'Rusty bike chain.' } 9.6345e+05 {'Open' } {'1/22/2020'} 1269
{'B-9' } {'Handlebar bent.' } 4.5642e+05 {'Open' } {'1/21/2020'} 3028
{'B-10'} {'Bike seat torn.' } 2.3312e+05 {'Open' } {'1/22/2020'} 8684
{'B-11'} {'Bike chain broken.' } 1.7899e+05 {'Open' } {'1/24/2020'} 3432
{'B-12'} {'Bike in good condition.'} 5.1087e+05 {'Closed'} {'Closed' } 1251
{'B-13'} {'Bike in good condition.'} 5.4675e+05 {'Closed'} {'Closed' } 8155
{'B-14'} {'Bike chain broken.' } 1.012e+05 {'Open' } {'1/20/2020'} 9493
{'B-15'} {'Bike seat torn.' } 3.0126e+05 {'Open' } {'1/25/2020'} 5810

4 Commenti

Question, do you know if this would still work if let's say there was an Issue ID that didn't have a corresponding Cross Reference? Would there be a way to just tell the program, "Hey, this row here does not have a match in the other table, but we'll include you onto the output list anyway."
You could simulate that by removing a line from table 2 before doing innerjoin.
T2(5,:) = []; % 654131 <==> B-1
After running innerjoin you'll notice that the entire B-1 row is missing.
Replace innerjoin() with outerjoin() and the B-1 row will be included where missing data will be filled with missing value indicators.
First off, thank you again for your elegant answer. I think I'm about 50% there in terms of the vision of what I'd like. However, I failed to list a couple of things that may or may not mess up this code.
Let me rephrase this question:
Lets say I give you those same tables, except in some fields, there are missing data. For example, what if B-12 did not have the description "Bike in good condition"? Would MATLAB be able to account for entries like these? And what happens if I didn't have straight up numerical values that were similar?
Second question: What if the similar values were in the format ABCDE-12345? Example, what if the Cross Reference number and the Issue ID were in the ABCDE-12345 format instead? What is the workaround for such a case?
"For example, what if B-12 did not have the description "Bike in good condition"? Would MATLAB be able to account for entries like these?"
You could simulate that to see what would happen.
T1.Summary{12} = '';
If you use outerjoin you'll see an empty indicator for that value. If you use innerjoin you'll see that the entire row is missing.
I'm not sure I undertand your second question. If there was no way to pair the rows of each table, then each row would be considered a unique row.

Accedi per commentare.

Più risposte (0)

Categorie

Prodotti

Release

R2017b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by