Find duplicate entries and sum up their associated values then put everything back in a table
Hello everyone,
I have an Excel sheet with two columns. The first column is a list of names, and the second column is the score of each person. Some names have multiple scores. I want to find the duplicate entries, sum their associated values, and put everything back in a table. Basically, I want to remove the duplicates but keep the sum of the scores. The Excel file is attached as well.
I would really appreciate it if you have a script or a function that can achieve this.
Here is where I'm at:
Data = readtable("Data.xlsx")
unique(Data.Names)
[C,ia,idx] = unique(Data(:,1),'stable');
val = accumarray(idx,Data(:,2),[],@sum);
mat = [C val]
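A note on the attempt above: it likely fails because accumarray expects a numeric vector of values rather than a sub-table such as Data(:,2), and [C val] cannot concatenate a table with a numeric vector. The following is a minimal sketch of the same unique/accumarray idea, assuming the columns are named Names and Scores; the in-memory table is a hypothetical stand-in for the attached spreadsheet.

```matlab
% Hypothetical stand-in for the spreadsheet (column names are assumed)
Names  = {'Ann'; 'Bob'; 'Ann'; 'Cy'; 'Bob'};
Scores = [1; 2; 3; 4; 5];
Data = table(Names, Scores);

% Call unique on the column contents; 'stable' keeps first-appearance order
[C, ~, idx] = unique(Data.Names, 'stable');

% accumarray sums the values that share a group index (sum is the default)
val = accumarray(idx, Data.Scores);

% A table can hold the text and the numbers side by side
result = table(C, val, 'VariableNames', {'Names', 'Scores'});
disp(result)
```

With this toy data, Ann totals 4, Bob totals 7, and Cy totals 4, listed in the order the names first appear.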
Accepted Answer
dpb
on 8 Jul 2021
tData = readtable("Data.xlsx");
tData.Properties.VariableNames={'Name','Score'}; % set a known set of variable names
% sum Score within each group of identical Name values
tSum=rowfun(@sum,tData,'InputVariables','Score','GroupingVariables','Name');
% write the summed table to a new file
writetable(tSum,'PickAnOutputFileName.xls')
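To see what the grouped rowfun call returns without needing the spreadsheet, here is a small sketch on a hypothetical in-memory table. The sample names and scores, and the output variable name TotalScore, are assumptions made for illustration only.

```matlab
% Hypothetical sample data in place of Data.xlsx
tData = table({'Ann'; 'Bob'; 'Ann'; 'Cy'; 'Bob'}, [1; 2; 3; 4; 5], ...
    'VariableNames', {'Name', 'Score'});

% Apply sum to Score once per group of identical Name values;
% 'OutputVariableNames' gives the summed column a readable name
tSum = rowfun(@sum, tData, ...
    'InputVariables', 'Score', ...
    'GroupingVariables', 'Name', ...
    'OutputVariableNames', 'TotalScore');
disp(tSum)
```

The result has one row per unique name, in sorted order, with a GroupCount column (how many rows were combined) next to the summed scores.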
More Answers (1)
Cris LaPierre
on 8 Jul 2021
It gets a bit tricky since you don't want to change the order of the names, but I think you could do this with groupsummary.
% Load the data and add a column for keeping track of the original order
Data = readtable("AladdinData.xlsx");
Data.Order = (1:height(Data))'
% Use groupsummary to find sum and min of scores and order for each unique name
sumTbl = groupsummary(Data,"Names",["sum","min"],["Scores" "Order"])
% sort the results, which are alphabetical, so they are back in the original order
sumTbl = sortrows(sumTbl,"min_Order");
% create a new table with just Names and total scores
finalData = sumTbl(:,["Names","sum_Scores"]);
% rename 'sum_Scores' back to 'Scores'
finalData.Properties.VariableNames(2) = "Scores"
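As a quick check of the order-preserving steps, the same sequence can be run on a small hypothetical table in which the names do not first appear in alphabetical order. The sample data below is an assumption, not the contents of the attached file.

```matlab
% Hypothetical sample data; names first appear in the order Cy, Ann, Bob
Data = table({'Cy'; 'Ann'; 'Cy'; 'Bob'; 'Ann'}, [1; 2; 3; 4; 5], ...
    'VariableNames', {'Names', 'Scores'});
Data.Order = (1:height(Data))';

% Sum and min are computed for both Scores and Order in each group
sumTbl = groupsummary(Data, "Names", ["sum", "min"], ["Scores", "Order"]);

% min_Order is the row where each name first appeared, so sorting on it
% restores the original order
sumTbl = sortrows(sumTbl, "min_Order");

% Keep only the name and the total, then restore the original column name
finalData = sumTbl(:, ["Names", "sum_Scores"]);
finalData.Properties.VariableNames{2} = 'Scores';
disp(finalData)
```

Here the rows come back as Cy (4), Ann (7), Bob (4) instead of the alphabetical order that groupsummary returns.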
3 Comments
dpb
on 8 Jul 2021
I didn't think of groupsummary, Cris. Good thinking. Was keeping the order an input requirement? I missed it if it was. I've done the same as you several times before; I have one humongous spreadsheet that has to get updated piecemeal, and it has a (hidden) column that contains =ROW() precisely for that purpose.
Cris LaPierre
on 8 Jul 2021
I had interpreted it as having to be in the same order, but on a re-read, that is not stated. If not, it gets much simpler.
% Load the data
Data = readtable("AladdinData.xlsx");
% Use groupsummary to find the sum of scores for each unique name
finalData = groupsummary(Data,"Names","sum")
This keeps the GroupCount information, but that's not bad info to have either.
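If the GroupCount column is not wanted, it can be dropped after the call. This is a small sketch on hypothetical sample data, assuming the same Names and Scores column names as above.

```matlab
% Hypothetical sample data in place of the spreadsheet
Data = table({'Cy'; 'Ann'; 'Cy'; 'Bob'; 'Ann'}, [1; 2; 3; 4; 5], ...
    'VariableNames', {'Names', 'Scores'});

% Sum every non-grouping variable for each unique name
finalData = groupsummary(Data, "Names", "sum");

% Assigning [] to a table variable deletes that column
finalData.GroupCount = [];
disp(finalData)
```

The remaining columns are Names and sum_Scores, sorted by name.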