multiple condition sumif without loop

22 visualizzazioni (ultimi 30 giorni)
ahmed920
ahmed920 il 26 Mag 2022
Risposto: Seth Furman il 2 Giu 2022
I have the following table and want to sum over unique dates and Accounts to get the daily changes per account
TransactionID = {'102001';'102002';'102003';'102004';'102005';'102006';'102007';'102008';'102009';'102010'};
Date = datetime({'2021-02-06';'2021-02-06';'2021-02-06';'2021-02-06';'2021-02-06';'2021-02-06';'2021-02-09';'2021-02-09';'2021-02-09';'2021-02-17'},"InputFormat","uuuu-MM-dd");
AmountDebited = [3000; 0; 0; 3000; 4443; 2.02877864; 430; 50; 200; 0.297598984636403];
AccountDebited = ["TD-CAD";"LC-CAD";"LC-CAD";"SI-CAD";"SP-CAD";"LC-USD";"SP-CAD";"SP-CAD";"SP-CAD";"LC-USD"];
AmountCredited = [3000; 0; 0; 3000; 2.03048665; 4450; 0.18713085; 0.02177471; 0.0869854; 0.295498984636403];
AccountCredited = ["SP-CAD"; "MM-LPT"; "KN-DIR"; "SP-CAD"; "LC-USD"; "SI-CAD"; "LC-USD"; "LC-USD"; "LC-USD"; "MM-USD"];
X = table(TransactionID, Date, AmountDebited, AccountDebited, AmountCredited, AccountCredited);
Right Now I can do it using the following for loop to compute the result but My actual table has thousands of records and the calculation takes forever so have been trying to get it to work using indexes but had no luck so far.
UniqueDebit = unique(X.AccountDebited);
UniqueCredit = unique(X.AccountCredited);
UniqueConc = unique([UniqueDebit;UniqueCredit]);
UniqueDates = unique(X.Date);
for i = 1:length(UniqueDates)
for j = 1:height(UniqueConc)
RowsCredited = find(X.Date == UniqueDates(i) & X.AccountCredited == UniqueConc(j));
RowsDebited = find(X.Date == UniqueDates(i) & X.AccountDebited == UniqueConc(j));
DailyCredited = sum(X.AmountCredited(RowsCredited));
DailyDebited = sum(X.AmountDebited(RowsDebited));
DailyChanges = DailyCredited - DailyDebited;
XMatrix(i,j) = DailyChanges;
end
end
XTable=array2table(XMatrix);
XTable.Properties.VariableNames = UniqueConc;
XTable.Properties.RowNames = string(UniqueDates);

Risposta accettata

Jan
Jan il 26 Mag 2022
Modificato: Jan il 26 Mag 2022
Start with avoiding repeated work:
XMatrix = zeros(length(UniqueDates), height(UniqueConc)); % Pre-allocate!!!
xCredit = X.AccountCredited; % Abbreviation
xDebit = X.AccountDebited;
for i = 1:length(UniqueDates)
tmp = (X.Date == UniqueDates(i)); % Move out of the inner loop
for j = 1:height(UniqueConc)
RowsCredited = (tmp & xCredited == UniqueConc(j));
RowsDebited = (tmp & xDebited == UniqueConc(j));
DailyCredited = sum(X.AmountCredited(RowsCredited)); % [EDITED]
DailyDebited = sum(X.AmountDebited(RowsDebited)); % [EDITED]
XMatrix(i,j) = DailyCredited - DailyDebited;
end
end
Avoid the find(), because logical indexing is faster.
Is XMatrix preallocated?
  3 Commenti
Jan
Jan il 26 Mag 2022
Modificato: Jan il 26 Mag 2022
It is hard to optimize code without having realistic input data. I cannot estimate, if an accumarray approach is faster here. Are you sure, that the loop is the bottleneck?
Another idea is to use ismember once, such that Matlab can search in numerical indices instead of string arrays:
xAccDebit = X.AccountDebited;
xAccCredit = X.AccountCredited;
xAmDebit = X.AmountDebited;
xAmCredit = X.AmountCredited;
uDebit = unique(xAccDebit);
uCredit = unique(xAccCredit);
uConc = unique([uDebit; uCredit]);
[uDates, ~, uDateInd] = unique(X.Date);
[~, creditInd] = ismember(xAccCredit, uConc);
[~, debitInd] = ismember(xAccDebit, uConc);
X = zeros(numel(uDates), numel(uConc)); % Pre-allocate
for i = 1:numel(uDates)
tmp = (uDateInd == i); % same as: (X.Date == uDates(i));
mCreditInd = creditInd .* tmp; % mask FALSE in tmp as 0 in vector
mDebitInd = debitInd .* tmp; %
for j = 1:numel(uConc)
RowsCredited = (mCreditInd == j);
RowsDebited = (mDebitInd == j);
DailyCredited = sum(xAmCredit(RowsCredited));
DailyDebited = sum(xAmDebit(RowsDebited));
X(i,j) = DailyCredited - DailyDebited;
end
end
Is numel(uDates) much larger than numel(uConc)? Maybe swapping the loops is faster.
I've edited the code in my first answer: I've abbreviated "X.AccountDebited" and "X.AmountDebited" both by "xDebited".
Are you working with less than 65536 elements? Then it might save some time, if you convert mCreditInd and mDebitInd to UINT16 and the loop e.g. to:
for j = uint16(1):uint16(numel(uConc))
ahmed920
ahmed920 il 27 Mag 2022
Modificato: ahmed920 il 27 Mag 2022
yeah number uDates is 4 times larger than numel uConc. for now yeah less than 65536 elements. Thanks this works it further reduced my time by a factor of 4.

Accedi per commentare.

Più risposte (1)

Seth Furman
Seth Furman il 2 Giu 2022
It's worth noting that a lot of these computations can be expressed more succinctly using groupsummary, outerjoin, and fillmissing.
TransactionID = {'102001';'102002';'102003';'102004';'102005';'102006';'102007';'102008';'102009';'102010'};
Date = datetime({'2021-02-06';'2021-02-06';'2021-02-06';'2021-02-06';'2021-02-06';'2021-02-06';'2021-02-09';'2021-02-09';'2021-02-09';'2021-02-17'},"InputFormat","uuuu-MM-dd");
AmountDebited = [3000; 0; 0; 3000; 4443; 2.02877864; 430; 50; 200; 0.297598984636403];
AccountDebited = ["TD-CAD";"LC-CAD";"LC-CAD";"SI-CAD";"SP-CAD";"LC-USD";"SP-CAD";"SP-CAD";"SP-CAD";"LC-USD"];
AmountCredited = [3000; 0; 0; 3000; 2.03048665; 4450; 0.18713085; 0.02177471; 0.0869854; 0.295498984636403];
AccountCredited = ["SP-CAD"; "MM-LPT"; "KN-DIR"; "SP-CAD"; "LC-USD"; "SI-CAD"; "LC-USD"; "LC-USD"; "LC-USD"; "MM-USD"];
X = table(TransactionID, Date, AmountDebited, AccountDebited, AmountCredited, AccountCredited);
Find total amounts credited/debited for each date-account pair
credited = groupsummary(X,["Date","AccountCredited"],"sum",["AmountCredited"])
credited = 7×4 table
Date AccountCredited GroupCount sum_AmountCredited ___________ _______________ __________ __________________ 06-Feb-2021 "KN-DIR" 1 0 06-Feb-2021 "LC-USD" 1 2.0305 06-Feb-2021 "MM-LPT" 1 0 06-Feb-2021 "SI-CAD" 1 4450 06-Feb-2021 "SP-CAD" 2 6000 09-Feb-2021 "LC-USD" 3 0.29589 17-Feb-2021 "MM-USD" 1 0.2955
debited = groupsummary(X,["Date","AccountDebited"],"sum",["AmountDebited"])
debited = 7×4 table
Date AccountDebited GroupCount sum_AmountDebited ___________ ______________ __________ _________________ 06-Feb-2021 "LC-CAD" 2 0 06-Feb-2021 "LC-USD" 1 2.0288 06-Feb-2021 "SI-CAD" 1 3000 06-Feb-2021 "SP-CAD" 1 4443 06-Feb-2021 "TD-CAD" 1 3000 09-Feb-2021 "SP-CAD" 3 680 17-Feb-2021 "LC-USD" 1 0.2976
Match total amounts credited/debited by date-account pair
credited.GroupCount = [];
debited.GroupCount = [];
sums = outerjoin(credited,debited,LeftKeys=["Date","AccountCredited"],RightKeys=["Date","AccountDebited"],MergeKeys=true)
sums = 11×4 table
Date AccountCredited_AccountDebited sum_AmountCredited sum_AmountDebited ___________ ______________________________ __________________ _________________ 06-Feb-2021 "KN-DIR" 0 NaN 06-Feb-2021 "LC-CAD" NaN 0 06-Feb-2021 "LC-USD" 2.0305 2.0288 06-Feb-2021 "MM-LPT" 0 NaN 06-Feb-2021 "SI-CAD" 4450 3000 06-Feb-2021 "SP-CAD" 6000 4443 06-Feb-2021 "TD-CAD" NaN 3000 09-Feb-2021 "LC-USD" 0.29589 NaN 09-Feb-2021 "SP-CAD" NaN 680 17-Feb-2021 "LC-USD" NaN 0.2976 17-Feb-2021 "MM-USD" 0.2955 NaN
Fill missing data
sums = fillmissing(sums,"constant",0,DataVariables=[3 4])
sums = 11×4 table
Date AccountCredited_AccountDebited sum_AmountCredited sum_AmountDebited ___________ ______________________________ __________________ _________________ 06-Feb-2021 "KN-DIR" 0 0 06-Feb-2021 "LC-CAD" 0 0 06-Feb-2021 "LC-USD" 2.0305 2.0288 06-Feb-2021 "MM-LPT" 0 0 06-Feb-2021 "SI-CAD" 4450 3000 06-Feb-2021 "SP-CAD" 6000 4443 06-Feb-2021 "TD-CAD" 0 3000 09-Feb-2021 "LC-USD" 0.29589 0 09-Feb-2021 "SP-CAD" 0 680 17-Feb-2021 "LC-USD" 0 0.2976 17-Feb-2021 "MM-USD" 0.2955 0
Compute net change by date-account pair
sums.NetChange = sums.sum_AmountCredited - sums.sum_AmountDebited
sums = 11×5 table
Date AccountCredited_AccountDebited sum_AmountCredited sum_AmountDebited NetChange ___________ ______________________________ __________________ _________________ _________ 06-Feb-2021 "KN-DIR" 0 0 0 06-Feb-2021 "LC-CAD" 0 0 0 06-Feb-2021 "LC-USD" 2.0305 2.0288 0.001708 06-Feb-2021 "MM-LPT" 0 0 0 06-Feb-2021 "SI-CAD" 4450 3000 1450 06-Feb-2021 "SP-CAD" 6000 4443 1557 06-Feb-2021 "TD-CAD" 0 3000 -3000 09-Feb-2021 "LC-USD" 0.29589 0 0.29589 09-Feb-2021 "SP-CAD" 0 680 -680 17-Feb-2021 "LC-USD" 0 0.2976 -0.2976 17-Feb-2021 "MM-USD" 0.2955 0 0.2955

Categorie

Scopri di più su Data Type Identification in Help Center e File Exchange

Prodotti


Release

R2022a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by