How to count data based on the categories
Mostra commenti meno recenti
There is a Master matrix as follow that includes a unique ID (first column), different ID (second column) and last three columns (C/D/E) are included a number with a different range. (e.g. Column #3 included 1:2; column #4 included 1:2; column #5 included 1:10). I want a new matrix (like matrix OT in the following) that according to the unique ID find in the first column, count column 3 to 5 based on the different category. (e.g. ID = 1, column #3, category #1 = counted as 5, category #2 = counted as 0).
Master = [1 100680 1 2 1
1 36731 1 1 9
1 36731 1 2 9
1 14275 1 1 1
1 14275 1 2 8
2 14275 1 1 7
2 117633 1 1 6
2 117633 1 2 6
2 68599 1 1 8
2 31678 1 1 1
2 31678 1 1 8
3 31678 1 1 2
3 31678 1 2 7
3 44106 1 2 10]
Example of output for first ID #1
OT = [1 5 0 2 3 2 0 0 0 0 0 0 1 1 0];

1 Commento
Risposta accettata
Più risposte (2)
Image Analyst
il 22 Ott 2015
If you have the Statistics and Machine Learning Toolbox, you can use grpstats():
Master = [...
1 100680 1 2 1
1 36731 1 1 9
1 36731 1 2 9
1 14275 1 1 1
1 14275 1 2 8
2 14275 1 1 7
2 117633 1 1 6
2 117633 1 2 6
2 68599 1 1 8
2 31678 1 1 1
2 31678 1 1 8
3 31678 1 1 2
3 31678 1 2 7
3 44106 1 2 10]
statsArray = grpstats(Master, Master(:,1), 'sum')
and in the command window you'll see
statsArray =
5 202692 5 8 28
12 381496 6 7 36
9 107462 3 5 19
Columns 3-5 in statsArray are the sums in columns 3-5 broken down by category number in column 1 of Master.
2 Commenti
Moe
il 22 Ott 2015
Image Analyst
il 23 Ott 2015
Sorry, I didn't understand your definition/distinction between count and sum. If you want count, it seems to get the number of unique numbers. So you can just do
statsArray = grpstats(Master, Master(:,1), @fun)
with "fun" being defined as:
function num = fun(array)
num= length(unique(array));
It seems a lot simpler than the answer you chose, but whatever... I know I modified my answer after you had already picked a solution. If you want a :one-liner" you can still use it.
Peter Perkins
il 23 Ott 2015
Another possibility, using rowfun and a table. This code:
Master = ...
[1 100680 1 2 1
1 36731 1 1 9
1 36731 1 2 9
1 14275 1 1 1
1 14275 1 2 8
2 14275 1 1 7
2 117633 1 1 6
2 117633 1 2 6
2 68599 1 1 8
2 31678 1 1 1
2 31678 1 1 8
3 31678 1 1 2
3 31678 1 2 7
3 44106 1 2 10];
M = array2table(Master,'VariableNames',{'A' 'B' 'C' 'D' 'E'});
outNames = {'C1' 'C2' 'D1' 'D2' 'E1' 'E2' 'E3' 'E4' 'E5' 'E6' 'E7' 'E8' 'E9' 'E10'};
counts = rowfun(@fun, M,'GroupingVariables','A','InputVariables',{'C' 'D' 'E'},'OutputVariableNames',outNames)
... with this function:
function [varargout] = fun(C,D,E)
counts = [histc(C',1:2) histc(D',1:2) histc(E',1:10)];
varargout = num2cell(counts);
... produces this output:
counts =
A GroupCount C1 C2 D1 D2 E1 E2 E3 E4 E5 E6 E7 E8 E9 E10
_ __________ __ __ __ __ __ __ __ __ __ __ __ __ __ ___
1 5 5 0 2 3 2 0 0 0 0 0 0 1 2 0
2 6 6 0 5 1 1 0 0 0 0 2 1 2 0 0
3 3 3 0 1 2 0 1 0 0 0 0 1 0 0 1
There are probably better ways to provide this result, for example with a table more like this:
ans =
A GroupCount Ccounts Dcounts Ecounts
_ __________ _______ _______ _____________
1 5 5 0 2 3 [1x10 double]
2 6 6 0 5 1 [1x10 double]
3 3 3 0 1 2 [1x10 double]
but the former is more or less what you seem to be asking for.
Categorie
Scopri di più su Repeated Measures and MANOVA in Centro assistenza e File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!