Is it possible to join categorical variables in a table according to grouping variables?

I have a table (`A`) containing a string variable (`x`) of IDs and a categorical variable (`y`).
For example:
>> A.x
11×1 string array
"A-00555"
"A-01139"
"B-08811"
"B-00014"
"C-00007"
"C-00007"
"D-00015"
"D-00015"
"E-00048"
"E-00048"
"E-00048"
>> A.y
11×1 categorical array
APPLE
GRAPEFRUIT
COCONUT
APPLE
APPLE
BANANA
APPLE
COCONUT
APPLE
BANANA
KIWI
I want to generate an array, the same size as A.x, holding a new categorical variable that "joins" all the A.y values that share the same A.x(i). I may not be explaining this very well....
In the above example the resulting array would be something like this:
>> A.z
11×1 categorical array
APPLE
GRAPEFRUIT
COCONUT
APPLE
APPLE+BANANA
APPLE+BANANA
APPLE+COCONUT
APPLE+COCONUT
APPLE+BANANA+KIWI
APPLE+BANANA+KIWI
APPLE+BANANA+KIWI
Is there an efficient way to accomplish this? Is there a version of groupsummary, or something similar, with a method option that is "concatenate categorical variable" according to groupvars?
Other info: The table contains a few million unique IDs. All rows of A are unique. There are 30 categorical variables.
  3 Comments
Steven Lord on 25 Aug 2020
That categorical array used to define A.y likely doesn't have categories like APPLE+BANANA or APPLE+BANANA+KIWI.
Do you need the result to be a categorical array, or would a string array be sufficient for your purposes?
David on 25 Aug 2020
@Steven Lord: You are correct. A.y does not contain those categories.
A string array as a "between-step" could work. I think I could then convert it to a categorical array...


Accepted Answer

Steven Lord on 25 Aug 2020
I'd use findgroups. First let's define the data.
x = ["A-00555"; "A-01139"; "B-08811"; "B-00014"; "C-00007"; ...
"C-00007"; "D-00015"; "D-00015"; "E-00048"; "E-00048"; "E-00048"];
y = categorical(["APPLE"; "GRAPEFRUIT"; "COCONUT"; "APPLE"; "APPLE"; ...
"BANANA"; "APPLE"; "COCONUT"; "APPLE"; "BANANA"; "KIWI"]);
Now use findgroups to get the group numbers for each element in x.
g = findgroups(x);
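For reference, findgroups can also return the group names as a second output, which makes it easier to see what the numbers in g refer to. A minimal sketch (the name ids is mine, introduced only for illustration):

```matlab
x = ["A-00555"; "A-01139"; "B-08811"; "B-00014"; "C-00007"; ...
     "C-00007"; "D-00015"; "D-00015"; "E-00048"; "E-00048"; "E-00048"];

% ids holds the unique IDs in sorted order; g(i) is the position of x(i) in ids
[g, ids] = findgroups(x);

% Indexing the group names with the group numbers reproduces the original IDs
isequal(ids(g), x)   % returns logical 1 (true)
```

Because the group names are sorted, the group numbers do not necessarily follow the order in which the IDs first appear in x.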
Next, join the elements of y (converted to string) within each group, putting a + between the elements.
s = splitapply(@(x) join(string(x), "+"), y, g);
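One thing to be aware of: join keeps the elements in row order within each group, so a group whose rows happen to list BANANA before APPLE gives "BANANA+APPLE" rather than "APPLE+BANANA". If order-independent labels are wanted (an assumption on my part about the goal), one possible variant is to pass the values through unique first, which also sorts them. The toy data and the name joinSorted below are made up for illustration:

```matlab
% Toy data: the first group lists BANANA before APPLE on purpose
x = ["C-00007"; "C-00007"; "D-00015"; "D-00015"];
y = categorical(["BANANA"; "APPLE"; "APPLE"; "COCONUT"]);

g = findgroups(x);

% unique returns the distinct strings in sorted order, so the joined label
% no longer depends on the order of the rows within a group
joinSorted = @(v) join(unique(string(v)), "+");
s = splitapply(joinSorted, y, g);
% s is ["APPLE+BANANA"; "APPLE+COCONUT"]
```

Note that unique also drops repeated values within a group, which may or may not be what you want.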
Let's see the results as a table.
T = table(x, y, g, s(g))
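Since the question mentions a table with several categorical variables, here is a rough sketch of the same idea applied directly to a table, looping over the variables to join. The variable y2, the list catVars, and the "_joined" suffix are hypothetical names chosen for this example, and I have not timed this on millions of rows:

```matlab
% Small stand-in for the table in the question; y2 is a hypothetical
% second categorical variable, added only to show the loop
x  = ["C-00007"; "C-00007"; "E-00048"; "E-00048"; "E-00048"];
y  = categorical(["APPLE"; "BANANA"; "APPLE"; "BANANA"; "KIWI"]);
y2 = categorical(["RED"; "GREEN"; "RED"; "GREEN"; "BLUE"]);
A  = table(x, y, y2);

g = findgroups(A.x);                    % one group number per row
joinPlus = @(v) join(string(v), "+");   % one string scalar per group

catVars = {'y', 'y2'};                  % variables to join (assumed names)
for k = 1:numel(catVars)
    name = catVars{k};
    s = splitapply(joinPlus, A.(name), g);        % one joined label per group
    A.([name '_joined']) = categorical(s(g));     % expand back to one label per row
end
```

The group numbers are computed once and reused for every variable, so only the splitapply call is repeated.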
  1 Comment
David on 25 Aug 2020
This got me 99.99% of the way there. I just modified the last line like this:
T = table(x, y, g, categorical(s(g)));
Thank you!


Release

R2019a
