How do I assign rows of a variable to categories?

Question

Maximilian Fenski on 20 Apr 2022

Answered: Vatsal on 29 Sep 2023

Hello,

I have a table ("data") with 4 variables and 688 rows. The first 6 rows look like this:

Pseudonym     Indication   Study-name   Sequence
Patient_001   1            1            1
Patient_002   2            2            2
Patient_003   3            3            1
Patient_004   3            1            1
Patient_005   4            2            2
Patient_006   4            5            2

I want to find all groups defined by the combination of "Indication", "Study-name" and "Sequence".

I created a new table: data1 = data(:,{'indication' 'study_name' 'sequence'}) and then used

[p,v] = findgroups(data1) to find all possible groups.

Now I want to assign each row in "Pseudonym" to one of these groups.

My goal is to create a new variable for every group, containing all Pseudonyms that belong to that group.

In the next step I want to randomly pick pseudonyms from each group.

Furthermore, I would like to take the group size (i.e. the number of pseudonyms in a group) into consideration.

That means that if I want to randomly pick 20 patients across all categories and one group contains 50% of the data, then 10 patients should be picked from this group.

Could you please help me set up the code?

Thank you so much!

Max

Answer 1

Vatsal on 29 Sep 2023

Hi @Maximilian Fenski,

I understand that you have a table “data” which consists of four columns, and you want to find the groups based on the columns "Indication", "Study-name" and "Sequence". After finding the groups you want to assign each row in “Pseudonym” to one of these groups.
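As a minimal sketch of the grouping step, the snippet below rebuilds the six sample rows from the question and assigns each row a group number. The variable name `StudyName` and the added `Group` column are assumptions made for illustration; the real table may use different names.

```matlab
% Six sample rows from the question (variable names are assumptions)
Pseudonym  = "Patient_00" + (1:6)';
Indication = [1; 2; 3; 3; 4; 4];
StudyName  = [1; 2; 3; 1; 2; 5];
Sequence   = [1; 2; 1; 1; 2; 2];
data = table(Pseudonym, Indication, StudyName, Sequence);

% p(k) is the group number of row k; v has one row per distinct combination
[p, v] = findgroups(data(:, {'Indication', 'StudyName', 'Sequence'}));

% Store the group number next to each pseudonym (hypothetical extra column)
data.Group = p;

% Collect the pseudonyms of each group into one cell per group
groups = splitapply(@(x) {x}, data.Pseudonym, p);
```

In this six-row excerpt every combination is distinct, so each pseudonym ends up in its own group; with the full 688 rows, several pseudonyms would typically share a group. Keeping the groups in a single cell array (`groups{k}`) tends to be easier to work with than creating one separate variable per group.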

You then want to randomly pick a total of “x” pseudonyms across all groups, with each group contributing in proportion to its size.

The code below randomly picks pseudonyms from all groups while taking the group size into account:

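% Column names must match data.Properties.VariableNames exactly (case-sensitive)
data1 = data(:, {'Indication', 'Study-name', 'Sequence'});
[p, v] = findgroups(data1); % p: group number per row, v: one row per group
groups = splitapply(@(x) {x}, data.Pseudonym, p); % One cell of pseudonyms per group
numPicks = 20; % Number of pseudonyms to pick in total
pickedPseudonyms = [];
groupSizes = cellfun(@numel, groups);
scalingFactor = numPicks / sum(groupSizes);
[~, sortedIndices] = sort(groupSizes, 'descend'); % Largest groups first
sortedGroups = groups(sortedIndices);
for i = 1:numel(sortedGroups)
    groupSize = numel(sortedGroups{i});
    picksFromGroup = round(groupSize * scalingFactor); % Picks proportional to group size
    % Never pick more than the remaining budget, so the total cannot exceed numPicks
    picksFromGroup = min(picksFromGroup, numPicks - numel(pickedPseudonyms));

    if picksFromGroup > 0
        randomIndices = randperm(groupSize, min(groupSize, picksFromGroup));
        % Each group is a column vector, so concatenate vertically
        pickedPseudonyms = [pickedPseudonyms; sortedGroups{i}(randomIndices)];
    end

    % Stop once numPicks pseudonyms are selected
    if numel(pickedPseudonyms) >= numPicks
        break;
    end
end
% Note: because of rounding, fewer than numPicks pseudonyms may be selected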
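
Rounding each group's share independently can leave the total slightly above or below the requested number of picks. One possible alternative, which is not part of the code above, is a largest-remainder allocation: round every share down, then give the leftover picks to the groups with the largest fractional parts. The group sizes in this sketch are hypothetical and chosen only so that they sum to 688 with one group holding 50% of the rows.

```matlab
% Hypothetical group sizes (not taken from the question's data)
groupSizes = [344; 200; 100; 44];
numPicks   = 20;

% Ideal fractional share of each group
quota = numPicks * groupSizes / sum(groupSizes);

% Start from the rounded-down shares
picks = floor(quota);

% Hand the remaining picks to the groups with the largest fractional parts
remaining  = numPicks - sum(picks);
[~, order] = sort(quota - picks, 'descend');
picks(order(1:remaining)) = picks(order(1:remaining)) + 1;

% As long as numPicks <= sum(groupSizes), picks(i) never exceeds groupSizes(i)
```

Here the group holding half of the rows receives 10 of the 20 picks, and the shares always add up to exactly `numPicks`. With `picks` in hand, `randperm(numel(groups{i}), picks(i))` would select the members of group `i`.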