How to output specific rows from tables depending on values within the table?

15 visualizzazioni (ultimi 30 giorni)
I have a table of variable I have pulled form an excel spread sheet (the actual file is 45 columns X 2000 rows). But this gives the idea of what I am trying to achieve.
I would like to find each separate participants (identified by their 'ID') maximum jump height for that 'season' of testing and remove the other rows.
The Table I'm working with is like this (but extended):
[Season] [TrialNo] [ID] [AgeGroup] [bodyMass] [JumpHeight] [Force] [FlighTime] [LandingForce]
'Pre' 1 0001 U14 40 35 685 0.3 1100
'Pre' 2 0001 U14 40 32 630 0.25 1200
'Pre' 1 0002 U14 42 40 750 0.42 1000
'Pre' 2 0002 U14 42 36 700 0.4 1300
'Pre' 1 0003 U14 45 32 610 0.3 1111
'Pre' 2 0003 U14 45 28 600 0.3 1600
'Post' 1 0001 U14 40 35 685 0.3 1100
'Post' 2 0001 U14 40 32 630 0.25 1200
'Post' 1 0002 U14 42 40 750 0.42 1000
'Post' 2 0002 U14 42 36 700 0.4 1300
'Post' 1 0003 U14 45 32 610 0.3 1111
'Post' 2 0003 U14 45 28 600 0.3 1600
What I aim to end up with is something more like
[Season] [TrialNo] [ID] [AgeGroup] [bodyMass] [JumpHeight] [Force] [FlighTime] [LandingForce]
'Pre' 1 0001 U14 40 35 685 0.3 1100
'Pre' 1 0002 U14 42 40 750 0.42 1000
'Pre' 1 0003 U14 45 32 610 0.3 1111
'Post' 1 0001 U14 40 35 685 0.3 1100
'Post' 1 0002 U14 42 40 750 0.42 1000
'Post' 1 0003 U14 45 32 610 0.3 1111
  2 Commenti
Paolo
Paolo il 5 Giu 2018
Modificato: Paolo il 5 Giu 2018
JumpHeight seems to be already sorted for the ID for every pair or rows, so if that's always the case you could just delete every other row.
Alternatively, if its not always sorted, you could sort every two rows by JumpHeight.
Could you attach a sample spreadsheet?
Sean Byrne
Sean Byrne il 5 Giu 2018
Attached is a sample of the data I'm working with. As you can see jump height is random within the three trials.

Accedi per commentare.

Risposte (3)

Razvan Carbunescu
Razvan Carbunescu il 5 Giu 2018
Modificato: Razvan Carbunescu il 5 Giu 2018
If you're using R2018a and only interested in maximum JumpHeight can use groupsummary on table T:
>> GT = groupsummary (T,{'Season','ID'},'max','JumpHeight')
GT =
6×4 table
Season ID GroupCount max_JumpHeight
______ __ __________ ______________
'Post' 1 2 35
'Post' 2 2 40
'Post' 3 2 32
'Pre' 1 2 35
'Pre' 2 2 40
'Pre' 3 2 32
If you want to get all the row information or on an earlier release can use findgroups / splitapply workflow
idx = findgroups(T.Season,T.ID);
GT = splitapply(@maxidx,T,idx);
GT.Properties.VariableNames = T.Properties.VariableNames
function T = maxidx(varargin)
[~,i] = max(varargin{6});
tmpvarargout = cellfun(@(x) x(i,:),varargin,'UniformOutput',false);
T = table(tmpvarargout{:});
end
Sample Output
GT =
6×9 table
Season TrialNo ID AgeGroup bodyMass JumpHeight Force FlighTime LandingForce
______ _______ __ ________ ________ __________ _____ _________ ____________
'Post' 1 1 14 40 35 685 0.3 1100
'Post' 1 2 14 42 40 750 0.42 1000
'Post' 1 3 14 45 32 610 0.3 1111
'Pre' 1 1 14 40 35 685 0.3 1100
'Pre' 1 2 14 42 40 750 0.42 1000
'Pre' 1 3 14 45 32 610 0.3 1111
Edit: Script assumes JumpHeight is 6th column in table, might have to modify for correct position
  6 Commenti
Sean Byrne
Sean Byrne il 7 Giu 2018
Also as a fix I tried changing the (T.Season,T.ID) to {T.Season,T.ID} and that changed the error message to Undefined variable "findgroups" or class "findgroups"
Razvan Carbunescu
Razvan Carbunescu il 7 Giu 2018
I had missed the fact that you're on R2014a. findgroups/splitapply were introduced in R2016b.
I think the way to try to get it in R2014a is to use sortrows and unique with the rows flag to find the indexing to the first sorted highest value.
ST = sortrows(T,{'Season' 'ID' 'JumpHeight'},{'ascend' 'ascend' 'descend'});
% taking advantage here of the fact that ST is sorted by JumpHeight and unique returns first element
[~,idx] = unique([double(categorical(ST.Season)) ST.ID],'rows');
GT = ST(idx,:)

Accedi per commentare.


Are Mjaavatten
Are Mjaavatten il 5 Giu 2018
Modificato: Are Mjaavatten il 11 Giu 2018
I am a little uncertain about the type of data structure you use. For completeness I therefore entered your data in an Excel workbook that I read using readtable.
If there are always exactly two trials per ID and season:
T0 = readtable('Byrne.xlsx');
rows = [];
for i = 1:2:size(T0,1)-1
[~,j] = max(T0.JumpHeight(i:i+1));
rows = [rows;i+j-1];
end
T2 = T0(rows,:);
If the number of trials may vary:
T0 = sortrows(T0,'ID');
T0 = sortrows(T0,'Season','descend');
J = [find(diff([0;T0.ID])~=0);size(T0,1)]; % Indices for each ID change
rows = [];
for i = 1:length(J)-1
[~,j] = max(T0.JumpHeight(J(i):J(i+1)-1));
rows = [rows;J(i)+j-1];
end
T2 = T0(rows,:);

Peter Perkins
Peter Perkins il 3 Lug 2018
In more recent versions of MATLAB there are several ways to do this. In R2014a, do a grouped varfun, using @max as the function to apply, ID and Season as the grouping Variables, and JumpHight as the InputVariable.

Categorie

Scopri di più su Data Distribution Plots in Help Center e File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by