I have a large table with over 13,000 rows; each row represents an experiment and contains a set of genes. I would like to generate a list of all the genes and the number of experiments that contain each gene. For example:

Question

Adnan on 5 Oct 2017

Direct link to this question:

https://it.mathworks.com/matlabcentral/answers/359864-i-have-a-large-table-with-over-13-000-rows-each-row-represents-an-experiment-and-containins-a-set-o

Answered: Peter Perkins on 13 Oct 2017

Hello,

tableA:

Experiment,  Gene1, Gene2, Gene3, Gene4;
'1',            'A',      'C',   'D',     '';
'2',            'A',      'C',   'G',    '';
'3',            'C',      'B',   'D',   'D';

What I would like is a count of the number of experiments that each gene occurs in, ignoring duplicates within the same experiment. For example: A = 2, B = 1, C = 3, D = 2.

Answer 1

Guillaume on 5 Oct 2017

Direct link to this answer:

https://it.mathworks.com/matlabcentral/answers/359864-i-have-a-large-table-with-over-13-000-rows-each-row-represents-an-experiment-and-containins-a-set-o#answer_284476

Here is how I'd do it:

%demo input
tableA = cell2table({...
    '1',            'A',      'C',   'D',     '';
    '2',            'A',      'C',   'G',    '';
    '3',            'C',      'B',   'D',   'D';}, ...
                    'VariableNames', ...
                    {'Experiment', 'Gene1', 'Gene2', 'Gene3', 'Gene4'})
%processing
genes = unique(tableA{:, 2:5}).';   %get list of genes as a row vector
genes(cellfun(@isempty, genes)) = [];     %get rid of '' (safe even if no row is padded)
expcount = sum(cell2mat(rowfun(@(g) ismember(genes, g), tableA, 'SeparateInputs', false, 'InputVariables', 2:5, 'OutputFormat', 'cell')), 1);   %one logical row per experiment, summed down the columns
result = array2table(expcount, 'VariableNames', compose('gene_%s', string(genes)))  %for pretty display

2 Comments

Adnan on 5 Oct 2017

Thank you. This is a great solution. I had to read the rowfun help page several times to understand how this works, but it's an elegant answer.

Just to check my understanding: the reason you used the variable g is so that one row of tableA can be passed into the ismember function?

Thanks again

Guillaume on 5 Oct 2017

Exactly, g is the name of the input to the anonymous function. It receives a row of the table and checks the list of genes against it. That gives you an array of 0s and 1s telling which genes are present in the row. Summing over all the rows gives you the result you want.
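
To make that explanation concrete, here is a minimal sketch of the same membership-then-sum idea on plain cell arrays, without rowfun. The data is copied from the question; the variable names rows, present3 and present are illustrative and do not appear in the answer above.

```matlab
% Demo data taken from the question: one cell row per experiment.
genes = {'A', 'B', 'C', 'D', 'G'};
rows  = {'A', 'C', 'D', '';
         'A', 'C', 'G', '';
         'C', 'B', 'D', 'D'};

% Membership test for one experiment: which genes occur in row 3?
% The repeated 'D' still yields a single true, so duplicates are ignored.
present3 = ismember(genes, rows(3, :));   % logical [0 1 1 1 0]

% Same test for every experiment, stacked into a 3x5 logical matrix.
present = false(size(rows, 1), numel(genes));
for k = 1:size(rows, 1)
    present(k, :) = ismember(genes, rows(k, :));
end

% Column sums give the number of experiments containing each gene.
expcount = sum(present, 1);               % [2 1 3 2 1]
```

Because each row contributes at most one true per gene, the column sums count experiments rather than occurrences.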

Answer 2

Stephen on 5 Oct 2017

Direct link to this answer:

https://it.mathworks.com/matlabcentral/answers/359864-i-have-a-large-table-with-over-13-000-rows-each-row-represents-an-experiment-and-containins-a-set-o#answer_284447

Edited: Stephen on 5 Oct 2017

Given your example and format, you will probably want to make use of a for loop and the "unique" function:

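Gene = struct('numExp', 0);   %template for one gene; add other properties of experiments here
AllGenes = struct();          %each gene gets a structure field, created the first time it is seen
for ptr = 1:height(tableA)
    currGenes = unique(tableA{ptr, 2:end});         %unique genes in this experiment (row)
    currGenes(cellfun(@isempty, currGenes)) = [];   %skip the '' padding
    for ptr2 = 1:length(currGenes)
        incGene = currGenes{ptr2};                  %must be a valid field name
        if ~isfield(AllGenes, incGene)
            AllGenes.(incGene) = Gene;
        end
        AllGenes.(incGene).numExp = AllGenes.(incGene).numExp + 1;
    end
end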
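
The inner step of the loop relies on unique collapsing repeated genes within one experiment. A minimal sketch of that step on plain cell rows copied from the question (the names row1, row3, genes1 and genes3 are illustrative, not from the answer):

```matlab
% Experiment 3 from the question lists 'D' twice.
row3   = {'C', 'B', 'D', 'D'};
genes3 = unique(row3);                          % {'B', 'C', 'D'}: sorted, 'D' kept once

% Experiment 1 is padded with an empty entry, which unique keeps as a value.
row1   = {'A', 'C', 'D', ''};
genes1 = unique(row1);
genes1 = genes1(~cellfun(@isempty, genes1));    % drop the '' padding
```

Incrementing one counter per entry of genes3 or genes1 therefore adds at most one to each gene per experiment.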

1 Comment

Adnan on 5 Oct 2017

In the line below from your code, are you finding the unique genes within a row (experiment) of tableA?

currGenes = unique(tableA(2:end));

Answer 3

Peter Perkins on 13 Oct 2017

Direct link to this answer:

https://it.mathworks.com/matlabcentral/answers/359864-i-have-a-large-table-with-over-13-000-rows-each-row-represents-an-experiment-and-containins-a-set-o#answer_285671

+1 for Guillaume's use of rowfun.

Here's another possibility that might be worth looking at if you have a very large number of experiments and a smaller number of genes:

>> tableA = cell2table({...
    '1',            'A',      'C',   'D',     '';
    '2',            'A',      'C',   'G',    '';
    '3',            'C',      'B',   'D',   'D';}, ...
                    'VariableNames', ...
                    {'Experiment', 'Gene1', 'Gene2', 'Gene3', 'Gene4'});

Turn things into categorical because categorical is useful:

>> tableA.Experiment = categorical(tableA.Experiment);
>> genes = {'A' 'B' 'C' 'D' 'G'};
>> tableA.Gene1 = categorical(tableA.Gene1,genes);
>> tableA.Gene2 = categorical(tableA.Gene2);
>> tableA.Gene3 = categorical(tableA.Gene3);
>> tableA.Gene4 = categorical(tableA.Gene4);
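
One reason the conversion is convenient here, shown as a small sketch on the Gene4 column from the question (the names gene4, missingMask and isD are illustrative): empty entries become undefined elements, and an undefined element never compares equal to a gene name, so the '' padding needs no special handling.

```matlab
genes = {'A', 'B', 'C', 'D', 'G'};

% Column Gene4 from the question: two empty entries and one 'D'.
gene4 = categorical({''; ''; 'D'}, genes);

missingMask = isundefined(gene4);   % [1; 1; 0]: the '' entries are <undefined>
isD         = (gene4 == 'D');       % [0; 0; 1]: undefined elements compare as false
cats        = categories(gene4);    % all five genes, including those absent from this column
```

Passing the same category list to every column, as done for Gene1, also keeps the category set identical across columns.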

Create one array, then look across rows for each gene, looping over genes instead of rows. This could create a 1-row table, or a scalar struct, or a double vector (with no names). I chose the struct, just because a table with only one row is kind of overkill:

>> allGenes = [tableA.Gene1 tableA.Gene2 tableA.Gene3 tableA.Gene4]
allGenes = 
  3×4 categorical array
     A      C      D      <undefined> 
     A      C      G      <undefined> 
     C      B      D      D           
>> for i = 1:length(genes)
    counts.(genes{i}) = sum(any(allGenes==genes{i},2));
end
>> counts
counts = 
  struct with fields:
      A: 2
      B: 1
      C: 3
      D: 2
      G: 1
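
For comparison, the other two output forms mentioned above (a plain double vector and a 1-row table) could be sketched as follows. This assumes the gene names are valid table variable names; countsVec and countsTbl are illustrative names, and allGenes is rebuilt here directly from the question's data so the snippet runs on its own.

```matlab
genes = {'A', 'B', 'C', 'D', 'G'};

% Same 3x4 categorical array as above, built directly from the demo data.
allGenes = categorical({'A', 'C', 'D', '';
                        'A', 'C', 'G', '';
                        'C', 'B', 'D', 'D'}, genes);

% Double vector: one count per gene, in the order of genes.
countsVec = cellfun(@(g) sum(any(allGenes == g, 2)), genes);   % [2 1 3 2 1]

% 1-row table: the same counts with the gene names as variable names.
countsTbl = array2table(countsVec, 'VariableNames', genes);
```

The vector is the lightest form, while the table keeps the names attached to the counts, at the cost of requiring valid variable names.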
