How can i extract data from data from table using a string of words.

1 visualizzazione (ultimi 30 giorni)
I am trying to extract data from a file but using a nested loop and if statement has not worked. The main file has column 1 as genes ranging form (1:60000) its a very large file. These genes have names like DUOX1 and so on. Now each column has the patient name a total of 45. Each gene has an expression number for each patient. What i want to do is to extract a certain amount of genes by using their names to extract all the expression for all patients.
the loop that i have been trying has been:
%16 TDS Genes
data_table=readtable("Normal and Tumor TPM.xlsx","Sheet","Normal Expression","VariableNamingRule","preserve");
tds_genes_table=readtable("16 TDS.xlsx","Sheet","Normal");
genes_16 = string(tds_genes_table.Properties.VariableNames);
n = max(size(data_table)) %gives total amount of genes in the main file "data". Each has a name like DUOX1
m = max(size(genes_16)) %contains gene names that i want to extract from data
%loop to match 16 tds genes with patient and expression
for i = 1:n
for j = 1:m
if i=j
expression(j,:) = data_table(i,:)
end
end
end
  6 Commenti
Jose Mendoza Garcia
Jose Mendoza Garcia il 9 Set 2022
what i want to do is extract these 16 genes form the large file with all the expression information
this are the 16 genes:
DIO1 DIO2 DUOX1 DUOX2 FOXE1 GLIS3 NKX2-1 PAX8 SLC26A4 SLC5A5 SLC5A8 TG THRA THRB TPO TSHR

Accedi per commentare.

Risposte (2)

Cris LaPierre
Cris LaPierre il 9 Set 2022
I think the Access Data in a Table page is what you want. You might also be interested in the height and width functions for tables. Also, I think this is a scenario where it is best to set the TextType to string.
I think the code might look something like this.
data_table=readtable("Normal and Tumor TPM.xlsx","Sheet","Normal Expression","VariableNamingRule","preserve","TextType","string")
tds_genes_table=readtable("16 TDS.xlsx","Sheet","Normal","TextType","string")
genes_16 = string(tds_genes_table.Properties.VariableNames)
idx = ismember(data_table.Gene_ID,genes_16)
expression = data_table(idx,:)

dpb
dpb il 9 Set 2022
Modificato: dpb il 9 Set 2022
data_table=readtable("Normal and Tumor TPM.xlsx","Sheet","Normal Expression","VariableNamingRule","preserve");
geneslist=strtrim({'DIO1';'DIO2';'DUOX1';'DUOX2';'FOXE1';'GLIS3';'NKX2-1';'PAX8';'SLC26A4 ';'SLC5A5 ';'SLC5A8 ';'TG ';'THRA ';'THRB ';'TPO ';'TSHR'});
data_table.GENEVARIABLENAME=categorical(data_table.GENEVARIABLENAME); % turn to categorical if isn't; you don't tell us what the variable names are...
tWantedByGene=data_table(matches,data_table.GENEVARIABLENAME,geneslist),:); % select rows with matching genes
should be all you're looking for...
As always, attaching a representative section of the data file would facilitate writing actual code to address any problems.

Categorie

Scopri di più su Large Files and Big Data in Help Center e File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by