The number of occurrences of each character of one string in another
I have a string of more than 100 characters (a protein sequence in FASTA format), like
'MEQNGLDHDSRSSIDTTINDTQKTFLEFRSYTQLSEKLASSSSYTAPPLNEDGPKGVASAVSQGSESVVSWTTLTHVYSILGAYGGPTCLYPTATYFLMGTSKGCVLIFNYNEHLQTILVPTLSEDPSIH'
(shortened here for simplicity), and I want to find out whether or not it is hydrophobic. So I have to count the occurrences of each character of the set 'A C F I L M P V W Y' (the hydrophobic amino acids) in my FASTA string. Given how long FASTA strings can be, is there an easy way to do that with MATLAB string functions?
Accepted Answer
More Answers (4)
Peter Perkins
on 29 Dec 2014
Another possibility:
>> s = 'MEQNGLDHDSRSSIDTTINDTQKTFLEFRSYTQLSEKLASSSSYTAPPLNEDGPKGVASAVSQGSESVVSWTTLTHVYSILGAYGGPTCLYPTATYFLMGTSKGCVLIFNYNEHLQTILVPTLSEDPSIH';
>> t = 'ACFILMPVWY';
>> n = hist(double(s),1:90);
>> n(t)
ans =
6 2 4 6 13 2 7 7 1 7
1 Comment
Jan
on 30 Dec 2014
This is a histogram problem, so histc is an efficient and direct solution.
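To make the histc suggestion concrete, here is a minimal sketch that reuses the s and t from the answer above. The variable names n, n2 and counts are only illustrative. Note that histc is marked as not recommended in newer MATLAB releases, so the sketch also shows an accumarray call that should produce the same per-code counts; treat both as one possible way to do it rather than the definitive one.

```matlab
s = 'MEQNGLDHDSRSSIDTTINDTQKTFLEFRSYTQLSEKLASSSSYTAPPLNEDGPKGVASAVSQGSESVVSWTTLTHVYSILGAYGGPTCLYPTATYFLMGTSKGCVLIFNYNEHLQTILVPTLSEDPSIH';
t = 'ACFILMPVWY';

% Histogram over the character codes 1..90 ('Z' is code 90):
% n(k) is the number of characters in s whose code equals k.
n = histc(double(s), 1:90);

% Pick out the bins that belong to the characters in t.
counts = n(double(t));

% Alternative without histc: accumulate a 1 for every character code.
n2 = accumarray(double(s(:)), 1, [90 1]).';
counts2 = n2(double(t));
```

Both counts and counts2 should reproduce the 6 2 4 6 13 2 7 7 1 7 shown above. The sketch assumes the sequence contains only uppercase letters; lowercase letters have codes above 90 and would need a wider range.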
Luuk van Oosten
on 24 Jan 2015
Edited: Luuk van Oosten
on 24 Jan 2015
I reckon you are using the Bioinformatics Toolbox. In that case you can probably use:
All = aacount(SEQ)
where SEQ is of course your sequence of interest: MEQNGLDHDSRSSIDTTINDTQKTFLEF....
Then, using
nr_A = All.A
nr_C = All.C
nr_F = All.F
etc. (you get the idea)
you get the counts of your hydrophobic residues. Sum these and you have your hydrophobic score. You might want to normalize this score by dividing it by the total number of amino acids in the sequence.
Of course you can write a loop for this and calculate the hydrophobic score for all the sequences in your FASTA file.
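If the toolbox is not available, the summing, normalizing and looping described above can be sketched with base MATLAB only. This is a rough illustration, not the toolbox's method: the cell array seqs, the variable frac and the short second sequence are made up for the example, and ismember stands in for aacount.

```matlab
hydrophobic = 'ACFILMPVWY';

% Sequences to score. The first is the one from the question; the second
% is a short made-up toy sequence, included only to show the loop.
seqs = { ...
    'MEQNGLDHDSRSSIDTTINDTQKTFLEFRSYTQLSEKLASSSSYTAPPLNEDGPKGVASAVSQGSESVVSWTTLTHVYSILGAYGGPTCLYPTATYFLMGTSKGCVLIFNYNEHLQTILVPTLSEDPSIH', ...
    'MKVLAAGE'};

frac = zeros(1, numel(seqs));
for k = 1:numel(seqs)
    seq = seqs{k};
    % ismember marks every residue that belongs to the hydrophobic set;
    % summing the marks gives the hydrophobic score, and dividing by the
    % sequence length normalizes it to a fraction between 0 and 1.
    frac(k) = sum(ismember(seq, hydrophobic)) / numel(seq);
end
```

Which fraction counts as "hydrophobic" is not something this sketch decides; that threshold depends on your application.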
Shoaibur Rahman
on 28 Dec 2014
s = 'MEQNGLDHDSRSSIDTTINDTQKTFLEFRSYTQLSEKLASSSSYTAPPLNEDGPKGVASAVSQGSESVVSWTTLTHVYSILGAYGGPTCLYPTATYFLMGTSKGCVLIFNYNEHLQTILVPTLSEDPSIH';
numA = sum(s=='A')
numC = sum(s=='C')
numF = sum(s=='F')
numI = sum(s=='I')
numL = sum(s=='L')
numM = sum(s=='M')
numP = sum(s=='P')
numV = sum(s=='V')
numW = sum(s=='W')
numY = sum(s=='Y')
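The ten lines above all follow the same pattern, so they can be collapsed into a single call. A minimal sketch, assuming you are happy with one vector of counts instead of ten separate variables (t and counts are names chosen here, not part of the answer above):

```matlab
s = 'MEQNGLDHDSRSSIDTTINDTQKTFLEFRSYTQLSEKLASSSSYTAPPLNEDGPKGVASAVSQGSESVVSWTTLTHVYSILGAYGGPTCLYPTATYFLMGTSKGCVLIFNYNEHLQTILVPTLSEDPSIH';
t = 'ACFILMPVWY';

% For each character c in t, count how often it occurs in s.
% counts(k) corresponds to t(k), e.g. counts(1) is the number of 'A's.
counts = arrayfun(@(c) sum(s == c), t);

% Total number of hydrophobic residues in the sequence.
total = sum(counts);
```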
1 Comment
hiva
on 29 Dec 2014
>> s = 'MEQNGLDHDSRSSIDTTINDTQKTFLEFRSYTQLSEKLASSSSYTAPPLNEDGPKGVASAVSQGSESVVSWTTLTHVYSILGAYGGPTCLYPTATYFLMGTSKGCVLIFNYNEHLQTILVPTLSEDPSIH';
>> t = 'ACFILMPVWY';
>> sum(bsxfun(@eq,s.',t))
ans =
6 2 4 6 13 2 7 7 1 7
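A small variation on the bsxfun line, offered as a sketch rather than a correction: from R2016b on, MATLAB expands the column s.' against the row t implicitly, so the comparison can be written with == directly. Passing the dimension to sum explicitly also keeps the result correct if s happens to contain a single character. The names counts and countsOld are illustrative only.

```matlab
s = 'MEQNGLDHDSRSSIDTTINDTQKTFLEFRSYTQLSEKLASSSSYTAPPLNEDGPKGVASAVSQGSESVVSWTTLTHVYSILGAYGGPTCLYPTATYFLMGTSKGCVLIFNYNEHLQTILVPTLSEDPSIH';
t = 'ACFILMPVWY';

% s.' is numel(s)-by-1 and t is 1-by-numel(t), so the comparison yields a
% numel(s)-by-numel(t) logical matrix (implicit expansion, R2016b or later).
% Summing along dimension 1 counts the matches for each character of t.
counts = sum(s.' == t, 1);

% Equivalent form for older releases, again with an explicit dimension.
countsOld = sum(bsxfun(@eq, s.', t), 1);
```

The intermediate matrix has numel(s) * numel(t) elements, which is harmless for a ten-character set but worth keeping in mind for very long sequences.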
1 Comment