seqconsensus

Calculate consensus sequence

Syntax

CSeq = seqconsensus(Seqs)
[CSeq, Score] = seqconsensus(Seqs)
CSeq = seqconsensus(Profile)

seqconsensus(..., 'PropertyName', PropertyValue,...)
seqconsensus(..., 'ScoringMatrix', ScoringMatrixValue)

Arguments

Seqs

Set of multiply aligned amino acid or nucleotide sequences. Enter an array of strings, a cell array of strings, or an array of structures with the field Sequence.

Profile

Sequence profile. Enter a profile from the function seqprofile. Profile is a matrix of size [20 (or 4) x Sequence Length] with the frequency or count of amino acids (or nucleotides) for every position. Profile can also have 21 (or 5) rows if gaps are included in the consensus.

ScoringMatrixValue

Either of the following:

• String specifying the scoring matrix to use for the alignment. Choices for amino acid sequences are:

• 'BLOSUM62'

• 'BLOSUM30' increasing by 5 up to 'BLOSUM90'

• 'BLOSUM100'

• 'PAM10' increasing by 10 up to 'PAM500'

• 'DAYHOFF'

• 'GONNET'

Default is:

• 'BLOSUM50' — When AlphabetValue equals 'AA'

• 'NUC44' — When AlphabetValue equals 'NT'

 Note:   The above scoring matrices, provided with the software, also include a structure containing a scale factor that converts the units of the output score to bits. You can also use the 'Scale' property to specify an additional scale factor to convert the output score from bits to another unit.
• A 21x21, 5x5, 20x20, or 4x4 numeric array. For the gap-included cases (21x21 and 5x5), the gap scores in the last row and column are set to mean(diag(ScoringMatrix)) for a gap matching with another gap, and to the mean of the off-diagonal elements, mean(nodiag(ScoringMatrix)), for a gap matching with another symbol.

 Note:   If you use a scoring matrix that you created, the matrix does not include a scale factor. The output score will be returned in the same units as the scoring matrix.
 Note:   If you need to compile seqconsensus into a stand-alone application or software component using MATLAB® Compiler™, use a matrix instead of a string for ScoringMatrixValue.
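
The gap-score rule for numeric scoring matrices can be sketched in plain MATLAB. This only illustrates the arithmetic described above and is not the function's internal code; the 4x4 matrix M and all variable names are made up for the example.

```matlab
% Hypothetical 4x4 nucleotide scoring matrix: +5 for a match, -4 for a mismatch.
M = 5*eye(4) - 4*(~eye(4));

% Score for a gap aligned with another gap: mean of the diagonal.
gapVsGap = mean(diag(M));

% Score for a gap aligned with any other symbol: mean of the off-diagonal elements.
offDiagonal = M(~eye(size(M)));
gapVsSymbol = mean(offDiagonal);

% Extended 5x5 matrix with the gap scores in the last row and column.
Mgap = [M, repmat(gapVsSymbol, 4, 1); ...
        repmat(gapVsSymbol, 1, 4), gapVsGap];
```

Whether you pass the 4x4 or the extended 5x5 form depends on whether gaps are included in the consensus.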

Description

CSeq = seqconsensus(Seqs), for a multiply aligned set of sequences (Seqs), returns a string with the consensus sequence (CSeq). The frequency of symbols (20 amino acids, 4 nucleotides) in the set of sequences is determined with the function seqprofile. For ambiguous nucleotide or amino acid symbols, the frequency or count is added to the standard set of symbols.

[CSeq, Score] = seqconsensus(Seqs) also returns the conservation score (Score) of the consensus sequence. Scores are computed with the scoring matrix BLOSUM50 for amino acids or NUC44 for nucleotides. Each score is the average Euclidean distance between the scored symbol and the M-dimensional consensus value, where M is the size of the alphabet. The consensus value is the profile weighted by the scoring matrix.
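
As a minimal sketch of the two-output form, the following call uses a small, made-up nucleotide alignment stored as a character array. It assumes Bioinformatics Toolbox is installed; the 'Alphabet' property is set explicitly so the example does not rely on any default.

```matlab
% Three aligned nucleotide sequences of equal length (made-up data).
seqs = ['ACGTACGT'; ...
        'ACGTACGA'; ...
        'ACGAACGT'];

% Request both the consensus string and the conservation score.
[cseq, score] = seqconsensus(seqs, 'Alphabet', 'NT');

disp(cseq)
disp(score)
```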

CSeq = seqconsensus(Profile) returns a string with the consensus sequence (CSeq) from a sequence profile (Profile).
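
A profile-based call might look like the sketch below. The alignment is made-up data, and the profile is built with seqprofile as the Arguments section suggests; Bioinformatics Toolbox is assumed to be available.

```matlab
% Made-up nucleotide alignment.
seqs = ['ACGTACGT'; ...
        'ACGTACGA'; ...
        'ACGAACGT'];

% Build a profile with one row per nucleotide and one column per position.
profile = seqprofile(seqs, 'Alphabet', 'NT');

% Derive the consensus from the profile instead of the raw sequences.
cseq = seqconsensus(profile, 'Alphabet', 'NT');
disp(cseq)
```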

seqconsensus(..., 'PropertyName', PropertyValue,...) defines optional properties using property name/value pairs.

seqconsensus(..., 'ScoringMatrix', ScoringMatrixValue) specifies the scoring matrix.
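
The two kinds of ScoringMatrixValue can be compared with a short sketch. The alignment and the custom matrix are made up for illustration, and the custom matrix is assumed to follow the same symbol order as the profile rows; Bioinformatics Toolbox is assumed to be available.

```matlab
% Made-up nucleotide alignment.
seqs = ['ACGTACGT'; ...
        'ACGTACGA'; ...
        'ACGAACGT'];

% Named matrix shipped with the software (the documented nucleotide default).
cseqNamed = seqconsensus(seqs, 'Alphabet', 'NT', 'ScoringMatrix', 'NUC44');

% Hypothetical custom 4x4 matrix: +5 for a match, -4 for a mismatch.
% It carries no scale factor, so the score stays in the matrix's own units.
customMatrix = 5*eye(4) - 4*(~eye(4));
[cseqCustom, scoreCustom] = seqconsensus(seqs, 'Alphabet', 'NT', ...
    'ScoringMatrix', customMatrix);
```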

The following input parameters behave like the parameters of the same name in the function seqprofile, with the alphabet restricted to 'AA' or 'NT'.

seqconsensus(..., 'Alphabet', AlphabetValue)

seqconsensus(..., 'Gaps', GapsValue)

seqconsensus(..., 'Ambiguous', AmbiguousValue)

seqconsensus(..., 'Limits', LimitsValue)
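
A hedged sketch of how these properties combine, using a made-up alignment that contains gaps. The property values shown ('all' for 'Gaps' and a two-element position range for 'Limits') mirror the forms used in the Examples section; see seqprofile for the full list of choices. Bioinformatics Toolbox is assumed to be available.

```matlab
% Made-up nucleotide alignment with gap characters.
seqs = ['ACG-ACGT'; ...
        'ACGTAC-T'; ...
        'AC-TACGT'];

% Count gaps as symbols and restrict the calculation to positions 2 through 6.
[cseq, score] = seqconsensus(seqs, 'Alphabet', 'NT', ...
    'Gaps', 'all', 'Limits', [2 6]);

disp(cseq)
```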

Examples

```
seqs = fastaread('pf00002.fa');
[C,S] = seqconsensus(seqs,'limits',[50 60],'gaps','all')
```