seqlogo

Display sequence logo for nucleotide or amino acid sequences

Syntax

seqlogo(Seqs)
seqlogo(Profile)
WgtMatrix = seqlogo(...)
[WgtMatrix, Handle] = seqlogo(...)
seqlogo(..., 'Displaylogo', DisplaylogoValue, ...)
seqlogo(..., 'Alphabet', AlphabetValue, ...)
seqlogo(..., 'Startat', StartatValue, ...)
seqlogo(..., 'Endat', EndatValue, ...)
seqlogo(..., 'SSCorrection', SSCorrectionValue, ...)

Input Arguments

Seqs

Set of pairwise or multiply aligned nucleotide or amino acid sequences, represented by any of the following:

  • Character array

  • Cell array of character vectors

  • String vector

  • Array of structures containing a Sequence field

Profile

Sequence profile distribution matrix with the frequency of nucleotides or amino acids for every column in the multiple alignment, such as returned by the seqprofile function.

The size of the frequency distribution matrix is:

  • For nucleotides — [4 x sequence length]

  • For amino acids — [20 x sequence length]

If gaps were included, Profile may have 5 rows (for nucleotides) or 21 rows (for amino acids), but seqlogo ignores gaps.

DisplaylogoValue

Controls the display of a sequence logo. Choices are true (default) or false.

AlphabetValue

Character vector or string specifying the type of sequence (nucleotide or amino acid). Choices are 'NT' (default) or 'AA'.

StartatValue

Positive integer that specifies the starting position for the sequences in Seqs. Default starting position is 1.

EndatValue

Positive integer that specifies the ending position for the sequences in Seqs. Default ending position is the maximum length of the sequences in Seqs.

SSCorrectionValue

Controls the use of small sample correction in the estimation of the number of bits. Choices are true (default) or false.

Output Arguments

WgtMatrix

Cell array containing the symbol list in Seqs or Profile and the weight matrix used to graphically display the sequence logo.

Handle

Handle to the sequence logo figure.

Description

seqlogo(Seqs) displays a sequence logo for Seqs, a set of aligned sequences. The logo graphically displays the sequence conservation at a particular position in the alignment of sequences, measured in bits. The maximum sequence conservation per site is log2(4) bits for nucleotide sequences and log2(20) bits for amino acid sequences. If the sequence conservation value is zero or negative, no logo is displayed in that position.
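As a rough illustration of the measure described above, the uncorrected conservation of one alignment column can be computed by hand as the maximum (log2(4) bits for nucleotides) minus the Shannon entropy of the symbol frequencies in that column, following the approach in [1]. This is a minimal sketch in base MATLAB. The variable names are illustrative, and it is not necessarily the exact computation that seqlogo performs internally.

```matlab
% Aligned nucleotide sequences, one per row of a character array
S = ['ATTATAGCAAACTA'; ...
     'AACATGCCAAAGTA'; ...
     'ATCATGCAAAAGGA'];

symbols = 'ACGT';          % nucleotide alphabet assumed for this sketch
nPos = size(S, 2);         % number of alignment columns
bits = zeros(1, nPos);     % uncorrected conservation per column

for k = 1:nPos
    % Relative frequency of each symbol in column k
    p = arrayfun(@(c) mean(S(:, k) == c), symbols);
    % Drop zero frequencies, because 0*log2(0) is treated as 0
    p = p(p > 0);
    % Shannon entropy of the column, in bits
    H = -sum(p .* log2(p));
    % Conservation = maximum possible information minus observed entropy
    bits(k) = log2(4) - H;
end
```

A fully conserved column, such as the first one here, reaches the 2-bit maximum. A column in which the symbols are evenly mixed approaches 0 bits.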

seqlogo(Profile) displays a sequence logo for Profile, a sequence profile distribution matrix with the frequency of nucleotides or amino acids for every column in the multiple alignment, such as returned by the seqprofile function.
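The profile form can be sketched as follows, assuming Bioinformatics Toolbox is installed. The alphabet is passed to seqprofile explicitly so that the profile has one row per nucleotide; the variable names are illustrative.

```matlab
% Aligned nucleotide sequences
S = {'ATTATAGCAAACTA', ...
     'AACATGCCAAAGTA', ...
     'ATCATGCAAAAGGA'};

% Frequency profile: one row per nucleotide, one column per position
P = seqprofile(S, 'Alphabet', 'NT');

% Display the logo from the profile instead of from the sequences
seqlogo(P)
```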

Color Code for Nucleotides

Nucleotide    Color
A             Green
C             Blue
G             Yellow
T, U          Red
Other         Purple

Color Code for Amino Acids

Amino Acid         Chemical Property    Color
G S T Y C Q N      Polar                Green
A V L I P W F M    Hydrophobic          Orange
D E                Acidic               Red
K R H              Basic                Blue
Other                                   Tan

WgtMatrix = seqlogo(...) returns a cell array of unique symbols in the sequence Seqs or Profile, and the information weight matrix used to graphically display the logo.

[WgtMatrix, Handle] = seqlogo(...) returns a handle to the sequence logo figure.
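A minimal sketch of retrieving the weight matrix without drawing the figure, assuming Bioinformatics Toolbox is installed. The ordering of the cell array contents (symbol list first, weight matrix second) is an assumption based on the description of WgtMatrix above, and the variable names are illustrative.

```matlab
% Aligned nucleotide sequences
S = {'ATTATAGCAAACTA', ...
     'AACATGCCAAAGTA', ...
     'ATCATGCAAAAGGA'};

% Compute the weight matrix only; suppress the logo figure
W = seqlogo(S, 'Displaylogo', false);

% Assumed layout: first cell holds the symbols, second the weights
symbolList = W{1};
weights    = W{2};
```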

seqlogo(Seqs, ..., 'PropertyName', PropertyValue, ...) calls seqlogo with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:

seqlogo(..., 'Displaylogo', DisplaylogoValue, ...) controls the display of a sequence logo. Choices are true (default) or false.

seqlogo(..., 'Alphabet', AlphabetValue, ...) specifies the type of sequence (nucleotide or amino acid). Choices are 'NT' (default) or 'AA'.

Note

If you provide amino acid sequences to seqlogo, you must set Alphabet to 'AA'.

seqlogo(..., 'Startat', StartatValue, ...) specifies the starting position for the sequences in Seqs. Default starting position is 1.

seqlogo(..., 'Endat', EndatValue, ...) specifies the ending position for the sequences in Seqs. Default ending position is the maximum length of the sequences in Seqs.

seqlogo(..., 'SSCorrection', SSCorrectionValue, ...) controls the use of small sample correction in the estimation of the number of bits. Choices are true (default) or false.

Note

A simple calculation of bits tends to overestimate the conservation at a particular location. When SSCorrection is set to true, seqlogo compensates by applying an approximate correction. This correction works better when the number of sequences is greater than 50.
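To give a sense of scale, the sketch below evaluates an approximation often used for this kind of correction in the sequence logo literature, e(n) = (s - 1) / (2 ln(2) n), where s is the alphabet size and n is the number of sequences. Whether seqlogo applies exactly this expression is an assumption; the code is base MATLAB and the variable names are illustrative.

```matlab
s = 4;                  % alphabet size (nucleotides)
n = [3 10 50 200];      % example numbers of aligned sequences

% Approximate small sample correction, in bits, for each n
e = (s - 1) ./ (2 * log(2) .* n);

% Print one line per sample size
for k = 1:numel(n)
    fprintf('n = %3d   correction = %.4f bits\n', n(k), e(k));
end
```

Under this approximation the correction is subtracted from the uncorrected estimate, and it shrinks in proportion to 1/n, so it matters most for alignments with only a few sequences.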

Examples

This example shows how to display a sequence logo for a set of aligned nucleotide sequences.

Create a series of aligned nucleotide sequences.

S = {'ATTATAGCAAACTA',...
     'AACATGCCAAAGTA',...
     'ATCATGCAAAAGGA'}
S =

  1x3 cell array

    {'ATTATAGCAAACTA'}    {'AACATGCCAAAGTA'}    {'ATCATGCAAAAGGA'}

Display the sequence logo.

seqlogo(S)

This example shows how to display a sequence logo for a set of aligned amino acid sequences.

Create a series of aligned amino acid sequences.

S2 = {'LSGGQRQRVAIARALAL',...
      'LSGGEKQRVAIARALMN',...
      'LSGGQIQRVLLARALAA',...
      'LSGGERRRLEIACVLAL',...
      'FSGGEKKKNELWQMLAL',...
      'LSGGERRRLEIACVLAL'};

Display the sequence logo, specifying the amino acid alphabet and limiting the logo to sequence positions 2 through 10.

seqlogo(S2, 'alphabet', 'aa', 'startAt', 2, 'endAt', 10)

References

[1] Schneider, T.D., and Stephens, R.M. (1990). Sequence Logos: A new way to display consensus sequences. Nucleic Acids Research 18, 6097–6100.

Version History

Introduced before R2006a