fastqread

Read data from FASTQ file

    Description

    FASTQStruct = fastqread(File) reads a FASTQ-formatted file and returns the data in a MATLAB® array of structures.

    [Header,Sequence] = fastqread(File) returns only the header and sequence data in two separate variables.

    [Header,Sequence,Qual] = fastqread(File) returns the data in three separate variables.

    ___ = fastqread(File,Name=Value) specifies options using one or more name-value arguments in addition to the input arguments in previous syntaxes. For example, to return only the header information, set HeaderOnly to true.

    Examples

    FASTQStruct = fastqread("SRR005164_1_50.fastq")
    FASTQStruct=1×50 struct array with fields:
        Header
        Sequence
        Quality
    
    
    [Header,Sequence,Qual] = fastqread("SRR005164_1_50.fastq");
    whos Header Sequence Qual
      Name          Size            Bytes  Class    Attributes
    
      Header        1x50             9734  cell               
      Qual          1x50            18668  cell               
      Sequence      1x50            18668  cell               
    
    FASTQStruct_5_10 = fastqread("SRR005164_1_50.fastq",BlockRead=[5 10])
    FASTQStruct_5_10=1×6 struct array with fields:
        Header
        Sequence
        Quality
    
    

    Input Arguments

    Name of FASTQ-formatted file, specified as a string scalar or character vector. If you specify only a file name, then that file must be on the MATLAB search path or in the MATLAB current folder. Otherwise, specify the full or relative path name.

    Example: "myFile.fastq"

    Example: "myDir\myFile.fastq"

    Example: "C:\myDir\mySubdir\myFile.fastq"

    Name-Value Arguments

    Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

    Example: fastqread("SRR005164_1_50.fastq",HeaderOnly=true)

    Sequence entries to read, specified as a positive integer scalar or a two-element vector of positive integers or Inf values. Specify BlockRead in one of these three ways:

    • Positive integer scalar N — Read Nth entry from the file.

    • Two-element vector of positive integers [N1,N2] — Read a block of entries starting at the N1st entry and ending at the N2nd entry from the file.

    • Two-element vector of the form [N,Inf] — Read all entries starting at the Nth entry from the file.

    Example: 1

    Example: [2 4]

    Example: [5 Inf]

    Data Types: double | single | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
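
    The three BlockRead forms above can be sketched as follows. This is a minimal illustration, assuming Bioinformatics Toolbox is installed; the three-record file and the variable names (fname, second, firstTwo, fromTwo) are made up for the example and are not part of the original page.

```matlab
% Write a small, made-up FASTQ file with three records (illustrative data only).
fname = [tempname '.fastq'];
fid = fopen(fname,'w');
fprintf(fid,'@read1 first\nACGT\n+\nIIII\n');
fprintf(fid,'@read2 second\nGGCA\n+\n5555\n');
fprintf(fid,'@read3 third\nTTAG\n+\nIIII\n');
fclose(fid);

second   = fastqread(fname,BlockRead=2);        % only the 2nd entry
firstTwo = fastqread(fname,BlockRead=[1 2]);    % entries 1 through 2
fromTwo  = fastqread(fname,BlockRead=[2 Inf]);  % entry 2 through the end of the file

delete(fname);  % remove the temporary file
```

    Both ends of the [N1,N2] range are included, which matches the 1×6 result returned for BlockRead=[5 10] in the Examples section.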

    Whether to return only the header information, specified as a numeric or logical 0 (false) or 1 (true).

    Data Types: logical | double | single | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

    Whether to trim the headers, specified as a numeric or logical 0 (false) or 1 (true). If you set TrimHeaders to true, each header is truncated at its first white space character. White space characters include the space (char(32)) and tab (char(9)) characters.

    Data Types: logical | double | single | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
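
    The effect of the two header options can be sketched as follows, again assuming Bioinformatics Toolbox and a made-up temporary file. The sketch inspects only the Header field, because this page does not describe what the other fields hold when HeaderOnly is true.

```matlab
% Write a small, made-up FASTQ file whose headers contain a space (illustrative data only).
fname = [tempname '.fastq'];
fid = fopen(fname,'w');
fprintf(fid,'@read1 first\nACGT\n+\nIIII\n');
fprintf(fid,'@read2 second\nGGCA\n+\n5555\n');
fclose(fid);

full    = fastqread(fname);                     % headers kept whole
trimmed = fastqread(fname,TrimHeaders=true);    % headers cut at the first white space
hdrOnly = fastqread(fname,HeaderOnly=true,TrimHeaders=true);  % both options combined

fullHeader    = full(1).Header;       % 'read1 first'
trimmedHeader = trimmed(1).Header;    % 'read1'

delete(fname);  % remove the temporary file
```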

    Output Arguments

    Sequence data, returned as a structure array. There is one array entry for each sequence read or entry in the file. Each entry of the structure array contains Header, Sequence, and Quality fields.

    Header information, returned as a character vector or cell array of character vectors.

    Sequence information, returned as a character vector or cell array of character vectors.

    Quality information, returned as a character vector or cell array of character vectors.
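
    The two output styles described above can be compared in a short sketch. It assumes Bioinformatics Toolbox; the file contents and the names S, firstQuality, and allSeqs are hypothetical.

```matlab
% Write a small, made-up FASTQ file with three records (illustrative data only).
fname = [tempname '.fastq'];
fid = fopen(fname,'w');
fprintf(fid,'@read1\nACGT\n+\nIIII\n');
fprintf(fid,'@read2\nGGCA\n+\n5555\n');
fprintf(fid,'@read3\nTTAG\n+\nIIII\n');
fclose(fid);

% Structure-array output: one element per entry in the file.
S = fastqread(fname);
firstQuality = S(1).Quality;    % field access on a single entry
allSeqs = {S.Sequence};         % gather one field across all entries into a cell array

% Separate outputs: cell arrays when the file holds more than one entry.
[Header,Sequence,Qual] = fastqread(fname);

delete(fname);  % remove the temporary file
```

    The structure array keeps each read's fields together, while the separate outputs are convenient when only one kind of data is processed in bulk.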

    More About

    FASTQ-file Format

    A FASTQ-formatted file contains nucleotide sequence and quality information on four lines:

    • Line 1 — Header information prefixed with an @ symbol

    • Line 2 — Nucleotide sequence

    • Line 3 — Header information prefixed with a + symbol

    • Line 4 — ASCII representation of per-base quality scores for the nucleotide sequence using Phred or Solexa encoding
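
    The four-line layout above, and the way a quality line maps to numeric scores, can be sketched without any toolbox function. The record is made up, and the sketch assumes the Phred+33 (Sanger) encoding, where each score is the character's ASCII code minus 33. Files that use a Solexa or other offset need a different conversion.

```matlab
% One made-up FASTQ record, stored as its four lines (illustrative data only).
record = ["@read1 sample"; "ACGT"; "+"; "II5!"];

header   = extractAfter(record(1),1);   % drop the leading @ symbol
sequence = char(record(2));             % nucleotide sequence
quality  = char(record(4));             % one quality character per base

% Assumption: Phred+33 encoding, so score = ASCII code - 33.
phred = double(quality) - 33;

% Phred definition: Q = -10*log10(p), so the error probability is p = 10^(-Q/10).
errProb = 10.^(-phred/10);
```

    For this record, phred evaluates to [40 40 20 0], so the third base has an estimated error probability of 0.01.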

    Version History

    Introduced in R2009b