bioinfo.pipeline.block.SeqFilter
Description
A SeqFilter block enables you to filter sequences based on a
specified criterion.
Creation
Syntax
b = bioinfo.pipeline.block.SeqFilter
b = bioinfo.pipeline.block.SeqFilter(options)
b = bioinfo.pipeline.block.SeqFilter(Name=Value)
Description
b = bioinfo.pipeline.block.SeqFilter creates a SeqFilter block.
b = bioinfo.pipeline.block.SeqFilter(options) also specifies additional options.
b = bioinfo.pipeline.block.SeqFilter(Name=Value) specifies additional options as the property names and values of a SeqFilterOptions object. This object is set as the value of the Options property of the block.
Note
The block always overwrites existing output files, unlike the seqfilter function.
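The Name=Value syntax above can be sketched as follows. This is a minimal illustration that assumes Bioinformatics Toolbox is installed; the 'MeanQuality' criterion and the threshold of 20 are arbitrary example values, not recommendations.

```matlab
% Sketch: create a SeqFilter block with name-value arguments.
% Assumes Bioinformatics Toolbox; the threshold of 20 is an arbitrary example value.
b = bioinfo.pipeline.block.SeqFilter(Method="MeanQuality", Threshold=20);

% The name-value arguments are stored on the block's Options property,
% which holds a SeqFilterOptions object.
disp(b.Options.Method)
disp(b.Options.Threshold)
```

Because the values live on the Options object, they can also be changed after creation with dot notation, for example b.Options.Threshold = 25.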
Input Arguments
options: SeqFilter options, specified as a SeqFilterOptions object.
Name-Value Arguments
Specify optional pairs of arguments as
Name1=Value1,...,NameN=ValueN, where Name is
the argument name and Value is the corresponding value.
Name-value arguments must appear after other arguments, but the order of the
pairs does not matter.
Note
The following list of arguments is a partial list. For the complete list, refer to
the properties of the
SeqFilterOptions object.
Method: Criterion to filter sequences, specified as one of the following options. Specify only one filtering criterion per function call.
- 'MaxNumberLowQualityBases': applies a maximum threshold on the number of low-quality bases allowed.
- 'MaxPercentLowQualityBases': applies a maximum threshold on the percentage of low-quality bases allowed.
- 'MeanQuality': applies a minimum threshold on the average base quality across each sequence.
- 'MinLength': applies a minimum threshold on the sequence length.
Use this name-value argument together with 'Threshold' to specify the appropriate threshold value. Depending on the filtering criterion, the corresponding value for 'Threshold' can be a scalar or a two-element vector. If you do not specify 'Threshold', the block uses the default threshold value of the specified method (see 'Threshold' for the default values). For each filtering criterion, the block uses the base quality encoding format specified by the 'Encoding' name-value argument.
Threshold: Threshold value for the filtering criterion, specified as a scalar or a two-element vector, depending on the criterion. Use this name-value argument to define the threshold value for the filtering criterion specified by 'Method'.
If you do not specify 'Threshold', the block uses the default threshold value of the corresponding method, as listed in the following table. For each filtering criterion, the block uses the base quality encoding format specified by the 'Encoding' name-value argument.
'Method' | 'Threshold' | Default 'Threshold' value |
|---|---|---|
'MaxNumberLowQualityBases' | Two-element vector [V1 V2]. V1 is a nonnegative integer that specifies the maximum number of low-quality bases allowed. V2 specifies the minimum base quality. Any base with quality less than V2 is considered a low-quality base. Any sequence containing a number of low-quality bases greater than V1 is filtered out and not saved in the output file. | [0 10] |
'MaxPercentLowQualityBases' | Two-element vector [V1 V2]. V1 is a scalar between 0 and 100 that specifies the maximum percentage of low-quality bases allowed. V2 specifies the minimum base quality. Any base with quality less than V2 is considered a low-quality base. Any sequence containing a percentage of low-quality bases greater than V1 is filtered out and not saved in the output file. | [0 10] |
'MeanQuality' | Positive scalar that specifies the minimum threshold on the average base quality across each sequence. Any sequence with average base quality less than this value is filtered out. | 0 |
'MinLength' | Nonnegative integer that specifies the minimum threshold on the sequence length allowed. Any sequence with length less than this value is filtered out. | 1 |
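The rules in the table can be sketched in plain MATLAB for a single read. This is only an illustration of the threshold semantics described above, not the toolbox implementation: the quality scores and threshold values are hypothetical, and the sketch assumes the scores have already been decoded to numbers, so it does not model the 'Encoding' argument.

```matlab
% Hypothetical decoded base quality scores for one read (illustrative values only).
qual = [30 32 8 25 12 40 38 5 22 35];

% 'MaxNumberLowQualityBases' with Threshold = [V1 V2]
V1 = 2;                                  % maximum number of low-quality bases allowed
V2 = 10;                                 % bases with quality below V2 count as low quality
numLow = sum(qual < V2);                 % number of low-quality bases in the read
keepByCount = numLow <= V1;              % the read is filtered out only if numLow > V1

% 'MaxPercentLowQualityBases' with Threshold = [P V2]
P = 10;                                  % maximum percentage of low-quality bases allowed
pctLow = 100 * numLow / numel(qual);     % percentage of low-quality bases
keepByPercent = pctLow <= P;             % filtered out if pctLow > P

% 'MeanQuality' with a scalar Threshold
minMean = 20;
keepByMean = mean(qual) >= minMean;      % filtered out if the mean is below the threshold

% 'MinLength' with a scalar Threshold
minLen = 8;
keepByLength = numel(qual) >= minLen;    % filtered out if the read is shorter than minLen
```

With these example values the read has two bases below quality 10, so it passes the count criterion (2 is not greater than 2) but fails the percentage criterion (20% exceeds 10%).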
Properties
ErrorHandler: Function to handle errors from the run method of the block, specified as a function handle. The handle specifies the function to call
if the run method encounters an error within a pipeline. For the pipeline to continue after a
block fails, ErrorHandler must return a structure that is compatible with
the output ports of the block. The error handling function is called with the following two inputs:
- Structure with these fields:
Field | Description |
|---|---|
identifier | Identifier of the error that occurred |
message | Text of the error message |
index | Linear index indicating which block process failed in the parallel run. By default, the index is 1 because there is only one run per block. For details on how block inputs can be split across different dimensions for multiple run calls, see Bioinformatics Pipeline SplitDimension. |
- Input structure passed to the run method when it fails
Data Types: function_handle
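A handler of this shape might look like the following sketch. The field names of the returned structure match the SeqFilter output ports documented on this page, but the placeholder values (an empty string array and zero counts) are assumptions, and whether downstream blocks accept them depends on the pipeline. The simulated errInfo and inputs values are hypothetical and exist only to exercise the handler outside a pipeline.

```matlab
% Hypothetical fallback handler: returns a structure whose fields match the
% SeqFilter output ports. The placeholder values are assumptions.
handler = @(errInfo, inputs) struct( ...
    'FilteredFASTQFiles', string.empty, ...
    'NumFilteredIn', 0, ...
    'NumFilteredOut', 0);

% Simulate the two inputs that the pipeline passes to the handler.
errInfo = struct('identifier', 'demo:failure', ...
                 'message', 'Simulated failure', ...
                 'index', 1);
inputs = struct('FASTQFiles', "reads.fastq");   % hypothetical input structure

out = handler(errInfo, inputs);
disp(fieldnames(out))

% To attach the handler to a block b (requires Bioinformatics Toolbox):
% b.ErrorHandler = handler;
```

A handler that needs to log errInfo.message or branch on errInfo.identifier is easier to write as a named function than as an anonymous function.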
This property is read-only.
Inputs: Input ports of the block, specified as a structure. The field
names of the structure are the names of the block input ports, and the field values are bioinfo.pipeline.Input objects. These objects describe the input port behaviors.
The input port names are the expected field names of the input structure that you pass to the
block run method.
The SeqFilter block Inputs structure has the
following field:
- FASTQFiles: Names of FASTQ-formatted files with sequence and quality information. This input is required and must be satisfied. The default value is a bioinfo.pipeline.datatypes.Unset object, which means that the input value is not set yet.
Data Types: struct
This property is read-only.
Outputs: Output ports of the block, specified as a structure. The field
names of the structure are the names of the block output ports, and the field values are bioinfo.pipeline.Output objects. These objects describe the output port behaviors.
The field names of the output structure returned by the block run method
are the same as the output port names.
The SeqFilter block Outputs structure has the
following fields:
- FilteredFASTQFiles: Output file names. By default, the name of each output file consists of the input file name followed by the output suffix ('_filtered').
Tip
To see the actual location of these files, first get the results of the block. Then use the
unwrap method as shown in the Examples section.
- NumFilteredIn: Number of sequences selected from each input file, returned as a scalar or an n-by-1 vector, where n is the number of input files. If there are multiple input files, the order in NumFilteredIn corresponds to the order of the input files.
- NumFilteredOut: Number of sequences excluded from each input file, returned as a scalar or an n-by-1 vector, where n is the number of input files. If there are multiple input files, the order in NumFilteredOut corresponds to the order of the input files.
Data Types: struct
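The two count outputs can be combined to summarize how much of each input file survived filtering. The sketch below uses a plain structure that mimics the block results, with the counts copied from the first example on this page (3 kept, 47 excluded). It assumes that every input sequence is counted in exactly one of the two outputs.

```matlab
% Stand-in for the results structure returned by results(P,SF);
% the counts are copied from the first example on this page.
R = struct('NumFilteredIn', 3, 'NumFilteredOut', 47);

% Assumption: each input sequence is either selected or excluded,
% so the two counts add up to the number of sequences in the file.
total = R.NumFilteredIn + R.NumFilteredOut;

% Elementwise division, so the same line also works for n-by-1 vectors.
fractionKept = R.NumFilteredIn ./ total;

fprintf("Kept %d of %d sequences (%.1f%%)\n", ...
    R.NumFilteredIn, total, 100 * fractionKept);
```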
Options: SeqFilter options, specified as a SeqFilterOptions object. The default value is a default
SeqFilterOptions object.
Object Functions
compile | Perform block-specific additional checks and validations |
copy | Copy array of handle objects |
emptyInputs | Create input structure for use with run method |
eval | Evaluate block object |
run | Run block object |
Examples
Use a SeqFilter block to filter out sequences with
low-quality bases, where a base is considered low-quality if its quality score is less
than 10 (the default).
import bioinfo.pipeline.block.*
import bioinfo.pipeline.Pipeline
FC = FileChooser(which("SRR005164_1_50.fastq"));
SF = SeqFilter;
P = Pipeline;
addBlock(P,[FC,SF]);
connect(P,FC,SF,["Files","FASTQFiles"]);
run(P);
R = results(P,SF)
R =
struct with fields:
FilteredFASTQFiles: [1×1 bioinfo.pipeline.datatypes.File]
NumFilteredIn: 3
NumFilteredOut: 47
Call unwrap on FilteredFASTQFiles to see the
location of the output file.
unwrap(R.FilteredFASTQFiles)
ans =
"C:\PipelineResults\SeqFilter_1\1\SRR005164_1_50_filtered.fastq"
Import the Pipeline and block objects needed for the example.
import bioinfo.pipeline.Pipeline
import bioinfo.pipeline.block.*
Create a pipeline.
qcpipeline = Pipeline;
Select an input FASTQ file using a FileChooser block.
fastqfile = FileChooser(which("SRR005164_1_50.fastq"));
Create a SeqFilter block.
sequencefilter = SeqFilter;
Define the filtering threshold value. Specifically, filter out sequences with a total of more than 10 low-quality bases, where a base is considered a low-quality base if its quality score is less than 20.
sequencefilter.Options.Threshold = [10 20];
Add the blocks to the pipeline.
addBlock(qcpipeline,[fastqfile,sequencefilter]);
Connect the output of the first block to the input of the second block. To do so, you need to first check the input and output port names of the corresponding blocks.
View the Outputs (port of the first block) and Inputs (port of the second block).
fastqfile.Outputs
ans = struct with fields:
Files: [1×1 bioinfo.pipeline.Output]
sequencefilter.Inputs
ans = struct with fields:
FASTQFiles: [1×1 bioinfo.pipeline.Input]
Connect the Files output port of the fastqfile block to the FASTQFiles input port of the sequencefilter block.
connect(qcpipeline,fastqfile,sequencefilter,["Files","FASTQFiles"]);
Next, create a UserFunction block that calls the seqqcplot function to plot the quality data of the filtered sequence data. In this case, inputFile is the required argument for the seqqcplot function. The required argument name can be anything as long as it is a valid variable name.
qcplot = UserFunction("seqqcplot",RequiredArguments="inputFile",OutputArguments="figureHandle");
Alternatively, you can also use dot notation to set up your UserFunction block.
qcplot = UserFunction;
qcplot.RequiredArguments = "inputFile";
qcplot.Function = "seqqcplot";
qcplot.OutputArguments = "figureHandle";
Add the block.
addBlock(qcpipeline,qcplot);
Check the output port names of the sequencefilter block and the input port names of the qcplot block.
sequencefilter.Outputs
ans = struct with fields:
FilteredFASTQFiles: [1×1 bioinfo.pipeline.Output]
NumFilteredIn: [1×1 bioinfo.pipeline.Output]
NumFilteredOut: [1×1 bioinfo.pipeline.Output]
qcplot.Inputs
ans = struct with fields:
inputFile: [1×1 bioinfo.pipeline.Input]
Connect the FilteredFASTQFiles port of the sequencefilter block to the inputFile port of the qcplot block.
connect(qcpipeline,sequencefilter,qcplot,["FilteredFASTQFiles","inputFile"]);
Run the pipeline to plot the sequence quality data.
run(qcpipeline);

Version History
Introduced in R2023a