
Create Simple Pipeline to Plot Sequence Quality Data Using Biopipeline Designer

This example shows how to create a bioinformatics pipeline in the Biopipeline Designer app that loads sequence read data, filters some sequences based on quality, and displays the quality statistics of the filtered data.

Open Biopipeline Designer App

Enter the following at the MATLAB® command line.

biopipelineDesigner

Select Input File Using FileChooser Block

In the Block Libraries panel of the app, scroll down to the General section. Drag the FileChooser block onto the diagram.

You can also use the Search box to look for specific built-in blocks in the Block Libraries.

Double-click the block name FileChooser_1 and rename it FASTQ.

Run the following command at the MATLAB command line to create a variable that contains the full file path to the provided sequence read data.

fastqFile = which("SRR005164_1_50.fastq");

In the app, click the FASTQ block. In the Pipeline Inspector pane, under FileChooser Properties, click the vertical three-dot menu next to the Files property. Select Assign from workspace.

Select fastqFile from the list. Click OK.

Filter Sequences Based on Quality

In the Block Libraries panel, under the Sequence Utilities section, drag the SeqFilter block onto the diagram. This block filters sequences according to criteria that you specify. The Pipeline Inspector pane shows the default values of the block properties and filtering options. In the SeqFilter Options section, change Threshold to 10,20 and keep the default values of the other options. With this threshold, the block filters out any sequence that has more than 10 low-quality bases, where a base is considered low quality when its quality score is less than 20. For details, see SeqFilterOptions.
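To make the 10,20 rule concrete, the following minimal sketch applies the same criterion to a few made-up quality-score vectors. It is an illustration of the rule as described above, not the SeqFilter implementation. The variable names (reads, countLow, keepRead) and the toy scores are hypothetical, and the sketch assumes the quality scores are already available as numeric values.

```matlab
% Threshold written as [maxLowQualityBases, minQualityScore], mirroring 10,20.
threshold = [10 20];

% Toy per-base quality scores for three hypothetical reads of 50 bases each.
reads = { ...
    [repmat(30, 1, 40), repmat(15, 1, 10)], ...  % 10 bases scoring below 20
    [repmat(30, 1, 39), repmat(15, 1, 11)], ...  % 11 bases scoring below 20
    repmat(20, 1, 50)};                          % a score of 20 is not below 20

% Count the bases whose score is strictly less than the score cutoff.
countLow = @(q) sum(q < threshold(2));

% Keep a read when it has no more than the allowed number of low-quality bases.
keepRead = @(q) countLow(q) <= threshold(1);

numLow = cellfun(countLow, reads);
kept = cellfun(keepRead, reads);

fprintf("Low-quality bases per read: %s\n", mat2str(numLow));
fprintf("Kept: %d, filtered out: %d\n", nnz(kept), nnz(~kept));
```

In this toy data, only the second read exceeds the limit of 10 low-quality bases, so it is the only one filtered out.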

Plot Sequence Quality Data

Create a custom (bioinfo.pipeline.block.UserFunction) block that calls an existing MATLAB function seqqcplot to plot the quality statistics of the filtered data.

  1. In the Block Libraries panel, under the General section, drag and drop the UserFunction block onto the diagram.

  2. Rename the block to SeqQCPlot.

  3. In the Pipeline Inspector pane, under UserFunction Properties, set the RequiredArguments to inputFile and Function to seqqcplot.

Connect Blocks and Run Pipeline

After setting up the blocks, you can now connect them to complete the pipeline.

Drag an arrow from the Files output port of FASTQ to the FASTQFiles port of SeqFilter_1.

Next, connect the FilteredFASTQFiles output port of SeqFilter_1 to the inputFile port of SeqQCPlot.

On the toolstrip of the app, click Run. During the run, you can see the progress of each block at its status bar. Point to a color-coded section with a number to see its meaning.

After the run, you can click each output port name of a block to see the output value. For example, click NumFilteredOut to see the total number of reads that were filtered out by the block.

The app generates a figure that contains quality statistics plots of the filtered data.

If there are any errors or warnings, the app shows them in the Diagnostics tab of the Pipeline Information panel, which is at the bottom of the diagram.

Click the Results tab. In the Source column, expand SeqFilter_1 to see the block results, such as the filtered FASTQ file and the number of sequences that are selected and filtered out.
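For readers who prefer the command line, the pipeline assembled in the app can probably be approximated with the bioinfo.pipeline classes in Bioinformatics Toolbox. The following is a sketch under stated assumptions, not a definitive equivalent of what the app does. The block types, port names, and the Files, Function, RequiredArguments, and Threshold settings come from the steps above. The Options property path and the addBlock, connect, run, and results calls reflect one reading of the toolbox's programmatic interface and should be checked against the reference pages. The variable names are hypothetical.

```matlab
% Sketch: command-line counterpart of the pipeline built interactively above.
% Assumes Bioinformatics Toolbox and its bioinfo.pipeline interface.

% Locate the example FASTQ file, as in the earlier step.
fastqFile = which("SRR005164_1_50.fastq");

% FileChooser block that supplies the input file.
fastqBlock = bioinfo.pipeline.block.FileChooser;
fastqBlock.Files = fastqFile;

% SeqFilter block with the 10,20 threshold.
filterBlock = bioinfo.pipeline.block.SeqFilter;
filterBlock.Options.Threshold = [10 20];  % assumed property path for Threshold

% UserFunction block that calls seqqcplot on the filtered file.
plotBlock = bioinfo.pipeline.block.UserFunction;
plotBlock.Function = "seqqcplot";
plotBlock.RequiredArguments = "inputFile";

% Assemble the pipeline and connect the ports named in the text.
P = bioinfo.pipeline.Pipeline;
addBlock(P, fastqBlock);
addBlock(P, filterBlock);
addBlock(P, plotBlock);
connect(P, fastqBlock, filterBlock, ["Files", "FASTQFiles"]);
connect(P, filterBlock, plotBlock, ["FilteredFASTQFiles", "inputFile"]);

% Run the pipeline, then read back the SeqFilter outputs.
run(P);
R = results(P, filterBlock);
disp(R.NumFilteredOut);
```

Setting properties one at a time, rather than through constructor arguments, keeps the sketch close to the property names shown in the Pipeline Inspector pane.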

Rerun Pipeline with Different Filtering Threshold

You can specify a different threshold to filter sequences and rerun the pipeline. The app is aware of which blocks in the pipeline have changed and which other blocks, such as downstream blocks, are affected as a result. Hence, on subsequent runs, it reruns only those blocks that are needed, instead of every block in the pipeline. For details, see Bioinformatics Pipeline Run Mode.
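The idea of rerunning only the affected blocks can be illustrated with a small dependency graph. The sketch below is a conceptual model only and makes no claim about how the app tracks changes internally. It uses the block names from this example and the standard MATLAB digraph and bfsearch functions to find a changed block together with everything downstream of it. The variable names are hypothetical.

```matlab
% Edges of the example pipeline: FASTQ -> SeqFilter_1 -> SeqQCPlot.
G = digraph(["FASTQ", "SeqFilter_1"], ["SeqFilter_1", "SeqQCPlot"]);

% Suppose a property of this block has changed.
changedBlock = "SeqFilter_1";

% A breadth-first search from the changed block visits that block and
% every block reachable from it, that is, all downstream blocks.
staleBlocks = string(bfsearch(G, changedBlock));

% Every other block keeps its previous results.
allBlocks = string(G.Nodes.Name);
upToDate = setdiff(allBlocks, staleBlocks);

fprintf("Rerun: %s\n", strjoin(staleBlocks, ", "));
fprintf("Reuse: %s\n", strjoin(upToDate, ", "));
```

In this model, changing SeqFilter_1 marks SeqFilter_1 and SeqQCPlot for rerun, while FASTQ is reused.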

Click SeqFilter_1. In the Pipeline Inspector pane, change its Threshold option to 5,20. This setting filters out any sequence with more than 5 low-quality bases, where a base is considered low quality when its quality score is less than 20. Both the SeqFilter_1 and SeqQCPlot blocks now show a warning icon to indicate that their results are out of date because of the change to the SeqFilter_1 block.

Click Run. The app generates an updated figure for the newly filtered data. During this run, the app does not rerun the FASTQ block because its results are still valid. It reruns only the other two blocks.

Go to the Results tab of the Pipeline Information panel to check the new results.

Navigate, View, and Access Output Files in Results File Browser

By default, the app saves the pipeline results in the PipelineResults folder in the MATLAB current folder. In this example, the current folder is C:\Biopipeline_Designer\seqqcplot_pipeline\ and the pipeline results folder is C:\Biopipeline_Designer\seqqcplot_pipeline\PipelineResults\ as shown in the Results File Browser pane.

The Results File Browser pane allows you to navigate, view, and access the output files generated by the pipeline. You can expand the results folder for each block to view the corresponding output files.

Expand SeqFilter_1 > 1 to see the FASTQ file containing the output sequences of the SeqFilter_1 block.
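The same files can also be listed from the command line. The following sketch assumes the default PipelineResults folder under the current folder and the SeqFilter_1 > 1 layout shown in the Results File Browser. The variable names are hypothetical, and the sketch only reports what it finds on disk.

```matlab
% Default results folder described in the text.
resultsFolder = fullfile(pwd, "PipelineResults");

% Results folder of one block; the block name comes from this example.
blockFolder = fullfile(resultsFolder, "SeqFilter_1");

if isfolder(blockFolder)
    % "**" makes dir search all subfolders (for example, the folder named 1).
    listing = dir(fullfile(blockFolder, "**", "*"));
    listing = listing(~[listing.isdir]);  % keep files only
    for k = 1:numel(listing)
        fprintf("%s\n", fullfile(listing(k).folder, listing(k).name));
    end
else
    fprintf("No results folder found at %s\n", blockFolder);
end
```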

Click the up arrow to go up one folder. The app uses a red folder icon to indicate the current results folder. Clicking this icon takes you directly to the current results folder if you are not already in it.

You can use the context (right-click) menu for additional options. For instance, create a new folder and set it as the new results folder. Right-click within the Results File Browser and select New Folder.

Click the folder once to rename it PipelineResults2. Right-click the folder and select Set As Results Folder.

The PipelineResults2 folder turns red to indicate that it is now the current results folder. The app saves any subsequent pipeline results in this folder.

Export Results

You can export a single output of a block, or all of its outputs, to the MATLAB workspace by selecting Export to Workspace from the context (right-click) menu of the corresponding row in the Results table. To export all outputs of a block, right-click the row at the block level.
