Feature Extraction Using SURF

Object Recognition using Speeded-Up Robust Features (SURF) is composed of three steps: feature extraction, feature description, and feature matching. This example performs feature extraction, the first step of the SURF algorithm. The algorithm used here is based on the OpenSURF library implementation. This example demonstrates how GPU Coder™ can solve this compute-intensive problem through CUDA® code generation.

Third-Party Prerequisites


  • CUDA-enabled NVIDIA® GPU with compute capability 3.2 or higher.

  • Image Processing Toolbox™ for reading and displaying images.

  • NVIDIA CUDA toolkit.

  • Environment variables for the compilers and libraries. For more information, see Environment Variables.

Create a Folder and Copy Relevant Files

The following line of code creates a folder in your current working folder (pwd) and copies all the relevant files into this folder. If you do not want to perform this operation, or if you cannot generate files in this folder, change your current working folder.


Verify the GPU Environment

Use the coder.checkGpuInstall function and verify that the compilers and libraries needed for running this example are set up correctly.

envCfg = coder.gpuEnvConfig('host');
envCfg.BasicCodegen = 1;
envCfg.Quiet = 1;
coder.checkGpuInstall(envCfg);

Feature Extraction

Feature extraction is a fundamental step in any object recognition algorithm. It refers to the process of extracting useful information, referred to as features, from an input image. The extracted features must be representative in nature, carrying important and unique attributes of the image.

The SurfDetect.m function is the main entry point that performs feature extraction. This function accepts an 8-bit RGB or an 8-bit grayscale image as the input and returns an array of extracted interest points. It is composed of the following function calls, which contain computations suitable for GPU parallelization:

  • The Convert32bitFPGray.m function converts an 8-bit RGB image to an 8-bit grayscale image. If the input is already in 8-bit grayscale format, this step is skipped. The 8-bit grayscale image is then converted to a 32-bit floating-point representation to enable fast computations on the GPU.

  • The MyIntegralImage.m function calculates the integral image of the 32-bit floating-point grayscale image obtained in the previous step. The integral image simplifies finding the sum of pixels enclosed within any rectangular region of the image, which speeds up the convolutions performed in the next step.

  • The FastHessian.m function convolves the image with box filters of different sizes and stores the computed responses. This example uses the following parameters:

    Number of Octaves: 5
    Number of Intervals: 4
    Threshold: 0.0004
    Filter Sizes: Octave 1 -  9,  15,  21,  27
                  Octave 2 - 15,  27,  39,  51
                  Octave 3 - 27,  51,  75,  99
                  Octave 4 - 51,  99, 147, 195
                  Octave 5 - 99, 195, 291, 387
  • The NonMaxSuppression_gpu.m function performs non-maximal suppression to keep only the useful interest points from the responses obtained earlier, based on several factors. It uses the coder.ceval construct to generate a kernel that uses the atomicAdd operation. Because this construct cannot be invoked directly from MATLAB®, there are two variants of this function: NonMaxSuppression_gpu.m is invoked when GPU code generation is enabled, and NonMaxSuppression.m is invoked when the algorithm executes directly in MATLAB.

  • The OrientationCalc.m function calculates and assigns an orientation to each interest point located in the previous step.
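The integral-image step described above can be sketched in a few lines. The following is an illustrative Python version of the idea, not the MyIntegralImage.m implementation: each entry holds the sum of all pixels above and to the left, so any box sum takes four lookups regardless of the box size.

```python
# Sketch of the integral-image idea behind MyIntegralImage.m (illustrative
# Python, not the MATLAB implementation).

def integral_image(img):
    # ii[y][x] = sum of img over the rectangle (0, 0)-(y, x), inclusive
    h, w = len(img), len(img[0])
    ii = [[0.0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0.0)
    return ii

def box_sum(ii, r0, c0, r1, c1):
    """Sum of pixels in the inclusive rectangle (r0, c0)-(r1, c1),
    computed with four lookups regardless of rectangle size."""
    a = ii[r0 - 1][c0 - 1] if r0 > 0 and c0 > 0 else 0.0
    b = ii[r0 - 1][c1] if r0 > 0 else 0.0
    c = ii[r1][c0 - 1] if c0 > 0 else 0.0
    return ii[r1][c1] - b - c + a

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(box_sum(ii, 0, 0, 2, 2))  # 45.0, total of all pixels
print(box_sum(ii, 1, 1, 2, 2))  # 28.0, sum of the bottom-right 2x2 block
```

This constant-time box sum is what makes the box-filter convolutions in FastHessian.m cheap even for the large filter sizes listed above.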

The final result obtained is an array of interest points where an interest point is a structure that consists of the following fields:

    x, y (coordinates), scale, orientation, laplacian
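The filter sizes tabulated for FastHessian.m follow a regular pattern: within an octave the sizes are evenly spaced, and the spacing doubles from one octave to the next. The short sketch below reproduces the table (illustrative Python; the closed form is inferred from the listed values, not taken from the source code):

```python
# Reproduce the FastHessian.m filter-size table shown above. The closed
# form below is inferred from the listed values: for octave o and
# interval i (both 1-based), size = 3 * (2^o * i + 1).

def filter_sizes(octaves=5, intervals=4):
    return [[3 * (2 ** o * i + 1) for i in range(1, intervals + 1)]
            for o in range(1, octaves + 1)]

for o, row in enumerate(filter_sizes(), start=1):
    print(f"Octave {o}: {row}")
# Octave 1: [9, 15, 21, 27]
# ...
# Octave 5: [99, 195, 291, 387]
```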

Read Input Image

Read an input image into MATLAB by using the imread function.

imageFile = 'peppers.png';
inputImage = imread(imageFile);

Generate CUDA MEX for the Function

To generate CUDA MEX for the SurfDetect function, create a GPU Coder configuration object and use the codegen function.

cfg = coder.gpuConfig('mex');
evalc('codegen -config cfg SurfDetect -args {inputImage}');

Run the MEX Function on a GPU

The generated MEX function, SurfDetect_mex, can be invoked to run on a GPU as follows:

disp('Running GPU Coder SURF');
interestPointsGPU = SurfDetect_mex(inputImage);
fprintf('    GPU Coder SURF found: %d interest points\n',length(interestPointsGPU));
Running GPU Coder SURF
    GPU Coder SURF found: 249 interest points

Depict the Extracted Interest Points

The output interestPointsGPU is an array of extracted interest points. These interest points are depicted over the input image in a figure window.

DrawIpoints(imageFile, interestPointsGPU);

Run Command: Cleanup

Remove files, perform cleanup, and return to the original folder.

References



  1. Notes on the OpenSURF Library by Christopher Evans

  2. SURF: Speeded-Up Robust Features by Herbert Bay, Tinne Tuytelaars, and Luc Van Gool