
monovslam

Visual simultaneous localization and mapping (vSLAM) with monocular camera

Since R2023b

    Description

    Use the monovslam object to perform visual simultaneous localization and mapping (vSLAM) with a monocular camera. The object extracts Oriented FAST and Rotated BRIEF (ORB) features from incrementally read images, and then tracks those features to estimate camera poses, identify key frames, and reconstruct a 3-D environment. The vSLAM algorithm also searches for loop closures using the bag-of-features algorithm, and then optimizes the camera poses using pose graph optimization. To learn more about visual SLAM, see Implement Visual SLAM in MATLAB.

    Note

    To use the monovslam object, you must have a Navigation Toolbox™ license.

    Creation

    Description


    vslam = monovslam(intrinsics) creates a monocular visual SLAM object, vslam, by using the camera intrinsic parameters.

    The monovslam object does not account for lens distortion. You can use the undistortImage function to undistort images before adding them to the object.

    The object represents 3-D map points and camera poses in world coordinates. The object assumes the camera pose of the first key frame is an identity rigidtform3d transform.

    Note

    The monovslam object runs on multiple threads internally, which can delay the processing of an image frame added by using the addFrame function. Additionally, because the object runs on multiple threads, the frame it is currently processing can differ from the most recently added frame.

    vslam = monovslam(intrinsics,Name=Value) sets properties using one or more name-value arguments. For example, MaxNumPoints=850 sets the maximum number of ORB feature points to extract from each image to 850.
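
    For example, this sketch creates a monovslam object from hypothetical intrinsics and sets properties at construction. The focal length, principal point, and image size values are placeholders; replace them with the results of your own camera calibration. If your images have lens distortion, undistort each frame with undistortImage before adding it to the object.

    % Hypothetical calibration values, for illustration only
    focalLength    = [800 800];     % [fx fy] in pixels
    principalPoint = [320 240];     % [cx cy] in pixels
    imageSize      = [480 640];     % [mrows ncols]
    intrinsics = cameraIntrinsics(focalLength,principalPoint,imageSize);

    % Set properties using name-value arguments at construction
    vslam = monovslam(intrinsics,MaxNumPoints=850,Verbose=true);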

    Input Arguments


    Camera intrinsic parameters, specified as a cameraIntrinsics object.

    Properties


    This property is read-only.

    Scale factor for image decomposition, stored as a scalar greater than 1. The scale factor is also referred to as the pyramid decimation ratio. Increasing the value of ScaleFactor reduces the number of pyramid levels, which reduces computation time. Decreasing this value (down to just over 1) increases the number of pyramid levels, which can improve tracking performance at the cost of computation speed. The scale value at each level of decomposition is ScaleFactor^(level-1), where level is any value in the range [0, NumLevels-1]. Given an input image of size M-by-N, the image size at each level of decomposition is M_level-by-N_level, where:

    M_level = M / ScaleFactor^(level-1)
    N_level = N / ScaleFactor^(level-1)

    Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint64 | uint32
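
    For illustration, this sketch evaluates the image-size formula above for a hypothetical 480-by-640 image, a ScaleFactor of 1.2, and eight decomposition levels (all placeholder values):

    % Hypothetical input image size, scale factor, and number of levels
    M = 480; N = 640;
    scaleFactor = 1.2;
    numLevels = 8;

    % Image size at each decomposition level, using the formula above
    level = 0:numLevels-1;
    Mlevel = M./scaleFactor.^(level-1);
    Nlevel = N./scaleFactor.^(level-1);
    disp([level; Mlevel; Nlevel]')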

    This property is read-only.

    Number of decomposition levels, specified as a scalar greater than or equal to 1. Increase this value to extract keypoints from the image at more levels of decomposition. Along with the ScaleFactor value, NumLevels controls the number of pyramid levels on which the object evaluates feature points.

    The image size at each decomposition level limits the number of levels at which you can extract keypoints. The image size at a level of decomposition must be at least 63-by-63 for keypoint detection. The maximum level of decomposition is calculated as

    level_max = floor((log(min(M,N)) - log(63)) / log(ScaleFactor)) + 1

    If either the default value or the specified value of NumLevels is greater than level_max, the object modifies NumLevels to level_max and returns a warning.

    Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint64 | uint32
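
    As a sketch, you can evaluate level_max for the same hypothetical 480-by-640 image and a ScaleFactor of 1.2:

    % Maximum usable decomposition level for a hypothetical 480-by-640 image
    M = 480; N = 640;
    scaleFactor = 1.2;
    levelMax = floor((log(min(M,N)) - log(63))/log(scaleFactor)) + 1   % returns 12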

    This property is read-only.

    Maximum number of ORB feature points uniformly extracted from each image, specified as a positive integer. A typical value is in the range [800, 2000], depending on the resolution of the image. When the number of extracted features is less than the value of MaxNumPoints, the object uses all of the extracted feature points.

    This property is read-only.

    Key frame feature point range, stored as a two-element vector of the form [lowerLimit,upperLimit]. This property specifies the minimum and maximum numbers of tracked feature points a frame must contain for the object to identify it as a key frame. The TrackFeatureRange and SkipMaxFrames properties enable you to control how frequently the tracking process adds key frames.

    The success of tracking depends on the number of tracked points in the current frame, with one of these results:

    • Tracking is lost — The number of tracked feature points in the current frame is less than the lowerLimit set by the TrackFeatureRange property. This indicates that the image does not contain enough features, or that the camera is moving too fast. To improve the tracking, you can increase the upperLimit value of the TrackFeatureRange property and decrease the SkipMaxFrames property to add key frames more frequently.

    • Tracking is successful — The object identifies the current frame as a key frame. The number of tracked feature points in the current frame is in the range set by TrackFeatureRange.

    • Tracking adds key frames too frequently — The number of tracked feature points in the current frame is greater than the upperLimit set by the TrackFeatureRange property. This indicates that the camera is moving very slowly, which produces an unnecessary number of key frames. To improve the tracking, you can reduce the frequency of adding key frames by increasing the value of the SkipMaxFrames property.

    For more details, see the addFrame object function.
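
    For instance, this sketch adjusts the key frame criteria at construction and queries the tracking status after each frame. The property values are placeholders, and intrinsics and imds are assumed to be a cameraIntrinsics object and an imageDatastore of your image sequence:

    % Hypothetical tuning values, for illustration only
    vslam = monovslam(intrinsics, ...
        TrackFeatureRange=[30 120], ... % [lowerLimit upperLimit]
        SkipMaxFrames=10);              % add key frames more frequently

    for i = 1:numel(imds.Files)
        addFrame(vslam,readimage(imds,i))

        % Query whether tracking is lost, successful, or adding key
        % frames too frequently
        status = checkStatus(vslam);
    end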

    This property is read-only.

    Maximum number of skipped frames, stored as a positive integer. When the number of tracked features is consistently greater than the upperLimit set by the TrackFeatureRange property, use the SkipMaxFrames property to control the frequency of adding new key frames. The object identifies the current frame as a key frame when the number of frames skipped since the most recently added key frame equals the value of SkipMaxFrames.

    This property is read-only.

    Minimum number of matched feature points between loop closure key frames, stored as a positive integer.

    Progress information display, specified as a logical 0 (false) or 1 (true). To display the progress information, set Verbose to true.

    Object Functions

    addFrame - Add image frame to visual SLAM object
    hasNewKeyFrame - Check if new key frame added in visual SLAM object
    checkStatus - Check status of visual SLAM object
    isDone - End-of-processing status for visual SLAM object
    mapPoints - Build 3-D map of world points
    poses - Absolute camera poses of key frames
    plot - Plot 3-D map points and estimated camera trajectory in visual SLAM
    reset - Reset visual SLAM object

    Examples


    Perform monocular visual simultaneous localization and mapping (vSLAM) using the data from the TUM RGB-D Benchmark. You can download the data to a temporary directory using a web browser or by running this code:

    baseDownloadURL = "https://cvg.cit.tum.de/rgbd/dataset/freiburg3/rgbd_dataset_freiburg3_long_office_household.tgz"; 
    dataFolder = fullfile(tempdir,"tum_rgbd_dataset",filesep); 
    options = weboptions(Timeout=Inf);
    tgzFileName = dataFolder+"fr3_office.tgz";
    folderExists = exist(dataFolder,"dir");
    
    % Create a folder in a temporary directory to save the downloaded file
    if ~folderExists  
        mkdir(dataFolder) 
        disp("Downloading fr3_office.tgz (1.38 GB). This download can take a few minutes.") 
        websave(tgzFileName,baseDownloadURL,options); 
        
        % Extract contents of the downloaded file
        disp("Extracting fr3_office.tgz (1.38 GB) ...") 
        untar(tgzFileName,dataFolder); 
    end

    Create an imageDatastore object to store all the RGB images.

    imageFolder = dataFolder+"rgbd_dataset_freiburg3_long_office_household/rgb/";
    imds = imageDatastore(imageFolder);

    Specify your camera intrinsic parameters, and use them to create a monocular visual SLAM object.

    intrinsics = cameraIntrinsics([535.4 539.2],[320.1 247.6],[480 640]);
    vslam = monovslam(intrinsics);

    Process each image frame, and visualize the camera poses and 3-D map points.

    for i = 1:numel(imds.Files)
        addFrame(vslam,readimage(imds,i))
    
        if hasNewKeyFrame(vslam)
            % Query 3-D map points and camera poses
            xyzPoints = mapPoints(vslam);
            [camPoses,viewIds] = poses(vslam);
    
            % Display 3-D map points and camera trajectory
            plot(vslam);
        end
    
        % Get current status of system
        status = checkStatus(vslam);
    end 

    Figure: 3-D map points and estimated camera trajectory.

    Note that the monovslam object runs several parts of the algorithm on separate threads, which can introduce latency in the processing of an image frame added by using the addFrame function.

    % Plot intermediate results and wait until all images are processed
    while ~isDone(vslam)
        if hasNewKeyFrame(vslam)
            plot(vslam);
        end
    end

    After all the images are processed, you can collect the final 3-D map points and camera poses for further analysis.

    xyzPoints = mapPoints(vslam);
    [camPoses,viewIds] = poses(vslam);
    
    % Reset the system
    reset(vslam)
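
    After copying the map points and camera poses into workspace variables, you can visualize them independently of the vslam object. This is a minimal sketch, assuming camPoses is an array of rigidtform3d objects:

    % Display the 3-D map points and overlay the camera trajectory
    pcshow(xyzPoints,VerticalAxis="y",VerticalAxisDir="down",MarkerSize=40)
    hold on
    trajectory = vertcat(camPoses.Translation);  % N-by-3 camera centers
    plot3(trajectory(:,1),trajectory(:,2),trajectory(:,3),"r-",LineWidth=2)
    hold off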

    References

    [1] Mur-Artal, Raul, J. M. M. Montiel, and Juan D. Tardos. “ORB-SLAM: A Versatile and Accurate Monocular SLAM System.” IEEE Transactions on Robotics 31, no. 5 (October 2015): 1147–63. https://doi.org/10.1109/TRO.2015.2463671.


    Version History

    Introduced in R2023b