
Monocular Visual-Inertial Odometry (VIO) Using Factor Graph

Monocular visual-inertial odometry estimates the position and orientation of a robot (a UAV in this example) by using data from a camera and an inertial measurement unit (IMU). Camera-based state estimation is accurate during low-speed navigation but faces challenges such as motion blur and track loss at higher speeds. Additionally, monocular camera-based estimation can determine poses only up to an arbitrary scale, meaning the actual scale of the environment is not directly measurable. In contrast, inertial navigation easily handles high-speed navigation and provides pose estimates at world scale. By combining the advantages of camera and IMU data through tightly coupled factor graph optimization, you can achieve better accuracy. For efficient execution time, you can optimize only a small portion of the factor graph, containing only the most recent measurements, at each optimization step. This approach is commonly referred to as sliding window optimization or partial graph optimization. This figure shows a UAV equipped with an IMU and a monocular camera.

Overview

The visual-inertial system implemented in this example consists of a simplified version of the monocular visual odometry front-end of the VINS [1] algorithm and a factor graph back-end.

The visual odometry front-end performs similarly to standard structure from motion (SfM) algorithms, such as ORB-SLAM (oriented FAST and rotated BRIEF simultaneous localization and mapping). The visual odometry front-end detects and tracks key points from images across multiple frames, estimates camera poses, and triangulates 3-D points using multi-view geometry. The factor graph back-end optimizes the estimated camera poses, 3-D points, IMU velocity, and bias.

Before fusing the camera and IMU measurements, the algorithm must align the camera and IMU by computing the scale factor of the camera poses with respect to the IMU measurements and by computing the gravity rotation. These alignment steps are crucial because they transform the camera poses into the local navigation reference frame of the IMU and provide estimates for the initial IMU velocity and bias.

Set Up

This example uses the Blackbird data set (NYC Subway Winter) to demonstrate the visual-inertial odometry workflow. The data set is 628 MB in size and includes IMU readings, images, and camera intrinsic parameters.

data = helperDownloadVIOData();

Fix the random seed for repeatability.

rng(0)

Initialize Algorithm Parameters

Use the helperVIOParameters function to initialize the parameters of the visual-inertial odometry algorithm. See the helper function for more details.

params = helperVIOParameters;
% Create a struct that stores status information
status = struct("isFirstFrame",true,"isMapInitialized",false,"isIMUAligned",false,"isMedianDepthEstimated",false);
% Set initial scene median depth.
initialSceneMedianDepth = 4;
viewID = 0;
removedFrameIDs = [];
allCameraTracks = cell(1,5000);

Set up the factor graph for tightly coupled back-end optimization.

fg = factorGraph;
maxFrames = 10000;
maxLandmarks = 100000;
ids = helperGenerateNodeID(fg,maxFrames,maxLandmarks);

Create the point tracker to track key points across multiple frames.

tracker = vision.PointTracker(MaxBidirectionalError=params.KLT_BiErr, ...
        NumPyramidLevels=params.KLT_Levels,BlockSize=params.KLT_Block);

Use the helper class helperFeaturePointManager to maintain key point tracks.

fpManager = helperFeaturePointManager(data.intrinsics,params,maxFrames,maxLandmarks);
% Set up the key point detector.
fpManager.DetectorFunc = @(I)helperDetectKeyPoints(I);

Create an image view set to maintain frame poses.

viewSet = imageviewset;

Specify the first and last frames to process from the data set. Then, process the first frame.

% IMU data is available starting from frame number 40 in the data set.
startFrameIdx = 40;
% Index of the last frame to process in this example. For reasonable
% example execution time, process up to only frame 1000 of the data set.
endFrameIdx = 1000;
allFrameIDs = startFrameIdx:endFrameIdx;

Structure from Motion (SfM)

The accelerometer and gyroscope readings from the IMU inherently contain some level of bias and noise. To accurately estimate these bias values, you must obtain precise pose estimates from the initial few frames. You can achieve this by using structure from motion (SfM), a technique that estimates the 3-D structure of a scene from a sequence of 2-D images. The SfM process involves several critical steps, sketched in the code snippet after this list:

  • Map initialization — Estimate the relative pose between the first keyframe and the current frame when sufficient parallax is present. This is done by utilizing 2D-2D correspondences, which are keypoint tracks across multiple frames.

  • Triangulation — Create 3-D points by triangulating using the world poses of keyframes and the 2D-2D correspondences.

  • Pose Estimation — Continue to track these 3-D points in subsequent frames and determine the pose of the current frame using 3D-2D correspondences.
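
These steps map directly to Computer Vision Toolbox functions. The following minimal sketch illustrates them on two hypothetical views; it assumes M-by-2 matrices of matched image points matchedPoints1 and matchedPoints2, 2-D observations newFramePoints of the triangulated points in a later frame, and a cameraIntrinsics object intrinsics. In this example, the feature point manager and the code below handle these steps.

% Map initialization: relative pose from 2D-2D correspondences.
[F,inlierIdx] = estimateFundamentalMatrix(matchedPoints1,matchedPoints2,Method="RANSAC");
relPose = estrelpose(F,intrinsics,matchedPoints1(inlierIdx,:),matchedPoints2(inlierIdx,:));
% Triangulation: create 3-D points using the projection matrices of both views.
camMatrix1 = cameraProjection(intrinsics,rigidtform3d);
camMatrix2 = cameraProjection(intrinsics,pose2extr(relPose));
xyzPoints = triangulate(matchedPoints1(inlierIdx,:),matchedPoints2(inlierIdx,:),camMatrix1,camMatrix2);
% Pose estimation: determine a later camera pose from 3D-2D correspondences.
laterPose = estworldpose(newFramePoints,xyzPoints,intrinsics);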

Image Preprocessing

Before extracting features from images for use in the algorithm, you must preprocess the images to ensure accurate feature detection and tracking. Image preprocessing typically involves these steps, shown in the sketch after this list:

  • Equalize — Apply histogram equalization to enhance the contrast of the image. This step is particularly important in conditions of dim lighting, as it can significantly improve the visibility of features, enabling more robust extraction and tracking.

  • Undistort — Correct for lens distortions, including radial and tangential distortions, which can warp the image. Distortion correction is essential for accurate state estimation because it ensures that the extracted features accurately represent the scene geometry.
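
This short sketch shows both preprocessing steps on a single grayscale image, assuming a grayscale image I and a cameraIntrinsics object intrinsics. In this example, helperProcessImage performs these operations.

% Equalize to enhance contrast, then undistort to correct lens distortion.
I = histeq(I);
I = undistortImage(I,intrinsics);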

Initiate the process of reading and processing images for visual odometry. Loop through the frames starting from startFrameIdx and continue until the map is initialized or endFrameIdx is reached. For each frame, read and preprocess the image to enhance features and correct distortions for feature detection and tracking. In the first frame, detect key points and initialize the feature tracker. In subsequent frames, continue feature tracking until the map can be initialized. Detect and track key points in each image, and assign each frame a unique view ID to keep track of the pose of the camera.

% Assign start frame index to current frame index and start reading images
currFrameIdx = startFrameIdx;
while ~status.isMapInitialized && currFrameIdx < endFrameIdx
    % Read image data captured by the camera for the current frame.
    I = data.images{currFrameIdx};
    % Assign a unique view ID for each camera frame.
    viewID = viewID + 1;
    % Equalize and undistort the images
    I = helperProcessImage(I,params,data.intrinsics);
    if status.isFirstFrame
        % In the first frame, detect new key points and initialize the tracker for
        % future tracking.
        currPoints = createNewFeaturePoints(fpManager,I);
        updateSlidingWindow(fpManager,I,currPoints,true(size(currPoints,1),1),viewID);
        initialize(tracker, currPoints, I);
        firstI = I;
        viewSet = addView(viewSet,viewID,rigidtform3d);
        status.isFirstFrame = false;
    else

Feature Tracking

To accurately compute the pose of the camera frame, you must calculate 2D-2D correspondences, which are 2-D image point tracks across multiple frames that correspond to the same physical landmark in the scene. While there are several methods for estimating these feature points, this example uses a Kanade-Lucas-Tomasi (KLT) tracker for its effectiveness in tracking feature points across a sequence of images.

The performance of the tracking process heavily depends on the precision of these point tracks, which are susceptible to errors, including outliers. Outliers may arise from various factors, such as repetitive structures within the scene, and can significantly impact the accuracy of the tracking. Therefore, outlier rejection is a critical task in feature tracking. To reject outliers from tracks, use fundamental matrix decomposition in the feature point manager while updating the sliding window with the latest feature point tracks.

Start tracking feature points from one frame to the next, update the sliding window with these points, and reject outliers using fundamental matrix decomposition. First, track the feature points from the previous frame in the current frame using the KLT tracker. Then update the sliding window, adding the current frame and, if the window is full, removing the oldest frame. During this process, if you detect sufficient parallax, estimate the relative pose between the key frames. Perform outlier rejection iteratively to refine the fundamental matrix and extract the inlier correspondences, which are crucial for accurate pose estimation.

        % Track previous frame points in the current frame.
        [currPoints,validIdx] = tracker(I);
        % Update the sliding window after tracking features in the current
        % frame. If the sliding window already contains maximum number of
        % frames specified using windowSize, one frame with id
        % removeFrameId will be removed from the window to accommodate
        % space for the current frame.
        [removedFrameID,windowState] = updateSlidingWindow(fpManager,I,currPoints,validIdx,viewID);
        if (removedFrameID > fpManager.slidingWindowViewIDs(1))
            % Store non-key frames or removed frame IDs.
            removedFrameIDs(end + 1) = removedFrameID; %#ok<SAGROW>
        end

        if windowState.isFirstFewViews
            % Accept the first few camera views.
            viewSet = addView(viewSet,viewID,rigidtform3d);
        elseif windowState.isEnoughParallax
            % Estimate relative pose between the first key frame in the
            % window and the current frame.
            swIDs = getSlidingWindowIDs(fpManager);
            [matches1,matches2] = get2DCorrespondensesBetweenViews(fpManager,swIDs(end-1),swIDs(end));

            inliersIndex = false(size(matches1,1),1);
            for k = 1:10
                [f,inliers] = estimateFundamentalMatrix( ...
                    matches1,matches2,Method="RANSAC", ...
                    NumTrials=params.F_Iterations,DistanceThreshold=params.F_Threshold, ...
                    Confidence=params.F_Confidence);
                if length(find(inliersIndex)) < length(find(inliers))
                    inliersIndex = inliers;
                    fundamentalMatrix = f;
                end
            end

            inlierPrePoints = matches1(inliersIndex,:);
            inlierCurrPoints = matches2(inliersIndex,:);
            relPose = estrelpose(fundamentalMatrix,data.intrinsics,inlierPrePoints,inlierCurrPoints);
            viewSet = addView(viewSet,viewID,relPose);

            % Map is initialized now
            status.isMapInitialized = true;
            break
        end

Add new feature points to the KLT tracker if the number of tracked points falls below the feature tracking threshold.

        createNewFeaturePoints(fpManager,I);
        currPoints = getKeyPointsInView(fpManager,viewID);
        setPoints(tracker,currPoints);
    end
    currFrameIdx = currFrameIdx + 1;
end

if status.isMapInitialized
    % Show matched features
    hFeature = showMatchedFeatures(firstI,I,matches1,matches2);
    title(hFeature.Parent,"Enough Parallax Between Key Frames");
else
    error('Unable to initialize the map.')
end

Figure: Enough Parallax Between Key Frames. The plot shows the matched features between the first key frame and the current frame.

Camera-IMU Alignment

Camera-IMU alignment is crucial for sensor fusion in visual-inertial odometry and SLAM algorithms, as it ensures that the measurements from the camera and the IMU are coherent and can be effectively combined and optimized. You must perform these tasks to align the camera and IMU (a conceptual sketch follows the list):

  • Gravity Rotation Calculation — Compute the rotation needed to align the local navigation frame of the IMU with the initial camera frame by rotating the gravity vector, as measured by the IMU, to match the z-axis orientation of the camera. You can apply the inverse of this rotation matrix to the coordinate frame of the camera, ensuring that the z-axes of both systems are aligned, which is essential for accurate motion tracking and pose estimation.

  • Camera Pose Scale Calculation — Since the camera provides relative scale measurements, you must compute a scaling factor that aligns the pose scale of the camera with that of the IMU or the world scale. This step is necessary for integrating the relative movement data of the camera with the absolute scale provided by the IMU.

  • Initial IMU Bias Estimation — Accurate sensor fusion requires knowledge of any inherent biases in the IMU measurements. You must estimate the initial bias values for both the accelerometer and the gyroscope to correct the IMU data stream. This correction enables more precise alignment and integration of the IMU data with the camera data.
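
This conceptual sketch shows the alignment computation, assuming that camPoses (a table of key frame camera poses), gyro and accel (cell arrays of IMU readings between consecutive views), xyz (triangulated 3-D points), imuParams, and camToIMUTransform are already defined. The full implementation appears later in the camera-IMU alignment loop.

% Estimate the gravity rotation and the metric scale of the camera poses.
[gRot,scale,info] = estimateGravityRotationAndPoseScale(camPoses,gyro,accel, ...
    IMUParameters=imuParams,SensorTransform=camToIMUTransform);
if info.IsSolutionUsable
    % Rotate the poses into the local navigation frame of the IMU and
    % apply the metric scale to the poses and 3-D points.
    [posesNav,xyzNav] = helperTransformToNavigationFrame(camPoses,xyz,gRot,scale);
end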

% Initialize camera and IMU parameters
% Information matrix (measure of accuracy) associated with the camera
% projection factors that relate 3-D points and estimated camera poses.
cameraInformation = ((data.intrinsics.K(1,1)/1.5)^2)*eye(2);
% Initialize IMU parameters. The fusion accuracy depends on these parameters.
imuParams = factorIMUParameters(SampleRate=100,GyroscopeNoise=0.1, ...
    GyroscopeBiasNoise=3e-3,AccelerometerNoise=0.3, ...
    AccelerometerBiasNoise=1e-3,ReferenceFrame="ENU");
readyToAlignCameraAndIMU = false;

After the map is initialized, read and process images until the IMU is aligned with the camera.

% Camera-IMU alignment loop
while ~status.isIMUAligned

3-D Point Triangulation

When using the latest 2D-2D correspondences for camera-world pose estimation, you must frequently create new 3-D points.

    % Triangulate new 3D points based on the latest 2D-2D correspondences stored in 
    % the feature point manager and view set.
    [newXYZ,newXYZID,newPointViews,newPointObs] = triangulateNew3DPoints(fpManager,viewSet);

Estimated Pose Refinement Using Factor Graph Optimization

The factor graph optimization minimizes errors in the trajectory or camera pose estimation from various sources, such as inaccurate feature tracking or the presence of outliers.

Graph optimization adjusts the camera pose node estimates so that they satisfy the constraints imposed by sensor measurements. These measurements include camera observations, in which 3-D points are projected onto the image plane to generate 2-D image point observations, as well as constraints from IMU data, such as relative poses and changes in velocity. Depending on the types of sensor measurements used, the optimization is one of two types:

  • Bundle Adjustment — This optimization uses only camera measurements to refine the camera poses. The factor graph uses the factorCameraSE3AndPointXYZ (Navigation Toolbox) object to represent the constraints from camera observations.

  • Visual-Inertial Optimization — This optimization extends bundle adjustment by incorporating IMU measurements, such as gyroscope and accelerometer data, into the factor graph. The factor graph uses the factorIMU (Navigation Toolbox) object to represent IMU-based constraints in the optimization process.

Initially, before the camera and IMU are aligned, the factor graph relies solely on camera-based factors for bundle adjustment. After the algorithm establishes the alignment, it uses both camera and IMU measurements for a more robust visual-inertial optimization.

    % Update the sliding window with the latest 3-D points and camera view pose.
    newPointsTriangulated = false;
    if ~isempty(newXYZ)
        newPointsTriangulated = true;
        % Store all new 3D-2D correspondences.
        for pIte = 1:length(newPointViews)
            allCameraTracks{newPointViews(pIte)} = [allCameraTracks{newPointViews(pIte)}; newPointObs{pIte}];
        end
        obs = vertcat(newPointObs{:});
        % Create camera projection factors with the latest
        % 3-D point landmark observations in the current image.
        fCam = factorCameraSE3AndPointXYZ([ids.pose(obs(:,1)) ...
            ids.point3(obs(:,2))],data.intrinsics.K, ...
            Measurement=obs(:,3:4), ...
            Information=cameraInformation);
        addFactor(fg,fCam);
    end

    % Set current camera pose node state guess.
    slidingWindowViewIDs = getSlidingWindowIDs(fpManager);
    currentCameraPose = poses(viewSet,slidingWindowViewIDs(end));
    nodeState(fg, ...
        ids.pose(slidingWindowViewIDs(end)), ...
        helperCameraPoseTableToSE3Vector(currentCameraPose));

Due to the computationally intensive nature of this optimization process, it is not feasible to run it after estimating each camera frame. Instead, to balance accuracy with computational efficiency, you can control the frequency of optimization using a parameter. Use this parameter to set the interval, in number of processed frames, at which to run the optimization based on the needs of the application.

    if helperDecideToRunGraphOptimization(currFrameIdx,newPointsTriangulated,params)

The factor graph optimization can use all the sensor measurements and estimates up to the current frame to refine the state estimates. Using all the frame data in the factor graph at every optimization step produces more accurate solutions, but can be very computationally expensive. To improve execution-time performance, you can consider only the measurements from a few recent frames for optimization. This type of optimization is referred to as sliding window optimization or partial graph optimization.

        % Use partial factor graph optimization with only the latest key
        % frames, for performance. 
        nodeIdsToOptimize = ids.pose(slidingWindowViewIDs);
        xyzIds = getPointIdsInViews(fpManager,slidingWindowViewIDs);

        % Add guess for newly triangulated 3-D point node states.
        xyz = getXYZPoints(fpManager,xyzIds);
        fg.nodeState(ids.point3(xyzIds),xyz);

        % Fix a few pose nodes during graph optimization
        helperFixNodes(fg,ids.pose,slidingWindowViewIDs,windowState.isWindowFull);
        % Optimize the sliding window.
        optiInfo = optimize(fg,nodeIdsToOptimize,params.SolverOpts);
        % Unfix nodes after optimization
        helperUnFixNodes(fg,ids.pose,slidingWindowViewIDs,windowState.isWindowFull);

Update the feature manager and view set with the optimization results.

        if ~status.isMedianDepthEstimated
            status.isMedianDepthEstimated = true;
            xyz = fg.nodeState(ids.point3(xyzIds));
            EstimatedMedianDepth = median(vecnorm(xyz.'));
            [posesUpdated,xyz] = helperTransformToNavigationFrame(helperUpdateCameraPoseTable(poses(viewSet,slidingWindowViewIDs), ...
                fg.nodeState(ids.pose(slidingWindowViewIDs))), ...
                xyz,rigidtform3d,initialSceneMedianDepth/EstimatedMedianDepth);
            % Set current camera pose node state guess.
            fg.nodeState(ids.pose(slidingWindowViewIDs), ...
                helperCameraPoseTableToSE3Vector(posesUpdated));
            % Add guess for newly triangulated 3-D points node states.
            fg.nodeState(ids.point3(xyzIds),xyz);
        else
            posesUpdated = helperUpdateCameraPoseTable(poses(viewSet,slidingWindowViewIDs), ...
                fg.nodeState(ids.pose(slidingWindowViewIDs)));

            xyz = fg.nodeState(ids.point3(xyzIds));
        end
        % Update the view set after visual-inertial optimization.
        viewSet = updateView(viewSet,posesUpdated);
        setXYZPoints(fpManager,xyz,xyzIds);
    end

Add new feature points to the KLT tracker if the number of tracked points falls below the feature tracking threshold.

    createNewFeaturePoints(fpManager,I);
    currPoints = getKeyPointsInView(fpManager,viewID);
    setPoints(tracker,currPoints);

    if windowState.isWindowFull && ~readyToAlignCameraAndIMU
        % The sliding window is full and the camera and IMU are not yet aligned.
        readyToAlignCameraAndIMU = true;
        % Plot the map created by Initial SfM. The figure shows the
        % estimated camera pose and observed landmarks
        helperPlotCameraPosesAndLandmarks(fpManager,viewSet,removedFrameIDs,true);
    end
    % Keep processing new images.
    currFrameIdx = currFrameIdx + 1;
    if currFrameIdx > endFrameIdx
        break
    else
        I = data.images{currFrameIdx};
        I = helperProcessImage(I,params,data.intrinsics);
        viewID = viewID + 1;
        % Track previous frame points in the current frame.
        [currPoints,validIdx] = tracker(I);
        [removedFrameID,windowState] = updateSlidingWindow(fpManager,I,currPoints,validIdx,viewID);
        if (removedFrameID > fpManager.slidingWindowViewIDs(1))
            % Store non-key frames or removed frame IDs.
            removedFrameIDs(end + 1) = removedFrameID; %#ok<SAGROW>
        end
        swIDs = getSlidingWindowIDs(fpManager);
        if length(swIDs) > 2
            [matches1,matches2] = get2DCorrespondensesBetweenViews(fpManager,swIDs(end-2),viewID);
        end

        if readyToAlignCameraAndIMU
            swIDs = getSlidingWindowIDs(fpManager);
            % The pose of the latest frame has not been computed yet,
            % so use only the past few frames for alignment.
            swIDs = swIDs(1:end-1);
            [gyro,accel] = helperExtractIMUDataBetweenViews( ...
                data.gyroReadings,data.accelReadings,data.timeStamps,allFrameIDs(swIDs));
            [xyz] = getXYZPoints(fpManager,xyzIds);
            % Align camera with IMU.
            camPoses = poses(viewSet,swIDs);
            [gRot,scale,info] = ...
                estimateGravityRotationAndPoseScale(camPoses,gyro,accel, ...
                SensorTransform=data.camToIMUTransform,IMUParameters=imuParams);
            disp("Estimated scale: " + scale);

If the alignment is successful, update the camera poses and 3-D points, and add IMU factors between the initial frames in the current sliding window.

            if info.IsSolutionUsable && scale > 1e-3
                status.isIMUAligned = true;
                posesUpdated = poses(viewSet);
                % Transform camera poses to navigation frame using
                % computed gravity rotation and pose scale.
                [posesUpdated,xyz] = helperTransformToNavigationFrame(posesUpdated,xyz,gRot,scale);
                viewSet = updateView(viewSet,posesUpdated);
                % Plot the scaled and unscaled estimated trajectory against
                % ground truth.
                helperPlotCameraIMUAlignment(data,camPoses,scale,allFrameIDs);
                % After alignment, add IMU factors to factor graph.
                for k = 1:length(gyro)
                    nId = [ids.pose(swIDs(k)),ids.vel(swIDs(k)),ids.bias(swIDs(k)), ...
                        ids.pose(swIDs(k+1)),ids.vel(swIDs(k+1)),ids.bias(swIDs(k+1))];
                    fIMU = factorIMU(nId,gyro{k},accel{k},imuParams,SensorTransform=data.camToIMUTransform);
                    fg.addFactor(fIMU);
                end

                % Set camera pose node guesses and 3-D point guesses
                % after alignment.
                fg.nodeState(ids.pose(swIDs), ...
                    helperCameraPoseTableToSE3Vector(poses(viewSet,swIDs)));
                fg.nodeState(ids.point3(xyzIds),xyz);

Estimate an initial guess for IMU bias by using factor graph optimization with the camera projection and IMU factors.

                % Fix the camera pose and 3-D point nodes so that the
                % optimization estimates only the IMU velocity and bias.
                fixNode(fg,ids.pose(swIDs));
                fixNode(fg,ids.point3(xyzIds));
                % Add velocity prior to first IMU velocity node.
                fVelPrior = factorVelocity3Prior(ids.vel(swIDs(1)));
                addFactor(fg,fVelPrior);

                % Add bias prior to first bias node.
                fBiasPrior = factorIMUBiasPrior(ids.bias(swIDs(1)));
                addFactor(fg,fBiasPrior);

                % Perform visual-inertial optimization after alignment to estimate
                % initial IMU bias values.
                soll1 = optimize(fg,params.SolverOpts);
                fixNode(fg,ids.pose(swIDs),false);
                fixNode(fg,ids.point3(xyzIds),false);

                fixNode(fg,ids.pose(swIDs(1)));
                soll = optimize(fg,params.SolverOpts);
                fixNode(fg,ids.pose(swIDs(1)),false);

                % Update feature manager and view set after optimization.
                viewSet = updateView(viewSet,helperUpdateCameraPoseTable(poses(viewSet,swIDs), ...
                    fg.nodeState(ids.pose(swIDs))));
                xyz = fg.nodeState(ids.point3(xyzIds));
                setXYZPoints(fpManager,xyz,xyzIds);
            end
        else
            [currPoints,pointIds,isTriangulated] = getKeyPointsInView(fpManager,viewID);
            hasTriangulatedPoints = true(size(currPoints,1),1);
            triangulatedPoints = find(isTriangulated);

No IMU prediction is available, so use 3D-2D correspondences to estimate the current view pose.

            x3D = getXYZPoints(fpManager,pointIds(isTriangulated));
            c2D = currPoints(isTriangulated,:);
            ii = false(size(x3D,1),1);
            currPose = rigidtform3d;
            [currPose,ii] = helperEstimateCameraPose(params,c2D,x3D,data.intrinsics,currPose,ii);
            hasTriangulatedPoints(triangulatedPoints(~ii)) = false;

            setKeyPointValidityInView(fpManager,viewID,hasTriangulatedPoints);
            viewSet = addView(viewSet,viewID,currPose);

Add camera projection factors related to the 3-D point tracks of the current view.

            obs2 = pointIds(isTriangulated);
            obs2 = obs2(ii);
            fCam = factorCameraSE3AndPointXYZ( ...
                [ids.pose(viewID*ones(size(obs2))) ids.point3(obs2)], ...
                data.intrinsics.K,Measurement=c2D(ii,:), ...
                Information=cameraInformation);
            allCameraTracks{viewID} = [viewID*ones(size(obs2)) obs2 fCam.Measurement];
            fg.addFactor(fCam);
        end
    end
end

Figure: Initial Structure from Motion. The plot shows the estimated camera poses and observed landmarks.

Estimated scale: 1.8529

Figure: Camera-IMU Alignment. The plot shows the ground truth, the estimated trajectory before alignment, and the estimated trajectory after alignment.

Visual-Inertial Optimization

Upon successful camera-IMU alignment, the factor graph can harness both camera and IMU factors to optimize the state estimates of the system. The visual-inertial factor graph comprises several types of nodes, each representing different aspects of the system's state at various timestamps:

  • Camera Pose Nodes — These nodes represent the estimated position and orientation of the camera at different points in time. The SfM algorithm estimates these pose nodes, which connect to 3-D point landmark nodes through camera projection factors and to other pose nodes through IMU factors, integrating the visual and inertial measurements.

  • 3-D Point Landmark Nodes — These nodes correspond to the mapped features in the environment. SfM estimates these nodes and connects them to camera pose nodes using camera projection factors, which help in refining the landmark positions based on observed image points.

  • IMU Velocity Nodes — These nodes capture the estimated velocity of the camera at different timestamps. IMU preintegration computes these nodes and connects them to other nodes using IMU factors that reflect changes in motion between timestamps.

  • IMU Bias Nodes — Representing the estimated biases in IMU measurements, these nodes are initially unknown and are refined through the optimization process. IMU factors connect these nodes to other nodes.

Each time step contains a pose node, a velocity node, and a bias node. The nodes of one time step connect to the corresponding nodes of the next time step through a factorIMU object. When the UAV observes feature points using the camera, a factorCameraSE3AndPointXYZ object relates the observed feature points to the pose node of that time step. This process repeats for each new time step of the UAV flight, with each new time step introducing additional nodes and factors to the graph.

Factor_Graph_With_IMU_And_Camera_Factors.png
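
This minimal sketch builds one IMU-plus-camera time step of such a graph. The gyroscope readings, accelerometer readings, intrinsic matrix, and image measurement are placeholder values for illustration only, not values from the data set.

g = factorGraph;
poseIDs = generateNodeID(g,[2 1]);
velIDs  = generateNodeID(g,[2 1]);
biasIDs = generateNodeID(g,[2 1]);
lmkID   = generateNodeID(g,1);
% Placeholder IMU readings between the two time steps (10 samples at 100 Hz).
gyroReadings  = zeros(10,3);
accelReadings = repmat([0 0 9.81],10,1);
imuP = factorIMUParameters(SampleRate=100,ReferenceFrame="ENU");
% The IMU factor connects the pose, velocity, and bias nodes of consecutive time steps.
fIMU = factorIMU([poseIDs(1) velIDs(1) biasIDs(1) poseIDs(2) velIDs(2) biasIDs(2)], ...
    gyroReadings,accelReadings,imuP);
% The camera projection factor relates the second pose node to a 3-D point landmark node.
K = [500 0 320; 0 500 240; 0 0 1];
fCam = factorCameraSE3AndPointXYZ([poseIDs(2) lmkID],K,Measurement=[320 240]);
addFactor(g,fIMU);
addFactor(g,fCam);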

Enable trajectory visualization. Set visualizeTrajectory to false to skip trajectory visualization.

visualizeTrajectory = true;
if visualizeTrajectory
    axTraj = [];
    axFactorGraph = axes(figure);
end

Start the visual-inertial optimization loop. Extract the gyroscope and accelerometer readings between the current image frame and the last image frame. Then create the transformation that transforms a camera pose to the IMU base frame for the IMU residual computation, and create and add the IMU factor to the graph.

while currFrameIdx < endFrameIdx + 1
    % Extract gyro and accel reading between current image frame
    % and last acquired image frame to create IMU factor.
    swIDs = getSlidingWindowIDs(fpManager);
    svs = swIDs((end-1):end);
    [gyro,accel] = helperExtractIMUDataBetweenViews(data.gyroReadings, ...
        data.accelReadings,data.timeStamps,allFrameIDs(svs));
    nodeID = [ids.pose(svs(1)) ids.vel(svs(1))  ids.bias(svs(1)) ...
              ids.pose(svs(2)) ids.vel(svs(2))  ids.bias(svs(2))];
    % Create the transformation required to transform a camera pose
    % to IMU base frame for the IMU residual computation.
    fIMU = factorIMU(nodeID,gyro{1},accel{1},imuParams, ...
        SensorTransform=data.camToIMUTransform);

    % Add camera pose and IMU factor to graph.
    fg.addFactor(fIMU);

IMU Pose Prediction

When IMU data is available, you can predict the world pose of the camera by integrating accelerometer and gyroscope readings. Use factor graph optimization to further refine this prediction.

    % Set velocity and bias guess.
    prevP = nodeState(fg,ids.pose(svs(1)));
    prevVel = nodeState(fg,ids.vel(svs(1)));
    prevBias = nodeState(fg,ids.bias(svs(1)));
    [pp,pv] = fIMU.predict(prevP,prevVel,prevBias);

    [currPoints,pointIds,isTriangulated] = getKeyPointsInView(fpManager,viewID);
    hasTriangulatedPoints = true(size(currPoints,1),1);
    triangulatedPoints = find(isTriangulated);

Use the IMU predicted pose as an initial guess for motion-only bundle adjustment.

    x3D = getXYZPoints(fpManager,pointIds(isTriangulated));
    c2D = currPoints(isTriangulated,:);
    [currPose,velRefined,biasRefined,ii] = helperBundleAdjustmentMotion( ...
        x3D,c2D,data.intrinsics,size(I),pp,pv,prevP,prevVel,prevBias,fIMU);
    fg.nodeState(ids.vel(viewID),velRefined);
    fg.nodeState(ids.bias(viewID),biasRefined);
    hasTriangulatedPoints(triangulatedPoints(~ii)) = false;

    setKeyPointValidityInView(fpManager,viewID,hasTriangulatedPoints);
    viewSet = addView(viewSet,viewID,currPose);

Add camera projection factors related to the 3-D point tracks of the current view.

    obs2 = pointIds(isTriangulated);
    obs2 = obs2(ii);
    fCam = factorCameraSE3AndPointXYZ( ...
        [ids.pose(viewID*ones(size(obs2))) ids.point3(obs2)], ...
        data.intrinsics.K,Measurement=c2D(ii,:),Information=cameraInformation);
    allCameraTracks{viewID} = [viewID*ones(size(obs2)) obs2 fCam.Measurement];
    fg.addFactor(fCam);
    [newXYZ,newXYZID,newPointViews,newPointObs] = triangulateNew3DPoints(fpManager,viewSet);
    newPointsTriangulated = false;
    if ~isempty(newXYZ)
        newPointsTriangulated = true;
        % Store all new 3D-2D correspondences.
        for pIte = 1:length(newPointViews)
            allCameraTracks{newPointViews(pIte)} = [allCameraTracks{newPointViews(pIte)}; newPointObs{pIte}];
        end
        obs = vertcat(newPointObs{:});
        % Create camera projection factors with the latest
        % 3-D point landmark observations in the current image.
        fCam = factorCameraSE3AndPointXYZ([ids.pose(obs(:,1)) ...
            ids.point3(obs(:,2))],data.intrinsics.K, ...
            Measurement=obs(:,3:4),Information=cameraInformation);
        addFactor(fg,fCam);
    end

    % Set current camera pose node state guess.
    slidingWindowViewIDs = getSlidingWindowIDs(fpManager);
    currentCameraPose = poses(viewSet,slidingWindowViewIDs(end));
    nodeState(fg, ...
        ids.pose(slidingWindowViewIDs(end)), ...
        helperCameraPoseTableToSE3Vector(currentCameraPose));
    
    if helperDecideToRunGraphOptimization(currFrameIdx,newPointsTriangulated,params)
        nodeIdsToOptimize = ids.pose(slidingWindowViewIDs);
        xyzIds = getPointIdsInViews(fpManager,slidingWindowViewIDs);
        xyz = getXYZPoints(fpManager,xyzIds);
        fg.nodeState(ids.point3(xyzIds),xyz);

        % Fix a few pose nodes during graph optimization
        helperFixNodes(fg,ids.pose,slidingWindowViewIDs,windowState.isWindowFull);

        % Fix the first velocity and bias nodes in the sliding
        % window.
        fixNode(fg,ids.vel(slidingWindowViewIDs(1)));
        fixNode(fg,ids.bias(slidingWindowViewIDs(1)));

        % Optimize the sliding window.
        optiInfo = optimize(fg,nodeIdsToOptimize,params.SolverOpts);

        % Unfix nodes after optimization
        helperUnFixNodes(fg,ids.pose,slidingWindowViewIDs,windowState.isWindowFull);

        posesUpdated = helperUpdateCameraPoseTable(poses(viewSet,slidingWindowViewIDs), ...
            fg.nodeState(ids.pose(slidingWindowViewIDs)));

        xyz = fg.nodeState(ids.point3(xyzIds));

        % Update the view set after visual-inertial optimization.
        viewSet = updateView(viewSet,posesUpdated);
        setXYZPoints(fpManager,xyz,xyzIds);

        % Visualize the estimated trajectory.
        if visualizeTrajectory && mod(currFrameIdx,params.optimizationFrequency*4)==0
            axTraj = helperVisualizeTrajectory(axTraj,fpManager,viewSet,removedFrameIDs);
            show(fg,"Landmark","on","Parent",axFactorGraph,"Legend","on");
            xlim(axFactorGraph,[-4 7]);
            ylim(axFactorGraph,[-8 3]);
            zlim(axFactorGraph,[-3 1]);
            view(axFactorGraph,36,40);
            title(axFactorGraph,"Estimated Trajectory and Landmarks");
            drawnow;
        end
    end

Add new feature points to the KLT tracker if the number of tracked points falls below the feature tracking threshold.

    createNewFeaturePoints(fpManager,I);
    currPoints = getKeyPointsInView(fpManager,viewID);
    setPoints(tracker,currPoints);

    % Process new images.
    currFrameIdx = currFrameIdx + 1;
    if currFrameIdx > endFrameIdx
        break
    end
    I = data.images{currFrameIdx};
    I = helperProcessImage(I,params,data.intrinsics);
    viewID = viewID + 1;
    % Track previous frame points in the current frame.
    [currPoints,validIdx] = tracker(I);
    [removedFrameID,windowState] = updateSlidingWindow(fpManager,I,currPoints,validIdx,viewID);
    if (removedFrameID > fpManager.slidingWindowViewIDs(1))
        % Store non-key frames or removed frame IDs.
        removedFrameIDs(end+1) = removedFrameID;
    end
    swIDs = getSlidingWindowIDs(fpManager);
end

Figure: Estimated Trajectory. The plot shows the estimated trajectory, the sliding window frames, and the most recent frame.

Figure: Estimated Trajectory and Landmarks. The plot shows the pose nodes, pose edges, and landmark nodes of the factor graph.

This is a sample image of the scene taken by the UAV camera in the start frame.

sample.png

Plot all key frame camera poses and 3-D points. Observe the landmarks on features such as the ceiling, floor, and pillars.

axFG = show(fg);
xlim(axFG,[-10 10]);
ylim(axFG,[-10 10]);
zlim(axFG,[-5 4]);
view(axFG,10.46,-4.28);
title(axFG,"Estimated Trajectory and Landmarks");

Figure: Estimated Trajectory and Landmarks.

Compare Estimated Trajectory with Ground Truth

As a measure of accuracy, compute these metrics, shown in the sketch after this list:

  • Absolute trajectory error (ATE) — Root Mean Squared Error (RMSE) between computed camera locations and ground truth camera locations.

  • Scale error — Percentage difference between the computed median scale and the actual scale.
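
This sketch shows one way to compute both metrics, assuming estPos and gtPos are N-by-3 matrices of corresponding estimated and ground-truth key frame positions. In this example, helperComputeErrorAgainstGroundTruth performs the actual computation.

% Absolute trajectory error (RMSE, in meters).
ate = sqrt(mean(vecnorm(estPos - gtPos,2,2).^2));
% Median scale error (%): compare estimated and ground-truth inter-frame distances.
segEst = vecnorm(diff(estPos),2,2);
segGT  = vecnorm(diff(gtPos),2,2);
scaleError = abs(1 - median(segEst./segGT))*100;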

Plot the estimated trajectory against the ground truth.

axf = axes(figure);
helperPlotAgainstGroundTruth(viewSet,data.gTruth,data.camToIMUTransform, ...
            allFrameIDs(viewSet.Views.ViewId),axf,removedFrameIDs);

Figure: Ground truth trajectory and estimated trajectory.

Evaluate the tracking accuracy, based on root mean square error (RMSE) and median scale error.

helperComputeErrorAgainstGroundTruth(data.gTruth,viewSet,allFrameIDs,removedFrameIDs,data.camToIMUTransform);
    "Absolute RMSE for key frame trajectory (m): "    "0.19021"

    "Percentage of median scale error: "    "1.9709"

Supporting Functions

This section details the short helper functions included in this example. Larger helper functions have been included in separate files.

helperDownloadVIOData downloads data set from specified URL to specified output folder.

helperFeaturePointManager manages key point tracks.

helperVIOParameters initializes visual-inertial odometry algorithm tunable parameters.

helperBundleAdjustmentMotion refines pose of current frame using motion-only bundle adjustment.

helperCameraPoseTableToSE3Vector converts pose table to N-by-7 SE(3) pose matrix.

helperDetectKeyPoints detects key points via tunable detectors.

helperExtractIMUDataBetweenViews extracts IMU data between specified views.

helperProcessImage equalizes and undistorts images if needed.

helperEstimateCameraPose estimates the current camera pose using 3D-2D correspondences.

helperTransformToNavigationFrame transforms and scales input poses and XYZ 3-D points to local navigation reference frame of IMU using gravity rotation and pose scale.

helperUpdateCameraPoseTable updates pose table with latest estimated N-by-7 SE(3) poses.

helperPlotCameraIMUAlignment plots the camera-IMU alignment result.

helperVisualizeTrajectory plots trajectory with latest data stored in view set and feature manager.

helperPlotCameraPosesAndLandmarks plots estimated trajectory and 3-D landmarks.

helperPlotAgainstGroundTruth plots estimated trajectory and ground truth trajectory for visual comparison.

helperComputeErrorAgainstGroundTruth computes absolute trajectory error and scale error compared to known ground truth.

helperGenerateNodeID generates unique factor graph node IDs for fixed number of camera view poses, IMU velocities, IMU biases, and 3-D point nodes.

function ids = helperGenerateNodeID(fg,maxFrames,maxLandmarks)
% helperGenerateNodeID

ids.pose = generateNodeID(fg,[maxFrames 1]);
ids.vel = generateNodeID(fg,[maxFrames 1]);
ids.bias = generateNodeID(fg,[maxFrames 1]);
ids.point3 = generateNodeID(fg,[maxLandmarks 1]);
end

helperDecideToRunGraphOptimization decides whether to run or skip graph optimization at current frame.

function shouldOptimize = helperDecideToRunGraphOptimization(currFrameIdx,newPointsTriangulated,params)
% helperDecideToRunGraphOptimization

% If the current frame belongs to the initial set of frames, then run graph
% optimization every frame, because the initial SfM is still running.
% Otherwise, after a number of frames specified by optimization frequency,
% run graph optimization. Lower frequency can result in a more accurate
% estimation, but can increase execution time.
numberOfInitialFrames = 250;
shouldOptimize = (currFrameIdx < numberOfInitialFrames) || (mod(currFrameIdx,params.optimizationFrequency) == 0) || newPointsTriangulated;
end

helperFixNodes fixes pose nodes during graph optimization.

function helperFixNodes(factorGraph,poseIds,slidingWindowIds,isWindowFull)
% helperFixNodes fixes pose nodes during graph optimization
if isWindowFull
    fixNode(factorGraph,poseIds(slidingWindowIds(1:11)));
else
    fixNode(factorGraph,poseIds(slidingWindowIds(1)));
end
end

helperUnFixNodes unfixes nodes after graph optimization.

function helperUnFixNodes(factorGraph,poseIds,slidingWindowIds,isWindowFull)
% helperUnFixNodes unfixes nodes after graph optimization
if isWindowFull
    fixNode(factorGraph,poseIds(slidingWindowIds(1:11)),false);
else
    fixNode(factorGraph,poseIds(slidingWindowIds(1)),false);
end
end

References

[1] Qin, Tong, Peiliang Li, and Shaojie Shen. “VINS-Mono: A Robust and Versatile Monocular Visual-Inertial State Estimator.” IEEE Transactions on Robotics 34, no. 4 (August 2018): 1004–20. https://doi.org/10.1109/TRO.2018.2853729

[2] Antonini, Amado, Winter Guerra, Varun Murali, Thomas Sayre-McCord, and Sertac Karaman. “The Blackbird Dataset: A Large-Scale Dataset for UAV Perception in Aggressive Flight.” In Proceedings of the 2018 International Symposium on Experimental Robotics, edited by Jing Xiao, Torsten Kröger, and Oussama Khatib, 11:130–39. Cham: Springer International Publishing, 2020. https://doi.org/10.1007/978-3-030-33950-0_12