Hi Cui,
To the best of my knowledge, the algorithms you cite are mostly used in vision-based tracking and are intended for a single video stream, in other words a single-sensor system. Moreover, an algorithm like deepSORT is actually a combination of a deep-learning detector and SORT (a simple, online, real-time tracker). If you dig into the SORT part (from the link you shared), it's nothing more than a Hungarian association algorithm and Kalman-based filtering. That combination of association and filtering is the same as trackerGNN in Sensor Fusion and Tracking Toolbox, which means you just need a detector to feed it.
The algorithms we ship in Sensor Fusion and Tracking Toolbox are designed for multi-sensor, multi-modality tracking problems. Our goal is to enable tracking using a variety of sensors, including camera, but also radar, lidar, sonar, etc. Therefore, the trackers in Sensor Fusion and Tracking Toolbox do not depend on any particular sensor type, and their inputs are generic objectDetection objects.
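To make the "association plus filtering" point concrete, here is a minimal Python sketch of one predict / associate / update cycle. It is only an illustration under stated assumptions, not the toolbox implementation and not the SORT reference code: the names AxisKF, associate, and step are made up for this example, the noise values are arbitrary, the cost is the Euclidean distance between point detections, and the optimal assignment is found by brute force over permutations as a stand-in for the Hungarian algorithm (same optimum, only practical for a handful of objects). Track creation and deletion are left out.

```python
import math
from itertools import permutations


class AxisKF:
    """Constant-velocity Kalman filter for one coordinate; state is [position, velocity]."""

    def __init__(self, pos, q=0.01, r=0.1):
        self.p, self.v = float(pos), 0.0
        self.P = [[1.0, 0.0], [0.0, 1.0]]  # state covariance
        self.q, self.r = q, r              # process / measurement noise (illustrative values)

    def predict(self, dt):
        (a, b), (c, d) = self.P
        self.p += dt * self.v
        # P <- F P F' + Q with F = [[1, dt], [0, 1]] and Q = q * I
        self.P = [[a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
                  [c + dt * d, d + self.q]]

    def update(self, z):
        (a, b), (c, d) = self.P
        s = a + self.r          # innovation covariance for H = [1, 0]
        k0, k1 = a / s, c / s   # Kalman gain
        y = z - self.p          # innovation
        self.p += k0 * y
        self.v += k1 * y
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]


def associate(track_positions, detections, gate):
    """Minimum-total-distance one-to-one assignment, then distance gating.

    Brute force over permutations: an assumption made for brevity in place
    of the Hungarian algorithm. Returns a list of (track_index, detection_index).
    """
    n_t, n_d = len(track_positions), len(detections)
    if n_t == 0 or n_d == 0:
        return []
    cost = [[math.dist(t, d) for d in detections] for t in track_positions]
    if n_t <= n_d:
        candidates = (list(zip(range(n_t), perm))
                      for perm in permutations(range(n_d), n_t))
    else:
        candidates = (list(zip(perm, range(n_d)))
                      for perm in permutations(range(n_t), n_d))
    best = min(candidates, key=lambda pairs: sum(cost[i][j] for i, j in pairs))
    # Drop pairs that are too far apart to be the same object
    return [(i, j) for i, j in best if cost[i][j] <= gate]


def step(tracks, detections, dt=1.0, gate=2.0):
    """One predict / associate / update cycle; tracks is a list of (AxisKF, AxisKF) pairs."""
    for kx, ky in tracks:
        kx.predict(dt)
        ky.predict(dt)
    pairs = associate([(kx.p, ky.p) for kx, ky in tracks], detections, gate)
    for i, j in pairs:
        tracks[i][0].update(detections[j][0])
        tracks[i][1].update(detections[j][1])
    return pairs


if __name__ == "__main__":
    tracks = [(AxisKF(0.0), AxisKF(0.0)), (AxisKF(10.0), AxisKF(0.0))]
    # Detections arrive in a different order than the tracks
    print(step(tracks, [(10.2, 0.1), (0.1, -0.1)]))
```

The detector only has to supply the list of detections at each time step; everything after that is independent of the sensor that produced them.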
You can easily create a vision-based tracking system by combining the detectors that Computer Vision Toolbox offers with the trackers from Sensor Fusion and Tracking Toolbox. The Computer Vision Toolbox detectors (there are several types, some of them based on deep learning) output bounding boxes. If you define an objectDetection for each bounding box, you can use any tracking filter and tracker from Sensor Fusion and Tracking Toolbox to complete the tracking. Please see:
Similarly, we use bounding boxes coming from Lidar Toolbox detectors to track objects. Some of these detectors are based on deep learning as well. There are several examples of that in Automated Driving Toolbox, Sensor Fusion and Tracking Toolbox, and Lidar Toolbox.
Thanks,
Elad