THESIS
2022
1 online resource (xv, 64 pages) : illustrations (some color)
Abstract
Online multi-object tracking (MOT) is a fundamental task in computer vision, with a
wide range of applications in video surveillance and autonomous driving. However, the
online setting is challenging: because future frames cannot be used to refine the output
at the current timestep, online trackers are not robust to occlusion and motion blur in
the videos. In this thesis, we propose two online MOT models that are robust to partial
occlusion, each based on a different form of spatio-temporal (S-T) modeling:
1) pixel-level S-T modeling, and 2) object-level S-T modeling.
For pixel-level S-T modeling, we propose Dynamic GNNs for Simultaneous Detection and
Tracking (DynGSDT), which enhances the feature map of the current frame by dynamically
propagating the previous tracklets to the current frame. Using the edge weights learned
in the GNN, the current frame adaptively selects features from the previous frame.
Experimental results show that DynGSDT outperforms its baseline models, FairMOT and
GSDT. In particular, DynGSDT's margin is larger on MOT20 than on MOT17, since MOT20 is
much more crowded than MOT17 and occlusion between objects is therefore dominant.
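The idea of the current frame adaptively selecting previous-frame features through edge weights can be illustrated with a small sketch. It is not the DynGSDT implementation: the function name `propagate_features` is hypothetical, nodes are plain feature vectors, and a softmax over scaled dot-product similarity stands in for the edge weights that the thesis learns inside the GNN.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def propagate_features(curr, prev):
    """Enhance current-frame node features with previous-frame features.

    curr: list of N feature vectors (current frame).
    prev: list of M feature vectors (previous frame / tracklets).
    Returns (enhanced, weights), where weights[i][j] is the edge weight
    from previous node j to current node i.

    Assumption: edge weights come from scaled dot-product similarity,
    a stand-in for the learned edge weights described in the abstract.
    """
    dim = len(curr[0])
    enhanced, all_weights = [], []
    for c in curr:
        # One edge score per previous-frame node.
        scores = [sum(a * b for a, b in zip(c, p)) / math.sqrt(dim) for p in prev]
        weights = softmax(scores)
        # Weighted aggregation of previous-frame features.
        agg = [sum(w * p[k] for w, p in zip(weights, prev)) for k in range(dim)]
        # Residual update: keep the current feature and add the aggregate.
        enhanced.append([a + g for a, g in zip(c, agg)])
        all_weights.append(weights)
    return enhanced, all_weights
```

Each row of `weights` sums to 1, so a current-frame node draws mostly on the previous-frame nodes it resembles, which is the "adaptive selection" behavior in miniature.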
We point out that the existing tracking-by-detection (TBD) framework is inherently
vulnerable to missed detections caused by occlusion. Because the TBD framework selects
for tracking only detections whose confidence score is above the detection threshold,
an object under severe occlusion may be detected with a score slightly below the
threshold and thus be excluded from tracking. Motivated by this problem, we suggest a
detection-recovery-by-tracking framework and propose Sparse Graph Tracker (SGT), which
is based on object-level S-T modeling with a GNN. SGT associates tracklets with the
top-K detections. Missed detections whose scores are below the threshold are then
recovered as positive detections if they are matched with tracklets. SGT achieves
state-of-the-art performance on the MOT20 dataset and comparable performance on the
MOT16/17 datasets. Extensive ablation studies demonstrate the effectiveness of the
detection recovery mechanism proposed in SGT.
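The detection recovery step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not SGT itself: greedy IoU matching stands in for the GNN-based association, and the names `recover_detections`, `k`, `det_thresh`, and `match_thresh`, along with their default values, are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def recover_detections(tracklets, detections, k=100, det_thresh=0.5, match_thresh=0.3):
    """Associate tracklets with the top-K detections and recover low-score matches.

    tracklets:  dict mapping track id -> last known box.
    detections: list of (box, score) pairs, NOT pre-filtered by det_thresh.
    Returns (matched, new), where matched holds (track_id, box, score, recovered)
    and new holds unmatched (box, score) pairs that pass det_thresh.

    Assumption: greedy IoU matching replaces the GNN association of the thesis.
    """
    # Keep the K highest-scoring detections regardless of the threshold.
    top_k = sorted(detections, key=lambda d: d[1], reverse=True)[:k]

    # Score every tracklet-detection pair and match greedily, best pair first.
    pairs = [(iou(tbox, dbox), tid, j)
             for tid, tbox in tracklets.items()
             for j, (dbox, _) in enumerate(top_k)]
    pairs.sort(key=lambda p: p[0], reverse=True)

    matched, used_tracks, used_dets = [], set(), set()
    for overlap, tid, j in pairs:
        if overlap < match_thresh:
            break
        if tid in used_tracks or j in used_dets:
            continue
        box, score = top_k[j]
        # A matched detection below det_thresh is "recovered" as a positive.
        matched.append((tid, box, score, score < det_thresh))
        used_tracks.add(tid)
        used_dets.add(j)

    # Unmatched detections survive only if they pass the usual threshold.
    new = [(box, score) for j, (box, score) in enumerate(top_k)
           if j not in used_dets and score >= det_thresh]
    return matched, new
```

A plain TBD pipeline would drop every detection below `det_thresh` before association. Here a low-score detection is kept when it matches an existing tracklet, while an unmatched low-score detection is still discarded.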