THESIS
2023
1 online resource (xiii, 101 pages) : illustrations (some color)
Abstract
Conventional object recognition methods typically require a substantial amount of training
data and preparing such high-quality training data is very labor-intensive. This has
motivated the recent development of data-efficient object recognition. Further research is
needed to develop novel techniques that can enable high-performance object recognition
with limited labeled data. The success of such techniques can have a significant impact on
various applications such as autonomous vehicles, robotics, and healthcare.
In this thesis, we present our proposed data-efficient object recognition methods. We
first introduce a general few-shot object detection model that can be applied to detect
novel objects without re-training and fine-tuning by exploiting matching relationship between
object p...[
Read more ]
Conventional object recognition methods typically require a substantial amount of training
data and preparing such high-quality training data is very labor-intensive. This has
motivated the recent development of data-efficient object recognition. Further research is
needed to develop novel techniques that can enable high-performance object recognition
with limited labeled data. The success of such techniques can have a significant impact on
various applications such as autonomous vehicles, robotics, and healthcare.
In this thesis, we present our proposed data-efficient object recognition methods. We
first introduce a general few-shot object detection model that can be applied to detect
novel objects without re-training and fine-tuning by exploiting matching relationship between
object pairs in a weight-shared network at multiple network stages. Central to our
method are our Attention-RPN, Multi-Relation Detector and Contrastive Training strategy,
which exploit the similarity between the few shot support set and query set to detect
novel objects while suppressing false detection in the background. To train our network,
we contribute a new dataset that contains 1000 categories of various objects with high-quality annotations.
We extend our box-level recognition FSOD method to perform accurate pixel-level prediction
for novel classes, i.e., few-shot semantic segmentation, and co-salient object detection.
Our proposed self-support few-shot semantic segmentation method addresses the
critical intra-class appearance discrepancy problem inherent in few-shot segmentation, by
leveraging the query feature to generate self-support prototypes and perform self-support
matching with query features. Then we investigate a novel group collaborative learning
framework GCoNet, by introducing effective semantic information to benefit the representation
of both the intra-group compactness and inter-group separability for CoSOD.
Then we propose a technique to detect objects in the video with three contributions
to real-world visual learning challenges in our highly diverse and dynamic world: 1) a
large-scale video dataset FSVOD-500 comprising of 500 classes with class-balanced videos
in each category for few-shot learning; 2) a novel Tube Proposal Network (TPN) to generate
high-quality video tube proposals for aggregating feature representation for the target
video object which can be highly dynamic; 3) a strategically improved Temporal Matching
Network (TMN+) for matching representative query tube features with better discriminative
ability thus achieving higher diversity.
Finally, we generalize our method to detect objects in unseen domains. We analyze
and investigate effective solutions to overcome domain style overfitting for robust object
detection without the above shortcomings. Our method, dubbed as Normalization
Perturbation (NP), perturbs the channel statistics of source domain low-level features to
synthesize various latent styles, so that the trained deep model can perceive diverse potential
domains and generalizes well even without observations of target domain data in
training. Normalization Perturbation only relies on a single source domain and is surprisingly
simple and effective, contributing a practical solution by effectively adapting or
generalizing classification DG methods to robust object detection.
Post a Comment