THESIS
2022
1 online resource (xxii, 214 pages) : color illustrations
Abstract
Although deep learning has many practical applications, deep neural networks are
known to be vulnerable to adversarial examples: small perturbations of inputs
that can fool the networks into making wrong predictions.
In this thesis, we propose new methods and theories to evaluate, understand, and
improve the adversarial robustness of deep neural networks.
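For concreteness, one common formalization of an adversarial example (the abstract
does not state a specific threat model, so the l-infinity bound below is an
assumption) is a perturbed input x + delta satisfying
\[
\|\delta\|_\infty \le \varepsilon
\quad\text{and}\quad
\arg\max_k f_k(x+\delta) \ne y,
\]
where $f$ is the classifier, $y$ is the correct label, and $\varepsilon$ is a
small perturbation budget.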
First, we investigate black-box adversarial attacks, where the attacker has
no information about the target model except for its output. We propose two new
methods, ZOHA and TREMBA, to accelerate black-box attacks. In ZOHA,
second-order information is incorporated into zeroth-order optimization.
In TREMBA, we exploit the transferability of adversarial examples to construct
a new search space, greatly reducing the number of queries needed for black-box
attacks. These algorithms demonstrate that black-box attacks can pose a
practical threat to deployed models.
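As a rough illustration of the black-box setting (this is not the ZOHA or TREMBA
algorithm from the thesis; the function and parameter names below are hypothetical),
an attacker that can only query a scalar loss can estimate an ascent direction by
finite differences, and the same queries can yield a crude curvature estimate used
to precondition the step:

import numpy as np

def zeroth_order_step(loss_fn, x, sigma=0.01, n_samples=20, lr=0.1):
    # loss_fn(x) is the only access to the target model (a black-box query).
    d = x.size
    grad_est = np.zeros(d)
    hess_diag_est = np.zeros(d)
    f0 = loss_fn(x)
    for _ in range(n_samples):
        u = np.random.randn(d)                       # random probe direction
        f_plus = loss_fn(x + sigma * u)
        f_minus = loss_fn(x - sigma * u)
        # two-point finite-difference gradient estimate along u
        grad_est += (f_plus - f_minus) / (2 * sigma) * u
        # crude diagonal curvature estimate reusing the same queries
        hess_diag_est += (f_plus - 2 * f0 + f_minus) / (sigma ** 2) * (u ** 2)
    grad_est /= n_samples
    hess_diag_est = np.abs(hess_diag_est) / n_samples + 1e-6
    # precondition the ascent step with the estimated curvature
    return x + lr * grad_est / hess_diag_est

TREMBA additionally restricts the search to a low-dimensional space learned from
transferable adversarial perturbations, which is not captured by this sketch.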
Second, we study the existence of adversarial examples and its relationship
to benign overfitting. We provide a theoretical explanation for why adversarial
examples exist under standard training of neural networks: adversarial examples
are by-products of overfitting the noise in overparameterized models. Moreover,
our theory explains the trade-off between robustness and clean performance.
Lastly, we address the poor generalization of adversarial training with a novel
test-time fine-tuning strategy. Standard adversarial training does not necessarily
achieve near-optimal generalization performance on test samples. The Bayes-optimal
robust estimator requires test-time adaptation, and such adaptation can lead to
significantly better performance. Motivated by this observation, we propose a
practical, easy-to-implement method that fine-tunes adversarially trained
networks with an additional self-supervised test-time adaptation step.
We also introduce a meta adversarial training method to find a good starting
point for test-time fine-tuning. Empirical experiments demonstrate the
effectiveness of the proposed strategy.
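A minimal sketch of what such a test-time adaptation step could look like (the
abstract does not specify the self-supervised objective, so prediction-entropy
minimization is used here purely as a placeholder, and all names are hypothetical):

import copy
import torch

def entropy_objective(model, x):
    # Placeholder label-free objective: minimize prediction entropy on the test batch.
    p = torch.softmax(model(x), dim=1)
    return -(p * p.clamp_min(1e-8).log()).sum(dim=1).mean()

def test_time_adapt(model, x_batch, steps=5, lr=1e-4):
    # Fine-tune a copy of the (adversarially trained) model on the unlabeled test
    # batch using only the self-supervised objective, then predict with the copy.
    adapted = copy.deepcopy(model)
    adapted.train()
    opt = torch.optim.SGD(adapted.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = entropy_objective(adapted, x_batch)
        loss.backward()
        opt.step()
    adapted.eval()
    with torch.no_grad():
        return adapted(x_batch).argmax(dim=1)

In the thesis's setup, meta adversarial training provides an initialization from
which such a few-step adaptation is effective; that component is not reflected in
the sketch above.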