THESIS
2021
1 online resource (xiv, 89 pages) : illustrations (some color)
Abstract
In this thesis, we focus on the theory of deep neural networks, including generalization
and adversarial robustness, two essential problems in deep learning.
The generalization of deep neural networks is still a mystery, although deep learning has
been successfully applied to many areas. We pursue an appropriate explanation for the
success and failure of margin-based Rademacher complexity bounds on the generalization
ability of deep neural networks. In the traditional machine learning community,
margin-based Rademacher complexity bounds have been used to explain the generalization
of bagging and boosting, and it has been shown that the generalization ability of these
complex classifiers might be due to margin enlargement during training.
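For context, a classical bound of this flavor (a sketch in the spirit of Koltchinskii and
Panchenko; constants vary across references) reads: for a function class $\mathcal{F}$,
margin level $\gamma > 0$, and $n$ i.i.d. samples, with probability at least $1 - \delta$,

\[
\mathbb{P}\big[ y f(x) \le 0 \big] \;\le\; \hat{\mathbb{P}}_n\big[ y f(x) \le \gamma \big]
+ \frac{2}{\gamma}\, \mathfrak{R}_n(\mathcal{F}) + \sqrt{\frac{\ln(1/\delta)}{2n}},
\]

where $\mathfrak{R}_n(\mathcal{F})$ denotes the Rademacher complexity of $\mathcal{F}$.
Enlarging training margins shrinks the empirical term $\hat{\mathbb{P}}_n[y f(x) \le \gamma]$,
which is why margin growth is taken as a proxy for better generalization.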
However, Breiman gave examples in which uniform improvements on training margins do
not guarantee a decrease of generalization error, a phenomenon known as Breiman's dilemma.
We show that this phenomenon also exists in deep neural networks and explore its possible
explanations. To reach this goal, we introduce margin dynamics into deep neural networks
to analyze their generalization abilities. A novel perspective is provided to explain the
relationship between margin dynamics and generalization error, based on phase transitions
in the dynamics of normalized margin distributions. Large training margins may exhibit
dynamics different from small ones: the latter typically undergo a monotone decay during
training to reduce the loss, whereas the former may first drop and then grow. We find that
such a phase transition is related to the trade-off between model expressive power and
data complexity. It happens when the expressive power of deep neural networks is
comparable to the data complexity; in this case, to improve small training margins one
has to sacrifice the large ones. On the other hand, we show that Breiman's dilemma appears
in deep neural networks when models are over-expressive relative to the data, such that
one can uniformly improve both large and small training margins; this loses the phase
transition above and causes the prediction of generalization error from training margin
distributions to fail.
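As an illustration of the kind of quantity being tracked, here is a minimal PyTorch
sketch for computing a normalized margin distribution once per epoch. The normalization
by a product of spectral norms is one common capacity proxy from the literature and an
assumption here, not necessarily the thesis's exact normalization factor.

    import torch

    def multiclass_margins(logits, labels):
        # margin = correct-class logit minus the largest other logit;
        # a negative margin corresponds to a misclassified example
        correct = logits.gather(1, labels.unsqueeze(1)).squeeze(1)
        others = logits.clone()
        others.scatter_(1, labels.unsqueeze(1), float("-inf"))
        return correct - others.max(dim=1).values

    def spectral_norm_product(model):
        # capacity proxy used to normalize margins: product of spectral
        # norms of all weight matrices (conv kernels flattened to matrices)
        prod = 1.0
        for p in model.parameters():
            if p.dim() >= 2:
                prod *= torch.linalg.matrix_norm(p.flatten(1), ord=2).item()
        return prod

    @torch.no_grad()
    def normalized_margin_distribution(model, loader, device="cpu"):
        # call once per epoch to trace the margin dynamics over training
        model.eval()
        all_margins = []
        for x, y in loader:
            logits = model(x.to(device))
            all_margins.append(multiclass_margins(logits, y.to(device)).cpu())
        return torch.cat(all_margins) / spectral_norm_product(model)

Plotting these distributions epoch by epoch is what reveals whether small and large
margins move together or exhibit the phase transition described above.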
The adversarial robustness of deep neural networks is another essential problem in deep
learning. Most existing adversarial defense methods require adversarial training to
improve the robustness of neural networks, and hence have to trade natural accuracy off
against adversarial robustness. Recently, some works have shown that Neural Ordinary
Differential Equations (ODEs) may exhibit certain adversarial robustness without
sacrificing natural accuracy, and it remains open whether such designs lead to genuine or
fake robustness. Inspired by dynamical systems theory, we design a stabilized neural ODE
network named SONet, whose ODE blocks are skew-symmetric and proved to be stable in the
sense of Lyapunov. With only natural training, SONet achieves robustness against
gradient-based attacks comparable to that of state-of-the-art adversarial training
methods, without sacrificing natural accuracy.
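To illustrate the stability mechanism, here is a minimal sketch of a skew-symmetric ODE
block; the parameterization and the use of the torchdiffeq library are assumptions for
illustration, not the exact SONet architecture.

    import torch
    from torchdiffeq import odeint  # assumed third-party dependency

    class SkewBlock(torch.nn.Module):
        # dh/dt = A h with A = W - W^T, so A^T = -A (skew-symmetric).
        # The Lyapunov function V(h) = ||h||^2 then satisfies
        # dV/dt = 2 h^T A h = 0: trajectories preserve the norm, so the
        # block is stable in the sense of Lyapunov.
        def __init__(self, dim):
            super().__init__()
            self.W = torch.nn.Parameter(torch.randn(dim, dim) / dim ** 0.5)

        def forward(self, t, h):
            A = self.W - self.W.T
            return h @ A.T

    block = SkewBlock(16)
    h0 = torch.randn(4, 16)
    t = torch.tensor([0.0, 1.0])
    # dopri5 is an adaptive-step Runge-Kutta solver, which matters for
    # the gradient-masking discussion that follows
    h1 = odeint(block, h0, t, method="dopri5")[-1]
    print(h0.norm(dim=1), h1.norm(dim=1))  # norms should match closely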
To understand the underlying mechanism behind this remarkable robustness, we explore more
deeply the relationship between numerical ODE solvers and gradient-based or gradient-free
adversarial attacks. Our results disclose that the adversarial robustness of ODE-based
networks mainly comes from the gradient masking effect in numerical ODE solvers with
adaptive step sizes, which leads to a false sense of adversarial robustness.
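A standard way to probe for such gradient masking is to compare a gradient-based attack
against a gradient-free one under the same perturbation budget. Below is a textbook
L-infinity PGD sketch (an illustration, not necessarily the thesis's exact attack setup).

    import torch
    import torch.nn.functional as F

    def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
        # standard L-infinity PGD; against a gradient-masking model this
        # attack stalls because backpropagated gradients are uninformative
        x_adv = x.clone().detach()
        for _ in range(steps):
            x_adv.requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)
            grad = torch.autograd.grad(loss, x_adv)[0]
            x_adv = x_adv.detach() + alpha * grad.sign()
            # project back into the eps-ball around x and the valid range
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
        return x_adv

If a model keeps its accuracy under this attack yet loses it to a gradient-free attack
(e.g., SPSA) at the same eps, the apparent robustness is an artifact of obfuscated
gradients rather than genuine robustness.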