THESIS
2021
1 online resource (xi, 70 pages) : illustrations (chiefly color)
Abstract
In this thesis, we introduce Greenformers, a collection of model efficiency methods to
improve the model efficiency of the recently renowned transformer models with a low-rank
approximation approach. The development trend of deep learning models tends to
result in more complex and larger models. Although this leads to better and more accurate
prediction, the resulting model becomes even more costly, as it requires weeks of
training with a huge amount of GPU resources. Particularly, the size and computational
cost of transformer-based models have increased tremendously since their debut in 2017,
from ~100 million parameters up to ~1.6 trillion parameters in early 2021. These computationally
hungry models also incur a substantial environmental cost and reach an alarming level
of carbon footprint. Some of these models are so massive that it is
impossible to run them without a GPU cluster.
Greenformers improve the model efficiency of transformer models by applying low-rank
approximation approaches. Specifically, we propose a low-rank factorization approach,
called the Low-Rank Transformer (LRT), to improve the efficiency of the transformer model.
We further compare our model with an existing low-rank factorization approach called
Linformer. Based on our analysis, the Low-Rank Transformer model is suitable for improving
both the time and memory efficiency in processing short-sequence (≤ 512) input
data, while the Linformer model is suitable for improving the efficiency in processing
long-sequence (≥ 512) input data. We also show that the Low-Rank Transformer is more suitable
for on-device deployment, as it significantly reduces the model size. Additionally,
we estimate that applying LRT to the existing BERT-BASE model can significantly reduce
the computational, economic, and environmental costs of developing such a model by
more than 30% of the original costs.
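
As a rough illustration of the low-rank factorization idea behind the Low-Rank Transformer, a dense d×d weight matrix can be replaced by two rank-r factors, which shrinks both the parameter count and the matrix-multiplication cost. The following minimal PyTorch sketch uses assumed sizes and module names for illustration only; it is not the thesis's exact implementation.

```python
import torch
import torch.nn as nn


class LowRankLinear(nn.Module):
    """Approximates a dense nn.Linear(d_in, d_out) with a rank-r factorization W ≈ U @ V."""

    def __init__(self, d_in: int, d_out: int, rank: int, bias: bool = True):
        super().__init__()
        self.down = nn.Linear(d_in, rank, bias=False)  # V: project input down to rank r
        self.up = nn.Linear(rank, d_out, bias=bias)    # U: project back up to d_out

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(x))


if __name__ == "__main__":
    d_model, rank = 512, 64                            # assumed sizes for illustration
    dense = nn.Linear(d_model, d_model)
    low_rank = LowRankLinear(d_model, d_model, rank)
    count = lambda m: sum(p.numel() for p in m.parameters())
    print(f"dense: {count(dense):,} params | low-rank (r={rank}): {count(low_rank):,} params")
    # dense: 262,656 params | low-rank (r=64): 66,048 params (~4x fewer)
```

At r = d/8, a 512×512 linear layer drops from roughly 263K parameters to about 66K; reductions of this kind are what drive the model-size and cost savings described above.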
Our Low-Rank Transformer can significantly reduce the computational time and memory
usage on the speech recognition task. Specifically, our Low-Rank Transformer can
halve the size of the model and increase its speed by up to 1.35x on the GPU and 1.25x on
the CPU while maintaining performance comparable to the original transformer
model. Our findings suggest that transformer models tend to be over-parameterized,
and our Low-Rank Transformer can help to mitigate the over-parameterization problem,
yielding a more efficient model with a better generalization.
Additionally, we extend the possibility of applying a low-rank approximation approach
to a genomics study for Alzheimer’s disease risk prediction. We apply sequence
modeling techniques with the Linformer model to predict Alzheimer’s disease in a Chinese
cohort. We define our problem as a long-sequence classification problem with sequence
lengths of up to ~33,000 nucleotides. Our results show that Linformer models with Subword
Tokenization can process very long sequence data and boost the evaluation performance
by up to ~5% AUC compared to the existing FDA-approved risk scoring model and
other deep learning variants. Based on our analysis, we further conclude that the choice
of tokenization approach can provide computation and memory efficiency gains comparable to
those of the efficient-model approach itself, which makes the choice of tokenization
an important consideration when developing a more efficient transformer model.
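
For context, the Linformer mechanism referred to above compresses the attention keys and values along the sequence axis, so attention cost grows linearly with sequence length rather than quadratically. The sketch below is a minimal, self-contained PyTorch illustration under assumed hyperparameters; the module names and the projection dimension k are illustrative, not the thesis's configuration.

```python
import math
import torch
import torch.nn as nn


class LinformerSelfAttention(nn.Module):
    """Self-attention with keys/values compressed along the sequence axis (n -> k)."""

    def __init__(self, d_model: int, n_heads: int, max_seq_len: int, k: int = 256):
        super().__init__()
        assert d_model % n_heads == 0
        self.h, self.d_head = n_heads, d_model // n_heads
        self.to_q = nn.Linear(d_model, d_model)
        self.to_kv = nn.Linear(d_model, 2 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # Learned projections that shrink the sequence dimension from n to k.
        self.E = nn.Parameter(torch.randn(max_seq_len, k) / math.sqrt(max_seq_len))
        self.F = nn.Parameter(torch.randn(max_seq_len, k) / math.sqrt(max_seq_len))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, n, d_model)
        b, n, d = x.shape
        q = self.to_q(x).view(b, n, self.h, self.d_head).transpose(1, 2)   # (b, h, n, d_head)
        keys, values = self.to_kv(x).chunk(2, dim=-1)                      # each (b, n, d_model)
        keys = torch.einsum("bnd,nk->bkd", keys, self.E[:n])               # (b, k, d_model)
        values = torch.einsum("bnd,nk->bkd", values, self.F[:n])           # (b, k, d_model)
        keys = keys.view(b, -1, self.h, self.d_head).transpose(1, 2)       # (b, h, k, d_head)
        values = values.view(b, -1, self.h, self.d_head).transpose(1, 2)   # (b, h, k, d_head)
        scores = q @ keys.transpose(-2, -1) / math.sqrt(self.d_head)       # (b, h, n, k)
        y = torch.softmax(scores, dim=-1) @ values                         # (b, h, n, d_head)
        return self.out(y.transpose(1, 2).reshape(b, n, d))


if __name__ == "__main__":
    attn = LinformerSelfAttention(d_model=256, n_heads=4, max_seq_len=4096, k=128)
    tokens = torch.randn(2, 4096, 256)      # a batch of long sequences (assumed sizes)
    print(attn(tokens).shape)               # torch.Size([2, 4096, 256])
```

With a fixed projection size k, attention memory scales with n·k instead of n², which is what makes very long tokenized nucleotide sequences tractable in this setting.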