Abstract
Preprocessing data is an important step before any data analysis. In this thesis,
we focus on one particular aspect, namely scaling or normalization. We analyze
various scaling methods in common use and study their effects on different
statistical learning models. We then propose a new two-stage scaling method:
first, we fit a linear regression model to some training data; second, we scale
the whole data set based on the estimated regression coefficients. Simulations
illustrate the advantages of the new scaling method, and analyses of real data
are also presented.
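
The two-stage idea can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the method as developed in the thesis: the abstract does not say how the coefficients enter the scaling, so the rule used here (multiply each feature by the absolute value of its fitted coefficient) is an assumption, and the helper names `fit_ols` and `coefficient_scale` and the toy data are hypothetical.

```python
def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + [float(v) for v in r] for r in X]
    p = len(rows[0])
    # Build the augmented system [A^T A | A^T y].
    M = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(M[k][c]))
        M[c], M[piv] = M[piv], M[c]
        for k in range(p):
            if k != c:
                f = M[k][c] / M[c][c]
                M[k] = [a - f * b for a, b in zip(M[k], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]


def coefficient_scale(X, coefs):
    """ASSUMED rule: multiply each feature by |coefficient| (intercept skipped)."""
    weights = [abs(b) for b in coefs[1:]]
    return [[w * v for w, v in zip(weights, row)] for row in X]


# Hypothetical toy data generated from y = 1 + 2*x1 - 0.5*x2 (no noise).
X_train = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
y_train = [1 + 2 * a - 0.5 * b for a, b in X_train]
X_new = [[6, 4]]

# Stage 1: fit the regression on the training portion only.
coefs = fit_ols(X_train, y_train)

# Stage 2: rescale the whole data set with the fitted coefficients.
X_scaled = coefficient_scale(X_train + X_new, coefs)
print(coefs)
print(X_scaled)
```

Under this assumed rule, features with larger fitted coefficients are stretched and features with small coefficients are shrunk, which matters most for distance-based learners such as nearest-neighbour methods.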