THESIS
2014
ix, 35 pages : illustrations ; 30 cm
Abstract
In this dissertation, we consider the business data mining problem, the Amazon employee's access as an example, to demonstrate the existing and proposed feature selection
and classification methods. High-dimensionality is one of the most challenging problems
in the past decade, which leading to need of the selection of predictors. Moreover, when
interaction among explanatory variables is taken into account, the dimensionality would
become even larger. Thus, feature selection has been a hot topic in terms of supervised
learning.
We first do an overview of classification methods, including K-nearest neighbors, tree
based models, ridge regression and LASSO. And then we apply them on the Amazon
dataset to get result for comparison. After that, we propose a white-box procedure of
in...[
Read more ]
In this dissertation, we consider the business data mining problem, the Amazon employee's access as an example, to demonstrate the existing and proposed feature selection
and classification methods. High-dimensionality is one of the most challenging problems
in the past decade, which leading to need of the selection of predictors. Moreover, when
interaction among explanatory variables is taken into account, the dimensionality would
become even larger. Thus, feature selection has been a hot topic in terms of supervised
learning.
We first do an overview of classification methods, including K-nearest neighbors, tree
based models, ridge regression and LASSO. And then we apply them on the Amazon
dataset to get result for comparison. After that, we propose a white-box procedure of
influential subset selection, on which we form a hundred subsets of important "split"
variables. Finally, the classification job is done on the selected subsets. The proposed
algorithm outperforms some of existing methods, although it also has some limitations.
Post a Comment