THESIS
2007
xiii, 108 leaves : ill. ; 30 cm
Abstract
One of the fundamental problems in statistical machine learning is the optimization problem under the generic regularization framework. The objective function of the optimization problem comprises two parts, namely the loss and the regularizer. Besides the parameters we aim to optimize, the formulation contains other hyperparameters, e.g., the regularization hyperparameter and the kernel hyperparameter, whose values have to be specified in advance by the user. The optimal hyperparameter value is data-dependent, and prior knowledge is often required to set it properly. The traditional approach to this model selection problem is to apply methods like cross validation to determine the best choice among a number of prespecified hyperparameter values. Extensive exploration, such as a line search over one hyperparameter or a grid search over two, is usually required. However, this entails training the model multiple times with different hyperparameter values and hence is computationally prohibitive, especially when the number of candidate values is very large.
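To make the cost of this traditional approach concrete, the following is a minimal sketch of grid search with cross validation, assuming scikit-learn; the SVM model, the candidate grids, and the synthetic data are illustrative choices, not taken from the thesis.

```python
# A sketch of the traditional grid-search approach: one regularization
# hyperparameter (C) and one kernel hyperparameter (gamma), each with a
# prespecified candidate list.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

C_grid = np.logspace(-3, 3, 13)       # candidate regularization values
gamma_grid = np.logspace(-4, 1, 11)   # candidate kernel values

best_score, best_params = -np.inf, None
for C in C_grid:
    for gamma in gamma_grid:
        # Each grid point requires k full trainings (here k = 5 folds):
        # 13 * 11 * 5 = 715 trainings in total. This repeated re-training
        # is exactly the cost that solution path algorithms aim to avoid.
        score = cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=5).mean()
        if score > best_score:
            best_score, best_params = score, (C, gamma)

print(f"best (C, gamma) = {best_params}, CV accuracy = {best_score:.3f}")
```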
Solution path algorithms provide a new approach to the model selection problem. Such algorithms compute the solution for every possible hyperparameter value without re-training the model multiple times. Since they can rapidly generate the full path of solutions, the optimal value can be found at low extra computational cost by estimating the generalization error under different hyperparameter values. For a large family of statistical learning models, the solution path is piecewise linear in the hyperparameter, so the entire path can be explored efficiently by monitoring only the breakpoints. However, in some problems the solution path is nonlinear with respect to a hyperparameter, and tracing it efficiently is challenging.
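As a concrete instance of a piecewise-linear path, the sketch below uses the LASSO path computed by LARS (scikit-learn's lars_path); the interpolation helper solution_at is a hypothetical name added here for illustration. Because the path is linear between breakpoints, the exact solution at any hyperparameter value is recovered from the two neighbouring breakpoints alone.

```python
# A sketch of exploring a piecewise-linear solution path via its breakpoints.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import lars_path

X, y = make_regression(n_samples=100, n_features=8, noise=1.0, random_state=0)

# alphas: breakpoint values of the regularization hyperparameter (decreasing);
# coefs[:, j]: the exact solution at breakpoint alphas[j].
alphas, _, coefs = lars_path(X, y, method="lasso")

def solution_at(alpha):
    """Exact solution at any alpha, by linear interpolation of breakpoints."""
    j = np.searchsorted(-alphas, -alpha)       # alphas is sorted descending
    j = np.clip(j, 1, len(alphas) - 1)
    t = (alphas[j - 1] - alpha) / (alphas[j - 1] - alphas[j])
    return (1 - t) * coefs[:, j - 1] + t * coefs[:, j]

# One pass over the breakpoints yields the entire path; evaluating the
# generalization error at intermediate alphas needs no further training.
alpha_mid = 0.5 * (alphas[0] + alphas[-1])
print(solution_at(alpha_mid))
```

For nonlinear paths, no such finite set of breakpoints characterizes the whole path, which is why the nonlinear case requires the different tracing techniques the thesis develops.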
In this thesis, we investigate this new class of algorithms. We present a general method for analyzing the solution space of learning models and for deriving the corresponding path-following algorithms. Based on this method, we develop solution path algorithms for a number of learning models. These algorithms offer many advantages over previous approaches. Experimental results demonstrate that solution path algorithms provide an efficient and effective approach to model selection.