Hyper-parameter and kernel optimization in support vector machines.
Thesis Discipline: Computer Science
Degree Grantor: University of Canterbury
Degree Name: Doctor of Philosophy
Support vector machines (SVMs) are capable of producing high-quality solutions for many types of real-world classification problems. SVMs classify data by selecting hyperplanes that provide the maximum margin between classes. Intuitively, the farther a data point is from the decision boundary (hyperplane), the lower the chance of a false prediction. A fundamental theorem of SVMs shows that their generalization error rates are independent of the dimensionality of the input space and depend only on the ratio of the margin to the radius of the smallest sphere containing all the data. The expressive power of SVMs can be extended by mapping input data into high-dimensional spaces. In these high-dimensional spaces, the computational complexity of SVMs does not increase much, because all of the computation is carried out through kernels ("the kernel trick"). The maximal margin principle and the kernel trick make SVMs useful and efficient computational tools in machine learning.
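The kernel trick described above can be illustrated with a minimal sketch (illustrative only, not from the thesis): a degree-2 polynomial kernel evaluated in input space gives exactly the dot product of an explicit high-dimensional feature map, while costing only O(dim) per evaluation.

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """RBF kernel: an inner product in an infinite-dimensional feature
    space, computed directly in input space (the "kernel trick")."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

def polynomial_kernel(x, z, degree=2, coef0=1.0):
    """Polynomial kernel: equals the dot product of explicit degree-d
    feature maps, but never materialises those features."""
    dot = sum(xi * zi for xi, zi in zip(x, z))
    return (dot + coef0) ** degree
```

For degree 2 in two dimensions, the implicit feature map is (x1^2, x2^2, sqrt(2) x1 x2, sqrt(2) x1, sqrt(2) x2, 1); computing its dot product explicitly gives the same value as `polynomial_kernel`, which is why the complexity of an SVM does not grow with the feature-space dimension.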
While the parameters (support vectors) of an SVM can be found efficiently, the choice of kernel and the optimal values of the hyper-parameters within the kernel remain an open problem. This thesis addresses the issues of finding the optimal hyper-parameters and constructing the optimal kernel. Since the search spaces of these problems are very large, and often lack exploitable properties such as convexity, the thesis tackles them with a set of tools based on (or influenced by) meta-heuristic algorithms and evolutionary computation.
For the hyper-parameter optimization, we develop a surface estimation method based on Bézier curves. First, we determine the search region and choose samples on a regular grid within it. We then construct a surface that interpolates the prediction accuracy at the sampling points and use it to detect the most relevant region for a finer second search. The proposed method makes the search process more efficient and automatic, and it extends to multi-dimensional spaces.
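The two-stage idea can be sketched as follows (a toy version, not the thesis's exact scheme): accuracy values measured on a regular grid act as control points of a Bézier surface evaluated through Bernstein polynomials, and the surface's maximum marks the centre of the refined second search. The helper names `bezier_surface` and `best_region` are my own.

```python
from math import comb

def bernstein(n, i, t):
    """Bernstein basis polynomial B_{i,n}(t)."""
    return comb(n, i) * t**i * (1 - t)**(n - i)

def bezier_surface(grid, u, v):
    """Evaluate a Bézier surface whose control points are accuracy
    values measured on a regular (m+1) x (n+1) grid; (u, v) in [0,1]^2."""
    m, n = len(grid) - 1, len(grid[0]) - 1
    return sum(bernstein(m, i, u) * bernstein(n, j, v) * grid[i][j]
               for i in range(m + 1) for j in range(n + 1))

def best_region(grid, steps=20):
    """Scan the surface on a fine lattice and return the (u, v) with the
    highest estimated accuracy -- the centre of the finer second search."""
    best = max((bezier_surface(grid, s / steps, t / steps), s / steps, t / steps)
               for s in range(steps + 1) for t in range(steps + 1))
    return best[1], best[2]
```

A degree-1 grid like `[[0.5, 0.6], [0.7, 0.9]]` yields a bilinear surface whose maximum sits at the corner with accuracy 0.9, so the second search would be centred there.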
The optimization of hyper-parameters can be further improved by making the detection of the initial search region automatic and by making the search within this region more efficient. We achieve this goal by taking an approach based on evolutionary computation. Particle Swarm Optimization (PSO) is an algorithm inspired by social behaviour such as that of bird flocks and fish schools. We propose a unified approach for the discretization of continuous PSO. Currently, binary PSO and continuous PSO are treated as two separate subjects, and most of the theory developed for continuous PSO cannot be applied to binary PSO. We propose a general model that combines the two and derive several binary PSO models and discrete PSO models (in which each dimension can take more than two values) from existing continuous PSO models.
We apply the proposed discrete PSO to the hyper-parameter search in SVMs. To further increase efficiency, we introduce an adaptive calibration of the evaluation points. We also present a simple algorithm that automatically adjusts the search region using PSO. In the experiments, we show that the proposed discrete PSO is more efficient than the surface estimation methods.
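A toy version of automatic search-region adjustment may clarify the idea (illustrative only; the thesis adjusts the region with PSO itself, whereas this sketch uses a plain grid). Here `score` is a hypothetical function mapping a hyper-parameter point, e.g. (log2 C, log2 gamma), to cross-validation accuracy; each round re-centres and shrinks the region around the best point found.

```python
def adaptive_region_search(score, centre, width, rounds=3, grid=5):
    """Evaluate `score` on a grid over the current square region, then
    re-centre the region on the best point and halve its width."""
    cx, cy = centre
    w = width
    best = None
    for _ in range(rounds):
        pts = [(cx + w * (i / (grid - 1) - 0.5),
                cy + w * (j / (grid - 1) - 0.5))
               for i in range(grid) for j in range(grid)]
        best = max(pts, key=score)   # keep the most accurate grid point
        cx, cy = best                # re-centre the search region on it
        w /= 2.0                     # and shrink the region
    return best
```

On a smooth unimodal accuracy surface this homes in on the optimum with far fewer evaluations than a single fine grid, which is the efficiency argument for adaptive calibration of the evaluation points.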
For constructing the optimal kernel, we propose a new multiple kernel learning (MKL) method that minimizes the ratio of the radius to the margin rather than just maximizing the margin. The proposed method has several favorable properties: it does not need additional parameter tuning, and it returns high-quality solutions comparable to the best solutions of traditional MKL methods with finely tuned parameters.
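The objective can be sketched as follows (notation assumed; the thesis's exact formulation may differ). The radius–margin bound motivates minimizing the squared ratio of the enclosing-sphere radius to the margin over the kernel weights:

```latex
\min_{\mu \ge 0,\ \sum_k \mu_k = 1} \; R^2(\mu)\,\|w(\mu)\|^2,
\qquad K_\mu = \sum_k \mu_k K_k,
```

where $\|w(\mu)\|^{-1}$ is the margin of the SVM trained with the combined kernel $K_\mu$, and $R(\mu)$ is the radius of the smallest sphere enclosing the data in the feature space of $K_\mu$, obtainable from the standard dual

```latex
R^2 = \max_{\beta \ge 0,\ \sum_i \beta_i = 1}
\sum_i \beta_i K_\mu(x_i, x_i) - \sum_{i,j} \beta_i \beta_j K_\mu(x_i, x_j).
```

Unlike margin-only MKL, this objective has no trade-off parameter to tune, which matches the "no additional parameters" property claimed above.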
The application of MKL does not always improve the prediction accuracy over simple SVMs with fine-tuned parameters. To improve the prediction accuracy of SVMs, we construct optimal kernels from possible nonlinear combinations of kernels and subsets of features. We propose a system for constructing optimal kernels using Genetic Programming and binary PSO. In the experiments, we show that the proposed method drastically improves the prediction accuracy over SVMs with fine-tuned parameters.
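The kind of search space involved can be sketched as follows (a minimal illustration with invented helper names, not the thesis's system): base kernels are restricted to a feature subset chosen by a binary mask (the part binary PSO would search), and combined by sums and products (the expression tree Genetic Programming would evolve). Sums and products of positive-definite kernels remain valid kernels, which is what makes such nonlinear combinations legitimate.

```python
import math

def masked(x, mask):
    """Keep only the features selected by the binary mask."""
    return [xi for xi, m in zip(x, mask) if m]

def k_lin(x, z):
    return sum(a * b for a, b in zip(x, z))

def k_rbf(gamma):
    return lambda x, z: math.exp(-gamma * sum((a - b) ** 2
                                              for a, b in zip(x, z)))

# Closure under sum and product keeps the combination a valid kernel.
def k_sum(k1, k2):
    return lambda x, z: k1(x, z) + k2(x, z)

def k_prod(k1, k2):
    return lambda x, z: k1(x, z) * k2(x, z)

def make_kernel(mask, gamma=0.5):
    """Example tree: (linear + rbf) * rbf over a feature subset -- the
    kind of individual a GP population might contain."""
    tree = k_prod(k_sum(k_lin, k_rbf(gamma)), k_rbf(gamma))
    return lambda x, z: tree(masked(x, mask), masked(z, mask))
```

A fitness function (e.g. cross-validation accuracy of an SVM using the candidate kernel) would then drive both the GP search over trees and the PSO search over masks.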
We find that, for both the task of hyper-parameter optimization and that of optimal kernel construction, the proposed methods overcome the difficulties faced by standard SVM methods. This is achieved by combining the standard methods with meta-heuristic algorithms, which improves performance in terms of both accuracy and efficiency.