Bibliographic record and links to related information available from the Library of Congress catalog.
Note: Contents data are machine generated based on pre-publication provided by the publisher. Contents may have variations from the printed book or be incomplete or contain other coding.
Contents Preface v 1 Support Vector Machine Solvers 1 L´eon Bottou, Chih-Jen Lin 1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Support Vector Machines . . . . . . . . . . . . . . . . . . . . . . . . 3 1.3 Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.4 Sparsity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 1.5 Early SVM Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . 12 1.6 The Decomposition Method . . . . . . . . . . . . . . . . . . . . . . . 16 1.7 A Case Study: LIBSVM . . . . . . . . . . . . . . . . . . . . . . . . . 20 1.8 Conclusion and Outlook . . . . . . . . . . . . . . . . . . . . . . . . . 26 2 Training a Support Vector Machine in the Primal 29 Olivier Chapelle 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 2.2 Links between primal and dual optimization . . . . . . . . . . . . . . 30 2.3 Primal objective function . . . . . . . . . . . . . . . . . . . . . . . . 32 2.4 Newton optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 2.5 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 2.6 Advantages of primal optimization . . . . . . . . . . . . . . . . . . . 44 2.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 3 Fast Kernel Learning with Sparse Inverted Index 51 Patrick Haffner, Stephan Kanthak 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 3.2 Sequential Kernel Learning . . . . . . . . . . . . . . . . . . . . . . . 52 3.3 Sparse Matrix-Vector Multiplication . . . . . . . . . . . . . . . . . . 55 3.4 Complexity Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 3.5 Speeding up SVM training . . . . . . . . . . . . . . . . . . . . . . . . 65 3.6 Feature Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 3.7 Large-Scale Experiments . . . . . . . . . . . . . . . . . . . . . . . . . 67 3.8 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 4 Large Scale Learning with String Kernels 73 ii CONTENTS S¿oren Sonnenburg, Gunnar R¿atsch, Konrad Rieck 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 4.2 String Kernels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 4.3 Sparse Feature Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 4.4 Speeding up SVM Training and Testing . . . . . . . . . . . . . . . . 83 4.5 Benchmark Experiments . . . . . . . . . . . . . . . . . . . . . . . . . 85 4.6 Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 4.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 5 Large Scale Parallel SVM Implementation 105 Igor Durdanovic, Eric Cosatto, Hans-Peter Graf 5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106 5.2 Accelerating SVMs: Previous approaches . . . . . . . . . . . . . . . . 107 5.3 The Sequential Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 111 5.4 Spread-Kernel Optimization (full data) (SKOFD) . . . . . . . . . . . 118 5.5 Spread-Kernel Optimization (split data) (SKOSD) . . . . . . . . . . 121 5.6 Networking Overhead . . . . . . . . . . . . . . . . . . . . . . . . . . 123 5.7 Theoretical Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126 5.8 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 129 5.9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137 6 A Distributed Sequential Solver for Large Scale SVMs 141 Elad Yom-Tov 6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141 6.2 A distributed solution of the SVM problem using a sequential solver 146 6.3 Metrics for comparing SVMs . . . . . . . . . . . . . . . . . . . . . . 147 6.4 Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149 6.5 A model of the computational time . . . . . . . . . . . . . . . . . . . 153 6.6 Concluding remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . 155 7 Newton Methods for Fast Semi-supervised Linear SVMs 157 Vikas Sindhwani, S. Sathiya Keerthi 7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157 7.2 Modified Finite Newton Linear l2-SVM . . . . . . . . . . . . . . . . 159 7.3 Fast Multi-switch Transductive SVMs . . . . . . . . . . . . . . . . . 163 7.4 Semi-supervised SVMs based on Deterministic Annealing . . . . . . 166 7.5 Empirical Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169 7.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176 8 The Improved Fast Gauss Transform with Applications to Machine Learning 177 Vikas Chandrakant Raykar, Ramani Duraiswami 8.1 Computational curse of non-parametric methods . . . . . . . . . . . 177 8.2 Bottleneck computational primitive¿Weighted superposition of kernels178 8.3 Structured matrices and ?-exact approximation . . . . . . . . . . . . 179 CONTENTS iii 8.4 Motivating example¿polynomial kernel . . . . . . . . . . . . . . . . . 180 8.5 Sum of Gaussian kernels¿the discrete Gauss transform . . . . . . . . 181 8.6 Bringing computational tractability to the discrete Gauss transform 182 8.7 Multi-index notation . . . . . . . . . . . . . . . . . . . . . . . . . . . 184 8.8 The improved fast Gauss transform . . . . . . . . . . . . . . . . . . . 187 8.9 IFGT vs FGT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191 8.10 Numerical Experiments . . . . . . . . . . . . . . . . . . . . . . . . . 193 8.11 Fast multivariate kernel density estimation . . . . . . . . . . . . . . 198 8.12 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202 9 Approximation Methods for Gaussian Process Regression 205 Joaquin Qui¿nonero-Candela, Carl Edward Rasmussen, Christopher K. I. Williams 9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205 9.2 Gaussian Process Regression . . . . . . . . . . . . . . . . . . . . . . 206 9.3 Sparse Approximations Based on Inducing Variables . . . . . . . . . 208 9.4 Fast Matrix Vector Multiplication (MVM) Approximations . . . . . 219 9.5 Selecting the Inducing Variables . . . . . . . . . . . . . . . . . . . . 220 9.6 Approximate Evidence and Hyperparameter Learning . . . . . . . . 222 9.7 Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224 9.8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224 10 Brisk Kernel ICA 227 Stefanie Jegelka, Arthur Gretton 10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227 10.2 Independent component analysis . . . . . . . . . . . . . . . . . . . . 229 10.3 Independence measures based on RKHS covariance operators . . . . 230 10.4 Gradient descent on the orthogonal group . . . . . . . . . . . . . . . 234 10.5 Complexity analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 240 10.6 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241 10.7 Conclusions and Future Directions . . . . . . . . . . . . . . . . . . . 250 11 Building SVMs with Reduced Classifier Complexity 253 S. Sathiya Keerthi, Olivier Chapelle, Dennis DeCoste 11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253 11.2 The Basic optimization . . . . . . . . . . . . . . . . . . . . . . . . . 257 11.3 Selection of new basis element . . . . . . . . . . . . . . . . . . . . . . 259 11.4 Hyperparameter tuning . . . . . . . . . . . . . . . . . . . . . . . . . 265 11.5 Comparison with Kernel Matching Pursuit . . . . . . . . . . . . . . 266 11.6 Additional tuning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269 11.7 Comparison with standard SVM training . . . . . . . . . . . . . . . 272 11.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275 12 Trading Convexity for Scalability 277 Ronan Collobert, Fabian Sinz, Jason Weston, L´eon Bottou iv CONTENTS 12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277 12.2 The Concave-Convex Procedure . . . . . . . . . . . . . . . . . . . . . 278 12.3 Non-Convex SVMs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279 12.4 Experiments with Non-Convex SVMs . . . . . . . . . . . . . . . . . . 283 12.5 Non-Convex Transductive SVMs . . . . . . . . . . . . . . . . . . . . 287 12.6 Experiments with TSVMs . . . . . . . . . . . . . . . . . . . . . . . . 294 12.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301 13 Training Invariant SVMs using Selective Sampling 303 Ga¿elle Loosli, L´eon Bottou, St´ephane Canu 13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303 13.2 Online algorithm with selective sampling . . . . . . . . . . . . . . . . 306 13.3 Invariance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311 13.4 Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316 13.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322 14 Scaling Learning Algorithms towards AI 323 Yoshua Bengio, Yann LeCun 14.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323 14.2 Learning Models Towards AI . . . . . . . . . . . . . . . . . . . . . . 326 14.3 Learning Architectures, Shallow and Deep . . . . . . . . . . . . . . . 331 14.4 Fundamental Limitation of Local Learning . . . . . . . . . . . . . . . 339 14.5 Deep Architectures . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347 14.6 Experiments with Visual Pattern Recognition . . . . . . . . . . . . . 350 14.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360 References 363 Contributors 389 Index 393
Library of Congress Subject Headings for this publication:
Data structures (Computer science).
Machine learning.