Information from electronic data provided by the publisher. May be incomplete or contain other coding.
Preface. Notation. 1 Introduction to statistical pattern recognition. 1.1 Statistical pattern recognition. 1.1.1 Introduction. 1.1.2 The basic model. 1.2 Stages in a pattern recognition problem. 1.3 Issues. 1.4 Supervised versus unsupervised. 1.5 Approaches to statistical pattern recognition. 1.5.1 Elementary decision theory. 1.5.2 Discriminant functions. 1.6 Multiple regression. 1.7 Outline of book. 1.8 Notes and references. Exercises. 2 Density estimation - parametric. 2.1 Introduction. 2.2 Normal-based models. 2.2.1 Linear and quadratic discriminant functions. 2.2.2 Regularised discriminant analysis. 2.2.3 Example application study. 2.2.4 Further developments. 2.2.5 Summary. 2.3 Normal mixture models. 2.3.1 Maximum likelihood estimation via EM. 2.3.2 Mixture models for discrimination. 2.3.3 How many components? 2.3.4 Example application study. 2.3.5 Further developments. 2.3.6 Summary. 2.4 Bayesian estimates. 2.4.1 Bayesian learning methods. 2.4.2 Markov chain Monte Carlo. 2.4.3 Bayesian approaches to discrimination. 2.4.4 Example application study. 2.4.5 Further developments. 2.4.6 Summary. 2.5 Application studies. 2.6 Summary and discussion. 2.7 Recommendations. 2.8 Notes and references. Exercises. 3 Density estimation - nonparametric. 3.1 Introduction. 3.2 Histogram method. 3.2.1 Data-adaptive histograms. 3.2.2 Independence assumption. 3.2.3 Lancaster models. 3.2.4 Maximum weight dependence trees. 3.2.5 Bayesian networks. 3.2.6 Example application study. 3.2.7 Further developments. 3.2.8 Summary. 3.3 k-nearest-neighbour method. 3.3.1 k-nearest-neighbour decision rule. 3.3.2 Properties of the nearest-neighbour rule. 3.3.3 Algorithms. 3.3.4 Editing techniques. 3.3.5 Choice of distance metric. 3.3.6 Example application study. 3.3.7 Further developments. 3.3.8 Summary. 3.4 Expansion by basis functions. 3.5 Kernel methods. 3.5.1 Choice of smoothing parameter. 3.5.2 Choice of kernel. 3.5.3 Example application study. 3.5.4 Further developments. 3.5.5 Summary.
3.6 Application studies. 3.7 Summary and discussion. 3.8 Recommendations. 3.9 Notes and references. Exercises. 4 Linear discriminant analysis. 4.1 Introduction. 4.2 Two-class algorithms. 4.2.1 General ideas. 4.2.2 Perceptron criterion. 4.2.3 Fisher's criterion. 4.2.4 Least mean squared error procedures. 4.2.5 Support vector machines. 4.2.6 Example application study. 4.2.7 Further developments. 4.2.8 Summary. 4.3 Multiclass algorithms. 4.3.1 General ideas. 4.3.2 Error-correction procedure. 4.3.3 Fisher's criterion - linear discriminant analysis. 4.3.4 Least mean squared error procedures. 4.3.5 Optimal scaling. 4.3.6 Regularisation. 4.3.7 Multiclass support vector machines. 4.3.8 Example application study. 4.3.9 Further developments. 4.3.10 Summary. 4.4 Logistic discrimination. 4.4.1 Two-group case. 4.4.2 Maximum likelihood estimation. 4.4.3 Multiclass logistic discrimination. 4.4.4 Example application study. 4.4.5 Further developments. 4.4.6 Summary. 4.5 Application studies. 4.6 Summary and discussion. 4.7 Recommendations. 4.8 Notes and references. Exercises. 5 Nonlinear discriminant analysis - kernel methods. 5.1 Introduction. 5.2 Optimisation criteria. 5.2.1 Least squares error measure. 5.2.2 Maximum likelihood. 5.2.3 Entropy. 5.3 Radial basis functions. 5.3.1 Introduction. 5.3.2 Motivation. 5.3.3 Specifying the model. 5.3.4 Radial basis function properties. 5.3.5 Simple radial basis function. 5.3.6 Example application study. 5.3.7 Further developments. 5.3.8 Summary. 5.4 Nonlinear support vector machines. 5.4.1 Types of kernel. 5.4.2 Model selection. 5.4.3 Support vector machines for regression. 5.4.4 Example application study. 5.4.5 Further developments. 5.4.6 Summary. 5.5 Application studies. 5.6 Summary and discussion. 5.7 Recommendations. 5.8 Notes and references. Exercises. 6 Nonlinear discriminant analysis - projection methods. 6.1 Introduction. 6.2 The multilayer perceptron. 6.2.1 Introduction. 6.2.2 Specifying the multilayer perceptron structure. 
6.2.3 Determining the multilayer perceptron weights. 6.2.4 Properties. 6.2.5 Example application study. 6.2.6 Further developments. 6.2.7 Summary. 6.3 Projection pursuit. 6.3.1 Introduction. 6.3.2 Projection pursuit for discrimination. 6.3.3 Example application study. 6.3.4 Further developments. 6.3.5 Summary. 6.4 Application studies. 6.5 Summary and discussion. 6.6 Recommendations. 6.7 Notes and references. Exercises. 7 Tree-based methods. 7.1 Introduction. 7.2 Classification trees. 7.2.1 Introduction. 7.2.2 Classifier tree construction. 7.2.3 Other issues. 7.2.4 Example application study. 7.2.5 Further developments. 7.2.6 Summary. 7.3 Multivariate adaptive regression splines. 7.3.1 Introduction. 7.3.2 Recursive partitioning model. 7.3.3 Example application study. 7.3.4 Further developments. 7.3.5 Summary. 7.4 Application studies. 7.5 Summary and discussion. 7.6 Recommendations. 7.7 Notes and references. Exercises. 8 Performance. 8.1 Introduction. 8.2 Performance assessment. 8.2.1 Discriminability. 8.2.2 Reliability. 8.2.3 ROC curves for two-class rules. 8.2.4 Example application study. 8.2.5 Further developments. 8.2.6 Summary. 8.3 Comparing classifier performance. 8.3.1 Which technique is best? 8.3.2 Statistical tests. 8.3.3 Comparing rules when misclassification costs are uncertain. 8.3.4 Example application study. 8.3.5 Further developments. 8.3.6 Summary. 8.4 Combining classifiers. 8.4.1 Introduction. 8.4.2 Motivation. 8.4.3 Characteristics of a combination scheme. 8.4.4 Data fusion. 8.4.5 Classifier combination methods. 8.4.6 Example application study. 8.4.7 Further developments. 8.4.8 Summary. 8.5 Application studies. 8.6 Summary and discussion. 8.7 Recommendations. 8.8 Notes and references. Exercises. 9 Feature selection and extraction. 9.1 Introduction. 9.2 Feature selection. 9.2.1 Feature selection criteria. 9.2.2 Search algorithms for feature selection. 9.2.3 Suboptimal search algorithms. 9.2.4 Example application study. 9.2.5 Further developments.
9.2.6 Summary. 9.3 Linear feature extraction. 9.3.1 Principal components analysis. 9.3.2 Karhunen-Loeve transformation. 9.3.3 Factor analysis. 9.3.4 Example application study. 9.3.5 Further developments. 9.3.6 Summary. 9.4 Multidimensional scaling. 9.4.1 Classical scaling. 9.4.2 Metric multidimensional scaling. 9.4.3 Ordinal scaling. 9.4.4 Algorithms. 9.4.5 Multidimensional scaling for feature extraction. 9.4.6 Example application study. 9.4.7 Further developments. 9.4.8 Summary. 9.5 Application studies. 9.6 Summary and discussion. 9.7 Recommendations. 9.8 Notes and references. Exercises. 10 Clustering. 10.1 Introduction. 10.2 Hierarchical methods. 10.2.1 Single-link method. 10.2.2 Complete-link method. 10.2.3 Sum-of-squares method. 10.2.4 General agglomerative algorithm. 10.2.5 Properties of a hierarchical classification. 10.2.6 Example application study. 10.2.7 Summary. 10.3 Quick partitions. 10.4 Mixture models. 10.4.1 Model description. 10.4.2 Example application study. 10.5 Sum-of-squares methods. 10.5.1 Clustering criteria. 10.5.2 Clustering algorithms. 10.5.3 Vector quantisation. 10.5.4 Example application study. 10.5.5 Further developments. 10.5.6 Summary. 10.6 Cluster validity. 10.6.1 Introduction. 10.6.2 Distortion measures. 10.6.3 Choosing the number of clusters. 10.6.4 Identifying genuine clusters. 10.7 Application studies. 10.8 Summary and discussion. 10.9 Recommendations. 10.10 Notes and references. Exercises. 11 Additional topics. 11.1 Model selection. 11.1.1 Separate training and test sets. 11.1.2 Cross-validation. 11.1.3 The Bayesian viewpoint. 11.1.4 Akaike's information criterion. 11.2 Learning with unreliable classification. 11.3 Missing data. 11.4 Outlier detection and robust procedures. 11.5 Mixed continuous and discrete variables. 11.6 Structural risk minimisation and the Vapnik-Chervonenkis dimension. 11.6.1 Bounds on the expected risk. 11.6.2 The Vapnik-Chervonenkis dimension. A Measures of dissimilarity. A.1 Measures of dissimilarity. 
A.1.1 Numeric variables. A.1.2 Nominal and ordinal variables. A.1.3 Binary variables. A.1.4 Summary. A.2 Distances between distributions. A.2.1 Methods based on prototype vectors. A.2.2 Methods based on probabilistic distance. A.2.3 Probabilistic dependence. A.3 Discussion. B Parameter estimation. B.1 Parameter estimation. B.1.1 Properties of estimators. B.1.2 Maximum likelihood. B.1.3 Problems with maximum likelihood. B.1.4 Bayesian estimates. C Linear algebra. C.1 Basic properties and definitions. C.2 Notes and references. D Data. D.1 Introduction. D.2 Formulating the problem. D.3 Data collection. D.4 Initial examination of data. D.5 Data sets. D.6 Notes and references. E Probability theory. E.1 Definitions and terminology. E.2 Normal distribution. E.3 Probability distributions. References. Index.