Bibliographic record and links to related information are available from the Library of Congress catalog.

Note: The contents data are machine generated from pre-publication information provided by the publisher. The contents may differ from the printed book, be incomplete, or contain other coding artifacts.

```CONTENTS
Preface to the Second Edition xiii
Preface to the First Edition xv
List of Examples xx
1 GENERAL INTRODUCTION 1
1.1 Introduction 1
1.2 Maximum Likelihood Estimation 3
1.3 Newton-Type Methods 5
1.3.1 Introduction 5
1.3.2 Newton-Raphson Method 5
1.3.3 Quasi-Newton Methods 6
1.3.4 Modified Newton Methods 6
1.4 Introductory Examples 8
1.4.1 Introduction 8
1.4.2 Example 1.1: A Multinomial Example 8
1.4.3 Example 1.2: Estimation of Mixing Proportions 13
1.5 Formulation of the EM Algorithm 18
1.5.1 EM Algorithm 18
1.5.2 Example 1.3: Censored Exponentially Distributed Survival Times 20
1.5.3 E- and M-Steps for the Regular Exponential Family 22
1.5.4 Example 1.4: Censored Exponentially Distributed Survival Times (Example 1.3 Continued) 23
1.5.5 Generalized EM Algorithm 24
1.5.6 GEM Algorithm Based on One Newton-Raphson Step 24
1.5.8 EM Mapping 26
1.6 EM Algorithm for MAP and MPL Estimation 26
1.6.1 Maximum a Posteriori Estimation 26
1.6.2 Example 1.5: A Multinomial Example (Example 1.1 Continued) 27
1.6.3 Maximum Penalized Estimation 27
1.7 Brief Summary of the Properties of the EM Algorithm 28
1.8 History of the EM Algorithm 29
1.8.1 Early EM History 29
1.8.2 Work Before Dempster, Laird, and Rubin (1977) 29
1.8.3 EM Examples and Applications Since Dempster, Laird, and Rubin (1977) 31
1.8.4 Two Interpretations of EM 32
1.8.5 Developments in EM Theory, Methodology, and Applications 33
1.9 Overview of the Book 36
1.10 Notations 37
2 EXAMPLES OF THE EM ALGORITHM 41
2.1 Introduction 41
2.2 Multivariate Data with Missing Values 42
2.2.1 Example 2.1: Bivariate Normal Data with Missing Values 42
2.2.2 Numerical Illustration 45
2.2.3 Multivariate Data: Buck's Method 45
2.3 Least Squares with Missing Data 47
2.3.1 Healy-Westmacott Procedure 47
2.3.2 Example 2.2: Linear Regression with Missing Dependent Values 47
2.3.3 Example 2.3: Missing Values in a Latin Square Design 49
2.3.4 Healy-Westmacott Procedure as an EM Algorithm 49
2.4 Example 2.4: Multinomial with Complex Cell Structure 51
2.5 Example 2.5: Analysis of PET and SPECT Data 53
2.6 Example 2.6: Multivariate t-Distribution (Known D.F.) 57
2.6.1 ML Estimation of Multivariate t-Distribution 57
2.6.2 Numerical Example: Stack Loss Data 60
2.7 Finite Normal Mixtures 60
2.7.1 Example 2.7: Univariate Component Densities 60
2.7.2 Example 2.8: Multivariate Component Densities 63
2.7.3 Numerical Example: Red Blood Cell Volume Data 64
2.8 Example 2.9: Grouped and Truncated Data 65
2.8.1 Introduction 65
2.8.2 Specification of Complete Data 65
2.8.3 E-Step 68
2.8.4 M-Step 69
2.8.5 Confirmation of Incomplete-Data Score Statistic 69
2.8.6 M-Step for Grouped Normal Data 70
2.8.7 Numerical Example: Grouped Log Normal Data 71
2.9 Example 2.10: A Hidden Markov AR(1) Model 72
2.10 Example 2.11: Non-Applicability of the EM Algorithm 74
3 BASIC THEORY OF THE EM ALGORITHM 77
3.1 Introduction 77
3.2 Monotonicity of the EM Algorithm 78
3.3 Monotonicity of a Generalized EM Algorithm 79
3.4 Convergence of an EM Sequence to a Stationary Value 79
3.4.1 Introduction 79
3.4.2 Regularity Conditions of Wu (1983) 80
3.4.3 Main Convergence Theorem for a Generalized EM Sequence 81
3.4.4 A Convergence Theorem for an EM Sequence 82
3.5 Convergence of an EM Sequence of Iterates 83
3.5.1 Introduction 83
3.5.2 Two Convergence Theorems of Wu (1983) 83
3.5.3 Convergence of an EM Sequence to a Unique Maximum Likelihood Estimate 84
3.5.4 Constrained Parameter Spaces 84
3.6 Examples of Nontypical Behavior of an EM (GEM) Sequence 85
3.6.1 Example 3.1: Convergence to a Saddle Point 85
3.6.2 Example 3.2: Convergence to a Local Minimum 88
3.6.3 Example 3.3: Nonconvergence of a Generalized EM Sequence 90
3.7 Score Statistic 93
3.8 Missing Information 93
3.8.1 Missing Information Principle 93
3.8.2 Example 3.4: (Example 1.3 Continued) 94
3.9 Rate of Convergence of the EM Algorithm 97
3.9.1 Rate Matrix for Linear Convergence 97
3.9.2 Measuring the Linear Rate of Convergence 98
3.9.3 Rate Matrix in Terms of Information Matrices 99
3.9.4 Rate Matrix for Maximum a Posteriori Estimation 100
3.9.5 Derivation of Rate Matrix in Terms of Information Matrices 100
3.9.6 Example 3.5: (Example 3.3 Continued) 101
4 STANDARD ERRORS AND SPEEDING UP CONVERGENCE 103
4.1 Introduction 103
4.2 Observed Information Matrix 104
4.2.1 Direct Evaluation 104
4.2.2 Extraction of Observed Information Matrix in Terms of the Complete-Data Log Likelihood 104
4.2.3 Regular Case 106
4.2.4 Evaluation of the Conditional Expected Complete-Data Information Matrix 106
4.2.5 Examples 107
4.3 Approximations to Observed Information Matrix: i.i.d. Case 112
4.4 Observed Information Matrix for Grouped Data 114
4.4.1 Approximation Based on Empirical Information 114
4.4.2 Example 4.3: Grouped Data from an Exponential Distribution 115
4.5 Supplemented EM Algorithm 118
4.5.1 Definition 118
4.5.2 Calculation of ________ via Numerical Differentiation 120
4.5.3 Stability 121
4.5.4 Monitoring Convergence 122
4.5.5 Difficulties of the SEM Algorithm 123
4.5.6 Example 4.4: Univariate Contaminated Normal Data 123
4.5.7 Example 4.5: Bivariate Normal Data with Missing Values 126
4.6 Bootstrap Approach to Standard Error Approximation 128
4.7 Baker's, Louis', and Oakes' Methods for Standard Error Computation 130
4.7.1 Baker's Method for Standard Error Computation 130
4.7.2 Louis' Method of Standard Error Computation 130
4.7.3 Oakes' Formula for Standard Error Computation 131
4.7.4 Example 4.6: Oakes' Standard Error for Example 1.1 132
4.7.5 Example 4.7: Oakes' Standard Error for Example 2.4 132
4.7.6 Example 4.8: Louis' Method for Example 1.1 133
4.7.7 Baker's Method for Standard Error for Categorical Data 133
4.7.8 Example 4.9: Baker's Method for Example 2.4 134
4.8 Acceleration of the EM Algorithm via Aitken's Method 135
4.8.1 Aitken's Acceleration Method 135
4.8.2 Louis' Method 136
4.8.3 Example 4.10: Multinomial Data 137
4.8.4 Example 4.11: Geometric Mixture 137
4.8.5 Example 4.12: Grouped and Truncated Data (Example 2.8 Continued) 141
4.9 An Aitken Acceleration-Based Stopping Criterion 142
4.10 Conjugate Gradient Acceleration of EM Algorithm 143
4.10.2 A Generalized Conjugate Gradient Algorithm 143
4.10.3 Accelerating the EM Algorithm 144
4.11 Hybrid Methods for Finding the MLE 144
4.11.1 Introduction 144
4.11.2 Combined EM and Modified Newton-Raphson Algorithm 145
4.12 A GEM Algorithm Based on One Newton-Raphson Step 146
4.12.1 Derivation of a Condition to be a Generalized EM Sequence 146
4.12.2 Simulation Experiment 148
4.14 A Quasi-Newton Acceleration of the EM Algorithm 150
4.14.1 The Method 150
4.14.2 Example 4.13: Dirichlet Distribution 152
4.15 Ikeda Acceleration 156
5 EXTENSIONS OF THE EM ALGORITHM 157
5.1 Introduction 157
5.2 ECM Algorithm 158
5.2.1 Motivation 158
5.2.2 Formal Definition 158
5.2.3 Convergence Properties 160
5.2.4 Speed of Convergence 160
5.2.5 Convergence Rates of EM and ECM 161
5.2.6 Example 5.1: ECM Algorithm for Hidden Markov AR(1) Model 162
5.2.7 Discussion 162
5.3 Multicycle ECM Algorithm 163
5.4 Example 5.2: Normal Mixtures with Equal Correlations 164
5.4.1 Normal Components with Equal Correlations 164
5.4.2 Application of ECM Algorithm 164
5.4.3 Fisher's Iris Data 166
5.5 Example 5.3: Mixture Models for Survival Data 166
5.5.1 Competing Risks in Survival Analysis 166
5.5.2 A Two-Component Mixture Regression Model 167
5.5.3 Observed Data 167
5.5.4 Application of EM Algorithm 168
5.5.5 M-Step for Gompertz Components 169
5.5.6 Application of a Multicycle ECM Algorithm 170
5.5.7 Other Examples of EM Algorithm in Survival Analysis 171
5.6 Example 5.4: Contingency Tables with Incomplete Data 172
5.7 ECME Algorithm 173
5.8 Example 5.5: MLE of t-Distribution with Unknown D.F. 174
5.8.1 Application of the EM Algorithm 174
5.8.2 M-Step 175
5.8.3 Application of ECM Algorithm 175
5.8.4 Application of ECME Algorithm 176
5.8.5 Some Standard Results 176
5.8.6 Missing Data 177
5.8.7 Numerical Examples 179
5.8.8 Theoretical Results on the Rate of Convergence 179
5.9 Example 5.6: Variance Components 180
5.9.1 A Variance Components Model 180
5.9.2 E-Step 181
5.9.3 M-Step 182
5.9.4 Application of Two Versions of ECME Algorithm 183
5.9.5 Numerical Example 183
5.10 Linear Mixed Models 184
5.10.1 Introduction 184
5.10.2 General Form of Linear Mixed Model 185
5.10.3 REML Estimation 186
5.10.4 Example 5.7: REML Estimation in a Hierarchical Random Effects Model 187
5.11 Example 5.8: Factor Analysis 189
5.11.1 ECM Algorithm for Factor Analysis 189
5.11.2 Numerical Example 191
5.11.3 EM Algorithm in Principal Component Analysis 191
5.12 Efficient Data Augmentation 193
5.12.1 Motivation 193
5.12.2 Maximum Likelihood Estimation of t-Distribution 193
5.12.3 Variance Components Model 197
5.13 Alternating ECM Algorithm 197
5.14 Parameter-Expanded EM (PX-EM) Algorithm 199
5.15 EMS Algorithm 200
5.16 One-Step-Late Algorithm 200
5.17 Variance Estimation for Penalized EM and OSL Algorithms 201
5.17.1 Penalized EM Algorithm 201
5.17.2 OSL Algorithm 202
5.17.3 Example 5.9: Variance of MPLE for the Multinomial (Examples 1.1 and 4.1 Continued) 202
5.18 Incremental EM 203
5.19 Linear Inverse Problems 204
6 MONTE CARLO VERSIONS OF THE EM ALGORITHM 207
6.1 Introduction 207
6.2 Monte Carlo Techniques 208
6.2.1 Integration and Optimization 208
6.2.2 Importance Sampling 209
6.2.3 Example 6.1: Monte Carlo Integration 210
6.3 Monte Carlo EM 212
6.3.1 Introduction 212
6.3.2 Example 6.2: Monte Carlo EM for Censored Data from Normal 213
6.3.3 Example 6.3: MCEM for a Two-Parameter Multinomial (Example 2.4 Continued) 214
6.3.4 MCEM in Generalized Linear Mixed Models 214
6.3.5 Oakes' Standard Error for MCEM 215
6.3.6 Example 6.4: Oakes' Standard Error for One-Parameter Multinomial (Example 1.1 Continued) 216
6.3.7 Stochastic EM Algorithm 216
6.4 Data Augmentation 218
6.4.1 The Algorithm 218
6.4.2 Example 6.5: Data Augmentation in the Multinomial (Examples 1.1, 1.5 Continued) 219
6.5 Bayesian EM 220
6.5.1 Posterior Mode by EM 220
6.5.2 Example 6.6: Bayesian EM for Normal with Semi-Conjugate Prior 221
6.6 I.I.D. Monte Carlo Algorithms 222
6.6.1 Introduction 222
6.6.2 Rejection Sampling Methods 223
6.7 Metropolis-Hastings (M-H) Algorithms 224
6.7.1 Introduction 224
6.7.2 Independent Metropolis-Hastings Algorithm 225
6.7.3 Random Walk Metropolis-Hastings Algorithm 225
6.7.4 Example 6.7: M-H Algorithm for Bayesian Probit Regression 226
6.7.5 Hybrid Metropolis-Hastings Algorithm 228
6.7.6 Monte Carlo EM with MCMC 228
6.8 Gibbs Sampling 230
6.8.1 The Algorithm 230
6.8.2 Example 6.8: Gibbs Sampling for the Mixture Problem 231
6.8.3 Rao-Blackwellized Estimates with Gibbs Samples 232
6.8.4 Example 6.9: Gibbs Sampling for Bivariate Normal 233
6.8.5 Example 6.10: Bayesian Probit Analysis with Data Augmentation 235
6.8.6 Example 6.11: Gibbs Sampling for Censored Normal 237
6.9 EM's Relations to Gibbs Sampling 238
6.9.1 EM-Gibbs Sampling Connection 238
6.9.2 Example 6.12: EM-Gibbs Connection for Censored Data from Normal (Example 6.10 Continued) 241
6.9.3 Example 6.13: EM-Gibbs Connection for Normal Mixtures 241
6.9.4 Rate of Convergence of Gibbs Sampling and EM 242
6.10 Data Augmentation and Gibbs Sampling 242
6.10.1 Introduction 242
6.10.2 Example 6.14: Data Augmentation and Gibbs Sampling for Censored Normal (Example 6.10 Continued) 243
6.10.3 Example 6.15: Gibbs Sampling for a Complex Multinomial (Example 2.4 Continued) 244
6.10.4 Gibbs Sampling Analogs of ECM and ECME Algorithms 245
6.11 Empirical Bayes and EM 248
6.12 Multiple Imputation 249
6.13 Missing-Data Mechanism, Ignorability and EM Algorithm 249
7 SOME GENERALIZATIONS OF THE EM ALGORITHM 253
7.1 Introduction 253
7.2 Introduction to Estimating Functions 254
7.3 Quasi-Likelihood, Quasi-Score and an EM Generalization 255
7.3.1 Introduction 255
7.3.2 Example 7.1: Quasi-Score for an Incomplete-Data Problem 256
7.3.3 Example 7.2: Expectation-Solution (ES) Algorithm for the Multinomial Example (Example 1.1 Continued) 256
7.3.4 Computational and Asymptotic Properties of the ES
Algorithm 257
7.3.5 Example 7.3: Covariance Matrix for Multinomial Example by ES Algorithm (Example 1.1 Continued) 258
7.4 Variational Bayesian EM Algorithm 259
7.5 MM Algorithm 260
7.5.1 Introduction 260
7.5.2 Methods for Constructing Majorizing/Minorizing Functions 262
7.5.3 Example 7.4: MM Algorithm for the Complex Multinomial (Example 1.1 Continued) 263
7.6 Lower Bound Maximization 264
7.7 Interval EM Algorithm 265
7.7.1 The Algorithm 265
7.7.2 Example 7.5: Interval-EM Algorithm for the Complex Multinomial (Example 2.4 Continued) 266
7.8 Competing Methods and Some Comparisons with EM 266
7.8.1 Introduction 266
7.8.2 Simulated Annealing 266
7.8.3 Comparison of SA and EM Algorithm for Normal Mixtures 267
7.9 The Delta Algorithm 268
7.10 Image Space Reconstruction Algorithm 269
8 FURTHER APPLICATIONS OF THE EM ALGORITHM 271
8.1 Introduction 271
8.2 Hidden Markov Models 272
8.3 AIDS Epidemiology 275
8.4 Neural Networks 277
8.4.1 Introduction 277
8.4.2 Boltzmann Machine 277
8.4.3 Mixture of Experts 278
8.5 Data Mining 280
8.6 Bioinformatics 280
REFERENCES 281
Author Index 303
Index 304
Subject Index 309
```

Library of Congress Subject Headings for this publication:

Expectation-maximization algorithms.
Estimation theory.
Missing observations (Statistics).