Table of contents for Computational molecular evolution / Ziheng Yang.


Bibliographic record and links to related information available from the Library of Congress catalog
Note: Electronic data is machine generated. May be incomplete or contain other coding.


PART I:   MODELLING MOLECULAR EVOLUTION                  1
1     Models of nucleotide substitution                            3
1.1   Introduction                                                 3
1.2   Markov models of nucleotide substitution and
distance estimation                                           4
1.2.1 The JC69 model                                          4
1.2.2  The K80 model                                         10
1.2.3  HKY85, F84, TN93, etc.                                11
1.2.4  The transition/transversion rate ratio                17
1.3   Variable substitution rates across sites                     18
1.4   Maximum likelihood estimation                               22
1.4.1 The JC69 model                                        22
1.4.2  The K80 model                                          25
*1.4.3 Profile and integrated likelihood methods            27
1.5   Markov chains and distance estimation under
general models                                              30
1.5.1 General theory                                         30
1.5.2  The general time-reversible (GTR) model               33
1.6   Discussions                                                  37
1.6.1 Distance estimation under different substitution models  37
1.6.2  Limitations of pairwise comparison                    37
1.7   Exercises                                                    38
2     Models of amino acid and codon substitution                  40
2.1   Introduction                                                 40
2.2   Models of amino acid replacement                             40
2.2.1 Empirical models                                       40
2.2.2  Mechanistic models                                     43
2.2.3  Among-site heterogeneity                               44
2.3   Estimation of distance between two protein sequences         45
2.3.1  The Poisson model                                      45
2.3.2  Empirical models                                     46
2.3.3 Gamma distances                                       46
2.3.4  Example: distance between cat and rabbit p53 genes    47
2.4   Models of codon substitution                                 48
2.5   Estimation of synonymous and nonsynonymous substitution rates  49
2.5.1 Counting methods                                      50
2.5.2  Maximum likelihood method                            58
2.5.3  Comparison of methods                                61
*2.5.4 Interpretation and a plethora of distances           62
*2.6  Numerical calculation of the transition-probability matrix   68
2.7   Exercises                                                    70
PART II: PHYLOGENY RECONSTRUCTION                     71
3     Phylogeny reconstruction: overview                           73
3.1   Tree concepts                                                73
3.1.1 Terminology                                           73
3.1.2  Topological distance between trees                   77
3.1.3  Consensus trees                                      79
3.1.4  Gene trees and species trees                         80
3.1.5  Classification of tree-reconstruction methods        81
3.2   Exhaustive and heuristic tree search                         82
3.2.1  Exhaustive tree search                               82
3.2.2  Heuristic tree search                                83
3.2.3 Branch swapping                                       84
3.2.4  Local peaks in the tree space                        87
3.2.5  Stochastic tree search                               89
3.3   Distance methods                                             89
3.3.1 Least-squares method                                   90
3.3.2  Neighbour-joining method                             92
3.4   Maximum parsimony                                            93
3.4.1 Brief history                                         93
3.4.2 Counting the minimum number of changes given the tree  94
3.4.3  Weighted parsimony and transversion parsimony        95
3.4.4  Long-branch attraction                               98
3.4.5 Assumptions of parsimony                              99
4      Maximum likelihood methods                                100
4.1   Introduction                                                100
4.2   Likelihood calculation on tree                              100
4.2.1  Data, model, tree, and likelihood                   100
4.2.2  The pruning algorithm                                102
4.2.3  Time reversibility, the root of the tree and the molecular clock  106
4.2.4  Missing data and alignment gaps                      107
4.2.5  A numerical example: phylogeny of apes               108
4.3   Likelihood calculation under more complex models             109
4.3.1 Models of variable rates among sites                    110
4.3.2  Models for combined analysis of multiple data sets    116
4.3.3  Nonhomogeneous and nonstationary models               118
4.3.4  Amino acid and codon models                            119
4.4   Reconstruction of ancestral states                            119
4.4.1 Overview                                               119
4.4.2  Empirical and hierarchical Bayes reconstruction      121
*4.4.3 Discrete morphological characters                   124
4.4.4  Systematic biases in ancestral reconstruction         126
*4.5  Numerical algorithms for maximum likelihood estimation     128
4.5.1  Univariate optimization                              129
4.5.2  Multivariate optimization                             131
4.5.3  Optimization on a fixed tree                          134
4.5.4  Multiple local peaks on the likelihood surface for a fixed tree  135
4.5.5  Search for the maximum likelihood tree                136
4.6   Approximations to likelihood                                 137
4.7   Model selection and robustness                               137
4.7.1  LRT, AIC, and BIC                                     137
4.7.2  Model adequacy and robustness                        142
4.8   Exercises                                                    144
5      Bayesian methods                                            145
5.1   The Bayesian paradigm                                       145
5.1.1  Overview                                            145
5.1.2  Bayes's theorem                                       146
5.1.3  Classical versus Bayesian statistics                 151
5.2   Prior                                                        158
5.3   Markov chain Monte Carlo                                    159
5.3.1  Monte Carlo integration                              160
5.3.2  Metropolis-Hastings algorithm                          161
5.3.3  Single-component Metropolis-Hastings algorithm         164
5.3.4  Gibbs sampler                                         166
5.3.5  Metropolis-coupled MCMC (MCMCMC or MC3)                166
5.4   Simple moves and their proposal ratios                       167
5.4.1 Sliding window using the uniform proposal             168
5.4.2  Sliding window using normal proposal                 168
5.4.3  Sliding window using the multivariate normal proposal  169
5.4.4  Proportional shrinking and expanding                  170
5.5   Monitoring Markov chains and processing output               171
5.5.1  Validating and diagnosing MCMC algorithms             171
5.5.2  Potential scale reduction statistic                   1 7
5.5.3  Processing output                                     174
5.6   Bayesian phylogenetics                                       174
5.6.1 Brief history                                          174
5.6.2  General framework                                     175
5.6.3  Summarizing MCMC output                               175
5.6.4  Bayesian versus likelihood                             177
5.6.5  A numerical example: phylogeny of apes                180
5.7   MCMC algorithms under the coalescent model                   181
5.7.1 Overview                                               181
5.7.2  Estimation of θ                                       181
5.8   Exercises                                                    184
6      Comparison of methods and tests on trees                    185
6.1    Statistical performance of tree-reconstruction methods      186
6.1.1  Criteria                                               186
6.1.2  Performance                                           188
6.2   Likelihood                                                   190
6.2.1 Contrast with conventional parameter estimation        190
6.2.2  Consistency                                           19.
6.2.3  Efficiency                                            192
6.2.4  Robustness                                            196
6.3   Parsimony                                                    198
6.3.1 Equivalence with misbehaved likelihood models          198
6.3.2  Equivalence with well-behaved likelihood models      201
6.3.3  Assumptions and justifications                        204
6.4    Testing hypotheses concerning trees                         206
6.4.1  Bootstrap                                            207
6.4.2 Interior branch test                                  210
6.4.3  Kishino-Hasegawa test and modifications              211
6.4.4 Indexes used in parsimony analysis                     213
6.4.5  Example: phylogeny of apes                            214
*6.5  Appendix: Tuffley and Steel's likelihood analysis of
one character                                               215
PART III: ADVANCED TOPICS                       221
7      Molecular clock and estimation of species
divergence times                                            223
7.1   Overview                                                     223
7.2   Tests of the molecular clock                                 225
7.2.1 Relative-rate tests                                   225
7.2.2  Likelihood ratio test                                226
7.2.3  Limitations of the clock tests                        227
7.2.4  Index of dispersion                                  228
7.3   Likelihood estimation of divergence times                    228
7.3.1 Global-clock model                                     228
7.3.2  Local-clock models                                    230
7.3.3  Heuristic rate-smoothing methods                      231
7.3.4  Dating primate divergences                             233
7.3.5  Uncertainties in fossils                              235
7.4   Bayesian estimation of divergence times                     245
7.4.1 General framework                                      245
7.4.2  Calculation of the likelihood                          246
7.4.3  Prior on rates                                         247
7.4.4  Uncertainties in fossils and prior on divergence times  248
7.4.5  Application to primate and mammalian divergences      252
7.5   Perspectives                                                 257
8     Neutral and adaptive protein evolution                       259
8.1   Introduction                                                 259
8.2   The neutral theory and tests of neutrality                   260
8.2.1 The neutral and nearly neutral theory                 260
8.2.2  Tajima's D statistic                                    262
8.2.3  Fu and Li's D and Fay and Wu's H statistics           264
8.2.4  McDonald-Kreitman test and estimation of selective strength  265
8.2.5  Hudson-Kreitman-Aguadé test                           267
8.3   Lineages undergoing adaptive evolution                      268
8.3.1  Heuristic methods                                     268
8.3.2  Likelihood method                                      269
8.4   Amino acid sites undergoing adaptive evolution               271
8.4.1 Three strategies                                         271
8.4.2  Likelihood ratio test of positive selection under random-sites
models                                                273
8.4.3  Identification of sites under positive selection      276
8.4.4  Positive selection in the human major histocompatibility
(MHC) locus                                           276
8.5   Adaptive evolution affecting particular sites and lineages  279
8.5.1  Branch-site test of positive selection                 279
8.5.2  Other similar models                                   281
8.5.3  Adaptive evolution in angiosperm phytochromes          282
8.6   Assumptions, limitations, and comparisons                    284
8.6.1 Limitations of current methods                         284
8.6.2  Comparison between tests of neutrality and tests based on dN
and dS                                                 286
8.7   Adaptively evolving genes                                    286
9     Simulating molecular evolution                              293
9.1   Introduction                                                293
9.2   Random number generator                                     294
9.3   Generation of continuous random variables                   295
9.4   Generation of discrete random variables                     296
9.4.1 Discrete uniform distribution                        296
9.4.2  Binomial distribution                                297
9.4.3  General discrete distribution                       297
9.4.4  Multinomial distribution                            298
9.4.5  The composition method for mixture distributions    298
*9.4.6 The alias method for sampling from a discrete distribution  299
9.5    Simulating molecular evolution                             302
9.5.1 Simulating sequences on a fixed tree                 302
9.5.2  Generating random trees                             305
9.6   Exercises                                                   306
10    Perspectives                                               308
10.1  Theoretical issues in phylogeny reconstruction              308
10.2  Computational issues in analysis of large and heterogeneous data sets  309
10.3  Genome rearrangement data                                   309
10.4  Comparative genomics                                        310



Library of Congress subject headings for this publication: Molecular evolution -- Mathematical models; Molecular evolution -- Data processing