Table of contents for Innovations in machine learning : theory and applications / Dawn E. Holmes, Lakhmi C. Jain (eds.).




1    A Bayesian Approach to Causal Discovery ................................... 1
1.1  Introduction .............................................................. 1
1.2  The Bayesian Approach ..................................................... 2
1.3  Model Selection and Search ................................................ 6
1.4  Priors .................................................................... 7
1.5  Example .................................................................. 10
1.6  Methods for Incomplete Data and Hidden Variables ......................... 13
1.6.1  Monte-Carlo Method ..................................................... 14
1.7  A Case Study ............................................................. 20
1.8  Open Issues .............................................................. 23
Acknowledgments ............................................................... 25
References .................................................................... 25
2    A Tutorial on Learning Causal Influence .................................. 29
2.1  Introduction ............................................................. 29
2.1.1  Causation .............................................................. 30
2.1.2  Causal networks ........................................................ 33
2.2  Learning Causal Influences ............................................... 38
2.2.1  Making the Causal Faithfulness Assumption .............................. 38
2.2.2  Assuming Only Causal Embedded Faithfulness ............................. 42
2.2.3  Assuming Causal Embedded Faithfulness with Selection Bias .............. 53
2.3  Learning Causation From Data on Two Variables ............................ 56
2.3.1  Preliminary Concepts ................................................... 56
2.3.2  Application to Causal Learning ......................................... 62
2.3.3  Application to Quantum Mechanics ....................................... 64
References .................................................................... 69
3    Learning Based Programming ............................................... 73
3.1  Introduction ............................................................. 74
3.2  Learning Based Programming ............................................... 76
3.3  The LBP Programming Model ................................................ 77
3.3.1  Knowledge Representations for LBP ...................................... 79
3.3.2  Interaction ............................................................ 88
3.3.3  Learning Operators in LBP .............................................. 89
3.3.4  Inference .............................................................. 91
3.3.5  Compilation ............................................................ 92
3.4  Related Work ............................................................. 92
3.5  Discussion ............................................................... 93
Acknowledgments ............................................................... 94
References .................................................................... 94
4    N-1 Experiments Suffice to Determine the Causal Relations
     Among N Variables ........................................................ 97
4.1  Introduction ............................................................. 97
4.2  The Idea ................................................................ 102
4.3  Discussion .............................................................. 106
Acknowledgements ............................................................. 107
Appendix: Proofs ............................................................. 107
References ................................................................... 112
5    Support Vector Inductive Logic Programming .............................. 113
5.1  Introduction ............................................................ 113
5.2  Background .............................................................. 116
5.2.1  Kernels and Support Vector Machines ................................... 116
5.2.2  Inductive Logic Programming ........................................... 117
5.3  Support Vector Inductive Logic Programming (SVILP) ...................... 119
5.3.1  Family example ........................................................ 120
5.3.2  Definition of kernel .................................................. 121
5.4  Related Work ............................................................ 122
5.4.1  Propositionalisation .................................................. 125
5.4.2  Kernel within ILP ..................................................... 126
5.5  Implementation .......................................................... 127
5.6  Experiments ............................................................. 127
5.6.1  Materials ............................................................. 127
5.6.2  Methods ............................................................... 128
5.7  Conclusions and Further Works ........................................... 131
Acknowledgements ............................................................. 132
References ................................................................... 132
6    Neural Probabilistic Language Models .................................... 137
6.1  Introduction ............................................................ 138
6.1.1  Fighting the Curse of Dimensionality
       with Distributed Representations ...................................... 140
6.1.2  Relation to Previous Work ............................................. 141
6.2  A Neural Model .......................................................... 143
6.3  First Experimental Results .............................................. 147
6.3.1  Comparative Results ................................................... 148
6.4  Architectural Extension: Energy Minimization Network .................... 150
6.5  Speeding-up Training by Importance Sampling ............................. 151
6.5.1  Approximation of the Log-Likelihood Gradient
       by Biased Importance Sampling ......................................... 152
6.5.2  Experimental Results .................................................. 157
6.6  Speeding-up Probability Computation by Hierarchical Decomposition ....... 159
6.6.1  Hierarchical Decomposition Can Provide Exponential Speed-up ........... 160
6.6.2  Sharing Parameters Across the Hierarchy ............................... 162
6.6.3  Using WordNet to Build the Hierarchical Decomposition ................. 163
6.6.4  Comparative Results ................................................... 164
6.7  Short Lists and Speech Recognition Applications ......................... 166
6.7.1  Fast Recognition ...................................................... 168
6.7.2  Fast Training ......................................................... 171
6.7.3  Regrouping of training examples ....................................... 173
6.7.4  Baseline Speech Recognizer ............................................ 174
6.7.5  Language model training ............................................... 176
6.7.6  Experimental results .................................................. 177
6.7.7  Interpolation and ensembles ........................................... 180
6.7.8  Evaluation on other data .............................................. 181
6.8  Conclusions and Future Work ............................................. 181
Acknowledgments .............................................................. 181
References ................................................................... 181
7    Computational Grammatical Inference ..................................... 187
7.1  Introduction ............................................................ 188
7.2  Linguistic Grammatical Inference ........................................ 190
7.3  Empirical Grammatical Inference ......................................... 192
7.4  Formal Grammatical Inference ............................................ 192
7.5  Overlap between the fields .............................................. 197
7.6  Conclusion .............................................................. 200
References ................................................................... 200
8    On Kernel Target Alignment .............................................. 205
8.1  Introduction ............................................................ 205
8.2  Similarity and Alignment ................................................ 208
8.2.1  Kernels ............................................................... 208
8.2.2  Learning and Similarities ............................................. 209
8.2.3  Definition of the Alignment ........................................... 211
8.3  Properties of the Alignment ............................................. 213
8.3.1  Building kernels with large alignment ................................. 213
8.3.2  Statistical Properties of the Alignment ............................... 215
8.3.3  Alignment and Generalization .......................................... 219
8.4  Algorithms for Alignment Optimization ................................... 221
8.4.1  Adapting the Alignment ................................................ 222
8.4.2  Kernel Selection and Combination ...................................... 228
8.4.3  Optimization over Combination of Kernels .............................. 231
8.4.5  Non-margin based Gram-Schmidt Optimization ............................ 234
8.4.6  Experiments ........................................................... 235
8.5  Clustering by Maximal Alignment ......................................... 237
8.5.1  Clustering by Minimizing the Cut-Cost ................................. 243
8.6  Conclusions ............................................................. 247
References ................................................................... 248
Appendix A: Proofs for the concentration of the Alignment .................... 250
9    The Structure of Version Space .......................................... 257
9.1  Introduction ............................................................ 257
9.2  Generalisation Error Bounds for Consistent Classifiers .................. 259
9.3  Consequences of the Egalitarian Bound ................................... 261
9.3.1  Linear Classifiers .................................................... 261
9.3.2  From Margin To Sparsity - A Revival of the Perceptron ................. 262
9.3.3  Bayes Classification Strategy ......................................... 263
9.3.4  Have we Thrown the Baby out with the Bath Water? ...................... 264
9.4  Experimental Results for Linear Classifiers ............................. 264
9.4.1  The Kernel Gibbs Sampler .............................................. 266
9.4.2  Distribution of Generalisation Errors and Margins ..................... 269
9.5  Conclusion .............................................................. 269
Acknowledgements ............................................................. 271
References ................................................................... 271
Index ........................................................................ 275



Library of Congress subject headings for this publication: Machine learning, Artificial intelligence