Bibliographic record and links to related information available from the Library of Congress catalog.
Note: Contents data are machine generated from pre-publication information provided by the publisher. Contents may differ from the printed book, be incomplete, or contain other coding.
CONTENTS

INTRODUCTION
1. How good is synthetic speech?
2. Improvements beyond intelligibility
3. Continuous adaptation
4. Data structure characterisation
5. Shared input properties
6. Intelligibility - some beliefs and some myths
7. Naturalness
8. Variability
9. The introduction of style
10. Expressive content
11. Final introductory remarks

PART I CURRENT WORK

CHAPTER 1 - HIGH-LEVEL AND LOW-LEVEL SYNTHESIS
1.1 Differentiating between low-level and high-level synthesis
1.2 Two types of text
1.3 The context of high-level synthesis
1.4 Textual rendering

CHAPTER 2 - LOW-LEVEL SYNTHESISERS - CURRENT STATUS
2.1 The range of low-level synthesisers available
  2.1.1 Articulatory synthesis
  2.1.2 Formant synthesis
  2.1.3 Concatenative synthesis
    Units for concatenative synthesis
    Representation of speech in the database
    Unit selection systems: the data-driven approach
    Unit joining
    Cost evaluation in unit selection systems
    Prosody and concatenative systems
    Prosody implementation in unit concatenation systems
  2.1.4 Hybrid system approaches to speech synthesis

CHAPTER 3 - TEXT-TO-SPEECH METHODS
3.1 Methods
3.2 The syntactic parse

CHAPTER 4 - DIFFERENT LOW-LEVEL SYNTHESISERS - WHAT CAN BE EXPECTED
4.1 The competing types
4.2 The theoretical limits
4.3 Upcoming approaches

CHAPTER 5 - LOW-LEVEL SYNTHESIS POTENTIAL
5.1 The input to low-level synthesis
5.2 Text marking
  5.2.1 Unmarked text
  5.2.2 Marked text - the basics
  5.2.3 Waveforms and segment boundaries
  5.2.4 Marking boundaries on waveforms - the alignment problem
  5.2.5 Labelling the database - segments
  5.2.6 Labelling the database - endpointing and alignment

PART II A NEW DIRECTION FOR SPEECH SYNTHESIS

CHAPTER 1 - A VIEW OF NATURALNESS
1.1 The naturalness concept
1.2 Switchable databases for concatenative synthesis
1.3 Prosodic modifications

CHAPTER 2 - PHYSICAL PARAMETERS AND ABSTRACT INFORMATION CHANNELS
2.1 Limitations in the theory and scope of speech synthesis
2.2 Intonation contours from the original database
2.3 Boundaries in intonation

CHAPTER 3 - VARIABILITY AND SYSTEM INTEGRITY
3.1 Accent variation
3.2 Voicing
3.3 The Festival system
3.4 Syllable duration
3.5 Changes of approach in speech synthesis

CHAPTER 4 - AUTOMATIC SPEECH RECOGNITION
4.1 Advantages of the statistical approach
4.2 Disadvantages of the statistical approach
4.3 Unit selection synthesis compared with automatic speech recognition

PART III HIGH-LEVEL CONTROL

CHAPTER 1 - THE NEED FOR HIGH-LEVEL CONTROL
1.1 What is high-level control?
1.2 Generalisation in linguistics
  1.2.1 Units in the signal
1.3 Achievements of a separate high-level control
  1.3.1 The advantages of identifying high-level control

CHAPTER 2 - THE INPUT TO HIGH-LEVEL CONTROL
2.1 Segmental linguistic input
2.2 The underlying linguistics model
2.3 Prosody
2.4 Expression

CHAPTER 3 - PROBLEMS FOR AUTOMATIC TEXT MARKUP
3.1 The markup and the data
3.2 Generality on the static plane
3.3 Variability in the database - or not
3.4 Multiple databases and perception
3.5 Selecting within a marked database

PART IV AREAS FOR IMPROVEMENT

CHAPTER 1 - FILLING GAPS
1.1 General prosody
1.2 Prosody - expression
1.3 The segmental level - accents and register
1.4 Improvements to be expected from filling the gaps

CHAPTER 2 - USING DIFFERENT UNITS
2.1 Trade-offs between units
2.2 Linguistically motivated units
2.3 A-linguistic units
2.4 Concatenation
2.5 Improved naturalness using large units

CHAPTER 3 - WAVEFORM CONCATENATION SYSTEMS - NATURALNESS AND LARGE DATABASES
3.1 The beginnings of useful automated markup systems
3.2 How much detail in the markup?
  3.2.1 Prosodic markup and segmental consequences
3.3 Summary of database markup and content

CHAPTER 4 - UNIT SELECTION SYSTEMS
4.1 The supporting theory for synthesis
  4.1.1 Terms
4.2 The database paradigm and the limits of synthesis
  4.2.1 Variability in the database
4.3 Types of database
4.4 Database size and searchability at low-level
  4.4.1 Database size
  4.4.2 Database searchability

PART V MARKUP

CHAPTER 1 - VoiceXML
1.1 Introduction
1.2 VoiceXML and XML
1.3 VoiceXML - functionality
1.4 Principal VoiceXML elements
1.5 Tapping the autonomy of the attached synthesis system

CHAPTER 2 - SPEECH SYNTHESIS MARKUP LANGUAGE (SSML)
2.1 Introduction
2.2 Original W3C design criteria for SSML
2.3 Processing the SSML document
2.4 Main SSML elements and their attributes

CHAPTER 3 - SABLE

CHAPTER 4 - THE NEED FOR PROSODIC MARKUP
4.1 What is prosody?
4.2 Incorporating prosodic markup
4.3 How markup works
4.4 Distinguishing layout from content
4.5 Uses of markup
4.6 Basic control of prosody
4.7 Intrinsic and extrinsic structure and salience
4.8 Automatic markup to enhance orthography - interoperability with the synthesiser
4.9 Hierarchical application of markup
4.10 Markup and perception
4.11 Markup - the way ahead?
4.12 Mark what and how?
  4.12.1 Automatic annotation of databases for limited domain systems
  4.12.2 Database markup with the minimum of phonology
4.13 Abstract vs. physical prosody

PART VI STRENGTHENING THE HIGH-LEVEL MODEL

CHAPTER 1 - SPEECH
1.1 Speech production
1.2 Relevance to acoustics
1.3 Summary
1.4 Information for synthesis
  1.4.1 Limitation of information

CHAPTER 2 - BASIC CONCEPTS
2.1 How does speaking occur?
2.2 Underlying basic disciplines - contributions from linguistics
  2.2.1 Linguistic information and speech
  2.2.2 Specialist use of the terms phonology and phonetics
  2.2.3 Rendering the plan
    2.2.3.1 Long and short term events in phonetics
  2.2.4 Types of model underlying speech synthesis
    2.2.4.1 The static model
    2.2.4.2 The dynamic model

CHAPTER 3 - UNDERLYING BASIC DISCIPLINES - EXPRESSION STUDIES
3.1 Biology and cognitive psychology
3.2 Modelling biological and cognitive events
3.3 Basic assumptions in our proposed approach
3.4 Biological events
3.5 Cognitive events
3.6 Indexing expression in XML
3.7 Summary

CHAPTER 4 - LABELLING EXPRESSIVE/EMOTIVE CONTENT
4.1 Data collection
4.2 Sources of variability
4.3 Summary

CHAPTER 5 - THE PROPOSED MODEL
5.1 Organisation of the model
5.2 The model has two stages
5.3 Conditions and restrictions on XML
5.4 Summary

CHAPTER 6 - TYPES OF MODEL
6.1 Category models
6.2 Process models

PART VII EXPANDED STATIC AND DYNAMIC MODELLING

CHAPTER 1 - THE UNDERLYING LINGUISTICS SYSTEM
1.1 Dynamic planes
  1.1.1 Specifying time
1.2 Computational dynamic phonology for synthesis
1.3 Computational dynamic phonetics for synthesis
  1.3.1 Adding how, what and notions of time
1.4 Static planes
1.5 Computational static phonology for synthesis
  1.5.1 The term process in linguistics
1.6 Computational static phonetics for synthesis
1.7 Supervision
  1.7.1 Time constraints
1.8 Summary of the phonological and phonetic models

CHAPTER 2 - PLANES FOR SYNTHESIS

PART VIII THE PROSODIC FRAMEWORK, CODING AND INTONATION

CHAPTER 1 - THE PHONOLOGICAL PROSODIC FRAMEWORK
1.1 Characterising the phonological and phonetic planes

CHAPTER 2 - SAMPLE CODE

CHAPTER 3 - XML CODING
3.1 Adding detail
3.2 Timing and fundamental frequency control on the dynamic plane
3.3 The underlying markup
3.4 Intrinsic durations
3.5 Rendering intonation as a fundamental frequency contour

CHAPTER 4 - PROSODY: GENERAL
4.1 The analysis of prosody
4.2 The principles of some current models of intonation used in synthesis
  4.2.1 The Hirst and Di Cristo model (including INTSINT)
  4.2.2 Taylor's Tilt model
  4.2.3 The ToBI (tones and break indices) model
  4.2.4 The basis of intonation modelling
  4.2.5 Details of the ToBI model
  4.2.6 The INTSINT (International Transcription System for Intonation) model
  4.2.7 The Tatham and Morton intonation model
    Units in T&M intonation

CHAPTER 5 - PHONOLOGICAL AND PHONETIC MODELS OF INTONATION
5.1 Phonological models
5.2 Phonetic models
5.3 Naturalness
5.4 Intonation modelling - levels of representation

PART IX APPROACHES TO NATURAL SOUNDING SYNTHESIS

CHAPTER 1 - THE GENERAL APPROACH
1.1 Parameterisation
1.2 A proposal for a model to support synthesis
1.3 Segments and prosodics - hierarchical ordering
  1.3.1 A sample wrapping in XML
1.4 Prosodic wrapper for XML
  1.4.1 The phonological prosodic framework

CHAPTER 2 - THE EXPRESSION WRAPPER IN XML
2.1 Expression wrapping the entire utterance
2.2 Sourcing for synthesis
2.3 Attributes vs. elements
  2.3.1 Attribute sources vary
2.4 Sample cognitive and biological components
  2.4.1 Parameters of expression
  2.4.2 Blends
  2.4.3 Identifying and characterising differences in expression
  2.4.4 A grammar of expressions

CHAPTER 3 - ADVANTAGES OF XML IN WRAPPING
3.1 Constraints imposed by the XML descriptive system
3.2 Variability

CHAPTER 4 - CONSIDERATIONS IN CHARACTERISING EXPRESSION/EMOTION
4.1 Suggested characterisation of features of expressive/emotive content
4.2 Categories
4.3 Choices in dialogue design
4.4 Extent of underlying expressive modelling
4.5 Pragmatics

CHAPTER 5 - SUMMARY
5.1 Speaking
5.2 Mutability

CONCLUDING OVERVIEW
1. Shared characteristics between database and output - the integrity of the synthesised utterance
2. Concept-to-speech
3. Text-to-speech synthesis - the basic overall concept
4. Prosody in text-to-speech systems
5. Optimising the acoustic signal for perception
6. Conclusion

REFERENCES
INDEX

Library of Congress Subject Headings for this publication:
Speech processing systems.