Table of contents for Developments in speech synthesis / Mark Tatham, Katherine Morton.

Bibliographic record and links to related information available from the Library of Congress catalog.

Note: Contents data are machine generated from pre-publication information provided by the publisher. The contents may differ from the printed book, be incomplete, or contain other coding.


CONTENTS
INTRODUCTION
1. How good is synthetic speech?
2. Improvements beyond intelligibility
3. Continuous adaptation
4. Data structure characterisation
5. Shared input properties
6. Intelligibility - some beliefs and some myths
7. Naturalness
8. Variability
9. The introduction of style
10. Expressive content
11. Final introductory remarks
PART I CURRENT WORK
	CHAPTER 1 - HIGH-LEVEL AND LOW-LEVEL SYNTHESIS
		1.1 Differentiating between low-level and high-level synthesis
		1.2 Two types of text
		1.3 The context of high-level synthesis
		1.4 Textual rendering
	CHAPTER 2 - LOW LEVEL SYNTHESISERS - CURRENT STATUS
		2.1 The range of low-level synthesisers available
			2.1.1 Articulatory synthesis
			2.1.2 Formant synthesis
			2.1.3 Concatenative synthesis
				Units for concatenative synthesis
				Representation of speech in the database
				Unit selection systems: the data-driven approach
				Unit joining
				Cost evaluation in unit selection systems
				Prosody and concatenative systems
				Prosody implementation in unit concatenation systems
			2.1.4 Hybrid system approaches to speech synthesis
	CHAPTER 3 - TEXT-TO-SPEECH METHODS
		3.1 Methods		
		3.2 The syntactic parse
	CHAPTER 4 - DIFFERENT LOW-LEVEL SYNTHESISERS - WHAT CAN BE EXPECTED
		4.1 The competing types
		4.2 The theoretical limits
		4.3 Upcoming approaches
	CHAPTER 5 - LOW-LEVEL SYNTHESIS POTENTIAL
		5.1 The input to low-level synthesis
		5.2 Text marking
			5.2.1 Unmarked text
			5.2.2 Marked text - the basics
			5.2.3 Waveforms and segment boundaries
			5.2.4 Marking boundaries on waveforms - the alignment problem
			5.2.5 Labelling the database - segments
			5.2.6 Labelling the database - endpointing and alignment
PART II A NEW DIRECTION FOR SPEECH SYNTHESIS
	CHAPTER 1 - A VIEW OF NATURALNESS
		1.1 The naturalness concept
		1.2 Switchable databases for concatenative synthesis
		1.3 Prosodic modifications
	CHAPTER 2 - PHYSICAL PARAMETERS AND ABSTRACT INFORMATION CHANNELS
		2.1 Limitations in the theory and scope of speech synthesis
		2.2 Intonation contours from the original database
		2.3 Boundaries in intonation
	CHAPTER 3 - VARIABILITY AND SYSTEM INTEGRITY	
		3.1 Accent variation
		3.2 Voicing
		3.3 The Festival system
		3.4 Syllable duration
		3.5 Changes of approach in speech synthesis
	CHAPTER 4 - AUTOMATIC SPEECH RECOGNITION
		4.1 Advantages of the statistical approach
		4.2 Disadvantages of the statistical approach
		4.3 Unit selection synthesis compared with automatic speech recognition
PART III HIGH-LEVEL CONTROL
	CHAPTER 1 - THE NEED FOR HIGH-LEVEL CONTROL
		1.1 What is high-level control?
		1.2 Generalisation in linguistics
			1.2.1 Units in the signal
		1.3 Achievements of a separate high-level control
			1.3.1 The advantages of identifying high-level control
	CHAPTER 2 - THE INPUT TO HIGH-LEVEL CONTROL
		2.1 Segmental linguistic input
		2.2 The underlying linguistics model
		2.3 Prosody
		2.4 Expression
	CHAPTER 3 - PROBLEMS FOR AUTOMATIC TEXT MARKUP
		3.1 The markup and the data
		3.2 Generality on the static plane
		3.3 Variability in the database - or not
		3.4 Multiple databases and perception
		3.5 Selecting within a marked database
PART IV - AREAS FOR IMPROVEMENT
	CHAPTER 1 - FILLING GAPS
		1.1 General prosody
		1.2 Prosody - expression
		1.3 The segmental level - accents and register
		1.4 Improvements to be expected from filling the gaps
	CHAPTER 2 - USING DIFFERENT UNITS
		2.1 Trade-offs between units
		2.2 Linguistically motivated units
		2.3 A-linguistic units
		2.4 Concatenation
		2.5 Improved naturalness using large units
	CHAPTER 3 - WAVEFORM CONCATENATION SYSTEMS - NATURALNESS AND LARGE DATABASES
		3.1 The beginnings of useful automated markup systems
		3.2 How much detail in the markup?
			3.2.1 Prosodic markup and segmental consequences
		3.3 Summary of database markup and content
	CHAPTER 4 - UNIT SELECTION SYSTEMS
		4.1 The supporting theory for synthesis
			4.1.1 Terms
		4.2 The database paradigm and the limits of synthesis
			4.2.1 Variability in the database
		4.3 Types of database
		4.4 Database size and searchability at low-level
			4.4.1 Database size
			4.4.2 Database searchability
PART V MARKUP
	CHAPTER 1 - VoiceXML
		1.1 Introduction
		1.2 VoiceXML and XML
		1.3 VoiceXML - functionality
		1.4 Principal VoiceXML elements
		1.5 Tapping the autonomy of the attached synthesis system
	CHAPTER 2 - SPEECH SYNTHESIS MARKUP LANGUAGE (SSML)
		2.1 Introduction
		2.2 Original W3C design criteria for SSML
		2.3 Processing the SSML document
		2.4 Main SSML elements and their attributes
	CHAPTER 3 - SABLE
	CHAPTER 4 - THE NEED FOR PROSODIC MARKUP
		4.1 What is prosody?
		4.2 Incorporating prosodic markup
		4.3 How markup works
		4.4 Distinguishing layout from content
		4.5 Uses of markup
		4.6 Basic control of prosody
		4.7 Intrinsic and extrinsic structure and salience
		4.8 Automatic markup to enhance orthography - interoperability with the synthesiser
		4.9 Hierarchical application of markup
		4.10 Markup and perception
		4.11 Markup - the way ahead?
		4.12 Mark what and how?
			4.12.1 Automatic annotation of databases for limited domain systems
			4.12.2 Database markup with the minimum of phonology
		4.13 Abstract vs. physical prosody
PART VI STRENGTHENING THE HIGH-LEVEL MODEL
	CHAPTER 1 - SPEECH
		1.1 Speech production
		1.2 Relevance to acoustics
		1.3 Summary
		1.4 Information for synthesis
			1.4.1 Limitation of information
	CHAPTER 2 - BASIC CONCEPTS
		2.1 How does speaking occur?
		2.2 Underlying basic disciplines - contributions from linguistics
			2.2.1 Linguistic information and speech
			2.2.2 Specialist use of the terms phonology and phonetics
			2.2.3 Rendering the plan
				2.2.3.1 Long and short term events in phonetics
			2.2.4 Types of model underlying speech synthesis
				2.2.4.1 The static model
				2.2.4.2 The dynamic model
	CHAPTER 3 - UNDERLYING BASIC DISCIPLINES - EXPRESSION STUDIES
		3.1 Biology and cognitive psychology
		3.2 Modelling biological and cognitive events
		3.3 Basic assumptions in our proposed approach
		3.4 Biological events
		3.5 Cognitive events
		3.6 Indexing expression in XML
		3.7 Summary
	CHAPTER 4 - LABELLING EXPRESSIVE/EMOTIVE CONTENT
		4.1 Data collection
		4.2 Sources of variability
		4.3 Summary
	CHAPTER 5 - THE PROPOSED MODEL
		5.1 Organisation of the model
		5.2 The model has two stages
		5.3 Conditions and restrictions on XML
		5.4 Summary
	CHAPTER 6 - TYPES OF MODEL
		6.1 Category models
		6.2 Process models
PART VII EXPANDED STATIC AND DYNAMIC MODELLING
	CHAPTER 1 - THE UNDERLYING LINGUISTICS SYSTEM
		1.1 Dynamic planes
			1.1.1 Specifying time
		1.2 Computational dynamic phonology for synthesis
		1.3 Computational dynamic phonetics for synthesis
			1.3.1 Adding how, what and notions of time
		1.4 Static planes
		1.5 Computational static phonology for synthesis
			1.5.1 The term process in linguistics
		1.6 Computational static phonetics for synthesis
		1.7 Supervision
			1.7.1 Time constraints
		1.8 Summary of the phonological and phonetic models
	CHAPTER 2 - PLANES FOR SYNTHESIS	
PART VIII THE PROSODIC FRAMEWORK, CODING AND INTONATION
	CHAPTER 1 - THE PHONOLOGICAL PROSODIC FRAMEWORK	
		1.1 Characterising the phonological and phonetic planes
	CHAPTER 2 - SAMPLE CODE
	CHAPTER 3 - XML CODING
		3.1 Adding detail
		3.2 Timing and fundamental frequency control on the dynamic plane
		3.3 The underlying markup
		3.4 Intrinsic durations
		3.5 Rendering intonation as a fundamental frequency contour
	CHAPTER 4 - PROSODY: GENERAL
		4.1 The analysis of prosody
		4.2 The principles of some current models of intonation used in synthesis
			4.2.1 The Hirst and Di Cristo model (including INTSINT)
			4.2.2 Taylor's Tilt model
			4.2.3 The ToBI (tones and break indices) model
			4.2.4 The basis of intonation modelling
			4.2.5 Details of the ToBI model
			4.2.6 The INTSINT (International Transcription System for Intonation) model
			4.2.7 The Tatham and Morton intonation model
				Units in T&M intonation
	CHAPTER 5 - PHONOLOGICAL AND PHONETIC MODELS OF INTONATION
		5.1 Phonological models
		5.2 Phonetic models
		5.3 Naturalness
		5.4 Intonation modelling - levels of representation
PART IX APPROACHES TO NATURAL SOUNDING SYNTHESIS
	CHAPTER 1 - THE GENERAL APPROACH	
		1.1 Parameterisation
		1.2 A proposal for a model to support synthesis
		1.3 Segments and prosodics - hierarchical ordering
			1.3.1 A sample wrapping in XML
		1.4 Prosodic wrapper for XML
			1.4.1 The phonological prosodic framework
	CHAPTER 2 - THE EXPRESSION WRAPPER IN XML
		2.1 Expression wrapping the entire utterance
		2.2 Sourcing for synthesis
		2.3 Attributes vs. elements
			2.3.1 Attribute sources vary
		2.4 Sample cognitive and biological components
			2.4.1 Parameters of expression
			2.4.2 Blends
			2.4.3 Identifying and characterising differences in expression
			2.4.4 A grammar of expressions
	CHAPTER 3 - ADVANTAGES OF XML IN WRAPPING
		3.1 Constraints imposed by the XML descriptive system
		3.2 Variability
	CHAPTER 4 - CONSIDERATIONS IN CHARACTERISING EXPRESSION/EMOTION
		4.1 Suggested characterisation of features of expressive/emotive content
		4.2 Categories
		4.3 Choices in dialogue design
		4.4 Extent of underlying expressive modelling
		4.5 Pragmatics
	CHAPTER 5 - SUMMARY
		5.1 Speaking
		5.2 Mutability
CONCLUDING OVERVIEW
1. Shared characteristics between database and output - the integrity of the synthesised utterance
2. Concept-to-speech
3. Text-to-speech synthesis - the basic overall concept
4. Prosody in text-to-speech systems
5. Optimising the acoustic signal for perception
6. Conclusion
REFERENCES
INDEX

Library of Congress Subject Headings for this publication:

Speech processing systems.