Data-Driven Techniques in Speech Synthesis

Data-Driven Techniques in Speech Synthesis gives a first review of this new field. All areas of speech synthesis from text are covered, including text analysis, letter-to-sound conversion, prosodic marking and extraction of parameters to drive synthesis hardware. Fuelled by cheap computer processing...


Bibliographic Details
Other Authors: Damper, R.I. (Editor)
Format: eBook
Language: English
Published: New York, NY: Springer US, 2001
Edition: 1st ed. 2001
Series: Telecommunications Technology & Applications Series
Collection: Springer Book Archives -2004
Table of Contents:
  • 1 Learning About Speech from Data: Beyond NETtalk
  • 1.1 Introduction
  • 1.2 Architecture of a TTS System
  • 1.3 Automatic Pronunciation Generation
  • 1.4 Prosody
  • 1.5 The Synthesis Module
  • 1.6 Conclusion
  • 2 Constructing High-Accuracy Letter-to-Phoneme Rules with Machine Learning
  • 2.1 Introduction
  • 2.2 The Nettalk Approach
  • 2.3 High-Performance ML Approach
  • 2.4 Evaluation of Pronunciations
  • 2.5 Conclusions
  • 3 Analogy, the Corpus and Pronunciation
  • 3.1 Introduction
  • 3.2 Why Adopt a Psychological Approach?
  • 3.3 The Corpus as a Resource
  • 3.4 The Sullivan and Damper Model
  • 3.5 Parallels with Optimality Theory
  • 3.6 Implementation
  • 3.7 Corpora
  • 3.8 Performance Evaluation
  • 3.9 Future Challenges
  • 4 A Hierarchical Lexical Representation for Pronunciation Generation
  • 4.1 Introduction
  • 4.2 Previous Work
  • 4.3 Hierarchical Lexical Representation
  • 4.4 Generation Algorithm
  • 4.5 Evaluation Criteria
  • 4.6 Results on Letter-to-Sound Generation
  • 4.7 Error Analyses
  • 4.8 Evaluating the Hierarchical Representation
  • 4.9 Discussions and Future Work
  • 5 English Letter-Phoneme Conversion by Stochastic Transducers
  • 5.1 Introduction
  • 5.2 Modelling Transduction
  • 5.3 Stochastic Finite-State Transducers
  • 5.4 Inference of Letter-Phoneme Correspondences
  • 5.5 Translation
  • 5.6 Results
  • 5.7 Conclusions
  • 6 Selection of Multiphone Synthesis Units and Grapheme-to-Phoneme Transcription using Variable-Length Modeling of Strings
  • 6.1 Introduction
  • 6.2 Multigram Model
  • 6.3 Multiphone Units for Speech Synthesis
  • 6.4 Learning Letter-to-Sound Correspondences
  • 6.5 General Discussion and Perspectives
  • 7 TreeTalk: Memory-Based Word Phonemisation
  • 7.1 Introduction
  • 7.2 Memory-Based Phonemisation
  • 7.3 tribl and TreeTalk
  • 7.4 Modularity and Linguistic Representations
  • 7.5 Conclusion
  • 8 Learnable Phonetic Representations in a Connectionist TTS System — I: Text to Phonetics
  • 8.1 Introduction
  • 8.2 Problem Background
  • 8.3 Data Inputs and Outputs to Module M1
  • 8.4 Detailed Architecture of the Text-to-Phonetics Module
  • 8.5 Model Selection
  • 8.6 Results
  • 8.7 Conclusions and Further Work
  • 9 Using the Tilt Intonation Model: A Data-Driven Approach
  • 9.1 Background
  • 9.2 Tilt Intonation Model
  • 9.3 Training Tilt Models
  • 9.4 Experiments and Results
  • 9.5 Conclusion
  • 10 Estimation of Parameters for the Klatt Synthesizer from a Speech Database
  • 10.1 Introduction
  • 10.2 Global Parameter Settings
  • 10.3 Synthesis of Vowels, Diphthongs and Glides
  • 10.4 Stop Consonants (and Voiceless Vowels)
  • 10.5 Estimation of Fricative Parameters
  • 10.6 Other Sounds
  • 10.7 Application: A Database of English Monosyllables
  • 10.8 Conclusion
  • 11 Training Accent and Phrasing Assignment on Large Corpora
  • 11.1 Introduction
  • 11.2 Intonational Model
  • 11.3 Classification and Regression Trees
  • 11.4 Predicting Pitch Accent Placement
  • 11.5 Predicting Phrase Boundary Location
  • 11.6 Conclusion
  • 12 Learnable Phonetic Representations in a Connectionist TTS System — II: Phonetics to Speech
  • 12.1 Introduction
  • 12.2 Architecture of Phonetics-to-Speech Module
  • 12.3 Training and Alignment
  • 12.4 Phonetics-to-Speech Results
  • 12.5 Conclusions and Further Work