Data-Driven Techniques in Speech Synthesis
Data-Driven Techniques in Speech Synthesis gives a first review of this new field. All areas of speech synthesis from text are covered, including text analysis, letter-to-sound conversion, prosodic marking and extraction of parameters to drive synthesis hardware. Fuelled by cheap computer processing...
Other Authors: | |
---|---|
Format: | eBook |
Language: | English |
Published: | New York, NY: Springer US, 2001 |
Edition: | 1st ed. 2001 |
Series: | Telecommunications Technology & Applications Series |
Collection: | Springer Book Archives -2004 - Collection details see MPG.ReNa |
Table of Contents:
- 1 Learning About Speech from Data: Beyond NETtalk
- 1.1 Introduction
- 1.2 Architecture of a TTS System
- 1.3 Automatic Pronunciation Generation
- 1.4 Prosody
- 1.5 The Synthesis Module
- 1.6 Conclusion
- 2 Constructing High-Accuracy Letter-to-Phoneme Rules with Machine Learning
- 2.1 Introduction
- 2.2 The Nettalk Approach
- 2.3 High-Performance ML Approach
- 2.4 Evaluation of Pronunciations
- 2.5 Conclusions
- 3 Analogy, the Corpus and Pronunciation
- 3.1 Introduction
- 3.2 Why Adopt a Psychological Approach?
- 3.3 The Corpus as a Resource
- 3.4 The Sullivan and Damper Model
- 3.5 Parallels with Optimality Theory
- 3.6 Implementation
- 3.7 Corpora
- 3.8 Performance Evaluation
- 3.9 Future Challenges
- 4 A Hierarchical Lexical Representation for Pronunciation Generation
- 4.1 Introduction
- 4.2 Previous Work
- 4.3 Hierarchical Lexical Representation
- 4.4 Generation Algorithm
- 4.5 Evaluation Criteria
- 4.6 Results on Letter-to-Sound Generation
- 4.7 Error Analyses
- 4.8 Evaluating the Hierarchical Representation
- 4.9 Discussions and Future Work
- 5 English Letter-Phoneme Conversion by Stochastic Transducers
- 5.1 Introduction
- 5.2 Modelling Transduction
- 5.3 Stochastic Finite-State Transducers
- 5.4 Inference of Letter-Phoneme Correspondences
- 5.5 Translation
- 5.6 Results
- 5.7 Conclusions
- 6 Selection of Multiphone Synthesis Units and Grapheme-to-Phoneme Transcription using Variable-Length Modeling of Strings
- 6.1 Introduction
- 6.2 Multigram Model
- 6.3 Multiphone Units for Speech Synthesis
- 6.4 Learning Letter-to-Sound Correspondences
- 6.5 General Discussion and Perspectives
- 7 TreeTalk: Memory-Based Word Phonemisation
- 7.1 Introduction
- 7.2 Memory-Based Phonemisation
- 7.3 tribl and TreeTalk
- 7.4 Modularity and Linguistic Representations
- 7.5 Conclusion
- 8 Learnable Phonetic Representations in a Connectionist TTS System — I: Text to Phonetics
- 8.1 Introduction
- 8.2 Problem Background
- 8.3 Data Inputs and Outputs to Module M1
- 8.4 Detailed Architecture of the Text-to-Phonetics Module
- 8.5 Model Selection
- 8.6 Results
- 8.7 Conclusions and Further Work
- 9 Using the Tilt Intonation Model: A Data-Driven Approach
- 9.1 Background
- 9.2 Tilt Intonation Model
- 9.3 Training Tilt Models
- 9.4 Experiments and Results
- 9.5 Conclusion
- 10 Estimation of Parameters for the Klatt Synthesizer from a Speech Database
- 10.1 Introduction
- 10.2 Global Parameter Settings
- 10.3 Synthesis of Vowels, Diphthongs and Glides
- 10.4 Stop Consonants (and Voiceless Vowels)
- 10.5 Estimation of Fricative Parameters
- 10.6 Other Sounds
- 10.7 Application: A Database of English Monosyllables
- 10.8 Conclusion
- 11 Training Accent and Phrasing Assignment on Large Corpora
- 11.1 Introduction
- 11.2 Intonational Model
- 11.3 Classification and Regression Trees
- 11.4 Predicting Pitch Accent Placement
- 11.5 Predicting Phrase Boundary Location
- 11.6 Conclusion
- 12 Learnable Phonetic Representations in a Connectionist TTS System — II: Phonetics to Speech
- 12.1 Introduction
- 12.2 Architecture of Phonetics-to-Speech Module
- 12.3 Training and Alignment
- 12.4 Phonetics-to-Speech Results
- 12.5 Conclusions and Further Work