- TTS pipeline: text normalisation → phoneme conversion → acoustic model → vocoder
- Vocoders: WaveNet, WaveRNN, WaveGlow, HiFi-GAN, neural source-filter models
- Acoustic models: Tacotron 1/2 (autoregressive, attention-based), FastSpeech 1/2 (non-autoregressive, explicit duration prediction)
- Modern TTS: VITS (end-to-end), VALL-E (codec language model), StyleTTS
- Prosody modelling: pitch, duration, energy, style embeddings
- Voice conversion: speaker embeddings, disentangled representations
- Voice cloning: few-shot and zero-shot approaches
- Voice activity detection (VAD) and acoustic activity detection
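
As a concrete illustration of the last item, here is a minimal sketch of an energy-based VAD, the simplest classical baseline. It is not a method taken from any of the systems listed above: production detectors typically add smoothing, hangover logic, or a learned classifier. The function names (`frame_energies_db`, `energy_vad`, `active_segments`), the -40 dB threshold, and the 25 ms / 10 ms framing at 16 kHz are all assumptions chosen for illustration.

```python
import math


def frame_energies_db(samples, frame_len, hop_len):
    """Return the RMS energy (in dB relative to full scale 1.0) of each frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        # Floor the RMS so that digital silence does not produce log10(0).
        energies.append(20.0 * math.log10(max(rms, 1e-10)))
    return energies


def energy_vad(samples, frame_len=400, hop_len=160, threshold_db=-40.0):
    """Flag a frame as active when its RMS energy exceeds threshold_db.

    The defaults correspond to 25 ms frames with a 10 ms hop at 16 kHz
    (an assumed, commonly used framing).
    """
    energies = frame_energies_db(samples, frame_len, hop_len)
    return [e > threshold_db for e in energies]


def active_segments(flags, frame_len, hop_len, sample_rate):
    """Merge runs of consecutive active frames into (start_sec, end_sec) pairs."""
    segments = []
    start = None
    for i, active in enumerate(flags):
        if active and start is None:
            start = i
        elif not active and start is not None:
            end_sample = (i - 1) * hop_len + frame_len
            segments.append((start * hop_len / sample_rate,
                             end_sample / sample_rate))
            start = None
    if start is not None:
        # The signal ended while still inside an active run.
        end_sample = (len(flags) - 1) * hop_len + frame_len
        segments.append((start * hop_len / sample_rate,
                         end_sample / sample_rate))
    return segments


if __name__ == "__main__":
    sr = 16000
    silence = [0.0] * 1600
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(1600)]
    signal = silence + tone + silence

    flags = energy_vad(signal, frame_len=400, hop_len=400)
    print(active_segments(flags, frame_len=400, hop_len=400, sample_rate=sr))
```

A fixed threshold only behaves well when the noise floor is known in advance. An adaptive threshold, for example one set relative to a running estimate of the background energy, is a common next step.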