- ASR pipeline: audio → features → acoustic model → decoder → text
- Traditional ASR: GMM-HMM, WFST decoding
- End-to-end ASR: CTC loss and decoding, RNN-Transducer (RNN-T)
- Attention-based ASR: Listen, Attend and Spell (LAS)
- Modern architectures: Whisper (encoder-decoder Transformer trained with large-scale weak supervision), Conformer (convolution + self-attention), wav2vec 2.0 (self-supervised pretraining)
- Language model integration: shallow fusion, deep fusion, rescoring
- Streaming vs offline ASR: chunked attention, lookahead, latency constraints
- Evaluation: word error rate (WER), character error rate (CER)
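The front end of the pipeline typically slices audio into short overlapping frames and then maps each frame's power spectrum onto a mel-spaced filterbank. The sketch below covers only the framing and the mel-scale filter spacing. It assumes 16 kHz audio, a 25 ms window, a 10 ms hop, and 80 filters, which are common choices but are not values taken from these notes. To stay dependency-free, it omits the FFT, the triangular weighting, and the log compression.

```python
import math


def frame_signal(samples, sample_rate=16000, win_ms=25.0, hop_ms=10.0):
    """Split a 1-D signal into overlapping analysis frames.

    Trailing samples that do not fill a whole window are dropped (no padding).
    """
    win = int(sample_rate * win_ms / 1000)  # 400 samples for 25 ms at 16 kHz
    hop = int(sample_rate * hop_ms / 1000)  # 160 samples for 10 ms at 16 kHz
    if len(samples) < win:
        return []
    n_frames = 1 + (len(samples) - win) // hop
    return [samples[i * hop:i * hop + win] for i in range(n_frames)]


def hz_to_mel(f_hz):
    """HTK-style mel scale."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_freqs(n_mels=80, f_min=0.0, f_max=8000.0):
    """Center frequencies (Hz) of n_mels triangular filters spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + step * (k + 1)) for k in range(n_mels)]


frames = frame_signal([0.0] * 16000)  # one second of silence at 16 kHz
centers = mel_center_freqs()
print(len(frames), len(frames[0]))  # 98 400
```

Because the filters are evenly spaced in mel rather than in Hz, they pack densely at low frequencies and spread out toward `f_max`.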
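In a GMM-HMM system, the GMMs supply per-frame emission scores, and decoding searches for the best state sequence. A WFST decoder runs this search over a composed decoding graph. The sketch below shows only the underlying Viterbi recursion on a plain HMM, in log space. The two-state left-to-right model and its probabilities are made up for illustration.

```python
import math

NEG_INF = float("-inf")


def viterbi(log_emit, log_trans, log_init):
    """Most likely HMM state sequence and its log score.

    log_emit[t][j]:  log p(x_t | state j), e.g. from a per-state GMM
    log_trans[i][j]: log p(state j at t+1 | state i at t)
    log_init[j]:     log p(state j at t = 0)
    """
    T, S = len(log_emit), len(log_init)
    delta = [log_init[j] + log_emit[0][j] for j in range(S)]
    backptrs = []
    for t in range(1, T):
        new_delta, bp = [], []
        for j in range(S):
            # best predecessor state for landing in state j at frame t
            best_i = max(range(S), key=lambda i: delta[i] + log_trans[i][j])
            new_delta.append(delta[best_i] + log_trans[best_i][j] + log_emit[t][j])
            bp.append(best_i)
        delta = new_delta
        backptrs.append(bp)
    # trace back from the best final state
    state = max(range(S), key=lambda j: delta[j])
    best_score = delta[state]
    path = [state]
    for bp in reversed(backptrs):
        state = bp[state]
        path.append(state)
    path.reverse()
    return path, best_score


# Hypothetical 2-state left-to-right HMM: state 0 may loop or advance, state 1 only loops.
lg = math.log
log_init = [0.0, NEG_INF]
log_trans = [[lg(0.5), lg(0.5)], [NEG_INF, 0.0]]
log_emit = [[lg(0.9), lg(0.1)], [lg(0.9), lg(0.1)], [lg(0.1), lg(0.9)], [lg(0.1), lg(0.9)]]
path, score = viterbi(log_emit, log_trans, log_init)
print(path)  # [0, 0, 1, 1]
```

Working in log space turns products of small probabilities into sums, which avoids underflow on long utterances.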
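CTC lets the model emit either a blank or a label at every frame, and it defines the output by collapsing the frame-level path. The sketch assumes label id 0 is the blank, which is a common convention rather than something CTC itself fixes. Greedy best-path decoding is the simplest CTC decoder. Beam search, optionally combined with an LM, can do better.

```python
def ctc_collapse(labels, blank=0):
    """CTC many-to-one mapping: merge consecutive repeats, then remove blanks."""
    out, prev = [], None
    for lab in labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out


def ctc_greedy_decode(frame_scores, blank=0):
    """Best-path decoding: take the argmax label at every frame, then collapse."""
    best_path = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]
    return ctc_collapse(best_path, blank)


print(ctc_collapse([0, 1, 1, 0, 1, 2, 2, 0]))  # [1, 1, 2]
```

The blank between the two runs of label 1 is what allows a genuinely repeated output label. Without it, the repeats would merge into one.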
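CTC training maximizes the total probability of all frame-level paths that collapse to the reference. This total is computed with a forward recursion over the blank-interleaved target. Below is a minimal pure-Python sketch of that forward pass. It again assumes blank id 0 and per-frame log-softmax inputs. Real toolkits vectorize this computation and also run the backward pass to obtain gradients.

```python
import math

NEG_INF = float("-inf")


def logsumexp(*xs):
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_neg_log_likelihood(log_probs, target, blank=0):
    """-log p(target | x) via the CTC forward algorithm.

    log_probs[t][k]: per-frame log-probabilities; target: label ids without blanks.
    """
    ext = [blank]
    for y in target:
        ext += [y, blank]  # blank-interleaved target, length 2U + 1
    S, T = len(ext), len(log_probs)
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]  # start with a blank...
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]  # ...or with the first label
    for t in range(1, T):
        new = [NEG_INF] * S
        for s in range(S):
            terms = [alpha[s]]  # stay on the same symbol
            if s >= 1:
                terms.append(alpha[s - 1])  # advance by one position
            # skipping the blank in between is only allowed between two different labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])
            new[s] = logsumexp(*terms) + log_probs[t][ext[s]]
        alpha = new
    # valid paths end on the last label or on the trailing blank
    total = alpha[0] if S == 1 else logsumexp(alpha[S - 1], alpha[S - 2])
    return -total
```

For example, with two frames and target `[1]`, the valid paths are `1 1`, `blank 1`, and `1 blank`, and the recursion sums exactly those three.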
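The RNN-Transducer drops CTC's assumption that frames are conditionally independent by adding a prediction network over previous labels. Its joint network then produces a distribution at every lattice node (frame t, labels emitted so far u). Emitting blank advances t, and emitting the next label advances u. The forward recursion below assumes the joint outputs are already available as a T × (U+1) × V table of log-probabilities with blank id 0. It is an illustrative sketch, not a toolkit API.

```python
import math

NEG_INF = float("-inf")


def _lse(a, b):
    """log(exp(a) + exp(b)), safe when either argument is -inf."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def rnnt_neg_log_likelihood(log_probs, target, blank=0):
    """-log p(target | x) for an RNN-T lattice.

    log_probs[t][u][k]: joint-network log-probability of symbol k at frame t
    after u target labels have been emitted (shape T x (U+1) x V).
    """
    T, U = len(log_probs), len(target)
    alpha = [[NEG_INF] * (U + 1) for _ in range(T)]
    alpha[0][0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            # arrive by emitting blank at (t-1, u): moves to the next frame
            from_blank = alpha[t - 1][u] + log_probs[t - 1][u][blank] if t > 0 else NEG_INF
            # arrive by emitting label u at (t, u-1): stays on the same frame
            from_label = alpha[t][u - 1] + log_probs[t][u - 1][target[u - 1]] if u > 0 else NEG_INF
            alpha[t][u] = _lse(from_blank, from_label)
    # every complete alignment ends with a final blank at the last frame
    return -(alpha[T - 1][U] + log_probs[T - 1][U][blank])
```

As a sanity check under uniform outputs over V = 2 symbols, each alignment has probability 0.5^(T+U). There are C(T+U-1, U) alignments, so for T = 2 and U = 1 the total is 2 × 0.125 = 0.25.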
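LAS pairs a pyramidal BLSTM encoder (the listener) with an attention-based decoder (the speller). At each output step, the speller scores every encoder frame against its current state and summarizes the audio as a weighted average. The sketch below uses plain dot-product scoring. LAS itself passes the decoder state and the encoder features through small MLPs before taking the dot product, and that step is omitted here.

```python
import math


def attend(query, keys, values):
    """Dot-product attention over encoder frames.

    query:  decoder state, length d
    keys:   one length-d vector per encoder frame
    values: one vector per encoder frame (summarized into the context)
    Returns (attention weights, context vector).
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(len(values[0]))]
    return weights, context


keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[2.0, 0.0], [0.0, 4.0]]
print(attend([0.0, 0.0], keys, values))  # ([0.5, 0.5], [1.0, 2.0])
```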
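wav2vec 2.0 masks spans of latent speech features and trains the Transformer to pick out the true quantized latent for each masked step. The alternatives it must reject are distractors sampled from other masked steps, and candidates are compared by cosine similarity scaled by a temperature. The sketch computes that contrastive term for a single time step. The vectors and temperature are arbitrary, and the codebook diversity penalty from the full objective is left out.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def contrastive_loss(context, target, distractors, temperature=0.1):
    """-log of the softmax probability assigned to the true quantized target
    among [target] + distractors, using cosine similarity / temperature as logits."""
    logits = [cosine(context, c) / temperature for c in [target] + distractors]
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_norm - logits[0]


print(contrastive_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]], temperature=1.0))  # log(1 + e^-1), about 0.313
```

A lower temperature sharpens the softmax, so small differences in similarity are penalized more strongly.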
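Shallow fusion adds a weighted external-LM log-probability to the ASR score during beam search. N-best rescoring applies the same log-linear combination after decoding. Deep fusion instead merges LM and decoder hidden states inside the network and is not shown here. The sketch shows rescoring with a hypothetical LM weight, an optional per-word length bonus, and made-up scores for two hypotheses.

```python
def fused_score(asr_logp, lm_logp, lm_weight, length_bonus, num_words):
    """Log-linear combination used by shallow fusion and n-best rescoring."""
    return asr_logp + lm_weight * lm_logp + length_bonus * num_words


def rescore_nbest(nbest, lm_weight=0.5, length_bonus=0.0):
    """Re-rank (text, asr_logp, lm_logp) hypotheses, returning them best-first."""
    return sorted(
        nbest,
        key=lambda h: fused_score(h[1], h[2], lm_weight, length_bonus, len(h[0].split())),
        reverse=True,
    )


# Made-up scores: the acoustic model slightly prefers the first hypothesis,
# the (hypothetical) LM strongly prefers the second.
nbest = [("wreck a nice beach", -4.0, -12.0), ("recognize speech", -4.5, -6.0)]
print(rescore_nbest(nbest, lm_weight=0.5)[0][0])  # recognize speech
```

The length bonus exists because adding LM log-probabilities penalizes every extra word. Without it, fused scores tend to favor short hypotheses.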
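Chunked attention is one way to make a self-attention encoder streamable. Frames are grouped into fixed-size chunks, and each frame attends within its own chunk and to earlier chunks, but never to later ones. The boolean mask below sketches that scheme. `chunk_size` and the optional `left_chunks` limit are hypothetical parameters, not a specific toolkit's API.

```python
def chunked_attention_mask(num_frames, chunk_size, left_chunks=None):
    """mask[i][j] is True if frame i may attend to frame j.

    Each frame sees every frame in its own chunk (up to chunk_size - 1 frames of
    lookahead) plus all previous chunks, or only the last `left_chunks` if given.
    """
    mask = []
    for i in range(num_frames):
        c = i // chunk_size
        first_chunk = 0 if left_chunks is None else max(0, c - left_chunks)
        lo = first_chunk * chunk_size
        hi = min(num_frames, (c + 1) * chunk_size)
        mask.append([lo <= j < hi for j in range(num_frames)])
    return mask


for row in chunked_attention_mask(6, 2):
    print("".join("x" if ok else "." for ok in row))
```

The first frame of a chunk effectively waits for `chunk_size - 1` future frames, so the chunk size sets the lookahead, and with it part of the latency. For example, at a hypothetical 40 ms per encoder frame, a 4-frame chunk implies up to 120 ms of lookahead before compute time. Limiting `left_chunks` bounds the per-frame attention cost in long streams.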
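WER is the word-level Levenshtein distance between hypothesis and reference (substitutions + deletions + insertions), divided by the number of reference words. CER is the same computation over characters. The sketch assumes whitespace tokenization, a non-empty reference, and no text normalization (casing, punctuation), which real evaluations usually specify. Because insertions count as errors, WER can exceed 100%.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def wer(ref, hyp):
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def cer(ref, hyp):
    # spaces are counted as characters here; conventions vary
    return edit_distance(list(ref), list(hyp)) / len(ref)


print(wer("the cat sat on the mat", "the cat sit on mat"))  # 0.3333333333333333
```

Here the two errors are one substitution ("sat" → "sit") and one deletion ("the"), divided by six reference words.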