- Why tokenise images: bridging continuous pixels and discrete language-model vocabularies
- VQ-VAE: vector quantisation, codebook learning, commitment loss
- VQ-GAN: combining VQ-VAE with adversarial training for higher fidelity
- Residual quantisation and multi-scale codebooks
- Image tokenisers: DALL-E tokeniser, LlamaGen, Cosmos tokeniser
- Video tokenisation: temporal compression, 3D VQ-VAE, causal video tokenisers
- Continuous vs discrete tokens: when to quantise and when to project
- Applications: autoregressive image generation, unified vision-language tokens
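
As a concrete anchor for the vector quantisation and commitment loss items above, here is a minimal forward-pass sketch in plain Python. The names `quantise` and `vq_loss_value`, the toy codebook, and the weight `beta=0.25` are illustrative assumptions, not taken from any particular tokeniser. A real VQ-VAE would run this on encoder outputs inside an autodiff framework, with a stop-gradient on one side of each loss term.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantise(
    latents: Sequence[Vector], codebook: Sequence[Vector]
) -> Tuple[List[int], List[Vector]]:
    """Map each latent to the index of its nearest codebook entry.

    The returned indices are the discrete tokens; the returned vectors
    are what a decoder would receive in place of the continuous latents.
    """
    indices = [
        min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        for z in latents
    ]
    quantised = [codebook[i] for i in indices]
    return indices, quantised


def vq_loss_value(
    latents: Sequence[Vector], quantised: Sequence[Vector], beta: float = 0.25
) -> float:
    """Forward value of the codebook term plus beta times the commitment term.

    In training, the codebook term is ||sg[z] - e||^2 and the commitment
    term is ||z - sg[e]||^2. The stop-gradient sg only changes which side
    receives gradients, so both terms have the same forward value and the
    sum reduces to (1 + beta) times the mean squared distance.
    """
    mse = sum(squared_distance(z, q) for z, q in zip(latents, quantised)) / len(latents)
    return (1.0 + beta) * mse


# Toy example: three 2-D latents and a three-entry codebook (made-up values).
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0)]
latents = [(0.1, -0.2), (0.9, 1.2), (-0.8, 0.7)]

tokens, quantised = quantise(latents, codebook)
loss = vq_loss_value(latents, quantised)
print(tokens, round(loss, 4))
```

The nearest-neighbour lookup is not differentiable, which is why VQ-VAE training typically copies decoder gradients straight through to the encoder; that step is omitted here because it only exists under automatic differentiation.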