- Vision Transformer (ViT): patch embedding, class token, position embeddings
- ViT variants and hierarchical architectures: DeiT (data-efficient training, distillation token), Swin Transformer (shifted windows, hierarchical), PVT
- Self-supervised visual learning: contrastive (SimCLR, MoCo), self-distillation without negative pairs (BYOL, DINO), masked image modelling (MAE, BEiT)
- Image generation: GANs (generator, discriminator, mode collapse, training tricks, StyleGAN), VAEs
- Diffusion models: forward/reverse process, DDPM, DDIM, score-based models, classifier-free guidance, latent diffusion (Stable Diffusion)
- Flow matching: continuous normalising flows, optimal transport, rectified flows