Speaker and Audio Analysis

  • Speaker recognition: verification (1:1, is this the claimed speaker?) vs identification (1:N, which enrolled speaker is this?)
  • Speaker embeddings: i-vectors, d-vectors, x-vectors (TDNN-based), ECAPA-TDNN
  • Speaker diarisation ("who spoke when"): clustering-based pipelines, end-to-end neural diarisation (EEND)
  • Audio classification: environmental sounds, music genre, audio tagging
  • Audio event detection: Sound Event Detection (SED), AudioSet
  • Acoustic scene classification
  • Audio embeddings: VGGish, PANNs, Audio Spectrogram Transformer (AST)
  • Music information retrieval: beat tracking, chord recognition, source separation basics
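
To make the verification vs identification distinction in the first topic concrete, here is a minimal sketch of cosine scoring over fixed-length speaker embeddings. It assumes embeddings have already been extracted by some model (for example an x-vector or ECAPA-TDNN system); the vectors below are synthetic stand-ins, and the names `verify`, `identify`, the 192-dimensional size, and the 0.5 threshold are illustrative assumptions rather than values from any particular toolkit. In practice the threshold would be tuned on held-out trials.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def verify(test_emb, enrolled_emb, threshold=0.5):
    """Verification (1:1): accept or reject a claimed identity.

    The threshold is a hypothetical value chosen for this toy example.
    """
    return cosine_similarity(test_emb, enrolled_emb) >= threshold


def identify(test_emb, enrolled):
    """Closed-set identification (1:N): pick the best-scoring enrolled speaker."""
    scores = {name: cosine_similarity(test_emb, emb)
              for name, emb in enrolled.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Synthetic stand-ins for real speaker embeddings (assumption: 192 dimensions).
rng = np.random.default_rng(0)
dim = 192
enrolled = {
    "alice": rng.normal(size=dim),
    "bob": rng.normal(size=dim),
}
# A "new utterance" from alice: her enrolled embedding plus some noise.
test_emb = enrolled["alice"] + 0.3 * rng.normal(size=dim)

best_speaker, all_scores = identify(test_emb, enrolled)
accepted_as_alice = verify(test_emb, enrolled["alice"])
accepted_as_bob = verify(test_emb, enrolled["bob"])
```

Verification compares the test embedding against a single claimed speaker and returns a yes/no decision, while identification scores it against every enrolled speaker and returns the best match. The same scoring function serves both; only the decision rule differs.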