Lecture: Content-Based Audio Retrieval

After working through the material of this lecture, you should be able to answer the following questions:

What is meant by "content-based retrieval"? What is meant by "query-by-example retrieval"?
What is the difference between audio identification, audio matching, and version identification? How are these tasks arranged in the specificity–granularity plane? (See Fig. 7.22.)
What are the general requirements for an audio identification system?
What is the main idea of the Shazam fingerprinting system? What are the fingerprints used in the system? To what extent are they suited to meet the general requirements?
What does the term "constellation map" refer to?
How can the matching of constellation maps be accelerated?
What is the basic idea of the peak pairing strategy? (See Fig. 7.7.)
What speed-up does the peak pairing strategy achieve compared to the original procedure? (See Eq. 7.15.)
What is the main idea of audio matching? What is the role of the matching function?
What is the difference between dynamic time warping (DTW) and subsequence DTW? (See Fig. 7.23.)
What is the main idea of version identification?
What is the difference between the identification procedure (common subsequence matching) and subsequence DTW? (See Fig. 7.23.)

Reading Assignments

Chapter 7, Müller, FMP, Springer 2015
- Introduction to Chapter 7
Section 7.1: Audio Identification
- Section 7.1.1: General Requirements
- Section 7.1.2: Audio Fingerprints Based on Spectral Peaks
- Section 7.1.3: Indexing, Retrieval, Inverted Lists
- Section 7.1.4: Index-Based Audio Identification
Section 7.2: Audio Matching
- Section 7.2.3: DTW-Based Matching (only main idea)
Section 7.3: Version Identification
- Section 7.3.2: Identification Procedure (only main idea)
Section 7.4.4: Alignment Scenarios

Here is a selection of videos related to content-based audio retrieval.

How Shazam Works (10:24)
Database match; timbre; fundamental frequency; overtone; spectrogram; fingerprint; stand-out frequencies; match; hash function; equal distribution; collision avoidance; calculation time; anchor point
Tech Talk: What's that Sound? An Overview of Shazam's Audio Search Algorithm (11:01)
Guiding principles; fingerprinting; combinatorial hashing; searching
Happy Birthday in the Styles of 10 Classical Composers (18:26)
Bach (0:00); Beethoven (1:17); Schumann (3:34); Chopin (4:51); Liszt (6:34); Debussy (9:18); Satie (12:10); Rachmaninoff (13:30); Cage (14:29); Reich (16:11)
7 Happy Songs in Horror Versions (4:44)
Twinkle, twinkle little star (0:13); Happy birthday to you (0:37); Jesus bleibt meine Freude (1:23); Jingle bells (1:47); Bach's Toccata and Fugue in D minor (3:09); Ode an die Freude (3:35); Frère Jacques (3:56); Bach's Toccata and Fugue in D minor (4:17)