AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing

Presentation Transcript


  1. AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing Michael A. Casey Digital Musics Dartmouth College, Hanover, NH ASA 156: Statistical Approaches for Analysis of Music and Speech Audio Signals

  2. Scalable Similarity • 8M tracks in commercial collection • PByte of multimedia data • Require passage-level retrieval (~ 2 bars) • Require scalable nearest-neighbor methods

  3. Specificity • Partial track retrieval • Alternate versions: remix, cover, live, album • Task is of mid-to-high specificity

  4. Example: remixing • Original Track • Remix 1 • Remix 2 • Remix 3

  5. Audio Shingles • Shingles provide contextual information about features • Originally used for Internet search engines: • Andrei Z. Broder, Steven C. Glassman, Mark S. Manasse, Geoffrey Zweig: “Syntactic Clustering of the Web”, Computer Networks 29(8-13): 1157-1166 (1997) • Related to N-grams: overlapping sequences of features • Applied to the audio domain by Casey and Slaney: • Casey, M., Slaney, M.: “The Importance of Sequences in Musical Similarity”, in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP 2006), 2006 • A shingle is defined as the concatenation of l frames of m-dimensional features: $\mathbf{x}_t = [\mathbf{f}_t, \mathbf{f}_{t+1}, \ldots, \mathbf{f}_{t+l-1}] \in \mathbb{R}^{lm}$
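The following minimal Python sketch illustrates the shingle definition. It turns a track's m-dimensional feature frames into unit-norm shingles of dimension M = l·m. It assumes a hop of one frame between consecutive shingles, which the transcript does not state. The function name `make_shingles` is hypothetical and is not part of audioDB's interface.

```python
import math
from typing import List, Sequence


def make_shingles(frames: Sequence[Sequence[float]], l: int) -> List[List[float]]:
    """Turn a track's feature frames into unit-length shingles.

    Each shingle concatenates l consecutive m-dimensional frames (assumed hop
    of one frame), giving an M = l * m dimensional vector.
    """
    shingles = []
    for t in range(len(frames) - l + 1):
        # Stack frames t .. t+l-1 end to end.
        x = [v for frame in frames[t:t + l] for v in frame]
        norm = math.sqrt(sum(v * v for v in x))
        # Scale to unit length; an all-zero shingle (e.g. silence) is left as is.
        shingles.append([v / norm for v in x] if norm > 0 else x)
    return shingles


frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy track: 3 frames, m = 2
shingles = make_shingles(frames, l=2)          # 2 shingles, each with M = 4
```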

  6. Audio Shingle Similarity

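  7. Audio Shingle Similarity For shingles with M dimensions (M = l·m; m = 12 or 20; l = 30 or 40), let $\mathbf{q}$ be a query shingle drawn from a query track Q, and let $\mathbf{x}^{(n)}$ be a database shingle from track n, where the database of audio tracks is indexed by n. Shingles are normalized to unit vectors, therefore: $\|\mathbf{q} - \mathbf{x}^{(n)}\|^2 = 2 - 2\,\mathbf{q}^\top \mathbf{x}^{(n)}$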
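Both shingles have unit norm, so expanding ||q - x||² = ||q||² + ||x||² - 2 q·x leaves 2 - 2 q·x. A nearest-neighbor search by Euclidean distance is therefore equivalent to a search by largest inner product. The toy Python check below confirms the identity. The function names and vectors are illustrative only.

```python
def squared_distance(q, x):
    """Squared Euclidean distance between two M-dimensional shingles."""
    return sum((a - b) ** 2 for a, b in zip(q, x))


def squared_distance_unit(q, x):
    """Same quantity for unit-norm shingles: ||q - x||^2 = 2 - 2 (q . x)."""
    return 2.0 - 2.0 * sum(a * b for a, b in zip(q, x))


q = [0.6, 0.8, 0.0]  # unit-norm toy query shingle
x = [0.0, 0.6, 0.8]  # unit-norm toy database shingle
print(squared_distance(q, x), squared_distance_unit(q, x))  # both approximately 1.04
```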

  8. AudioDB: Shingle Nearest Neighbor Search • Open source: google: “audioDB” • Management of tracks, sequences, salience • Automatic indexing parameters • OMRAS2, Yahoo!, AWAL, CHARM, more… • Web-services interface (SOAP / JSON) • Implementation of LSH for large N ~ 1B • 1-10 ms whole-track retrieval from 1B vectors

  9. AudioDB: Shingle Nearest Neighbor Search

  10. Whole-track similarity • Often want to know which tracks are similar • Similarity depends on specificity of task • Distortion / filtering / re-encoding (high) • Remix with new audio material (mid) • Cover song: same song, different artist (mid)

  11. Whole-track resemblance: radius-bounded search Compute the number of shingle collisions between two tracks.

  12. Whole-track resemblance: radius-bounded search Compute the number of shingle collisions between two tracks. • Requires a threshold for considering shingles to be related • Needs a way to estimate the relatedness threshold for a given data set
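The transcript does not preserve the collision-count formula itself. As an assumption, the sketch below treats a collision as any (query shingle, database shingle) pair whose squared distance is at most xthresh. It counts such pairs by brute force and ranks tracks by the count. `collision_count`, `rank_tracks`, and the dict-of-tracks layout are hypothetical. At the scale described in the talk, an LSH index would replace the exhaustive loop.

```python
from typing import Dict, List, Sequence, Tuple

Shingle = Sequence[float]


def squared_distance(q: Shingle, x: Shingle) -> float:
    """Squared Euclidean distance between two shingles."""
    return sum((a - b) ** 2 for a, b in zip(q, x))


def collision_count(query_track: Sequence[Shingle], db_track: Sequence[Shingle], x_thresh: float) -> int:
    """Number of (query shingle, database shingle) pairs within squared distance x_thresh."""
    return sum(1 for q in query_track for x in db_track if squared_distance(q, x) <= x_thresh)


def rank_tracks(
    query_track: Sequence[Shingle], database: Dict[str, Sequence[Shingle]], x_thresh: float
) -> List[Tuple[str, int]]:
    """Rank database tracks by collision count with the query track, most collisions first."""
    scores = {n: collision_count(query_track, shingles, x_thresh) for n, shingles in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy 2-D "shingles" for illustration only.
query = [[1.0, 0.0], [0.0, 1.0]]
database = {"a": [[1.0, 0.0], [0.9, 0.1]], "b": [[-1.0, 0.0]]}
print(rank_tracks(query, database, x_thresh=0.1))  # [('a', 2), ('b', 0)]
```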

  13. Statistical approaches to modeling distance distributions

  14. Distribution of minimum distances Database: 1.4 million shingles. The left bump shows the minimum distance from each of 1000 randomly selected query shingles to this database. The right bump is a small sample (1/98 000 000) of the histogram of all distances.

  15. Radius-bounded retrieval performance: cover song (opus task) • Performance depends critically on xthresh, the collision threshold • Want to estimate xthresh automatically from unlabelled data

  16. Order Statistics • Minimum-value distribution is analytic • Estimate the distribution parameters • Substitute into minimum value distribution • Define a threshold in terms of FP rate • This gives an estimate of xthresh

  17. Estimating xthresh from unlabelled data • Use theoretical statistics • Null Hypothesis: • H0: shingles are drawn from unrelated tracks • Assume elements i.i.d., normally distributed • M-dimensional shingles, d effective degrees of freedom • Squared-distance distribution under H0
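Suppose the d effectively independent coordinates of two unrelated shingles are i.i.d. normal with variance sigma2. Then each coordinate difference is N(0, 2·sigma2), and the squared distance follows s2·χ²_d with s2 = 2·sigma2. This is a gamma distribution with shape d/2 and scale 2·s2. The sketch below evaluates that density and checks its mean by simulation. Treating d as a count of independent normal coordinates is a simplifying assumption. `h0_density`, `simulate_h0`, `sigma2`, and `s2` are our own names.

```python
import math
import random


def h0_density(x: float, d: float, s2: float) -> float:
    """Density of x = s2 * chi^2_d (gamma with shape d/2 and scale 2*s2)."""
    if x <= 0.0:
        return 0.0
    k, theta = d / 2.0, 2.0 * s2
    return math.exp((k - 1.0) * math.log(x) - x / theta - math.lgamma(k) - k * math.log(theta))


def simulate_h0(n_pairs: int, d: int, sigma2: float, seed: int = 0) -> list:
    """Squared distances between pairs of independent vectors with i.i.d. N(0, sigma2) elements.

    Each coordinate difference is N(0, 2*sigma2), so the result is s2 * chi^2_d with s2 = 2*sigma2.
    """
    rng = random.Random(seed)
    sd = math.sqrt(sigma2)
    out = []
    for _ in range(n_pairs):
        a = [rng.gauss(0.0, sd) for _ in range(d)]
        b = [rng.gauss(0.0, sd) for _ in range(d)]
        out.append(sum((u - v) ** 2 for u, v in zip(a, b)))
    return out


d, sigma2 = 20, 0.05
samples = simulate_h0(2000, d, sigma2)
print(sum(samples) / len(samples), d * 2 * sigma2)  # sample mean vs. theoretical mean s2 * d
```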

  18. ML for background distribution • Likelihood for N data points (distances squared) • d = effective degrees of freedom • M = shingle dimensionality

  19. Background distribution parameters • Likelihood for N data points (distances squared) • d = effective degrees of freedom • M = shingle dimensionality
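Assume the N squared distances are independent draws from the scaled chi-squared background. Then d and the scale can be estimated by maximum likelihood through the equivalent gamma fit (shape d/2, scale 2·s2). The sketch uses a well-known closed-form approximation to the gamma shape MLE rather than an exact iterative solution. It does not use M, because this transcript does not show how M enters the slide's likelihood. `fit_background` and `s2` are hypothetical names, and the data are synthetic.

```python
import math
import random
from typing import Sequence, Tuple


def fit_background(dist2: Sequence[float]) -> Tuple[float, float]:
    """Approximate ML fit of x ~ s2 * chi^2_d to positive squared distances; returns (d, s2).

    The gamma shape k = d/2 uses the closed-form approximation
    k ~ (3 - s + sqrt((s - 3)^2 + 24 s)) / (12 s), with s = ln(mean x) - mean(ln x);
    given k, the ML gamma scale is mean(x) / k, which equals 2 * s2.
    """
    n = len(dist2)
    mean_x = sum(dist2) / n
    mean_log = sum(math.log(x) for x in dist2) / n
    s = math.log(mean_x) - mean_log  # positive unless all distances are equal
    k = (3.0 - s + math.sqrt((s - 3.0) ** 2 + 24.0 * s)) / (12.0 * s)
    theta = mean_x / k
    return 2.0 * k, theta / 2.0


# Synthetic check: draws from s2 * chi^2_d with d = 20, s2 = 0.1 (gamma shape 10, scale 0.2).
rng = random.Random(1)
data = [rng.gammavariate(10.0, 0.2) for _ in range(5000)]
d_hat, s2_hat = fit_background(data)
print(d_hat, s2_hat)
```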

  20. Minimum value over N samples
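The minimum-value distribution follows from standard order statistics. For N independent draws with CDF F and density f, the minimum has CDF 1 - (1 - F(x))^N and density N f(x) (1 - F(x))^(N-1). The sketch below assumes the N distances are independent. It checks the formulas on a toy exponential background, whose minimum is again exponential with the mean divided by N. All names are illustrative.

```python
import math
from typing import Callable

Dist = Callable[[float], float]


def min_cdf(cdf: Dist, n: int) -> Dist:
    """CDF of the minimum of n i.i.d. draws: P(min <= x) = 1 - (1 - F(x))**n."""
    def f(x: float) -> float:
        p = cdf(x)
        if p >= 1.0:
            return 1.0
        # Same value as 1 - (1 - p)**n, but accurate when p is tiny and n is huge.
        return -math.expm1(n * math.log1p(-p))
    return f


def min_pdf(cdf: Dist, pdf: Dist, n: int) -> Dist:
    """Density of the minimum: n * f(x) * (1 - F(x))**(n - 1)."""
    return lambda x: n * pdf(x) * (1.0 - cdf(x)) ** (n - 1)


# Toy check with an exponential background of mean 2: the minimum of n draws is exponential with mean 2 / n.
MU, N = 2.0, 10


def exp_cdf(x: float) -> float:
    return 1.0 - math.exp(-x / MU) if x > 0 else 0.0


def exp_pdf(x: float) -> float:
    return math.exp(-x / MU) / MU if x > 0 else 0.0


print(min_cdf(exp_cdf, N)(0.2), 1.0 - math.exp(-N * 0.2 / MU))  # the two values agree
```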

  21. Minimum value distribution of unrelated shingles

  22. Estimate of xthresh from a target false positive rate
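Define the false positive rate α as the probability that the minimum of N unrelated squared distances falls below xthresh. Inverting the minimum-value CDF then gives F(xthresh) = 1 - (1 - α)^(1/N), so xthresh is a very low quantile of the background distribution. The sketch assumes the scaled chi-squared background with parameters d and s2 (our label for the scale). It evaluates the CDF through the regularized incomplete gamma function, using a series for small arguments and a continued fraction otherwise, and inverts it by bisection. The function names are hypothetical. The values d = 20 and s2 = 0.1 are made up for illustration and are not fitted parameters from the talk. N = 1.4 million and α = 1% echo the database size and FP target quoted in the slides.

```python
import math


def gamma_p(a: float, x: float) -> float:
    """Regularized lower incomplete gamma function P(a, x).

    Uses the power series for x < a + 1 and a continued fraction (modified Lentz) otherwise.
    """
    if x <= 0.0:
        return 0.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        term = total = 1.0 / a
        ap = a
        for _ in range(10_000):
            ap += 1.0
            term *= x / ap
            total += term
            if term < total * 1e-16:
                break
        return total * math.exp(log_prefactor)
    tiny = 1e-300
    b = x + 1.0 - a
    c, dd = 1.0 / tiny, 1.0 / b
    h = dd
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        dd = an * dd + b
        dd = dd if abs(dd) > tiny else tiny
        c = b + an / c
        c = c if abs(c) > tiny else tiny
        dd = 1.0 / dd
        delta = dd * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return 1.0 - math.exp(log_prefactor) * h  # prefactor * h is the upper tail Q(a, x)


def background_cdf(x: float, d: float, s2: float) -> float:
    """CDF of x ~ s2 * chi^2_d, i.e. P(d/2, x / (2 s2))."""
    return gamma_p(d / 2.0, x / (2.0 * s2))


def estimate_xthresh(d: float, s2: float, n: int, fp_rate: float) -> float:
    """Radius x with P(min of n unrelated squared distances <= x) = fp_rate."""
    # Solve 1 - (1 - p)**n = fp_rate for the per-distance probability p, stably.
    p = -math.expm1(math.log1p(-fp_rate) / n)
    lo, hi = 0.0, d * s2  # start the bracket at the background mean
    while background_cdf(hi, d, s2) < p:
        hi *= 2.0
    for _ in range(200):  # bisection on the monotone CDF
        mid = 0.5 * (lo + hi)
        if background_cdf(mid, d, s2) < p:
            lo = mid
        else:
            hi = mid
    return hi


x_thresh = estimate_xthresh(d=20.0, s2=0.1, n=1_400_000, fp_rate=0.01)
print(x_thresh)
```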

  23. Unlabelled data experiment • Unlabelled data set • Known to contain: • cover songs (same work, different performer) • Near duplicate recordings (misattribution, encoding) • Estimate background distance distribution • Estimate minimum value distribution • Set xthresh so FP rate is <= 1% • Whole-track retrieval based on shingle collisions

  24. Cover song retrieval

  25. Scaling • Locality sensitive hashing • Trade-off approximate NN for time complexity • 3 to 4 orders of magnitude speed-up • No noticeable degradation in performance • For optimal radius threshold

  26. LSH
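The transcript does not specify which LSH family audioDB implements. As an assumption, the sketch below uses the p-stable (Gaussian random projection) scheme for Euclidean distance. Each hash quantizes a random projection into buckets of width w, k hashes form a table key, and several tables raise recall. A radius query gathers the query's bucket-mates from all tables and keeps only those within the squared radius, which is where a radius bound enters the search. `EuclideanLSH` and its parameters are hypothetical and are not audioDB's API.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


class EuclideanLSH:
    """Hash-table index using the p-stable (Gaussian projection) LSH family for Euclidean distance.

    Each hash is h(v) = floor((a . v + b) / w) with a ~ N(0, I) and b ~ U[0, w).
    Every table keys points by k concatenated hashes; n_tables independent tables are kept.
    """

    def __init__(self, dim: int, k: int, n_tables: int, w: float, seed: int = 0):
        rng = random.Random(seed)
        self.w = w
        self.funcs = [
            [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w)) for _ in range(k)]
            for _ in range(n_tables)
        ]
        self.tables: List[Dict[Tuple[int, ...], List[int]]] = [defaultdict(list) for _ in range(n_tables)]
        self.points: List[Sequence[float]] = []

    def _key(self, t: int, v: Sequence[float]) -> Tuple[int, ...]:
        """Bucket key of v in table t: the tuple of its k quantized projections."""
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.w)
            for a, b in self.funcs[t]
        )

    def add(self, v: Sequence[float]) -> int:
        """Store v in every table and return its index."""
        idx = len(self.points)
        self.points.append(v)
        for t, table in enumerate(self.tables):
            table[self._key(t, v)].append(idx)
        return idx

    def query_radius(self, q: Sequence[float], radius2: float) -> List[int]:
        """Indices of stored points that share a bucket with q in some table
        and lie within squared distance radius2 of q."""
        candidates = set()
        for t, table in enumerate(self.tables):
            candidates.update(table.get(self._key(t, q), ()))
        return sorted(
            i for i in candidates
            if sum((a - b) ** 2 for a, b in zip(q, self.points[i])) <= radius2
        )


index = EuclideanLSH(dim=4, k=3, n_tables=5, w=1.0, seed=42)
near = index.add([0.5, 0.5, 0.5, 0.5])
far = index.add([10.0, 0.0, 0.0, 0.0])
print(index.query_radius([0.5, 0.5, 0.5, 0.5], radius2=0.01))  # [0]
```

In this family, w is normally chosen relative to the search radius, and k and the number of tables trade recall against speed. A true neighbor can still be missed if it lands in a different bucket from the query in every table.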

  27. Remix retrieval via LSH

  28. Current deployment • Large commercial collections • AWAL ~ 100,000 tracks • Yahoo! 2M+ tracks, related song classifier • AudioDB: open-source, international consortium of developers • Google: “audioDB”

  29. Conclusions • Radius-bounded retrieval model for tracks • Shingles preserve temporal information, high d • Implements mid-to-high specificity search • Optimal radius threshold from order statistics • Null hypothesis: shingles are drawn from unrelated tracks • LSH requires a radius bound, which is estimated automatically • Scales to 1B+ shingles using LSH

  30. Thanks • Malcolm Slaney, Yahoo! Research Inc. • Christophe Rhodes, Goldsmiths, U. of London • Michela Magas, Goldsmiths, U. of London • Funding: EPSRC: EP/E02274X/1