
The Problem with Music: Modeling Distance Distributions of Large Music Collections

Prof. Michael Casey, Program in Digital Musics, Dartmouth College, Hanover, NH.


Presentation Transcript


  1. The Problem with Music: Modeling Distance Distributions of Large Music Collections • Prof. Michael Casey • Program in Digital Musics • Dartmouth College, Hanover, NH • Comp. Sci. Colloquium

  2. a.k.a. The Problem with Multimedia: Music, Music Videos, Videos, Images

  3. Scalable Similarity • 8M tracks in commercial collection • 6B Images on WWW • Require scalable nearest-neighbor methods • Increase scale, decrease search complexity

  4. Example: Hattogate

  5. Example: Remixing / Sampling in Yahoo! Music • Original Track • Remix 1 • Remix 2 • Remix 3

  6. Example: 3B Images in Flickr

  7. Specificity • Partial document (sub-track) retrieval • Alternate versions: remix, cover, live, album • Task is mid-high specificity

  8. Machine Listening

  9. Feature Extraction

  10. Feature Extraction

  11. Feature Extraction

  12. Feature Extraction

  13. Feature Extraction

  14. Feature Extraction

  15. Audio Shingles • Shingles provide contextual information about features • Originally used for Internet search engines: Andrei Z. Broder, Steven C. Glassman, Mark S. Manasse, Geoffrey Zweig, “Syntactic Clustering of the Web”, Computer Networks 29(8-13): 1157-1166 (1997) • Related to N-grams, overlapping sequences of features • Applied to the audio domain by Casey and Slaney: Casey, M., Slaney, M., “The Importance of Sequences in Musical Similarity”, in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2006 • A shingle is defined as the concatenation of l frames of m-dimensional features
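The shingle equation on slide 15 did not survive extraction, so the following is only a minimal sketch of the verbal definition (concatenate l frames of m-dimensional features). The function name `make_shingles`, the `hop` parameter, and the toy input are illustrative assumptions, not taken from the talk.

```python
from typing import List, Sequence


def make_shingles(frames: Sequence[Sequence[float]], l: int, hop: int = 1) -> List[List[float]]:
    """Concatenate each run of l consecutive m-dimensional frames into one l*m vector.

    `hop` (an assumption, not from the slides) is the step between shingle starts;
    hop=1 gives maximally overlapping shingles, in the spirit of N-grams.
    """
    if l <= 0 or hop <= 0:
        raise ValueError("l and hop must be positive")
    shingles: List[List[float]] = []
    for start in range(0, len(frames) - l + 1, hop):
        shingle: List[float] = []
        for frame in frames[start:start + l]:
            shingle.extend(frame)
        shingles.append(shingle)
    return shingles


# Toy input: 5 frames with m = 2 features each (the slides use m = 12 or 20).
frames = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0], [8.0, 9.0]]
shingles = make_shingles(frames, l=3)
print(len(shingles), len(shingles[0]))  # 3 shingles, each of dimension l*m = 6
```

With l = 30 or 40 and m = 12 or 20, as quoted on slide 17, the same function would produce vectors of several hundred dimensions per shingle.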

  16. Audio Shingle Similarity

  17. Audio Shingle Similarity • Shingles have M dimensions (M = l·m); m = 12, 20; l = 30, 40 • A query shingle is drawn from a query track {Q} • The database of audio tracks is indexed by n • A database shingle is drawn from track n • Shingles are normalized to unit vectors, therefore:
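The equation that followed "therefore" on slide 17 is missing from the transcript. One standard identity that follows from unit normalization is sketched below; the symbols q (query shingle) and s (database shingle) are assumptions chosen for illustration, and this may not be the exact expression shown in the talk.

```latex
% For unit vectors q and s (\|q\| = \|s\| = 1) in R^M:
\[
  \|q - s\|^2 \;=\; \|q\|^2 + \|s\|^2 - 2\, q \cdot s \;=\; 2 - 2\, q \cdot s ,
  \qquad 0 \le \|q - s\|^2 \le 4 .
\]
```

Under this identity, ranking by squared Euclidean distance and ranking by inner product give the same ordering of database shingles.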

  18. AudioDB: Shingle Nearest Neighbor Search

  19. Whole-track similarity • Often want to know which tracks are similar • Similarity depends on specificity of task • Distortion / filtering / re-encoding (high) • Remix with new audio material (mid) • Cover song: same song, different artist (mid)

  20. Whole-track resemblance: radius-bounded search • Compute the number of shingle collisions between two tracks:

  21. Whole-track resemblance: radius-bounded search • Compute the number of shingle collisions between two tracks: • Requires a threshold for considering shingles to be related • Need a way to estimate relatedness (threshold) for data set
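The collision-count formula on slides 20 and 21 was not captured in the transcript. As a rough sketch of radius-bounded counting, the code below assumes a collision means "a query shingle has at least one database-track shingle within squared distance `x_thresh`"; that definition, and all function names, are assumptions rather than the talk's exact formulation.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length, as the slides do for shingles."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def count_collisions(query_shingles, track_shingles, x_thresh: float) -> int:
    """Count query shingles that have some track shingle within the radius (assumed definition)."""
    q_unit = [normalize(q) for q in query_shingles]
    t_unit = [normalize(t) for t in track_shingles]
    return sum(
        1 for q in q_unit
        if any(squared_distance(q, t) <= x_thresh for t in t_unit)
    )


# Toy 2-d "shingles"; real shingles have hundreds of dimensions.
query = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
track = [[2.0, 0.0], [-1.0, 0.0]]
print(count_collisions(query, track, x_thresh=0.1))  # only [1, 0] matches -> 1
print(count_collisions(query, track, x_thresh=0.6))  # [1, 1] now matches too -> 2
```

The two calls illustrate why the threshold matters: loosening the radius changes which shingles count as related, and therefore the track-level score.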

  22. SCALE • Mazurkas: 10,000 tracks, 10-100 ms features • 3 s clips (30 – 300 frames per vector) • 12d – 20d features (360 – 600d vectors) • Yahoo! Music: 6M tracks • 1000 vectors per track • (6M x 1k)^2 search for near neighbours
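To make the last bullet of slide 22 concrete, the arithmetic for an exhaustive all-pairs search can be worked out as follows, taking "6M" as 6 × 10^6 and "1k" as 10^3.

```latex
\[
  \underbrace{6 \times 10^{6}}_{\text{tracks}} \times
  \underbrace{10^{3}}_{\text{vectors per track}}
  = 6 \times 10^{9} \ \text{shingles},
  \qquad
  \left(6 \times 10^{9}\right)^{2} = 3.6 \times 10^{19} \ \text{pairwise comparisons}.
\]
```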

  23. LSH

  24. Approximate Near Neighbor Matching

  25. Approximate near neighbors • In many applications we need only near neighbors • We can exploit this by allowing a degree of approximation in retrieval

  26. Space partitioning

  27. Curse of dimensionality • [Figure: Pr(dist) versus distance, for d=4, d=8 and d=1024]

  28. Border effects in high d

  29. ε-NN : approximate near neighbors

  30. Setting the range

  31. Hashing • Types of hashes • String : put Bash vs Bush in different bins • Locality sensitive : close matches in same bin • High-dimensional and probabilistic • Nearest Neighbor implementations • Pair-wise distance computation • 1,000,000,000,000 comparisons in 2M song database • Hash bucket collisions • 1,000,000,000 hash projections

  32. Exact matching via hashing • Audio fingerprinting • Shazam, etc. • Make the feature robust • Use exact matching on integer hash • Find a sequence of hashes to identify specific recording or image • Drawback: only exact matches possible

  33. Locality-Sensitive Hashing (Indyk-Motwani ’98) • Hash functions are locality-sensitive if, for a randomly chosen hash function h and any pair of points p, q, we have: • Pr[h(p)=h(q)] is “high” if p is “close” to q • Pr[h(p)=h(q)] is “low” if p is “far” from q

  34. Locality Sensitive Hashing

  35. Random Projections • Random projections estimate distance • Multiple projections improve estimate
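A small numerical sketch of the two bullets on slide 35. It assumes projections onto vectors with i.i.d. standard Gaussian entries, which is one common choice and not necessarily the one used in the talk; for such a vector r, the expected value of (r·p − r·q)² equals the squared distance between p and q, and averaging more projections tightens the estimate.

```python
import random
from typing import Sequence


def estimate_squared_distance(p: Sequence[float], q: Sequence[float],
                              num_projections: int, rng: random.Random) -> float:
    """Average squared gap between Gaussian random projections of p and q."""
    total = 0.0
    for _ in range(num_projections):
        # One random direction with i.i.d. N(0, 1) entries (assumed projection type).
        r = [rng.gauss(0.0, 1.0) for _ in p]
        proj_p = sum(ri * pi for ri, pi in zip(r, p))
        proj_q = sum(ri * qi for ri, qi in zip(r, q))
        total += (proj_p - proj_q) ** 2
    return total / num_projections


rng = random.Random(0)
p = [1.0, 0.0, 2.0, -1.0]
q = [0.0, 1.0, 0.0, 1.0]
true_sq = sum((a - b) ** 2 for a, b in zip(p, q))  # 1 + 1 + 4 + 4 = 10
for k in (1, 10, 1000):
    print(k, estimate_squared_distance(p, q, k, rng), true_sq)
```

A single projection gives a very noisy estimate; the relative spread shrinks roughly like sqrt(2/k) as the number of projections k grows.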

  36. h’s are locality-sensitive • Pr[h(p)=h(q)] = (1 - D(p,q)/d)^k • We can vary the probability by changing k • [Figure: Pr versus distance, for k=1 and k=2]
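The collision probability on slide 36 matches the classic bit-sampling scheme, under the assumption that D(p,q) is the Hamming distance between d-bit vectors and each hash reads k coordinates chosen uniformly at random with replacement. The slide does not spell this out, so treat the sketch below as one interpretation; the function names are illustrative.

```python
import random
from typing import Sequence


def collision_probability(hamming_distance: int, d: int, k: int) -> float:
    """Closed form from the slide: (1 - D/d)^k."""
    return (1.0 - hamming_distance / d) ** k


def simulate_collisions(p: Sequence[int], q: Sequence[int], k: int,
                        trials: int, rng: random.Random) -> float:
    """Fraction of random k-coordinate hashes on which p and q agree."""
    d = len(p)
    hits = 0
    for _ in range(trials):
        coords = [rng.randrange(d) for _ in range(k)]  # sampled with replacement
        if all(p[i] == q[i] for i in coords):
            hits += 1
    return hits / trials


p = [0, 1, 1, 0, 1, 0, 0, 1]
q = [0, 1, 0, 0, 1, 1, 0, 1]  # differs from p in 2 of the d = 8 positions
rng = random.Random(0)
for k in (1, 2, 4):
    print(k, collision_probability(2, 8, k), simulate_collisions(p, q, k, 20000, rng))
```

Raising k makes the hash more selective: the probability falls off faster with distance, which is the effect the k=1 and k=2 panels on the slide illustrate.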

  37. LSH Random Projections: 3d to 2d

  38. Statistical approaches to modeling distance distributions

  39. Distribution of minimum distances Database: 1.4 million shingles. The left bump is the minimum between 1000 randomly selected query shingles and this database. The right bump is a small sampling (1/98 000 000) of the full histogram of all distances.

  40. Radius-bounded retrieval performance: cover song (opus task) • Performance depends critically on xthresh, the collision threshold • Want to estimate xthresh automatically from unlabelled data

  41. Order Statistics • Minimum-value distribution is analytic • Estimate the distribution parameters • Substitute into minimum value distribution • Define a threshold in terms of FP rate • This gives an estimate of xthresh

  42. Estimating xthresh from unlabelled data • Use theoretical statistics • Null Hypothesis: • H0: shingles are drawn from unrelated tracks • Assume elements i.i.d., normally distributed • M dimensional shingles, d effective degrees of freedom: • Squared distance distribution for H0

  43. ML for background distribution • Likelihood for N data points (distances squared) • d = effective degrees of freedom • M = shingle dimensionality

  44. Background distribution parameters • Likelihood for N data points (distances squared) • d = effective degrees of freedom • M = shingle dimensionality

  45. Minimum value over N samples
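The formula on slide 45 is not in the transcript. The general order-statistics result for the minimum of N i.i.d. samples is given below as a reference point; F and f denote the CDF and density of a single background distance, and this notation is an assumption rather than the talk's own.

```latex
% X_1, ..., X_N i.i.d. with CDF F and density f; X_{(1)} = \min_i X_i.
\[
  F_{\min}(x) \;=\; \Pr\!\left[X_{(1)} \le x\right] \;=\; 1 - \bigl(1 - F(x)\bigr)^{N},
  \qquad
  f_{\min}(x) \;=\; N\, f(x)\, \bigl(1 - F(x)\bigr)^{N-1}.
\]
```

This is why the minimum-value distribution is analytic once the parameters of the background distribution F have been estimated.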

  46. Minimum value distribution of unrelated shingles

  47. Estimate of xthresh, false positive rate

  48. Unlabelled data experiment • Unlabelled data set • Known to contain: • cover songs (same work, different performer) • Near duplicate recordings (misattribution, encoding) • Estimate background distance distribution • Estimate minimum value distribution • Set xthresh so FP rate is <= 1% • Whole-track retrieval based on shingle collisions
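The step "set xthresh so FP rate is <= 1%" can be sketched from the general minimum-value result: if the minimum of N unrelated distances should fall below the threshold with probability at most α, the single-distance CDF at the threshold must equal 1 − (1 − α)^(1/N). In the code below the background model is a normal distribution used purely as a stand-in; it is not the squared-distance model from the talk, and `mu`, `sigma`, and the function names are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Callable


def per_comparison_quantile(fp_rate: float, n: int) -> float:
    """Return F(x_thresh) such that 1 - (1 - F)^n == fp_rate.

    Uses log1p/expm1 so the result stays accurate when n is very large.
    """
    return -math.expm1(math.log1p(-fp_rate) / n)


def estimate_threshold(inv_cdf: Callable[[float], float], fp_rate: float, n: int) -> float:
    """Threshold below which the minimum of n background distances falls with probability fp_rate."""
    return inv_cdf(per_comparison_quantile(fp_rate, n))


# Stand-in background model (assumption): squared distances between unrelated
# unit-norm shingles concentrated around 2. NOT the distribution from the slides.
background = NormalDist(mu=2.0, sigma=0.1)
for n in (1, 1000, 1_000_000):
    print(n, estimate_threshold(background.inv_cdf, fp_rate=0.01, n=n))
```

The threshold moves further into the left tail as N grows, since the more background distances are examined, the smaller the smallest unrelated one is likely to be.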

  49. Misattributions • Joyce Hatto: 100% of known misattributions in first rank • Sergio Fiorentino • Eleven out of twenty-six Mazurka performances on another Concert Artists/Fidelio disc, issued under the name of Sergio Fiorentino, are in fact copies of recordings by other artists. This is the first time that such practices have been found in Concert Artists/Fidelio recordings issued other than under the name of Joyce Hatto, and prompts speculation as to how much more misattributed material remains to be found in the Concert Artists/Fidelio catalogue.
