Finding Duplicates and Generating Audio Thumbs
Chris Burges, Erin Renshaw, Dan Plastina†, John Platt and Rico Malvar
Communication, Collaboration and Signal Processing Group
†Windows Digital Media Division

Problem Statement
• In my music collection:
  • Find duplicate files
  • Find junk audio files
  • Generate 15-second audio thumbnails for browsing
• All automatically!

Finding Duplicates
(1) Load audio file
(2) Compute traces in fixed user-chosen window
(3) Compare against fingerprints (FPs) from previous files
(4) If a match, declare a duplicate
(5) Else, save FP for that file
(6) Repeat from (1) 'til done

How Does It Work?
• Use RARE fingerprints (FPs) = 64 numbers encoding 6 sec of audio
• Duplicate Detection
  • Find files with similar FPs, or FPs that match 'junk'
• Audio Thumbnails
  • Generate FPs for entire file
  • Find repeated FPs within file
  • Form clumps of repeated FPs
  • Take well-separated, high energy clump as thumbnail

Generating Thumbnails
[Figure: 'Spectral fullness' for a 5-verse Dylan song.]
Results on 40,991 songs:
Results: Using 6 quality measures (e.g. 'Includes sung title?'), blind testing against thumbs chosen 30 seconds into the song gave 28% improvement.
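
The duplicate-finding loop and the thumbnail selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: fingerprints are taken as already computed (the RARE computation itself is not shown), similarity is assumed to be Euclidean distance under a hypothetical `threshold`, "well-separated" is read as "the repeat occurs at least `min_gap` positions away", and "high energy" as "highest mean energy". The names `distance`, `find_duplicates`, and `pick_thumbnail` are invented for this sketch.

```python
import math


def distance(fp_a, fp_b):
    """Euclidean distance between two fingerprints (assumed similarity metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(fp_a, fp_b)))


def find_duplicates(files, threshold=0.5):
    """Map each duplicate file name to the earlier file it matches.

    files: iterable of (name, traces, fingerprint), where `traces` are the
    fingerprints computed at each position in the search window and
    `fingerprint` is the one saved when no match is found.
    """
    saved = []        # (name, fingerprint) of files accepted so far
    duplicates = {}
    for name, traces, fp in files:
        match = None
        for prev_name, prev_fp in saved:
            # A match is any trace close enough to a previously saved FP.
            if any(distance(t, prev_fp) <= threshold for t in traces):
                match = prev_name
                break
        if match is not None:
            duplicates[name] = match
        else:
            saved.append((name, fp))
    return duplicates


def pick_thumbnail(fps, energies, threshold=0.5, min_gap=2):
    """Return (start, end) positions of the chosen clump, or None.

    fps: fingerprints at consecutive positions across the whole file.
    energies: one energy value per position (same length as fps).
    """
    n = len(fps)
    # Positions whose fingerprint recurs at least min_gap positions away.
    repeated = [
        i for i in range(n)
        if any(abs(i - j) >= min_gap and distance(fps[i], fps[j]) <= threshold
               for j in range(n))
    ]
    # Group consecutive repeated positions into clumps [start, end].
    clumps = []
    for i in repeated:
        if clumps and i == clumps[-1][1] + 1:
            clumps[-1][1] = i
        else:
            clumps.append([i, i])
    if not clumps:
        return None

    def mean_energy(clump):
        start, end = clump
        return sum(energies[start:end + 1]) / (end - start + 1)

    best = max(clumps, key=mean_energy)
    return (best[0], best[1])
```

In practice each fingerprint would hold 64 numbers covering 6 seconds of audio, and the thresholds would need tuning on real data; the toy values here are placeholders only.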