Finding Duplicates and Generating Audio Thumbs
Chris Burges, Erin Renshaw, Dan Plastina†, John Platt and Rico Malvar
Communication, Collaboration and Signal Processing Group

Problem Statement
• In my music collection:
  • Find duplicate files
  • Find junk audio files
  • Generate 15-second audio thumbnails for browsing
• All automatically!

Finding Duplicates
1. Load audio file
2. Compute traces in fixed, user-chosen window
3. Compare against fingerprints (FPs) from previous files
4. If a match, declare a duplicate
5. Else, save FP for that file
6. Repeat from (1) 'til done

How Does It Work?
• Use RARE fingerprints (FPs) = 64 numbers encoding 6 sec of audio
• Duplicate Detection
  • Find files with similar FPs, or FPs that match ‘junk’
• Audio Thumbnails
  • Generate FPs for entire file
  • Find repeated FPs within file
  • Form clumps of repeated FPs
  • Take well-separated, high-energy clump as thumbnail

Generating Thumbnails
[Figure: ‘Spectral fullness’ for a 5-verse Dylan song.]

Results on 40,991 songs:

Results: Using 6 quality measures (e.g. ‘Includes sung title?’), blind testing against thumbs chosen 30 seconds into the song gave 28% improvement.

†Windows Digital Media Division
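
The duplicate-finding loop described on the poster can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the poster does not show the RARE fingerprint extractor, so `fingerprint_fn` is a hypothetical callable that returns a trace (a list of 64-number fingerprints) for a file. Euclidean distance, the `threshold` value, and the choice to save the first fingerprint of each trace are likewise assumptions made for illustration.

```python
import math

FP_DIM = 64  # the poster states that one fingerprint is 64 numbers


def is_close(fp_a, fp_b, threshold):
    """Assumed similarity test: Euclidean distance at or below a threshold."""
    return math.dist(fp_a, fp_b) <= threshold


def find_duplicates(files, fingerprint_fn, junk_fps=(), threshold=0.5):
    """Return (duplicates, junk).

    files          -- iterable of file names, processed in order
    fingerprint_fn -- hypothetical extractor: name -> list of fingerprints
    junk_fps       -- fingerprints of known 'junk' audio (may be empty)
    threshold      -- assumed distance cutoff for declaring a match
    """
    saved = []        # (file name, fingerprint) for files seen so far
    duplicates = {}   # duplicate file -> earlier file it matched
    junk = []         # files whose trace matched a junk fingerprint

    for name in files:
        trace = fingerprint_fn(name)   # traces from the chosen window
        if not trace:
            continue                   # nothing to compare or save

        matched_junk = any(
            is_close(fp, j, threshold) for fp in trace for j in junk_fps
        )
        if matched_junk:
            junk.append(name)
            continue

        match = None
        for fp in trace:
            for prev_name, prev_fp in saved:
                if is_close(fp, prev_fp, threshold):
                    match = prev_name
                    break
            if match is not None:
                break

        if match is not None:
            duplicates[name] = match         # a match: declare a duplicate
        else:
            saved.append((name, trace[0]))   # else: save an FP for this file

    return duplicates, junk
```

With a real extractor supplied as `fingerprint_fn`, `find_duplicates(paths, fingerprint_fn, junk_fps)` would return a mapping from each duplicate to the earlier file it matched, plus a list of files that matched a junk fingerprint. The linear scan over saved fingerprints keeps the sketch short; a large collection would likely need an indexed lookup instead.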