Content-based music retrieval (CBMR) leverages acoustic input to identify music tracks by melody. Instead of traditional text queries, CBMR allows users to sing or hum a tune to find corresponding songs. This outline addresses essential CBMR components, including methods like signal processing, similarity comparison, and performance metrics. The study discusses previous work in the field, experimental setups with various databases, and results from notable algorithms including dynamic time warping for improved retrieval accuracy. Future enhancements are also proposed, focusing on efficiency and robustness.
Outline • What is CBMR? • Methods • Signal processing • Similarity comparison • Experiment results • Demo • Future work
What is CBMR? • CBMR : • Content-based Music Retrieval • Traditional database query : • Text-based or SQL-based • Our goal : • Music retrieval by singing/humming
Related Work • "Query by Humming" by Ghias, Logan, and Chamberlin in 1995 • Autocorrelation pitch detection • 183 songs in database • MELDEX system by the New Zealand Digital Library Project in 1996 • Gold/Rabiner algorithm (800 songs) • Users sing "la" or "ta" to aid note transcription • Karaoke song recognizer by J.F. Wang in 1997 • Novel pitch detection • 50 songs in database
Flowchart • On-line processing: Microphone signal input → Sampling at 11 kHz → Filtering → Pitch tracking → Post signal processing → Mid-level representation → Similarity comparison → Query results (ranked song list) • Off-line processing: Songs database → MIDI message extraction → Similarity comparison
Original Wave Input • Test song: 小雨中的回憶 ("Memories in the Drizzle") • 11025 Hz, 8 bits, mono
Framing • 512 points per frame • 340-point overlap between adjacent frames
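The framing step can be sketched as below, using the slide's values (512-point frames with a 340-point overlap, i.e. a hop of 172 samples); `split_frames` is an illustrative helper, not code from the original system.

```python
import numpy as np

def split_frames(signal, frame_len=512, overlap=340):
    """Split a signal into overlapping frames.

    With frame_len=512 and overlap=340 (the slide's values),
    consecutive frames start 172 samples apart."""
    hop = frame_len - overlap
    n = 1 + max(0, (len(signal) - frame_len) // hop)
    return np.stack([signal[i * hop : i * hop + frame_len] for i in range(n)])

frames = split_frames(np.zeros(11025))  # one second of audio at 11025 Hz
print(frames.shape)
```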
Pitch Tracking • Range • E2 – C6 • 82 Hz – 1047 Hz • Method • Auto-correlation
Center Clipping • Clipping limits are set to r% of the frame's absolute maximum before the auto-correlation is computed
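A minimal sketch of center clipping followed by autocorrelation pitch estimation, using the E2–C6 range from the slides; the value r = 0.3 and the function names are illustrative assumptions, not parameters from the system.

```python
import numpy as np

def center_clip(frame, r=0.3):
    """Zero samples within ±r·max|frame|; shift the rest toward zero.
    (r = 0.3 is an assumed value for illustration.)"""
    limit = r * np.max(np.abs(frame))
    return np.where(np.abs(frame) > limit,
                    frame - np.sign(frame) * limit, 0.0)

def autocorr_pitch(frame, fs=11025, fmin=82, fmax=1047):
    """Pick the autocorrelation peak within the E2-C6 lag range."""
    clipped = center_clip(frame)
    ac = np.correlate(clipped, clipped, mode="full")[len(clipped) - 1:]
    lo = int(fs / fmax)
    hi = min(int(fs / fmin), len(ac) - 1)
    lag = lo + np.argmax(ac[lo:hi + 1])
    return fs / lag

# One 512-point frame of a synthetic 220 Hz tone
t = np.arange(512) / 11025
f0 = autocorr_pitch(np.sin(2 * np.pi * 220 * t))  # close to 220 Hz
```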
Signal Processing • Remove spurious pitch points and short notes • Down-sampling and smoothing • Convert frequency to semitone • Semitone: a logarithmic pitch scale referenced to A440
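The frequency-to-semitone mapping can be written down directly. The sketch below uses the common MIDI convention where A440 maps to note number 69; the slides only say the scale is based on A440, so the exact offset is an assumption.

```python
import math

def freq_to_semitone(f_hz):
    """Convert frequency (Hz) to a semitone number on a log scale.

    Uses the MIDI convention A440 -> 69 (an assumed offset); each
    doubling of frequency adds 12 semitones, i.e. one octave."""
    return 69 + 12 * math.log2(f_hz / 440.0)

a4 = freq_to_semitone(440.0)   # A4
e2 = freq_to_semitone(82.41)   # E2, low end of the tracked range
```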
Similarity Comparison • Goal • Find the most similar Midi file • Challenge • Tempo variance • Dynamic time warping (DTW) • Tune variance • Key transposition
Compare by DTW • The pitch sequence extracted from the wave file is compared against each MIDI file using DTW
Dynamic Time Warping (DTW) • The input sequence t(i) is aligned against the reference sequence r(j), with the warping path constrained to a window around the diagonal
DTW (cont.) • Local distance: dist(i,j) = |t(i) − r(j)| • if (t(i) == Rest && r(j) == Rest) dist(i,j) = 0; else if (t(i) == Rest || r(j) == Rest) dist(i,j) = restWeight;
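The recurrence and rest rules above can be sketched as windowed DTW in Python; `REST_WEIGHT = 2.0` and the window handling are illustrative assumptions, not values from the slides.

```python
import numpy as np

REST = None          # marker for a rest (silence) in a note sequence
REST_WEIGHT = 2.0    # assumed penalty when only one side is a rest

def local_dist(a, b):
    """Local cost with the slides' rest rules."""
    if a is REST and b is REST:
        return 0.0
    if a is REST or b is REST:
        return REST_WEIGHT
    return abs(a - b)

def dtw(t, r, window=None):
    """Windowed DTW distance between sequences t and r."""
    n, m = len(t), len(r)
    w = max(window if window is not None else max(n, m), abs(n - m))
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            c = local_dist(t[i - 1], r[j - 1])
            D[i, j] = c + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

A tempo-stretched query such as `[1, 2, 2, 3]` matches the reference `[1, 2, 3]` with zero cost, which is exactly the tempo-variance robustness the slides use DTW for.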
Key Transposition • Mean shift • Binary search within the searching area around the mean shift • O(N) → O(log N)
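One way to realize the logarithmic search over the key shift is a ternary search around the mean difference, assuming the distance is roughly unimodal in the shift; `radius`, `tol`, and the simple L1 stand-in for the DTW comparison are all illustrative assumptions.

```python
def best_transposition(t, r, dist, radius=6.0, tol=0.1):
    """Search for the key shift minimizing dist(t + shift, r).

    Ternary search over a window of ±radius semitones centered on the
    difference of the sequence means; assumes dist is unimodal in the
    shift. Each iteration discards a third of the interval, giving the
    O(log N) behavior mentioned on the slide."""
    center = sum(r) / len(r) - sum(t) / len(t)
    lo, hi = center - radius, center + radius
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if dist([x + m1 for x in t], r) < dist([x + m2 for x in t], r):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

# Toy check with an L1 distance standing in for DTW
l1 = lambda a, b: sum(abs(x - y) for x, y in zip(a, b))
shift = best_transposition([0, 1, 2], [5, 6, 7], l1)  # near 5 semitones
```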
Score Function • m: length of the matched string • n: length of the input string • e: DTW distance • A = 0.8, B = 0.6
Experiment Environment • 290 wave files • Wave length: 5–8 sec • Wave format: PCM, 11025 Hz, 8-bit, mono • Environment • Celeron 450 MHz with 128 MB RAM under MATLAB 5.3 • Database • 493 MIDI files
Experiment Result (Pie) • Total time: 4589 sec (15.8 sec per wave)
Experiment Result (Pie) – With Rest • Total time: 7893 sec (27.2 sec per wave)
How to Accelerate? • Branch and bound • O(N) → O(ln N) • Triangle inequality • d(a,b) + d(b,c) ≥ d(a,c) • Hierarchical (2-phase) matching • 3/32 sec • 2/32 sec
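The triangle-inequality idea can be sketched as pivot-based pruning: precompute each database entry's distance to a pivot, then skip any candidate whose lower bound |d(q,p) − d(p,c)| already exceeds the best match so far. Note that DTW is not a true metric, so the bound is a heuristic in this setting; all names below are illustrative.

```python
def nearest_with_pruning(query, candidates, pivot, dist):
    """Find the nearest candidate, pruning with the triangle inequality.

    Since d(q,c) >= |d(q,p) - d(p,c)|, a candidate whose bound already
    exceeds the current best distance need not be compared at all."""
    d_pivot = [dist(pivot, c) for c in candidates]  # off-line in practice
    dq = dist(query, pivot)
    best, best_i = float("inf"), -1
    for i, c in enumerate(candidates):
        if abs(dq - d_pivot[i]) >= best:
            continue  # lower bound says c cannot win; skip the full compare
        d = dist(query, c)
        if d < best:
            best, best_i = d, i
    return best_i, best

# Toy check on scalars with an absolute-difference distance
best_i, best = nearest_with_pruning(4.8, [1, 5, 9], 0, lambda a, b: abs(a - b))
```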
Experiment Result (Pie) – 3/32 sec • Total time: 2358 sec (8.9 sec per wave)
Experiment Result (Pie) – 2 Phase • Total time: 3006 sec (11.2 sec per wave)
Error Analysis • MIDI transcription errors • Singing errors • Low pitch • Unstable voicing • Noise
Future Work • Reduce running time • Better similarity comparison • Different comparison units • Hardware acceleration • Better search algorithms • Steadier pitch-tracking algorithm • Noise handling