Sound Detection

Sound Detection Derek Hoiem Rahul Sukthankar (mentor) August 24, 2004

Objective • Learn a model of a sound object from a few (10-20) examples and distinguish it from all other sounds • Examples of sound classes: gunshots, screams, laughter, car horns, meows, dog barks, etc.

Applications • “Tell me if you hear a gunshot.” (monitoring) • “Get me video clips containing dogs barking.” (search and retrieval) • “What’s going on?” (scene understanding)

Why it's difficult • Sound classes have large variations • Sounds are often ambiguous without context • Overlaid “noise” obscures the sound

Sound or not? • Which of these sounds are not from their named classes? • Car horn • Dog bark • Laser gun

Previous work • Sound Classification (Wold 1996, Casey 2001, etc.) • Categorize short sound clips • Reasonable accuracy (5-20% error) • Sound Detection (Dufaux 2000, Piamsa-nga 1999) • Localize and recognize sound objects in long clips • Poor performance or assumption of unrealistic conditions (e.g., very quiet background)

Detection via Windowed Search • Break the long audio track into short overlapping clips (Clip 1, Clip 2, … Clip N) • Independently classify each short clip as object or non-object with the clip classifier • Return locations of the detected sound object
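The windowed search can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the hop size, and the toy energy-based `energy_score` classifier are all assumptions made for the example.

```python
from typing import Callable, List, Sequence, Tuple


def windowed_search(
    track: Sequence[float],
    clip_len: int,
    hop: int,
    classify: Callable[[Sequence[float]], float],
    threshold: float = 0.0,
) -> List[Tuple[int, int, float]]:
    """Score overlapping clips independently; return (start, end, score) for hits."""
    detections = []
    for start in range(0, len(track) - clip_len + 1, hop):
        clip = track[start:start + clip_len]
        score = classify(clip)  # each clip is judged on its own
        if score > threshold:
            detections.append((start, start + clip_len, score))
    return detections


def energy_score(clip: Sequence[float]) -> float:
    # Hypothetical stand-in for the clip classifier: mean energy minus an offset.
    return sum(x * x for x in clip) / len(clip) - 0.25


# Toy track: silence, a burst of 10 samples, silence.
track = [0.0] * 20 + [1.0] * 10 + [0.0] * 20
hits = windowed_search(track, clip_len=10, hop=5, classify=energy_score)
```

With a hop smaller than the clip length, neighboring windows overlap, so one sound usually produces several adjacent hits that a real system would merge into a single detection.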

Representation • Raw representation: time-frequency analysis (windowed Fourier transform) • Extract power percentage in each band over time and total power over time • Compute features used for classification • Examples shown: meows, phone rings
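As a rough sketch of this representation, the code below computes a windowed Fourier transform with a naive DFT and reports the fraction of power in each band per frame, plus total power per frame. The frame length, hop, Hann window, and four equal-width bands are assumptions for illustration; the slides do not state the actual settings.

```python
import cmath
import math


def frame_power_spectrum(frame):
    """Power at the non-negative frequency bins of one Hann-windowed frame."""
    n = len(frame)
    windowed = [
        x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
        for i, x in enumerate(frame)
    ]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def band_power_over_time(signal, frame_len=64, hop=32, n_bands=4):
    """Return (per-frame band power fractions, per-frame total power)."""
    fractions, totals = [], []
    for start in range(0, len(signal) - frame_len + 1, hop):
        power = frame_power_spectrum(signal[start:start + frame_len])
        edges = [round(b * len(power) / n_bands) for b in range(n_bands + 1)]
        bands = [sum(power[edges[b]:edges[b + 1]]) for b in range(n_bands)]
        total = sum(bands)
        totals.append(total)
        fractions.append([p / total if total > 0 else 0.0 for p in bands])
    return fractions, totals


# A low tone (DFT bin 4) and a high tone (DFT bin 28) of a 64-sample frame.
low_tone = [math.sin(2 * math.pi * 4 * t / 64) for t in range(256)]
high_tone = [math.sin(2 * math.pi * 28 * t / 64) for t in range(256)]
low_frac, low_total = band_power_over_time(low_tone)
high_frac, high_total = band_power_over_time(high_tone)
```

The low tone puts almost all of its power in the first band and the high tone in the last, which is the kind of contrast the band-percentage representation is meant to expose.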

Classification Features • Diverse feature set, because different sound classes are distinctive in different ways • Means and standard deviations of power at different frequencies • Bandwidth, peaks, loudness, etc. • 138 features in all
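A few of the listed feature types can be illustrated as below. This is only a sketch: the feature names are hypothetical, and the handful of statistics computed here is far smaller than the 138 features the slide mentions.

```python
import statistics


def summary_features(band_fractions, total_power):
    """Summarize per-frame band fractions and total power into clip-level features."""
    feats = {}
    n_bands = len(band_fractions[0])
    for b in range(n_bands):
        series = [frame[b] for frame in band_fractions]
        feats[f"band{b}_mean"] = statistics.fmean(series)   # mean power share
        feats[f"band{b}_std"] = statistics.pstdev(series)   # variation over time
    feats["loudness_mean"] = statistics.fmean(total_power)
    feats["loudness_range"] = max(total_power) - min(total_power)
    return feats


# Two frames, two bands: the power shifts entirely into band 0 in the second frame.
feats = summary_features([[0.5, 0.5], [1.0, 0.0]], [2.0, 4.0])
```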

Classification by Decision Trees • Try to find simple rules that discriminate object from non-object • Each decision is based on a threshold of a feature value • Assign confidence based on likelihood of data for object and non-object classes at each leaf node • (Figure: tree diagram with decision nodes and leaf nodes)
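The structure described here can be sketched as a small tree whose decision nodes compare one feature to a threshold and whose leaves hold class counts. The half-log-ratio confidence with add-one smoothing is an assumption (a common choice for confidence-rated trees), and the feature names and counts are made up for the example.

```python
import math
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    n_object: float       # weighted count of object examples reaching this leaf
    n_non_object: float   # weighted count of non-object examples

    def confidence(self, smoothing: float = 1.0) -> float:
        # Positive means "object", negative means "non-object" (assumed form).
        return 0.5 * math.log(
            (self.n_object + smoothing) / (self.n_non_object + smoothing)
        )


@dataclass
class Decision:
    feature: str
    threshold: float
    below: "Tree"
    above: "Tree"


Tree = Union[Leaf, Decision]


def tree_confidence(tree: Tree, features: Dict[str, float]) -> float:
    """Follow threshold decisions down to a leaf and return its confidence."""
    while isinstance(tree, Decision):
        tree = tree.above if features[tree.feature] > tree.threshold else tree.below
    return tree.confidence()


# Hypothetical two-level tree with invented feature names and counts.
tree = Decision(
    "power_range", 0.6,
    below=Leaf(1, 40),
    above=Decision("low_band_fraction", 0.3, below=Leaf(12, 2), above=Leaf(3, 9)),
)
loud = tree_confidence(tree, {"power_range": 0.9, "low_band_fraction": 0.1})
quiet = tree_confidence(tree, {"power_range": 0.2, "low_band_fraction": 0.1})
```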

Boosted Trees • Problem: One decision tree by itself may not be a great classifier • Solution: Use several trees, with each one focusing on the mistakes of previously learned trees • AdaBoost: • Weight training data uniformly • Learn a decision tree classifier on the weighted data • Re-weight the data, giving more weight to incorrectly classified examples, and repeat for the next tree • Final classification based on a linear combination of confidences from all learned decision trees
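The boosting loop can be sketched as follows. This assumes a confidence-rated variant of AdaBoost with one-threshold trees (stumps) as the weak learner; the slides do not specify the tree depth, the confidence formula, or the number of rounds, so those choices and the toy data are illustrative only.

```python
import math


def fit_stump(X, y, w, eps=1e-6):
    """Pick the single-threshold rule that best splits the weighted data."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            # mass[side][cls]: weight on each side of thr, cls 1 = object
            mass = [[eps, eps], [eps, eps]]
            for row, label, wi in zip(X, y, w):
                mass[int(row[j] > thr)][int(label > 0)] += wi
            z = 2 * sum(math.sqrt(m[0] * m[1]) for m in mass)  # lower is purer
            if best is None or z < best[0]:
                conf = [0.5 * math.log(m[1] / m[0]) for m in mass]
                best = (z, j, thr, conf)
    if best is None:
        raise ValueError("need at least two distinct values in some feature")
    return best[1:]


def stump_score(stump, row):
    j, thr, conf = stump
    return conf[int(row[j] > thr)]


def adaboost(X, y, rounds=5):
    n = len(X)
    w = [1.0 / n] * n                       # weight training data uniformly
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)          # learn a tree on the weighted data
        ensemble.append(stump)
        # Misclassified examples (y * score < 0) gain weight.
        w = [wi * math.exp(-yi * stump_score(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def ensemble_score(ensemble, row):
    """Linear combination (here a plain sum) of the trees' confidences."""
    return sum(stump_score(s, row) for s in ensemble)


# Toy 1-D problem: the object class sits in the middle, so one stump cannot fit it.
X = [[0.1], [0.2], [0.4], [0.5], [0.6], [0.8], [0.9]]
y = [-1, -1, 1, 1, 1, -1, -1]
model = adaboost(X, y, rounds=5)
predictions = [1 if ensemble_score(model, row) > 0 else -1 for row in X]
```

On this toy set a single threshold always leaves one group of negatives on the wrong side; the second stump, trained on the re-weighted data, cuts from the other side and the summed confidences separate the classes.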

Examples of Decision Trees • Meow: low percentage of power in low frequencies in mid-time of sound • Gunshot: high power amplitude range; very high power amplitude range • Gunshot: more complex tree that focuses on examples misclassified by the tree above

Cascade of Classifiers • Goal: eliminate false positives with few false negatives in early stages • Advantages: • Allows use of a large set of negative training examples • Improves classification speed • Dangers: cannot recover from false negatives • (Figure: Sound Clip -> Stage 1 -> Pass (5%) -> Stage 2 -> Pass (2%) -> Stage 3 -> Pass (0.005%); a clip that fails any stage is rejected)
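A cascade of this kind can be sketched as a loop that stops at the first failing stage. The stage scoring functions, feature names, and thresholds below are hypothetical, and `pick_threshold` shows just one simple way to keep false negatives low when setting a stage threshold.

```python
def pick_threshold(positive_scores, max_miss_rate=0.0):
    """Largest threshold that rejects at most the allowed share of positives."""
    ranked = sorted(positive_scores)
    allowed_misses = int(max_miss_rate * len(ranked))
    return ranked[allowed_misses]


def run_cascade(features, stages):
    """stages: list of (score_fn, threshold). Returns (accepted, stages_evaluated)."""
    for i, (score_fn, threshold) in enumerate(stages, start=1):
        if score_fn(features) < threshold:
            return False, i          # rejected here; later stages never run
    return True, len(stages)


# Hypothetical two-stage cascade: a cheap loudness test, then a band test.
stages = [
    (lambda f: f["loudness"], pick_threshold([0.5, 0.7, 0.9])),
    (lambda f: f["high_band"], 0.3),
]
quiet = run_cascade({"loudness": 0.2, "high_band": 0.9}, stages)
loud_but_low = run_cascade({"loudness": 0.9, "high_band": 0.1}, stages)
candidate = run_cascade({"loudness": 0.9, "high_band": 0.6}, stages)
```

Most clips are discarded by the first, cheapest stage, which is where the speed advantage comes from; the `quiet` clip also shows the danger, since a clip rejected early is never reconsidered.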

Results: Classification Error • (Figure panels: Best Performance, Worst Performance)

Results: ROC curves • Note: to approximate the negative error rate, divide FP by 25,000
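The note about dividing by 25,000 can be made concrete with a short sketch. It assumes the ROC curves plot a raw false-positive count against the true-positive rate and that 25,000 is roughly the number of negative test clips; the scores below are invented for illustration.

```python
def roc_points(pos_scores, neg_scores):
    """Sweep the decision threshold; return (false-positive count, true-positive rate)."""
    points = []
    for thr in sorted(set(pos_scores) | set(neg_scores), reverse=True):
        tp = sum(s >= thr for s in pos_scores)
        fp = sum(s >= thr for s in neg_scores)
        points.append((fp, tp / len(pos_scores)))
    return points


def fp_rate(fp_count, n_negatives=25_000):
    """Approximate negative error rate: false positives per negative example."""
    return fp_count / n_negatives


points = roc_points([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1])
rate = fp_rate(50)   # 50 false positives out of an assumed 25,000 negatives
```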

Results: Anecdotal • Gunshots • Female Laugh • Male Laugh • Swords • Scream
