
Polyphonic music information retrieval based on multi-label cascade classification system

www.kdd.uncc.edu. Polyphonic music information retrieval based on multi-label cascade classification system. http://www.mir.uncc.edu. Presented by Zbigniew W. Ras, University of North Carolina, Charlotte, NC, College of Computing and Informatics. Student: Wenxin Jiang


Presentation Transcript


  1. www.kdd.uncc.edu Polyphonic music information retrieval based on multi-label cascade classification system. http://www.mir.uncc.edu Presented by Zbigniew W. Ras, University of North Carolina, Charlotte, NC, College of Computing and Informatics

  2. Student: Wenxin Jiang Advisor: Dr. Zbigniew W. Ras Polyphonic music information retrieval based on multi-label cascade classification system

  3. Survey of MIR- http://mirsystems.info/ • 43 MIR systems • Most are pitch estimation-based melody and rhythm match • This presentation will focus on timbre estimation

  4. MIRAI - Musical Database (mostly MUMS) [music pieces played by 59 different music instruments]. Goal: design and implement a system for automatic indexing of music by instruments (objective task) and emotions (subjective task). Outcome: musical database [music pieces indexed by instruments and emotions]. The resulting database will be represented as an FS-tree, guaranteeing efficient storage and retrieval.

  5. Automatic Indexing of Music. What is needed? A database of monophonic and polyphonic music signals and their descriptions in terms of new features (including temporal ones) in addition to the standard MPEG7 features. These signals are labeled by instruments and emotions, which form additional features called decision features. Why is it needed? To build classifiers for automatic indexing of musical sound by instruments and emotions.

  6. MIRAI - Cooperative Music Information Retrieval System based on Automatic Indexing [Diagram labels: User, Query, Query Adapter, Empty Answer?, Indexed Audio Database, Music Objects, Instruments, Durations, …]

  7. Raw data (signal representation): binary file in PCM format, sampling rate 44.1 kHz, 16 bits per sample, 2,646,000 values/min (per channel). PCM (Pulse Code Modulation) is the most straightforward mechanism for storing audio: analog audio is sampled and the individual samples are stored sequentially in binary format.
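
The values-per-minute figure on this slide follows directly from the sampling rate. A minimal sketch of the arithmetic, assuming a single (mono) channel; the function names are illustrative and not part of MIRAI:

```python
# Assumption: one channel (mono). Stereo would double both figures.
SAMPLE_RATE_HZ = 44_100   # 44.1 kHz, as stated on the slide
BITS_PER_SAMPLE = 16      # 16-bit PCM, as stated on the slide


def pcm_values_per_minute(sample_rate_hz: int, channels: int = 1) -> int:
    """Number of amplitude values stored for one minute of audio."""
    return sample_rate_hz * 60 * channels


def pcm_bytes_per_minute(sample_rate_hz: int, bits_per_sample: int,
                         channels: int = 1) -> int:
    """Raw storage size in bytes for one minute of uncompressed PCM."""
    return pcm_values_per_minute(sample_rate_hz, channels) * bits_per_sample // 8


print(pcm_values_per_minute(SAMPLE_RATE_HZ))                   # 2646000
print(pcm_bytes_per_minute(SAMPLE_RATE_HZ, BITS_PER_SAMPLE))   # 5292000
```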

  8. Challenges to applying KDD in MIR The nature and types of raw data

  9. Feature extraction: amplitude values at each sample point (lower-level raw data form) → feature extraction → higher-level representations (feature database), manageable by traditional pattern recognition: classification, clustering, regression.

  10. MPEG7 features [Flowchart blocks: Signal; Hamming Window; NFFT (FFT points); STFT; Power Spectrum; Spectral Centroid; Signal envelope; Log Attack Time; Temporal Centroid; Fundamental Frequency; Harmonic Peaks Detection; Instantaneous Harmonic Spectral Centroid; Instantaneous Harmonic Spectral Spread; Instantaneous Harmonic Spectral Deviation; Instantaneous Harmonic Spectral Variation]
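
To make one path through this feature chain concrete (signal, Hamming window, power spectrum, spectral centroid), here is a rough sketch in plain Python. It computes a simple linear-frequency, power-weighted centroid, which is an assumption made for illustration and not the exact MPEG-7 descriptor definition; the naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def hamming(n):
    """Symmetric Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def power_spectrum(frame):
    """Power spectrum of one windowed frame via a naive O(n^2) DFT."""
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, hamming(n))]
    spectrum = []
    for k in range(n // 2 + 1):  # non-negative frequencies only
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_centroid(frame, sample_rate):
    """Power-weighted mean frequency of the frame, in Hz."""
    power = power_spectrum(frame)
    n = len(frame)
    total = sum(power)
    if total == 0:
        return 0.0
    return sum((k * sample_rate / n) * p for k, p in enumerate(power)) / total


# Illustrative input: a pure 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
centroid_hz = spectral_centroid(tone, sample_rate)
print(f"centroid is close to the tone frequency: {centroid_hz:.1f} Hz")
```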

  11. Derived Database Extended MPEG7 features Other features & new features

  12. Hierarchical Classification Schema I

  13. Schema II - Hornbostel/Sachs. Families: Idiophone, Membranophone, Aerophone, Chordophone. Aerophone subclasses: Lip Vibration, Single Reed, Free, Side. Instruments shown: C Trumpet, Tuba, Bassoon, Whip, Flute, French Horn, Oboe, Alto Flute.

  14. Schema III - Play Methods: ……, Blow, Bowed, Muted, Picked, Pizzicato, Shaken. Instruments shown: Alto Flute, Flute, Piccolo, Bassoon, ……

  15. Database Table Xin Cynthia Zhang

  16. Example. Level I: classification attributes C[1], C[2]; decision attributes d[1], d[2], d[3]. Level II: classification attributes C[2,1], C[2,2]; decision attributes d[3,1], d[3,2].

  17. Classification • 90% training, 10% testing. • 10 folds. • Hierarchical (Schema I) vs non-hierarchical. • Compare with different classifiers. • J48 tree • Naïve Bayesian
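
The 90%/10% split repeated over 10 folds is standard 10-fold cross-validation. A minimal sketch of how such splits can be generated; the function name, the shuffling seed, and the sample count of 100 are illustrative assumptions, and the original experiments' own tooling may differ:

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # fixed seed for repeatability
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(100, k=10))
for train, test in splits[:2]:
    print(len(train), len(test))  # 90 10
```

Accuracy is then averaged over the 10 test folds, so every instance contributes to testing once and to training nine times.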

  18. Results of the non-hierarchical Classification

  19. Results of the hierarchical Classification (Schema I) with MPEG7 features

  20. Results of the hierarchical Classification (Schema I) with all features

  21. Classification Results

  22. Polyphonic sounds – how to handle? • Single-label classification based on sound separation • Multi-labeled classifiers. Sound Separation Flowchart [blocks: Polyphonic Sound, segmentation, Get frame, Feature extraction, Classifier, Get Instrument, Sound separation]. Problems? Information loss during the signal subtraction.

  23. This presentation will focus on timbre estimation in polyphonic sounds and designing multi-labeled classifiers • timbre relevant descriptors • Spectrum Centroid, Spread • Spectrum Flatness Band Coefficients • Harmonic Peaks • Mel frequency cepstral coefficients (MFCC) • Tristimulus

  24. Sub-pattern of single instrument in mixture Feature extraction

  25. Timbre estimation based on multi-label classifier [Diagram labels: 40 ms segmentation, Feature Extraction, Acoustic descriptors, Features, Single-label database, Classifier]

  26. Flowchart of multi-label classification system [Flowchart blocks: Polyphonic Sound, Get frame, Feature extraction, Perform multiple classifying, Multiple labels, Finish all the frames' estimation, Voting process based on context, Get final winners]
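
The flowchart ends with a vote over the labels collected from all frames. The slide does not specify the voting rule, so the sketch below assumes one simple possibility: sum each label's per-frame confidence and keep the top-scoring labels. The function name, the `max_winners` parameter, and the sample confidences are all hypothetical.

```python
from collections import defaultdict


def vote_over_frames(frame_predictions, max_winners=2):
    """Aggregate per-frame {label: confidence} dicts into final winners."""
    totals = defaultdict(float)
    for prediction in frame_predictions:
        for label, confidence in prediction.items():
            totals[label] += confidence
    # Highest total first; ties broken alphabetically for determinism.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [label for label, _ in ranked[:max_winners]]


# Made-up multi-label outputs for three consecutive frames.
frames = [
    {"flute": 0.7, "trombone": 0.5},
    {"flute": 0.6, "oboe": 0.45},
    {"trombone": 0.8, "flute": 0.5},
]
winners = vote_over_frames(frames)
print(winners)  # ['flute', 'trombone']
```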

  27. Timbre Estimation Results based on different methods [Instruments - 45, Training Data (TD) - 2917 single instr. sounds from MUMS, Testing on 308 mixed sounds randomly chosen from TD, window size – 1 sec, frame size – 120ms, hop size – 40ms, MFCC extracted from each frame (following MPEG-7)] Threshold 0.4 controls the total number of estimations for each index window.
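
With a 1 s window, 120 ms frames, and a 40 ms hop, the number of full frames per window can be worked out directly. The sketch below also shows one plausible reading of the 0.4 threshold, namely that a label is kept only if its confidence reaches the threshold; that interpretation and the helper names are assumptions, since the slide does not define how the threshold is applied.

```python
def frames_per_window(window_ms, frame_ms, hop_ms):
    """Count frames that fit entirely inside one index window."""
    if frame_ms > window_ms:
        return 0
    return (window_ms - frame_ms) // hop_ms + 1


def apply_threshold(confidences, threshold=0.4):
    """Keep labels whose confidence is at least the threshold (assumed rule)."""
    return sorted(label for label, c in confidences.items() if c >= threshold)


print(frames_per_window(1000, 120, 40))  # 23
print(apply_threshold({"flute": 0.62, "oboe": 0.41, "tuba": 0.12}))
# ['flute', 'oboe']
```

A lower threshold lets more candidate instruments through per window, and a higher one fewer, which matches the slide's remark that the threshold controls the number of estimations.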

  28. [Flowchart blocks: Polyphonic Sound (window), Get frame, Feature extraction, Classifiers, Multiple labels] The extracted features are compressed representations of the signal: Harmonic Peaks, Mel Frequency Cepstral Coefficients (MFCC), Spectral Flatness, …. Irrelevant information (inharmonic frequencies or partials) is removed. Violin and viola have similar MFCC patterns, and so do double bass and guitar, which makes them difficult to distinguish in polyphonic sounds. More information from the raw signal is needed.

  29. Short Term Power Spectrum – low level representation of signal (calculated by STFT) Spectrum slice – 0.12 seconds long Power Spectrum patterns of flute & trombone can be seen in the mixture

  30. Experiment: Middle C instrument sounds (pitch equal to C4 in MIDI notation, frequency 261.6 Hz). Training set: power spectrum from 3323 frames, extracted by STFT from 26 single instrument sounds: electric guitar, bassoon, oboe, B-flat clarinet, marimba, C trumpet, E-flat clarinet, tenor trombone, French horn, flute, viola, violin, English horn, vibraphone, accordion, electric bass, cello, tenor saxophone, B-flat trumpet, bass flute, double bass, alto flute, piano, Bach trumpet, tuba, and bass clarinet. Testing set: fifty-two audio files, each mixed (using Sound Forge) from two of these 26 single instrument sounds. Classifiers: (1) KNN with Euclidean distance (spectrum match based classification); (2) Decision Tree (multi-label classification based on previously extracted features).

  31. Timbre Pattern Match Based on Power Spectrum n – number of labels assigned to each frame; k – parameter for KNN
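
The roles of k and n can be illustrated with a small KNN sketch: the k nearest training frames (by Euclidean distance between spectra) vote, and the n most frequent labels are assigned to the query frame. The three-bin "spectra" and the function names are toy assumptions for illustration, not data from the experiment.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length spectra."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_labels(query, training, k=3, n=2):
    """Return the n most common labels among the k nearest training frames."""
    neighbours = sorted(training, key=lambda item: euclidean(query, item[0]))[:k]
    counts = Counter(label for _, label in neighbours)
    return [label for label, _ in counts.most_common(n)]


# Toy single-instrument training frames: (power spectrum, label).
training = [
    ([1.0, 0.0, 0.0], "flute"),
    ([0.9, 0.1, 0.0], "flute"),
    ([0.0, 1.0, 0.0], "trombone"),
    ([0.0, 0.9, 0.1], "trombone"),
    ([0.0, 0.0, 1.0], "piano"),
]
mixture = [0.5, 0.5, 0.0]  # frame resembling a flute + trombone mix
labels = knn_labels(mixture, training, k=4, n=2)
print(sorted(labels))  # ['flute', 'trombone']
```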

  32. Hierarchical structure Flute English Horn Viola Violin

  33. Instrument granularity classifiers which are trained at each level of the hierarchical tree Hornbostel/Sachs

  34. Modules of cascade classifier for single instrument estimation --- Hornbostel/Sachs, Pitch 3B: 96.02% * 98.94% = 95.00% > 91.80%
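
The percentages on this slide are consistent with multiplying two per-level accuracies and comparing the product with a third figure: 0.9602 × 0.9894 ≈ 0.9500, which exceeds 0.9180. The sketch below reproduces that arithmetic. Which classifier each figure belongs to is not recoverable from the transcript, and treating the product as the cascade's overall accuracy assumes each level's accuracy is measured given a correct decision at the previous level.

```python
import math


def cascade_accuracy(level_accuracies):
    """Overall accuracy of a cascade as the product of per-level accuracies."""
    return math.prod(level_accuracies)


cascaded = cascade_accuracy([0.9602, 0.9894])
single_stage = 0.9180  # the remaining figure on the slide

print(f"{cascaded:.2%}")        # 95.00%
print(cascaded > single_stage)  # True
```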

  35. New Experiment: • Middle C instrument sounds (pitch equal to C4 in MIDI notation, frequency 261.6 Hz). • Training set: 2762 frames extracted from the following instrument sounds: electric guitar, bassoon, oboe, B-flat clarinet, marimba, C trumpet, E-flat clarinet, tenor trombone, French horn, flute, viola, violin, English horn, vibraphone, accordion, electric bass, cello, tenor saxophone, B-flat trumpet, bass flute, double bass, alto flute, piano, Bach trumpet, tuba, and bass clarinet. • Classifiers (WEKA): (1) KNN with Euclidean distance (spectrum match based classification); (2) Decision Tree (classification based on previously extracted features). • Confidence: ratio of the correctly classified instances to the total number of instances.
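
The confidence measure defined on this slide is a plain ratio, sketched below. The function name and the sample labels are illustrative only.

```python
def confidence(predicted, actual):
    """Ratio of correctly classified instances to the total number of instances."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        return 0.0
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


predicted = ["flute", "oboe", "tuba", "flute"]
actual = ["flute", "oboe", "piano", "flute"]
print(confidence(predicted, actual))  # 0.75
```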

  36. Classification on different Feature Groups

  37. Feature and classifier selection at each level of cascade system KNN + Band Coefficients

  38. Classification on the combination of different feature groups Classification based on KNN Classification based on Decision Tree

  39. From those two experiments, we see that: • The KNN classifier works better with feature vectors such as spectral flatness coefficients, projection coefficients, and MFCC. • The decision tree works better with harmonic peaks and statistical features. • Simply adding more features together does not improve the classifiers and sometimes even worsens classification results (for example, adding harmonic peaks to other feature groups).

  40. Feature and classifier selection at each level of Cascade System - Hornbostel/Sachs hierarchical tree Feature and classifier selection at top level

  41. Feature and classifier selection at second level

  42. Feature and classifier selection at third level

  43. Feature and Classifier Selection Feature and Classifier Selection Table for Level 1 Feature and Classifier Selection Table for Level 2

  44. HIERARCHICAL STRUCTURE BUILT BY CLUSTERING ANALYSIS. Common methods to calculate the distance or similarity between clusters: single linkage (nearest neighbor), complete linkage (furthest neighbor), unweighted pair-group method using arithmetic averages (UPGMA), weighted pair-group method using arithmetic averages (WPGMA), unweighted pair-group method using the centroid average (UPGMC), weighted pair-group method using the centroid average (WPGMC), Ward's method. Most common distance functions: Euclidean; Manhattan; Canberra (sums a series of fractional differences between the coordinates of a pair of objects); Pearson correlation coefficient (PCC), which measures the degree of association between objects; Spearman's rank correlation coefficient. Clustering algorithm: HCLUST (agglomerative hierarchical clustering), R package.
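
For reference, four of the listed distance functions can be written out as below. This is a plain-Python sketch using the textbook definitions, not the R implementation used in the experiments; in particular, turning the Pearson correlation r into a distance as 1 − r is one common convention and is an assumption here.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def canberra(a, b):
    """Sum of |x - y| / (|x| + |y|); terms with a zero denominator are skipped."""
    total = 0.0
    for x, y in zip(a, b):
        denom = abs(x) + abs(y)
        if denom > 0:
            total += abs(x - y) / denom
    return total


def pearson_distance(a, b):
    """1 - Pearson correlation (assumed convention); undefined for constant vectors."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    dev_a = [x - mean_a for x in a]
    dev_b = [y - mean_b for y in b]
    norm = math.sqrt(sum(d * d for d in dev_a) * sum(d * d for d in dev_b))
    if norm == 0:
        raise ValueError("Pearson correlation is undefined for constant vectors")
    return 1.0 - sum(x * y for x, y in zip(dev_a, dev_b)) / norm


u, v = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
for name, fn in [("euclidean", euclidean), ("manhattan", manhattan),
                 ("canberra", canberra), ("pearson", pearson_distance)]:
    print(name, round(fn(u, v), 4))
```

Note how `v` is an exact multiple of `u`: the Pearson distance treats them as identical in shape, while the other three measures report a nonzero distance. That sensitivity to shape rather than magnitude is one reason the choice of distance function changes the resulting hierarchy.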

  45. Testing datasets (MFCC, flatness coefficients, harmonic peaks): the middle C pitch group, which contains 46 different musical sound objects. Each sound object is segmented into multiple 0.12 s frames, and each frame is stored as an instance in the testing dataset, giving 2884 frames in total. Three different features (MFCC, flatness coefficients, and harmonic peaks) are extracted from those sound objects, and each feature produces one dataset of 2884 frames for clustering. Clustering: when the algorithm finishes the clustering process, a cluster ID is assigned to every frame.

  46. Contingency Table derived from clustering result

  47. Evaluation result of the Hclust algorithm (the 14 results which yield the highest score among 126 experiments). w – number of clusters, α – average clustering accuracy of all the instruments, score = α*w
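
The score α*w can be computed from a contingency table of instruments against clusters. The slide does not define how per-instrument accuracy is measured, so the sketch below assumes it is the share of an instrument's frames that fall into its dominant cluster; that definition, the function name, and the counts are hypothetical.

```python
def clustering_score(contingency):
    """Return (alpha, w, alpha * w) for {instrument: [frames per cluster]}.

    Assumption: an instrument's accuracy is the fraction of its frames that
    land in the cluster holding most of them.
    """
    w = len(next(iter(contingency.values())))  # number of clusters (columns)
    accuracies = [max(counts) / sum(counts) for counts in contingency.values()]
    alpha = sum(accuracies) / len(accuracies)
    return alpha, w, alpha * w


# Made-up frame counts for three instruments over three clusters.
table = {
    "flute": [90, 10, 0],
    "oboe": [5, 80, 15],
    "tuba": [0, 0, 50],
}
alpha, w, score = clustering_score(table)
print(round(alpha, 3), w, round(score, 3))  # 0.9 3 2.7
```

Multiplying by w rewards clusterings that stay accurate while splitting the instruments into more groups, since accuracy alone is trivially maximized by a single cluster.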

  48. Clustering result from the Hclust algorithm with the Ward linkage method and Pearson distance measure; flatness coefficients are used as the selected feature. “ctrumpet” and “batchtrumpet” are clustered in the same group. “ctrumpet_harmonStemOut” is clustered in a single group of its own instead of merging with “ctrumpet”. Bassoon is considered a sibling of the regular French horn. “French horn muted” is clustered in a different group, together with “English Horn” and “Oboe”.
