
Audio Databases

Audio Databases. Metadata. Metadata is used to represent audio content in much the same way as for video. The metadata used to represent audio content may be viewed as a set of objects spread out over a timeline.

hazel

Presentation Transcript


  1. Audio Databases MMDB-Audio

  2. Metadata • Metadata is used to represent audio content in much the same way as for video. • The metadata used to represent audio content may be viewed as a set of objects spread out over a timeline. • We may index the metadata associated with audio in exactly the same way as we indexed video, and the same query-processing techniques may be reused.

  3. Example: • The following figure shows the line segments associated with part of an opera. • Activity1 may be Act 1 of the opera, activity2 may be Act 1, Scene 1, and so on.

  4. Example: (cont.) • Each activity may have an associated set of fields. • Singers: a set-valued field containing records with a Role, SingerType and SingerName. If the triple (Lohengrin, Tenor, Rene Kollo) appears in the segment [50, 100), then Rene Kollo, a tenor, is singing the role of Lohengrin during the time segment [50, 100) of the opera. • Score: a field of type music_doc that points to the relevant part of the music score associated with the time segment [50, 100). • Transcript: a field of type document that points to the relevant part of the libretto for the time segment [50, 100).
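
The segment metadata described on this slide can be sketched as a small record type plus a point-in-time query. This is only an illustrative sketch: the class name `Activity`, its field names, and the helper `singers_at` are assumptions, not part of the original slides.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    """One metadata object on the timeline (hypothetical layout)."""
    name: str
    start: int            # inclusive start of the time segment
    end: int              # exclusive end, matching the [50, 100) notation
    singers: list = field(default_factory=list)  # (role, singer_type, singer_name)


def singers_at(activities, t):
    """Return every singer triple whose segment [start, end) contains time t."""
    return [s for a in activities if a.start <= t < a.end for s in a.singers]


opera = [
    Activity("Act 1, Scene 1", 50, 100,
             [("Lohengrin", "Tenor", "Rene Kollo")]),
]
print(singers_at(opera, 75))
```

Because the segment is half-open, a query at time 100 does not return the singer of [50, 100).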

  5. Signal-Based Audio Content • In some applications, creating metadata is difficult, for example when the speaker is unknown or the content is unclear. • Audio data is then treated as a signal that varies over time x. • Different features of the signal are extracted, indexed and stored for efficient retrieval. • Metadata may still be used to complement the signal data.

  6. Sample Audio Signals

  7. Signal • Period of vibration, T = time taken for a “particle” in the wave to return to its starting position, e.g., from point A to point B. • Frequency of vibration, f = number of vibrations per second. f = 1/T. • Velocity, v = the speed at which the crests and troughs move to the right. v = w/T = w f, where w denotes the wavelength of the wave. • Amplitude, a = the maximum intensity of the signal associated with the wave.

  8. Indexing by Segmentation • Split up the audio signal into relatively homogeneous “windows.” This may be done in one of two ways: • The application developer can specify, a priori, a window size w (in seconds or minutes) and assume that the wave’s properties within each window are obtained by averaging. • Use a homogeneity predicate, as in the case of images, except that this homogeneity predicate applies to the one-dimensional case.
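
The first option (a fixed, a priori window size with averaged properties) can be sketched as follows. The function name `fixed_windows`, the use of mean absolute amplitude as the averaged property, and the sample values are assumptions made for illustration only.

```python
def fixed_windows(samples, sample_rate, window_sec):
    """Split a sampled signal into fixed-size windows.

    Returns one (start_time, duration, mean_abs_amplitude) tuple per window.
    The averaged property used here (mean absolute amplitude) is an
    illustrative choice, not one prescribed by the slides.
    """
    size = int(sample_rate * window_sec)
    if size <= 0:
        raise ValueError("window must contain at least one sample")
    windows = []
    for start in range(0, len(samples), size):
        chunk = samples[start:start + size]
        mean_amp = sum(abs(x) for x in chunk) / len(chunk)
        windows.append((start / sample_rate, len(chunk) / sample_rate, mean_amp))
    return windows


# Hypothetical signal: 5 samples at 2 samples per second, 1-second windows.
print(fixed_windows([1, -1, 2, -2, 3], sample_rate=2, window_sec=1))
```

The last window may be shorter than the others when the signal length is not a multiple of the window size.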

  9. Windowing an Audio Signal • The following figure shows a nonhomogeneous audio signal. After it is split into five windows, each window is homogeneous in the sense that it has a constant amplitude, wavelength, and wave velocity.

  10. Indexing Using Feature Extraction • After segmentation, the audio signal may be viewed as a sequence of n windows, w1, …, wn. • For each window, we extract some features associated with the audio signal. • If k features are extracted, then an audio signal may be considered to be a sequence of n points in a k-dimensional space.

  11. Example Features • Intensity (I): the power of the signal generated by the wave, in watts per square meter. Its formula (not preserved in this transcript) involves the density of the material through which the sound is propagated. • Loudness (L): defined relative to L0 (formula not preserved in this transcript), where L0 denotes the loudness of the lowest frequency (about 15 Hz) that a human ear can detect.

  12. Content Index In general, to index the content of an audio signal, we proceed with the following two steps: • Find a set w1, …, wn of window segments. • For each window wi, store a vector consisting of K acoustical attributes. An audio database may be viewed as a set of (K+3)-tuples consisting of the audio source (audio file), the window (within that audio file), the duration of the window, and the K feature values associated with that window. • A k-d tree over the K feature values can be used to index audio data.
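
A minimal sketch of the (K+3)-tuple view and a k-d tree over the feature values is given below. The names `WindowRecord`, `KDNode`, `build_kdtree` and `range_search`, as well as the file name and feature values, are assumptions for illustration; a production index would balance and store the tree differently.

```python
from typing import NamedTuple, Optional


class WindowRecord(NamedTuple):
    source: str        # audio file
    window: int        # window number within that file
    duration: float    # window duration
    features: tuple    # the K feature values


class KDNode(NamedTuple):
    record: WindowRecord
    axis: int
    left: Optional["KDNode"]
    right: Optional["KDNode"]


def build_kdtree(records, depth=0):
    """Build a k-d tree by splitting on the median of one feature per level."""
    if not records:
        return None
    axis = depth % len(records[0].features)
    ordered = sorted(records, key=lambda r: r.features[axis])
    mid = len(ordered) // 2
    return KDNode(ordered[mid], axis,
                  build_kdtree(ordered[:mid], depth + 1),
                  build_kdtree(ordered[mid + 1:], depth + 1))


def range_search(node, low, high, out=None):
    """Collect records whose features lie inside the box [low, high]."""
    if out is None:
        out = []
    if node is None:
        return out
    f = node.record.features
    if all(lo <= x <= hi for lo, x, hi in zip(low, f, high)):
        out.append(node.record)
    a = node.axis
    if low[a] <= f[a]:          # left subtree holds values <= f[a]
        range_search(node.left, low, high, out)
    if f[a] <= high[a]:         # right subtree holds values >= f[a]
        range_search(node.right, low, high, out)
    return out


# Hypothetical data: four windows of one file, K = 2 features each.
feats = [(1.0, 2.0), (3.0, 1.0), (2.0, 5.0), (4.0, 4.0)]
records = [WindowRecord("a.wav", i, 1.0, f) for i, f in enumerate(feats)]
tree = build_kdtree(records)
print(sorted(r.window for r in range_search(tree, (1.5, 0.0), (4.0, 4.5))))
```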

  13. Content-based Retrieval for Music Databases

  14. Introduction • The management of large collections of music data in multimedia databases has received much attention in the past few years. • For content-based music retrieval, we can extract features such as melodies, rhythms and chords from the music data and develop indices that help retrieve the relevant music data quickly.

  15. Music Feature String • A sample of “You Are My Sunshine”. Ex: “sol-do-re-mi-mi-mi-mi-re-mi-do-do” • Melody feature string: eabccccbaa • Rhythm string: 1-1-1-2-2-1-1-1-1-2-2 • Music feature string: e1a1b1c2c2c1c1b1a2a2
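
The music feature string interleaves one melody symbol with one rhythm symbol per note. A minimal sketch, assuming single-character symbols; the function name `music_feature_string` is hypothetical, and the rhythm string used below is the one read back from the slide's music feature string.

```python
def music_feature_string(melody, rhythm):
    """Interleave melody and rhythm symbols, one pair per note."""
    if len(melody) != len(rhythm):
        raise ValueError("melody and rhythm must have one symbol per note")
    return "".join(m + r for m, r in zip(melody, rhythm))


print(music_feature_string("eabccccbaa", "1112211122"))
```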

  16. Features of Music Data • Coding scheme: a music object → a sequence of music segments • music segment = (segment type, segment duration, segment pitch) • four segment types: ┌┐(type A), └┘(type B), ┌┘(type C), and └┐(type D)

  17. Features of Music Data • For example, the sequence of music segments: (B,3,-3) (A,1,+1) (D,3,-3) (B,1,-2) (C,1,+2) (C,1,+2) (C,1,+1)

  18. music segment = (type, duration, pitch)

  19. Music Data Retrieval: System Architecture

  20. Indexing • String Indexing for music data • Suffix tree • Numeric Indexing for music data • R-tree

  21. Suffix tree • A suffix tree is an index structure that has been proposed to locate sub-strings that exactly match a target string. • No two edges out of a node can have edge-labels beginning with the same character. • For any leaf i, the concatenation of the edge-labels on the path from the root to leaf i spells out exactly the suffix of the string that starts at position i.
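
The exact-matching idea can be sketched with an uncompressed suffix trie, which is simpler than a true suffix tree (it keeps one character per edge instead of merged edge labels) but answers the same queries. The function names and the reserved key `"$starts"` are assumptions made for this sketch.

```python
def suffixes(s):
    """All suffixes of s, starting from position 1."""
    return [s[i:] for i in range(len(s))]


def build_suffix_trie(s):
    """Insert every suffix character by character.

    Each node records the 1-based start positions of the suffixes that pass
    through it, i.e. where the sub-string spelled by the path occurs.
    """
    root = {}
    for start in range(len(s)):
        node = root
        for ch in s[start:]:
            node = node.setdefault(ch, {})
            node.setdefault("$starts", []).append(start + 1)
    return root


def find(trie, pattern):
    """Return the 1-based positions at which pattern occurs, or []."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return []
        node = node[ch]
    return node.get("$starts", [])


trie = build_suffix_trie("ababc")
print(suffixes("ababc"))
print(find(trie, "ab"), find(trie, "bc"), find(trie, "ca"))
```

A real suffix tree compresses chains of single-child nodes into one edge, which keeps the number of nodes linear in the string length.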

  22. Figure: step-by-step construction of the suffix tree. Ex: ababc → suffixes {ababc, babc, abc, bc, c}

  23. Figure: the resulting suffix tree, with edge labels over a, b, c and leaves 1 to 5. Ex: “Do Re Do Re Mi” → ababc

  24. Numeric Mapping • Numeric Mapping Function: $v(m) = \sum_{i=1}^{m} P(x_i)\, n^{i-1}$ • v(m): the integer value of a segment of m adjacent notes • m: the number of adjacent notes taken from the melody feature string • n: the number of pitches • P(x_i): the integer value of each note • 1 ≤ i ≤ m

  25. Numeric Mapping (cont.) • For example: a music feature string denoted by ‘bcdbc’, with n=10 and m=4 • bcdb → (1, 2, 3, 1) → 1·10^0 + 2·10^1 + 3·10^2 + 1·10^3 = 1321 • cdbc → (2, 3, 1, 2) → 2·10^0 + 3·10^1 + 1·10^2 + 2·10^3 = 2132
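
The worked example can be reproduced with a short sketch. It assumes notes are the letters a, b, c, ... mapped to 0, 1, 2, ..., which agrees with the values on the slide (b=1, c=2, d=3); the function names are hypothetical.

```python
def pitch_value(note):
    """P(x): integer value of a note, assuming a=0, b=1, c=2, ..."""
    return ord(note) - ord("a")


def numeric_value(segment, n=10):
    """Integer value of a segment: sum of P(x_i) * n**(i-1), i = 1..m."""
    return sum(pitch_value(x) * n ** i for i, x in enumerate(segment))


def window_values(melody, m=4, n=10):
    """Values of every window of m adjacent notes in the melody string."""
    return [numeric_value(melody[i:i + m], n)
            for i in range(len(melody) - m + 1)]


print(window_values("bcdbc"))
```

Note that the first note gets the lowest weight, so the decimal digits of the value read as the window's notes in reverse order.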

  26. Example: • Two Tigers (S1: Do Re Mi Do Do Re Mi Do). Figure: the integer values of the music of Two Tigers.

  27. Numeric Indexing Structure (R-Tree). Figure labels: non-leaf node, leaf node, link list.

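  28. Pitch Change • abca and bcdb have the same pitch changes between adjacent notes: 1, 1, -2 • Each change is shifted by Adj and weighted by a power of D: $V = \sum_{i=1}^{m-1} \bigl(P(x_{i+1}) - P(x_i) + Adj\bigr)\, D^{i-1}$ • m: the number of adjacent notes taken from the melody feature string • Adj: the maximum distance between two pitches • D: the total number of possible distances between pitches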

  29. Example: “abcaabca”. Suppose: n=10 pitches, Adj=9, D=19; each window has 4 notes.
      Music segment value | Changes + Adj | V(4)                        | Integer value
      0120                | 10, 10, 7     | 10·19^0 + 10·19^1 + 7·19^2  | 2727
      1200                | 10, 7, 9      | 10·19^0 + 7·19^1 + 9·19^2   | 3392
      2001                | 7, 9, 10      | 7·19^0 + 9·19^1 + 10·19^2   | 3788
      0012                | 9, 10, 10     | 9·19^0 + 10·19^1 + 10·19^2  | 3809
      0120                | 10, 10, 7     | 10·19^0 + 10·19^1 + 7·19^2  | 2727
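
The pitch-change mapping can be checked with a short sketch. It assumes notes a, b, c, ... map to 0, 1, 2, ... and that D = 2·Adj + 1, which matches Adj=9 and D=19 on the slide; the function name `interval_value` is hypothetical.

```python
def interval_value(segment, adj=9):
    """Encode the pitch changes of a segment as one integer.

    Each change between adjacent notes is shifted by adj so it is
    non-negative, then weighted by a power of D = 2 * adj + 1.
    """
    d = 2 * adj + 1
    pitches = [ord(x) - ord("a") for x in segment]
    changes = [b - a for a, b in zip(pitches, pitches[1:])]
    return sum((c + adj) * d ** i for i, c in enumerate(changes))


melody = "abcaabca"
print([interval_value(melody[i:i + 4]) for i in range(len(melody) - 3)])
```

Because only the changes are encoded, a transposed segment such as bcdb receives the same value as abca.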

  30. Numeric Index

  31. Searching in Numeric Index • Exact Matching • For example: the music query segment is ‘ccdbb’ • {ccdbb} → {ccdb}, {cdbb} • V(4) maps these windows to {1322} and {1132}

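  32. Figure labels: non-leaf node, leaf node, link list. • The candidate sets for the two windows, {s2,s3} and {s2,s3}, combine to {s2,s3}. • The windows occur at positions (2,3) in s2 and (1,4) in s3; only s2 has them at consecutive positions, so the answer is s2.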
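
The exact-matching procedure can be sketched with a plain dictionary standing in for the R-tree and its link lists. Everything concrete here is an assumption for illustration: the function names, the dictionary-based index, and the two song strings (chosen so that both contain the windows ccdb and cdbb but only s2 has them at consecutive positions).

```python
from collections import defaultdict


def numeric_value(segment, n=10):
    """Integer value of a window, with notes a, b, c, ... as 0, 1, 2, ..."""
    return sum((ord(x) - ord("a")) * n ** i for i, x in enumerate(segment))


def build_index(songs, m=4, n=10):
    """Map each window value to the (song id, 1-based position) pairs."""
    index = defaultdict(list)
    for song_id, melody in songs.items():
        for i in range(len(melody) - m + 1):
            index[numeric_value(melody[i:i + m], n)].append((song_id, i + 1))
    return index


def exact_match(index, query, m=4, n=10):
    """Return (song id, position) pairs where the whole query occurs."""
    if len(query) < m:
        raise ValueError("query must contain at least m notes")
    # Candidates come from the first window; every later window must occur
    # at the matching offset in the same song.
    starts = set(index.get(numeric_value(query[:m], n), []))
    for offset in range(1, len(query) - m + 1):
        hits = set(index.get(numeric_value(query[offset:offset + m], n), []))
        starts = {(sid, pos) for sid, pos in starts
                  if (sid, pos + offset) in hits}
    return sorted(starts)


# Hypothetical songs, not taken from the slides.
songs = {"s2": "accdbb", "s3": "ccdbacdbb"}
print(exact_match(build_index(songs), "ccdbb"))
```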

  33. Approximate Searching • We can examine the difference between the transformed value of the query string and that of the existing data. • n: the number of pitches • m: adjacent notes from melody feature string • h: the distance of two pitches

  34. Example: approximate matching conditions for m=4, n=10, h=1 • bbcd → (1, 1, 2, 3) → 3211 • abcd → (0, 1, 2, 3) → 3210

  35. Multi-Feature Indexing • Combined Suffix Tree • Independent Suffix Tree • Twin Suffix Tree • Grid-Twin Suffix Tree • Numeric Index • Hybrid Multi-Feature Index

  36. Combined Suffix Tree • The feature strings are used directly to construct the index in the Combined Suffix Tree. Ex: “a1a2b1 → {12,7}”, “121 → {12,7,1,6…}”

  37. Independent Suffix Tree • The Independent Suffix Tree approach separates the feature string into a melody string and a rhythm string and stores them in two independent suffix trees. Ex: (Melody: ababc) and (Rhythm: 12122), constructed from “a1b2a1b2c2”
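
The separation step can be sketched as below, assuming each note is exactly one melody character followed by one rhythm character; the function name `split_feature_string` is hypothetical. Each resulting string would then be indexed in its own suffix tree.

```python
def split_feature_string(feature):
    """Split an interleaved feature string into (melody, rhythm)."""
    if len(feature) % 2 != 0:
        raise ValueError("expected melody/rhythm character pairs")
    return feature[0::2], feature[1::2]


print(split_feature_string("a1b2a1b2c2"))
```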

  38. Twin Suffix Tree • The Twin Suffix Tree constructed from “a1b2a2b1a2b2c2”

  39. Grid-Twin Suffix Tree • ”a1b2a2c1a3”

  40. Condensed Grid-Twin Suffix Tree

  41. Condensed Grid-Twin Suffix Tree • “abaca” • “caaca” Figure labels: entry, Music ID.

  42. Multi-Feature Numeric Indexing for Music Data. Figure: a two-dimensional space with melody and rhythm axes (ticks at 0, 500, 1000). Melody: “a1b1c1a1”

  43. Multi-Feature Numeric Indexing for Music Data. Figure labels: non-leaf node, leaf node, link list.

  44. Multi-Feature Numeric Indexing for Music Data. Figure: a three-dimensional space with melody, rhythm and chord axes (ticks at 500).

  45. Hybrid Multi-Feature Index • Uses a multi-feature tree structure instead of the grid structure in the Grid-Twin Suffix Tree (GTST). Figure: example points (4, 3.75), (3, 5), (5, 5), (6, 2), (1.5, 2), (2, 3), (1, 1).

  46. Suffix Trees with Bit Arrays • Instead of the links between corresponding feature nodes in the Twin Suffix Tree, bit arrays are created to indicate the relationships between the suffix trees.

  47. Feature Extraction of Music Data • Some sequences of notes appear more than once in a music object; these are called repeating patterns. • Much research in musicology and music psychology agrees that the repeating pattern is one of the general features in music structure modeling.

  48. Repeating Patterns of Music Data • Repeating pattern: a sub-string of the string S that appears more than once and whose length is equal to or greater than 2. • Non-trivial repeating pattern: a repeating pattern X is non-trivial if no other repeating pattern Y contains X as a sub-string with the same frequency, freq(Y) = freq(X). • Fault-tolerant non-trivial repeating pattern: sequences that differ in only some notes are treated as occurrences of the same non-trivial repeating pattern.

  49. Example: Consider the melody string “C-D-E-F-C-D-E-C-D-E-F”. This melody string has ten repeating patterns: freq(“C-D-E-F”) = freq(“D-E-F”) = freq(“E-F”) = freq(“F”) = 2 and freq(“C-D-E”) = freq(“C-D”) = freq(“D-E”) = freq(“C”) = freq(“D”) = freq(“E”) = 3. ⇒ Only “C-D-E-F” and “C-D-E” are non-trivial.
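
The example can be verified with a brute-force sketch (not one of the extraction methods named in the slides, which are designed to avoid enumerating every sub-string). Notes are written as single characters without dashes, and single notes are counted so that the result matches the ten patterns listed above; the function names and the `min_len` parameter are assumptions.

```python
from collections import Counter


def repeating_patterns(s, min_len=1):
    """Count every sub-string of s and keep those occurring at least twice."""
    counts = Counter(s[i:j]
                     for i in range(len(s))
                     for j in range(i + min_len, len(s) + 1))
    return {p: c for p, c in counts.items() if c >= 2}


def non_trivial(patterns):
    """Drop pattern X if a longer pattern Y contains X with equal frequency."""
    return {x: fx for x, fx in patterns.items()
            if not any(x != y and x in y and fy == fx
                       for y, fy in patterns.items())}


patterns = repeating_patterns("CDEFCDECDEF")
print(len(patterns))
print(non_trivial(patterns))
```

This enumeration is quadratic in the number of sub-strings, so it is only suitable for checking small examples.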

  50. Music Feature Extractions • Correlative Matrix • FastPET • RP-Tree • 2RC • Similar Non-trivial Repeating Pattern • Fault Tolerance Non-trivial Repeating Patterns