An Efficient Approach to Extracting Approximate Repeating Patterns in Music Databases

An Efficient Approach to Extracting Approximate Repeating Patterns in Music Databases Ning-Han Liu, Yi-Hung Wu, Arbee L.P. Chen Proceeding of International Conference on Database Systems for Advanced Applications (DASFAA 05) Speaker: Pei-Min Chou Date:2006/4/21

Introduction • Problem finding all the repeating patterns • exact repeating pattern • Suffix tree • String-join • Similarity pattern • costs a lot of time

Pitch String: 67,64,64, 65,62,62, 60,62,64,65, 67,67,67 Interval String: -3, 0, +1, -3, 0, -2, +2,+2,+1, +2, 0, 0 Definition • Pitch string: P=(p1,p2,…pm), ordered list, |P|=m • Interval string: D=(d1,d2,…dm-1), di=pi+1-pi • Interval segment: S[i:j]=(di,di+1,…,dj) • Constraints: Min_len , Max_len • Ex: String:(a,b,c,d) Min_len=2, Max_len=3 Segment (min_len): (a,b),(b.c),(c,d) Segment (max_len):(a,b,c),(b.c.d)

3 Def. Similar • Edit distance: edit(P,Q)=distance of transform segment P into segment Q • Distance threshold:δP=|P|*γ • |P|:segment length, γ:distance threshold ratio, γ<1 • Similar segment: if edit(P,Q)≤ δP • Q length is at least=|P|- δP • Ex:γ:50% • P=(1,2,3,1,1,2,3,1),Q=(1,3,3,1,2,3,1),edit(P,Q)=2 • δP=8*0.5=4 • edit(P,Q)=2 <4Q is similar to P

Def. Histogram vector • Histogram vector: • D={a1,a2,…an} • HV(S) = <h1S,h2S,….,hnS>, hkS: count of ak • Ex: • Segment S=(1,2,3,1,1,2,3,1) • D={1,2,3} • HV(S)=<4,2,2> • |HV(S)|=8 4 3 2 1 0 1 2 3

Def. Histogram distance • Histogram distance • HD(S1,S2)=max(ins(HV(S1), HV(S2)), ins(HV(S2),HV(S1))) • Ex1: • S1=(1,2,3,1,1,2,3,1) • S2=(1,3,3,1,2,3,1) • S1=S2={1,2,3} • HV(S1)=<4,2,2> ins(HV(S1), HV(S2))=1 • HV(S2)=<3,1,3> ins(HV(S2), HV(S1))=2 • Ex2: • S1=(1,2,3,1,1,2,3,1,3,4,5,3,4,5) • S2=(1,3,1,2,3,1,4,5,2,4,4) • edit(S1,S2)=5, HD(S1,S2)=4 ->HD(S1,S2)=2

Def. ARP • Extension P (Ext(P)) : • pivot P and all similar segments S • Each of segments satisfy overlapping threshold • Ex: P=(a,b,c,d) and min_len=2,max_len=3,δP=1 • segments=((a,b,d)(a,b,c)(b,c,d)(a,b,c,b)(a,b,c,d)) • Support: • The number of segments in an extension= |Ext(P)| • Approximate Repeating Pattern(APR) • |Ext(P)|≥min_sup

Def. Overlap • Overlapping degree: • I[a:b] and J[c:d], a≤c≤b and I similar to J • Overlapping degree=(b-c+1) • Ex:12122212 • Overlapping threshold: • OIJ=min(|I|,|J|)*ρ • ρ:overlapping threshold ratio,0≤ρ≤1 a c b d Degree=(7-4+1)=4

Extraction procedure • Index construction • Candidate generation • Range query formulation • MBR retrieval • Estimation for max number of similar segments • Candidate pruning after HD computations • Candidate pruning after HD computations • ARP extraction

S2 S5,S6 S7 S1,S3 S4 I=(0,0)(2,2) MBR(R1) R1 R3 RM pair={(1:5,2)} R2 Child-p2 I=(1,0)(2,1) I=(0,2)(1,2) R2 R3 RM pair={(1:2,2),(3:5,2)} RM pair={(1:4,2)} S1,S3,S4,S7 S2,S5,S6 Index construction • Construct parametric R*-tree • P(1,2,2,1,1) DP={1,2} • Segment len_2:S1(1,2),S2(2,2),S3(2,1),S4(1,1) • Segment len_3:S5(1,2,2),S6(2,2,1),S7(2,1,1) • Minimal bounding rectangle (MBR) • RM pair :(R,M), R: union range ,M: minimal length Level1 Level2 Segment level Dimention2 Child-p1 Dimention1 S1(1,2)->HV(S1)=<1,1> S2(2,2)->HV(S2)=<0,2>

S2 S5,S6 S7 S1,S3 R1 R3 S4 R2 Candidate generation • γ=0.5 , ρ=0.5 • Range query formulation • S7=P=(2,1,1) , Dp={1,2} , HV(P)=<2,1> • δP=3*0.5=1.5 • Range query=(HV(P), δP)=(<2,1>,1) • MBR retrieval • R1 and R2 included

Estimation for max number of similar segments • MLx=max(|P|-δP,Mx) • Mx :Each RM pair(Rx,Mx) in MBR • Numx:(n-L)/(L-m)+1,n=b-a+1,m=ρ*MLx • NumR: sum max num of MBR • Ex(Cont.) P=(2,1,1) • RM pair in R2=(1:2,2),(3:5,2) • ML1=max(3-1,2)=2, ML2=2 ,m=0.5*2=1 • num1=(2-2)/(2-1)+1=1,num2=1 • numR=1+1=2

m m m m m m m m Numx explain Numx:(n-L)/(L-m)+1 ,n=b-a+1,m=ρ*MLx =((b-a+1-L)/(L-m)) +1 L-m L a b n

Candidate pruning before & after HD computation • Before HD computation • Processed at level above segment level • If numR<min_sup ->prune and go next query • Else (numR≥min_sup)->recursive lower level • After HD computation • Processed at segment level • Compute the Hdistance between P and segment in MBRs. • All segment satisfy δPpermutated and compute max_num • If max_num<min_sup ->prune • Else(max_num≥min_sup) ->candidate ARP and segments

ARP extraction • Given a candidate ARP and its candidate segments • compute the edit distance • If candidate ARP and segments distance> threshold ->remove • Generate all the extention of candidate ARP by consider overlapping threshold • If support of extension < min_sup ->prune • Else Find a ARP

Experiment 1614 12 10 8 6 4 2 0 14 12 10 8 6 4 2 0 20 18 16 14 12 10 8 6 4 Min_len (interval number) 10 12 14 16 18 20 22 24 26 28 30 Max_len (interval number) 181614 12 10 8 6 4 2 0 20 15 10 5 0 0 5 10 15 20 25 30 35 40 45 50 distance threshold(%) 10 9 8 7 6 5 4 3 2 min_sup

An Efficient Approach to Extracting Approximate Repeating Patterns in Music Databases

An Efficient Approach to Extracting Approximate Repeating Patterns in Music Databases

Presentation Transcript

Repeating patterns

An Efficient Approach to Column Selection in HPLC Method Development

Search for Approximate Matches in Large Databases

G-string : a novel approach for efficient search in graph databases

An Efficient Algorithm for Mining Time Interval-based Patterns in Large Databases

Repeating Patterns

Repeating and Growing Patterns

An Efficient Approach to Clustering in Large Multimedia Databases with Noise

Efficient Discovery of Frequent Approximate Sequential Patterns

An Efficient Index Structure for String Databases

Finding Approximate Repeating Patterns from Sequence Data

Repeating patterns

Music Classification Using Significant Repeating Patterns

An Efficient Algorithm for Enumerating Closed Patterns in Transaction Databases

An Efficient Index Structure for String Databases

An Efficient Approach to Learning Inhomogenous Gibbs Models

An approach to narratology in music analysis

APPROXIMATE QUERY PROCESSING IN DATABASES

Robotic Automation – An Advanced Approach to Efficient Production

An Efficient Algorithm for Read Matching in DNA Databases