Discovering RFM Sequential Patterns From Customers’ Purchasing Data
Prof. Yen-Liang Chen (陳彥良), Department of Information Management, National Central University (中央大學資管系)
Date: 2014/8/21
Agenda
• Introduction
• Related Work
• Problem Definition
• Algorithm
• Performance Evaluation
• Conclusion
Introduction: Sequential Pattern Mining (1)
• Sequential pattern mining
  • Finds the relationships between occurrences of sequential events
  • Determines whether the occurrences follow a specific order
• Example
  • Every time Microsoft stock drops 5%, IBM stock also drops at least 4% within three days.
Introduction: Sequential Pattern Mining (2)
• Applications of sequential pattern mining
  • Customer shopping sequences: first buy a computer, then a CD-ROM, and then a digital camera, within 3 months
  • Medical treatments, natural disasters (e.g., earthquakes), science and engineering processes, stocks and markets, etc.
  • Telephone calling patterns, Weblog click streams
  • DNA sequences and gene structures
Introduction: Sequential Patterns vs. Association Rules
• Sequential patterns: correlation between transactions. Which items are bought in a certain order? Notation: < , >
• Association rules: relationships within a transaction. Which items are bought together? Notation: ( , )
Introduction: What Is Sequential Pattern Mining?
• Given a set of sequences, find the complete set of frequent subsequences
• A sequence database is a set of sequences; an example sequence is <(ef)(ab)(df)cb>
• An element may contain a set of items. Items within an element are unordered and are listed alphabetically.
• <a(bc)dc> is a subsequence of <a(abc)(ac)d(cf)>
• Given the support threshold min_sup = 2, <(ab)c> is a sequential pattern
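The subsequence and support notions above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the helper names `seq`, `is_subsequence`, and `support` are hypothetical, and the two-sequence database contains only the two sequences quoted on the slide (the slide's full database table is not reproduced here).

```python
from typing import FrozenSet, List

Sequence = List[FrozenSet[str]]

def seq(*elements: str) -> Sequence:
    """Build a sequence from strings, one string per element: seq("a", "bc")."""
    return [frozenset(e) for e in elements]

def is_subsequence(pattern: Sequence, sequence: Sequence) -> bool:
    """Greedy left-to-right match: each pattern element must be a subset of
    a sequence element that comes after the previously matched one."""
    pos = 0
    for element in pattern:
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern element must match a later element
    return True

def support(pattern: Sequence, database: List[Sequence]) -> int:
    """Number of sequences in the database that contain the pattern."""
    return sum(is_subsequence(pattern, s) for s in database)

# Only the two sequences quoted on the slide.
db = [seq("a", "abc", "ac", "d", "cf"), seq("ef", "ab", "df", "c", "b")]

print(is_subsequence(seq("a", "bc", "d", "c"), db[0]))  # True
print(support(seq("ab", "c"), db))                      # 2
```

With min_sup = 2, a support of 2 is enough for <(ab)c> to count as a sequential pattern in this small database.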
Introduction: An SPM Example and the Problems
• Traditional SPM methods discover only the frequencies of the maximal sequential patterns
• In real-life situations, the environment may change constantly, and users’ behavior may also change over time
• As a result, many of the discovered patterns are of little value
Introduction: RFM Definition in Marketing (Bult and Wansbeek)
• R (Recency): the period from the last purchase to now
  • R↓: the customer is more likely to make a repeat purchase
• F (Frequency): the number of purchases made in a certain period
  • F↑: the customer has higher loyalty
• M (Monetary): the amount of money spent during a certain period
  • M↑: the customer is more important
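As a rough illustration of the three definitions, the sketch below computes R, F, and M for a single customer. The purchase history, the function name `rfm`, and the choice of measuring recency in days are all assumptions made for the example; none of them come from the slides.

```python
from datetime import date

# Hypothetical purchase history for one customer: (purchase date, amount spent).
purchases = [
    (date(2002, 1, 5), 120.0),
    (date(2002, 3, 17), 80.0),
    (date(2002, 6, 2), 200.0),
]

def rfm(purchases, period_start, now):
    """Return (recency in days, frequency, monetary) for purchases
    made between period_start and now, or None if there are none."""
    in_period = [(d, amt) for d, amt in purchases if period_start <= d <= now]
    if not in_period:
        return None
    recency = (now - max(d for d, _ in in_period)).days  # days since last purchase
    frequency = len(in_period)                           # number of purchases
    monetary = sum(amt for _, amt in in_period)          # total amount spent
    return recency, frequency, monetary

print(rfm(purchases, date(2002, 1, 1), date(2002, 6, 30)))  # (28, 3, 400.0)
```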
Introduction: The Proposed Algorithm, RFM-SPM
• Frequency constraint (traditional SPM) → Frequency, Recency, and Monetary constraints (RFM-SPM)
• Each constraint has two thresholds
  • An upper threshold and a lower threshold
  • Together they restrict the considered factor to a specified range
• By setting these three factors to different intervals, we can discover the patterns we are interested in
Introduction: Recency Constraint
• Specified by a range from Rtime_min to Rtime_max, each given as a number of days from the starting date of the sequence database
• Ensures that the last transaction of the pattern occurred in this interval
• Example (timeline): the sequence DB starts on 2001/12/27 and ends on 2002/12/31; with Rtime_min = 200 and Rtime_max = 270, the interval runs from 2001/12/27 + 200 days to 2001/12/27 + 270 days
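The day offsets in the timeline example can be turned into calendar dates with a short sketch. The function names are hypothetical, and the half-open interval (lower bound included, upper bound excluded) is an assumption taken from the comparison used in Example 3.2.

```python
from datetime import date, timedelta

start = date(2001, 12, 27)       # starting date of the sequence DB (from the slide)
rtime_min, rtime_max = 200, 270  # thresholds from the slide, in days

def day_offset(d: date) -> int:
    """Days elapsed since the starting date of the sequence database."""
    return (d - start).days

def satisfies_recency(last_purchase: date, r_min: int, r_max: int) -> bool:
    """Assumed half-open check: r_min <= offset < r_max."""
    return r_min <= day_offset(last_purchase) < r_max

print(start + timedelta(days=rtime_min))  # 2002-07-15
print(start + timedelta(days=rtime_max))  # 2002-09-23
print(satisfies_recency(date(2002, 8, 1), rtime_min, rtime_max))  # True
```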
Introduction: Monetary Constraint
• Given by a range from M_min to M_max. It ensures that the value of the discovered pattern is between M_min and M_max.
• Suppose the pattern is <(a), (bc)>. A data-sequence satisfies this pattern with respect to the monetary constraint if we can find an occurrence of <(a), (bc)> in the data-sequence whose value is within this range.
Introduction: Frequency Constraint
• The frequency of a pattern is the percentage of sequences in the database that contain the pattern while satisfying the recency constraint and the monetary constraint.
• A pattern is output as an RFM-pattern if its frequency falls within the interval from minsup_min to minsup_max.
Introduction: An Example of an RFM-Pattern
• 30% of customers who bought a computer recently came back to buy a scanner and a microphone, and the total amount of these products is greater than NT$55,000.
Related Work
• Clustering
  • Groups customers with similar needs and/or characteristics who are likely to exhibit similar purchasing behaviors
• Classification
  • Classifies customers into different categories of customer value; the resulting models are also used to classify unseen cases
• Association rules
  • Extracting Share Frequent Itemsets with Infrequent Subsets
• SPM
  • Constraint-Based Sequential Pattern Mining: the Consideration of Recency and Compactness
  • Discovering RFM sequential patterns from customers’ purchasing data
Problem Definition: Data-Sequence in RFM-SPM
[Figure: a traditional sequence DB and the corresponding transformed sequence DB]
Problem Definition: An Overview of the Problem Definition
[Diagram: containment of itemset, subsequence, recent subsequence, recent monetary subsequence]
Problem Definition: Example 3.1 (subsequence)
• Data-sequence A = <(a, 1, 10), (c, 3, 40), (a, 4, 30), (b, 4, 70), (a, 6, 50), (e, 6, 90), (c, 10, 70)>, where each triple is (item, time, price)
• Is itemset (ab) contained in A? Yes
• Is sequence B = <(ab)(ae)> a subsequence of A? Yes
Problem Definition: Example 3.2 (recent subsequence)
• Data-sequence A = <(a, 1, 10), (c, 3, 40), (a, 4, 30), (b, 4, 70), (a, 6, 50), (e, 6, 90), (c, 10, 70)>
• Rtime_min = 5 and Rtime_max = 8
• Is sequence B = <(ab)(ae)> a recent subsequence of A? Yes
  • B is a subsequence of A
  • The occurring time of the last itemset (ae) is 6, with 6 ≥ Rtime_min and 6 < Rtime_max
Problem Definition: Example 3.3 (recent monetary subsequence)
• Data-sequence A = <(a, 1, 10), (c, 3, 40), (a, 4, 30), (b, 4, 70), (a, 6, 50), (e, 6, 90), (c, 10, 70)>
• Rtime_min = 5, Rtime_max = 8, M_min = 200, M_max = 250
• Is sequence B = <(ab)(ae)> a recent monetary subsequence of A? Yes
  • B is a recent subsequence of A
  • The total money of this subsequence is 30 + 70 + 50 + 90 = 240, with 240 ≥ M_min and 240 < M_max
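Examples 3.1 to 3.3 can be checked mechanically. The sketch below is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, triples are read as (item, time, price), triples with the same time are grouped into one transaction, and a single occurrence is required to satisfy the recency and the monetary range at the same time.

```python
from typing import Dict, Iterator, Tuple

# Data-sequence A from Examples 3.1-3.3: (item, time, price) triples.
A = [("a", 1, 10), ("c", 3, 40), ("a", 4, 30), ("b", 4, 70),
     ("a", 6, 50), ("e", 6, 90), ("c", 10, 70)]

def transactions(data_seq):
    """Group triples by time: [(time, {item: price}), ...] in time order."""
    by_time: Dict[int, Dict[str, int]] = {}
    for item, time, price in data_seq:
        by_time.setdefault(time, {})[item] = price
    return sorted(by_time.items())

def occurrences(pattern, data_seq) -> Iterator[Tuple[int, int]]:
    """Yield (time of last itemset, total price) for every occurrence."""
    txs = transactions(data_seq)

    def extend(k, start, total, last_time):
        if k == len(pattern):
            yield last_time, total
            return
        for i in range(start, len(txs)):
            time, prices = txs[i]
            if all(item in prices for item in pattern[k]):
                value = sum(prices[item] for item in pattern[k])
                yield from extend(k + 1, i + 1, total + value, time)

    yield from extend(0, 0, 0, None)

def is_subsequence(pattern, data_seq):
    return any(True for _ in occurrences(pattern, data_seq))

def is_recent_subsequence(pattern, data_seq, r_min, r_max):
    return any(r_min <= t < r_max for t, _ in occurrences(pattern, data_seq))

def is_recent_monetary_subsequence(pattern, data_seq, r_min, r_max, m_min, m_max):
    return any(r_min <= t < r_max and m_min <= m < m_max
               for t, m in occurrences(pattern, data_seq))

B = ["ab", "ae"]  # the pattern <(ab)(ae)>, one string per itemset
print(is_subsequence(B, A))                                  # True
print(is_recent_subsequence(B, A, 5, 8))                     # True
print(is_recent_monetary_subsequence(B, A, 5, 8, 200, 250))  # True
```

The only occurrence of B in A ends at time 6 with a total of 240, which is why all three checks succeed for the thresholds used in the examples.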
Problem Definition: Definition 3.1 (f-pattern, rf-pattern, rfm-pattern)
• Let B = <I1 I2 ... Is> be a sequence of itemsets.
Problem Definition: Example 3.4 (RFM pattern)
• Given a data-sequence DB and six thresholds
  • R: Rtime_min = 10 ≤ R < Rtime_max = 21
  • M: M_min = 150 ≤ M < M_max = 250
  • F: Minsup_min = 2 ≤ F < Minsup_max = 4
• The RFM-patterns are listed as follows:
  • Containing 1 itemset = { }
  • Containing 2 itemsets = {<(ab)(c)>}
  • Containing 3 itemsets = {<(c)(b)(c)>, <(c)(ab)(c)>}
  • Containing 4 itemsets = {<(c)(b)(a)(c)>}
RFM-Apriori Algorithm
• The RFM-Apriori algorithm is developed by modifying the well-known Apriori (GSP) algorithm
• GSP
  • Puts all items into C1, the set of candidate f-patterns with length 1, and then scans the database to find the frequent 1-patterns (L1)
  • Assuming we already have the set of frequent (k-1)-patterns Lk-1, it generates the set of candidate f-patterns Ck by joining Lk-1 with Lk-1
  • Afterwards, it scans the database to determine the supports of the patterns in Ck, and then finds Lk
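The level-wise loop described for GSP can be sketched as follows. This is a deliberately simplified illustration: every element holds a single item, the candidate-pruning step of full GSP is omitted, and the three-sequence database and the names `contains` and `gsp` are hypothetical.

```python
from typing import List, Set, Tuple

Pattern = Tuple[str, ...]

def contains(sequence: str, pattern: Pattern) -> bool:
    """True if the pattern's items appear in the sequence in order
    (not necessarily adjacent)."""
    it = iter(sequence)
    return all(item in it for item in pattern)

def gsp(database: List[str], min_sup: int) -> List[Set[Pattern]]:
    """Level-wise mining; returns [L1, L2, ...] of frequent patterns."""
    def frequent(candidates):
        # One database scan per level to count supports.
        return {c for c in candidates
                if sum(contains(s, c) for s in database) >= min_sup}

    items = {item for s in database for item in s}
    level = frequent({(item,) for item in items})  # C1 -> L1
    levels = []
    while level:
        levels.append(level)
        # Ck: join p and q from Lk-1 when p without its first item
        # equals q without its last item.
        candidates = {p + (q[-1],) for p in level for q in level
                      if p[1:] == q[:-1]}
        level = frequent(candidates)               # Ck -> Lk
    return levels

db = ["abcb", "acb", "bca"]  # hypothetical sequences, one item per element
for k, lk in enumerate(gsp(db, min_sup=2), start=1):
    print(k, sorted(lk))
```

On this toy database the loop stops after three levels, with <a c b> as the only frequent 3-pattern.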
RFM-Apriori Algorithm: Overview
[Diagram, Apriori (GSP): all items → C1 → L1 → C2 (L1 x L1) → L2 → … → Lk-1 → Ck (Lk-1 x Lk-1) → Lk; support counting computes B.supf]
[Diagram, RFM-Apriori: CI1 (LI1f) → LI1 (LI1f, LI1rf, LI1rfm) → CI2 (LI1f x LI1rf) → LI2 (LI2rf, LI2rfm) → … → LIk-1 (LIk-1rf, LIk-1rfm) → CIk (LIk-1rf x LIk-1rf) → LIk (LIkrf, LIkrfm); support counting computes B.suprf and B.suprfm using an inverse candidate tree]
• Let CIk denote the set of candidate rf-patterns with length k in RFM-Apriori
RFM-Apriori Algorithm: Example 4.1 (candidate generation, CI2)
• Suppose LI1f = {<a>, <b>, <c>, <(ab)>, <(bc)>} and LI1rf = {<b>, <c>}. Then CI2 is as follows:
• CI2 = {<(a)(b)>, <(a)(c)>, <(b)(b)>, <(b)(c)>, <(c)(b)>, <(c)(c)>, <(ab)(b)>, <(ab)(c)>, <(bc)(b)>, <(bc)(c)>}
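The list in Example 4.1 is exactly the cross product of LI1f (first itemset) with LI1rf (last itemset), which a few lines of Python can reproduce. The function names `generate_ci2` and `show` are hypothetical; the reading that the last itemset is drawn from LI1rf because it must satisfy the recency constraint is an interpretation of the slide, not a quoted rule.

```python
LI1f = ["a", "b", "c", "ab", "bc"]  # frequent 1-patterns from Example 4.1
LI1rf = ["b", "c"]                  # recent frequent 1-patterns from Example 4.1

def generate_ci2(li1f, li1rf):
    """Cross product: first itemset from LI1f, last itemset from LI1rf."""
    return [(first, last) for first in li1f for last in li1rf]

def show(pattern):
    """Render a pattern such as ("ab", "c") as <(ab)(c)>."""
    return "<" + "".join(f"({itemset})" for itemset in pattern) + ">"

CI2 = generate_ci2(LI1f, LI1rf)
print(len(CI2))  # 10
print(", ".join(show(p) for p in CI2))
```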
RFM-Apriori Algorithm: Example 4.2 (candidate generation, CIk, k > 2)
• Suppose LI3rf = {<(b)(a)(c)>, <(c)(a)(c)>, <(b)(b)(c)>, <(c)(b)(c)>, <(b)(ab)(c)>, <(c)(ab)(c)>}. Then CI4 is as follows:
• CI4 = {<(b)(c)(a)(c)>, <(c)(b)(a)(c)>, <(b)(b)(a)(c)>, <(c)(c)(a)(c)>, <(b)(b)(b)(c)>, <(b)(c)(b)(c)>, <(c)(b)(b)(c)>, <(c)(c)(b)(c)>, <(b)(b)(ab)(c)>, <(b)(c)(ab)(c)>, <(c)(b)(ab)(c)>, <(c)(c)(ab)(c)>}
• Illustration: joining <(b)(ab)(c)> and <(c)(ab)(c)> from LI3rf yields <(b)(c)(ab)(c)> and <(c)(b)(ab)(c)> in CI4
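One join rule that reproduces the twelve candidates of Example 4.2 is sketched below: two patterns in LI3rf that agree on everything after their first itemset are joined by prepending the first itemset of one to the other. This rule is inferred from the listed output and is an assumption, not a statement of the authors' exact procedure; `generate_cik` is a hypothetical name.

```python
from collections import defaultdict

LI3rf = [("b", "a", "c"), ("c", "a", "c"), ("b", "b", "c"),
         ("c", "b", "c"), ("b", "ab", "c"), ("c", "ab", "c")]

def generate_cik(prev_rf):
    """Join patterns that share the same suffix (all itemsets but the first):
    for each pair (p, q) in a suffix group, emit first(p) followed by q."""
    by_suffix = defaultdict(list)
    for pattern in prev_rf:
        by_suffix[pattern[1:]].append(pattern)
    candidates = set()
    for group in by_suffix.values():
        for p in group:
            for q in group:
                candidates.add((p[0],) + q)
    return candidates

CI4 = generate_cik(LI3rf)
print(len(CI4))  # 12
print(("b", "c", "ab", "c") in CI4, ("c", "b", "ab", "c") in CI4)  # True True
```

Grouping by suffix keeps the last itemset of every candidate unchanged, so each candidate still ends in an itemset that was already part of a recent frequent pattern.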
RFM-Apriori Algorithm: Example
• Given a data-sequence DB and six thresholds, Rtime_min = 10, Rtime_max = 21, M_min = 150, M_max = 250, Minsup_min = 2, and Minsup_max = 4, find the patterns that satisfy the RFM constraints
[Tables: CI1 and LI1]
Experiment: Synthetic Data Parameters
• Parameter settings: |S| = 4, |I| = 1.25, NS = 5000, NI = 25,000, N = 10000, TI = 10, H_price = 1000, M_price = 500, L_price = 100, H_quantity = 1, M_quantity = 3, and L_quantity = 1
Experiment: Real-Life Dataset, SC-POS
• The sales data of a chain supermarket in Taiwan
• The SC-POS dataset recorded all transactions from twenty branches between 2001/12/27 and 2002/12/31
• Each data-sequence in the SC-POS dataset is the list of a customer’s transactions, each of which records the purchase date and time and the purchased items
• After a series of data preprocessing and cleaning tasks, the final dataset contained 17,685 items and 33,500 customers’ data-sequences
Experiment: Test 4.1. Comparing the runtimes and number of patterns of the two algorithms
• Varying minsup_min from 1.25% to 0.5% in the synthetic datasets
• Varying minsup_min from 3.5% to 2.5% in the real-life dataset
Experiment: Test 4.1 Results
[Figures: results on SYN-DS1 and SC-POS]
• SYN-DS1 (>): more complicated procedure to generate candidate patterns and compute supports
• SC-POS (<): generates fewer candidate and frequent patterns
Experiment: Test 4.2. Scalability test
• In this test, we vary the value of a selected parameter and keep all the other parameters constant.
• In each test, a parameter is increased to determine how the algorithms scale up as the parameter increases.
• The first test varies the number of customers, |D|, from 250,000 to 750,000
• The second varies the average number of transactions per customer, |C|, from 10 to 20
• The final one varies the average number of items bought per transaction, |T|, from 2.5 to 4.5
Experiment: Test 4.2 Results
• Longer sequences result in more patterns
Experiment: Test 4.3. How runtime and the number of patterns respond to varying the following parameters
• Varying Rtime_min from 75 to 115
• Varying M_min from 1000 to 5000
Experiment: CIk = LIk-1rf x LIk-1rf
Experiment: Test 4.4. Comparing the number of three kinds of interesting patterns
• (*F*)
• (RF*)
• (RFM)
Experiment: Test 4.5. Segmenting the discovered patterns by RFM constraints
Experiment: Managerial Applications
• Growing patterns (RFM): A(BC) in segments 122, 233, 334, 445, 555
• Weakening patterns: A(BC) in segments 134, 233, 322, 421, 511
• Dead patterns: A(BC) in segments 123, 211
• Emerging patterns: A(BC) in segments 412, 523
• Stable patterns: A(BC) in segments 132, 232, 332, 432, 532
• Sort all patterns with R=3 according to M
• Sort all patterns with R=3 according to F
Conclusion
• We have developed an efficient algorithm for mining frequent patterns that takes Recency and Monetary into consideration.
• These two factors help users identify patterns that have been active recently and have high monetary value.
• In addition, the experiments showed that our approach is more efficient than the traditional GSP algorithm.