
Mining Frequent Patterns II: Mining Sequential & Navigational Patterns


Presentation Transcript


  1. Mining Frequent Patterns II: Mining Sequential & Navigational Patterns Bamshad Mobasher, DePaul University

  2. Sequential Pattern Mining
  • Association rule mining does not consider the order of transactions.
  • In many applications such orderings are significant. For example:
  • in market basket analysis, it is interesting to know whether people buy some items in sequence, e.g., buying a bed first and then bed sheets some time later;
  • in Web usage mining, it is useful to find navigational patterns of users in a Web site from the sequences of page visits.

  3. Sequential Patterns: Extending Frequent Itemsets
  • Sequential patterns add an extra dimension to frequent itemsets and association rules: time.
  • Items can appear before, after, or at the same time as each other.
  • General form: "x% of the time, when A appears in a transaction, B appears within z transactions."
  • Note that other items may appear between A and B, so sequential patterns do not necessarily imply consecutive appearances of items (in terms of time).
  • Examples:
  • Renting "Star Wars", then "Empire Strikes Back", then "Return of the Jedi" in that order
  • A collection of ordered events within an interval
  • Most sequential pattern discovery algorithms are based on extensions of the Apriori algorithm for discovering itemsets.
  • Navigational patterns can be viewed as a special form of sequential patterns which capture the navigational behavior of users of a site; in this case a session is a consecutive sequence of pageview references for a user over a specified period of time.

  4. Objective
  • Given a set S of input data sequences (a sequence database), the problem of mining sequential patterns is to find all the sequences that have a user-specified minimum support.
  • Each such sequence is called a frequent sequence, or a sequential pattern.
  • The support for a sequence is the fraction of total data sequences in S that contain this sequence.

  5. Sequence Databases
  • A sequence database consists of an ordered list of elements or events.
  • Each element can be a set of items or a single item (a singleton set).
  • Transaction databases vs. sequence databases:
  [Table: an example transaction database alongside the corresponding sequence database; elements in (…) are sets]

  6. Subsequence vs. Supersequence
  • A sequence is an ordered list of events, denoted <e1 e2 … el>.
  • Given two sequences α = <a1 a2 … an> and β = <b1 b2 … bm>, α is called a subsequence of β, denoted α ⊆ β, if there exist integers 1 ≤ j1 < j2 < … < jn ≤ m such that a1 ⊆ bj1, a2 ⊆ bj2, …, an ⊆ bjn.
  • Examples:
  • <(ab), d> is a subsequence of <(abc), (de)>
  • <3, (4, 5), 8> is contained in (i.e., is a subsequence of) <6, (3, 7), 9, (4, 5, 8), (3, 8)>
  • <a.html, c.html, f.html> ⊆ <a.html, b.html, c.html, d.html, e.html, f.html, g.html>
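
As a concrete illustration of this definition, here is a minimal Python sketch of the containment test; the greedy left-to-right matching is a standard way to implement it, and the function name and set-based encoding of elements are illustrative choices, not from the original slides:

```python
def is_subsequence(alpha, beta):
    """Test whether sequence alpha is a subsequence of sequence beta.

    Each sequence is a list of elements; each element is a set of items.
    alpha = <a1 ... an> is contained in beta = <b1 ... bm> if there are
    indices j1 < j2 < ... < jn with each a_k a subset of b_{jk}."""
    j = 0  # current search position in beta
    for a in alpha:
        # advance through beta until an element containing a is found
        while j < len(beta) and not set(a) <= set(beta[j]):
            j += 1
        if j == len(beta):
            return False  # ran out of elements of beta
        j += 1  # the next a_k must map to a strictly later element
    return True

# Examples from this slide:
print(is_subsequence([{"a", "b"}, {"d"}],
                     [{"a", "b", "c"}, {"d", "e"}]))          # True
print(is_subsequence([{3}, {4, 5}, {8}],
                     [{6}, {3, 7}, {9}, {4, 5, 8}, {3, 8}]))  # True
```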

  7. What Is Sequential Pattern Mining?
  • Given a set of sequences and a support threshold, find the complete set of frequent subsequences.
  • A sequence: <(ef) (ab) (df) c b>. An element may contain a set of items; items within an element are unordered, and we list them alphabetically.
  [Table: an example sequence database]
  • <a(bc)dc> is a subsequence of <a(abc)(ac)d(cf)>
  • Given support threshold min_sup = 2, <(ab)c> is a sequential pattern.
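
A short sketch of the support computation follows; the containment test from the previous sketch is repeated so the snippet stands alone, and the toy database is modeled on the example used in this slide:

```python
def is_subsequence(alpha, beta):
    """Containment test (as in the previous sketch)."""
    j = 0
    for a in alpha:
        while j < len(beta) and not set(a) <= set(beta[j]):
            j += 1
        if j == len(beta):
            return False
        j += 1
    return True

def support_count(pattern, database):
    """Number of data sequences in the database that contain the pattern."""
    return sum(1 for seq in database if is_subsequence(pattern, seq))

# Toy sequence database; each sequence is a list of item-sets.
db = [
    [{"a"}, {"a", "b", "c"}, {"a", "c"}, {"d"}, {"c", "f"}],   # <a(abc)(ac)d(cf)>
    [{"a", "d"}, {"c"}, {"b", "c"}, {"a", "e"}],
    [{"e", "f"}, {"a", "b"}, {"d", "f"}, {"c"}, {"b"}],        # <(ef)(ab)(df)cb>
    [{"e"}, {"g"}, {"a", "f"}, {"c"}, {"b"}, {"c"}],
]

pattern = [{"a", "b"}, {"c"}]   # the pattern <(ab)c> from the slide
min_sup = 2
count = support_count(pattern, db)
print(count, count >= min_sup)  # 2 True: <(ab)c> is a sequential pattern
```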

  8. Another Example
  [Table: transactions sorted by customer ID]

  9. Example (continued)
  [Tables: sequences produced from the transactions, and the final sequential patterns]

  10. The GSP Mining Algorithm
  • Very similar to the Apriori algorithm: frequent k-sequences are joined to generate candidate (k+1)-sequences, which are then pruned and counted against the sequence database (a simplified sketch follows below).
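
As an illustration of that Apriori-style loop, here is a short, heavily simplified sketch: it handles only sequences of single items (no item-sets within elements) and uses illustrative names, so it conveys the level-wise idea rather than the full GSP of Srikant & Agrawal:

```python
def contains(pattern, seq):
    """True if pattern (a tuple of items) is a subsequence of seq."""
    it = iter(seq)
    return all(item in it for item in pattern)  # 'in' consumes the iterator

def gsp(database, min_sup):
    """Apriori-style level-wise mining, simplified to single-item events.

    Returns {pattern_tuple: support_count} for all frequent patterns."""
    items = {item for seq in database for item in seq}
    # L1: frequent 1-sequences
    freq = {(i,): n for i in items
            if (n := sum(contains((i,), s) for s in database)) >= min_sup}
    result = dict(freq)
    while freq:
        # join step: p and q overlap on a (k-1)-length suffix/prefix
        candidates = {p + (q[-1],) for p in freq for q in freq
                      if p[1:] == q[:-1]}
        # count step: keep only candidates meeting the support threshold
        freq = {c: n for c in candidates
                if (n := sum(contains(c, s) for s in database)) >= min_sup}
        result.update(freq)
    return result

# Toy page-visit sequences:
print(gsp(["abce", "abde", "ace", "abd"], min_sup=3))
# -> 1- and 2-sequences such as ('a','b') and ('a','e') with support 3
```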

  11. Sequential Pattern Mining Algorithms
  • Apriori-based method: GSP (Generalized Sequential Patterns; Srikant & Agrawal, 1996)
  • Pattern-growth methods: FreeSpan & PrefixSpan (Han et al., 2000; Pei et al., 2001)
  • Vertical format-based mining: SPADE (Zaki, 2000)
  • Constraint-based sequential pattern mining (SPIRIT: Garofalakis et al., 1999; Pei et al., 2002)
  • Mining closed sequential patterns: CloSpan (Yan, Han & Afshar, 2003)
  From: J. Han and M. Kamber, Data Mining: Concepts and Techniques, www.cs.uiuc.edu/~hanj

  12. Mining Navigation Patterns
  • Each session induces a user trail through the site.
  • A trail is a sequence of web pages followed by a user during a session, ordered by time of access.
  • A sequential pattern in this context is a frequent trail.
  • Sequential pattern mining can help identify common navigational sequences, which in turn helps in understanding common user behavioral patterns.
  • If the goal is to make predictions about future user actions based on past behavior, approaches such as Markov models (e.g., Markov chains) can be used.

  13. Mining Navigational Patterns
  • Another approach: Markov chains.
  • The idea is to model the navigational sequences through the site as a state-transition diagram without cycles (a directed acyclic graph).
  • A Markov chain consists of a set of states (pages or pageviews in the site) S = {s1, s2, …, sn} and a set of transition probabilities P = {p1,1, …, p1,n, p2,1, …, p2,n, …, pn,1, …, pn,n}.
  • A path r from a state si to a state sj is a sequence of states in which the transition probabilities for all consecutive states are greater than 0.
  • The probability of reaching a state sj from a state si via a path r is the product of all the transition probabilities along the path: Pr(r) = p(r1, r2) × p(r2, r3) × … × p(rk-1, rk), where r = <r1 = si, r2, …, rk = sj>.
  • The probability of reaching sj from si is the sum over all paths r from si to sj: Pr(sj | si) = Σr Pr(r).

  14. Constructing a Markov Chain from Web Navigational Data
  • Add a unique start state:
  • the start state has a transition to the first page in each session (representing the start of a session);
  • alternatively, it could have a transition to every state, assuming that every page can potentially be the start of a session.
  • Add a unique final state:
  • the last page in each trail has a transition to the final state (representing the end of the session).
  • The transition probabilities are obtained by counting click-throughs.
  • The resulting Markov chain is called absorbing, since we always end up in the final state.
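
A hedged sketch of this construction: count click-throughs between consecutive pages (plus the artificial Start and Final states) and normalize each state's outgoing counts into probabilities. The representation and names are illustrative; the sessions are the ones used in the example two slides below.

```python
from collections import Counter, defaultdict

def build_chain(sessions):
    """Build an absorbing Markov chain from sessions (sequences of pages).

    Adds artificial 'Start' and 'Final' states around every session and
    normalizes click-through counts into transition probabilities.
    Returns {state: {next_state: probability}}."""
    counts = defaultdict(Counter)
    for session in sessions:
        trail = ["Start"] + list(session) + ["Final"]
        for a, b in zip(trail, trail[1:]):
            counts[a][b] += 1
    chain = {}
    for state, outgoing in counts.items():
        total = sum(outgoing.values())
        chain[state] = {nxt: n / total for nxt, n in outgoing.items()}
    return chain

# The sessions from the example slides below:
sessions = ["AB", "AB", "ABC", "ABC", "ABCD", "ABCE", "ACE", "ACE",
            "ABD", "ABD", "ABDE", "BC", "BC", "BCD", "BCE", "BDE"]
chain = build_chain(sessions)
print(round(chain["B"]["C"], 2))      # Pr(C|B) = 8/14 = 0.57
print(round(chain["Start"]["A"], 2))  # 11 of 16 sessions start at A: 0.69
```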

  15. A Hypothetical Markov Chain
  [Figure: an example Markov chain over the pages Home, Search, Cat, RS, PD, and $]
  • What is the probability that a user who visits the Home page purchases a product?
  • Home -> Search -> PD -> $ = 1/3 * 1/2 * 1/2 = 1/12 = 0.083
  • Home -> Cat -> PD -> $ = 1/3 * 1/3 * 1/2 = 1/18 = 0.056
  • Home -> Cat -> $ = 1/3 * 1/3 = 1/9 = 0.111
  • Home -> RS -> PD -> $ = 1/3 * 2/3 * 1/2 = 1/9 = 0.111
  • Sum = 0.361
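
The two formulas from slide 13 can be checked against this example with a small sketch. The dict-of-dicts chain below includes only the transitions used in the calculations above (the rest of the hypothetical figure is not reproduced), and the function names are illustrative; because the chain is a DAG, the depth-first enumeration terminates.

```python
def path_probability(chain, path):
    """Product of transition probabilities along a path (a list of states)."""
    prob = 1.0
    for a, b in zip(path, path[1:]):
        prob *= chain[a].get(b, 0.0)
    return prob

def reach_probability(chain, src, dst):
    """Sum of path probabilities over all paths from src to dst.

    Recursive depth-first enumeration; safe because the chain is acyclic."""
    if src == dst:
        return 1.0
    return sum(p * reach_probability(chain, nxt, dst)
               for nxt, p in chain.get(src, {}).items())

# Only the transitions used in the slide's calculations:
chain = {
    "Home":   {"Search": 1/3, "Cat": 1/3, "RS": 1/3},
    "Search": {"PD": 1/2},
    "Cat":    {"PD": 1/3, "$": 1/3},
    "RS":     {"PD": 2/3},
    "PD":     {"$": 1/2},
}
print(round(path_probability(chain, ["Home", "Cat", "$"]), 3))  # 0.111
print(round(reach_probability(chain, "Home", "$"), 3))          # 0.361
```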

  16. Markov Chain Example
  Calculating conditional probabilities for transitions.
  Sessions: A,B; A,B; A,B,C; A,B,C; A,B,C,D; A,B,C,E; A,C,E; A,C,E; A,B,D; A,B,D; A,B,D,E; B,C; B,C; B,C,D; B,C,E; B,D,E
  [Figure: the Web site hyperlink graph over pages A, B, C, D, E]
  • Transition B -> C: total occurrences of B = 14; total occurrences of B -> C = 8
  • Pr(C|B) = 8/14 = 0.57

  17. Markov Chain Example (cont.)
  Sessions: A,B; A,B; A,B,C; A,B,C; A,B,C,D; A,B,C,E; A,C,E; A,C,E; A,B,D; A,B,D; A,B,D,E; B,C; B,C; B,C,D; B,C,E; B,D,E
  [Figure: the full Markov chain, with transition probabilities Start->A 0.69, Start->B 0.31, A->B 0.82, A->C 0.18, B->C 0.57, B->D 0.21, B->Final 0.14, C->D 0.20, C->E 0.40, C->Final 0.40, D->E 0.33, D->Final 0.67, E->Final 1.00]
  • Probability that someone will visit page C? S->B->C + S->A->C + S->A->B->C = (0.31 * 0.57) + (0.69 * 0.18) + (0.69 * 0.82 * 0.57) = 0.62
  • Probability that someone who has visited B will visit E? B->D->E + B->C->E + B->C->D->E = (0.21 * 0.33) + (0.57 * 0.40) + (0.57 * 0.20 * 0.33) = 0.335
  • Probability that someone visiting page C will leave the site? Pr(Final|C) = 0.40 = 40%

  18. Mining Frequent Trails Using Markov Chains
  • Support s in [0, 1): accept only trails whose initial probability is above s.
  • Confidence c in [0, 1): accept only trails whose probability is above c.
  • Recall: the probability of a trail is obtained by multiplying the transition probabilities of the links in the trail.
  • Mining for patterns: find all trails whose initial probability is higher than s and whose trail probability is above c (a sketch follows below).
  • Use depth-first search on the Markov chain to compute the trails.
  • The average time needed to find the frequent trails is proportional to the number of web pages in the site.
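
A minimal sketch of this depth-first mining loop, assuming the chain representation from the construction sketch above and assuming "initial probability" means the probability of entering the trail's first page from the Start state; the key observation is that extending a trail can only shrink its probability, which is what allows pruning.

```python
def frequent_trails(chain, s, c):
    """Depth-first enumeration of trails in an absorbing Markov chain.

    Keeps trails whose initial probability (Start -> first page) exceeds s
    and whose trail probability (product of link probabilities) exceeds c.
    chain: {state: {next_state: prob}} as built by build_chain above."""
    results = []

    def dfs(trail, prob):
        results.append((trail, prob))
        for nxt, p in chain.get(trail[-1], {}).items():
            # any extension whose probability falls to c or below can be
            # pruned immediately, since further extensions only shrink it
            if nxt != "Final" and prob * p > c:
                dfs(trail + [nxt], prob * p)

    for page, p0 in chain.get("Start", {}).items():
        if p0 > s:  # support test on the initial probability
            dfs([page], 1.0)
    return results

# Usage with the chain built earlier:
# for trail, prob in frequent_trails(chain, s=0.1, c=0.3):
#     print(" -> ".join(trail), round(prob, 2))
```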

  19. Markov Chains: Another Example
  [Figure: another example Markov chain]

  20. Frequent Trails From Example
  [Table: frequent trails mined with support = 0.1 and confidence = 0.3]

  21. Frequent Trails From Example
  [Table: frequent trails mined with support = 0.1 and confidence = 0.5]

  22. Efficient Management of Navigational Trails
  • Approach: store sessions in an aggregated sequence tree, initially introduced in Web Utilization Miner (WUM; Spiliopoulou, 1998).
  • For each occurrence of a sequence, start a new branch or increase the frequency counts of matching nodes.
  • In the example, note that s6 contains "b" twice; hence the sequence is <(b,1),(d,1),(b,2),(e,1)>.
  [Figure: example sessions and the corresponding aggregated sequence tree]
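
A hedged sketch of this construction, including the per-session occurrence numbering ((b,2) for the second time b appears in a session); the class and field names are illustrative, not WUM's actual data structures.

```python
class TreeNode:
    """Node of an aggregated sequence tree: a frequency count plus
    children keyed by (page, occurrence-within-session)."""
    def __init__(self):
        self.count = 0
        self.children = {}

def build_aggregated_tree(sessions):
    """Insert each session into the tree, sharing prefixes and
    incrementing the counts of matching nodes."""
    root = TreeNode()
    for session in sessions:
        seen = {}  # per-session occurrence counter for each page
        node = root
        node.count += 1
        for page in session:
            seen[page] = seen.get(page, 0) + 1
            key = (page, seen[page])  # e.g. ('b', 2) for the second 'b'
            node = node.children.setdefault(key, TreeNode())
            node.count += 1
    return root

# A session like b, d, b, e becomes the branch (b,1)(d,1)(b,2)(e,1):
tree = build_aggregated_tree(["bdbe", "bd", "ae"])
print(tree.count)                     # 3 sessions total at the root
print(tree.children[("b", 1)].count)  # 2 sessions begin with b
```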

  23. Mining Navigational Patterns
  • The aggregated sequence tree can be used directly to determine support and confidence for navigational patterns.
  • Note that each node represents a navigational path ending in that node.
  • Support = count at the node / count at the root
  • Confidence = count at the node / count at the parent
  • Navigation pattern a -> b: Support = 11/35 = 0.31, Confidence = 11/21 = 0.52
  • Navigation pattern a -> b -> e: Support = 11/35 = 0.31, Confidence = 11/11 = 1.00
  • Navigation pattern a -> b -> e -> f: Support = 3/35 = 0.086, Confidence = 3/11 = 0.27
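
Support and confidence then come straight from the node counts; here is a minimal query sketch, assuming the TreeNode/build_aggregated_tree sketch above and, for simplicity, following only first occurrences of each page:

```python
def pattern_stats(root, pattern):
    """Support and confidence of a navigational pattern (list of pages).

    Walks the aggregated sequence tree from the previous sketch; for
    simplicity it follows only keys (page, 1), i.e. first occurrences.
    Returns (support, confidence)."""
    node, parent = root, None
    for page in pattern:
        parent, node = node, node.children.get((page, 1))
        if node is None:
            return 0.0, 0.0  # pattern never observed in any session
    # support: node count over root count; confidence: over parent count
    return node.count / root.count, node.count / parent.count

# With counts as on this slide (root = 35, a = 21, a->b = 11):
# pattern_stats(tree, ["a", "b"]) -> (11/35 = 0.31, 11/21 = 0.52)
```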
