LCM: An Efficient Algorithm for Enumerating Frequent Closed Item Sets (Linear time Closed itemset Miner)

Takeaki Uno (National Institute of Informatics), Tatsuya Asai (Kyushu University), Hiroaki Arimura (Kyushu University), Yuzo Uchida (Kyushu University)
19 Nov 2003, FIMI 2003

Motivation
- Exact enumeration of closed item sets
- Generation of all/maximal item sets from closed item sets
- We want to solve difficult problems in a short time
- sparse/dense (occurrence deliver/diffsets), database reduction, removing infrequent items
[Figure: datasets arranged from small supports to large supports, and by whether #closed sets = #freq. sets or #closed sets << #freq. sets. Few solutions for small support: IBMdatas, BMS POS, retail, BMS web1,2, kosarak. Many solutions even for large support: chess, connect, pumsb*, accidents, mushroom, pumsb.]

Outline of Our Research
- Exact enumeration of closed item sets (no sophisticated pruning, no post-processing, and no memory for the closed item sets already obtained)
- Enumerate all/maximal frequent item sets using closed item sets
- Algorithms for updating occurrences and checking maximality in dense and sparse cases, and their adaptive hybrid
- Save additional memory use (right first sweep, adjacency matrix only for large transactions)

Exact Enumeration of Closed Item Sets
- Introduce an acyclic parent-child relationship on frequent closed sets (it induces a tree-shaped traversal route with root = φ)
- Traverse the route in a depth-first manner (find a child, and go to it)
- Exact enumeration (time linear in the number of closed sets)
- Any child is found by taking a closure (in a short time)
- No need to store the item sets already obtained (small memory)
- Can enumerate all closed item sets (even without a minimum support)

Definition of Parent
- Closure of an item set = the maximal item set with the same occurrences
- Let X be a closed item set. Parent of X = closure of X∩{1,…,i}, where i is the maximum such that X ≠ closure of X∩{1,…,i}
- Parent of X ⊆ X, so the relationship is acyclic
- X' is a child of X ⇔ X' is the closure of X∪{i} for some i, and (cond) X' \ X includes no item < i
- All children are found by taking the closure of X∪{i}
- (cond) can be checked in a short time by using some algorithms
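The child rule on this slide can be sketched as a small depth-first enumerator. This is a minimal illustration, not the authors' implementation: the names `closure` and `enumerate_closed` are hypothetical, items are assumed to be positive integers, and the database is assumed to be a non-empty list of frozensets. One extra assumption is added that the slide does not state: a child is only generated with an item i larger than the item that generated X itself (tracked as `core`), so that each closed set is reached from one parent only.

```python
from typing import FrozenSet, Iterator, List

Itemset = FrozenSet[int]


def closure(occ: List[Itemset]) -> Itemset:
    """Closure = the maximal item set shared by all given occurrences."""
    return frozenset.intersection(*occ)


def enumerate_closed(db: List[Itemset], min_support: int = 1) -> Iterator[Itemset]:
    """Yield each frequent closed item set of db, depth first, without storing them."""
    if min_support < 1 or len(db) < min_support:
        return
    items = sorted(set().union(*db))

    def visit(X: Itemset, occ_X: List[Itemset], core: int) -> Iterator[Itemset]:
        yield X
        for i in items:
            # Assumption (not on the slide): only extend with items above `core`.
            if i <= core or i in X:
                continue
            occ = [t for t in occ_X if i in t]  # occurrences of X plus item i
            if len(occ) < min_support:
                continue
            child = closure(occ)
            # (cond): the closure must not add any item smaller than i.
            if any(j < i for j in child - X):
                continue
            yield from visit(child, occ, i)

    # The root is the closure of the empty set; 0 is below every item.
    yield from visit(closure(db), list(db), 0)
```

For example, on the database {1,3}, {2,3}, {1,2,3} the root is {3}, and the generator visits {3}, {1,3}, {1,2,3}, {2,3} in that order. No table of previously found sets is consulted, which is the point of the parent-child relationship.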

Adaptive Hybrid Algorithm: Computation of the Occurrences of X∪{i} for Sparse and Dense Cases
- In the sparse case, by tracing the items of each occurrence of X (occurrence deliver; possibly a known technique)
- In the dense case, use diffsets (proposed by Zaki)
- We choose the better one according to estimates of the computation time in each iteration
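The diffset idea for the dense case can be illustrated as follows. This is a sketch under assumptions: occurrences are represented as sets of transaction ids, and the function names `diffset` and `extend_with_diffsets` are hypothetical. The estimate used to choose between the two methods is not given on the slide and is not modeled here.

```python
from typing import Set, Tuple


def diffset(tids_X: Set[int], tids_Xi: Set[int]) -> Set[int]:
    """Transactions that contain X but not the extra item i."""
    return tids_X - tids_Xi


def extend_with_diffsets(
    support_Xi: int, diff_i: Set[int], diff_j: Set[int]
) -> Tuple[Set[int], int]:
    """Move from X to Y = X plus i, and derive the diffset and support of Y plus j.

    diff_i and diff_j are the diffsets of items i and j relative to X.
    A transaction of Y that lacks j is one that is in diff_j but not in diff_i.
    """
    diff_ij = diff_j - diff_i
    return diff_ij, support_Xi - len(diff_ij)


# Toy data: X = {1} occurs in transactions 0..3.
tids_X = {0, 1, 2, 3}
tids_X2 = {0, 1, 3}  # transactions containing X and item 2
tids_X3 = {0, 2, 3}  # transactions containing X and item 3
d2 = diffset(tids_X, tids_X2)
d3 = diffset(tids_X, tids_X3)
d23, support_123 = extend_with_diffsets(len(tids_X2), d2, d3)
```

In dense data most occurrences of X also contain i, so the diffsets are much smaller than the occurrence sets themselves, and the set difference above touches few elements.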

Maximal and All Frequent Sets
- Maximal frequent sets are generated from closed item sets
- All frequent sets are generated by hypercube decomposition:
-- decompose the classes of closed item sets into complete sublattices
-- enumerate the pairs of greatest/least elements of the sublattices
-- generate the other sets from the pairs
[Figure: the class of a closed item set drawn as a lattice of bit vectors, from 000…0 to 111…1]
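The last step, generating the remaining sets from a (least, greatest) pair, can be sketched like this. It assumes that every item set between the least and the greatest element belongs to the sublattice, which is how "complete sublattice" is read here; the function name `expand_sublattice` is hypothetical, and how the pairs themselves are found is not shown on the slide.

```python
from itertools import combinations
from typing import FrozenSet, Iterator


def expand_sublattice(
    least: FrozenSet[int], greatest: FrozenSet[int]
) -> Iterator[FrozenSet[int]]:
    """Yield every item set S with least <= S <= greatest."""
    if not least <= greatest:
        raise ValueError("least must be a subset of greatest")
    free = sorted(greatest - least)  # items that may be present or absent
    for k in range(len(free) + 1):
        for extra in combinations(free, k):
            yield least | frozenset(extra)
```

A pair whose greatest element has k more items than its least element stands for 2^k item sets, so the pairs are a compact representation of the full output.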

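Result
[Figure: the dataset map from the Motivation slide (small supports to large supports; IBMdatas, BMS POS, retail, BMS web1,2, kosarak, chess, connect, pumsb*, accidents, mushroom, pumsb), with regions labeled "fast", "fast or usual", "slower than others", and "fast if support is small".]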

Conclusion
- Fast without pruning, tries, or other existing methods
- Exact enumeration of closed item sets and hypercube decomposition perform well for data sets such that #freq. closed sets << #freq. sets:
-- large business datasets: BMS-web1,2, retail
-- machine learning datasets with small supports: UCI repository
For further speed-up
- These techniques are orthogonal to other techniques (database reduction, pruning infrequent items, …) → we can do better for large supports / accidents (blue area)
- The parameter of the hybrid was not tuned → not fast for kosarak and IBMdatas → now faster

We think…
- What is the real problem (bottleneck)? Mining structured item sets (closed item sets, association rules with a threshold, …)
- Is it only a counting problem? For mining all frequent item sets, yes. The problem is how to make the occurrences of an item set from other item sets (choose the best way, represent …)
- Is the maximal item set useful? The closed item set is useful! It has applications in classification and association rule mining

Some Observations
Is pruning of infrequent sets really necessary?
[Figure: frequency of X, X∪{1}, X∪{2}, X∪{3}, X∪{4}, X∪{5}]
- Computing occurrences for infrequent items from X: usually < 1/2. Do we really need to prune?
Is there a need to accelerate occurrence computation?
- Almost all of the computation is for updating occurrences
- There is a best e for getting the occurrences of X from X − e
- Can we design an algorithm that chooses e in each iteration? How do we find this e? Does this accelerate the computation? (We can evaluate the lower bound of occurrence computation.)

Some Observations
[Figure: frequency of X, X∪{10}, X∪{11}, X∪{12}, X∪{13}, X∪{14}]
- Computing occurrences for infrequent items from X: usually < 1/2. Do we really need to prune?

Right First Sweep
- Generate recursive calls in decreasing order of items
- Clear the memory after each recursive call
- Re-use the memory in the following recursive calls
- Child iterations need no memory
[Figure: occurrence lists (transactions A to E) for X∪{10}, X∪{11}, X∪{12}, X∪{13}, X∪{14}]

Occurrence Deliver
- Compute T(X∪{i}) by tracing each occurrence of X
- Fast in sparse cases
[Figure: occurrences A to E of X delivered to the lists of X∪{10}, X∪{11}, X∪{12}, X∪{13}, X∪{14}]
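Occurrence deliver as described on this slide can be sketched in a few lines. The sketch assumes the database is a list of item lists indexed by transaction id and that the occurrences of X are given as a list of transaction ids; the function name `occurrence_deliver` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def occurrence_deliver(
    db: Sequence[Sequence[int]], occ_X: Sequence[int]
) -> Dict[int, List[int]]:
    """Return a map from item i to the transaction ids containing X and i.

    Each occurrence of X is scanned once and its id is delivered
    to the bucket of every item it contains.
    """
    buckets: Dict[int, List[int]] = defaultdict(list)
    for tid in occ_X:
        for item in db[tid]:
            buckets[item].append(tid)
    return dict(buckets)


# Toy data: X = {1} occurs in transactions 0, 1 and 2.
db = [[1, 2, 3], [1, 2], [1, 3], [2, 3]]
buckets = occurrence_deliver(db, [0, 1, 2])
```

One pass yields T(X∪{i}) for every item i at once, and the work is proportional to the total size of the occurrences of X, which is small when transactions are short.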

Checking (cond) of Closure
- Check (cond): (closure of X∪{i}) \ X includes no item < i
- In the sparse case, find an occurrence not including j, for every possible item j
- In the dense case, update the occurrences of all frequent X∪{j}, and compute T(X∪{i}∪{j})
- Much faster than computing the closure of X∪{i}
[Figure: occurrence lists (transactions A, B, C, …) for X∪{1}, X∪{2}, …, X∪{i}, …, X∪{14}]
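The sparse-case check can be sketched as below. It assumes the occurrences of X∪{i} are available as a non-empty list of frozensets; the name `satisfies_cond` is hypothetical. Restricting the candidates j to the items of the first occurrence is a small shortcut added here, since an item missing from one occurrence cannot be in the closure.

```python
from typing import FrozenSet, List


def satisfies_cond(X: FrozenSet[int], i: int, occ_Xi: List[FrozenSet[int]]) -> bool:
    """True if the closure of X plus i adds no item smaller than i."""
    if not occ_Xi:
        raise ValueError("X plus i must have at least one occurrence")
    # Only items of the first occurrence can appear in every occurrence.
    for j in occ_Xi[0]:
        if j >= i or j in X:
            continue
        # j is in the closure exactly when no occurrence lacks it.
        if all(j in t for t in occ_Xi):
            return False
    return True
```

The check stops at the first offending item j and never builds the closure itself, which matches the slide's remark that this is much cheaper than computing the closure of X∪{i}.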

Results
[Figure: result plots for all, closed, and maximal frequent item sets]
