
Frequent Itemsets Mining in Distributed Wireless Sensor Networks


Presentation Transcript


  1. Frequent Itemsets Mining in Distributed Wireless Sensor Networks Manjunath Rajashekhar

  2. Motivation
  • Sensor network characteristics:
    • Battery powered, wireless communication
    • Limited RAM (10 KB – 32 MB), large flash (512 MB – 1 GB)
    • Slow processors (4 MHz – 40 MHz)
  • Centralized → Distributed: disk I/O cost becomes communication cost
  • Different data rates across nodes
  • Can we think of sensor readings as baskets?
  • Data is not uniformly distributed across all nodes!
  • Trivial solution: ship every basket to the base station (prohibitively expensive in communication)

  3. Algorithm (1)
  • Preprocessing
    • Each node sends {node-id, #baskets-count} to the base station
  • Sampling
    • Query the network to collect a random sample of baskets
  • Generation of frequent itemsets
    • Run the Apriori algorithm on the sample
    • Use a scaled (lowered) support threshold to reduce false negatives
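The mining step above can be sketched as follows: a minimal Apriori implementation run on the collected sample with the lowered ("scaled") threshold. The sample data and variable names are illustrative, not from the talk.

```python
from itertools import combinations

def apriori(baskets, min_support):
    """Return all itemsets whose support (fraction of baskets) >= min_support."""
    n = len(baskets)
    baskets = [frozenset(b) for b in baskets]
    # Count frequent 1-itemsets.
    counts = {}
    for b in baskets:
        for item in b:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s for s, c in counts.items() if c / n >= min_support}
    all_frequent = set(frequent)
    k = 2
    while frequent:
        # Candidate generation: join frequent (k-1)-itemsets ...
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # ... and prune candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        counts = {c: sum(1 for b in baskets if c <= b) for c in candidates}
        frequent = {c for c, cnt in counts.items() if cnt / n >= min_support}
        all_frequent |= frequent
        k += 1
    return all_frequent

# Scaled threshold as in the experiments (0.9 * 25%): mining the sample at a
# lowered support keeps itemsets near the threshold, reducing false negatives.
SCALE, SUPPORT = 0.9, 0.25
sample = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"b", "c"}]  # illustrative sample
candidate_itemsets = apriori(sample, SCALE * SUPPORT)
```

The itemsets mined from the sample are only candidates; the verification phase on the next slide checks them against global counts.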

  4. Algorithm (2)
  • Verification of frequent itemsets
  • Eliminate false negatives
    • Compute the negative border of the sample's frequent itemsets
    • Aggregate counts of the negative border over the network
    • If verification fails (a border itemset is globally frequent): repeat the whole algorithm
  • Eliminate false positives
    • Aggregate counts of the candidate frequent itemsets over the network; discard those below the global threshold
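The false-negative check can be sketched as below: compute the negative border (itemsets that are not frequent in the sample but all of whose immediate subsets are), then test whether any border itemset is globally frequent. The brute-force enumeration and all data values are illustrative assumptions, not the talk's implementation.

```python
from itertools import combinations

def negative_border(frequent, items):
    """Itemsets that are NOT frequent but whose every immediate subset is.
    `frequent` is a set of frozensets; `items` is the item universe."""
    border = set()
    max_size = max((len(s) for s in frequent), default=0)
    for k in range(1, max_size + 2):
        for cand in map(frozenset, combinations(sorted(items), k)):
            if cand in frequent:
                continue
            # Every immediate subset must be frequent (the empty set is skipped,
            # so infrequent singletons always land in the border).
            subsets = (cand - {i} for i in cand)
            if all(s in frequent for s in subsets if s):
                border.add(cand)
    return border

# Verification sketch: if a border itemset turns out to be globally frequent,
# a false negative may exist and the whole algorithm must be repeated.
sample_frequent = {frozenset(s) for s in ({"a"}, {"b"}, {"a", "b"})}
border = negative_border(sample_frequent, items={"a", "b", "c"})
global_counts = {frozenset({"c"}): 30}   # hypothetical aggregated counts
n_baskets, support = 1000, 0.25
must_repeat = any(global_counts.get(s, 0) / n_baskets >= support for s in border)
```

Here the border is just {c}: the pairs containing c are excluded because their subset {c} is already infrequent, which is what keeps the border (and hence the verification traffic) small.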

  5. Experiments
  • Setup:
    • # Nodes = 100
    • # Baskets = 10,400, distributed non-uniformly across nodes
    • Threshold scaling factor = 0.9
    • Support threshold = 25%
    • Synthetic dataset; values averaged over 100 trials
  • Result: ~73% saving in communication
  • Insights?

  6. Backup slides

  7. Analysis
  • Preprocessing (C1)
    • size-of {node-id, count} × # nodes × cumulative-communication-distance
  • Sampling (C2)
    • average-size-of-basket × size-of-random-sample × cumulative-communication-distance
  • False negatives (C3)
    • size-of-negative-border × # nodes × aggregation-distance
  • False positives (C4)
    • size-of-frequent-itemsets × # nodes × aggregation-distance
  • Total cost = C1 + C2 + C3 + C4
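The cost model above is a straightforward sum of four products; a small sketch makes the terms concrete. All parameter names and the example numbers are my own labels for the slide's quantities, not values from the experiments.

```python
def total_cost(id_count_size, n_nodes, cum_dist,
               avg_basket_size, sample_size,
               border_size, frequent_size, agg_dist):
    """Communication cost model C1 + C2 + C3 + C4 from the analysis slide."""
    c1 = id_count_size * n_nodes * cum_dist        # preprocessing
    c2 = avg_basket_size * sample_size * cum_dist  # sampling
    c3 = border_size * n_nodes * agg_dist          # false-negative check
    c4 = frequent_size * n_nodes * agg_dist        # false-positive check
    return c1 + c2 + c3 + c4

# Illustrative numbers only:
cost = total_cost(id_count_size=2, n_nodes=10, cum_dist=5,
                  avg_basket_size=4, sample_size=20,
                  border_size=3, frequent_size=6, agg_dist=2)
```

Note that C1 and C2 scale with routing distance to the base station, while C3 and C4 scale with in-network aggregation distance, which is why verifying compact summaries (border and frequent itemsets) is cheaper than shipping raw baskets.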
