Harnessing the Strengths of Anytime Algorithms for Constant Data Streams

Philipp Kranen, Thomas Seidl, Data Management and Data Exploration Group, RWTH Aachen University, Germany.


Presentation Transcript


  1. Harnessing the Strengths of Anytime Algorithms for Constant Data Streams
      Philipp Kranen, Thomas Seidl
      Data Management and Data Exploration Group, RWTH Aachen University, Germany

  2. Philipp Kranen, Thomas Seidl – Harnessing the Strengths of Anytime Algorithms for Constant Data Streams – ECML PKDD ’09
      Agenda
      • Problem statement
      • Formal model
      • Novel approaches
      • Evaluation
      • Conclusion

  3. Motivation – data streams in everyday life
      [Figure: a constant data stream of items of types 1, 2, …, m arriving at interval ta, with the time points tf and td marked]

  4. Problem statement
      Budget algorithms
        • Tailored to a specific application
        − No result in less time
        − No improvement from additional time
      Anytime algorithms
        • Natural choice for varying streams
        + Result after any time
        + Exploit additional time
      • Data streams are ubiquitous:
        • Network traffic
        • Sensor measurements
        • Customer data
        • Surveillance data
        • …
      [Figure: constant streams vs. varying streams over time]
      Goal: improve the resulting quality on constant streams over that of budget algorithms
      Basic idea: spend less time on "confident" items
      Prerequisite: a confidence measure for the current result quality

  5. Model – premise
      • Example: classification of items on a conveyor belt
      • Given:
        • an anytime classification algorithm (e.g. anytime nearest neighbor)
        • a confidence measure
        • (td − tf) ≥ ta
      • Time is normalized to [0, 1]:
        • t = 0 corresponds to the result just after initialization
        • t = 1: the complete model has been read; no further improvement is possible
        • n improvement steps (e.g. n training set items for nearest neighbor)
      • The confidence measure ranges from 0 to 1:
        • 0 → no confidence
        • 1 → certain
      • First: assume a linear dependency between confidence and accuracy
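
The premise above (normalized time, n improvement steps, a confidence in [0, 1]) can be made concrete with a small anytime nearest neighbor classifier. This is a minimal sketch under stated assumptions, not the authors' implementation: the class name `AnytimeKNN`, the parameter `k`, and the confidence measure (votes for the majority label among the k nearest training items read so far, divided by k) are hypothetical choices. Each call to `improve` reads one more training item, so after j of n steps the normalized time is j / n.

```python
import math
from collections import Counter


class AnytimeKNN:
    """Hypothetical anytime k-NN: one training item is read per improvement step."""

    def __init__(self, train, k=3):
        # train: list of (feature_tuple, label), assumed already in reading order
        self.train = train
        self.k = k

    def start(self, query):
        # State right after initialization, i.e. normalized time t = 0
        return {"query": query, "steps": 0, "neighbors": []}

    def improve(self, state):
        # One improvement step; returns False once t = 1 (whole training set read)
        if state["steps"] >= len(self.train):
            return False
        x, label = self.train[state["steps"]]
        state["neighbors"].append((math.dist(state["query"], x), label))
        state["neighbors"].sort(key=lambda pair: pair[0])
        del state["neighbors"][self.k:]  # keep only the k nearest seen so far
        state["steps"] += 1
        return True

    def time(self, state):
        # Normalized time t = j / n after j of n steps
        return state["steps"] / len(self.train)

    def predict(self, state):
        # Majority label plus a hypothetical confidence in [0, 1]:
        # votes for that label divided by k (0 before any step)
        if not state["neighbors"]:
            return None, 0.0
        votes = Counter(label for _, label in state["neighbors"])
        label, count = votes.most_common(1)[0]
        return label, count / self.k


train = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((1.0, 1.0), "b"), ((0.9, 1.1), "b")]
clf = AnytimeKNN(train, k=3)
state = clf.start((0.05, 0.1))
while clf.improve(state):
    print(clf.time(state), clf.predict(state))
```

Dividing by k rather than by the number of neighbors seen keeps the confidence low right after initialization, when only one or two training items have been read.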

  6. Model – assumptions
      • Individual confidences are scattered around the mean value (the budget confidence)
      [Figure: confidence over time, showing the budget confidence μ(t), the scattering σ(t), a confidence level ĉ and F(ĉ, t)]

  7. Model – expected time to reach confidence ĉ
      • F(ĉ, t) is the probability that the confidence at time t is larger than ĉ
      • Use F(ĉ, t) as a cumulative distribution function (n steps!)
      • h(ĉ, t_j) is the probability that ĉ is first exceeded between t_(j−1) and t_j
      • Determine the expected time needed to reach ĉ
      [Figures: F(0.3, t) and 1 − F(0.3, t) over time; expected time versus confidence for the traditional budget and the batch approach]
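
The formula for the expected time is not preserved in this transcript. One plausible discretization, assuming F(ĉ, t) is non-decreasing in t and evaluated at the n steps t_j = j / n, is h(ĉ, t_0) = F(ĉ, t_0), h(ĉ, t_j) = F(ĉ, t_j) − F(ĉ, t_(j−1)), and E[t(ĉ)] = Σ_j t_j · h(ĉ, t_j). The sketch below additionally assumes that confidences at time t are normally distributed around the budget confidence μ(t) with scattering σ(t), so that F(ĉ, t) = 1 − Φ((ĉ − μ(t)) / σ(t)), and it charges items that never reach ĉ the full time t = 1. The functions `mu` and `sigma` are made-up examples, not values from the paper.

```python
import math


def normal_sf(x):
    # P(Z > x) for a standard normal variable Z
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def F(c_hat, t, mu, sigma):
    # Probability that an item's confidence at time t exceeds c_hat,
    # ASSUMING confidences at time t are Normal(mu(t), sigma(t))
    return normal_sf((c_hat - mu(t)) / sigma(t))


def expected_time(c_hat, n, mu, sigma):
    # Expected normalized time until the confidence first exceeds c_hat,
    # evaluated on the n improvement steps t_j = j / n
    ts = [j / n for j in range(n + 1)]
    cdf = [F(c_hat, t, mu, sigma) for t in ts]
    for j in range(1, n + 1):  # force monotonicity so cdf is a valid CDF
        cdf[j] = max(cdf[j], cdf[j - 1])
    # h(c_hat, t_j): probability of first exceeding c_hat between t_(j-1) and t_j
    h = [cdf[0]] + [cdf[j] - cdf[j - 1] for j in range(1, n + 1)]
    never = 1.0 - cdf[-1]  # items that never reach c_hat are charged t = 1
    return sum(t * p for t, p in zip(ts, h)) + never


def mu(t):
    # Hypothetical budget confidence, rising linearly with time
    return 0.5 + 0.4 * t


def sigma(t):
    # Hypothetical constant scattering
    return 0.1


for c_hat in (0.3, 0.7, 0.9):
    print(c_hat, round(expected_time(c_hat, 100, mu, sigma), 3))
```

Higher confidence levels ĉ require more expected time, which is the quantity the batch approach aims to reduce.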

  8. Batch approach
      • To improve the overall quality of the results, we have to process several items in parallel
      [Figure: items 1–17 of types 1 … m arriving at interval ta between tf and td and collected in a buffer for the batch approach, shown at time t0 and at time t0 + 5·ta]
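
A minimal sketch of the batch idea, assuming the buffered items share the total time a budget algorithm would have split evenly among them, and that each step goes to the least confident item that can still improve. This selection rule is an assumption consistent with the basic idea of spending less time on confident items, not necessarily the paper's exact procedure. `ToyItem` and `process_batch` are hypothetical names; `ToyItem`'s confidence simply grows linearly per step, whereas in practice each item would hold the state of an anytime classifier.

```python
from dataclasses import dataclass


@dataclass
class ToyItem:
    # Hypothetical stand-in for one item's anytime classification state:
    # confidence grows by `rate` per step (capped at 1) for up to `max_steps` steps
    start_conf: float
    rate: float
    max_steps: int
    steps: int = 0

    def confidence(self) -> float:
        return min(1.0, self.start_conf + self.rate * self.steps)

    def can_improve(self) -> bool:
        return self.steps < self.max_steps

    def improve(self) -> None:
        self.steps += 1


def process_batch(batch, steps_per_item):
    # Pool the budget a budget algorithm would split evenly (steps_per_item each)
    # and spend it step by step on the least confident item that can still improve
    for _ in range(steps_per_item * len(batch)):
        candidates = [item for item in batch if item.can_improve()]
        if not candidates:
            break
        min(candidates, key=lambda item: item.confidence()).improve()
    return [item.confidence() for item in batch]


batch = [ToyItem(0.9, 0.01, 100), ToyItem(0.2, 0.05, 100), ToyItem(0.3, 0.05, 100)]
print(process_batch(batch, steps_per_item=5))
```

With an even split of five steps each, the second item would stop at a confidence of about 0.45; the pooled budget instead lifts both low-confidence items to about 0.6 while the already confident first item receives no steps.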

  9. FiFo approach
      • Use a FiFo queue with a capacity of r
      • Initialize and insert newly arriving items
      • Remove the eldest item on overflow
      • Improve the item s with the lowest time-weighted confidence
        → if confidences are similar, give priority to older items
      [Figure: two example weight functions, weight versus remaining time]
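
The FiFo approach can be sketched as follows. The queue handling (capacity r, insertion on arrival, eldest item out on overflow) follows the slide; the weighting is an assumption, since the slide's formula for selecting s is not preserved. Here the confidence is multiplied by a linear weight that falls from 1 for a fresh item to the minimal time factor mtf when no time remains, so older items win when confidences are similar and mtf = 1 reduces the rule to confidence alone. `ToyItem`, `weight`, `FifoScheduler`, and `run` are illustrative names.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ToyItem:
    # Hypothetical stand-in for one item's anytime classification state
    start_conf: float
    rate: float
    max_steps: int
    steps: int = 0

    def confidence(self) -> float:
        return min(1.0, self.start_conf + self.rate * self.steps)

    def can_improve(self) -> bool:
        return self.steps < self.max_steps

    def improve(self) -> None:
        self.steps += 1


def weight(remaining, mtf):
    # Assumed linear time weight: 1 for a fresh item, mtf when no time remains
    return mtf + (1.0 - mtf) * remaining


class FifoScheduler:
    def __init__(self, r, mtf):
        self.queue = deque()  # eldest item at the left
        self.r = r            # queue capacity
        self.mtf = mtf        # minimal time factor; mtf = 1 means confidence alone

    def arrive(self, item):
        # Insert a newly initialized item; on overflow the eldest item leaves
        # the queue and its current result is final
        eldest = self.queue.popleft() if len(self.queue) == self.r else None
        self.queue.append(item)
        return eldest

    def step(self):
        # One improvement step for the item with the lowest time-weighted confidence.
        # Once the queue is full, the item at index i leaves after i + 1 arrivals,
        # so (i + 1) / r approximates its remaining time; min() returns the first
        # minimum, so exact ties go to the eldest item.
        candidates = [(i, it) for i, it in enumerate(self.queue) if it.can_improve()]
        if candidates:
            _, item = min(candidates, key=lambda c: c[1].confidence()
                          * weight((c[0] + 1) / self.r, self.mtf))
            item.improve()


def run(stream, r, mtf, steps_per_arrival):
    scheduler = FifoScheduler(r, mtf)
    finished = []
    for item in stream:
        eldest = scheduler.arrive(item)
        if eldest is not None:
            finished.append(eldest)
        for _ in range(steps_per_arrival):
            scheduler.step()
    finished.extend(scheduler.queue)  # end of stream: flush the remaining items
    return finished


stream = [ToyItem(0.2, 0.05, 100), ToyItem(0.95, 0.01, 100),
          ToyItem(0.2, 0.05, 100), ToyItem(0.2, 0.05, 100)]
done = run(stream, r=2, mtf=0.5, steps_per_arrival=4)
print([(round(it.confidence(), 2), it.steps) for it in done])
```

In this toy run the initially confident second item never receives a step; its share of the time goes to the uncertain items around it.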

  10. Evaluation – classifiers and confidence measures
      • Anytime nearest neighbor classifier (ordered w.r.t. leave-one-out performance on the training set)
      • Anytime support vector machine (m times one class versus all)
      • Anytime Bayesian classification (hierarchy of mixture densities per class)
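
The slide only states that the nearest neighbor training set is ordered with respect to leave-one-out performance. One plausible heuristic, sketched here as an assumption rather than the paper's method: for each training item, find its nearest neighbor among the remaining training items; that neighbor gains a point if the labels agree and loses one otherwise. Items are then read in order of decreasing score, so helpful prototypes come first and mislabeled or boundary items come last. `loo_order` is a hypothetical name.

```python
import math


def loo_order(train):
    # train: list of (feature_tuple, label)
    # Hypothetical leave-one-out scoring: every item votes for its nearest
    # other training item, +1 if the labels agree and -1 otherwise
    score = [0] * len(train)
    for qi, (qx, qy) in enumerate(train):
        best, best_d = None, math.inf
        for i, (x, _) in enumerate(train):
            if i != qi:
                d = math.dist(qx, x)
                if d < best_d:
                    best, best_d = i, d
        if best is not None:
            score[best] += 1 if train[best][1] == qy else -1
    # Highest score first; sorted() is stable, so ties keep their original order
    order = sorted(range(len(train)), key=lambda i: -score[i])
    return [train[i] for i in order]


train = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((5.0, 5.0), "b"),
         ((5.0, 6.0), "b"), ((0.0, 0.4), "b")]  # last item: a "b" inside the "a" region
print(loo_order(train))
```

The stray "b" item misleads two of its neighbors and ends up last, so an interrupted anytime classifier is least likely to have read it.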

  11. Evaluation – batch approach
      • Throughout: 4-fold cross validation, time scaled to [0, 1]
      • Budget: performance increases with allotted time
      • Batch: accuracy increases with growing window size (equal time)
      • Largest (relative) improvement for small window sizes

  12. Evaluation – batch approach and model
      • Results confirm the theoretical model:
        • "linear" dependency between accuracy and confidence
        • expected time t(c) decreases with growing window size
      [Figure: expected time versus confidence for the traditional budget and the batch approach]
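
The slides do not describe how the "linear" dependency between accuracy and confidence was checked. A common way, shown here as a hypothetical sketch, is to bin classified items by their confidence and compute the accuracy per bin; under a linear relationship the per-bin accuracies lie roughly on a straight line over the bin centers. `accuracy_by_confidence` and its `bins` parameter are illustrative names.

```python
def accuracy_by_confidence(records, bins=5):
    # records: iterable of (confidence in [0, 1], correct: bool) pairs
    # Returns (bin_center, accuracy) per bin; accuracy is None for empty bins
    hits = [0] * bins
    counts = [0] * bins
    for conf, correct in records:
        b = min(int(conf * bins), bins - 1)  # confidence 1.0 goes into the last bin
        counts[b] += 1
        hits[b] += int(correct)
    return [((b + 0.5) / bins, hits[b] / counts[b] if counts[b] else None)
            for b in range(bins)]


records = [(0.1, False), (0.15, True), (0.9, True), (0.95, True)]
print(accuracy_by_confidence(records, bins=2))  # → [(0.25, 0.5), (0.75, 1.0)]
```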

  13. Evaluation – FiFo approach
      • The FiFo approach also outperforms the respective budget algorithm
      • Accuracy increases with a larger minimal time factor mtf
        → confidence alone yields the best distribution of time allowance
      [Figure: weight versus remaining time, with the levels 1 and mtf marked]

  14. Evaluation – comparison
      • The FiFo approach performs better than the batch approach in comparable settings throughout all experiments
      • Performance improves even for small window/queue sizes

  15. Conclusion
      • Data streams are ubiquitous
      • So far: budget algorithms on constant streams
      • Achievement: quality improvement over budget algorithms by harnessing the strengths of anytime algorithms
        • Two simple yet effective approaches
        • Evaluation using three prominent classifiers and simple confidence measures
        → Both approaches outperform the respective budget algorithms
      • Results confirm the theoretical model and motivate further research on:
        • Anytime algorithms
        • Confidence measures

  16. Poster session tonight
      • Discuss the paper
      • Investigate stream data items …
