Harnessing the Strengths of Anytime Algorithms for Constant Data Streams. Philipp Kranen , Thomas Seidl Data Management and Data Exploration Group RWTH Aachen University, Germany.
Philipp Kranen, Thomas Seidl – Harnessing the Strengths of Anytime Algorithms for Constant Data Streams – ECML PKDD ’09
Agenda
• Problem statement
• Formal model
• Novel approaches
• Evaluation
• Conclusion
Motivation – data streams in everyday life
[Figure: constant data stream carrying items of type 1, type 2, …, type m; labels: tf, td, arrival interval ta]
Problem statement
• Data streams are ubiquitous
  • Network traffic
  • Sensor measurements
  • Customer data
  • Surveillance data
  • …
• Constant streams: budget algorithms, tailored to a specific application
  • − no result in less time
  • − no improvement with additional time
• Varying streams: anytime algorithms, the natural choice
  • + result after any time
  • + exploit additional time
• Goal: improve the resulting quality on constant streams over that of budget algorithms
• Basic idea: spend less time on “confident” items
• Prerequisite: a confidence measure for the current result
[Figure: quality over time]
Model – premise
• Example: classification of items on a conveyor belt
• Given
  • Anytime classification algorithm (e.g. anytime nearest neighbor)
  • Confidence measure
  • (td – tf) ≥ ta
• Time is normalized to [0, 1]
  • t = 0: corresponds to the result just after initialization
  • t = 1: complete model has been read, no further improvement possible
  • n improvement steps (e.g. n training set items for nearest neighbor)
• Confidence measure ranges from 0 to 1
  • 0: no confidence
  • 1: certain
• First: assume linear dependency between confidence and accuracy
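The premise above can be made concrete with a small sketch of an anytime nearest neighbor state for a single item. This is only an illustration under stated assumptions: the class name `AnytimeNN`, its methods, and in particular the confidence measure (relative distance gap between the two closest classes) are hypothetical and not taken from the slides. Here t = 0 means that no training item has been read yet, so the sketch has no initial result; a real anytime classifier would provide one after initialization.

```python
import math
from dataclasses import dataclass, field


@dataclass
class AnytimeNN:
    """Illustrative anytime 1-nearest-neighbor state for one query item."""
    query: tuple
    training: list                 # list of (features, label) pairs
    steps_done: int = 0
    best: dict = field(default_factory=dict)  # label -> smallest distance seen

    def improve(self):
        """One improvement step: read one more training item."""
        if self.steps_done >= len(self.training):
            return False           # t = 1 reached, nothing left to read
        features, label = self.training[self.steps_done]
        d = math.dist(self.query, features)
        if d < self.best.get(label, math.inf):
            self.best[label] = d
        self.steps_done += 1
        return True

    @property
    def time(self):
        """Normalized time in [0, 1]: fraction of the n steps already done."""
        return self.steps_done / len(self.training)

    def prediction(self):
        """Label of the closest training item seen so far (None at t = 0)."""
        return min(self.best, key=self.best.get) if self.best else None

    def confidence(self):
        """Hypothetical confidence in [0, 1]: relative gap between the
        distances to the two closest classes (an assumption of this sketch)."""
        if len(self.best) < 2:
            return 0.0
        d1, d2 = sorted(self.best.values())[:2]
        return 0.0 if d2 == 0 else (d2 - d1) / d2
```

Each call to `improve()` is one of the n improvement steps, so an item that already shows a high `confidence()` can be stopped early and its unused steps given to other items.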
Model – assumptions
• Individual confidences are scattered around the mean value (budget confidence)
[Figure: confidence over time; the budget confidence is the mean μ(t), the scattering is σ(t); a threshold ĉ and F(ĉ, t) are marked]
Model – expected time to reach confidence ĉ
• F(ĉ, t) is the probability that the confidence at time t is larger than ĉ
• Use F(ĉ, t) as a cumulative distribution function (n steps!)
• h(ĉ, tj) is the probability that we first exceeded ĉ from tj-1 to tj
• Determine the expected time needed to reach ĉ by
[Equation: expected time t(ĉ)]
[Figures: F(0.3, t) and 1 − F(0.3, t) over time; time over confidence for the traditional budget approach and the batch approach]
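One possible reading of the bullets above, sketched in code. The assumptions here are not from the slides: F(ĉ, tj) is taken to be non-decreasing over the discrete steps tj = j/n, h(ĉ, tj) is taken as the difference F(ĉ, tj) − F(ĉ, tj-1), and items that never reach ĉ are charged the full time t = 1. The function name `expected_time` is hypothetical.

```python
def expected_time(F):
    """Expected normalized time to first exceed a confidence threshold.

    F[j] is the probability that the confidence has exceeded the
    threshold by step j (j = 0..n), assumed to be non-decreasing.
    """
    n = len(F) - 1
    if n < 1:
        raise ValueError("need F at t = 0 and at least one further step")
    expected = 0.0
    previous = 0.0
    for j, value in enumerate(F):
        h = value - previous          # probability of first exceeding at step j
        expected += (j / n) * h       # step j corresponds to time t = j / n
        previous = value
    # Assumption: probability mass that never exceeds the threshold costs t = 1.
    expected += 1.0 - previous
    return expected
```

For example, `expected_time([0.0, 0.5, 1.0])` weights t = 0.5 and t = 1 with probability 0.5 each.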
Batch approach
• To improve the overall quality of the results, we have to process several items in parallel
[Figure: stream of items of type 1, type 2, …, type m with arrival interval ta, bounded by tf and td, feeding a buffer for the batch approach; at time t0 the buffer holds items 1 to 17, at time t0 + 5∙ta it holds items 6 to 17]
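A minimal sketch of how a batch could share its time among several items. The slides do not spell out the allocation rule; this sketch assumes that each improvement step goes to the item with the currently lowest confidence, and that the batch as a whole gets the same number of steps the budget approach would spend on it. `ToyItem` and `process_batch` are hypothetical names, and `ToyItem` is a stand-in for an anytime classifier whose confidence grows by a fixed gain per step.

```python
class ToyItem:
    """Stand-in for an anytime classifier state (illustrative only)."""

    def __init__(self, name, confidence, gain):
        self.name = name
        self.confidence = confidence
        self.gain = gain
        self.steps = 0

    def can_improve(self):
        return self.confidence < 1.0

    def improve(self):
        self.confidence = min(1.0, self.confidence + self.gain)
        self.steps += 1


def process_batch(items, total_steps):
    """Spend total_steps on the batch, always on the least confident item."""
    for _ in range(total_steps):
        candidates = [item for item in items if item.can_improve()]
        if not candidates:
            break                     # every item is already certain
        min(candidates, key=lambda item: item.confidence).improve()
    return {item.name: item.steps for item in items}
```

With an already confident item and an uncertain one in the same batch, the uncertain item receives the steps that a per-item budget would have spent on the confident one.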
FiFo approach
• Use FiFo queue with capacity of r
• Initialize and insert newly arriving items
• Remove oldest item on overflow
• Improve item s with lowest time-weighted confidence
  • If confidences are similar, give priority to older items
[Figures: weight as a function of remaining time]
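The queue handling above can be sketched as follows. The weight function is an assumption of this sketch: it rises linearly from a minimal factor `mtf` to 1 with the fraction of arrivals an item can still wait for, and the time-weighted confidence is taken as confidence times weight, so older items with similar confidence are improved first. `ToyItem`, `FifoScheduler`, and their methods are hypothetical names, not the authors' implementation.

```python
from collections import deque


class ToyItem:
    """Stand-in for an anytime classifier state (illustrative only)."""

    def __init__(self, name, confidence, gain=0.1):
        self.name = name
        self.confidence = confidence
        self.gain = gain

    def improve(self):
        self.confidence = min(1.0, self.confidence + self.gain)


class FifoScheduler:
    def __init__(self, capacity, mtf=1.0):
        self.capacity = capacity      # queue capacity r
        self.mtf = mtf                # weight of an item with no time left
        self.queue = deque()          # leftmost entry is the oldest item
        self.finished = []            # items whose result has been emitted

    def insert(self, item):
        """Insert a new item; on overflow the oldest item leaves the queue."""
        if len(self.queue) >= self.capacity:
            self.finished.append(self.queue.popleft())
        self.queue.append(item)

    def weight(self, position):
        """Assumed linear weight in [mtf, 1] based on remaining time."""
        arrivals_left = (self.capacity - len(self.queue)) + position + 1
        remaining = arrivals_left / self.capacity
        return self.mtf + (1.0 - self.mtf) * remaining

    def improve_one(self):
        """Improve the item with the lowest time-weighted confidence."""
        candidates = [i for i, item in enumerate(self.queue)
                      if item.confidence < 1.0]
        if not candidates:
            return None
        # min() returns the first minimum, so ties go to the older item.
        i = min(candidates,
                key=lambda i: self.queue[i].confidence * self.weight(i))
        self.queue[i].improve()
        return self.queue[i]
```

With `mtf=1.0` every weight is 1, so the choice depends on confidence alone; smaller values of `mtf` shift steps toward items that are about to leave the queue.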
Evaluation – classifiers and confidence measures
• Anytime nearest neighbor classifier (ordered w.r.t. leave-one-out performance on training set)
• Anytime support vector machine (m times one class versus all)
• Anytime Bayesian classification (hierarchy of mixture densities per class)
Evaluation – batch approach
• Throughout: 4-fold cross validation, time scaled to [0, 1]
• Budget: performance increases with allotted time
• Batch: accuracy increases with growing window size (equal time)
• Largest (relative) improvement for small window sizes
Evaluation – batch approach and model
• Results confirm theoretic model:
  • “linear” dependency between accuracy and confidence
  • Expected time t(c) decreases with growing window size
[Figure: time over confidence for the traditional budget approach and the batch approach]
Evaluation – FiFo approach
• FiFo approach also outperforms the respective budget algorithm
• Accuracy increases with larger minimal time factor mtf
• Confidence alone yields the best distribution of time allowance
[Figure: weight over remaining time, with the values mtf and 1 marked on the weight axis]
Evaluation – comparison
• FiFo approach performs better than the batch approach in comparable settings throughout all experiments
• Performance improvement even for small window/queue sizes
Conclusion
• Data streams are ubiquitous
• So far: budget algorithms on constant streams
• Achievement: quality improvement over budget algorithms by harnessing the strengths of anytime algorithms
• Two simple yet effective approaches
• Evaluation using three prominent classifiers and simple confidence measures
• Both approaches outperform the respective budget algorithms
• Results confirm theoretic model and motivate further research
  • Anytime algorithms
  • Confidence measures
Poster session tonight
• Discuss the paper
• Investigate stream data items …