Application of SIS-Based TDT. Yang Hu, University of Pittsburgh, Department of Computer Science
Introduction to SIS • Topic Detection and Tracking (TDT): Concept, Goals, Major Tasks, Methods • TDT-Based Power-Efficient Web Server: Motivation, Implementation • Conclusion Outline
A Slow Intelligence System (SIS) can provide a software development framework for general-purpose systems that have insufficient computing resources, allowing them to gradually improve their performance over time. Introduction to SIS
An SIS works through five stages: Enumeration, Propagation, Adaptation, Elimination and Concentration. [Figure: Slow Intelligence System, its five stages numbered 1 to 5] Introduction to SIS (cont’d)
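To make the five stages concrete, here is a toy Python loop that runs through them once per round while minimising a one-dimensional function. It is a minimal illustrative sketch, not the SIS framework itself: the function name `sis_minimize`, the representation of candidates as plain floats, and the particular rule chosen for each stage (for example, treating Propagation as sharing the best-known candidate with the whole pool) are all assumptions.

```python
import random
from typing import Callable, List


def sis_minimize(score: Callable[[float], float], rounds: int = 30,
                 n_new: int = 10, keep: int = 3, seed: int = 0) -> float:
    """Toy slow-intelligence loop (hypothetical): every round applies the five
    stages to a pool of candidate solutions and returns the best survivor.
    A lower score means a better candidate."""
    rng = random.Random(seed)
    survivors: List[float] = [0.0]   # hypothetical starting solution
    spread = 10.0                    # search radius for new candidates
    for _ in range(rounds):
        # 1. Enumeration: propose new candidates around the current survivors.
        pool = survivors + [rng.choice(survivors) + rng.uniform(-spread, spread)
                            for _ in range(n_new)]
        # 2. Propagation: make the best-known candidate visible to the whole pool.
        best = min(pool, key=score)
        # 3. Adaptation: move every candidate halfway toward the best one.
        pool = [x + 0.5 * (best - x) for x in pool] + [best]
        # 4. Elimination: discard all but the `keep` best-scoring candidates.
        survivors = sorted(pool, key=score)[:keep]
        # 5. Concentration: focus the next round's search on a narrower region.
        spread *= 0.7
    return survivors[0]
```

For example, `sis_minimize(lambda x: (x - 3.0) ** 2)` should return a value close to 3.0. Because the best candidate is always carried into the next round, the best score never gets worse; the shrinking `spread` is what makes the improvement "slow" but steady.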
What is TDT? • A DARPA-sponsored initiative to investigate the state of the art in finding and following new events in a stream of broadcast news stories. Concept
To develop automatic techniques for finding topically related material in streams of data. This could be valuable in a wide variety of applications where efficient and timely access to information is important, e.g., CNN or Yahoo News. • To enable computers to map out the data automatically: finding story boundaries, determining which stories go with one another, and discovering when something new (unforeseen) has happened. Goals
Story Segmentation - Detect changes between topically cohesive sections • Topic Tracking - Keep track of stories similar to a set of example stories • Topic Detection - Build clusters of stories that discuss the same topic • First Story Detection - Detect if a story is the first story of a new, unknown topic • Link Detection - Detect whether or not two stories are topically linked Major Tasks
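Several of the tasks above come down to deciding whether two stories are "close enough". A common baseline, assumed here rather than taken from the slides, represents each story as a bag-of-words vector and compares vectors with cosine similarity. The sketch applies it to Link Detection and First Story Detection; the helper names and the 0.3 threshold are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts over lower-cased alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def linked(s1: str, s2: str, threshold: float = 0.3) -> bool:
    """Link Detection: treat two stories as topically linked above the threshold."""
    return cosine(vectorize(s1), vectorize(s2)) >= threshold


def first_stories(stream: List[str], threshold: float = 0.3) -> List[bool]:
    """First Story Detection: a story is 'new' if no earlier story is similar to it."""
    seen: List[Counter] = []
    flags: List[bool] = []
    for story in stream:
        v = vectorize(story)
        flags.append(all(cosine(v, old) < threshold for old in seen))
        seen.append(v)
    return flags
```

With the stream `["earthquake hits city", "earthquake city rescue continues", "election results announced"]`, `first_stories` flags the first and third stories as new and the second as a follow-up. Practical systems typically add tf-idf weighting and stop-word removal, which this sketch leaves out.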
General Linear Abstraction of Seasonality (GLAS) • Henderson Filter (HF) • Lowess (LW) • Smoothing splines (SS) • Kalman Filter (KF) Methods
A package currently used at the Bank of England for seasonal adjustment and trend estimation. • The trend series is constructed as a moving average of the data with a triangular weighting pattern. GLAS
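A triangular-weighted moving average of the kind described can be sketched as follows. The window half-width and the choice to drop the end points where the window does not fit are assumptions; the slide does not say how GLAS sets its window or treats the ends of the series, and the function names are hypothetical.

```python
from typing import List


def triangular_weights(half_width: int) -> List[float]:
    """Weights 1, 2, ..., h+1, ..., 2, 1 for a window of 2h+1 points, scaled to sum to 1."""
    raw = [half_width + 1 - abs(j) for j in range(-half_width, half_width + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def centred_moving_average(series: List[float], weights: List[float]) -> List[float]:
    """Apply a symmetric weighted moving average. Points near either end, where the
    full window does not fit, are dropped (an assumption made for simplicity)."""
    h = len(weights) // 2
    return [sum(w * series[t + j - h] for j, w in enumerate(weights))
            for t in range(h, len(series) - h)]
```

For instance, `centred_moving_average([1, 2, 3, 4, 5], triangular_weights(1))` returns `[2.0, 3.0, 4.0]`: a symmetric filter whose weights sum to 1 passes a straight line through unchanged, so only departures from a local linear trend are smoothed away.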
Used in the X-11-ARIMA and X-12-ARIMA packages, which are also currently used at the Bank of England. • The rationale is the same as for GLAS, but with a different weighting pattern. HF
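The symmetric Henderson weights have a well-known closed form. The sketch below computes them for an odd number of terms and applies them as a centred moving average. The function names are hypothetical, and the asymmetric end-point weights that X-11-style packages use near the ends of a series are omitted here.

```python
from typing import List


def henderson_weights(terms: int) -> List[float]:
    """Symmetric Henderson moving-average weights for an odd number of terms
    (5, 9, 13 and 23 are common choices), from Henderson's closed-form formula."""
    if terms < 3 or terms % 2 == 0:
        raise ValueError("terms must be an odd integer >= 3")
    p = (terms - 1) // 2          # half-width of the window
    n = p + 2                     # constant used in the closed form
    den = 8 * n * (n * n - 1) * (4 * n * n - 1) * (4 * n * n - 9) * (4 * n * n - 25)
    weights = []
    for j in range(-p, p + 1):
        num = (315 * ((n - 1) ** 2 - j * j) * (n * n - j * j)
               * ((n + 1) ** 2 - j * j) * (3 * n * n - 16 - 11 * j * j))
        weights.append(num / den)
    return weights


def henderson_smooth(series: List[float], terms: int) -> List[float]:
    """Centred Henderson trend; the ends, which would need asymmetric weights, are dropped."""
    w = henderson_weights(terms)
    p = len(w) // 2
    return [sum(wj * series[t + j - p] for j, wj in enumerate(w))
            for t in range(p, len(series) - p)]
```

For 5 terms the weights come out as approximately -0.073, 0.294, 0.559, 0.294, -0.073. The small negative outer weights are what distinguish this pattern from the all-positive triangular weights of GLAS.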
Lowess identifies a certain number of nearest neighbors of a given point x0 and assigns each neighbor a weight based on its distance from x0. The value of the trend at x0 is then computed by a weighted regression on these neighbors. • The number of nearest neighbors used is the smoothing parameter. • The larger the number, the smoother the trend. LW
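A single-pass version of Lowess, with the tricube weight function and a local straight-line fit, can be sketched as below. The full method (Cleveland's LOWESS) usually adds robustness iterations that down-weight outliers; these are left out, so treat this as a simplified sketch. Here `k`, the number of nearest neighbors, plays the role of the smoothing parameter.

```python
from typing import List


def lowess(x: List[float], y: List[float], k: int) -> List[float]:
    """Local linear fit at each x0 using its k nearest neighbors and tricube
    weights (one pass, no robustness iterations)."""
    fitted = []
    for x0 in x:
        # The k nearest neighbors of x0 (k is the smoothing parameter).
        idx = sorted(range(len(x)), key=lambda i: abs(x[i] - x0))[:k]
        # Scale distances by the furthest neighbor's distance (guard against 0).
        h = max(abs(x[i] - x0) for i in idx) or 1.0
        w = {i: (1 - (abs(x[i] - x0) / h) ** 3) ** 3 for i in idx}  # tricube weights
        # Weighted least squares for a straight line through the neighbors.
        sw = sum(w.values())
        mx = sum(w[i] * x[i] for i in idx) / sw
        my = sum(w[i] * y[i] for i in idx) / sw
        sxx = sum(w[i] * (x[i] - mx) ** 2 for i in idx)
        sxy = sum(w[i] * (x[i] - mx) * (y[i] - my) for i in idx)
        slope = sxy / sxx if sxx > 0 else 0.0
        # The trend value at x0 is the fitted line evaluated at x0.
        fitted.append(my + slope * (x0 - mx))
    return fitted
```

Because each local fit is a straight line, data that already lie on a line are returned unchanged; raising `k` spreads each fit over more points and gives a smoother trend.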
The smoothing spline is derived as the explicit solution to the functional minimization problem $$\min_{f} \sum_{t=1}^{T} \bigl(y_t - f(t)\bigr)^2 + \lambda \int f''(t)^2 \, dt.$$ • $\lambda$ represents the smoothing parameter, which sets the trade-off between the smoothness of the curve (the second-derivative term in the integral) and the fidelity to the data (the residual sum of squares). SS
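Fitting an exact cubic smoothing spline takes more machinery than fits here. For equally spaced observations, a closely related discrete version replaces the integral of the squared second derivative with a sum of squared second differences. This is the Whittaker-Henderson (Hodrick-Prescott) penalised smoother, swapped in as an approximation rather than the exact spline. Its solution satisfies (I + λDᵀD)f = y, where D takes second differences. The function name `penalised_trend` is hypothetical, and the dense solver is only meant for short series.

```python
from typing import List


def penalised_trend(y: List[float], lam: float) -> List[float]:
    """Minimise sum (y_t - f_t)^2 + lam * sum (f_{t+1} - 2 f_t + f_{t-1})^2
    by solving (I + lam * D^T D) f = y with plain Gaussian elimination."""
    n = len(y)
    if n < 3:
        return list(map(float, y))
    # Build A = I + lam * D^T D, where row t of D is (1, -2, 1) at columns t..t+2.
    a = [[float(i == j) for j in range(n)] for i in range(n)]
    for t in range(n - 2):
        coeffs = {t: 1.0, t + 1: -2.0, t + 2: 1.0}
        for i, ci in coeffs.items():
            for j, cj in coeffs.items():
                a[i][j] += lam * ci * cj
    # Forward elimination (A is symmetric positive definite, so no pivoting is needed).
    b = list(map(float, y))
    for col in range(n):
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            if factor:
                for k in range(col, n):
                    a[row][k] -= factor * a[col][k]
                b[row] -= factor * b[col]
    # Back substitution.
    f = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = sum(a[row][k] * f[k] for k in range(row + 1, n))
        f[row] = (b[row] - s) / a[row][row]
    return f
```

Setting `lam = 0` returns the data unchanged, a straight line comes back unchanged for any `lam`, and larger values of `lam` give a smoother trend, mirroring the trade-off that λ controls in the spline problem.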
This approach employs structural time series modeling, in which the unobserved trend component is assumed to follow a well-defined stochastic process. • The trend component is given a general state-space form, and the Kalman filter estimates the unobserved trend from the observed series. KF
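As an illustration of the structural approach, the sketch below runs the Kalman filter on the simplest structural trend, the local level model $y_t = \mu_t + \varepsilon_t$, $\mu_t = \mu_{t-1} + \eta_t$, with Gaussian noise. The choice of this model, the known noise variances `var_eps` and `var_eta`, and the large initial variance `p0` are assumptions; the trend specification from the original slide is not reproduced here, and in practice the variances are usually estimated by maximum likelihood.

```python
from typing import List


def local_level_filter(y: List[float], var_eps: float, var_eta: float,
                       mu0: float = 0.0, p0: float = 1e7) -> List[float]:
    """Kalman-filtered trend for the local level model (hypothetical helper).
    var_eps: observation noise variance; var_eta: level noise variance."""
    mu, p = mu0, p0          # large p0 acts as a vague prior on the initial level
    trend = []
    for obs in y:
        # Predict: the level follows a random walk, so only its variance grows.
        p = p + var_eta
        # Update: blend prediction and observation using the Kalman gain.
        gain = p / (p + var_eps)
        mu = mu + gain * (obs - mu)
        p = (1.0 - gain) * p
        trend.append(mu)
    return trend
```

The ratio `var_eta / var_eps` acts as the smoothing control: the smaller it is, the less the filtered trend reacts to each new observation.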
Server power consumption is rapidly becoming a hot topic in the IT industry. • Over the last decade, power has emerged as a critical design constraint in modern computer architecture. In many cases system power consumption is increasing exponentially. Motivation
[Figure: SIS Coordinator] Implementation
[Figure: SIS-based TDT, with components 1st KB, Enumerator, Eliminator, Concentrator and 2nd KB] Implementation (cont’d)
For most data centers, the cost of power has become a top budget item. In fact, in 2008 the average cost of the power used by a server exceeded its purchase price (4). • Nationally, the EPA estimated that data center power consumption cost over $4.5 billion a year in 2006, a figure projected to grow to $7.4 billion in 2011 (5). • One main reason is typically a lack of communication between the people who pay the power bill and the IT department that operates the servers. Conclusion
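As a quick check on the EPA figures above, growth from $4.5 billion in 2006 to $7.4 billion in 2011 (five years) implies a compound annual growth rate of roughly 10.5%:

```latex
\text{CAGR} = \left(\frac{7.4}{4.5}\right)^{1/5} - 1 \approx 1.105 - 1 \approx 10.5\%\ \text{per year}
```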
(1) Shih and Peng, “Building Topic/Trend Detection System Based on Slow Intelligence.” • (2) Allan, J., Carbonell, J., Doddington, G., Yamron, J., and Yang, Y., “Topic Detection and Tracking Pilot Study: Final Report.” • (3) Bianchi, M., Boyle, M., and Hollingsworth, D., “A Comparison of Methods for Trend Estimation.” • (4) Belady, C., “In the Data Center, Power and Cooling Costs More Than the IT Equipment It Supports,” Electronics Cooling, Vol. 23, No. 1, February 2007. • (5) U.S. Environmental Protection Agency, “EPA Report to Congress on Server and Data Center Energy Efficiency,” 2007. References