This presentation discusses applying control theory to a stream processing system built on TCQ nodes. It addresses three problems: tuples being dropped when the result queue is full, data sources that do not deliver an accurate data rate, and the regulation of queue lengths. Using feedback control with P and PI controller designs, the system improves throughput and robustness against disturbances. The work highlights how control theory keeps a complex system operating within a desired range, and it outlines future enhancements.
Applying Control Theory to Stream Processing Systems
Wei Xu (xuw@cs.berkeley.edu), Bill Kramer (kramer@lbl.gov), Joe Hellerstein (hellers@us.ibm.com)
Description of the system • TCQ has a complex internal structure • TCQ drops tuples silently if the result queue is full [Diagram: Data Source, Input Buffer, TCQ]
Why do we need control? • The data source does not provide an accurate data rate
Why do we need control? • The TCQ node drops tuples when the result queue fills up [Figure: Source Buffer, TCQ Result Q]
Control Problems • Providing an accurate data source • Get the actual data rate • Regulating queue length on the TCQ node • Prevent dropping tuples • Maximize throughput (and adapt when a disturbance happens)
System with Control [Diagram: Controlled Data Source, Output Rate Controller, Queue Length Monitor]
The Control Architecture [Diagrams: P Controller, PI Controller]
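A minimal sketch of the two controller types named on this slide, assuming a first-order plant of the form y(k+1) = a·y(k) + b·u(k), which is the model form given on the estimation slide. The class name, the gains and the plant parameters are illustrative assumptions, not values from the talk.

```python
from dataclasses import dataclass


@dataclass
class PIController:
    """Discrete PI controller; with ki = 0 it reduces to a P controller."""
    kp: float
    ki: float = 0.0
    integral: float = 0.0  # running sum of all errors seen so far

    def update(self, reference: float, measured: float) -> float:
        error = reference - measured
        self.integral += error
        return self.kp * error + self.ki * self.integral


def simulate(controller: PIController, a: float, b: float,
             reference: float, steps: int) -> list[float]:
    """Close the loop around the assumed plant y(k+1) = a*y(k) + b*u(k)."""
    y = 0.0
    history = []
    for _ in range(steps):
        u = controller.update(reference, y)
        y = a * y + b * u
        history.append(y)
    return history


if __name__ == "__main__":
    # Hypothetical plant and gains, chosen only so that both loops are stable.
    p_only = simulate(PIController(kp=0.2), a=0.8, b=0.5, reference=10.0, steps=200)
    pi = simulate(PIController(kp=0.2, ki=0.2), a=0.8, b=0.5, reference=10.0, steps=200)
    print(f"P  final output: {p_only[-1]:.3f}")
    print(f"PI final output: {pi[-1]:.3f}")
```

With these illustrative numbers the P-only loop settles below the reference (a steady-state error), while the integral term drives the PI loop to the reference.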
Result: an accurate data source [Plots: P Controller with Pre-compensation, PI Controller]
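One common reading of "pre-compensation" is to scale the reference so that a P-only loop has no steady-state error. The sketch below assumes that reading and a first-order plant y(k+1) = a·y(k) + b·u(k); the function names and numbers are hypothetical, and the talk may have used a different scheme.

```python
def precompensation_gain(a: float, b: float, kp: float) -> float:
    """Reference scaling N that cancels the steady-state error of a P loop.

    With u(k) = kp * (N*r - y(k)) the loop settles at
    y = b*kp*N*r / (1 - a + b*kp), so N is chosen as the inverse of
    b*kp / (1 - a + b*kp).
    """
    if b * kp == 0:
        raise ValueError("b and kp must be non-zero")
    return (1.0 - a + b * kp) / (b * kp)


def simulate_p_with_precompensation(a: float, b: float, kp: float,
                                    reference: float, steps: int) -> float:
    """Return the output after `steps` samples of the pre-compensated P loop."""
    n = precompensation_gain(a, b, kp)
    y = 0.0
    for _ in range(steps):
        u = kp * (n * reference - y)  # P control on the scaled reference
        y = a * y + b * u             # assumed first-order plant
    return y


if __name__ == "__main__":
    final = simulate_p_with_precompensation(a=0.8, b=0.5, kp=0.2,
                                            reference=10.0, steps=200)
    print(f"final output: {final:.3f}")
```

The scaling is only as good as the estimates of a and b: if the real system drifts from the model, the steady-state error returns, which is one reason to compare this design against a PI controller.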
Result: regulating queue length [Plots: Source Buffer, TCQ Result Q]
Result: under CPU contention [Plots: Source Buffer, TCQ Result Q]
Why is theory useful? • One of my implementations .. What happened? [Plots: Source Buffer, TCQ Result Q]
What is going on? [Block diagram: Desired Queue Length, Queue Length Controller, Controlled Output Thread (code reuse), Data Rate to TCQ, Actual Queue Length]
Theory meets reality [Plot: output Y from simulation; axes: queue length vs. time]
Tricky part of parameter estimation • Model evaluation: making the system operate in the desired range • Easy for the data source, but queue length .. [Plot: data rate vs. free space; labels: Free Space, Non-Linear range]
Settling time and overshoot matter • A lot of small disturbances in a Java program • Incremental garbage collection [Plots: P Controller, PI Controller]
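To make the two metrics on this slide concrete, here is a small sketch that computes them from a recorded step response. The 2% settling band is a common convention rather than something stated in the talk, and the sample response is made up for illustration.

```python
from typing import Optional, Sequence


def overshoot(response: Sequence[float], reference: float) -> float:
    """Peak excess over the reference, as a fraction of the reference.

    Assumes a positive reference and a non-empty response.
    """
    peak = max(response)
    return max(0.0, (peak - reference) / reference)


def settling_time(response: Sequence[float], reference: float,
                  tolerance: float = 0.02) -> Optional[int]:
    """First sample index after which the response stays inside the band.

    Returns None if the final sample is still outside the band.
    """
    band = tolerance * abs(reference)
    last_outside = -1
    for k, y in enumerate(response):
        if abs(y - reference) > band:
            last_outside = k
    if last_outside == len(response) - 1:
        return None
    return last_outside + 1


if __name__ == "__main__":
    # Made-up queue-length samples for a reference of 10.
    samples = [0.0, 5.0, 12.0, 9.5, 10.1, 10.0, 10.0]
    print("overshoot:", overshoot(samples, 10.0))
    print("settling time (samples):", settling_time(samples, 10.0))
```

For a queue-length controller, overshoot translates directly into the risk of filling the queue and dropping tuples, so a slower but less oscillatory tuning can be the safer choice.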
Conclusion • Advantages of feedback control • Makes the system more robust under disturbance • Treats complex systems as black boxes • Copes with the system's characteristics instead of having to change them • Encourages reporting system statistics • Implementation is easy and has theoretical guarantees
Future Work • Load balancer • Smaller sample time to reduce the disturbance caused by Java GC? • A controller for scheduling a system shared by multiple streams
Outline • Problems and Motivation • Controller design • Result • Discussion
Description of the System (revised) • Operation of the Load Splitter • Arriving blocks wait in the Input Buffer • Tuples are routed to balance TCQ queue lengths • Routing stops if the queue length is too large, to avoid tuple discards [Diagram: Data Source, Tuple Blocks, Input Buffer, Load Splitter (Routing Logic), Tuples, TCQ Nodes, Queue length]
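The load splitter behaviour listed on this slide can be sketched as follows. The class, its method names and the shortest-queue routing rule are assumptions made for illustration; the slide only says that tuples are routed to balance queue lengths and that routing stops when queues get too long.

```python
from collections import deque
from typing import Iterable


class LoadSplitter:
    """Hypothetical load splitter in front of several TCQ node queues."""

    def __init__(self, num_nodes: int, max_queue_length: int) -> None:
        self.input_buffer: deque = deque()  # blocks waiting to be routed
        self.queues = [deque() for _ in range(num_nodes)]
        self.max_queue_length = max_queue_length

    def receive_block(self, block: Iterable) -> None:
        """Arriving blocks wait in the input buffer."""
        self.input_buffer.append(deque(block))

    def route(self) -> int:
        """Route tuples to the shortest queue; return how many were routed.

        Stops as soon as even the shortest queue is at the limit, leaving
        the remaining tuples in the input buffer instead of discarding them.
        """
        routed = 0
        while self.input_buffer:
            block = self.input_buffer[0]
            while block:
                target = min(self.queues, key=len)
                if len(target) >= self.max_queue_length:
                    return routed
                target.append(block.popleft())
                routed += 1
            self.input_buffer.popleft()  # block fully routed
        return routed


if __name__ == "__main__":
    splitter = LoadSplitter(num_nodes=2, max_queue_length=3)
    splitter.receive_block(range(10))
    print("routed:", splitter.route())
    print("queue lengths:", [len(q) for q in splitter.queues])
```

Holding tuples back in the input buffer moves the backlog to a place where nothing is dropped, which is the point of stopping rather than forcing tuples into a full queue.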
Compare to Open Loop Control • We know y(k), and we know what we want y(k+1) to be • Use the transfer function to solve for u(k) • (Expected result: accuracy and disturbance; to be done)
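The open-loop calculation described here can be sketched as below, assuming the first-order model y(k+1) = a·y(k) + b·u(k) from the estimation slide. The function name and all numbers are illustrative; the slide marks the actual comparison as still to be done, so the mismatch example is not a result from the talk.

```python
def open_loop_input(a: float, b: float, y_now: float, y_target: float) -> float:
    """Solve y_target = a*y_now + b*u for the input u."""
    if b == 0:
        raise ValueError("b must be non-zero for the input to affect the output")
    return (y_target - a * y_now) / b


if __name__ == "__main__":
    a_model, b_model = 0.8, 0.5  # hypothetical estimated model
    u = open_loop_input(a_model, b_model, y_now=4.0, y_target=10.0)

    # If the real system matches the model, the target is hit in one step.
    print("matched plant:", a_model * 4.0 + b_model * u)

    # If the real input gain is lower than estimated, the output falls short
    # and nothing in an open-loop scheme corrects the error.
    b_actual = 0.4
    print("mismatched plant:", a_model * 4.0 + b_actual * u)
```

The second case shows why open-loop control is sensitive to model error and disturbances: the input is computed once from the model and never corrected against the measured output.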
Estimation of the transfer function • Model: y(k+1) = a·y(k) + b·u(k) • Parameters a and b are estimated by regression
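A minimal sketch of the regression step, assuming ordinary least squares on logged input and output samples. The slide only says "regression", so the exact method is an assumption, and the function name and test data are hypothetical.

```python
def estimate_first_order(y: list[float], u: list[float]) -> tuple[float, float]:
    """Least-squares fit of a and b in y(k+1) = a*y(k) + b*u(k).

    Expects len(y) == len(u) + 1, where y[k] and u[k] produce y[k+1].
    Solves the 2x2 normal equations directly, so no external library is needed.
    """
    if len(y) != len(u) + 1:
        raise ValueError("need one more output sample than input samples")
    syy = sum(yk * yk for yk in y[:-1])
    suu = sum(uk * uk for uk in u)
    syu = sum(yk * uk for yk, uk in zip(y[:-1], u))
    syn = sum(yk * yn for yk, yn in zip(y[:-1], y[1:]))
    sun = sum(uk * yn for uk, yn in zip(u, y[1:]))
    det = syy * suu - syu * syu
    if det == 0:
        raise ValueError("inputs do not vary enough to identify a and b")
    a = (syn * suu - sun * syu) / det
    b = (sun * syy - syn * syu) / det
    return a, b


if __name__ == "__main__":
    # Synthetic, noise-free data from a known system, used only as a check.
    u = [1.0 + (k % 5) for k in range(20)]
    y = [0.0]
    for uk in u:
        y.append(0.8 * y[-1] + 0.5 * uk)
    print(estimate_first_order(y, u))
```

The input has to vary during the experiment; a constant input at steady state makes y(k) and u(k) proportional, and the two parameters can no longer be separated.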
Tricky part of parameter estimation • Model evaluation: choose a data rate that makes the system operate in its linear range