This research focuses on enhancing thermal monitoring in data centers by optimizing sensor placement for efficient hot server detection. Overheating can lead to hardware malfunction, system shutdowns, and excessive cooling energy consumption. By employing Computational Fluid Dynamics (CFD) modeling and a greedy Lightweight Sensor Placement (LSP) algorithm, we aim to maximize detection probability of overheating servers. Our findings show that intelligent sensor placement based on CFD analysis leads to significantly improved monitoring capabilities, paving the way for robust thermal management and energy savings in data centers.
Towards Optimal Sensor Placement for Hot Server Detection in Data Centers • Xiaodong Wang, Xiaorui Wang, Guoliang Xing, Jinzhu Chen, Cheng-Xian Lin, and Yixin Chen
Outline • Introduction • Related work • Hot server detection problem • CFD-guided sensor placement • Evaluation • Summary
Introduction • Thermal monitoring is important in data center operation: • Overheating is harmful to a data center: • Hardware components malfunction. • Servers shut down. • Excessive cooling energy is consumed: • Cooling systems do not operate efficiently enough. • Overcooling consumes excessive energy. • Precise hot server detection is needed: • Precise hot server detection can guide the air conditioning system. • Thermal dynamics in data centers need to be better studied. • Placing more sensors increases thermal visibility.
Related Work • Studies of thermal profiles: • [Choi et al., HPCA '07] studied the thermal profile of a rack. • [Patel et al., IPACK '01] studied the air temperature specification of a data center under normal conditions. • Improving thermal monitoring with sensor networks: • [Liang et al., SenSys '09] deployed sensor networks in a data center to achieve high-fidelity thermal monitoring. • [Moore et al., USENIX '05] and [Bash et al., USENIX '07] proposed allocating server jobs and workloads based on thermal readings from sensor networks. • These results have not been used to guide sensor placement. How can sensors be placed effectively?
Hot Server Detection Problem • Problem to solve: • Place sensors intelligently to maximize the hot server detection probability. • Problem formulation: • Given M locations to monitor and N (N < M) sensors to use, maximize the average detection probability over the monitored locations, subject to a constraint on the false alarm rate. • Detection probability: probability of detecting overheating at a monitored location. • False alarm rate: probability of a false overheating alarm at a monitored location.
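The formulation on this slide can be written compactly as below. This is a sketch under stated assumptions, not the authors' exact notation: the symbols $P_{D,j}$ (detection probability at monitored location $j$), $P_{F,j}$ (false alarm rate at location $j$), and the bound $\alpha$ are introduced here for illustration, and the averaged objective follows the "maximized average overheating server detection probability" goal stated later in the deck.

```latex
\max_{\text{placement of } N \text{ sensors}} \;
\frac{1}{M} \sum_{j=1}^{M} P_{D,j}
\qquad \text{subject to} \qquad
P_{F,j} \le \alpha, \quad j = 1, \dots, M
```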
Problem Solving Architecture • Overheating data center analysis: • Analyze the data center under overheating conditions. • Obtain the temperature distribution for overheating cases. • Find the sensor placement solution: • Sensor readings are usually corrupted by noise. • Sensors need to make the hot server detection decision collaboratively (data fusion). [Figure: architecture diagram with blocks Overheating Analysis (CFD Modeling, Temperature Interpolation) and Sensor Placement (Data fusion & placement algorithm)]
Overheating Data Center Analysis • Computational Fluid Dynamics (CFD) model for an overheating data center: • A finite volume method. • Example: data center physical model and temperature distribution. [Figure: physical model and temperature distribution; labels: CFD Modeling, Temperature Interpolation, Power consumption, CRAC settings, A/C In, A/C Out]
Overheating Data Center Analysis (cont'd) • Spatial temperature interpolation: • CFD results are available only at discrete locations. • The granularity of CFD modeling is a tradeoff between accuracy and computational complexity. • Inverse Distance Weighting (IDW) interpolation: • A weighted average of the available temperature data. • Optimize sensor placement based on the overheating analysis: • Goal: maximize the average overheating server detection probability.
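The IDW step can be illustrated with a short Python sketch. It assumes the common form of IDW in which each known temperature is weighted by the inverse of its distance to the query point raised to a power; the function name `idw_interpolate` and the default `power=2.0` are illustrative choices, not values taken from the slides.

```python
import math


def idw_interpolate(samples, query, power=2.0):
    """Estimate the temperature at `query` from known (point, temperature) samples.

    samples: list of ((x, y, z), temperature) pairs, e.g. CFD grid outputs.
    query:   (x, y, z) location where a temperature estimate is needed.
    power:   distance exponent (assumed value; larger favors nearer samples).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for point, temperature in samples:
        distance = math.dist(point, query)
        if distance == 0.0:
            # The query coincides with a known sample: return it directly.
            return temperature
        weight = 1.0 / distance ** power
        weighted_sum += weight * temperature
        weight_total += weight
    return weighted_sum / weight_total
```

A query point halfway between two samples receives equal weights, so the estimate is simply their mean; a query point that coincides with a sample returns that sample's temperature unchanged.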
CFD-Guided Sensor Placement • Sensor placement with an existing solver: • Decide the x, y, and z coordinates of each sensor location. • Constrained Simulated Annealing (CSA): • An existing solver with 3N variables. • Computational time increases exponentially. • Lightweight Sensor Placement (LSP): • Searches for placement solutions only in areas with clustered racks. • A greedy algorithm that adds sensors one by one. • Search space and computational time are significantly reduced.
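The "adds sensors one by one" idea can be sketched as a generic greedy loop in Python. This is not the authors' LSP implementation: `greedy_placement` takes any caller-supplied `score` function, and `coverage_score` is a hypothetical stand-in objective (the fraction of monitored locations within a given range of some sensor) used here only so the example runs. In the deck, the objective would be the average detection probability derived from the CFD analysis.

```python
import math


def greedy_placement(candidates, num_sensors, score):
    """Pick sensor locations one at a time, keeping the best candidate each round.

    candidates:  list of candidate sensor locations (tuples).
    num_sensors: number of sensors to place (N).
    score:       function mapping a list of locations to a number to maximize.
    """
    placement = []
    remaining = list(candidates)
    for _ in range(min(num_sensors, len(remaining))):
        # Evaluate each remaining candidate added to the current placement.
        best = max(remaining, key=lambda c: score(placement + [c]))
        placement.append(best)
        remaining.remove(best)
    return placement


def coverage_score(placement, monitored, sensing_range):
    """Hypothetical objective: fraction of monitored locations near any sensor."""
    covered = sum(
        1
        for location in monitored
        if any(math.dist(location, sensor) <= sensing_range for sensor in placement)
    )
    return covered / len(monitored)
```

Each round evaluates only the remaining candidates rather than all joint placements, which is why a greedy search is much cheaper than optimizing 3N coordinates at once. The price is that the result is not guaranteed to be optimal.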
Simulation Setup • Experiment environment setup: • CFD software packages: Gambit and Fluent. • Server room size: 32 m × 7 m × 3 m. • 13 racks in the server room. • 4 monitored locations per rack (52 locations in total). • 14,400 watts of power consumption for each overheating rack. • CRAC settings are collected by an external sensor.
Simulation Results • Varying the number of sensors: • Baselines: • Uniformly random (current practice). • CFD+proportional. • Using more sensors increases the detection probability. • CFD+LSP (our solution) is closest to the optimal solution.
Simulation Results (cont'd) • Varying the temperature threshold: • Detection probability decreases as the temperature threshold increases. • Varying the fusion range: • A proper fusion range can increase the detection probability.
Hardware Experiment in a Server Room • Setup: • A small cluster of two racks is used. • Overheating is created by a heater. • Results:
Summary • We place sensors intelligently in data centers: • To reach a maximum hot server detection probability. • Various overheating conditions are studied to guide sensor placement. • CFD is used to analyze data centers under overheating conditions. • Future considerations: • Integrate with thermal control approaches. • More detailed CFD modeling.
Q&A Thank You! • Acknowledgement • NSF CAREER Award CNS-0845390 • NSF under Grants CNS-0720663, CNS-0915959, CCF-1017336, and CNS-0954039 • Microsoft Research under a Power-Aware Computing Award
Appendix A • Sensor readings are usually corrupted by noise. • An overheating scenario is detected when the measured temperature is larger than the threshold. • A false alarm happens when the overheating detection is triggered by noise only.
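The threshold rule above can be made concrete with a small Python sketch. It assumes zero-mean Gaussian measurement noise with a known standard deviation, which the slides do not specify, and it models a single sensor without data fusion. The function names are illustrative.

```python
import math


def gaussian_tail(x):
    """Probability that a standard normal variable exceeds x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def detection_probability(overheated_temp, threshold, noise_std):
    """Chance that a noisy reading of an overheated location exceeds the threshold."""
    return gaussian_tail((threshold - overheated_temp) / noise_std)


def false_alarm_rate(normal_temp, threshold, noise_std):
    """Chance that noise alone pushes a normal-temperature reading over the threshold."""
    return gaussian_tail((threshold - normal_temp) / noise_std)
```

Under this assumed noise model, raising the threshold lowers both quantities, which matches the simulation observation that detection probability decreases as the temperature threshold increases.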
Appendix B • Rack clustering: • The minimum distance between two monitored locations in two different clusters is larger than 2R. • Inverse Distance Weighting (IDW) interpolation:
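The interpolation formula itself does not survive in this transcript. For reference, the standard (Shepard) form of IDW is given below as an assumption about what the slide showed: $\hat{T}(\mathbf{x})$ is the estimated temperature at location $\mathbf{x}$, $T_i$ is the known temperature at CFD output location $\mathbf{x}_i$, $K$ is the number of known samples, and $p$ is a positive distance exponent whose value in the original work is not recoverable here.

```latex
\hat{T}(\mathbf{x}) =
\frac{\sum_{i=1}^{K} w_i(\mathbf{x})\, T_i}{\sum_{i=1}^{K} w_i(\mathbf{x})},
\qquad
w_i(\mathbf{x}) = \frac{1}{\lVert \mathbf{x} - \mathbf{x}_i \rVert^{p}}
```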