LAS

LAS: Lemon Alarm System. Miroslav Siket, Karol Stanislawek, CERN-IT/FIO-FD.


Presentation Transcript


  1. LAS: Lemon Alarm System. Miroslav Siket, Karol Stanislawek, CERN-IT/FIO-FD

  2. Outline • Scope of LAS • SURE and history • LAS architecture • Exceptions vs. SURE alarms • LAS logic and LAS GUI • Current status • SURE phase-out and LAS phase-in plan • Future and concluding notes FIO Group meeting

  3. Scope • Provide an alarm system for the operators in the CERN Computer Centre • Scalable to 10k+ machines and 300+ alarms • Provide a high-availability solution

  4. SURE and history • SURE is 13 years old • Scalability issues • Old interface • Missing features • Further maintenance issues • Previously considered systems • PVSS, Spectrum, LASER, EDG prototype,… • Either hard to interface with (poor support) or not scalable to the desired number of machines/alarms • Limited configuration

  5. LAS architecture

  6. LAS Schema

  7. Exceptions vs. SURE alarms • SURE alarms are based on comparing individual single-valued metrics with reference values • Exceptions • Are based on the correlation of multiple metrics • Allow multi-valued metrics and on-behalf metrics • Allow regular expressions • Allow logical operations • Allow basic mathematical operations (+, -, *, /) • Allow corrective actions (actuators) up to n times or within a given time window • Allow distinguishing the alarm state (failed actuator,…) • Example: (10004:7 > 100 && (10005:3 - 34:5) > 100:56)
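The example exception above combines metric references, arithmetic, comparisons, and logical operators. The following is a minimal Python sketch of how such an expression could be evaluated, assuming (hypothetically) that "M:F" denotes field F of metric M and that the sampled values are already available in a dictionary. The names tokenize, Evaluator, and evaluate and the sample values are illustrative only; this is not the actual LAS grammar or implementation, and regular expressions, on-behalf metrics, and actuators are not covered.

```python
import operator
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical reading of the syntax: "M:F" is field F of metric M.
Samples = Dict[Tuple[int, int], float]

# Metric references are tried before plain numbers, so "100:56" is one token.
TOKEN_RE = re.compile(r"\s*(\d+:\d+|\d+(?:\.\d+)?|&&|\|\||>=|<=|==|!=|[-+*/()<>])")
COMPARISONS = {
    ">": operator.gt, "<": operator.lt, ">=": operator.ge,
    "<=": operator.le, "==": operator.eq, "!=": operator.ne,
}


def tokenize(expr: str) -> List[str]:
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = TOKEN_RE.match(expr, pos)
        if match is None:
            raise ValueError(f"unexpected input at {pos}: {expr[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Evaluator:
    """Recursive descent; precedence: || < && < comparison < +,- < *,/."""

    def __init__(self, tokens: List[str], samples: Samples):
        self.tokens, self.samples, self.pos = tokens, samples, 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self) -> Optional[str]:
        token = self.peek()
        self.pos += 1
        return token

    def binary(self, next_level, ops):
        """Left-associative chain of next_level operands joined by ops."""
        value = next_level()
        while self.peek() in ops:
            value = ops[self.take()](value, next_level())
        return value

    def or_expr(self):
        return self.binary(self.and_expr, {"||": lambda a, b: bool(a) or bool(b)})

    def and_expr(self):
        return self.binary(self.comparison, {"&&": lambda a, b: bool(a) and bool(b)})

    def comparison(self):
        left = self.additive()
        if self.peek() in COMPARISONS:
            return COMPARISONS[self.take()](left, self.additive())
        return left

    def additive(self):
        return self.binary(self.multiplicative, {"+": operator.add, "-": operator.sub})

    def multiplicative(self):
        return self.binary(self.atom, {"*": operator.mul, "/": operator.truediv})

    def atom(self):
        token = self.take()
        if token == "(":
            value = self.or_expr()
            if self.take() != ")":
                raise ValueError("missing ')'")
            return value
        if token is None:
            raise ValueError("unexpected end of expression")
        if ":" in token:
            metric, field = map(int, token.split(":"))
            return self.samples[(metric, field)]  # KeyError if not sampled
        try:
            return float(token)
        except ValueError:
            raise ValueError(f"unexpected token {token!r}") from None


def evaluate(expr: str, samples: Samples) -> bool:
    parser = Evaluator(tokenize(expr), samples)
    result = parser.or_expr()
    if parser.peek() is not None:
        raise ValueError(f"trailing token {parser.peek()!r}")
    return bool(result)


if __name__ == "__main__":
    values = {(10004, 7): 150.0, (10005, 3): 300.0, (34, 5): 50.0, (100, 56): 200.0}
    print(evaluate("(10004:7 > 100 && (10005:3 - 34:5) > 100:56)", values))  # True
```

Recursive descent keeps the operator precedence explicit, so further operators (for example a regular-expression match) could be added as one more level without touching the others.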

  8. LAS business logic • Evaluation of exceptions into alarms • Entity status derived from CDB • Reductions: horizontal and vertical • Hide and inhibit
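The slide only names the business-logic steps. The following Python sketch shows one plausible reading of them, under clearly labeled assumptions: entity status is a per-host string taken from CDB, and only "production" hosts raise alarms. A vertical reduction is assumed to let a host-down alarm mask the other alarms on that host, and a horizontal reduction is assumed to fold the same exception on several hosts of one cluster into a single cluster-level alarm. The function, field names, exception names, and threshold are hypothetical. "Hide" is not modeled.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Alarm:
    entity: str     # host name, or cluster name after horizontal reduction
    exception: str  # hypothetical exception name, e.g. "no_contact"


def reduce_alarms(
    alarms: List[Alarm],
    status: Dict[str, str],           # entity status, assumed to come from CDB
    inhibited: Set[Tuple[str, str]],  # (entity, exception) pairs switched off
    cluster_of: Dict[str, str],       # host -> cluster, assumed from CDB
    min_hosts: int = 3,
    host_down: str = "no_contact",
) -> List[Alarm]:
    # 1. Entity status: only hosts marked "production" raise alarms (assumption).
    active = [a for a in alarms if status.get(a.entity) == "production"]
    # 2. Inhibit: drop explicitly inhibited (entity, exception) pairs.
    active = [a for a in active if (a.entity, a.exception) not in inhibited]
    # 3. Vertical reduction (assumed meaning): a host-down alarm masks every
    #    other alarm raised on the same host.
    down = {a.entity for a in active if a.exception == host_down}
    active = [a for a in active if a.entity not in down or a.exception == host_down]
    # 4. Horizontal reduction (assumed meaning): the same exception on at least
    #    min_hosts hosts of one cluster becomes a single cluster-level alarm.
    groups: Dict[Tuple[str, str], List[Alarm]] = defaultdict(list)
    for a in active:
        groups[(cluster_of.get(a.entity, a.entity), a.exception)].append(a)
    reduced: List[Alarm] = []
    for (cluster, exception), members in groups.items():
        if len(members) >= min_hosts:
            reduced.append(Alarm(cluster, exception))
        else:
            reduced.extend(members)
    return sorted(reduced, key=lambda a: (a.entity, a.exception))
```

Applying the filters before the reductions means inhibited alarms and alarms from hosts that are not in production never count toward a cluster-level alarm.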

  9. Notifications, RSS, SMS,… • Built-in notification mechanisms: • E-mail • SMS • RSS (requires an SSL-aware RSS reader) • Configurable per user: which entities to report on and on which days/hours notifications are enabled (SMS) • Optional aggregation of notifications
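A minimal Python sketch of per-user notification routing follows. It assumes that each user subscribes to a set of entities and a weekly time window, and that aggregation means folding all matching alarms into one message. The Subscription class, the route function, and the entity names are hypothetical. The slide ties day/hour windows to SMS, whereas this sketch applies one window to a single unnamed channel for brevity.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Set, Tuple


@dataclass
class Subscription:
    user: str
    entities: Set[str]      # entities this user wants to hear about
    days: Set[int]          # allowed weekdays, datetime.weekday(): 0 = Monday
    hours: Tuple[int, int]  # allowed hours as a half-open range [start, end)

    def wants(self, entity: str, when: datetime) -> bool:
        start, end = self.hours
        return (entity in self.entities
                and when.weekday() in self.days
                and start <= when.hour < end)


def route(alarms: List[Tuple[str, str]], subscriptions: List[Subscription],
          when: datetime, aggregate: bool = True) -> Dict[str, List[str]]:
    """Map each user to the notification messages they should receive at `when`."""
    outbox: Dict[str, List[str]] = {}
    for sub in subscriptions:
        matched = [f"{entity}: {text}" for entity, text in alarms
                   if sub.wants(entity, when)]
        if not matched:
            continue  # outside the user's window, or no subscribed entity affected
        if aggregate:
            # One combined message instead of one per alarm (e.g. a single SMS).
            outbox[sub.user] = [f"{len(matched)} alarm(s): " + "; ".join(matched)]
        else:
            outbox[sub.user] = matched
    return outbox
```

Aggregating per user rather than per alarm keeps the number of SMS messages bounded when many entities fail at once.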

  10. LAS GUI • Runs in any recent browser that: • supports the XMLHttpRequest() call (AJAX) • is DOM 1.1 compliant • supports CSS • supports strong SSL encryption • Netscape 6+, IE5+, FF 0.5+, Mozilla 1.0+,… • Requires Flash Player to be installed • Allows multiple users; actions are synchronized • Configurable

  11. Current status • Work in progress • Pre-production version running, together with the GUI • Preparing operator and user GUIs • Testing and optimizing the LAS business logic • Preparing notification and RSS structures • Synchronization of SURE alarms with Exceptions (major CDB endeavour) • Waiting for “reliable” hardware for the servers • Preparing to test HA solutions (Oracle RAC/DG)

  12. SURE phase-out plan • Several phases are needed • Replacement of the ForSure sensor with sensor-sure, based on exceptions • Synchronization of SURE alarms and Exceptions, with an extra SURE server running in sync mode • Porting lemon-sensor-remote and UIMON to LAS • Phasing in lemon-host-check • Running LAS in parallel with SURE for a month • Training of operators • LAS in production
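One way to use the month of parallel running is to compare the alarms both systems raise over the same period. The following is a small Python sketch of such a check. It assumes that both systems' alarms can be exported as (host, alarm name) pairs and that a hypothetical name_map translates SURE alarm names into exception names. None of these names come from SURE or LAS.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical normalized form of an alarm: (host, alarm name).
Alarm = Tuple[str, str]


def compare_runs(sure: Iterable[Alarm], las: Iterable[Alarm],
                 name_map: Dict[str, str]) -> Dict[str, Set[Alarm]]:
    """Compare alarms raised by SURE and LAS over the same period.

    name_map translates SURE alarm names into the matching exception names
    (hypothetical mapping); unmapped names are compared unchanged.
    """
    sure_set = {(host, name_map.get(name, name)) for host, name in sure}
    las_set = set(las)
    return {
        "both": sure_set & las_set,
        "only_sure": sure_set - las_set,  # candidate gaps in the exceptions
        "only_las": las_set - sure_set,   # new or over-sensitive exceptions
    }


def agreement(report: Dict[str, Set[Alarm]]) -> float:
    """Fraction of all distinct alarms raised by both systems (1.0 if none)."""
    total = len(report["both"]) + len(report["only_sure"]) + len(report["only_las"])
    return len(report["both"]) / total if total else 1.0
```

The two one-sided sets are where investigation would start: alarms seen only by SURE may point to exceptions still missing from CDB, and alarms seen only by LAS may point to thresholds that need review.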

  13. Future and conclusions • A future alarm system based on modern technologies • More useful: provides service managers with direct information about the status of their services
