Automatic Data Quality Monitoring in the BaBar Online and Offline Systems

Scott D. Metzler
California Institute of Technology
For the BaBar Computing Group
CHEP 2000, Padova, IT Feb. 7-11

Context
• Online System
  • 32 Sun Ultra-5 workstations
  • 2000 Hz maximum input rate into Level 3
  • Level 3 reduces the rate to 100 Hz.
  • Real-time monitoring is a system requirement.
• Diagnostic Data
  • The same set of diagnostic data is produced on all 32 nodes.
  • Data summed over all nodes is available to the GUIs and the automatic monitor.
  • Multiple levels of monitoring are available.

Need for Automation
• Thousands of diagnostic objects are produced for each run.
• These are organized by detector system.
• Systems typically provide a high-level diagnostic page for use within JAS for shift monitoring.
• Some plots are too subtle for shift crews to digest.
• Data checking is often inconsistent, depending on staffing.
• Automatic monitoring provides:
  • consistent checking
  • objective, system-defined tests
  • greater coverage of the detector

Diagnostic Data Types
• Histograms provide time-integrated monitoring of system-defined quantities.
• Three types of histograms are available:
  • 1D
  • 2D
  • 1D Profile
• Histogram contents can be monitored as the total sum since the beginning of the run or as the sum since the last automatic comparison.
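The two monitoring modes for histogram contents (total since the start of the run, or only what accumulated since the last automatic comparison) can be illustrated with a minimal sketch. The class name `MonitoredHistogram` and its methods are hypothetical and are not taken from the BaBar software; the sketch only assumes that a snapshot of the bin contents is kept at each comparison.

```python
from typing import List


class MonitoredHistogram:
    """Hypothetical 1D histogram that reports totals or the change since the last check."""

    def __init__(self, n_bins: int, low: float, high: float) -> None:
        self.n_bins = n_bins
        self.low = low
        self.high = high
        self.total = [0.0] * n_bins       # contents since the beginning of the run
        self._snapshot = [0.0] * n_bins   # contents at the last automatic comparison

    def fill(self, x: float, weight: float = 1.0) -> None:
        if not (self.low <= x < self.high):
            return  # out-of-range entries are simply dropped in this sketch
        index = int((x - self.low) / (self.high - self.low) * self.n_bins)
        index = min(index, self.n_bins - 1)  # guard against floating-point rounding
        self.total[index] += weight

    def since_last_comparison(self) -> List[float]:
        """Bin contents accumulated after the most recent mark_compared() call."""
        return [t - s for t, s in zip(self.total, self._snapshot)]

    def mark_compared(self) -> None:
        """Record the current contents as the baseline for the next comparison."""
        self._snapshot = list(self.total)
```

A comparison on `since_last_comparison()` reacts faster to a problem that appears mid-run, while a comparison on `total` has better statistics.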

[Figure: Histograms Displayed in JAS]

Diagnostic Data Types (Cont.)
• Scalers provide tracking of quantities over time.
• Each scaler has a rotating buffer of time bins. The bins are synchronized over nodes.
• Scaler Groups control the granularity of bins.
• Four types of scalers are available:
  • Averaging (weighted average over time/nodes)
  • Integrating (summed over time/nodes)
  • Value (set over time; single node only)
  • Multi (a list of the above types)
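One way to picture an averaging scaler is a small rotating buffer whose bins are keyed by an absolute time-bin index, so that bins from different nodes line up and can be merged. The sketch below is an illustration under that assumption; `AveragingScaler` and `merge_average` are hypothetical names, and the real synchronization mechanism is not described in the slides.

```python
from typing import Dict, Iterable, Tuple


class AveragingScaler:
    """Hypothetical rotating buffer of time bins holding (weighted sum, total weight)."""

    def __init__(self, n_bins: int, bin_width: float) -> None:
        self.n_bins = n_bins
        self.bin_width = bin_width
        self.bins: Dict[int, Tuple[float, float]] = {}

    def add(self, time: float, value: float, weight: float = 1.0) -> None:
        # The absolute bin index depends only on time, so every node agrees on it.
        index = int(time // self.bin_width)
        s, w = self.bins.get(index, (0.0, 0.0))
        self.bins[index] = (s + value * weight, w + weight)
        # Rotate: discard the oldest bins once the buffer is full.
        while len(self.bins) > self.n_bins:
            del self.bins[min(self.bins)]


def merge_average(scalers: Iterable[AveragingScaler]) -> Dict[int, float]:
    """Weighted average per time bin over all nodes."""
    sums: Dict[int, Tuple[float, float]] = {}
    for scaler in scalers:
        for index, (s, w) in scaler.bins.items():
            ts, tw = sums.get(index, (0.0, 0.0))
            sums[index] = (ts + s, tw + w)
    return {i: s / w for i, (s, w) in sorted(sums.items()) if w > 0}
```

An integrating scaler would follow the same pattern but keep only the sum per bin and add the sums across nodes.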

[Figure: Scalers Displayed in JAS, with a "NewRun" label]

Conceptual Design
[Diagram: hbook, Network, Data Retriever, Comparator, Fit, χ², Comparison Record, Manager, Responses, GUIs, connected with 1, 2, and N multiplicities]

Comparison Techniques
• Fixed Spectrum
  • Compare histograms against a reference histogram using Kolmogorov-Smirnov or chi-squared testing.
  • Compare individual bins against a reference, looking for hot or dead channels. A single bad bin causes an error.
  • Comparison against parameterized functions is available.
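The fixed-spectrum comparisons can be sketched on plain lists of bin contents. This is a simplified illustration, not the BaBar code: the chi-squared version treats the reference, scaled to the observed total, as an exact expectation (Pearson's form); the Kolmogorov-Smirnov function returns only the maximum distance between the normalized cumulative distributions, which is approximate for binned data; and the `hot_factor` and `min_expected` thresholds are hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def chi_squared(observed: Sequence[float], reference: Sequence[float]) -> Tuple[float, int]:
    """Pearson chi-squared of observed vs. reference scaled to the same total."""
    ref_total = sum(reference)
    if ref_total <= 0:
        raise ValueError("reference histogram is empty")
    scale = sum(observed) / ref_total
    chi2, used = 0.0, 0
    for o, r in zip(observed, reference):
        e = r * scale
        if e > 0:  # bins with no expectation are skipped
            chi2 += (o - e) ** 2 / e
            used += 1
    return chi2, used - 1  # one degree of freedom is lost to the normalization


def ks_distance(observed: Sequence[float], reference: Sequence[float]) -> float:
    """Maximum distance between the two normalized cumulative distributions."""
    total_o, total_r = sum(observed), sum(reference)
    if total_o <= 0 or total_r <= 0:
        raise ValueError("both histograms need entries")
    cdf_o = [c / total_o for c in accumulate(observed)]
    cdf_r = [c / total_r for c in accumulate(reference)]
    return max(abs(a - b) for a, b in zip(cdf_o, cdf_r))


def bad_bins(observed: Sequence[float], expected: Sequence[float],
             hot_factor: float = 5.0, min_expected: float = 10.0) -> List[Tuple[int, str]]:
    """Flag dead bins (empty where entries are expected) and hot bins (far above expectation)."""
    flagged = []
    for i, (o, e) in enumerate(zip(observed, expected)):
        if e >= min_expected and o == 0:
            flagged.append((i, "dead"))
        elif o > hot_factor * max(e, 1.0):
            flagged.append((i, "hot"))
    return flagged
```

Under the slide's rule that a single bad bin causes an error, any non-empty result from `bad_bins` would count as a failed comparison.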

Comparison Techniques (Cont.)
• Fitting
  • Fitting is intended to handle histograms that are difficult to compare against fixed spectra because of changing conditions.
  • Detector systems define the function to which they wish to fit a histogram. They also define the allowed ranges of the fitted parameters.
  • It is possible to ignore certain parameters (e.g. background fraction) in the comparison.
  • Fitting is not fully available yet, but we anticipate that it will be soon.
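The fit-based comparison amounts to fitting, then checking each fitted parameter against its allowed range while skipping the parameters a system chooses to ignore. The sketch below uses an unweighted straight-line least-squares fit only because it has a closed form; the slides do not say which functions or fitter are used, and `fit_line` and `out_of_range` are hypothetical names.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Dict[str, float]:
    """Unweighted least-squares straight-line fit, returned as named parameters."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two points and equal-length inputs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def out_of_range(params: Dict[str, float],
                 allowed: Dict[str, Tuple[float, float]],
                 ignore: Iterable[str] = ()) -> List[str]:
    """Names of fitted parameters outside their allowed range.

    Parameters listed in `ignore`, or without an entry in `allowed`, are not checked.
    """
    ignored = set(ignore)
    failures = []
    for name, value in params.items():
        if name in ignored or name not in allowed:
            continue
        low, high = allowed[name]
        if not (low <= value <= high):
            failures.append(name)
    return failures
```

For example, a system could pass `ignore=["intercept"]` to tolerate a drifting offset while still checking the slope.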

Comparison Techniques (Cont.)
• Monitoring Scalers
  • Comparison against a fixed range.
  • Comparison as a function of other scalers (e.g. luminosity).
  • Scaler comparisons are also not available yet, but are anticipated.
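Since scaler comparisons are described as planned rather than available, the following is only a guess at their shape: a fixed-range check on a single value, and a per-time-bin check of one scaler's ratio to another (for instance a rate divided by luminosity). Both function names and the ratio-based formulation are assumptions.

```python
from typing import List, Sequence


def in_fixed_range(value: float, low: float, high: float) -> bool:
    """True if a scaler value lies inside the allowed range."""
    return low <= value <= high


def bins_outside_ratio(values: Sequence[float], reference: Sequence[float],
                       low_ratio: float, high_ratio: float) -> List[int]:
    """Indices of time bins where values[i] / reference[i] is outside the allowed range."""
    bad = []
    for i, (v, r) in enumerate(zip(values, reference)):
        if r <= 0:
            continue  # no usable reference in this bin, so it is not judged
        if not (low_ratio <= v / r <= high_ratio):
            bad.append(i)
    return bad
```

Normalizing to a second scaler lets the allowed range stay fixed while beam conditions change.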

Responding to Problems
• The comparison techniques return a value which is passed to user-defined responses.
• The responses are triggered if the comparison falls outside of allowed bounds.
• Systems define the severity of the error based on the return value and determine how to respond to the error.
  • E-mail
  • Occurrence Logger
• Multiple responses are possible for a single comparison.
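The response mechanism can be pictured as a list of handlers, each with its own allowed bounds, all receiving the same comparison value. In this sketch the `Response` class and `dispatch` function are hypothetical, and the actions are plain callables standing in for e-mail or the Occurrence Logger. Giving each response its own bounds is one assumed way to express severity.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Response:
    """Hypothetical user-defined response with its own allowed bounds."""
    name: str
    low: float
    high: float
    action: Callable[[str, float], None]

    def maybe_trigger(self, comparison: str, value: float) -> bool:
        if self.low <= value <= self.high:
            return False  # value is inside the allowed bounds, nothing to do
        self.action(comparison, value)
        return True


def dispatch(comparison: str, value: float, responses: Sequence[Response]) -> List[str]:
    """Pass one comparison value to every response; return the names that fired."""
    return [r.name for r in responses if r.maybe_trigger(comparison, value)]
```

With a logger bounded at a mild threshold and an e-mail response bounded at a severe one, a moderately bad value is only logged, while a very bad value triggers both.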

Graphical Tools
• The Occurrence Logger gives the shift crew a list of potential problems to investigate in real time. Feedback capabilities are being improved.
• A custom GUI provides control of the automatic monitoring system and information about its performance, so that it can be tuned.
• Command-line administration is available for use with Run Control.
• Integration with JAS is a longer-term goal.

[Figure: Error Browser]

[Figure: Automatic Monitoring Control]

Lessons Learned and Conclusions
• This system requires significant user configuration. We would have benefited from providing an early prototype to familiarize users with what was coming.
• The system has been shown to be well abstracted and extensible.
• The system is now in production use, comparing histograms against fixed references.
• More advanced comparisons are coming soon.
