ONCS Computing in the ER

Presentation Transcript


  1. ONCS Computing in the ER • Success stories of “online monitoring in the counting house” • Offline and Online computing environment consolidation efforts • ONCS software highlights • Objectivity news • Time classes • New manuals • ROOT multithreading news • Worries and concerns

  2. Success Stories (DCH) • We went the “whole length” to set the DCH folks up with their monitoring in the Counting House • Before that they would take data, transfer it to RCF, fire up STAF, and look at the data… • Now: take data, look at the results right away. • It doesn’t sound too tricky, and we did something like that back in April 1997, but here... • we have wrapped STAF modules, PHOOL, PhHistogramFactory, basically the whole offline environment • shared library versions have to match • both sides (RCF and ONCS) have their own set of version constraints (ROOT version, egcs version, libstdc++, and so on) • wrapped STAF and “big-event” DD are true memory hogs (we had to tweak the machine setups, 0.5 GB of virtual memory) • The routine operation is more like near-line monitoring; the analysis is done from a file from a very short run • Tassilo and I funneled those data through a DD pool, no problem. • No difference whether you read from a file or from a DD pool (see the sketch below).
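
A minimal sketch of what that last point means in practice: the monitoring loop is written against the abstract event iterator, and only the place where the iterator is created decides whether events come from a file or from a DD pool. The class and method names below (Eventiterator, fileEventiterator, ddEventiterator, getNextEvent, identify) follow the Event library described above, but the exact headers, constructor arguments, and the file/pool names used here are assumptions for illustration, not a verified reference.

// Sketch only: names and signatures are assumptions, see the note above.
#include <iostream>
#include "Eventiterator.h"
#include "fileEventiterator.h"
#include "ddEventiterator.h"
#include "Event.h"

void monitor(Eventiterator *it)          // works for files and DD pools alike
{
  Event *evt;
  while ((evt = it->getNextEvent()) != 0)   // 0 signals end of data
  {
    evt->identify();                        // print run/event info
    // ... hand the event to the wrapped STAF modules / PHOOL here ...
    delete evt;
  }
}

int main()
{
  // near-line mode: analyze the file from a short run (hypothetical file name)
  Eventiterator *it = new fileEventiterator("shortrun.prdf");
  // online mode: attach to the DD pool instead (hypothetical pool name)
  // Eventiterator *it = new ddEventiterator("dchPool");
  monitor(it);
  delete it;
  return 0;
}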

  3. DCH monitoring [Figure: drift time distributions for X and stereo (UV) wires, taken online]

  4. Online and Offline environment (DPM & MLP) • We want to arrive at a state where the user can run his or her programs at RCF and in the Counting House without re-linking. • We are giving the “general login script” another shot • several failed attempts at this in past years • stop the proliferation of account-specific “do-it-all” .login files which are hard to maintain • source one script which sets up the environment for you, anywhere (*) • centrally maintained, changes picked up by everyone right away, no more stale paths • allow a standard Red Hat Linux box without root access (but with AFS) to use this • in the counting house we will use local copies of most software, no hard dependence on AFS • the script will adapt to local software and use the AFS distribution otherwise • after executing that script, you should be able to run the analysis software no matter where you are. (*) we concentrate on Linux and Solaris for now

  5. ONCS Software highlights • ONCS “Run Control” is now the de-facto standard for taking data in the counting house • gives you “one-window operation” of DAQ, timing, DD system, start/stop • gives you fancy scripting capabilities • has become very robust over the past couple of weeks • DD pool can handle “huge events” (more a sysadmin thing) • DD pool interface now uses the right “policy” by default (right for online monitoring) • new account “phnxoncs” to run the standard DAQ, while the phoncs account is used to develop the DAQ • lowers the chance to screw something up • gives the not-so-proficient user the right (today’s) environment • Some updates in the Event library • NEW/PRO scheme a la CERNLIB in place

  6. DD pools as I like them [Diagram: MDC2 data files feed DD pools through dpipe, a new utility; the pools are read out by ddEventiterator (or testEventiterator) clients] Worked very reliably, shuffled 10 GB through

  7. Objectivity news • We (DPM & MLP) successfully set up a federated database with two “autonomous partitions” spanning RCF and the counting house. • An autonomous partition is a part of the federation that is independent of the other partitions. • It has its own lock server • It has its own host machine(s) • In normal running, the partitioning is invisible (unless you want to know). • In case of a failure, the partitions continue to function for processes that only access data within one partition • This is the case for most DAQ processes (run control, etc.), so if RCF is down, it won’t affect the data taking (and vice versa, but when are our machines ever down?) It’s a major step forward towards doing away with a lot of ASCII-based configuration files.

  8. Database time stamps We have talked about time stamps for database entries (and the time tags in the Event headers) in the past. So far we haven’t had a good common solution. A tag in the database should give you the time an entry was made reasonably accurately, but time alone isn’t good enough to identify individual events from the DAQ. You will need that capability to drop individual events from the analysis later, or to maintain a “hit list” of your favorite events. We will have a “composite” of time tag and run/event number for identifying events, and time only for things like HV readings, temperature readings, the time an FPGA code was used, and so on. Our “time” supports comparisons, time windows, <, >, ==, +, so it’s easy to find out whether an “event” is within a certain time window (such as the validity range for a calibration) or not (a sketch of that interface follows below). We will use that on Unix, NT, and VxWorks. And the best part: offline and online use the same time format!
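
The sketch below illustrates the kind of interface described above. The class name (PhxTimeStamp), its members, and the EventTag composite are hypothetical placeholders, not the actual online/offline time classes; they only show how comparisons, a composite event key, and a validity-window check fall out of a single 64-bit tick counter.

// Hypothetical sketch of the time-stamp interface described above.
#include <stdint.h>

class PhxTimeStamp
{
public:
  explicit PhxTimeStamp(int64_t vmsTicks = 0) : fTicks(vmsTicks) {}

  // comparisons, so "is this time inside a validity range?" is trivial
  bool operator< (const PhxTimeStamp &rhs) const { return fTicks <  rhs.fTicks; }
  bool operator> (const PhxTimeStamp &rhs) const { return fTicks >  rhs.fTicks; }
  bool operator==(const PhxTimeStamp &rhs) const { return fTicks == rhs.fTicks; }

  // shift a time stamp by a number of 100 ns ticks
  PhxTimeStamp operator+(int64_t dticks) const { return PhxTimeStamp(fTicks + dticks); }

  int64_t ticks() const { return fTicks; }

private:
  int64_t fTicks;   // 100 ns ticks since 17-Nov-1858 (VMS format, see next slide)
};

// "composite" key for identifying individual events: time alone is not unique
struct EventTag
{
  PhxTimeStamp when;
  int          run;
  int          event;
};

// validity-range check, e.g. for a calibration entry
inline bool inWindow(const PhxTimeStamp &t,
                     const PhxTimeStamp &begin, const PhxTimeStamp &end)
{
  return !(t < begin) && !(end < t);   // begin <= t <= end
}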

  9. And the winner is: the VMS time format • Why VMS? It is a well-known format, widely used (also outside VMS), and at least one of us is a VMS nostalgic :-) • 100 ns granularity, 64-bit tick counter since the “Smithsonian base date”, 00:00 17-Nov-1858 • Unix (POSIX) only gives 1 s granularity (Unix date, etc.) • We won’t come close to making use of the 100 ns granularity now, but we might be able to do better in the future (NT clock, correlate accelerator ticks with time ticks, a common time base, something) and will be ready • It is a 64-bit integer ready to be fed to conversion routines, and then to ctime, etc. (see the conversion sketch below)
VMS date (CH)              ticks                  Unix date (+timezone)
26-APR-1999 14:51:50.00    44318551100000000      Mon Apr 26 10:51:50 1999
 6-AUG-2034 14:51:50.00    55452055100000000      Sun Aug  6 10:51:50 2034
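
A worked sketch of that conversion: dividing the tick counter by 10^7 gives seconds since 17-Nov-1858, and subtracting the 40587 days (3,506,716,800 s) between that base date and the Unix epoch of 1-Jan-1970 yields a value ctime() can print. The helper and constant names are made up for illustration; the first row of the table above is used as the test value, and it comes out as shown there when printed in a US-Eastern timezone.

// Sketch: convert 64-bit VMS-style ticks (100 ns since 00:00 17-Nov-1858)
// into a Unix time_t that ctime() can print. Names are illustrative.
#include <stdint.h>
#include <time.h>
#include <stdio.h>

static const int64_t TICKS_PER_SECOND   = 10000000LL;    // 100 ns granularity
static const int64_t VMS_TO_UNIX_OFFSET = 3506716800LL;  // 40587 days * 86400 s

time_t vmsTicksToUnix(int64_t ticks)
{
  return (time_t)(ticks / TICKS_PER_SECOND - VMS_TO_UNIX_OFFSET);
}

int main()
{
  // first row of the table above
  int64_t ticks = 44318551100000000LL;
  time_t t = vmsTicksToUnix(ticks);
  printf("%s", ctime(&t));   // "Mon Apr 26 10:51:50 1999" in a US-Eastern timezone
  return 0;
}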

  10. Y10K and Y31K bugs COMPONENT: SYSTEM TIME OP/SYS: VMS, Version 4.n LAST TECHNICAL REVIEW: 06-APR-1988 SOURCE: Customer Support Center/Colorado Springs This base time of Nov. 17, 1858 has since been used by TOPS-10, TOPS-20, and VAX/VMS. Given this base date, the 100 nanosecond granularity implemented within VAX/VMS, and the 63-bit absolute time representation (the sign bit must be clear), VMS should have no trouble with time until: 31-JUL-31086 02:48:05.47 At this time, all clocks and time-keeping operations within VMS will suddenly stop, as system time values go negative. Note that all time display and manipulation routines within VMS allow for only 4 digits within the 'YEAR' field. We expect this to be corrected in a future release of VAX/VMS sometime prior to 31-DEC-9999.

  11. New Manuals Over the past few weeks we have beefed up the manuals for the DD, the Message system, and, believe it or not, the Event library. HTML and PostScript, mostly automatically generated. Fun reading. They have some real-life examples and in-depth reference information, a must for everybody analyzing data! We now return to our presentation.

  12. Root status (multithreading) • At the FNAL workshop I agreed to re-visit the “multi-threaded Root” issue. • Some progress: I was able to run ROOT with 2 threads. • only for compiled code (that’s ok) • heavy use of mutexes, because most ROOT classes are not thread-safe (a minimal sketch of the kind of locking required is below) • this is just the beginning, far from a working system • MLP2’s socket classes and shared memory are fallback solutions, but not very efficient. • I see people roll their own -- better help us! I had to put that project on the back burner for the past two weeks.
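
A minimal sketch of why those mutexes are needed: two POSIX threads filling the same histogram, with every ROOT call serialized by a mutex because the ROOT classes themselves are not thread-safe. This uses plain pthreads and a single TH1F as an assumed stand-in for the real monitoring case; it is not the actual prototype mentioned above.

// Sketch: two threads fill one TH1F; every ROOT call is serialized by a mutex.
#include <pthread.h>
#include "TH1F.h"

static pthread_mutex_t rootMutex = PTHREAD_MUTEX_INITIALIZER;
static TH1F *hist = 0;

void *filler(void *arg)
{
  int offset = *(int *)arg;
  for (int i = 0; i < 100000; i++)
  {
    pthread_mutex_lock(&rootMutex);     // protect the (non-thread-safe) Fill
    hist->Fill((i % 100) + offset);
    pthread_mutex_unlock(&rootMutex);
  }
  return 0;
}

int main()
{
  hist = new TH1F("h", "two threads filling one histogram", 200, 0, 200);

  pthread_t t1, t2;
  int off1 = 0, off2 = 100;
  pthread_create(&t1, 0, filler, &off1);
  pthread_create(&t2, 0, filler, &off2);
  pthread_join(t1, 0);
  pthread_join(t2, 0);

  // hist now holds 200000 entries; in the real monitoring case a separate
  // viewer would want to look at it while the filling is still going on
  delete hist;
  return 0;
}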

  13. Worries and concerns • Too little coordination with the online monitoring efforts. The “tool sharing”/common-solution concept doesn’t catch on (what little there is, is all by the same author). • We see people roll their own monolithic solutions -- that will hurt us badly soon. Instead, better use your time to work with us and make something that more than one subsystem can use. • Very little time contingency: when things like the hacker incident happen, other projects get delayed. • Number of machines: it’s crowded already, but far from enough CPU power. We need more over time. • Space: where do we put more terminals? • Memory: most Linux machines have 64 MB. Offline sees the minimum at 256 MB, better 512 MB. We should upgrade as soon as we can. Institutional contributions, anyone?
