This paper discusses the use of Very Lightweight Agents (VLAs) in the BTeV-RTES project, focusing on their platform independence, hardware and software monitoring capabilities, error prediction and logging, and scheduling and priorities of test events.
BTeV-RTES Project
Very Lightweight Agents: VLAs
Daniel Mossé, Jae Oh, Madhura Tamhankar, John Gross
Computer Science Department, University of Pittsburgh
Shameless plug
LARTES: IEEE Workshop on Large Scale Real-Time and Embedded Systems
In conjunction with the IEEE Real-Time Systems Symposium (RTSS 2002 is on Dec 3-5, 2002)
December 2, 2002, Austin, TX, USA
http://www.rtss.org/LARTES.html
BTeV Test Station
Collider detectors are about the size of a small apartment building. Fermilab's two detectors, CDF and DZero, are about four stories high and weigh some 5,000 tons (10 million pounds) each. Particle collisions occur in the middle of the detectors, which are crammed with electronic instrumentation. Each detector has about 800,000 individual pathways for recording electronic data generated by the particle collisions. Signals are carried over nearly a thousand miles of wire and cable.
Information from Fermi National Accelerator Laboratory
L1/L2/L3 Trigger Overview
Information from Fermi National Accelerator Laboratory
System Characteristics: Software Perspective
• Reconfigurable node allocation
• L1 runs one physics application, severely time constrained
• L2/L3 runs several physics applications, with loose time constraints
• Multiple operating systems and differing processors (TI DSP BIOS, Linux, Windows?)
• Communication among system sections via fast network
• Fault tolerance is essentially absent in embedded and RT systems
L1/L2/L3 Trigger Hierarchy
• Global Manager: TimeSys RT Linux; Global Manager VLA
• Regional L1 Manager (1): TimeSys RT Linux; Regional Manager VLA
• Regional L2/L3 Manager (1): TimeSys RT Linux; Regional Manager VLA
• Crate Managers (20): TimeSys RT Linux; Crate Manager VLA
• Farmlet Managers (16): TimeSys RT Linux; Farmlet Manager VLA
• DSPs (8): TI DSP BIOS; Low-Level VLA
• Section Managers (8): RH 8.x Linux; Section Manager VLA
• Linux Nodes (320): RH 8.x Linux; Low-Level VLA
• Other diagram labels: Gigabit Ethernet (twice), Data Archive, External Level
Very Lightweight Agents (VLAs)
Proposed Solution: Very Lightweight Agent
• Minimize footprint
• Platform independence
• Monitor hardware
• Monitor software
• Comprehensible source code
• Communication with high-level software entity
• Error prediction
• Error logging and messaging
• Schedule and priorities of test events
VLAs on L1 and L2/3 nodes
• Level 1 Farm Nodes: Hardware; OS Kernel (DSP BIOS); Physics Application; VLA; Network API (to L1 Manager Nodes)
• Level 2/3 Farm Nodes: Hardware; OS Kernel (Linux); Physics Applications; VLA; Network API (to L2/L3 Manager Nodes)
VLA Error Reporting
• DSP VLA
• Level 1/2/3 Manager Nodes: Hardware; Linux Kernel; ARMOR; VLA; Manager Application; Network API (to network)
VLA Error Prediction
Buffer overflow:
1. VLA message or application data input buffers may overflow
2. Messages or data are lost in each case
3. Detection through monitoring fill rate and overflow condition
4. A high fill rate is indicative of
   * a high error rate, producing messages
   * undersized data buffers
Throttled CPU:
1. Throttled from high temperature
2. Throttled by an erroneous power-saving feature
3. Causes missed deadlines due to low CPU speed
4. Potentially critical failure if L1 data is not processed fast enough
Note that the CPU may be throttled on purpose.
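The two checks on this slide can be sketched in a few lines. This is only an illustration in Python, not the project's DSP code: the class `BufferMonitor`, the function `throttle_warning`, the units (messages, messages per second, MHz), and the threshold values are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BufferMonitor:
    """Tracks one buffer's fill level (hypothetical helper, not from the slides)."""
    capacity: int            # buffer size in messages (assumed unit)
    rate_threshold: float    # alarm level in messages per second (assumed value)
    last_fill: int = 0
    last_time: float = 0.0

    def sample(self, fill: int, now: float) -> List[str]:
        """Record one fill-level sample and return the warnings it triggers."""
        warnings = []
        if fill >= self.capacity:
            warnings.append("overflow")
        dt = now - self.last_time
        if dt > 0:
            rate = (fill - self.last_fill) / dt
            if rate > self.rate_threshold:
                warnings.append("high_fill_rate")
        self.last_fill, self.last_time = fill, now
        return warnings


def throttle_warning(current_mhz: float, nominal_mhz: float, intentional: bool) -> bool:
    """True when the CPU runs below nominal speed and the throttle was not requested."""
    return current_mhz < nominal_mhz and not intentional


mon = BufferMonitor(capacity=100, rate_threshold=20.0)
print(mon.sample(fill=10, now=1.0))    # 10 msgs/s, below the threshold
print(mon.sample(fill=60, now=2.0))    # 50 msgs/s, above the threshold
print(mon.sample(fill=100, now=3.0))   # buffer full and still filling fast
```

The `intentional` flag reflects the slide's note that a throttle may be deliberate, in which case it should not be reported as a fault.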
VLA Error Logging
Diagram labels: Filters; Hardware Failures; Software Failures; Communication API; Message Buffer; TCP/IP Ethernet; Data Archive
VLA packages info:
1. Message time
2. Operational data
3. Environmental data
4. Sensor values
5. App & OS error codes
6. Beam crossing ID “15”
ARMOR:
1. Reads messages
2. Stores/uses for error prediction
3. Appends appropriate info
4. Sends to archive
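As a rough illustration of the logging path, the sketch below packages the six items a VLA reports and lets a receiver append its own information before archiving. The slides do not give a wire format, so JSON, the field names, and the functions `package` and `armor_annotate` are assumptions made for this example.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class VLAMessage:
    """One error-log message; field names are hypothetical."""
    message_time: float
    beam_crossing_id: int
    operational: Dict[str, float] = field(default_factory=dict)
    environmental: Dict[str, float] = field(default_factory=dict)
    sensors: Dict[str, float] = field(default_factory=dict)
    error_codes: List[int] = field(default_factory=list)


def package(msg: VLAMessage) -> bytes:
    """VLA side: serialize the message for the communication API (JSON is assumed)."""
    return json.dumps(asdict(msg), sort_keys=True).encode("utf-8")


def armor_annotate(raw: bytes, node_id: str) -> dict:
    """Receiver side: read the message and append extra info before archiving."""
    record = json.loads(raw.decode("utf-8"))
    record["node_id"] = node_id  # example of "appends appropriate info"
    return record


msg = VLAMessage(message_time=12.5, beam_crossing_id=15,
                 sensors={"cpu_temp_c": 71.0}, error_codes=[3])
record = armor_annotate(package(msg), node_id="node-7")
print(record["beam_crossing_id"], record["node_id"])
```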
VLA Scheduling Issues
• L1 trigger application has highest priority
• VLA must run sufficiently to ensure efficacy of purpose
• VLA must internally prioritize error tests
• VLA must preempt the L1 trigger app on critical errors
• Task priorities must be alterable during run-time
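One way to picture internally prioritized error tests with priorities that can change at run time is a small priority queue. The sketch below uses Python's `heapq` with lazy invalidation of stale entries; the class `ErrorTestQueue` and the test names are hypothetical, and the slides do not say how the VLA actually orders its tests.

```python
import heapq
from typing import Dict, List, Optional, Tuple


class ErrorTestQueue:
    """Pending error tests ordered by priority (lower number runs first)."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._live: Dict[str, int] = {}   # test name -> sequence number of its valid entry
        self._seq = 0

    def set_priority(self, name: str, priority: int) -> None:
        """Add a test, or change the priority of one that is already queued."""
        self._seq += 1
        self._live[name] = self._seq
        heapq.heappush(self._heap, (priority, self._seq, name))

    def pop_next(self) -> Optional[str]:
        """Return the most urgent test, skipping entries made stale by a re-prioritization."""
        while self._heap:
            _, seq, name = heapq.heappop(self._heap)
            if self._live.get(name) == seq:
                del self._live[name]
                return name
        return None


queue = ErrorTestQueue()
queue.set_priority("buffer_fill", 2)
queue.set_priority("cpu_temp", 5)
queue.set_priority("link_check", 3)
queue.set_priority("cpu_temp", 1)   # raised at run time, e.g. after a temperature spike
print(queue.pop_next())
```

Re-prioritizing pushes a fresh entry instead of searching the heap, which keeps each operation at O(log n) at the cost of a few stale entries that are discarded on pop.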
VLA Scheduling Issues
• Normal Scheduling (diagram: Kernel, Physics Application, VLA)
• Adaptive Resource Scheduling (diagram: Kernel, Physics Application, VLA): when the physics app is unexpectedly ended, more VLAs can be scheduled
• Alternative Scheduling Concept (diagram: Kernel, VLA, Physics Application): the VLA has the ability to control its own priority and that of other apps, based on internal decision making
VLA Scheduling Issues
Diagram labels: External Message Source (FPGA); VLA Inhibitor; Kernel; No VLA; Physics Application
VLA Status
Current status:
• VLA skeleton and timing implemented in Syracuse (poster)
• Hardware platform from Vandy
• Software (muon application) from Fermi and UIUC
• Linux drivers to use GME and Vandy devkit
Near term:
• Muon application to run on the DSP board
• Muon application timing
• Instantiate VLAs with Vandy hardware and Muon application
VLA and Network Usage
• Network usage influences the amount of data dropped by Triggers and other Filters
• Network usage is typically not considered in load balancing algorithms (they assume the network is fast enough)
• VLAs monitor and report network usage
• Agents use this information to re-distribute loads
• Network architecture to control flows on a per-process basis (http://www.netnice.org)
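To make the idea of network-aware load redistribution concrete, the sketch below picks a target node from VLA reports using both CPU load and network utilization. The function `pick_node`, the weighted-sum score, and the sample values are assumptions for illustration; the slides do not specify the balancing algorithm.

```python
from typing import Dict, Tuple


def pick_node(reports: Dict[str, Tuple[float, float]], net_weight: float = 0.5) -> str:
    """Choose the node with the lowest combined load.

    reports maps a node name to (cpu_load, net_utilization), both in [0, 1].
    net_weight = 0 reproduces a balancer that ignores the network.
    """
    def score(item):
        name, (cpu, net) = item
        # Ties are broken by node name so the choice is deterministic.
        return ((1 - net_weight) * cpu + net_weight * net, name)

    return min(reports.items(), key=score)[0]


reports = {"n1": (0.2, 0.9), "n2": (0.4, 0.1), "n3": (0.3, 0.6)}
print(pick_node(reports, net_weight=0.0))   # CPU only: the lightly loaded but congested n1
print(pick_node(reports, net_weight=0.5))   # network-aware: n2
```

With these sample reports the CPU-only choice lands on the node whose link is nearly saturated, which is the situation the slide warns about.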