ATLAS Trigger/DAQ Workshop, Chamonix, 19-23 October 1998. Reference Software Framework.
ATLAS Trigger/DAQ Workshop, Chamonix, 19-23 October 1998. Reference Software Framework. A. Belias (5), R. Bock (2), A. Bogaerts (2), M. Boosten (2), J. Bystricky (8), D. Botterill (5), D. Calvet (8), P. Clarke (9), J.M. Conte (8), R. Cranfield (9), G. Crone (9), M. Dobson (2), A. Dos Anjos (2), S. Falciano (7), F. Giacomini (2), A. Guglielmi (3), R. Hauser (2), M. Huet (8), R. Hughes-Jones (4), K. Korcyl (2), P. Le Du (8), I. Mandjavidze (8), R.P. Middleton (5), D. Pelkman (2), S. Qian (6), J. Schlereth (1), M. Sessler (2), P. Sherwood (9), S. Tapprogge (2), W. Wiedenmann (6), P. Werner (2), F. Wickens (5), H. Zobernig (6). Affiliations: (1) Argonne National Lab, (2) CERN, (3) DEC Joint Project Office, (4) University of Manchester, (5) Rutherford Appleton Laboratory, (6) University of Wisconsin, (7) University of Rome, (8) CEA Saclay, (9) UCL. A. Bogaerts (2)
Objectives • Full software chain from RoI data collection to trigger decision • common framework for different aspects within LVL2 (testbeds, algorithms, communications, special components) • Diversity of architectures • small systems for FeX algorithms, steering, trigger performance • large scale testbeds for network performance • HPCN farms (technology watch) • Decouple communications technology • TCP/IP or UDP over Ethernet to test functionality • allow alternative communications technologies (ATM, Ethernet, SCI, …) • Isolate algorithms (Steering, FeX) from architecture/framework • coping with distributed algorithms, distributed data • Implement on open, affordable platform • PCs with WNT or Linux • Good engineering, good quality software • introduction of software process • OO design & implementation • emphasis on portability (isolation of OS services, ANSI compliance, STL)
Functional Components • Supervisor (Emulator) • LVL1 Interface (provides RoI lists & LVL1 result to Steering) • EF interface (passes LVL2 result to the Event Filter) • Communicates Final Decision to Event Filter/DAQ • Receives Event Copied message from Event Builder • Clears Readout Buffers • Steering • Global Decision driven by a Trigger Menu table (sequential or parallel) • schedules Feature Extraction as defined by Algorithm table • Feature Extraction • initiates Data Collection • extracts Features from RoI Data • Data Collection • initiates data transfers from RoBs (or reads from a file) • collates RoB fragments • Readout Buffer (Emulator) • data source for LVL2 and Event Filter/DAQ
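A minimal sketch of how Steering might walk a Trigger Menu table sequentially. The slide does not define the table layout, so everything here is an assumption made for illustration: the `Step`, `MenuItem` and `FeatureExtractor` names, the use of a single threshold per step, and the rule that an event is accepted as soon as one menu item passes all of its steps.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical row of the algorithm table: which FeX algorithm to run
// and the minimum feature value required to continue.
struct Step { std::string algorithm; double threshold; };

// Hypothetical trigger menu item: an ordered sequence of steps.
struct MenuItem { std::string name; std::vector<Step> steps; };

// Stand-in for Feature Extraction: returns one feature value per algorithm.
typedef std::function<double(const std::string&)> FeatureExtractor;

// Sequential processing: steps run in order and an item is abandoned at the
// first failing step, so later (more expensive) algorithms are not scheduled.
bool decide(const std::vector<MenuItem>& menu, const FeatureExtractor& fex) {
    assert(fex);
    for (std::size_t i = 0; i < menu.size(); ++i) {
        bool passed = true;
        for (std::size_t j = 0; j < menu[i].steps.size(); ++j) {
            const Step& step = menu[i].steps[j];
            if (fex(step.algorithm) < step.threshold) { passed = false; break; }
        }
        if (passed) return true;   // one satisfied menu item accepts the event
    }
    return false;                  // no menu item satisfied: reject
}
```

The early exit is the point of the sequential mode in this sketch; a parallel mode would instead request all features up front and combine the results afterwards.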
Test Setups • Feature Extraction -- Single CPU • “stand alone” Feature Extraction for a single detector • purpose: algorithm development, physics performance • Single Node -- Single CPU • single node implementation of Steering & Feature Extraction for multiple detectors • data file reader replaces Supervisor and data Collector/RoB emulators • purpose: functional tests, physics performance, trigger strategy • Single Farm -- Multiple CPU • Steering & Feature Extraction combined in a single thread • multiple threads per Node • one processor farm (Steering & FeX), one RoB farm, one network • purpose: performance including communications, component tests • Split Steering/FeX Farms (HPCN cluster) -- Multiple CPU • assumes a single HPCN farm per subdetector, one global farm • direct connection to detector data without external network (rob-ins) • requires a single small but high bandwidth external network • purpose: alternative for a Single Farm
Software Components • Applications • Application Framework (common to all functional components) • Emulators: Supervisor, RoB • Algorithms • Steering (“Stephandler”), FeX algorithms (presently TRT, Calorimeter) • Menu & algorithm table • ASCII data files • Run Control • Configuration File • Process manager, Run Control, Error Logging, Monitoring • client/server approach, compatible with DAQ/backend • Infrastructure • Operating System Services (encapsulates OS dependencies) • Object management • Message passing Interface and Buffer Management (encapsulates communications) • Application Interfaces to Run Control (Configuration database, Error Logging, Monitoring)
Independence of underlying Architecture • All functional components (e.g. Supervisor, Steering, Algorithms, Data Collectors, RoB) are defined as objects • Objects interact through the invocation of methods (with arguments) • This simple model is always used to preserve independence of architecture • But CORBA-like techniques (RPC, proxy-objects, marshalling of arguments, serialisation of objects) are used to cross processor boundaries • Example: interface between TRT FeX Algorithm and Steering
class TRTHandler {
public:
    void extract(Region, EventId, Algorithm); // TRT Feature Extraction
    bool poll();                              // polls if result has already been produced
    list<TRTTrack> retrieve();                // allows asynchronous execution
};
[Diagram: Steering calls extract(…), poll() and retrieve(…) on TRTHandler, either directly or through Proxy Steering / Proxy TRT Handler objects; components shown: Supervisor, Steering, TRTHandler, TRTCollector, RoB]
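The proxy idea can be sketched as follows. This is an illustration under stated assumptions, not the framework's code: `Region`, `EventId`, `Algorithm` and `TRTTrack` are given placeholder definitions, `LocalTRTHandler`, `TRTHandlerProxy`, `ExtractRequest` and `runSteering` are hypothetical names, and an in-memory queue stands in for the network so the example stays single-process.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <queue>

// Placeholder types: named on the slide but not defined there.
struct Region { double eta; double phi; };
typedef unsigned long EventId;
enum Algorithm { TrackSearch = 0 };
struct TRTTrack { double pt; };

// The slide's interface, made abstract so a local object and a proxy are interchangeable.
class TRTHandler {
public:
    virtual ~TRTHandler() {}
    virtual void extract(Region roi, EventId id, Algorithm alg) = 0;
    virtual bool poll() = 0;
    virtual std::list<TRTTrack> retrieve() = 0;
};

// Local variant: Feature Extraction runs in the caller's address space.
class LocalTRTHandler : public TRTHandler {
public:
    void extract(Region roi, EventId, Algorithm) override {
        tracks_.clear();
        tracks_.push_back(TRTTrack{10.0 * roi.eta});  // dummy feature
        ready_ = true;
    }
    bool poll() override { return ready_; }
    std::list<TRTTrack> retrieve() override {
        assert(ready_);
        ready_ = false;
        return tracks_;
    }
private:
    std::list<TRTTrack> tracks_;
    bool ready_ = false;
};

// Marshalled form of an extract() call (hypothetical wire format).
struct ExtractRequest { double eta; double phi; EventId id; int alg; };

// Proxy: same interface, but the call becomes a message. The queue stands in
// for the network; poll() simulates the message reaching the remote object.
class TRTHandlerProxy : public TRTHandler {
public:
    explicit TRTHandlerProxy(TRTHandler& remote) : remote_(remote) {}
    void extract(Region roi, EventId id, Algorithm alg) override {
        wire_.push(ExtractRequest{roi.eta, roi.phi, id, static_cast<int>(alg)});
    }
    bool poll() override {
        if (!wire_.empty()) {
            ExtractRequest r = wire_.front();
            wire_.pop();
            remote_.extract(Region{r.eta, r.phi}, r.id, static_cast<Algorithm>(r.alg));
        }
        return remote_.poll();
    }
    std::list<TRTTrack> retrieve() override { return remote_.retrieve(); }
private:
    TRTHandler& remote_;
    std::queue<ExtractRequest> wire_;
};

// Steering sees only the abstract interface, so it is unaware of distribution.
std::size_t runSteering(TRTHandler& handler) {
    handler.extract(Region{0.5, 1.2}, 42UL, TrackSearch);
    while (!handler.poll()) {}
    return handler.retrieve().size();
}
```

`runSteering` gives the same result whether it is handed a `LocalTRTHandler` or a `TRTHandlerProxy`, which is the architecture independence the slide describes.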
Application Framework • Application (Supervisor, RoB, Steering) [Architecture Independent & Data Location Independent] • Object Management [Hide Event Data Location] • Distribution - proxies • Location - broker • Transport - messages • Control Services • Network Abstraction [Hide Network Technologies]: Ethernet, SCI, ATM, ... • Operating System Abstraction [Hide OS Specifics]: Threads, Pipes, Synchronisation, Shared memory, Timers, Sockets, etc. (WNT, Linux, ...) [Diagram labels: Supervisor, Steering, FeX, RC, RoB, Error Logger, Config, Information, Histogram]
Objects, Threads, Processes, Processors • Objects • Since all functional components are objects they can easily be instantiated, exist in multiple copies or as multiple variants • Simplest architecture is the classical single threaded single process which is used for tests, development of algorithms and physics performance studies • Threads • multiple threading allows efficient parallelism (light weight process) and easy communication (shared address space) • parallelism allows overlap between program execution and I/O as well as exploitation of SMP systems • Processes • Use of multiple processes is only kept as a convenient substitute for multi-processors for testing as multi-threading is preferred • Processors • CORBA-like techniques with proxy-objects hide distribution of algorithms and data to preserve independence of architectures • Communication (with associated message passing, buffer management and queuing) is hidden in proxy-objects
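The multi-threaded case can be sketched with a shared queue of LVL1 decisions drained by worker threads. This is a sketch in modern C++ (`std::thread`), which postdates the 1998 framework; `DecisionQueue`, `LVL1Decision` and `runFarmNode` are hypothetical names, and the per-event work is reduced to a counter.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

struct LVL1Decision { unsigned long eventId; };  // placeholder payload

// Thread-safe queue shared by all worker threads of one node.
class DecisionQueue {
public:
    void push(const LVL1Decision& d) {
        { std::lock_guard<std::mutex> lock(mutex_); queue_.push(d); }
        ready_.notify_one();
    }
    void close() {  // no more decisions will arrive
        { std::lock_guard<std::mutex> lock(mutex_); closed_ = true; }
        ready_.notify_all();
    }
    // Blocks until a decision is available; returns false once closed and drained.
    bool pop(LVL1Decision& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !queue_.empty() || closed_; });
        if (queue_.empty()) return false;
        out = queue_.front();
        queue_.pop();
        return true;
    }
private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<LVL1Decision> queue_;
    bool closed_ = false;
};

// Runs nWorkers threads over nEvents decisions and returns how many were processed.
std::size_t runFarmNode(std::size_t nWorkers, std::size_t nEvents) {
    assert(nWorkers > 0);
    DecisionQueue queue;
    std::mutex resultMutex;
    std::size_t processed = 0;
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < nWorkers; ++i) {
        workers.emplace_back([&queue, &resultMutex, &processed] {
            LVL1Decision d;
            while (queue.pop(d)) {
                // Steering, FeX and data collection for event d would run here.
                std::lock_guard<std::mutex> lock(resultMutex);
                ++processed;
            }
        });
    }
    for (std::size_t e = 0; e < nEvents; ++e) queue.push(LVL1Decision{e});
    queue.close();
    for (std::size_t i = 0; i < workers.size(); ++i) workers[i].join();
    return processed;
}
```

While one worker blocks on I/O, the others keep processing, which is the overlap between execution and I/O that the slide mentions. Some toolchains need `-pthread` to build this.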
Status • Design: March - Oct 1998 • Supervisor & RoB emulators, Application Framework, infrastructure • Steering and Feature Extraction algorithms • Prototyping: Sept - Oct 1998 • Single Node: Steering (“Stephandler”), FeX algorithms (TRT, Calorimeter), data files • Multi Node: Application Framework, Supervisor & RoB emulator, Messaging over TCP/UDP, Configuration, Error Reporting • CVS Repository • Full implementation: Nov - Dec 1998 • Integration of communications technologies (ATM, Ethernet, SCI) • Full set of representative Algorithms (SCT in preparation …) • Complete Run Control, Configuration, Error Reporting, Monitoring • Complete, improve and tune Application Framework and Emulators • Distribute turn-key system for WNT and Linux on PCs [Timeline figure, labels: 98, 99, TP < 2000, Design & Implementation, Testbed Operations]
Supervisor • Tasks of the Supervisor • Provides interface to Level 1 by collecting data needed to steer Level 2 decision • Receives back Level 2 decision from steering object • Notifies RoB object to release buffers • Level 1 information is summarized in an object, LVL1result • object contains data and access methods for steering object • RoI objects accessed as STL containers • 3 sources of Level 1 data • Internally generated distributions based on parameter file • Read from Event store to synchronize with RoB emulator, Steering • Read from RoI Builder hardware via S-link interfaces (not yet implemented) • Status • Simple Supervisor using Event Store source has been integrated with Application Framework; used to exchange messages with steering object
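A possible shape for the LVL1 result object, given as a sketch only. The slide states that it carries data plus access methods and exposes RoIs as STL containers; the `RoI` fields, the method names and the `countRoIs` helper are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical RoI record: detector identifier and position.
struct RoI { int detector; double eta; double phi; };

// Summary of the Level 1 information for one event.
class LVL1Result {
public:
    typedef std::vector<RoI>::const_iterator const_iterator;

    explicit LVL1Result(unsigned long eventId) : eventId_(eventId) {}
    void addRoI(const RoI& roi) { rois_.push_back(roi); }

    unsigned long eventId() const { return eventId_; }
    // STL-style access, so Steering can use standard algorithms directly.
    const_iterator begin() const { return rois_.begin(); }
    const_iterator end() const { return rois_.end(); }
    std::size_t size() const { return rois_.size(); }
private:
    unsigned long eventId_;
    std::vector<RoI> rois_;
};

// Example consumer: number of RoIs belonging to one detector.
std::size_t countRoIs(const LVL1Result& result, int detector) {
    assert(detector >= 0);
    return static_cast<std::size_t>(std::count_if(
        result.begin(), result.end(),
        [detector](const RoI& r) { return r.detector == detector; }));
}
```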
RoB Emulator • Follows the standard conventions of the Application Framework • It is a single communications “node” • It responds to • data requests originating from the RoB Data Collector • delete event <event list> originating from the Supervisor • Event Filter/DAQ allows three implementations: • mapping to the entire detector (a single RoB provides all data) • mapping to a subdetector (one RoB per subdetector) • mapping to part of a subdetector (each RoB holds a slice of the RoI data) • It may access the EventStore to obtain data from a file and preload events in memory
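The two requests the emulator answers can be sketched as below. The class layout, the method names (`preload`, `requestData`, `deleteEvents`, `buffered`) and the byte-vector fragment type are assumptions; message transport is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

typedef unsigned long EventId;
typedef std::vector<unsigned char> Fragment;  // hypothetical raw RoB fragment

class RoBEmulator {
public:
    // Preload an event fragment in memory, e.g. after reading it from a file.
    void preload(EventId id, const Fragment& data) {
        assert(!data.empty());
        store_[id] = data;
    }
    // Data request from the Data Collector: copies the fragment if it is buffered.
    bool requestData(EventId id, Fragment& out) const {
        std::map<EventId, Fragment>::const_iterator it = store_.find(id);
        if (it == store_.end()) return false;
        out = it->second;
        return true;
    }
    // "delete event <event list>" from the Supervisor: release the buffers.
    void deleteEvents(const std::vector<EventId>& ids) {
        for (std::size_t i = 0; i < ids.size(); ++i) store_.erase(ids[i]);
    }
    std::size_t buffered() const { return store_.size(); }
private:
    std::map<EventId, Fragment> store_;
};
```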
Error Reporting & Logging • Uses the high-level design employed by DAQ-1's MRS • Independently implemented • Senders “inject” messages into ERL using C++ stream I/O to send ASCII strings • Receivers “subscribe” to get a subset of the messages (selection criteria) • Consists of: C++ API, Message Server, Command Module, Message Database • Status: all implemented and tested except Message Database [Diagram: Sender, Receiver and Commander connected to the ERL Server]
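The inject/subscribe pattern might look roughly like this. It is a single-process sketch, not the ERL API: `ErlServer`, `ErlStream`, `ErlMessage` and `Severity` are hypothetical names, and selection criteria are modelled as a predicate on the message.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

enum class Severity { Information, Warning, Error };
struct ErlMessage { Severity severity; std::string text; };

// Stand-in for the message server: forwards each injected message to every
// receiver whose selection criteria match.
class ErlServer {
public:
    typedef std::function<bool(const ErlMessage&)> Criteria;
    typedef std::function<void(const ErlMessage&)> Receiver;

    void subscribe(Criteria select, Receiver deliver) {
        assert(select && deliver);
        subscriptions_.push_back(Subscription{select, deliver});
    }
    void inject(const ErlMessage& message) const {
        for (std::size_t i = 0; i < subscriptions_.size(); ++i)
            if (subscriptions_[i].select(message)) subscriptions_[i].deliver(message);
    }
private:
    struct Subscription { Criteria select; Receiver deliver; };
    std::vector<Subscription> subscriptions_;
};

// Sender side: builds an ASCII string with stream I/O and injects it
// when the temporary stream object is destroyed.
class ErlStream {
public:
    ErlStream(ErlServer& server, Severity severity) : server_(server), severity_(severity) {}
    ~ErlStream() { server_.inject(ErlMessage{severity_, text_.str()}); }
    template <typename T>
    ErlStream& operator<<(const T& value) { text_ << value; return *this; }
private:
    ErlServer& server_;
    Severity severity_;
    std::ostringstream text_;
};
```

A sender would then write, for example, `ErlStream(server, Severity::Error) << "RoB " << 7 << " timeout";` and only receivers subscribed to errors would see the message.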
Process Manager • Service Layer for Run Control to handle startup and shutdown of LVL2 tasks • Modeled after BE Process manager • Run Control sends requests to, receives replies from “Agents” • One Agent on each LVL2 node, started at boot time • Agents create/destroy LVL2 processes and maintain a process database • Status: not yet implemented [Diagram labels: Process, Application, Client API, Agent, DataBase; arrows: requests, replies, info, status]
Run Control • Tasks: Startup of LVL2 processes; Control and Monitoring; Intervention in case of malfunctioning; Controlled shutdown; User Interface for interaction • Basic Element: Controller (concept borrowed from BE) • receives commands; reads Configuration DB; performs actions; reacts to events; reports status • Hierarchy of Controllers (tree structure but two levels expected to be sufficient) • State machines • each Controller represents processes under its control by a standard Finite State Machine • Status: not yet implemented [Diagram: Controller receives commands and events, reads the Configuration Database, performs actions on the Component under Control and reports status]
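A two-level Controller hierarchy with a finite state machine can be sketched as follows. The slide marks Run Control as not yet implemented and does not list states or commands, so the `State` and `Command` values and the transition table are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical states and commands; the slides do not enumerate them.
enum class State { Initial, Configured, Running };
enum class Command { Configure, Start, Stop, Unconfigure };

// Transition table: returns true and sets next if cmd is legal in state s.
inline bool transition(State s, Command cmd, State& next) {
    if (s == State::Initial && cmd == Command::Configure) { next = State::Configured; return true; }
    if (s == State::Configured && cmd == Command::Start) { next = State::Running; return true; }
    if (s == State::Running && cmd == Command::Stop) { next = State::Configured; return true; }
    if (s == State::Configured && cmd == Command::Unconfigure) { next = State::Initial; return true; }
    return false;
}

class Controller {
public:
    void addChild(Controller* child) {
        assert(child != nullptr);
        children_.push_back(child);
    }
    State state() const { return state_; }

    // Applies a command to this controller and to everything under its control.
    // The parent changes state only after all children accepted the command.
    bool handle(Command cmd) {
        State next = state_;
        if (!transition(state_, cmd, next)) return false;   // illegal here: ignore
        for (std::size_t i = 0; i < children_.size(); ++i)
            if (!children_[i]->handle(cmd)) return false;   // report the failure upwards
        state_ = next;
        return true;
    }
private:
    State state_ = State::Initial;
    std::vector<Controller*> children_;
};
```

This sketch does no rollback when a child fails halfway through; a real controller would need an error state or recovery action for that case.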
Integration • Lab equipped with 6 dual PII PCs booting Linux or WNT; monitor switch + 2 displays; AFS and NFS common file base; CVS software repository • Fast Ethernet network with 8-port switch; 500 Mbytes/s SCI/PCI network with 4-port switch • Single CPU/Linux: Steering, TRT and Calorimeter FeX, ASCII data files • Multi-CPU/Linux: Farm (Supervisor, 2 Steering/FeX, 2 RoB), dummy algorithms, Error Reporting • Same setup for WNT • Multi-CPU/Linux farm with Algorithms, ASCII data files • SCI Message passing tested under WNT
Multi-CPU Farm [Diagram: a Supervisor (Eventlist, Steering Proxy) exchanges LVL1Decisions, LVL1Results and LVL2Results with two Farm Nodes. Each Farm Node holds a Supervisor Proxy, a Queue of LVL1Decisions, worker threads of Steering/FeX/DataCollectors and a RoB Proxy. The Farm Nodes exchange Data Requests and Data Responses with two RoBs, each with an EventList and a DataCollector Proxy, fed from an ASCII Data File.]
Summary • High Level Design of Functional Components (including Steering, TRT and Calorimeter algorithms) finished • Single CPU system prototype: Steering, 2 FeX algorithms (TRT, Calorimeter) with data read from ASCII files prototyped and integrated • Multi-CPU farm prototype: Application Framework (Farm node) integrated with Supervisor and RoB Emulators [Diagram: two RoBs, Supervisor, ERL and two Steering/FeX nodes connected over TCP/Fast Ethernet]
Pending items • handling of the LVL2 result • software performance evaluation • quality assurance of the software • system robustness, error recovery • monitoring • construction of testbeds • integration of technologies