This document outlines the implementation of a data processing system (DPS) for the DC2 project, detailing the integration of pipelines and slices using Python and C++. Key features include event handling, policy configuration, and MPI communications. The system supports multiple stages, clipboard management, and robust exception handling, ensuring synchronized processing across slices. Performance metrics and stability observations are discussed, along with open questions regarding API design, communication strategies, and potential enhancements for future iterations.
DPS for DC2 Summary
• Model Implementation (see the sketch after this slide)
  • Pipeline & Slice in Python and C++
  • Stage loop, Policy configuration, and Event handling in Python
  • MPI environment and communications in C++
  • Executable scripts (run by mpiexec): runPipeline.py, runSlice.py
  • Pipeline and Slice configured from the same Policy file
  • Clipboard, Queue, Stage in Python
    • One Clipboard per Pipeline/Slice used in DC2
  • New: Generic Stages
    • InputStage, OutputStage, EventStage, SymLinkStage
• Model elements not completed
  • Complete C++ implementation
  • Pipeline-Slice communication of data (DataProperty objects)
  • Full Queue capabilities
  • Clipboard metadata: a less ad hoc mechanism (schema?)
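A minimal Python sketch of the Stage/Clipboard/Queue model listed above. The class and method names (Stage, Clipboard, Queue, preprocess/process/postprocess, addDataset/getNextDataset) follow the slides; the constructors, the Policy handling, and the assumption that preprocess()/postprocess() run serially on the Pipeline while process() runs in parallel on the Slices are simplifications for illustration, not the actual dps API.

```python
# Illustrative stand-ins only -- names follow the slides, bodies are simplified.

class Clipboard:
    """Per-Pipeline/Slice container that carries data between Stages."""
    def __init__(self):
        self._items = {}

    def put(self, key, value):
        self._items[key] = value

    def get(self, key):
        return self._items[key]

    def close(self):
        self._items.clear()          # cleanup step alluded to under memory management


class Queue:
    """Hands a Clipboard from one Stage to the next."""
    def __init__(self):
        self._datasets = []

    def addDataset(self, clipboard):
        self._datasets.append(clipboard)

    def getNextDataset(self):
        return self._datasets.pop(0)


class Stage:
    """Base class for pipeline stages, configured from a section of the shared Policy file."""
    def __init__(self, stageId, policy):
        self.stageId = stageId
        self.policy = policy          # e.g. a dict parsed from the Policy file
        self.inputQueue = Queue()
        self.outputQueue = Queue()

    def preprocess(self):
        pass                          # assumed: serial work before the parallel step

    def process(self):
        pass                          # assumed: parallel work done by each Slice

    def postprocess(self):
        pass                          # assumed: serial work after the parallel step


class SymLinkStage(Stage):
    """Shape of a generic Stage: take the Clipboard, act on it, pass it on."""
    def process(self):
        clipboard = self.inputQueue.getNextDataset()
        # ... the real generic stages would create links / read input / post events ...
        self.outputQueue.addDataset(clipboard)
```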
DPS for DC2 Summary (cont.)
• Key Features
  • Events handled prior to Stage execution
    • Policy designates the stages that require a trigger event
    • Pipeline receives events from external sources and relays them to the Slices
  • MPI communications are collective (see the sketch after this slide)
    • All Slices need to be present, running through the Stage loop
    • Slices process each Stage in sync: MPI_Bcast, MPI_Barrier
  • Exception handling in important places
    • Exceptions from stage preprocess(), process(), postprocess() are caught
    • If one Slice catches an exception, the others are undisturbed
  • Multiple visits supported
  • Shutdown event implemented
    • Clean shutdown of the MPI environment/Slices at the end of the Stage loop
    • To do: a "no more data" event on the same topic as the trigger events
  • Logging integrated into Pipeline/Slice
  • Memory management (Clipboard cleanup) stabilized
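A sketch of the synchronized stage loop described above, using mpi4py as a stand-in for the C++ MPI layer (the DC2 implementation does this in C++). The collective pattern, broadcasting the trigger/shutdown event to all Slices and barriering after each Stage, and the per-stage exception handling follow the slides; the helper names (run_slice, next_event), the shutdown payload, and the Python MPI binding are assumptions for illustration.

```python
from mpi4py import MPI               # stand-in for the C++ MPI environment
import logging

SHUTDOWN = "shutdown"                # hypothetical topic name for the shutdown event

def run_slice(stages, next_event, comm=MPI.COMM_WORLD):
    """Stage loop run by every rank; next_event() models the Pipeline's
    external event receiver and is only called on rank 0."""
    rank = comm.Get_rank()
    while True:
        # The Pipeline (rank 0) receives the event and broadcasts it, so all
        # Slices enter the stage loop together (MPI_Bcast is collective).
        event = comm.bcast(next_event() if rank == 0 else None, root=0)
        if event == SHUTDOWN:
            break                    # clean shutdown of MPI/Slices at the end of the loop
        for stage in stages:
            try:
                # On rank 0 the real Pipeline would also call preprocess()/
                # postprocess() around the parallel step; omitted here for brevity.
                stage.process()      # exceptions are caught per Stage, so a
            except Exception:        # failing Slice leaves the others undisturbed
                logging.exception("stage %s failed on rank %d", stage.stageId, rank)
            comm.Barrier()           # Slices finish each Stage in sync (MPI_Barrier)
```

Because Bcast and Barrier are collective, every Slice must be present and make the same calls in the same order; this is also why a Slice that disappears cannot simply be restarted mid-loop, one of the open questions on the next slide.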
DPS: DC2 and Beyond
• Results
  • Three parallel Pipelines executing Application Stages
  • Reasonable stability observed (~36 Slices across 6 nodes)
  • Performance: e.g., utilization of 8 cores?
• Open Questions
  • Stage API: preprocess(), process(), postprocess()
    • Has this model been useful (validated)?
  • Direct MPI communications (a speculative sketch follows this slide)
    • Finer-grained communication between Pipeline and Slices?
    • Avoid events and collective operations?
    • Restart a Slice that disappears?
  • Slice/CCD mapping
    • Should these mapping strategies be an integral part of dps?
  • High-level script to run pipelines
    • run.sh, startPipeline.py?
    • Should the dc2pipe/ scripts be incorporated into dps?
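The "Direct MPI communications" question above asks whether point-to-point messages could replace events and collective operations for Pipeline-to-Slice exchange. A speculative sketch of one shape this could take, again with mpi4py and not part of the DC2 implementation: the Pipeline (rank 0) sends each Slice its own work item (e.g. its CCD assignment) and gathers replies individually, so no rank blocks on a collective call. The function names and tags are invented for illustration.

```python
from mpi4py import MPI

def pipeline_scatter_work(work_items, comm=MPI.COMM_WORLD):
    """Rank 0: send one item to each Slice, then collect the replies."""
    for slice_rank, item in enumerate(work_items, start=1):
        comm.send(item, dest=slice_rank, tag=0)       # per-Slice payload
    return [comm.recv(source=r, tag=1) for r in range(1, len(work_items) + 1)]

def slice_handle_work(process_fn, comm=MPI.COMM_WORLD):
    """Slice ranks: block on their own message instead of a collective."""
    item = comm.recv(source=0, tag=0)
    comm.send(process_fn(item), dest=0, tag=1)
```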
[Diagram: Event System — pipeline nodes lsst1, lsst2, lsst3, …, lsstN connected via ActiveMQ, Mule, and MySQL]