Karabo: The European XFEL software framework Design Concepts

Karabo: The European XFEL software framework, Design Concepts. Burkhard Heisen for WP76, November 2013. The star (*) marks concepts that are not yet implemented in the current release.


Presentation Transcript


  1. Karabo: The European XFEL software framework. Design Concepts. Burkhard Heisen for WP76, November 2013. The star (*) marks concepts that are not yet implemented in the current release.

  2. Functional requirements. A typical use case: • Control: drive hardware and complex experiments; monitor variables & trigger alarms • DAQ: data readout; online processing; quality monitoring (vetoing) • SC: processing pipelines; distributed and GPU computing; specific algorithms (e.g. reconstruction) • DM: storage of experiment & control data; data access, authentication, authorization, etc. • Tight integration of applications (Control, DAQ, SC, DM) • Diagram labels: Accelerator, Undulator, Beam Transport, Sample Injector; “allow some control & show hardware status”; “show online data whilst running”; “setup computation & show scientific results” Burkhard Heisen (WP76)

  3. Functionality: What are we dealing with? • Distributed end points and processes • Data containers (Hash, Schema, Image, …) • Data transport (data flow, network protocol) • Process control (automation, feedback) • States (finite state machines, sequencing, automation, …) • Data acquisition (front end hardware) • Time synchronization/tagging (time stamps, cycle ids, etc.) • Real-time needs (where necessary) • Central services (archive, alarm, name resolution, …) • Security (who’s allowed to do what from where?) • Statistics (control system itself, operation, …) • Logging (active, passive, central, local) • Processing workflows (parallelism, pipeline execution, provenance) • Clients / User interfaces (API, languages, macro writing, CLI, GUI) • Software management (coding, building, packaging, deployment, versioning, …) Burkhard Heisen (WP76)

  4. Distributed end points and processes • Concept: Device Server Model • Similar to: TANGO, DOOCS, TINE* • Elements are controllable objects managed by a device server. • An instance of such an object is a device, with a hierarchical name. • Device classes can be loaded at runtime (plugins) • Actions pertaining to a device are given by its properties and commands • i.e. get, set, monitor some property or execute some command • Properties, commands, and (optionally) associated FSM logic are statically defined and further described (attributes) in the device class. Dynamic (runtime) extension of properties and commands is possible. • Devices can be written in either C++ or Python (later maybe also Java) and run on either Linux, MacOSX or Windows (later) Burkhard Heisen (WP76)

  5. DETAIL: Distributed end points: Configuration API

     Class: MotorDevice

         static void expectedParameters(Schema& s) {

             // Property; the chained calls set its attributes
             FLOAT_ELEMENT(s).key("velocity")
                 .description("Velocity of the motor")
                 .unitSymbol("m/s")
                 .assignmentOptional().defaultValue(0.3)
                 .maxInc(10)
                 .minInc(0.01)
                 .reconfigurable()
                 .allowedStates("Idle")
                 .commit();

             INT32_ELEMENT(s).key("currentPosition")
                 .description("Current position of the motor")
                 .readOnly()
                 .warnLow(10)
                 […]

             // Command
             SLOT_ELEMENT(s).key("move")
                 .description("Will move motor to target position")
                 .allowedStates("Idle")
                 […]
         }

         // Constructor with initial configuration
         MotorDevice(const Hash& config) { […] }

         // Called at each (re-)configuration request
         void onReconfigure(const Hash& config) { […] }

     • Any device uses a standardized API to describe itself. This information is used to automatically create GUI input masks or for auto-completion on the IPython console
     • We distinguish between properties and commands and their associated attributes; all of them can be expressed within the expectedParameters function
     • Device developers do not need to validate any parameters. This is done internally, taking the expectedParameters as a white-list
     • Properties and commands can be nested, such that hierarchical groupings are possible

     Burkhard Heisen (WP76)
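The white-list idea on this slide can be illustrated with a small Python sketch. It is not Karabo's implementation: the `Element` class, the `MOTOR_SCHEMA` dictionary and the `validate` function are hypothetical names, and only a few of the attributes from the slide (default value, minInc, maxInc, readOnly) are modelled.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Element:
    """One expected parameter with a few of the attributes named on the slide."""
    type_: type
    default: Any = None
    min_inc: Optional[float] = None
    max_inc: Optional[float] = None
    read_only: bool = False


# Hypothetical schema mirroring the MotorDevice example.
MOTOR_SCHEMA = {
    "velocity": Element(float, default=0.3, min_inc=0.01, max_inc=10),
    "currentPosition": Element(int, read_only=True),
}


def validate(schema, config):
    """Return a complete configuration, or raise ValueError if it is not allowed."""
    unknown = set(config) - set(schema)
    if unknown:
        # Anything not described in the schema is rejected (white-list).
        raise ValueError(f"unexpected keys: {sorted(unknown)}")
    result = {}
    for key, element in schema.items():
        if key in config:
            if element.read_only:
                raise ValueError(f"{key} is read-only")
            value = element.type_(config[key])
            if element.min_inc is not None and value < element.min_inc:
                raise ValueError(f"{key} is below minInc")
            if element.max_inc is not None and value > element.max_inc:
                raise ValueError(f"{key} is above maxInc")
            result[key] = value
        elif element.default is not None:
            # Optional assignment: fall back to the declared default.
            result[key] = element.default
    return result
```

With such a check in front of the constructor, the device code itself can assume that every value it receives is of the declared type and within range.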

  6. DETAIL: Distributed end points and processes: Creating a new device • Write a class (say: MyDevice) that derives from Device • Compile it into a shared library (say libMyDevice.so) • Select a running Device-Server or start a new one • Copy libMyDevice.so to the plugins folder of the Device-Server • The Device-Server will emit a signal to the broker that a new Device class is available; it ships the expected parameters as read from the static context of the MyDevice class • Diagram labels: libMyDevice.so, plugins, signalNewDeviceClassAvailable (.xsd), GUI-Srv, Master, Central DB, GUI Burkhard Heisen (WP76)

  7. DETAIL: Distributed end points and processes: Creating a new device (continued) • Given the mask of possible parameters, the user may fill in a valid configuration and emit an instantiate signal to the broker • The configuration will be validated by the Device factory and, if valid, an instance of MyDevice will be created • The constructor of the device class will be called and provided with the configuration • The run method will be called, which starts the state machine and finally blocks by activating the event loop • The device will asynchronously listen to allowed events (slots), guided by the internal state machine • Diagram labels: factory: create(“MyDevice”, xml), plugins, MyDevice1, signalInstantiate(“MyDevice”, xml), GUI-Srv, Master, Central DB, GUI Burkhard Heisen (WP76)

  8. Data containers (Hash, Image/Matrix/Vector) • Concept: Have some containers for which Karabo provides special support • Hash • String-key, any-value associative container • Keeps insertion order (iteration possible), hash performance for random lookup • Provide (string-key, any-value) attributes per hash-key • Fully recursive structure (i.e. Hashes of Hashes) • Serialization: XML, Binary, HDF5, DB • Usage: configuration, device-state cache, database interface, message protocol, etc. • Schema • Describes possible/allowed structures for the Hash. In analogy: Schema would be for Hash, what an XSD document is for an XML file • Associates meta-data (called attributes) to properties • Image/Matrix/Vector • Some default containers needed for scientific computing • Seamless switching between CPU and GPU representation • Optimized serialization (network transfer) Burkhard Heisen (WP76)
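To make the Hash properties listed above concrete (string keys, any value, insertion order, per-key attributes, recursion), here is a minimal Python sketch. It is an illustration only, not Karabo's Hash class; the method names and the "." path separator are assumptions, and attributes are only supported on keys of the Hash they are set on.

```python
class Hash:
    """Ordered string-key/any-value container with per-key attributes."""

    def __init__(self):
        self._values = {}  # dict keeps insertion order (Python 3.7+)
        self._attrs = {}   # key -> {attribute name -> attribute value}

    def set(self, path, value, sep="."):
        """Set a value; intermediate Hashes are created along the path."""
        head, _, rest = path.partition(sep)
        if rest:
            child = self._values.get(head)
            if not isinstance(child, Hash):
                child = self._values[head] = Hash()
            child.set(rest, value, sep)
        else:
            self._values[head] = value

    def get(self, path, sep="."):
        """Look up a value, descending into nested Hashes."""
        head, _, rest = path.partition(sep)
        value = self._values[head]
        return value.get(rest, sep) if rest else value

    def set_attribute(self, key, name, value):
        if key not in self._values:
            raise KeyError(key)
        self._attrs.setdefault(key, {})[name] = value

    def get_attribute(self, key, name):
        return self._attrs[key][name]

    def keys(self):
        return list(self._values)


h = Hash()
h.set("motor.velocity", 0.3)
h.set("motor.position", 10)
h.set("name", "m1")
h.get("motor").set_attribute("velocity", "unitSymbol", "m/s")
```

Because nested values are themselves Hash objects, the same structure can describe a configuration, a cached device state or a message body.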

  9. Data transport (data flow, network protocol) • Concept: Separation between broker based (less frequent, smaller data size) and point-to-point (frequent, large data size) communication • Communication is cross-network, cross-language and cross-platform • Broker based • Highly available full N x N communication between devices of any category (Control, SC, DAQ, DM) via broker • Patterns: signal/slots, request/response, simple call • Point-to-Point • Transient (run-time) establishment of direct (brokerless) connections between devices • TCP-based, high performance for huge data • Asynchronous IO, memory optimization if local Burkhard Heisen (WP76)

  10. DETAIL: Data transport: Communication: Event-Driven vs. Scheduled • Event-driven communication (“Push Model”) • A minimal set of information is passed • System is scalable (maintains performance) • Failure is harder to detect • Diagram: Device 1 emits; Devices 2, 3 and 4 are notified • Scheduled communication (“Poll Model”) • Direct feedback on request • Nodes may be spammed (DoS) • Growing systems lose performance • Typically, lots of extra traffic is generated • Diagram: Device 1 sends a request to each of Devices 2, 3 and 4 and receives a response Burkhard Heisen (WP76)

  11. DETAIL: Data transport: Broker based communication - API • Communication happens between ordinary (member or free-standing) functions • Functions on distributed instances are identified by a pair of strings, the instanceId and the functionName • The instanceId uniquely identifies an instance (e.g. a device) connected to a specific topic of the broker • The functionName uniquely identifies an ordinary function registered under a given instanceId • Functions of any signature (currently up to 4 arguments) can be registered to be remotely callable • Registration can be done at runtime without extra tools • Function calls can be done cross-network, cross-operating-system and cross-language (currently C++ and Python; Java will follow) • The language’s native data types are directly supported as arguments • A generic, fully recursive, key to any-value container (Hash) is provided as a data type for complex arguments Burkhard Heisen (WP76)

  12. DETAIL: Data transport: Broker based communication – Three Patterns

      Signals & Slots
      • SLOT ( function, [argTypes] )
      • SIGNAL ( funcName, [argTypes] )
      • connect ( signalInstanceId, signalFunc, slotInstanceId, slotFunc )
      • emit ( signalFunc, [args] )

      Device1 (emits):

          SIGNAL("foo", int, std::string);
          connect("Device1", "foo", "Device2", "onFoo");
          connect("", "foo", "Device3", "onGoo");
          connect("", "foo", "Device4", "onHoo");
          emit("foo", 42, "bar");

      Device2 (notified):

          SLOT(onFoo, int, std::string);
          void onFoo(const int& i, std::string& s) { }

      Device3 (notified):

          SLOT(onGoo, int, std::string);
          void onGoo(const int& i) { }

      Device4 (notified):

          SLOT(onHoo, int, std::string);
          void onHoo(const int& i, std::string& s) { }

      Burkhard Heisen (WP76)
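The signal/slot pattern can be mimicked in a single process with a few lines of Python. This is a sketch under simplifying assumptions: the `Broker` class and its method names are hypothetical, everything runs synchronously in one process, and no real message broker or network is involved.

```python
from collections import defaultdict


class Broker:
    """In-process stand-in for the message broker."""

    def __init__(self):
        self.slots = {}                       # (instanceId, funcName) -> callable
        self.connections = defaultdict(list)  # (instanceId, signal) -> [(instanceId, slot)]

    def register_slot(self, instance_id, func_name, func):
        self.slots[(instance_id, func_name)] = func

    def connect(self, signal_id, signal, slot_id, slot):
        self.connections[(signal_id, signal)].append((slot_id, slot))

    def emit(self, signal_id, signal, *args):
        # Every connected slot is notified with the same arguments.
        for target in self.connections[(signal_id, signal)]:
            self.slots[target](*args)


broker = Broker()
received = []
broker.register_slot("Device2", "onFoo", lambda i, s: received.append(("Device2", i, s)))
broker.register_slot("Device3", "onGoo", lambda i, s: received.append(("Device3", i, s)))
broker.connect("Device1", "foo", "Device2", "onFoo")
broker.connect("Device1", "foo", "Device3", "onGoo")
broker.emit("Device1", "foo", 42, "bar")
```

The emitter never names its receivers: it only emits "foo", and the connection table decides who is notified. That is what keeps the push model cheap as the number of devices grows.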

  13. DETAIL: Data transport: Broker based communication - Patterns

      Direct Call
      • call ( instanceId, funcName, [args] )

      Device2 (notified):

          SLOT(onFoo, std::string);
          void onFoo(const std::string& s) { }

      Device1 (calls):

          call("Device2", "onFoo", "bar");

      Request / Reply
      • request ( instanceId, funcName, [reqArgs] ).timeout( msec ).receive( [repArgs] )

      Device2 (receives the request, sends the reply):

          SLOT(onFoo, int);
          void onFoo(const int& i) { reply( i + i ); }

      Device1 (sends the request, receives the reply):

          int number;
          request("Device2", "onFoo", 21).timeout(100).receive(number);

      Burkhard Heisen (WP76)
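The request/reply pattern with a timeout can be sketched in Python using a thread pool as a stand-in for the remote device. The names `register_slot` and `request` are hypothetical, and the reply is simply the slot's return value rather than an explicit `reply(...)` call.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError

_slots = {}                                # (instanceId, funcName) -> callable
_pool = ThreadPoolExecutor(max_workers=4)  # plays the role of the remote side


def register_slot(instance_id, func_name, func):
    _slots[(instance_id, func_name)] = func


def request(instance_id, func_name, *args, timeout_ms):
    """Call a registered slot and wait for its reply.

    Raises concurrent.futures.TimeoutError if no reply arrives in time.
    """
    future = _pool.submit(_slots[(instance_id, func_name)], *args)
    return future.result(timeout=timeout_ms / 1000.0)


def on_foo(i):
    return i + i  # corresponds to reply(i + i) on the slide


register_slot("Device2", "onFoo", on_foo)
```

A caller would write `number = request("Device2", "onFoo", 21, timeout_ms=100)` and handle `TimeoutError` for the case where the remote side does not answer.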

  14. DETAIL: Data transport: Illustration • Diagram legend: Device Instance, Device-Server Application, Message Broker (Event Loop) • Diagram labels: APD, Camera, Disk Storage, HV, Pump, Load, Store, Simulate, Calibrate1, Calibrate2, Terminal(s), Logger, GUI Server, Device, Sub Control, GUI(s), RDB Burkhard Heisen (WP76)

  15. Process control (automation, feedback) • Concept: Single device processes vs. multiple device processes • Processes which involve a single device and e.g. some hardware • Implementation of a software FSM that mirrors the hardware FSM • Automation and feedback are implemented using software FSM events. Events may be internally triggered (auto) or exposed to the control system (interactive/manual) • Processes which involve coordination of multiple devices (non real-time) • The process is abstracted into a parent device which sub-instructs child devices (composition). • The control system protects child devices from direct user control. • The parent device’s FSM describes process automation/feedback. • The parent device is both a device and a device controller. Burkhard Heisen (WP76)

  16. DETAIL: Process control: A standardized hardware device • Concepts • The hardware is always safe, even without software • Coupling between h/w devices at a “lower” level than Karabo can exist (real time) • The authority (h/w or s/w) may be different and may even change during runtime • A generic state transition table design exists, which allows for flexible h/w control • Error (events: onException, reset) • Enter: on an exception which is not caught in the FSM s/w thread, or on the s/w device’s call to onError() (only used in composite devices) • Exit: clicking the reset button moves the s/w device to AllOk’s Initialization, where the h/w status is requested and the correct state (or Error) is entered depending on the reply • HardwareError (events: onHwError, reset / action) • Enter: the generic h/w error status bit is set (by PLC) • Exit: clicking the reset button calls resetHardwareAction(), which should take any actions needed to ‘reset’ the h/w; if not successful, HardwareError is re-entered (eventually; timeout?) • CommunicationError (events: onComError, reset*) • Enter: heartbeat from PLC not received by BeckhoffCom; BeckhoffCom dead; Broker dead • Exit: reset*, where the * means driven by internal recovery, with no user action required (or possible) • Further diagram labels: Ok, none, Readjusting, onOutOfSync [autonomous] Burkhard Heisen (WP76)

  17. States (finite state machines, sequencing, automation…) • Concept: Devices optionally run finite state machines (FSMs) inside • Devices can implement a custom FSM or inherit a common one • Events into the FSM can be triggered internally (automation, sequencing) or made device commands (remotely triggerable) • The FSM provides four hooks fitting into the event-driven API style of devices (onGuard, srcStateOnExit, onTransitionAction, tgtStateOnEntry) • Access to any (writable) property or command can be restricted according to the device’s current state. This is done using the attribute allowedStates. • As allowedStates is an attribute (and thus part of the static XSD), any UI system is able to proactively reflect the properties and commands that are currently settable in the given state. The command-line interface uses this information to provide state-aware auto-completion, whilst the GUI uses it to disable (grey out) widgets. Burkhard Heisen (WP76)

  18. DETAIL: Device – Finite state machine (FSM)

      Diagram (Start Stop State Machine): Initialization, via none, leads to OK; inside OK, Stopped and Started are linked by start and stop; errorFound leads from OK to Error, reset leads back.

          // Ok Machine
          FSM_TABLE_BEGIN(OkTransitionTable)
          //   SrcState  Event       TgtState  Action       Guard
          Row< Started,  StopEvent,  Stopped,  StopAction,  none >,
          Row< Stopped,  StartEvent, Started,  StartAction, none >
          FSM_TABLE_END
          FSM_STATE_MACHINE(Ok, OkTransitionTable, Stopped, Self)

          // Top Machine
          FSM_TABLE_BEGIN(TransitionTable)
          Row< Initialization, none,            Ok,    none,             none >,
          Row< Ok,             ErrorFoundEvent, Error, ErrorFoundAction, none >,
          Row< Error,          ResetEvent,      Ok,    ResetAction,      none >
          FSM_TABLE_END
          KARABO_FSM_STATE_MACHINE(StateMachine, TransitionTable, Initialization, Self)

      • Any device uses a standardized way to express its possible program flow
      • The state machine calls back device functions (guard, onStateExit, transitionAction, onStateEntry)
      • The GUI is state-machine aware and enables/disables buttons proactively

      Burkhard Heisen (WP76)

  19. DETAIL: States: Finite state machines – There is a UML standard • State Machine: the life cycle of a thing. It is made of states and transitions, and it processes incoming events. • State: a stage in the life cycle of a state machine. A state (like a submachine) can have entry and exit behaviors • Event: an incident provoking (or not) a reaction of the state machine • Transition: a specification of how a state machine reacts to an event. It specifies a source state, the event triggering the transition, the target state (which will become the newly active state if the transition is triggered), a guard and actions • Action: an operation executed during the triggering of the transition • Guard: a boolean operation able to prevent the triggering of a transition which would otherwise fire • Transition Table: a representation of a state machine. A state machine diagram is a graphical but incomplete representation of the same model. A transition table, on the other hand, is a complete representation Burkhard Heisen (WP76)

  20. DETAIL: States: FSM implementation example in C++ (header only)

          // Events: each one becomes a regular callable function that triggers the event
          FSM_EVENT2(ErrorFoundEvent, onException, string, string)
          FSM_EVENT0(EndErrorEvent, endErrorEvent)
          FSM_EVENT0(StartEvent, slotMoveStartEvent)
          FSM_EVENT0(StopEvent, slotStopEvent)

          // States: transition table elements; the named functions are regular hooks that will be called back
          FSM_STATE_EE(ErrorState, errorStateOnEntry, errorStateOnExit)
          FSM_STATE_E(InitializationState, initializationStateOnEntry)
          FSM_STATE_EE(StartedState, startedStateOnEntry, startedStateOnExit)
          FSM_STATE_EE(StoppedState, stoppedStateOnEntry, stoppedStateOnExit)

          // Transition actions: transition table elements
          FSM_ACTION0(StartAction, startAction)
          FSM_ACTION0(StopAction, stopAction)

          // AllOkState machine
          FSM_TABLE_BEGIN(AllOkStateTransitionTable)
          //   SrcState       Event       TgtState      Action       Guard
          Row< StartedState,  StopEvent,  StoppedState, StopAction,  none >,
          Row< StoppedState,  StartEvent, StartedState, StartAction, none >
          FSM_TABLE_END
          FSM_STATE_MACHINE(AllOkState, AllOkStateTransitionTable, StoppedState, Self)

          // StartStop machine
          FSM_TABLE_BEGIN(StartStopTransitionTable)
          Row< InitializationState, none,            AllOkState, none,             none >,
          Row< AllOkState,          ErrorFoundEvent, ErrorState, ErrorFoundAction, none >,
          Row< ErrorState,          EndErrorEvent,   AllOkState, EndErrorAction,   none >
          FSM_TABLE_END
          // The table name passed here must match the one given to FSM_TABLE_BEGIN above
          KARABO_FSM_STATE_MACHINE(StartStopMachine, StartStopTransitionTable, InitializationState, Self)

          FSM_CREATE_MACHINE(StartStopMachine, m_fsm);
          FSM_SET_CONTEXT_TOP(this, m_fsm)
          FSM_SET_CONTEXT_SUB(this, m_fsm, AllOkState)
          FSM_START_MACHINE(m_fsm)

      Burkhard Heisen (WP76)

  21. DETAIL: States: FSM implementation example in Python

          # Events
          FSM_EVENT2(self, 'ErrorFoundEvent', 'onException')
          FSM_EVENT0(self, 'EndErrorEvent', 'slotEndError')
          FSM_EVENT0(self, 'StartEvent', 'slotStart')
          FSM_EVENT0(self, 'StopEvent', 'slotStop')

          # States
          FSM_STATE_EE('ErrorState', self.errorStateOnEntry, self.errorStateOnExit)
          FSM_STATE_E('InitializationState', self.initializationStateOnEntry)
          FSM_STATE_EE('StartedState', self.startedStateOnEntry, self.startedStateOnExit)
          FSM_STATE_EE('StoppedState', self.stoppedStateOnEntry, self.stoppedStateOnExit)

          # Transition actions
          FSM_ACTION0('StartAction', self.startAction)
          FSM_ACTION0('StopAction', self.stopAction)

          # AllOkState machine (rows and initial state aligned with the C++ example)
          allOkStt = [
              # SrcState       Event         TgtState        Action         Guard
              ('StartedState', 'StopEvent',  'StoppedState', 'StopAction',  'none'),
              ('StoppedState', 'StartEvent', 'StartedState', 'StartAction', 'none')
          ]
          FSM_STATE_MACHINE('AllOkState', allOkStt, 'StoppedState')

          # Top machine
          topStt = [
              ('InitializationState', 'none',            'AllOkState', 'none', 'none'),
              ('AllOkState',          'ErrorFoundEvent', 'ErrorState', 'none', 'none'),
              ('ErrorState',          'EndErrorEvent',   'AllOkState', 'none', 'none')
          ]
          FSM_STATE_MACHINE('StartStopMachine', topStt, 'InitializationState')

          self.fsm = FSM_CREATE_MACHINE('StartStopMachine')
          self.startStateMachine()

      Burkhard Heisen (WP76)
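How a transition table drives a state machine can be shown with a small, framework-free Python sketch. It is not the Karabo FSM: `TableFsm` is a hypothetical class, the machine is flat (no sub-machines, no entry/exit hooks), and the rows mirror the AllOkState table of the C++ example.

```python
class TableFsm:
    """Flat finite state machine driven by (src, event, tgt, action, guard) rows."""

    def __init__(self, table, initial, actions=None, guards=None):
        self.state = initial
        self.actions = actions or {}
        self.guards = guards or {}
        self.table = {(src, event): (tgt, action, guard)
                      for src, event, tgt, action, guard in table}

    def process(self, event):
        """Fire the matching transition; return False if none applies."""
        row = self.table.get((self.state, event))
        if row is None:
            return False  # event not allowed in the current state
        target, action, guard = row
        if guard != "none" and not self.guards[guard]():
            return False  # guard vetoes the transition
        if action != "none":
            self.actions[action]()
        self.state = target
        return True


log = []
ALL_OK_TABLE = [
    # SrcState        Event         TgtState        Action         Guard
    ("StartedState", "StopEvent",  "StoppedState", "StopAction",  "none"),
    ("StoppedState", "StartEvent", "StartedState", "StartAction", "none"),
]
fsm = TableFsm(
    ALL_OK_TABLE,
    "StoppedState",
    actions={"StartAction": lambda: log.append("start"),
             "StopAction": lambda: log.append("stop")},
)
```

Since the table is plain data, the set of events allowed in the current state can be read directly from it, which is the kind of information a state-aware GUI or CLI needs.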

  22. Data acquisition • Concept: FEM -> PC-Layer -> Online-Cache • PCL machines run highly tuned devices which write data to file (online cache) as fast as possible. • The online cache is (one possible) data source for Karabo’s workflow system. Burkhard Heisen (WP76)

  23. Real time needs (where necessary) • Concept: Karabo itself does not provide real time processes • Real time processes (if needed) must be defined and executed in layers below Karabo. Karabo devices will only start/stop/monitor real time processes. • Examples: Beckhoff motor-coupling, Beckhoff feedback systems, etc. Burkhard Heisen (WP76)

  24. Time synchronization (time stamps, cycle ids, etc.) • Concept: Any changed property will carry timing information as attribute(s) • Time information is assigned per property • Karabo’s timestamp consists of the following information: • Seconds since Unix epoch, uint64 • Fractional seconds (up to attosecond resolution), uint64 • Train ID, uint64 • Time information is assigned as early as possible (ideally already on the hardware) but at the latest in the software device • On event-driven update, the device ships the property key, the property value and the associated time information as property attribute(s) • Real-time synchronization is outside the scope of Karabo • Correlation between control system (monitor) data and instrument data will be done using the archived central DB information (or information previously exported into HDF5 files) Burkhard Heisen (WP76)

  25. DETAIL: Time synchronization: Distributed Train ID clock • Concept: A dedicated machine with a time receiver board (h/w) distributes clocks on the Karabo level • Scenario 1: No time information from h/w • Example: commercial cameras • The timestamp is associated with the event-driven data in the Karabo device • If the clock signal is too late, the next trainId is calculated (extrapolated) from the previous one and the interval between trainIds • The interval is configurable on the Clock device and must be stable within a run. An error is flagged if a clock tick is lost. • Scenario 2: Time information is already provided by h/w • The timestamp can be taken from the h/w or the device (configurable). The rest is the same as in scenario 1. • Diagram: the Time receiver board feeds the Clock Device, which signals trainId, epochTime and interval; the receiving device creates the timestamp and associates it with the trainId Burkhard Heisen (WP76)
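The extrapolation rule in scenario 1 amounts to integer arithmetic on the last clock tick. The Python sketch below is an illustration under stated assumptions: the `Timestamp` class and the `stamp` function are hypothetical names, the three fields follow the timestamp layout given on the previous slide, and the interval is assumed to be stable since the last tick.

```python
from dataclasses import dataclass

ATTO = 10**18  # attoseconds per second


@dataclass(frozen=True)
class Timestamp:
    seconds: int      # seconds since Unix epoch
    attoseconds: int  # fractional seconds, 0 <= attoseconds < 10**18
    train_id: int

    def total_attoseconds(self):
        return self.seconds * ATTO + self.attoseconds


def stamp(last_tick, interval_atto, seconds, attoseconds):
    """Timestamp an event, extrapolating the train ID from the last clock tick."""
    if interval_atto <= 0:
        raise ValueError("interval must be positive")
    elapsed = seconds * ATTO + attoseconds - last_tick.total_attoseconds()
    if elapsed < 0:
        raise ValueError("event precedes the last clock tick")
    # Whole intervals elapsed since the last tick give the train ID offset.
    return Timestamp(seconds, attoseconds,
                     last_tick.train_id + elapsed // interval_atto)


last = Timestamp(seconds=1000, attoseconds=0, train_id=500)
interval = ATTO // 10  # 100 ms between trains, chosen only for the example
```

Working in integer attoseconds avoids the rounding surprises that floating-point division of times such as 0.3 s by 0.1 s would introduce.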

  26. Central services (archive, alarm, name resolution, …) • Concept: Karabo’s central aspects will be reflected within a database • All properties of all devices will be archived into the DB in an event-driven way by default • Any property carries an “archive policy” attribute to reduce or switch off archiving • Karabo is user centric (login at client start-up); the DB will provide all information needed to perform later access control on devices • Any user-specific GUI settings will be saved to the DB • The DB gives access to all (user-centric) pre-configuration of future device instances • Name resolution is handled by the message broker (filtering on broker, not client) • Besides the broker, other central services are technically not needed. • GUI clients do not talk directly to the broker but go through a GUI server • Distributed alarm conditions are planned to be handled by Python devices that can check any (distributed) condition and can be instantiated (armed) when needed Burkhard Heisen (WP76)

  27. Central services - Name resolution/access • Concept: The only central service needed is the broker, others are optional • Start-up issues • A fixed ID can (optionally) be provided prior to start-up (via command line or file) • If no instance ID is provided, the ID is auto-generated locally • Servers: hostname_Server_pid • Devices: hostname-pid_classId_counter • Any instance ID is validated (by request-response trial) prior to start-up • Running system issues • The engine for all inter-device communication is the DeviceClient class • The DeviceClient abstracts the SignalSlotable layer into a set of functions • instantiate, kill, set, execute, get, monitor etc. • The DeviceClient can act without a central entity and be started anytime • The DeviceClient can act as master itself and boost the performance of other DeviceClients • Master DeviceClients can come and go, everything is handled transparently Burkhard Heisen (WP76)

  28. Central services – Data archiving • Concept: A central data logger device collects event-driven data and persists it • The data logger is a device which listens to all other devices • The event-driven information is cached in the form of a Hash object for some time and then persisted to either file or DB or both • Information is stored on a per-parameter basis • Next to the parameter values, the currently valid schema is saved as well • Diagram labels: Device Instances, Device-Server Instances, Message Broker, Logger, GUI-Srv, GUI-Clients, Master, Central DB Burkhard Heisen (WP76)

  29. DETAIL: Access levels • We will initially have five access levels (enum) with intrinsic ordering • ADMIN = 4 • EXPERT = 3 • OPERATOR = 2 • USER = 1 • OBSERVER = 0 • Any device can restrict access globally or on a per-parameter basis • Global restriction is enforced through the “visibility” property (base class) • Only a requestor of the same or a higher access level can see/use the device • The “visibility” property is part of the topology info (seen immediately by clients) • Parameter restriction is enforced through the “requiredAccessLevel” schema-attribute • Parameter restriction is typically set programmatically but may be re-configured at initialization time (or even runtime?) • The “visibility” property might be re-configured if the requestor’s access level is higher than the associated “requiredAccessLevel” (should typically be ADMIN) • The default access level for settable properties and commands is USER • The default access level for read-only properties is OBSERVER • The default value for the visibility is OBSERVER Burkhard Heisen (WP76)

  30. DETAIL: Access levels (roles) • A role is defined in the DB and consists of a default access level and a device-instance specific access list (overwriting the default level), which can be empty. • SPB_Operator • defaultAccessLevel => USER • accessList • SPB_* => OPERATOR • Undulator_GapMover_0 => OPERATOR • Global_Observer • defaultAccessLevel => OBSERVER • Global_Expert • defaultAccessLevel => EXPERT • After authentication, the DB computes the user-specific access levels considering the current time, current location and associated role. It then ships a default access level and an access level list back to the user. • If the authentication service (or DB) is not available, Karabo falls back to a compiled default access level (in-house: OBSERVER, shipped versions: ADMIN) • For an ADMIN user it might be possible to temporarily (per session) change the access list of another user. Burkhard Heisen (WP76)
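Resolving a role to an access level for one device can be sketched as a lookup with a fallback. This Python sketch is not Karabo code: the `ROLES` dictionary layout and the `access_level` function are hypothetical, and it assumes that `SPB_*` is a shell-style wildcard and that the first matching access-list entry wins. The time and location context mentioned on the slide is not modelled.

```python
from enum import IntEnum
from fnmatch import fnmatchcase


class AccessLevel(IntEnum):
    OBSERVER = 0
    USER = 1
    OPERATOR = 2
    EXPERT = 3
    ADMIN = 4


# Roles from the slide, written as plain dictionaries (layout is an assumption).
ROLES = {
    "SPB_Operator": {
        "defaultAccessLevel": AccessLevel.USER,
        "accessList": {
            "SPB_*": AccessLevel.OPERATOR,
            "Undulator_GapMover_0": AccessLevel.OPERATOR,
        },
    },
    "Global_Observer": {"defaultAccessLevel": AccessLevel.OBSERVER, "accessList": {}},
    "Global_Expert": {"defaultAccessLevel": AccessLevel.EXPERT, "accessList": {}},
}


def access_level(role_name, device_id):
    """Access level of a role on one device instance."""
    role = ROLES[role_name]
    for pattern, level in role["accessList"].items():
        if fnmatchcase(device_id, pattern):
            return level  # a device-specific entry overrides the default
    return role["defaultAccessLevel"]
```

Because `AccessLevel` is an `IntEnum`, the intrinsic ordering from the previous slide carries over and levels can be compared with `>=` directly.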

  31. DETAIL: Security • Login (GUI or CLI): username, password, provider, ownIP*, brokerHost*, brokerPort*, brokerTopic* • The Central DB authorizes and computes context-based access levels • Returned to the client: userId, sessionToken, defaultAccessLevel, accessList • Broker message: Header […] __uid=42 __accessLevel=“admin”; Body […] • Locking (on the device): if the device is locked, the request is ok only if __uid == owner • Access control (on the device): if __accessLevel >= visibility, then if __accessLevel >= param.accessLevel, the request is ok • Diagram labels: GUI or CLI, Device, GUI-Srv, Central DB Burkhard Heisen (WP76)
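The two device-side checks (locking, then access control) can be written out as one function. This is a hedged Python sketch: `is_allowed` and the dictionary layout of `device` are hypothetical, and the access level in the message header is represented as an enum value rather than the string shown on the slide.

```python
from enum import IntEnum


class AccessLevel(IntEnum):
    OBSERVER = 0
    USER = 1
    OPERATOR = 2
    EXPERT = 3
    ADMIN = 4


def is_allowed(header, device, param=None):
    """Apply the locking and access-control checks to one incoming request."""
    # Locking: a locked device only accepts requests from its owner.
    owner = device.get("lockedBy")
    if owner is not None and header["__uid"] != owner:
        return False
    # Access control, step 1: the device must be visible to the requestor.
    level = header["__accessLevel"]
    if level < device["visibility"]:
        return False
    # Access control, step 2: the parameter's required level must be met.
    if param is not None:
        return level >= device["requiredAccessLevel"][param]
    return True


device = {
    "visibility": AccessLevel.OBSERVER,
    "lockedBy": None,
    "requiredAccessLevel": {"velocity": AccessLevel.USER,
                            "firmware": AccessLevel.ADMIN},
}
operator = {"__uid": 42, "__accessLevel": AccessLevel.OPERATOR}
```

In this sketch the checks run on the receiving side using only the header fields, so the device does not need to contact the database for every message.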

  32. Statistics (control system itself, operation, …) • Concept: Statistics will be collected by regular devices • OpenMQ implementation provides a wealth of statistics (e.g. messages in system, average flow, number of consumers/producers, broker memory used…) • Have a (broker-)statistic device that does system calls to retrieve information • Similar idea for other statistical data Burkhard Heisen (WP76)

  33. Logging (active, passive, central, local) • Concept: Categorized into the following classes • Active Logging: Additional code (inserted by the developer) accompanying the production/business code, which is intended to increase the verbosity of what is currently happening. • Code Tracing: Macro based, no overhead if disabled, for low-level purposes • Code Logging: Conceptual analog to Log4j; network appender; remote priority (re-)configuration at runtime • Passive Logging: Recording of activities in the distributed event-driven system. No extra coding is required from developers; passive logging transparently records system-relevant events. • Broker-message logging: Low-level debugging purpose, start/stop, not active during production • Transactional logging: Archival of the full distributed state Burkhard Heisen (WP76)

  34. Processing workflows (parallelism, pipeline execution, provenance) • Concept: Devices as modules of a scientific workflow system • Configurable generic input/output channels on devices • One channel is specific to one data structure (e.g. Hash, Image, File, etc.) • New data structures can be “registered” and are immediately usable • Input channel configuration: copy the connected output’s data or share the data with other input channels; minimum number of data items needed • ComputeFsm as base class; developers just need to code the compute method • The IO system is decoupled from the processing system (process whilst transferring data) • Automatic (API transparent) data transfer optimization (pointer if local, TCP if remote) • Broker-based communication for workflow coordination and meta-data sharing • GUI integration to set up workflows graphically (drag-and-drop featured) • Workflows can be stored and shared (following the general rules of data privacy and security), executed, paused and stepped • Parallel execution Burkhard Heisen (WP76)

  35. DETAIL: Processing workflows: Parallelism and load-balancing by design • Devices within the same device-server (diagram: Memory, CPU threads): • Data will be transferred by handing over pointers to the corresponding memory locations • Multiple instances connected to one output channel will run in parallel using CPU threads • Devices in different device-servers (diagram: TCP, distributed processing): • Data will be transferred via TCP • Multiple instances connected to one output channel will perform distributed computing • An output channel technically is a TCP server, inputs are clients • The data transfer model follows an event-driven poll architecture, which leads to load-balancing and maximum per-module performance even on heterogeneous h/w • Configurable output channel behavior in case no input is currently available: throw, queue, wait, drop Burkhard Heisen (WP76)
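Why an event-driven poll model balances load can be seen in a small Python sketch: inputs announce when they are ready, and the output hands each item to whichever input asked first. The `OutputChannel` class is hypothetical and runs in a single process without TCP; of the four policies on the slide it implements throw, queue and drop, and leaves out wait because that would require blocking.

```python
from collections import deque


class OutputChannel:
    """Hands each data item to the next input that announced it is ready."""

    def __init__(self, on_no_input="queue"):
        if on_no_input not in ("throw", "queue", "drop"):
            raise ValueError(f"unsupported policy: {on_no_input}")
        self.on_no_input = on_no_input
        self.ready = deque()    # callbacks of inputs waiting for data
        self.pending = deque()  # data waiting for a ready input

    def input_ready(self, callback):
        """An input asks for one item (the 'poll' half of the model)."""
        if self.pending:
            callback(self.pending.popleft())
        else:
            self.ready.append(callback)

    def write(self, data):
        """The producer publishes one item (the 'event-driven' half)."""
        if self.ready:
            self.ready.popleft()(data)
        elif self.on_no_input == "queue":
            self.pending.append(data)
        elif self.on_no_input == "throw":
            raise RuntimeError("no input available")
        # "drop": the item is discarded silently


out = OutputChannel("queue")
fast, slow = [], []
out.input_ready(fast.append)  # both inputs ask for work once
out.input_ready(slow.append)
out.write(1)
out.write(2)
out.write(3)                  # nobody is ready, so the item is queued
out.input_ready(fast.append)  # the fast input finishes first and asks again
```

An input that finishes sooner asks again sooner and therefore receives more items, so faster machines naturally take a larger share without any central scheduler.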

  36. DETAIL: Processing workflows: GPU enabled processing • Concept: GPU parallelization will happen within a compute execution • The data structures (e.g. image) are prepared for GPU parallelization • Karabo will detect at runtime whether the given hardware is capable of GPU computing; if not, it falls back to the corresponding CPU algorithm • Differences in runtime are balanced by the workflow system • Diagram labels: CPU; GPU; IO whilst computing; pixel parallel processing (one GPU thread per pixel); notification about new data possible to obtain Burkhard Heisen (WP76)

  37. Clients / User interfaces (API, languages, macro writing, CLI, GUI) • Concept: Two UIs – graphical (GUI) and scriptable command line (CLI) • GUI • Have one multi-purpose GUI system satisfying all needs • See following slides for details • Non-GUI • We distinguish an API for the programmatic setup of control sequences (others call those macros) from an API which allows interactive, command-line based control (IPython based) • The programmatic API exists for C++ and Python and features: • Querying of the distributed system topology (hosts, device-servers, devices, their properties/commands, etc.): getServers, getDevices, getClasses • instantiate, kill, set, execute (in “wait” or “noWait” fashion), get, monitorProperty, monitorDevice • Both APIs are state and access-role aware; caching mechanisms provide the proper Schema and a synchronous (poll-feel) API, although the back-end is always event-driven • The interactive API integrates auto-completion and improved interactive functionality suited to IPython Burkhard Heisen (WP76)

  38. GUI: What do we have to deal with? • Client-Server (network protocol, optimizations) • User management (login/logout, load/save settings, access role support) • Layout (panels, full screen, docking/undocking) • Navigation (devices, configurations, data, …) • Configuration (initialization vs. runtime, loading/saving, …) • Customization (widget galleries, custom GUI builder, composition, …) • Notification (about alarms, finished pipelines, …) • Log Inspection (filtering, configuration of log-levels, …) • Embedded scripting (IPython, macro recording/playing) • Online documentation (embedded wiki, bug-tracing, …) Kerstin Weger (WP76)

  39. Client-Server (network protocol, optimizations) • Concept: One server, many clients, TCP • The server knows what each client user sees (on a device level) and optimizes traffic accordingly • The client-server protocol is TCP; messages are header/body style using Hash serialization (default binary protocol) • The client-side socket will be threaded to decouple it from the main event loop • On client start, the server provides the current distributed state utilizing the DB; later, clients are updated through the broker • Image data is pre-processed on the server side and brought into QImage format before sending • Diagram: a GUI-Client that only sees device “A” receives onChange information related to “A” only; further labels: Message Broker, Master, GUI-Srv, Central DB Kerstin Weger (WP76)

  40. User management (login/logout, load/save settings, access role support) • Concept: User centralized, login mandatory • Login necessary to connect to system • Access role will be computed (context based) • User specific settings will be loaded from DB • View and control is adapted to access role • User or role specific configuration and wizards are available • [Figure labels: username, password; userId, accessRole, session; Central DB: Authorizes, Computes context based access role] Kerstin Weger (WP76)
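A context-based access role, as named on this slide, could be computed along the following lines. This is a sketch under stated assumptions, not the framework's implementation: the role names, the `USER_MAX_ROLE` table, the `REMOTE_CAP` rule, and the `login` function are all hypothetical, the only context considered is whether the login comes from the control room, and password checking is left out.

```python
from dataclasses import dataclass
from enum import IntEnum


class AccessRole(IntEnum):
    """Hypothetical roles, ordered from least to most privileged."""
    OBSERVER = 0
    OPERATOR = 1
    EXPERT = 2


@dataclass(frozen=True)
class Session:
    user_id: str
    access_role: AccessRole


# Assumed stand-in for the central DB: highest role each user may hold.
USER_MAX_ROLE = {
    "alice": AccessRole.EXPERT,
    "bob": AccessRole.OPERATOR,
    "carol": AccessRole.OBSERVER,
}

# Assumed context rule: logins from outside the control room are capped.
REMOTE_CAP = AccessRole.OPERATOR


def login(username: str, from_control_room: bool) -> Session:
    """Return a session whose role depends on the user and the context."""
    if username not in USER_MAX_ROLE:
        raise PermissionError(f"unknown user: {username}")
    max_role = USER_MAX_ROLE[username]
    role = max_role if from_control_room else min(max_role, REMOTE_CAP)
    return Session(user_id=username, access_role=role)


print(login("alice", from_control_room=True).access_role.name)
print(login("alice", from_control_room=False).access_role.name)
```

The same user thus receives different roles depending on where the session is opened, which is what allows the view and control to be adapted per session.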

  41. Layout (panels, full screen, docking/undocking) • Concept: Six dock-able and slide-able (optionally tabbed) main panels • Panels are organized by functionality • Navigation • Custom composition area (sub-GUI building) • Configuration (non-tabbed, changes view based on selection elsewhere) • Documentation (linked and updated with current configuration view) • Logging • Notifications • Panels and their tabs can be undocked (the window then belongs to the OS’s window manager) and made full-screen (distribution across several monitors possible) • Custom composition area (central panel) will be optimized for phones and tablets • GUI behaves natively under Mac OS X, Linux and Windows Kerstin Weger (WP76)

  42. DETAIL: Layout – Default panel arrangement, docking and sliding • [Figure labels: Navigation, Custom composition area, Configuration, Logging / Scripting console, Documentation, Notifications] Kerstin Weger (WP76)

  43. Navigation (devices, configurations, data, …) • Concept: Navigate device-servers, devices, configurations, data(-files), etc. • Different views (tabs) on data • Hierarchical distributed system view • Device ownership centric (view compositions) • Available configurations • Hierarchical file view (e.g. HDF5) • Automatic (by access level) filtering of items • A navigation item is selected automatically if its context is selected elsewhere in the GUI Kerstin Weger (WP76)

  44. Configuration (initialization vs. runtime, loading/saving, …) • Concept: Auto-generated default widgets for configuring classes and instances • Widgets are generated from device information (.xsd format) • 2-column layout for class configuration (label, initialization-value) • 3-column layout (label, value-on-device, edit-value) for instance configuration • Allows reading/writing properties (all data-types) • Allows executing commands (as buttons) • Is aware of the device’s FSM, enables/disables widgets accordingly • Is aware of the access level, enables/disables widgets accordingly • Changes can be applied to a single item, a selection, or all items Kerstin Weger (WP76)
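The two enable/disable rules on this slide (FSM awareness and access-level awareness) combine naturally into one check per widget. The following Python sketch assumes that each property description carries the set of device states in which it may be edited and a minimum access level; the names `PropertyInfo`, `allowed_states`, `required_access_level`, and `widget_enabled`, as well as the state names, are hypothetical and not taken from the framework.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class PropertyInfo:
    """Hypothetical subset of what a device description might provide."""
    key: str
    allowed_states: FrozenSet[str]   # FSM states in which editing is allowed
    required_access_level: int       # minimum level needed to edit


def widget_enabled(info: PropertyInfo, device_state: str,
                   user_access_level: int) -> bool:
    """Enable a widget only if both the FSM state and the access level allow it."""
    state_ok = device_state in info.allowed_states
    access_ok = user_access_level >= info.required_access_level
    return state_ok and access_ok


target_position = PropertyInfo(
    key="targetPosition",
    allowed_states=frozenset({"Idle", "Stopped"}),
    required_access_level=1,
)

print(widget_enabled(target_position, "Idle", 1))
print(widget_enabled(target_position, "Moving", 2))
print(widget_enabled(target_position, "Idle", 0))
```

Re-evaluating this check on every state-change event is enough to keep the panel consistent with the device, since neither input is owned by the GUI itself.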

  45. Customization (widget galleries, custom GUI builder, composition, …) • Concept: Combination of PowerPoint-like editor and online properties/commands with changeable widget types • Tabbed, static panel (does not change on navigation) • Two modes: Pre-configuration (classes) and runtime configuration (instances) • Visual composition of properties/commands of any devices • Visual composition of devices (workflow layouting) • Data-type aware widget factory for properties/commands (edit/display) • PowerPoint-like tools for drawing, arranging, grouping, selecting, zooming of text, shapes, pictures, etc. • Capability to save/load custom panels, open several simultaneously Kerstin Weger (WP76)
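A data-type aware widget factory with changeable widget types can be pictured as a registry keyed by value type and edit/display mode. The Python sketch below is illustrative only: the functions `register`, `available_widgets`, and `default_widget` are hypothetical, and the widget names are plain strings chosen for the example rather than actual GUI classes. The first widget registered for a key is treated as the default, and the remaining ones are the alternatives a user could switch to.

```python
from typing import Dict, List, Tuple

# (value type, editable?) -> widget names, default first.
_REGISTRY: Dict[Tuple[type, bool], List[str]] = {}


def register(value_type: type, editable: bool, widget_name: str) -> None:
    """Add a widget to the gallery for one data type and mode."""
    _REGISTRY.setdefault((value_type, editable), []).append(widget_name)


def available_widgets(value_type: type, editable: bool) -> List[str]:
    """All widgets a user may choose from for this property."""
    return list(_REGISTRY.get((value_type, editable), []))


def default_widget(value_type: type, editable: bool) -> str:
    """Widget used when a property is first dropped onto a panel."""
    candidates = available_widgets(value_type, editable)
    if not candidates:
        # Fallback for types without a registered widget.
        return "LineEdit" if editable else "Label"
    return candidates[0]


# Hypothetical gallery entries.
register(float, False, "Label")
register(float, False, "TrendLine")
register(float, True, "SpinBox")
register(bool, True, "CheckBox")

print(default_widget(float, False))
print(available_widgets(float, False))
print(default_widget(str, True))
```

Keeping the registry separate from the panel code means new widget types can be added to the gallery without touching the composition editor.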

  46. DETAIL: Customization – Property/Command composition • [Figure labels: Display widget (Trend-Line), Editable widget, Display widget, drag & drop] Kerstin Weger (WP76)

  47. DETAIL: Customization – Property/Command composition • [Figure labels: Display widget (Image View), Display widget (Histogram), drag & drop] Kerstin Weger (WP76)

  48. DETAIL: Customization – Device (workflow) composition • [Figure labels: Workflow node (device), drag & drop, Draw connection] Kerstin Weger (WP76)

  49. DETAIL: Customization – Expert panels - Vacuum • [Figure labels: Open/Save panel view; Cut, copy, paste, remove item; Group items; Change between “Design/Control” mode; Bring to front/back; Insert text, line, rectangle, …; Rotate, scale item] Kerstin Weger (WP76)

  50. Notification (about alarms, finished runs, …) • Concept: Single place for all system-relevant notifications, which will link out to more detailed information • Can be of arbitrary type, e.g.: • Finished experiment run/scan • Finished analysis job • Occurrences of errors, alarms • Update notifications, etc. • Intended to be conceptually similar to present-day smartphone notification bars • Visibility and/or acknowledgment of notifications may be user and/or access role specific • May implement some configurable forwarding system (SMS, email, etc.) Kerstin Weger (WP76)
