
GeantV scheduling framework revisited (aka V3)

Explore the new features of GeantV version 3, including a generic vector flow approach, improved scheduling framework, and NUMA awareness. Discover the performance and memory improvements compared to version 2.





Presentation Transcript


  1. GeantV scheduling framework revisited (aka V3). Andrei Gheata, Weekly GeantV Meeting, May 16, 2017

  2. Features GeantV scheduling framework revisited

  3. GeantV version 3: A generic vector flow approach. [Diagram: processing flow per thread. Labels: Event server; primaries and secondaries entering a stack-like buffer with lanes lane0, lane1, … laneN; GeantPropagator and its workers; GeantTaskData with stage buffers (GeantTrack *), filled through AddTrack(track, ); SimulationStage with virtual Select(track) and virtual DoIt( , ), default behavior to override; Handler 1 … Handler “i” with Basketizer 1 … Basketizer “i”; loops over a scalar path, virtual DoIt(track), and a vector path, e.g. ComptonFilter::DoIt; select next stage if different from SimulationStage::fFollowUp.]
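The flow named on this slide (a stage selects a handler for each track, the handler's basketizer collects tracks, and full baskets go through a vector DoIt while leftovers take the scalar path) can be sketched roughly as follows. This is only an illustration under assumptions: the class layouts, the `Process` and `Flush` methods, the counters and the energy-halving "physics" are hypothetical and are not the GeantV interfaces; only the names Select, DoIt, AddTrack, handler and basketizer come from the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical, simplified track; not the real GeantTrack.
struct Track {
  int id;
  double energy;
  int handler;  // index of the handler this track should be routed to
};

// A handler owns a basketizer: tracks accumulate until a basket is full,
// then the whole basket is processed by a single vector-style call.
class Handler {
public:
  explicit Handler(std::size_t basketSize) : fBasketSize(basketSize) {
    assert(basketSize > 0);
  }
  virtual ~Handler() = default;

  // Scalar interface: process one track (placeholder work).
  virtual void DoIt(Track &track) {
    track.energy *= 0.5;
    ++fScalarCalls;
  }

  // Vector interface: process a full basket in one call (placeholder work).
  virtual void DoIt(std::vector<Track *> &basket) {
    for (Track *t : basket) t->energy *= 0.5;
    ++fVectorCalls;
  }

  // Basketize a track; returns true when a basket was filled and dispatched.
  bool AddTrack(Track &track) {
    fBasket.push_back(&track);
    if (fBasket.size() < fBasketSize) return false;
    DoIt(fBasket);
    fBasket.clear();
    return true;
  }

  // Send a partially filled basket through the scalar path.
  void Flush() {
    for (Track *t : fBasket) DoIt(*t);
    fBasket.clear();
  }

  int fScalarCalls = 0;
  int fVectorCalls = 0;

private:
  std::size_t fBasketSize;
  std::vector<Track *> fBasket;
};

class SimulationStage {
public:
  SimulationStage(std::size_t nHandlers, std::size_t basketSize) {
    assert(nHandlers > 0);
    for (std::size_t i = 0; i < nHandlers; ++i)
      fHandlers.push_back(std::make_unique<Handler>(basketSize));
  }
  virtual ~SimulationStage() = default;

  // Default selection, meant to be overridden: route by the track's handler index.
  virtual Handler &Select(const Track &track) {
    return *fHandlers[static_cast<std::size_t>(track.handler) % fHandlers.size()];
  }

  // Process a stage buffer: route every track, then flush the leftovers.
  void Process(std::vector<Track> &stageBuffer) {
    for (Track &t : stageBuffer) Select(t).AddTrack(t);
    for (auto &h : fHandlers) h->Flush();
  }

  const Handler &GetHandler(std::size_t i) const { return *fHandlers.at(i); }

private:
  std::vector<std::unique_ptr<Handler>> fHandlers;
};
```

With a basket size of 4, five tracks routed to the same handler produce one vector call and one scalar call in this sketch. A real stage would also hand each processed track on to the next stage buffer, which is omitted here.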

  4. Processing flow per propagator/NUMA node. [Diagram. Labels: Event server; stage buffers; a Select step and basketizers per stage; stages GeometryStage, PhysicsStage, PropagationStage; handlers Volume1, Volume2, Process1, Process2, Field prop., Linear prop.; scalar DoIt() runs scalar code, vector DoIt() runs vectorized code; threads on the same propagator/socket.]

  5. Stack-like handling of tracks. [Diagram: a stepping loop of stages connected by buffers, fed from a stack-like buffer. Stage labels: PreStepStage, XSecSamplingStage, GeomQueryStage, PropagationStage, ContinuousProcStage, DiscreteProcStage, SteppingActionsStage. Stack-like buffer lanes: Generation 0 (primaries), Generation 1, … Generation 8, Generation > 10; the number of lanes is GeantConfig::fNstackLanes.] Number of lanes flushed into the stepping loop controlled by: GeantConfig::fNmaxBuffSpill
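A stack-like buffer with one lane per generation can be sketched as below. The two constructor arguments stand in for GeantConfig::fNstackLanes and GeantConfig::fNmaxBuffSpill; everything else (class name, methods, the rule that generations beyond the last lane share it, and counting only non-empty lanes against the spill limit) is an assumption made for illustration, not the GeantV implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified track; not the real GeantTrack.
struct Track {
  int id;
  int generation;  // 0 for primaries, parent generation + 1 for secondaries
};

class StackLikeBuffer {
public:
  // nLanes plays the role of fNstackLanes, maxSpill the role of fNmaxBuffSpill.
  StackLikeBuffer(std::size_t nLanes, std::size_t maxSpill)
      : fLanes(nLanes), fMaxSpill(maxSpill) {
    assert(nLanes > 0);
  }

  // Store a track in the lane of its generation; generations beyond the
  // last lane share the last lane (assumption).
  void AddTrack(const Track &t) {
    assert(t.generation >= 0);
    std::size_t lane = std::min(static_cast<std::size_t>(t.generation),
                                fLanes.size() - 1);
    fLanes[lane].push_back(t);
  }

  // Flush at most fMaxSpill non-empty lanes, highest generation first, so
  // that late-generation secondaries are finished before older tracks.
  std::vector<Track> Flush() {
    std::vector<Track> out;
    std::size_t spilled = 0;
    for (std::size_t i = fLanes.size(); i > 0 && spilled < fMaxSpill; --i) {
      std::vector<Track> &lane = fLanes[i - 1];
      if (lane.empty()) continue;
      out.insert(out.end(), lane.begin(), lane.end());
      lane.clear();
      ++spilled;
    }
    return out;
  }

  // Total number of tracks still held in the buffer.
  std::size_t Size() const {
    std::size_t n = 0;
    for (const auto &lane : fLanes) n += lane.size();
    return n;
  }

private:
  std::vector<std::vector<Track>> fLanes;
  std::size_t fMaxSpill;
};
```

Draining the highest generations first keeps the number of tracks in flight bounded, in the same way a depth-first traversal bounds its stack; this is the memory-control idea the deck attributes to the stack-like buffer.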

  6. Performance: V3 versus V2, memory, scalability, NUMA, tuning knobs

  7. Memory control • Stack-like control using a special buffer inserted in the stepping loop • Higher-generation secondaries are flushed with priority • Very good behavior even for a high number of threads/secondaries

  8. NUMA awareness • Implemented using hwloc > 1.8 • Enumerates NUMA nodes, cores and CPUs • Threads are bound to CPUs • A propagator uses threads bound to the same NUMA node • Several propagators can be bound to the same NUMA node • Compact policy for threads on the same propagator, scatter policy for distributing propagators over different nodes • Task data (stage buffers, stack-like buffer), baskets and tracks are bound to memory on the same node as the propagator owning the thread [Figure labels: 8.5%, 1%]
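The placement policy described on this slide (scatter propagators over NUMA nodes, pack the threads of one propagator compactly on its node) can be illustrated with plain index arithmetic. The sketch below assumes a hypothetical uniform topology with consecutively numbered CPUs per node and does not call hwloc; the `Placement` struct and `PlaceThreads` function are invented for this example, and the real code would take the topology from hwloc and bind threads and memory through it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One thread placement: which propagator owns it, on which node and CPU.
struct Placement {
  std::size_t propagator;
  std::size_t node;
  std::size_t cpu;
};

// Hypothetical topology: nNodes NUMA nodes with cpusPerNode CPUs each,
// CPU ids numbered consecutively within a node.
std::vector<Placement> PlaceThreads(std::size_t nNodes, std::size_t cpusPerNode,
                                    std::size_t nPropagators,
                                    std::size_t threadsPerPropagator) {
  assert(nNodes > 0 && cpusPerNode > 0);
  std::vector<std::size_t> used(nNodes, 0);  // CPUs already handed out per node
  std::vector<Placement> out;
  for (std::size_t p = 0; p < nPropagators; ++p) {
    // Scatter: distribute propagators round-robin over the nodes.
    std::size_t node = p % nNodes;
    for (std::size_t t = 0; t < threadsPerPropagator; ++t) {
      // Compact: take the next free CPU on the same node; wrap around
      // when the node is oversubscribed.
      std::size_t local = used[node]++ % cpusPerNode;
      out.push_back({p, node, node * cpusPerNode + local});
    }
  }
  return out;
}
```

For two nodes with four CPUs each, three propagators with two threads each land on CPUs 0-1 (node 0), 4-5 (node 1) and 2-3 (node 0) in this sketch, so every propagator stays within a single node.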

  9. Scalability • Not as good as expected • Less interaction between threads: removed contention points, SOA basketizing, no more basket queue • Profiling comparison of N versus 2N threads does not reveal obvious hotspots; to be pursued further • Memory operations are high in the profile; we expect the picture to improve with a more balanced scenario that has more (vector) work on the physics side

  10. Performance V3 versus V2 • Relevant improvements in both single-threaded and multi-threaded mode • Coming mostly from: increased locality (simulation stages), removal of SOA gather/scatter overheads, NUMA awareness • Yardstick measurements to be redone [Figure labels: 40%, 80%]

  11. Tuning knobs • Far fewer than before: complexity went down by a large factor • Stack-like buffer parameters: up to 10% influence on performance • Basket size used by basketizers: negligible impact now, expected to become important for vectorized stages • NUMA placement parameters (number of propagators, threads per propagator): up to 10% impact on performance • Many V2 knobs became obsolete

  12. Where we go from here • Implementation of simulation stages for New Physics • Abstraction of the interface defining stages, needed for library decoupling • Mapping of the actions performed by processes to stages, Geant::Handler interface for invoking models • Interfacing/testing vector physics • Currently all stages are basketized by default, no overhead observed so far • Completion of interfaces for handling user actions • Porting examples to use them • … and all the rest…
