
Cilk++

Cilk++. Kristoffer Stensen, Bjørn Fevang. History: developed since 1994 at the MIT Laboratory for Computer Science. Commercial version, Cilk++, developed by Cilk Arts, Inc. Intel Corporation acquired Cilk Arts in 2009 and released Intel Cilk Plus in 2010.


Presentation Transcript


  1. Cilk++ • Kristoffer Stensen • Bjørn Fevang

  2. History • Developed since 1994 at the MIT Laboratory for Computer Science • Commercial version, Cilk++, developed by Cilk Arts, Inc. • Intel Corporation acquired Cilk Arts in 2009 • Intel released Intel Cilk Plus in 2010

  3. Principle • The programmer is responsible for exposing parallelism • The run-time environment divides work between processors

  4. Three keywords • cilk_spawn • cilk_sync • cilk_for • Faithful linguistic extension of C++ • Serial elision: removing the Cilk keywords leaves a valid serial C++ program
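
The three keywords and the serial elision can be illustrated with a small sketch. It assumes no Cilk++ compiler is available, so the keywords are defined away as empty macros (that macro trick is an assumption made for illustration, not part of the slides); what remains is the serial elision, which compiles as ordinary C++. The function names `fib` and `fib_table` are hypothetical.

```cpp
#include <vector>

// Assumption for illustration: a Cilk++ compiler treats these as keywords.
// Defining them away yields the serial elision, which is plain C++.
#define cilk_spawn
#define cilk_sync
#define cilk_for for

// Recursive Fibonacci in the usual spawn/sync shape.
long fib(int n) {
    if (n < 2) return n;
    long x = cilk_spawn fib(n - 1);  // child may run in parallel with the caller
    long y = fib(n - 2);             // caller continues with the other subproblem
    cilk_sync;                       // wait for the spawned child before reading x
    return x + y;
}

// cilk_for declares that the loop iterations may run in parallel.
// Assumes count >= 0.
std::vector<long> fib_table(int count) {
    std::vector<long> out(count);
    cilk_for (int i = 0; i < count; ++i) {
        out[i] = fib(i);  // each iteration writes a distinct element
    }
    return out;
}
```

Calling `fib(10)` returns 55 in this serial form; under a Cilk compiler the same source would expose the parallelism and leave scheduling to the runtime.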

  5. DAG Model of Multithreading • Vertices: instructions • Edges: dependencies between instructions • x precedes y (x ≺ y): x must complete before y starts • Neither x ≺ y nor y ≺ x: x and y are parallel (x ∥ y)

  6. The Work Law • Work: the total amount of time spent in all instructions • Equals the execution time on 1 processor: T1 • Work Law: TP ≥ T1/P, where TP is the execution time on P processors

  7. The Span Law • Span: the longest path of dependencies in the DAG • Equals the theoretically fastest time in which the DAG could be executed on a computer with an infinite number of processors: T∞ • Span Law: TP ≥ T∞

  8. 1≺2≺3≺6≺7≺8≺11≺12≺18

  9. Parallelism • The ratio of work to span: T1/T∞
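
Work, span, and parallelism can be computed mechanically for a small DAG. The sketch below assumes every vertex costs one time unit and that vertices are numbered in topological order; the `Dag` type and the six-vertex `example_dag` are hypothetical and are not the DAG from the slides.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// A DAG of unit-cost instructions. Assumption: every edge goes from a lower
// index to a higher index, so index order is a topological order.
struct Dag {
    std::size_t vertex_count;
    std::vector<std::vector<std::size_t>> successors;  // direct dependents of each vertex
};

// Work T1: total time over all instructions (one unit each here).
std::size_t work(const Dag& dag) { return dag.vertex_count; }

// Span T_inf: number of vertices on the longest dependency path.
std::size_t span(const Dag& dag) {
    std::vector<std::size_t> longest(dag.vertex_count, 1);  // longest path ending at v
    std::size_t best = 0;
    for (std::size_t u = 0; u < dag.vertex_count; ++u) {
        best = std::max(best, longest[u]);
        for (std::size_t v : dag.successors[u]) {
            longest[v] = std::max(longest[v], longest[u] + 1);
        }
    }
    return best;
}

// Parallelism: the ratio of work to span (assumes a non-empty DAG).
double parallelism(const Dag& dag) {
    return static_cast<double>(work(dag)) / static_cast<double>(span(dag));
}

// Hypothetical DAG: 0 -> 1, 2, 3; 1 -> 4; 2 -> 4; 3 -> 5; 4 -> 5.
Dag example_dag() {
    return Dag{6, {{1, 2, 3}, {4}, {4}, {5}, {5}, {}}};
}
```

For this example the work is 6, the longest path 0, 1, 4, 5 gives a span of 4, and the parallelism is 1.5, so more than two processors cannot help.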

  10. Runtime System • Multiprocessor scheduling is NP-complete • Cilk++ employs work-stealing • Provably tight bounds • The runtime system exploits an arbitrary number of cores near-optimally • Negligible overhead on a single core (less than 2%)

  11. Performance Bounds • Expected running time: TP ≤ T1/P + O(T∞)
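
To get a feel for the bound, it can be evaluated next to the lower bound implied by the Work Law and the Span Law. This is only a sketch: the constant hidden in O(T∞) is not given in the slides, so the parameter `c` below is an assumed stand-in, and both function names are hypothetical.

```cpp
#include <algorithm>

// Lower bound on TP from the Work Law and the Span Law: max(T1/P, T_inf).
double lower_bound_tp(double t1, double t_inf, int processors) {
    return std::max(t1 / processors, t_inf);
}

// Upper estimate T1/P + c * T_inf. The constant c stands in for the factor
// hidden in O(T_inf); its value is an assumption, not a measured quantity.
double upper_estimate_tp(double t1, double t_inf, int processors, double c = 1.0) {
    return t1 / processors + c * t_inf;
}
```

With T1 = 1000, T∞ = 10 and P = 4, the lower bound is 250 and the estimate with c = 1 is 260. When the parallelism T1/T∞ is much larger than P, the T1/P term dominates and the speedup is close to linear.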

  12. Performance Bounds • Bounds on stack space

  13. Work Stealing • The runtime system allocates as many operating-system threads (workers) as there are processors • Each worker's stack operates like a queue • A spawned subroutine's activation frame is pushed onto the bottom of the stack • The frame is popped from the bottom when the subroutine returns

  14. Work Stealing • Workers that run out of work become thieves and steal the top frame from another worker (the victim) • The stack is a double-ended queue • Sufficient parallelism leads to infrequent stealing • Negligible communication and synchronization costs

  15. Work Stealing • Adapts well to multiprogrammed computing environments • Performance-composable programs
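
The deque discipline described on the work-stealing slides can be sketched as follows. This is a single-threaded model only: a real runtime must synchronize the owner and the thieves, which is omitted here. The class `WorkerDeque` and its method names are hypothetical, and frames are reduced to string labels.

```cpp
#include <deque>
#include <string>

// One worker's stack of activation frames, modelled as a double-ended queue.
// Single-threaded sketch: no synchronization between owner and thief.
class WorkerDeque {
public:
    // Owner: a spawned subroutine's frame is pushed onto the bottom.
    void push_bottom(const std::string& frame) { frames_.push_back(frame); }

    // Owner: when the subroutine returns, its frame is popped from the bottom.
    // Returns false if the deque is empty.
    bool pop_bottom(std::string& frame) {
        if (frames_.empty()) return false;
        frame = frames_.back();
        frames_.pop_back();
        return true;
    }

    // Thief: an idle worker steals the top (oldest) frame from the victim.
    // Returns false if there is nothing to steal.
    bool steal_top(std::string& frame) {
        if (frames_.empty()) return false;
        frame = frames_.front();
        frames_.pop_front();
        return true;
    }

private:
    std::deque<std::string> frames_;  // front = top, back = bottom
};
```

If the owner pushes frames A, B, C in that order, a thief receives A while the owner keeps working on C and then B. The two parties touch opposite ends, which is one reason stealing can stay cheap when it is infrequent.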

  16. Race Detection • Strand: a sequence of serially executed instructions containing no parallel control • Data race: logically parallel strands access the same shared location, with no locks in common, and at least one of the strands writes to the location
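
The data-race definition above translates directly into a predicate over two memory accesses. The sketch below only encodes the definition; it is not how Cilkscreen is implemented. The `Access` type is hypothetical, and whether two strands are logically parallel is passed in as an input rather than derived.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// One memory access performed by a strand (hypothetical record type).
struct Access {
    std::string location;            // name of the shared location
    bool is_write;                   // true for a write, false for a read
    std::vector<std::string> locks;  // locks held during the access
};

// True if the two accesses hold at least one lock in common.
bool share_a_lock(const Access& a, const Access& b) {
    for (const std::string& lock : a.locks) {
        if (std::find(b.locks.begin(), b.locks.end(), lock) != b.locks.end()) {
            return true;
        }
    }
    return false;
}

// Data race: logically parallel strands, same location, no common lock,
// and at least one of the two accesses is a write.
bool is_data_race(const Access& a, const Access& b, bool logically_parallel) {
    return logically_parallel
        && a.location == b.location
        && !share_a_lock(a, b)
        && (a.is_write || b.is_write);
}
```

Two parallel reads of the same location are not a race, and neither is a write/read pair guarded by a common lock. Dropping the lock, or making the strands parallel instead of serial, turns the same pair into a race.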

  17. Cilkscreen • Race detector based on provably good algorithms • Guarantees to report a race bug if the race bug is exposed • Identifies the parallel control constructs in the executing application precisely • Tracks the series-parallel relationships of strands • Localizes the race in the application source code

  18. Reducer Hyperobjects • Mitigate races on nonlocal variables without creating lock contention or requiring code restructuring

  19. Locking • May create a bottleneck • Can destroy parallelism • Jumbles up the order

  20. Restructuring • Accumulate and concatenate lists • Time-consuming • May require expert skill

  21. Reducer Hyperobject • Linguistic construct • Strands have different «views» of the same object • A strand can access and change its view's state independently • Views are combined with the reduce() method
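
The view-and-reduce idea can be sketched without the Cilk++ reducer library. The code below is a serial model, not the library's API: `ListView` and `collect` are hypothetical names, and the two views stand for two strands that would run in parallel under a Cilk runtime. The `split` parameter models the point at which the work happened to be divided.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// One strand's private view of a shared list (hypothetical type).
struct ListView {
    std::vector<std::string> items;

    // A strand changes only its own view, so no lock is needed.
    void append(const std::string& s) { items.push_back(s); }

    // Combine two views: this view holds the logically earlier items,
    // `right` the logically later ones, so serial order is preserved.
    void reduce(ListView& right) {
        items.insert(items.end(), right.items.begin(), right.items.end());
        right.items.clear();
    }
};

// Serial model of two strands appending to their own views, then reducing.
std::vector<std::string> collect(const std::vector<std::string>& input,
                                 std::size_t split) {
    split = std::min(split, input.size());
    ListView left, right;
    for (std::size_t i = 0; i < split; ++i) left.append(input[i]);
    for (std::size_t i = split; i < input.size(); ++i) right.append(input[i]);
    left.reduce(right);  // views are merged when the strands join
    return left.items;
}
```

Whatever value `split` takes, the combined list equals the serial result. That is the property that distinguishes this approach from locking, which the slides note jumbles up the order.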

  22. Reference • Charles E. Leiserson, The Cilk++ concurrency platform, 2010
