This presentation describes a modular approach to implementing parallel data structures in Bulk-Synchronous Parallel ML (BSML), a functional parallel language based on the BSP model, targeting the data structures heavily used in symbolic computation. It presents the BSML language and the implementation of parallel dictionaries and sets, together with load balancing. The modular design keeps maintenance straightforward while keeping the interfaces close to their sequential counterparts, with the aim of making parallel computing more usable for non-computer scientists and of improving performance through effective load-balancing techniques.
A Modular Implementation of Parallel Data Structures in Bulk-Synchronous Parallel ML
Frédéric Gava, HLPP 2005
Outline
• Introduction;
• The BSML language;
• Implementation of parallel data structures in BSML:
  • Dictionaries;
  • Sets;
  • Load-balancing.
• Application;
• Conclusion and future works.
Introduction
[Diagram: a spectrum of approaches, from automatic parallelization through structured parallelism (algorithmic skeletons, BSP data structures) to concurrent programming.]
• Parallel computing for speed;
• Too complex for many non-computer scientists;
• Need for models/tools of parallelism.
Introduction (bis)
• Observations:
  • Data structures are as important as algorithms;
  • Symbolic computations use those data structures massively.
• Suggested solution, parallel implementations of data structures:
  • Interfaces as close as possible to the sequential ones;
  • Modular implementations for straightforward maintenance;
  • Load-balancing of the data.
BSML
Outline:
• Introduction;
• BSML;
• Parallel data structures in BSML;
• Application;
• Conclusion and future works.
Bulk-Synchronous Parallelism + Functional Programming = BSML
• Advantages of the BSP model:
  • Portability;
  • Scalability, deadlock free;
  • Simple cost model for performance prediction.
• Advantages of functional programming:
  • High-level features (higher-order functions, pattern matching, concrete types, etc.);
  • Safety of the environment;
  • Program proofs (proofs of BSML programs using Coq).
The BSML Language
• Confluent language: deterministic algorithms;
• Library for the "Objective Caml" language (called BSMLlib);
• Operations to access the BSP parameters;
• 5 primitives on a parallel data structure called parallel vector:
  • mkpar: create a parallel vector;
  • apply: parallel point-wise application;
  • put: send values within a vector;
  • proj: parallel projection;
  • super: BSP divide-and-conquer.
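As a rough illustration, the five primitives have types along the following lines ('a par being the type of parallel vectors, with one component per processor). Exact signatures vary across BSMLlib versions, so this is a sketch rather than the precise interface:

  mkpar : (int -> 'a) -> 'a par                 (* build a vector from a function of the processor id *)
  apply : ('a -> 'b) par -> 'a par -> 'b par    (* point-wise application of a vector of functions *)
  put   : (int -> 'a) par -> (int -> 'a) par    (* each processor sends one value to every other one *)
  proj  : 'a par -> (int -> 'a)                 (* make every component readable everywhere *)
  super : (unit -> 'a) -> (unit -> 'b) -> 'a * 'b   (* BSP divide-and-conquer of two computations *)

  (* e.g. a vector holding each processor's id, then doubled point-wise *)
  let ids   = mkpar (fun i -> i)
  let twice = apply (mkpar (fun _ x -> 2 * x)) ids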
A BSML Program
[Diagram: a BSML program is made of a sequential part and a parallel part; in the parallel part each processor i, for i = 0 … p-1, holds its own values fi, gi, …]
Parallel Data Structures in BSML
Outline:
• Introduction;
• BSML;
• Parallel data structures in BSML;
• Application;
• Conclusion and future works.
General Points
• 5 modules: Set, Map, Stack, Queue, Hashtable;
• Interfaces:
  • Same as the O'Caml ones;
  • With some specific parallel functions (skeletons) such as parallel reduction;
• Pure functional implementation (for functional data);
• Manual or automatic load-balancing.
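A rough sketch of what the parallel Map interface might look like: the shape of the sequential Map.S signature plus a few skeletons. Only fold and async_fold appear on the later slides; the other entries are illustrative assumptions:

  module type PAR_MAP = sig
    type key
    type 'a t                                    (* a dictionary distributed over the processors *)
    val empty : 'a t
    val add   : key -> 'a -> 'a t -> 'a t
    val find  : key -> 'a t -> 'a
    val fold  : (key -> 'a -> 'b -> 'b) -> 'a t -> 'b -> 'b            (* ordered, as in the sequential Map *)
    val async_fold : (key -> 'a -> 'b -> 'b) -> 'a t -> 'b -> 'b par   (* per-processor reduction skeleton *)
  end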
Modules in O'Caml
• Interface:
  module type Compare = sig
    type elt
    val compare : elt -> elt -> int
  end
• Implementation:
  module CompareInt = struct
    type elt = int
    let tools = ...
    let compare = ...
  end
  module AbstractCompareInt = (CompareInt : Compare)
• Functor:
  module Make(Ord : Compare) = struct
    type elt = Ord.elt
    type t = Empty | Node of t * elt * t * int
    let mem e s = ...
  end
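For instance, the functor is applied as follows (a usage sketch; the elided bodies above would have to be filled in):

  module IntSet = Make(CompareInt)
  (* IntSet.mem : int -> IntSet.t -> bool, since CompareInt keeps elt = int visible;
     applying Make to AbstractCompareInt instead would leave elt abstract *)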
Parallel Dictionaries
• A parallel map (dictionary) = a map on each processor:
  module Make (Ord : OrderedType) (Bal : BALANCE)
              (MakeLocMap : functor (Ord : OrderedType) -> Map.S with type key = Ord.t) =
  struct
    module Local_Map = MakeLocMap(Ord)
    type key = Ord.t
    type 'a t = ('a Local_Map.t par) * int * bool
    type 'a seq_t = 'a Local_Map.t
    (* operators as skeletons *)
  end
• We need to re-implement all the operations (data skeletons).
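To give the flavour of "operators as skeletons", here is a minimal sketch of a membership test, written with the primitives of the BSML slide, over a simplified type that drops the int and bool fields of the real one; bsp_p () is the assumed accessor for the number of processors:

  (* simplified: just the vector of local maps *)
  type 'a pmap = 'a Local_Map.t par

  let mem (k : key) (m : 'a pmap) : bool =
    (* each processor tests its own local map... *)
    let local = apply (mkpar (fun _ lm -> Local_Map.mem k lm)) m in
    (* ...then the p boolean results are gathered and combined *)
    let answer = proj local in
    let rec any i = i < bsp_p () && (answer i || any (i + 1)) in
    any 0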
Insert a Binding
• add : key -> 'a -> 'a t -> 'a t
[Diagram: two cases for the insertion — if the structure has been rebalanced, and otherwise.]
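In the non-rebalancing branch of the diagram, the insertion skeleton could look like the sketch below, written over the simplified type of the previous sketch and assuming a hypothetical owner : key -> int function that maps a key to the processor holding its key range (the actual placement policy and the rebalancing branch are not detailed on the slide):

  let add (k : key) (v : 'a) (m : 'a pmap) : 'a pmap =
    let dst = owner k in   (* hypothetical placement function: which processor stores k *)
    (* only the owner updates its local map; the others keep theirs unchanged *)
    apply (mkpar (fun i lm -> if i = dst then Local_Map.add k v lm else lm)) m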
Parallel Iterator
  let cardinal pmap = ParMap.fold (fun _ _ i -> i + 1) pmap 0
• fold needs to respect the order of the keys;
• parallel map → sequential map;
• Too many communications…
• async_fold : (key -> 'a -> 'b -> 'b) -> 'a t -> 'b -> 'b par
  let cardinal pmap =
    List.fold_left (+) 0 (total (ParMap.async_fold (fun _ _ i -> i + 1) pmap 0))
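A plausible reading of async_fold is a purely local fold on every processor, the combination of the p partial results being left to the caller; total below is an assumed helper turning a parallel vector into the list of its components (sketch over the simplified pmap type):

  let async_fold f (m : 'a pmap) (init : 'b) : 'b par =
    apply (mkpar (fun _ lm -> Local_Map.fold f lm init)) m

  let total (v : 'a par) : 'a list =
    let component = proj v in
    List.init (bsp_p ()) component

  (* the cardinal of the slide then just adds up the p partial counts *)
  let cardinal pmap =
    List.fold_left (+) 0 (total (async_fold (fun _ _ i -> i + 1) pmap 0))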
Parallel Sets
• A sub-set on each processor;
• Insertion/iteration as for parallel maps;
• But with some binary skeletons;
• Load-balancing of pairs of parallel sets using the superposition.
Difference
• 3 cases:
  • Two normal parallel sets;
  • One of the parallel sets has been rebalanced;
  • The two parallel sets have been rebalanced;
• This implies a problem with duplicate elements.
Difference (third case)
[Diagram: computing the difference of two parallel sets S1 and S2 that have both been rebalanced.]
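For the first case (two parallel sets with the same placement of their elements), the difference can be computed purely locally with a binary skeleton, roughly as below; Local_Set stands for the per-processor sequential set module, apply2 is derived from apply, and the rebalanced cases, which need a data exchange first, are not sketched:

  let apply2 fv v1 v2 = apply (apply fv v1) v2

  (* point-wise local difference: correct only when both sets distribute
     their elements identically (the "two normal parallel sets" case) *)
  let diff (s1 : Local_Set.t par) (s2 : Local_Set.t par) : Local_Set.t par =
    apply2 (mkpar (fun _ a b -> Local_Set.diff a b)) s1 s2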
Load-Balancing (1)
• "Same sizes" for the local data structures;
• Better performance for parallel iterations;
• Load-balancing in 2 super-steps (M. Bamha and G. Hains) using a histogram.
Load-Balancing (2)
• Generic code of the algorithm:
[Diagram: the generic rebalance function: each processor selects the n messages to send from its local data, a histogram of the data distribution is built, the messages are exchanged, and each processor takes the union of the received messages with its remaining local data.]
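Below is a hedged sketch of such a two-super-step rebalancing, written for a parallel vector of lists with the primitives and the total helper sketched earlier; it follows the spirit of the histogram-based scheme, not the library's actual rebalance code:

  let rebalance (data : 'a list par) : 'a list par =
    let p = bsp_p () in
    (* super-step 1: exchange the local sizes, i.e. build the histogram *)
    let sizes = total (apply (mkpar (fun _ l -> List.length l)) data) in
    let n = List.fold_left (+) 0 sizes in
    let chunk = max 1 ((n + p - 1) / p) in        (* target block size per processor *)
    let offset i = List.fold_left (+) 0 (List.filteri (fun j _ -> j < i) sizes) in
    (* super-step 2: each processor sends every element to the owner of its global rank *)
    let to_send =
      apply (mkpar (fun i l dst ->
               List.filteri (fun k _ -> (offset i + k) / chunk = dst) l))
            data in
    let received = put to_send in
    apply (mkpar (fun _ recv -> List.concat (List.init p recv))) received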
Application
Outline:
• Introduction;
• BSML;
• Parallel data structures in BSML;
• Application;
• Conclusion and future works.
Computation of the "nth" nearest-neighbour atoms in a molecule
• Code from "Objective Caml for Scientists" (J. Harrop);
• Molecule modelled as an infinitely-repeated graph of atoms;
• Computation of set differences (the neighbours);
• Replace "fold" with "async_fold";
• Experiments with a silicate of 100,000 atoms and with a cluster of 5/10 machines (Pentium IV, 2.8 GHz, Gigabit Ethernet card).
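Schematically, the change described above amounts to replacing a sequential fold over a set of candidate atoms by the per-processor skeleton and summing the partial results. AtomSet and ParAtomSet are hypothetical instantiations of the sequential and parallel set functors, and it is assumed that the parallel set module exposes an async_fold analogous to the one shown for maps:

  (* sequential: count the atoms retained in the set s *)
  let count_seq s = AtomSet.fold (fun _ n -> n + 1) s 0

  (* parallel: local counts on every processor, then a global sum *)
  let count_par s =
    List.fold_left (+) 0 (total (ParAtomSet.async_fold (fun _ n -> n + 1) s 0))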
Conclusion and Future Works
Outline:
• Introduction;
• BSML;
• Parallel data structures in BSML;
• Application;
• Conclusion and future works.
Conclusion
• BSML = BSP + ML;
• Implementation of some data structures;
• Modular, for simple development and maintenance;
• Pure functional implementation;
• Cost prediction with the BSP model;
• Generic load-balancing;
• Application.
Future Works
• Proof of the implementations (pure functional);
• Implementation of other data structures (trees, priority lists, etc.);
• Application to other scientific problems;
• Comparison with other parallel MLs (OCamlP3L, HirondML, OCaml-Flight, MSPML, etc.);
• Development of a modular and parallel graph library:
  • Edges as parallel maps;
  • Vertices as parallel sets.