- Eddy Caron
Lego Team from GRAAL
• Anne Benoît (McF)
• Eddy Caron (McF)
• Frédéric Desprez (DR)
• Yves Caniou (McF)
• Raphaël Bolze (PhD)
• Pushpinder Kaur Chouhan (PhD)
• Jean-Sébastien Gay (PhD)
• Cedric Tedeschi (PhD)
E. Caron - Réunion de lancement LEGO (LEGO kickoff meeting) - 10/02/06
DIET Architecture
[Figure: DIET architecture diagram. Labels: Client; Master Agent (MA), with several MAs linked through JXTA; Local Agent (LA); server front end; FAST library (application modeling, system availabilities) with LDAP and NWS.]
Data Management Joint work with G. Antoniu, E. Caron, B. Del Fabbro, M. Jan
Data/replica management
[Figure: clients and two servers (Server 1, Server 2) exchanging data items A, B, F, G, X, Y]
• Two needs
  • Keep the data in place to reduce the communication overhead between clients and servers
  • Replicate data whenever possible
• Two approaches for DIET
  • DTM (LIFC, Besançon)
    • Hierarchy similar to DIET's
    • Distributed data manager
    • Redistribution between servers
  • JuxMem (Paris, Rennes)
    • P2P data cache
• NetSolve
  • IBP (Internet Backplane Protocol): data cache
  • Request sequencing to find data dependences
• Work done within the GridRPC Working Group (GGF)
  • Relations with workflow management
Data management with DTM within DIET
• Persistence at the server level
  • To avoid useless data transfers
    • Intermediate results (C, D)
    • Between clients and servers
    • Between servers
  • "Transparent" for the client
• Data Manager/Loc Manager
  • Hierarchy mapped onto the DIET one
  • Modularity
• Proposal to the GridRPC WG (GGF)
  • Data handles
  • Persistence flag
  • Data management functions
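The benefit of server-level persistence can be illustrated with a small sketch. It is not the DTM or GridRPC API: the class PersistentServer, its methods and the string handles are hypothetical, and scalars stand in for matrices. It only counts client/server transfers for the sequence C = A * B; D = E + C when the intermediate result C stays on the server.

```python
class PersistentServer:
    """Hypothetical server that keeps data between calls (not the DTM API)."""

    def __init__(self):
        self.store = {}      # data handle -> value kept on the server
        self.transfers = 0   # number of client <-> server data transfers

    def put(self, handle, value):
        """Client ships a value to the server: one transfer."""
        self.store[handle] = value
        self.transfers += 1

    def call(self, op, in_handles, out_handle):
        """Run op on data already stored; the result stays on the server."""
        args = [self.store[h] for h in in_handles]
        self.store[out_handle] = op(*args)
        return out_handle

    def get(self, handle):
        """Client retrieves a value from the server: one transfer."""
        self.transfers += 1
        return self.store[handle]


def run_sequence():
    """C = A * B; D = E + C, with C never leaving the server."""
    server = PersistentServer()
    server.put("A", 3)
    server.put("B", 4)
    server.put("E", 5)
    server.call(lambda a, b: a * b, ["A", "B"], "C")  # intermediate result
    server.call(lambda e, c: e + c, ["E", "C"], "D")
    d = server.get("D")
    return d, server.transfers
```

In this sketch the sequence costs four transfers (A, B, E up; D down). Without persistence, the same sequence would need six, because C would travel back to the client and up again.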
Performance (A = C * B)
[Figure: performance plot for the computation A = C * B]
Performance (C = A * B; D = E + C; A = tA)
[Figure: performance plot for the sequence C = A * B; D = E + C; A = tA]
JUXMEM
PARIS project, IRISA, France
• A peer-to-peer architecture for an in-memory data-sharing service
• Persistence and data coherency mechanism
• Transparent data localization
• Toolbox for the development of P2P applications
• Set of protocols
• One peer
  • Unique ID
  • Several communication protocols (TCP, HTTP, …)
[Figure: peers, each with a peer ID, communicating over TCP/IP and, across firewalls, over HTTP]
Visualization Work with Raphaël Bolze
VizDIET: A visualization tool
• Current view of the DIET platform
• A postmortem analysis from log files is available
• Good scalability
• We can show:
  • Communication between agents
  • State of SeD
  • Available services
  • Persistent data
  • Name information
  • CPU, memory and network load
LogService
• CORBA communications
• Message ordering and scheduling
• Message filtering
• System state
LogService & DIET
LogService components: LogManager (LM), LogCentral
Each LogManager receives information from its agent and sends it to LogCentral, outside the DIET structure.
VizDIET displays all messages from LogService graphically.
Messages are transferred from the agents through the LogManager.
No disk storage.
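The forwarding, ordering and filtering roles described for LogService can be sketched as follows. This is a minimal in-memory illustration, not the CORBA-based implementation: the class and method names mirror the slide's vocabulary, but their signatures, the numeric timestamps and the tag strings are assumptions.

```python
import heapq


class LogCentral:
    """Hypothetical central collector: orders and filters messages in memory."""

    def __init__(self):
        self._heap = []  # entries: (timestamp, arrival_no, component, tag, text)
        self._arrivals = 0

    def receive(self, timestamp, component, tag, text):
        # The arrival number breaks ties between equal timestamps.
        heapq.heappush(
            self._heap, (timestamp, self._arrivals, component, tag, text)
        )
        self._arrivals += 1

    def drain(self, wanted_tags=None):
        """Return buffered messages in timestamp order, keeping wanted tags."""
        ordered = []
        while self._heap:
            timestamp, _, component, tag, text = heapq.heappop(self._heap)
            if wanted_tags is None or tag in wanted_tags:
                ordered.append((timestamp, component, tag, text))
        return ordered


class LogManager:
    """Hypothetical per-agent forwarder: nothing is written to disk."""

    def __init__(self, component, central):
        self.component = component
        self.central = central

    def log(self, timestamp, tag, text):
        self.central.receive(timestamp, self.component, tag, text)
```

A viewer such as VizDIET would sit on the consumer side of drain(), subscribing only to the tags it needs.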
VizDIET v1.0
[Figure: diagram relating GoDIET, LogService, VizDIET and the distributed DIET deployment. The XML input lists: DIET agents, DIET servers, physical machines, physical storage.]
Screenshot: Platform visualization
Screenshots: Statistics module
Platform Deployment Work from E. Caron, P.-K. Chouhan and A. Legrand
GoDIET: A tool for automated DIET deployment
• Automates configuration, staging, execution and management of a distributed DIET platform
• Supports experiments at large scale
• Faster and easier bulk testing
• Reduces errors & debugging time for users
• Constraints:
  • Simple XML file
  • Console & batch mode
  • Integrates with visualization tools and CORBA tools
[written in Java]
DIET usage with contrib services
[Figure: workflow linking the XML description, GoDIET (distributed DIET deployment, DIET administration), LogService (traces) and VizDIET (subset of traces); GoDIET also receives a subset of traces.]
Launch process
• GoDIET follows the DIET hierarchy in launch order
• For each element to be launched:
  • Configuration file written to local disk [including parent agent, naming service location, hostname and/or port endpoint…]
  • Configuration file staged to remote disk (scp)
  • Remote command launched (ssh) [PID retrieved, stdout & stderr saved on request]
  • Feedback from LogCentral used to time the launch of the next element
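The launch order can be sketched as a top-down traversal of the hierarchy that prepares one staging command and one remote command per element. This is an illustration only, not GoDIET's code: the breadth-first order, the /tmp staging path, the .cfg file names and the launch_element command are assumptions, and the commands are built as argument lists rather than executed.

```python
from collections import deque


def launch_plan(hierarchy, root, hosts):
    """Return launch steps so that every parent precedes its children.

    hierarchy maps an element name to the list of its children;
    hosts maps an element name to the machine it runs on.
    """
    plan = []
    queue = deque([(root, None)])
    while queue:
        name, parent = queue.popleft()
        host = hosts[name]
        config = f"{name}.cfg"  # hypothetical per-element configuration file
        # Stage the configuration file on the remote disk.
        stage_cmd = ["scp", config, f"{host}:/tmp/{config}"]
        # Start the element remotely; launch_element is a placeholder name.
        run_cmd = ["ssh", host, f"launch_element /tmp/{config}"]
        plan.append(
            {"name": name, "parent": parent, "stage": stage_cmd, "run": run_cmd}
        )
        for child in hierarchy.get(name, []):
            queue.append((child, name))
    return plan
```

A real launcher would run each step in turn and wait for the LogCentral feedback mentioned above before moving to the next element.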
GoDIET Console
• java -jar GoDIET.jar vthd4site.xml
GoDIET: before launch
GoDIET: after launch
• 27 sec launch, including waiting for feedback
Grid’5000 DIET deployment
• 7 sites / 8 clusters
  • Bordeaux, Lille, Lyon, Orsay, Rennes, Sophia, Toulouse
• 1 MA
• 8 LA
• 574 SeD
Scheduling Work with Alan Su, Peter Frauenkron, Eric Boix
The scheduling
• Plug-in scheduler
• Round robin as default scheduling
• Advanced scheduling is only possible with more information.
• Existing schedulers in DIET use data from FAST and/or NWS.
• Limitations:
  • Deploying appropriate hierarchies for a given grid platform is non-obvious
  • Limited consideration of inter-task factors
  • Non-standard application- and platform-specific performance measures
  • FAST, NWS: low availability, SeD idles, for NWS no default weighting difficult (possible?)
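Round robin, the default policy named above, needs no performance information at all. A minimal sketch follows; the class name and the server names are hypothetical and this is not DIET's implementation.

```python
from itertools import cycle


class RoundRobinScheduler:
    """Hand out servers in a fixed rotating order, ignoring their load."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._rotation = cycle(servers)

    def pick(self):
        """Return the next server in the rotation."""
        return next(self._rotation)
```

Because the policy ignores load and speed, it works everywhere but cannot favor an idle or faster SeD, which is why the slide ties advanced scheduling to extra information.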
Plugin Scheduling
• Plugin scheduling facilities to enable
  • application-specific definitions of appropriate performance metrics
  • an extensible measurement system
  • tunable comparison/aggregation routines for scheduling
• Composite requirements enable various selection methods
  • basic resource availability
  • processor speed, memory
  • database contention
  • future requests
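The idea of an application-defined metric combined with a tunable aggregation step can be sketched as below. This is not DIET's plugin scheduler interface: PluginScheduler, free_capacity and the estimate fields free_cpu and free_mem_mb are all hypothetical names chosen for the illustration.

```python
class PluginScheduler:
    """Rank servers with a metric supplied by the application."""

    def __init__(self, metric, prefer_larger=True):
        self.metric = metric                # maps one estimate to a sortable key
        self.prefer_larger = prefer_larger  # tunable comparison direction

    def aggregate(self, estimates):
        """Return server names sorted from best to worst."""
        return sorted(
            estimates,
            key=lambda server: self.metric(estimates[server]),
            reverse=self.prefer_larger,
        )


def free_capacity(estimate):
    """Example composite metric: free CPU share first, then free memory."""
    return (estimate["free_cpu"], estimate["free_mem_mb"])
```

Swapping the metric function changes the selection method without touching the aggregation code, which is the point of the plugin approach.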
CoRI
[Figure: CoRI with its modules: CoRI-Easy, FAST, other]
• Collector: an easy interface for gathering performance and load information about a specific SeD.
• Two modules (currently): CoRI-Easy and FAST
• Possible to extend (new modules): Ganglia, Nagios, R-GMA, Hawkeye, INCA, MDS, …
• Uses fast and basic functions or simple performance tests.
• Keeps DIET independent.
• Able to run on "all" operating systems, to allow default scheduling with basic information.
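A collector with interchangeable modules and a portable fallback can be sketched as follows. This is not the CoRI API: Collector, easy_module and the returned field names are hypothetical. The "easy" module uses only standard-library calls, and the load average is reported only where the operating system provides it.

```python
import os


def easy_module():
    """Basic, portable measurements in the spirit of a default module."""
    info = {"cpu_count": os.cpu_count() or 1}
    if hasattr(os, "getloadavg"):  # not available on every platform
        try:
            info["load_1min"] = os.getloadavg()[0]
        except OSError:
            pass  # load average unobtainable: report only what we have
    return info


class Collector:
    """Single entry point that dispatches to a named measurement module."""

    def __init__(self):
        self._modules = {"easy": easy_module}

    def register(self, name, module):
        """Add a module: any callable returning a dict of measurements."""
        self._modules[name] = module

    def collect(self, name="easy"):
        """Use the requested module, falling back to the basic one."""
        module = self._modules.get(name, self._modules["easy"])
        return module()
```

A new source (for instance a wrapper around an external monitoring tool) would be added through register(), leaving the scheduler's call to collect() unchanged.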
Batch and parallel submissions Work with Yves Caniou
Difficulties of the problem
[Figure: agents above three SeD types: SeD_seq, SeD_batch, SeD_parallel]
• Several SeD types
• Parallel or sequential jobs
• Submit a parallel job (pdgemm, ...)
• Transparent for the user
• General API
SeD_parallel
[Figure: agents, a SeD_parallel on the front end, NFS]
• SeD_parallel on the front end
• Submit a parallel job → system dependent
  • NFS: copy the code?
  • MPI: LAM, MPICH?
  • Reservation?
  • Monitoring & performance prediction
SeD_batch
[Figure: agents and a SeD_batch. Labels: GLUE, OAR, SGE, LSF, PBS, Condor, LoadLeveler]
• SeD_batch on the front end
• Submit a parallel job → even more system dependent
  • Previously mentioned problems
  • Numerous batch systems → homogenization?
  • Batch scheduler behavior → queues, scripts, etc.
Batch & parallel submissions
• Asynchronous, long-term production jobs
• Still more problems
  • System dependent, numerous batch systems and their behavior
  • Performance prediction!
    → Application makespan as a function of #proc?
    → If reservation is available, how to compute the deadline?
  • Scheduling problems
    → Do we reserve when probing? How long do we hold the reservation?
    → How to manage data transfers while waiting in the queue?
  • Co-scheduling?
  • Data & job migration?
Future work
• LEGO applications with DIET
  • CRAL (RAMSES)
  • CERFACS
  • TLSE (Update)
• Components and DIET
  • Which architecture?
• Deployment
  • Link between ADAGE and theoretical solution on cluster [IJHPCA06]?
  • Anne Benoît's approach
• …
Questions ? http://graal.ens-lyon.fr/DIET