Experiences through Grid Challenge Event
Yoshio Tanaka
Grid Challenge • A competition for programming on a Grid • Main objectives • For participants: • To provide opportunities to use a real Grid • For us: • To understand the obstacles/problems in making a Grid production-level (1000 CPUs shared by many users) • To have an opportunity to encourage participants to use our software (e.g. Ninf-G, GXP) • 30 students/graduates participated in this event • A 960-CPU testbed was provided for participants • Schedule • Preliminary: Feb. 1 ~ Feb. 28 • Final round: Mar. 5 ~ Mar. 20
Grid Challenge • Two categories • Regular routine • A problem is provided • Graphic image analysis • Count the number of objects • Ranked by performance, i.e. which program is the fastest? • Free routine • Participants can do anything interesting • Participants gain experience running their own software on a real Grid
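The slide does not give the image format or the exact definition of an "object" used in the contest. As an illustration only, the sketch below assumes a binary image stored as a 2D list and treats each 4-connected region of nonzero pixels as one object. The name `count_objects` and the sample image are hypothetical; this is a sequential baseline, not the contest's reference solution.

```python
from collections import deque


def count_objects(grid):
    """Count 4-connected regions of nonzero cells in a 2D list (assumed format)."""
    if not grid:
        return 0
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c] or seen[r][c]:
                continue
            # Found an unvisited foreground pixel: it starts a new object.
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            # Breadth-first flood fill marks every pixel of this object.
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


# Hypothetical sample image: 1 = foreground, 0 = background.
image = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
]
print(count_objects(image))
```

A Grid version would typically split the image into tiles, count per tile on separate nodes, and merge objects that cross tile borders; that merge step is where most of the design effort goes.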
Software • Software provided by the organizer • ssh • GXP • GT2 & batch & jobmanager • MPICH (p4) • Ninf-G2 • Other software can be installed by participants
Preparation (~ Feb. 1) • Administrators installed software at every site • Participants sent their ssh public keys • Administrators created accounts for all participants • Participants tested each cluster • login • compile • test run • Participants obtained Globus certificates from the AIST GTRC CA (if necessary) • Participants sent their Subject DNs and administrators added the corresponding entries to the grid-mapfile
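The last step above maps each certificate Subject DN to a local account. A grid-mapfile line consists of the quoted DN followed by the local user name. The sketch below only formats such lines; the DN and the account name are made-up placeholders, and `gridmap_entry` is a hypothetical helper, not part of Globus.

```python
def gridmap_entry(subject_dn, local_user):
    """Format one grid-mapfile line: the quoted Subject DN, then the local account."""
    if '"' in subject_dn:
        raise ValueError("Subject DN must not contain a double quote")
    if not local_user or any(ch.isspace() for ch in local_user):
        raise ValueError("local user name must be a single non-empty token")
    return f'"{subject_dn}" {local_user}'


# Placeholder values for illustration only.
example_dn = "/C=JP/O=Example/OU=Grid/CN=Participant One"
print(gridmap_entry(example_dn, "gc001"))
```

Validating the input before writing the file helps because a malformed line is usually noticed only later, when authentication fails for that participant.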
Preparation (~ Feb. 1) (cont’d) • AIST provided • A document for obtaining a Globus certificate • Test script for Globus • A “how-to” document and sample programs of Ninf-G2 • How to develop Ninf-G apps step-by-step • Obtain certificate • Test Globus • Develop and run Ninf-G apps • Client configuration file for the Grid Challenge environment
Problems • 30 participants shared 960 CPUs for one month • Some used ssh for process invocation • Some used GXP for process invocation • Some used Ninf-G2 for process invocation • A lot of troubleshooting was needed • Some nodes went down • The PBS daemon died • Students usually ran experiments around midnight • Interactive use of backend nodes (via ssh/GXP) was allowed • F32 prohibits interactive use • AIST could not provide F32
Problems (cont’d) • Participants expected that all processes would be launched immediately (co-allocation) • ssh/GXP enables it • With Ninf-G2, it could not be expected • To keep things fair, we decided to change the configuration of the batch queuing system • For each processor, set the max number of processes per user to 1 • Increased the max number of processes per processor to the number of participants (30) • This is an unusual configuration!!
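The effect of the two limits can be modeled in a few lines. This is a toy admission check, not the actual batch system configuration syntax; `can_start` and the constant names are assumptions made for illustration.

```python
from collections import Counter

MAX_PER_USER = 1        # processes one user may run on a given processor
MAX_PER_PROCESSOR = 30  # total processes per processor (= number of participants)


def can_start(running_users, user):
    """Return True if `user` may start another process on this processor.

    `running_users` lists the owner of each process already running there.
    """
    per_user = Counter(running_users)
    return (per_user[user] < MAX_PER_USER
            and len(running_users) < MAX_PER_PROCESSOR)


# All 30 participants can be admitted on the same processor at once...
running = []
for i in range(30):
    name = f"user{i:02d}"
    if can_start(running, name):
        running.append(name)
print(len(running))

# ...but nobody can place a second process on it.
print(can_start(running, "user00"))
```

With these limits a batch job never waits behind another participant's job, so batch submissions start immediately, as ssh/GXP launches do. The cost is that processors are time-shared rather than dedicated.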
Insights valuable for PRAGMA • Mixing batch and interactive use introduces a problem • Batch is expected to provide • a dedicated environment • load balancing • Interactive use (via ssh) may disturb batch • But some middleware/apps require interactive use • Co-allocation / grid-level scheduling is a hard problem • (Basically) applications should not expect that all resources are available • Application developers need to do extra work for this feature • Possible solutions • Make applications capable of using only the resources that are available (as-is strategy) • Implement co-allocation based on reservation • No grid-level reservation system yet • Should be done manually • Do we have the same problem in PRAGMA routine-basis experiments?
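One way to read the "as-is" strategy is: probe the resources first, then spread the work over whichever ones respond instead of waiting for all of them. The sketch below assumes a caller-supplied `is_available` probe (hypothetical; in practice it might be a test job or an ssh check) and distributes tasks round-robin over the usable hosts.

```python
def assign_tasks(tasks, hosts, is_available):
    """Distribute tasks round-robin over the hosts that are currently usable."""
    usable = [h for h in hosts if is_available(h)]
    if not usable:
        raise RuntimeError("no resources available")
    plan = {h: [] for h in usable}
    for i, task in enumerate(tasks):
        plan[usable[i % len(usable)]].append(task)
    return plan


# Hypothetical cluster names; "siteB" is treated as down in this example.
down = {"siteB"}
plan = assign_tasks(
    tasks=list(range(5)),
    hosts=["siteA", "siteB", "siteC"],
    is_available=lambda host: host not in down,
)
print(plan)
```

This avoids the need for co-allocation at start-up, but it does not cover resources that fail or appear during the run; handling that is the extra work for application developers mentioned above.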