This presentation explores the complexities of grid initiatives in Hungary, focusing on the ClusterGrid project. It addresses significant challenges in a production environment, including the need for clear objectives, system transparency, security, compatibility, and user support. The discussion outlines management issues related to computing resources, storage, and user engagement, as well as monitoring strategies. A look into future challenges highlights the increasing demand for reliable computing and data storage infrastructures, emphasizing the importance of interoperability and international standards in grid systems.
Production Grid Challenges in Hungary. Péter Stefán, Ferenc Szalai, Gábor Vitéz (NIIF/HUNGARNET)
Agenda • Brief introduction • Grid initiatives - ClusterGrid • Challenges in a production environment • Generic ClusterGrid operation model • Management issues • User support • Monitoring • ClusterGrid future challenges • Conclusions
Brief NIIF Introduction • Hungarian NREN. • Collaborative infrastructure: videoconference, central HA cluster. • Computing infrastructure: grid, supercomputing. • Middleware infrastructure: VPNs, VoIP, directory service. • Networking infrastructure: IP, IPv6, MPLS, lambda, etc. • 10G backbone, ~600,000 users, ~750 institutions.
Supercomputers • The installation consists of two Sun E15Ks and two Sun 10Ks located at two universities, comprising 276 CPUs and 300 GB of memory. • Formerly listed in the TOP500. • In production since 2001. • Serves more than 200 users and 100 scientific projects.
Hungarian grid initiatives, MGKK • Hungarian grid initiatives can be classified into grid infrastructure projects and grid system development projects. • The key players have formed a grid collaboration, the Hungarian Grid Competence Center (MGKK), involving BUTE, ELUB, MTA-SZTAKI, NIIF/HUNGARNET, KFKI, and the University of Veszprém. • Intensive participation in many national and European grid initiatives: EGEE, NorduGrid, SEE-GRID, etc.
ClusterGrid initiative • A pool of 1400 PC nodes throughout the country, organized into more than 26 clusters. • Production infrastructure since July 2002. • Supercomputer clusters are planned to be involved too. • Total compute capacity is roughly 600 Gflops. • Although it is much smaller than regional or continental grids, it is comparable to them in complexity.
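As a rough back-of-the-envelope check on the figures above, the following sketch derives the average capacity per node and an upper bound on the average cluster size. Only the three input numbers come from the slide; the derived values are estimates, and since the slide says "more than 26 clusters", 26 is treated as a lower bound.

```python
# Figures quoted on the slide; everything derived from them is an estimate.
TOTAL_GFLOPS = 600.0  # "roughly 600 Gflops"
NODES = 1400          # "1400 PC nodes"
MIN_CLUSTERS = 26     # "more than 26 clusters", so this is a lower bound

# Average compute capacity contributed by a single PC node.
gflops_per_node = TOTAL_GFLOPS / NODES

# With at least 26 clusters, the average cluster cannot be larger than this.
max_avg_nodes_per_cluster = NODES / MIN_CLUSTERS

print(f"about {gflops_per_node:.2f} Gflops per node")
print(f"at most about {max_avg_nodes_per_cluster:.0f} nodes per cluster on average")
```

In other words, the grid is built from many small desktop-class machines rather than a few large ones, which is consistent with the slide's point that the complexity lies in operation rather than raw size.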
Challenges in a production environment • Grid definition - set clear objectives for what to build • Simplicity - keep the system transparent and usable • Completeness - cover more than just the application level • Security - using computer networking methods (MPLS, VLAN technologies) • Compatibility - with other grids (X509, LDAP) • Manageability - easy to maintain • Robustness - fault-tolerant behavior • Usability - cover many job classes, user support • Platform independence - to be able to execute on MS
Some new ideas… • MPLS, VLAN connected resources • Web-transaction based resource broker • Dynamic, separated run-time environment
Challenges in production … cont’d … • Management of: physical compute resources (supercomputers, clusters), virtual resources (virtual clusters), storage nodes, users, services • User support • Grid architecture monitoring
Storage management • Low-level management of disks, volumes, and file systems (cost-efficient storage solutions using ATA over Ethernet - AoE). • Medium-level access management (GridFTP, FTPS). • High-level data brokering (extended SRM model).
User management • Users' personal data is kept in an LDAP-based directory service, separately from authentication data. • Registration is aided by a web interface. • Authentication: X509 certificates, LDAP-based authentication. • No authorization yet.
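The separation of personal data from authentication data described above can be illustrated with a minimal sketch. This is not the ClusterGrid implementation: in-memory dictionaries stand in for the LDAP directory, and all class, field, and function names (`Profile`, `Credentials`, `UserRegistry`, and so on) are hypothetical. The password check is only a stand-in for LDAP-based authentication, and the certificate check compares subject names rather than validating a real X509 chain.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass


@dataclass
class Profile:
    """Personal data only; no secrets are stored here."""
    full_name: str
    institution: str


@dataclass
class Credentials:
    """Authentication data only (hypothetical layout)."""
    cert_subject: str     # subject DN of the user's registered X509 certificate
    salt: bytes
    password_hash: bytes  # stand-in for an LDAP-based password check


def hash_password(password: str, salt: bytes) -> bytes:
    # Salted, iterated hash so the plain password is never stored.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


class UserRegistry:
    """Keeps profiles and credentials in two separate stores keyed by user id."""

    def __init__(self):
        self.profiles = {}     # uid -> Profile
        self.credentials = {}  # uid -> Credentials

    def register(self, uid, profile, cert_subject, password):
        salt = os.urandom(16)
        self.profiles[uid] = profile
        self.credentials[uid] = Credentials(
            cert_subject, salt, hash_password(password, salt)
        )

    def authenticate_by_cert(self, uid, presented_subject):
        cred = self.credentials.get(uid)
        if cred is None:
            return False
        return hmac.compare_digest(
            cred.cert_subject.encode("utf-8"), presented_subject.encode("utf-8")
        )

    def authenticate_by_password(self, uid, password):
        cred = self.credentials.get(uid)
        if cred is None:
            return False
        return hmac.compare_digest(
            cred.password_hash, hash_password(password, cred.salt)
        )


registry = UserRegistry()
registry.register(
    "jdoe",
    Profile("Jane Doe", "Example University"),
    cert_subject="CN=Jane Doe,O=Example University",
    password="correct horse",
)
```

Keeping the two stores apart means that a service which only needs to display or search personal data never has to read the credential store.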
Service management (experimental) • Relatively new direction. • It is a special service. • It is based on well-established authorization. • Basically helps to start, stop, (re)configure grid services.
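A minimal sketch of what such a management service might look like, assuming a simple operator list as the authorization mechanism. The slide does not describe the actual design, so the `ServiceManager` class, its methods, and the service name used below are all hypothetical.

```python
from enum import Enum


class State(Enum):
    STOPPED = "stopped"
    RUNNING = "running"


class NotAuthorized(Exception):
    """Raised when a user who is not an operator requests an action."""


class ServiceManager:
    """Starts, stops, and reconfigures named services for authorized operators."""

    def __init__(self, operators):
        self.operators = set(operators)  # users allowed to manage services
        self.state = {}                  # service name -> State
        self.config = {}                 # service name -> dict of settings

    def _check(self, user):
        # Every management action is gated by the same authorization check.
        if user not in self.operators:
            raise NotAuthorized(user)

    def start(self, user, service):
        self._check(user)
        self.state[service] = State.RUNNING

    def stop(self, user, service):
        self._check(user)
        self.state[service] = State.STOPPED

    def reconfigure(self, user, service, **settings):
        self._check(user)
        self.config.setdefault(service, {}).update(settings)


manager = ServiceManager(operators={"admin"})
manager.start("admin", "broker")
manager.reconfigure("admin", "broker", max_jobs=50)

try:
    manager.stop("guest", "broker")  # rejected: "guest" is not an operator
    rejected = False
except NotAuthorized:
    rejected = True
```

The point of the sketch is only that every start, stop, or reconfigure request passes through one authorization check before it changes any state.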
User support • The grid service provider gives user support covering: consultation about the benefits of grid usage, code porting and optimization, partial aid in code implementation, job formation and execution, generic grid usage. • Not yet covered: model creation, formal description, algorithm creation.
ClusterGrid monitoring • Grid cluster resources fluctuate between day-shift and night-shift operation. • Chart legend: blue line is total resources, green area is occupied resources. • 2-layer hierarchical monitoring system.
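The two-layer idea can be sketched as follows, assuming that the first layer summarizes node status per cluster and the second layer aggregates those summaries into the grid-wide "total" and "occupied" figures shown on the chart. The data structures, function names, and sample data are hypothetical; the slide does not specify how the real monitoring system is built.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    name: str
    online: bool  # nodes may only join the grid during the night shift
    busy: bool    # True if the node is currently running a grid job


@dataclass
class ClusterReport:
    cluster: str
    total: int     # nodes currently available to the grid
    occupied: int  # available nodes that are running jobs


def summarize_cluster(cluster, nodes):
    """Layer 1: each cluster reduces its node list to a single report."""
    online = [n for n in nodes if n.online]
    return ClusterReport(cluster, len(online), sum(1 for n in online if n.busy))


def summarize_grid(reports):
    """Layer 2: the grid level only sees cluster reports, never single nodes."""
    total = sum(r.total for r in reports)
    occupied = sum(r.occupied for r in reports)
    utilization = occupied / total if total else 0.0
    return total, occupied, utilization


# Hypothetical sample: two small clusters during a night shift.
cluster_a = [
    NodeStatus("a1", online=True, busy=True),
    NodeStatus("a2", online=True, busy=False),
    NodeStatus("a3", online=False, busy=False),
]
cluster_b = [
    NodeStatus("b1", online=True, busy=True),
    NodeStatus("b2", online=True, busy=True),
]

reports = [summarize_cluster("A", cluster_a), summarize_cluster("B", cluster_b)]
grid_total, grid_occupied, grid_utilization = summarize_grid(reports)
```

Sampling `summarize_grid` at regular intervals would yield the two time series plotted on the chart: the total follows nodes going online and offline, and the occupied count follows the job load.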
Future ClusterGrid (?) challenges • Continuously growing demand for reliable compute and data storage infrastructure. • Grid systems should conform to international standards and MUST interoperate with one another. • Platform independence is not an issue yet, but will be. • LEGO-based principles are of increasing importance. • Threats: solutions that prevent development; erosion of the belief in the power of “grid”.
Conclusions • One of the first production-level grids has been shown in a nutshell. • Special emphasis was placed on operation, management, and user support issues. • Management generally covers grid resource management, grid user management, and monitoring. • Some remarks regarding future development were also made.
Thanks for your attention! www.clustergrid.hu www.mgkk.hu grid-tech@niif.hu