
Optimizing of data access using replication technique

Renata Słota (1), Darin Nikolow (1), Łukasz Skitał (2), Jacek Kitowski (1,2). (1) Institute of Computer Science AGH-UST, Cracow; (2) ACC CYFRONET AGH, Cracow.


Presentation Transcript


  1. Optimizing of data access using replication technique • Renata Słota (1), Darin Nikolow (1), Łukasz Skitał (2), Jacek Kitowski (1,2) • (1) Institute of Computer Science AGH-UST, Cracow • (2) ACC CYFRONET AGH, Cracow

  2. Agenda • Motivation of the work • Why does today's grid computing need replication? • Replication basics • Clusterix Data Management System • Architecture, optimization and replication algorithms • Optimization Example • Replication Example • Summary, conclusions

  3. Site-level vs. Grid-level replication • Site-level replication • Replicas in one site • Implementation examples: • RAID • HSM • Grid-level replication • Data management systems • Replicas spread over many sites

  4. Motivation of the work: Why does today's grid computing need replication? • Data protection and availability • Failure of one storage system does not affect the data itself; only performance is affected • Performance • Low-level optimization and replication (RAID, HSM) are not sufficient • Limited network bandwidth • Limited storage performance

  5. Replication scenarios • Static replication • Decision made by the system administrator or user • Limited system support: replica selection, replica coherency, replica ordering • Dynamic replication • Decision made by a dedicated grid component based on the users' current data access patterns • Full system support

  6. Replication consequences • Optimal replica selection algorithm • Replica creation and removal algorithm • Cost of replica creation, update and storage • Replica coherency

  7. Clusterix: National Cluster of Linux Systems • Project aim: • To develop a set of tools and procedures for building a productive Grid environment based on local PC clusters spread across independent supercomputing centers • Network layer: • Pionier – Polish optical networks

  8. Clusterix Data Management System: Architecture

  9. Optimization Algorithm • Selects the optimal storage element for: • data access • replica creation • Takes into account the current state of the system • The optimal storage element is the one with the maximal weight W(s,d): W(s,d) = min((1 - NetLoad(s)) * bandwidth(s,d), (1 - Sload(s)) * Sbandwidth(s)) • s – storage element • d – destination node • NetLoad(s) – network interface load of s • bandwidth(s,d) – available bandwidth between s and d • Sload(s) – storage system load • Sbandwidth(s) – storage system bandwidth
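
The selection rule on this slide can be sketched in a few lines of Python. This is only an illustration of the formula, not the CDMS Optimizer itself (which slide 25 says is written in C/C++): the `StorageElement` class, the function names, the units and all numeric values are hypothetical, and loads are assumed to be fractions between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class StorageElement:
    """Hypothetical container for the per-element inputs of W(s,d)."""
    name: str
    net_load: float           # NetLoad(s), assumed to be a fraction in [0, 1]
    storage_load: float       # Sload(s), assumed to be a fraction in [0, 1]
    storage_bandwidth: float  # Sbandwidth(s), e.g. in MB/s


def weight(s: StorageElement, link_bandwidth: float) -> float:
    """W(s,d): the smaller of the free network and free storage throughput."""
    network_term = (1.0 - s.net_load) * link_bandwidth
    storage_term = (1.0 - s.storage_load) * s.storage_bandwidth
    return min(network_term, storage_term)


def select_storage_element(candidates, bandwidth_to_destination):
    """Return the candidate with the maximal weight for one destination node.

    bandwidth_to_destination maps an element name to bandwidth(s,d).
    """
    return max(candidates,
               key=lambda s: weight(s, bandwidth_to_destination[s.name]))


# Made-up measurements for two storage elements and one destination node.
elements = [
    StorageElement("SE1", net_load=0.5, storage_load=0.2, storage_bandwidth=80.0),
    StorageElement("SE2", net_load=0.1, storage_load=0.5, storage_bandwidth=120.0),
]
bandwidth_to_node_a = {"SE1": 100.0, "SE2": 40.0}
best = select_storage_element(elements, bandwidth_to_node_a)
print(best.name)
```

Taking the minimum of the two terms models the slower of the network path and the storage system as the bottleneck, so a fast disk behind a saturated link does not score well.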

  10. Automatic replication algorithm • Takes into account the gain from replication G(), the cost of replica creation C(), the cost of replica updates U() and the administrative factor A() • Replication profit: P(d,R,S,f) = G(d,R,S,f) + C(d,R,f) + U(d,R,S,f) + A(d,f) • d – storage element for which the profit is computed • R – set of storage elements containing replicas of f • S – statistical data (history of file usage) • f – considered file
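
A minimal sketch of how the profit could be evaluated for several candidate storage elements follows. The slide does not define G, C, U or A, so the sketch takes already-evaluated values as input; the function names and numbers are hypothetical. The slide adds all four terms, so this sketch assumes that costs are supplied as negative contributions, which is an assumption and not something the slide states.

```python
def replication_profit(gain, creation_cost, update_cost, admin_factor):
    """P = G + C + U + A, summed exactly as written on the slide.

    Assumption: cost terms are passed in as negative numbers.
    """
    return gain + creation_cost + update_cost + admin_factor


def best_replica_target(terms_by_element):
    """Pick the storage element with the highest profit.

    terms_by_element maps an element name to an evaluated (G, C, U, A) tuple.
    """
    profits = {name: replication_profit(*terms)
               for name, terms in terms_by_element.items()}
    winner = max(profits, key=profits.get)
    return winner, profits[winner]


# Made-up term values for two elements that do not yet hold a replica of f.
candidates = {
    "SE3": (10.0, -4.0, -1.0, 0.0),
    "SE4": (12.0, -3.0, -1.0, 1.0),
}
best, profit = best_replica_target(candidates)
print(best, profit)
```

Whether a replica is actually created would presumably also depend on a threshold or policy, which the slide does not specify and the sketch leaves out.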

  11. Storage-oriented problems: data-intensive applications for Clusterix • Simulation of transonic flow past wing tips • Visualization of complex multidimensional structures • Ecosystem modeling and simulation

  12. Optimization Example • Node A needs file F, which is stored on SE1, SE2 and SE3 [Diagram: Node A; CDMS with Optimizer; storage elements SE1, SE2 and SE3, each holding F; NMS and JIMS monitors]

  13. Optimization Example • Node A sends a request to CDMS [Diagram: same layout as slide 12]

  14. Optimization Example • CDMS uses the Optimizer to choose the optimal SE [Diagram: same layout as slide 12]

  15. Optimization Example • Optimizer is working… • W(s1,d) = min((1 - NetLoad(s1)) * bandwidth(s1,d), (1 - Sload(s1)) * Sbandwidth(s1)) • W(s2,d) = min((1 - NetLoad(s2)) * bandwidth(s2,d), (1 - Sload(s2)) * Sbandwidth(s2)) • W(s3,d) = min((1 - NetLoad(s3)) * bandwidth(s3,d), (1 - Sload(s3)) * Sbandwidth(s3)) [Diagram: same layout as slide 12]
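
To make the three weights concrete, here is a worked instance with purely hypothetical measurements (loads as fractions, bandwidths in MB/s); none of these numbers come from the presentation.

```latex
\begin{align*}
W(s_1,d) &= \min\big((1-0.5)\cdot 100,\ (1-0.2)\cdot 80\big)  = \min(50,\ 64) = 50\\
W(s_2,d) &= \min\big((1-0.1)\cdot 40,\ (1-0.5)\cdot 120\big) = \min(36,\ 60) = 36\\
W(s_3,d) &= \min\big((1-0.2)\cdot 100,\ (1-0.7)\cdot 200\big) = \min(80,\ 60) = 60
\end{align*}
```

With these assumed values SE3 has the maximal weight and would be chosen, even though its storage system is the most heavily loaded, because its remaining storage throughput is still the highest bottleneck value of the three.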

  16. Automatic replication example: Situation • 3 clusters • 4 storage elements • 2 contain a replica of F • A set of applications running on these clusters accesses file F [Diagram: storage elements SE1 to SE4 with replicas of F]

  17. Automatic replication example [Diagram: CDMS with Optimizer, Replication Module and Statistic Module; storage elements SE1 to SE4; inputs labeled Gain, Cost of rep., Cost of update and Adm. factor; status labels "Sleeping…" and "Working…"]

  18. Automatic replication example • Decision: SE2 SE4 [Diagram: same components as slide 17; status labels "Working…" and "Sleeping…"; copies of F shown at the storage elements]

  19. Automatic replication example [Diagram: same components as slide 17; status label "Sleeping…"]

  20. Summary • The architecture of CDMS with Optimization and Replication modules has been designed • Replication and optimization algorithms have been specified • Module interfaces have been specified • Future work: integration and tests

  21. Conclusions • Simulation of replication vs. real system implementation • Replication should be designed to match the profile of specific Clusterix applications • Data availability • Replication drawbacks

  22. Publications • Extended functionality of Virtual Storage System for grid. Renata Słota, Darin Nikolow, Łukasz Skitał, Jacek Kitowski. Cracow Grid Workshop 2004, poster no. 13 • Application of data replication methods in Clusterix project (in Polish). Renata Słota, Darin Nikolow, Łukasz Skitał, Jacek Kitowski. Pionier 2004, 19-20 May, Poznań, electronic publication • Implementation of replication methods in the Grid Environment. Renata Słota, Darin Nikolow, Łukasz Skitał, Jacek Kitowski. Submitted to European Grid Conference

  23. Thank you!

  24. Clusterix Data Management System: Architecture • Replication Module • Responsible for: • Automatic replica creation/removal • Implementation: • Java • Apache SOAP • Cooperates with: • Optimization Module • Statistic Module

  25. Clusterix Data Management System: Architecture • Optimization Module • Responsible for: • storage element selection for a newly created replica • optimal replica selection • Implementation: • C/C++ • gSOAP • Cooperates with: • Network Monitoring System (NMS) • Information System • JMX-based Infrastructure Monitoring System (JIMS)

  26. Clusterix Data Management System: Architecture • Information System (JIMS) • Department of Computer Science, AGH University of Science & Technology • Provides the following information for a selected node: • Available storage capacity • Total storage capacity • Network interface load • Network interface bandwidth • Storage system load • Average storage system load • Maximal measured storage bandwidth

  27. Clusterix Data Management System: Architecture • Network Monitoring System • Poznan Supercomputing and Networking Center • Provides the following information: • Maximum bandwidth between two network nodes • Current load between two network nodes • Node availability

  28. Clusterix Data Management System: Architecture • Statistic Module • Białystok Technical University • Responsible for gathering information about past data usage
