Cloud Computing: What, why, how? Noam Bercovici, Renata Dividino
Motivation • Count how frequently each word appears in the MEDLINE corpus (18 million texts)
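To make the task concrete, here is a minimal single-machine sketch of the word-frequency count. The function name `count_words`, the regular-expression tokenizer, and the two sample sentences are illustrative assumptions, not taken from the slides; a real corpus would need a proper tokenizer and would not fit in memory this way.

```python
import re
from collections import Counter

def count_words(texts):
    """Count how often each word appears across an iterable of texts."""
    counts = Counter()
    for text in texts:
        # Lowercase, then treat every run of letters/digits as one word.
        counts.update(re.findall(r"[a-z0-9]+", text.lower()))
    return counts

# Made-up sample texts standing in for corpus entries.
sample = ["Aspirin reduces fever.", "Fever and aspirin."]
counts = count_words(sample)
print(counts["aspirin"], counts["fever"], counts["and"])
```

On one machine this loop is bounded by a single CPU and disk, which is the limitation the following slides address.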
Motivation • I want to extend my research to another corpus • Need more computing resources
Agenda • Introduction • Data Grid vs. Computing Grid • Grid Computing • Cloud Computing • Data Grid (Hadoop File System) • Computing Grid (MapReduce) • Conclusion
Data Grid vs. Computing Grid Grid Computing • Data Grid: • distributed data storage • controlled sharing and management of large amounts of distributed data. • Computing Grid: • Parallel execution • divide pieces of a program among several computers • Data Grid + Computing Grid
Grid Computing • [Diagram: "The Grid", in which a Master hands a Task to Slaves]
Grid Computing • Motivation: high performance, improved resource utilization • Aims to create the illusion of a simple yet powerful computer out of a large number of heterogeneous systems • Tasks are submitted and distributed to nodes in the grid
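The master/slave idea can be sketched as follows. This is only an illustration under stated assumptions: a local thread pool stands in for the grid nodes, and the names `task` and `master`, the chunk size, and the summing workload are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def task(chunk):
    """Work done by one slave node: here, summing a chunk of numbers."""
    return sum(chunk)

def master(data, n_workers=4, chunk_size=3):
    """Split the job into tasks, distribute them, and combine the results."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    # The pool plays the role of the grid: each worker picks up submitted tasks.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_results = list(pool.map(task, chunks))
    return sum(partial_results)

total = master(list(range(10)))
print(total)  # 45
```

The caller sees a single function call, which is the "one powerful computer" illusion in miniature; a real grid adds scheduling, data transfer, and failure handling across machines.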
Cloud Computing • “The interesting thing about cloud computing is that we’ve redefined cloud computing to include everything that we already do.” • Larry Ellison, during Oracle’s Analyst Day
Cloud Computing • Pay-as-you-go • No initial investments • Reduced operation costs • Scalability • Availability
Cloud Computing - Open Issues • Bandwidth and latency • Lack of standards and portability • “Black-box” implementations • Security and lack of control • Immature tools and framework support • Legal issues (ownership, auditing, etc.) • Limited Service Level Agreements (SLAs)
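Data Grid (Hadoop FS - Overview) • Caching of Data Index • [Diagram: a Client asks the Namenode (master node) for a specific text; the Namenode holds the Metadata (Name, .., ..); the Client performs block ops on the Datanodes (slave nodes); blocks are copied between Datanodes by Replication]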
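The split between metadata on the Namenode and blocks on the Datanodes can be illustrated with a toy model. Everything here is an assumption made for illustration: the class `ToyNamenode`, its methods, the block naming, and the round-robin replica placement are hypothetical and do not reflect Hadoop's real API or placement policy.

```python
from itertools import cycle

class ToyNamenode:
    """Holds metadata only: which blocks form a file and where replicas live."""

    def __init__(self, datanodes, replication=3):
        self.datanodes = list(datanodes)
        # Cannot keep more replicas than there are datanodes.
        self.replication = min(replication, len(self.datanodes))
        self.metadata = {}  # file name -> list of (block_id, [datanode names])
        self._next = cycle(range(len(self.datanodes)))

    def add_file(self, name, n_blocks):
        """Record a file of n_blocks blocks and assign replicas round-robin."""
        blocks = []
        for b in range(n_blocks):
            start = next(self._next)
            replicas = [self.datanodes[(start + k) % len(self.datanodes)]
                        for k in range(self.replication)]
            blocks.append((f"{name}#{b}", replicas))
        self.metadata[name] = blocks

    def locate(self, name):
        """What a client asks for before reading blocks from the datanodes."""
        return self.metadata[name]

nn = ToyNamenode(["dn1", "dn2", "dn3", "dn4"], replication=2)
nn.add_file("a.txt", n_blocks=2)
print(nn.locate("a.txt"))
```

The point of the sketch is that the client only fetches block locations from the master; the data itself moves between the client and the slave nodes.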
Counting Words in Text Files • [Diagram: Split-Operation, Map-Operation, Reduce-Operation. Each split is processed by countWords(File), producing partial counts for the words w1 … w5; the reduce step outputs the totals w1: 6, w2: 14, w3: 15, w4: 17, w5: 1]
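The three operations on this slide can be sketched in plain Python. This is an in-memory illustration, not Hadoop code: the function names, the intermediate `shuffle` step, and the two sample files are assumptions. Here the map step emits a (word, 1) pair per word; emitting per-file counts, as countWords(File) suggests, would work equally well.

```python
from collections import defaultdict

def split(files):
    """Split-operation: one split per file."""
    return list(files.items())

def map_op(name, text):
    """Map-operation: emit a (word, 1) pair for every word in one split."""
    return [(word, 1) for word in text.lower().split()]

def shuffle(pairs):
    """Group all emitted values by word before reducing."""
    grouped = defaultdict(list)
    for word, one in pairs:
        grouped[word].append(one)
    return grouped

def reduce_op(word, values):
    """Reduce-operation: sum the partial counts for one word."""
    return word, sum(values)

def word_count(files):
    pairs = []
    for name, text in split(files):
        pairs.extend(map_op(name, text))  # in a grid, each split runs on its own node
    return dict(reduce_op(w, v) for w, v in shuffle(pairs).items())

# Made-up sample files.
files = {"a.txt": "cloud grid cloud", "b.txt": "grid cloud"}
result = word_count(files)
print(result)  # {'cloud': 3, 'grid': 2}
```

Because each map call touches only its own split and each reduce call only one word, both steps can be spread over many machines without coordination beyond the grouping step.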
Advantages of Hadoop • Written purely in Java; requires Cygwin to be installed under Windows • Available under LGPL and Apache 2.0 license • Usually offers only one implementation for the different features of a grid framework • May also use file systems other than Hadoop FS • Very flexible implementation of MapReduce • Supports only FileSplit for the split operation out of the box • Better suited for computations where … • … large data collections must be handled • … the reduce operation is more than a simple aggregation of the map's output
Thank you! • Questions?