Introduction to Distributed Storage Systems

Presentation Transcript

  1. Introduction to Distributed Storage Systems
     Harry Xu, CS 239, Fall 2019

  2. Problems and Challenges
     • Extremely large amounts of data are available these days
       • FB Social: 721M vertices, 68.7B edges in May 2011
       • Google Maps: 20 petabytes of data
     • Where should the data be stored?
       • A single machine? Multiple servers?
     • How can we enable applications to access the data easily?
       • What interfaces do storage systems provide?
       • What guarantees do they provide?
     • How can we enable applications to access the data efficiently?
       • What is the right architecture (e.g., master + slave, peer-to-peer)?
     • What if a machine crashes?

  3. Solution: Distributed Storage Systems
     • Where should the data be stored?
       • On a cluster of commodity servers
     • How can applications access the data easily?
       • Depends on the data type (e.g., files, structured data, or unstructured data)
       • Standard interfaces
     • What guarantees do they provide?
       • Consistency guarantees
     • What if a machine crashes?
       • Fault tolerance: replication + quick recovery
       • Consistency between replicas
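
The "replication + quick recovery" bullet can be illustrated with a minimal sketch. It assumes a hypothetical in-memory map from block IDs to the nodes holding a replica; the function names, node names, and the "pick the first spare node in sorted order" policy are illustrative assumptions, not how any particular system does it.

```python
def under_replicated(placement, live_nodes, target):
    """Return {block_id: surviving nodes} for blocks with fewer than `target` live replicas."""
    result = {}
    for block_id, nodes in placement.items():
        alive = [n for n in nodes if n in live_nodes]
        if len(alive) < target:
            result[block_id] = alive
    return result


def re_replicate(placement, live_nodes, target):
    """Return a new placement that restores `target` replicas where spare nodes exist.

    Hypothetical policy: copy from any surviving replica to the first live
    nodes (in sorted order) that do not already hold the block.
    """
    new_placement = {}
    for block_id, nodes in placement.items():
        alive = [n for n in nodes if n in live_nodes]
        if not alive:
            # Every replica was on a crashed node: the block cannot be recovered.
            raise RuntimeError(f"block {block_id} lost: no surviving replica")
        candidates = [n for n in sorted(live_nodes) if n not in alive]
        needed = max(0, target - len(alive))
        # If there are too few candidates, the block simply stays under-replicated.
        new_placement[block_id] = alive + candidates[:needed]
    return new_placement


# Node "b" has crashed; "e" is a spare node that holds no replica of blocks 0 and 1.
placement = {0: ["a", "b", "c"], 1: ["b", "c", "d"]}
live = {"a", "c", "d", "e"}
recovered = re_replicate(placement, live, target=3)
```

Because every block had replicas on more than one machine, a single crash leaves at least one copy to recover from; that is the point of combining replication with recovery.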

  4. Three Different Kinds of Systems
     • Distributed File Systems
       • HDFS -- Yahoo
       • GFS -- Google
     • Distributed Structured Data Storage Systems (a.k.a. databases)
       • Bigtable (wide-column DB)
       • Spanner (NewSQL DB)
     • A mix of both
       • Azure (Windows Azure Storage)

  5. Distributed File Systems
     • HDFS
       • One "metadata" server (NameNode) and a set of DataNodes
       • A file is divided into blocks, and each block has several replicas on different DataNodes
       • File operations are recorded in a journal, which is replayed to restore a consistent state after a failure
       • Supports a wide variety of applications, including Hadoop and everything built on top of Hadoop
     • GFS
       • Uses a similar architecture
       • Replicates both file chunks and the namespace
       • Uses checksums to detect data corruption
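
The block, replica, and checksum bullets above can be sketched in a few lines. This is a toy model under stated assumptions: the 4-byte block size, the round-robin placement, the node names, and the use of CRC32 are all illustrative stand-ins (real deployments use blocks of many megabytes and more careful, e.g. rack-aware, placement).

```python
import zlib

BLOCK_SIZE = 4    # bytes; toy value chosen so the example is easy to read
REPLICATION = 3   # replicas per block; hypothetical setting for this sketch


def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Cut a file's bytes into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Assign each block to `replication` distinct DataNodes, round-robin.

    This plays the role of the metadata server's block map: block id -> nodes.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    return {
        block_id: [datanodes[(block_id + k) % len(datanodes)]
                   for k in range(replication)]
        for block_id in range(num_blocks)
    }


def checksum(block):
    """Checksum stored alongside a block when it is written (CRC32 as a stand-in)."""
    return zlib.crc32(block)


def is_corrupt(block, expected_checksum):
    """A reader recomputes the checksum and compares it with the stored one."""
    return checksum(block) != expected_checksum


blocks = split_into_blocks(b"hello world!")   # three 4-byte blocks
block_map = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
stored = [checksum(b) for b in blocks]
```

When a reader finds a corrupt replica, it can fall back to one of the other DataNodes listed for that block, which is why checksums and replication work well together.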

  6. Data Storage Systems (NoSQL Databases)
     • Bigtable
       • Built on top of GFS; available as part of Google Cloud Platform
       • It is a map (a wide-column store) that maps a row key and a column key to a byte array
       • Designed to scale to petabytes of data
       • Each table is multidimensional and is divided into many small tablets for better integration with GFS
       • No general (cross-row) transactions
     • Spanner
       • A "NewSQL" database supporting externally consistent transactions
     • Windows Azure Storage (WAS)
       • Supports strong consistency and various types of data
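
The Bigtable bullets describe a map from (row key, column key) to a byte array, split into tablets. A minimal in-memory sketch of that idea follows; the class name, method names, and example keys are hypothetical, and it assumes that tablets are contiguous ranges of sorted row keys. It is not the Bigtable API.

```python
from bisect import bisect_left, insort


class ToyWideColumnTable:
    """Toy wide-column table: (row key, column key) -> bytes, rows kept in sorted order."""

    def __init__(self):
        self._rows = {}          # row key -> {column key -> bytes}
        self._sorted_keys = []   # row keys in sorted order, for range scans

    def put(self, row, column, value):
        if row not in self._rows:
            self._rows[row] = {}
            insort(self._sorted_keys, row)
        self._rows[row][column] = value

    def get(self, row, column):
        """Return the stored bytes, or None if the cell does not exist."""
        return self._rows.get(row, {}).get(column)

    def scan(self, start_row, end_row):
        """Yield (row, columns) for start_row <= row < end_row, in key order."""
        lo = bisect_left(self._sorted_keys, start_row)
        hi = bisect_left(self._sorted_keys, end_row)
        for row in self._sorted_keys[lo:hi]:
            yield row, dict(self._rows[row])

    def split_into_tablets(self, rows_per_tablet):
        """Partition the sorted row keys into contiguous ranges (toy 'tablets')."""
        keys = self._sorted_keys
        return [keys[i:i + rows_per_tablet]
                for i in range(0, len(keys), rows_per_tablet)]


table = ToyWideColumnTable()
table.put("com.cnn.www", "contents:", b"<html>...")
table.put("com.cnn.www", "anchor:cnnsi.com", b"CNN")
table.put("org.apache.www", "contents:", b"asf")
table.put("com.example.www", "contents:", b"hi")
tablets = table.split_into_tablets(rows_per_tablet=2)
```

Keeping rows sorted by key is what makes range scans cheap and lets each tablet be served independently; rows are sparse, so different rows may hold entirely different columns.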