14-848: Cloud Infrastructure

14-848: Cloud Infrastructure. Lecture 8 * Fall 2018 * Kesden.


Presentation Transcript


  1. 14-848: Cloud Infrastructure Lecture 8 * Fall 2018 * Kesden

  2. Building The Cloud: Where Are We Now? • Providing Services, e.g. Frameworks • MRv2, Spark, etc • Elasticity and Provisioning • YARN, Spark, etc • Computing • Caching • Storage • Storage devices • Files and File Systems • Relational Databases • Keys and Values • Other NoSQL Databases • The Network • We’re working from the bottom up

  3. Where are we now? • Storage • Storage devices • Files Storage • Network-based storage • General Purpose Distributed File Systems • Special Purpose Distributed File Systems • Relational Databases • Key Value Storage • Other NoSQL Databases

  4. How Is Storage Used Within A Cloud? • As a service, provided directly to users • As a service, underlying services provided to users • As part of the fabric of the cloud, itself • Storage ultimately related to user data • Storage related to managing the cloud, e.g. accounts, configurations, etc. • Storage related to understanding the cloud, e.g. data collected about activity

  5. Storage Devices • Disk Drives • Solid State Devices (SSDs) • Storage Arrays, e.g. Redundant Arrays of Independent Disks (RAID) • Network Attached Storage (NAS) • General purpose • Appliance (Much more common) • Storage Area Networks (SANs)

  6. Storage Devices: Hard Disk Drives (HDDs) • Venerable unit devices • Organized as platters with circular tracks on each side • Heads read data, one head per platter surface, move together • Circumference of tracks is different, so area is different. • Modern drives increase number of sectors/track as tracks get larger • Old drives provided physical geometry. Now, just “logical geometry” • Because tracks are large, logical geometry isn’t a terrible performance model • Significant internal caching, e.g. track-level • Random access, but not uniform performance

  7. Storage Devices: Hard Disk Drives, Performance • Latency = Rotational + Seek + Transfer • Seek: ~5ms average • Rotational: 7200 rpm drive makes 1 rotation every 1/7200 minutes = 8.3ms per rotation or 4.15ms average • Transfer: What percentage of the track are we reading? 8.3ms/rotation. • But, gets much more nuanced: • Head switch time: ~1ms to retune DSP, bump head, etc to switch heads. • Track-to-track seek time: ~0.5ms to “bump” head • Full stroke seek: ~10ms • Etc.
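
The latency formula on this slide can be worked through numerically. The following is a minimal sketch, assuming the simple model Latency = Seek + Rotational + Transfer and ignoring head switches and caching; the function names and the 10%-of-a-track request are illustrative assumptions, not part of the lecture.

```python
def rotation_ms(rpm: float) -> float:
    """Time for one full platter rotation, in milliseconds."""
    return 60_000.0 / rpm


def access_latency_ms(rpm: float, seek_ms: float, track_fraction: float) -> float:
    """Estimate one random access under the simple seek + rotational + transfer model.

    track_fraction is the share of a track being read (0.0 to 1.0).
    """
    rot = rotation_ms(rpm)
    rotational = rot / 2.0           # on average, wait half a rotation for the data
    transfer = rot * track_fraction  # time for the requested data to pass under the head
    return seek_ms + rotational + transfer


if __name__ == "__main__":
    # 7200 rpm drive, ~5 ms average seek, reading a hypothetical 10% of a track.
    print(f"rotation: {rotation_ms(7200):.2f} ms")
    print(f"access:   {access_latency_ms(7200, 5.0, 0.1):.2f} ms")
```

Under these assumptions the seek and rotational delays dominate: the transfer itself is under 1 ms of the total.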

  8. Storage Devices: Solid State Storage Devices (SSDs) • NAND-based flash memory, i.e. non-volatile RAM • Integrated circuits that store data by storing charge • Durability • Persistent, but charge can drain away over years, so maybe not enough for archival purposes • Limited number of writes, but perhaps too much is made of this: even mechanical systems wear out, and internal “wear leveling” algorithms try to relocate data among blocks to prevent wearing any one part out first. • Performance • True random access. Write penalties are now minimal.
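
The wear-leveling idea on this slide can be illustrated with a toy model. This is only a sketch under strong assumptions: the class `ToyWearLeveler` is hypothetical, every overwrite is charged one erase, and each write simply goes to the least-erased free block. Real SSD controllers are far more involved.

```python
class ToyWearLeveler:
    """Toy model: logical blocks are remapped so erases spread across physical blocks."""

    def __init__(self, num_physical_blocks: int):
        self.erase_counts = [0] * num_physical_blocks
        self.mapping = {}  # logical block number -> physical block number
        self.data = {}     # physical block number -> payload

    def write(self, logical_block: int, payload: bytes) -> None:
        # Overwriting frees the old physical block, which costs one erase.
        old = self.mapping.pop(logical_block, None)
        if old is not None:
            del self.data[old]
            self.erase_counts[old] += 1
        free = [p for p in range(len(self.erase_counts)) if p not in self.data]
        if not free:
            raise RuntimeError("device full")
        # Place the new data on the least-worn free block.
        target = min(free, key=lambda p: self.erase_counts[p])
        self.mapping[logical_block] = target
        self.data[target] = payload

    def read(self, logical_block: int) -> bytes:
        return self.data[self.mapping[logical_block]]


if __name__ == "__main__":
    ssd = ToyWearLeveler(4)
    for i in range(8):
        ssd.write(0, f"version {i}".encode())  # rewrite the same logical block
    print(ssd.read(0), ssd.erase_counts)
```

Even though the same logical block is rewritten every time, the erases end up spread across all physical blocks rather than concentrated on one.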

  9. SSDs vs HDDs in the Cloud • HDDs: • Cheaper • Longer lifetime without power • SSDs: • Faster • More energy efficient • More reliable, except maybe over the really long haul • Trade-offs favor a layered approach w/RAM at the top, then SSDs, then HDDs

  10. Will SSDs Replace HDDs in the Cloud? • Not until SSDs are cheaper than HDDs per byte • There is a ton of storage out there and any amount of savings matters • Right now HDDs are cheaper per byte, and that seems likely to be true for a while • There are tradeoffs between OpEx and CapEx w.r.t. energy and lifetime

  11. Disk Arrays: Redundant Arrays Of Independent Disks • Historic Goal: • Assemble less reliable, less expensive disks into an array that is larger, cheaper, and better performing than more expensive enterprise-class devices. • Use redundancy to improve reliability • Use parallelism to improve performance • Present Goal: • Approximately the same, except the emphasis is on performance and reliability, not really cost • There aren’t really unreliable disks these days

  12. Disk Arrays: RAID Levels • Level 0: Stripe (Parallelize data across drives) • Decreases robustness, as each drive can fail independently and every drive is necessary • Level 1: Mirroring • Improves robustness, but at a high cost • Level 4: Parity Block • Uses a parity drive to allow system to work through 1 drive failure • Works because drives have ECCs and EDCs, so model is drive returns good data or none at all. • Level 5: Rotated Block Parity • Same as level 4, except parity blocks aren’t stored on the same drive, but rotated among the drives. • Prevents parity disk from becoming bottleneck • Level 6: Double parity • Same as Level 5, but 2 parity blocks so it can handle 2 disk failures • Level 10: Level 1 + Level 0 (Stripe across two units, each of which mirrors) • Pricey, but fast writes.
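
The single-parity idea behind Levels 4 and 5 can be sketched with XOR. This is a minimal illustration, assuming equal-sized blocks and exactly one known failed drive (the "good data or none at all" model from the slide); the function names are made up for this example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks followed by one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, failed_index):
    """Recover the block at failed_index from all surviving blocks in the stripe."""
    survivors = [block for i, block in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)


if __name__ == "__main__":
    stripe = make_stripe([b"abcd", b"efgh", b"ijkl"])  # 3 data drives + 1 parity drive
    print(rebuild(stripe, 1))  # recovers the second data block
```

Because XOR is its own inverse, the same `rebuild` call recovers either a lost data block or a lost parity block. Level 5 differs only in which drive holds the parity block for each stripe.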

  13. Network Attached Storage (NAS) • Back in the day, servers had their own storage attached to them • These days, that is probably minimal • These days, storage is probably on devices dedicated to providing storage and connected via the network. • Simplest model is NAS (We’ll talk about SANs, next) • Servers with storage device(s) charged principally with providing storage for other servers • Originally general purpose computers configured this way • These days, more likely purpose-built appliances • Note: NAS devices are usually attached to the same LAN as the devices that use them. • May be co-located for performance

  14. Network Attached Storage (NAS) [Diagram; surviving label: NETWORK]

  15. Storage Area Networks (SANs) • Networks purpose built to host storage • Basically, like a network of NAS devices • But, most likely a specialized stack • Network hardware, protocols, virtualization etc. • Makes it easier for storage to effectively be shared by multiple servers than independent NAS devices • Because stack is specialized from the bottom up, generally higher performance than NASs at scale.

  16. Network Attached Storage (NAS) [Diagram; surviving labels: DC NETWORK • SA NETWORK • Specially designed for storage • Usual DC fabric • Storage appliances (or storage servers) • Servers]

  17. Files and File Systems • What is a file? • Named unit of user data • What is a file system? • System for maintaining files • Naming • Allocation • Protection • Robustness • Access Model • Etc.

  18. General Purpose Vs Special Purpose File Systems • General Purpose – As we know and love • Open, close, read, write, create, delete, append, and protect files • Special Purpose – Compromises certain common attributes in order to improve others • Disallow random writes to improve appends • Eliminate directory structure and protections to minimize metadata and fit it all in RAM to improve latency • Use a massive block size to decrease management overhead and allow greater scale • Etc.

  19. Distributed File Systems • Like traditional file systems, except logical structure has distributed implementation • Can be special purpose or general purpose • Motivation • Scale, by size • Scale, by # of requests • Scale, by volume of data requested • Distributed Users vs Locality vs Performance • Robustness (local risk mitigation, e.g. natural, political, etc)

  20. Relational Databases, e.g. SQL • Venerable database technology, dominant since the 1980s • Basically structures data into tables called relations • Operations can be performed across rows or columns • Hallmark operation may be the join, which logically combines tables by performing a product of the two tables based upon a shared key • Many types of join • Data is often stored in B-tree type data structures • Operations are optimized through the use of hash and tree based indexes
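
The join described on this slide can be sketched as a hash join over two small in-memory tables. This is an illustrative sketch of an inner equi-join only; the tables, column names, and the choice of a hash-based strategy are assumptions for the example, and real database engines pick among several join algorithms.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Inner equi-join of two lists of dict rows on a shared key column."""
    # Build phase: index the left table by the join key.
    index = defaultdict(list)
    for row in left:
        index[row[key]].append(row)
    # Probe phase: look up each right row's key and combine matching rows.
    joined = []
    for right_row in right:
        for left_row in index.get(right_row[key], []):
            joined.append({**left_row, **right_row})
    return joined


if __name__ == "__main__":
    # Hypothetical tables sharing the key "user_id".
    users = [{"user_id": 1, "name": "ada"}, {"user_id": 2, "name": "bob"}]
    orders = [
        {"order_id": 10, "user_id": 1},
        {"order_id": 11, "user_id": 1},
        {"order_id": 12, "user_id": 3},
    ]
    for row in hash_join(users, orders, "user_id"):
        print(row)
```

Rows without a partner on the other side (user 2, order 12) drop out, which is what distinguishes an inner join from the outer variants.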

  21. Relational Databases and Clouds • Used to manage metadata within clouds • Provided as a service from clouds • Very difficult to scale to “Big Data” • Breaking up horizontally scatters records • Breaking up vertically scatters attributes
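
The two ways of breaking up a table can be sketched on a small in-memory example. This is a minimal illustration, assuming integer keys and modulo placement; the function names, table, and column groups are hypothetical.

```python
def split_horizontally(rows, key, num_shards):
    """Scatter whole records across shards by key (integer keys assumed)."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[row[key] % num_shards].append(row)
    return shards


def split_vertically(rows, key, column_groups):
    """Scatter attributes: each partition keeps the key plus one group of columns."""
    return [
        [{column: row[column] for column in (key, *group)} for row in rows]
        for group in column_groups
    ]


if __name__ == "__main__":
    rows = [{"id": i, "name": f"n{i}", "email": f"e{i}"} for i in range(6)]
    print(split_horizontally(rows, "id", 3))
    print(split_vertically(rows, "id", [["name"], ["email"]]))
```

With the horizontal split, a query spanning many records must visit many shards; with the vertical split, rebuilding a full record means rejoining the partitions on the key.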

  22. Key-Value Stores • Simple storage model • Trades away most of the functionality of file systems and relational databases • Key operations are basically put and get • Benefit is scalability • Much easier to distribute to achieve significant scale • Much easier to achieve higher write throughput • Think of these as distributed hashes with high-throughput nodes
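
The "distributed hash" picture can be sketched with a put/get interface that hashes each key to pick a node. This is a single-process sketch in which plain dictionaries stand in for storage nodes; the class name `ShardedKV` and the modulo placement are assumptions, and replication and node failure are ignored.

```python
import hashlib


class ShardedKV:
    """Toy key-value store: each key is hashed to pick one of several nodes."""

    def __init__(self, num_nodes: int):
        # Each dict stands in for a separate storage node.
        self.nodes = [dict() for _ in range(num_nodes)]

    def _node_for(self, key: str) -> dict:
        # A stable hash, so the same key always lands on the same node.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def put(self, key: str, value) -> None:
        self._node_for(key)[key] = value

    def get(self, key: str, default=None):
        return self._node_for(key).get(key, default)


if __name__ == "__main__":
    kv = ShardedKV(4)
    kv.put("user:42", {"name": "ada"})
    print(kv.get("user:42"))
    print(kv.get("user:43"))  # missing key returns the default
```

Because each operation touches exactly one node, adding nodes adds capacity and write throughput. Simple modulo placement does mean that changing the node count remaps most keys.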

  23. Other NoSQL Databases • Structure data in ways that allow for “Big Data” scale • Massive size • Massively parallel • But that provide more structure than key-value stores • Tune structure and operations to goal • Column-oriented • Row-Oriented • Document-Oriented • Etc.

  24. Leaning Forward: In-Memory Storage • Caching is critical to performance • At global scale, if interactive, most results probably come from cache • Static Content • And, dynamic content • Search engines cache results • Web sites cache messages, alerts, etc.
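
Serving results from cache can be sketched with a small in-memory cache placed in front of a slow computation. The least-recently-used (LRU) eviction policy, the class name, and the stand-in `slow_search` function are assumptions for illustration; the slide does not name a policy.

```python
from collections import OrderedDict


class LRUCache:
    """Small in-memory cache that evicts the least recently used entry when full."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.hits = 0
        self.misses = 0
        self._items = OrderedDict()

    def get_or_compute(self, key, compute):
        if key in self._items:
            self.hits += 1
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        self.misses += 1
        value = compute(key)              # fall through to the slow path
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry
        return value


if __name__ == "__main__":
    def slow_search(query):
        # Stand-in for an expensive backend lookup.
        return query.upper()

    cache = LRUCache(capacity=2)
    for query in ["cats", "cats", "dogs", "fish", "cats"]:
        print(query, "->", cache.get_or_compute(query, slow_search))
    print("hits:", cache.hits, "misses:", cache.misses)
```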
