Data Storage and RAID Today

Data Storage and RAID Today. Brandon Krakowsky, Jeffrey Doto. Presentation topics: Who relies on data storage? Why is data storage so important? Sarbanes-Oxley and HIPAA. Hard disk failure. What is a RAID? Different types of RAID and their uses. Enterprise vs. consumer storage.

Presentation Transcript


  1. Data Storage and RAID Today Brandon Krakowsky Jeffrey Doto

  2. Presentation Topics: • Who Relies on Data Storage? • Why is data storage so important? • Sarbanes-Oxley and HIPAA. • Hard Disk Failure. • What is a RAID? • Different types of RAID and their uses. • Enterprise vs. Consumer Storage. • Demonstration.

  3. Information Overload • What the heck is an exabyte? • 1 billion gigabytes • The world generated 161 exabytes of digital information last year • IDC estimates that this will grow to 988 exabytes in 2010 • almost 1 zettabyte! • 185 exabytes of storage available last year • IDC estimates that this will grow to 601 exabytes in 2010 • We need more storage!
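The figures on this slide can be put side by side with a few lines of arithmetic. This is only a sketch using the numbers quoted above; the function name `storage_gap_eb` is made up for illustration.

```python
# Figures quoted on the slide (IDC estimates), all in exabytes.
GB_PER_EB = 10**9   # 1 exabyte = 1 billion gigabytes
EB_PER_ZB = 1000    # 1 zettabyte = 1,000 exabytes

def storage_gap_eb(generated_eb, available_eb):
    """Exabytes generated beyond what the available storage could hold."""
    return generated_eb - available_eb

# Hypothetical variable names; the inputs are the slide's own numbers.
last_year_gap = storage_gap_eb(161, 185)  # negative: storage exceeded data
gap_2010 = storage_gap_eb(988, 601)       # positive: more data than storage

print("Gap last year (EB):", last_year_gap)
print("Projected gap in 2010 (EB):", gap_2010)
print("2010 data in zettabytes:", 988 / EB_PER_ZB)
```

A negative gap means there was still more storage than data; the projected 2010 gap is positive, which is the point behind "We need more storage!"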

  4. Proliferation of the Internet • How many web pages are there? • If you “Google” anything, you’ll get at least a billion choices • Web pages used to be just text and graphics • Now, audio & video clips are prevalent • Hosting companies need to deal with data storage on a whole new level

  5. On-Demand Audio & Video • What about companies who specialize in On-Demand audio/video delivery? • YouTube • Google Video • They make it so easy to upload content • How do these companies deal with managing all of this data?

  6. Digital Audio • Remember Napster? • Who buys CDs anymore? • What about companies who provide downloadable audio content? • iTunes • Rhapsody • MP3.com • Most of these companies provide video as well! • Also, Podcasting & Vodcasting are becoming more popular

  7. Photographic Content • Everybody is a photographer these days! • Camera Phone • Digital Camera • Hosting companies allow users to upload photos easily • Flickr • Photobucket • Where are all these photos stored?

  8. Database-Driven Applications • Database-driven websites rely heavily on data integrity • Companies like Amazon, eBay, and Citizens Bank all have huge backbones • They rely on storage! • National Security Agency has a database of phone call records of “tens of millions” of Americans • Blogs & Wikis • Is this data backed up? • Can you imagine if you lost your MySpace account?

  9. E-Mail • Most popular mode of communication • When you send a message, where does it go? • If you’re like most, e-mail is a lifeline • For large companies, email backup is a must!

  10. Sarbanes-Oxley and HIPAA • Sarbanes-Oxley (SOX) was signed into law by George W. Bush in 2002 in the wake of the Enron scandal. • Changed how publicly held businesses are responsible for data retention. • Enormously profitable for the storage industry.

  11. Financial Impact of SOX • Estimated annual compliance spending of $17-28.8 billion • Great for storage industry • Data retention: • Net effect: double the retention period and double the number of copies = a lot more storage! • Source: The Economist, March 4, 2004; Information Week, March 2, 2006

  12. SOX: A Boon and a Burden • While it has been a great source of financial gain for storage and IT vendors, it has been a huge headache for CIOs and IT staff. • Estimated man-hours: countless. • New York Times: dedicated 200 employees in 2003, 105 full-time on compliance project. • Washington Post: spent $5 million on outside consultants, created 10 full-time positions. • CISA: Certified Information Systems Auditor • Source: Information Week, March 2, 2006; Business Matters, March 2005.

  13. HIPAA: Health Insurance Portability and Accountability Act • Signed into law in 1996. • Desired effect was to promote EDI, or Electronic Data Interchange, among various healthcare bodies. • Protect patient privacy • Protect security of patient information

  14. Data Management for the User • As a user, why do I care? • Where do you store all of that music you illegally downloaded? • Again, sites like YouTube and Flickr allow you to upload your own media content • Where do you store all of your home-grown movies? • How do you back up your photo library? • Hard drives fill up fast!

  15. Microprocessor Technological Advances • As microprocessor technology improves, so does memory size • How does this benefit overall computer performance? • It doesn’t unless secondary storage progresses at the same rate • Increased microprocessor speed opens the door to newer processor-intensive applications • Users need more space

  16. Magnetic Disk Technology • MTTF: Mean Time to Failure. • Not a question of “will it fail?” but “when will it fail?” • Current drives run at speeds from 5,400 to 15,000 RPM. • Electromechanical parts: spindle motor and actuator arm are both prone to failure; magnetism can wear out. • Discuss enterprise vs. consumer storage later.
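One way to make the "when will it fail" point concrete is to turn an MTTF figure into a failure probability. The sketch below assumes a constant failure rate (an exponential lifetime model) and independent drives; neither assumption comes from the slides, and the MTTF value used is purely illustrative.

```python
import math

def drive_failure_probability(hours, mttf_hours):
    """Probability that one drive fails within `hours`.

    Assumes a constant failure rate (exponential lifetime model).
    """
    return 1.0 - math.exp(-hours / mttf_hours)

def any_drive_fails(hours, mttf_hours, n_drives):
    """Probability that at least one of n independent drives fails."""
    all_survive = math.exp(-hours / mttf_hours) ** n_drives
    return 1.0 - all_survive

HOURS_PER_YEAR = 24 * 365
ILLUSTRATIVE_MTTF = 1_000_000  # hypothetical figure, not from the slides

for n in (1, 4, 22):
    p = any_drive_fails(HOURS_PER_YEAR, ILLUSTRATIVE_MTTF, n)
    print(f"{n:2d} drive(s): {p:.4f} chance of at least one failure in a year")
```

Under this model the chance of seeing some failure grows with the number of drives, which is why arrays need redundancy rather than just better disks.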

  17. RAID • “Redundant Array of Independent Disks” • First proposed in the paper “A Case for Redundant Arrays of Inexpensive Disks”, published in 1988 • Method of combining several disk drives into one logical unit, addressed by a “Logical Unit Number” (LUN) • Appears as a single storage unit to the host system

  18. 2 Most Important Features • Reliability • RAID makes use of “redundancy” • Data is redundantly distributed over all or some of the disks providing fault tolerance and data protection • Performance • Disk performance is enhanced because multiple disks are working in parallel

  19. RAID Level 0 • No redundancy • Uses a technique called “striping” • Data is broken down into blocks • Consecutive blocks are written to separate disks in turn • Provides excellent write performance • Data is spread out • No data protection • If one disk fails, all data in the array is lost
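Striping can be sketched in a few lines. This is a minimal in-memory illustration, assuming round-robin block placement; the function names `stripe` and `unstripe` are made up here and do not refer to any real RAID tool.

```python
def stripe(data, n_disks, block_size):
    """Split data into blocks and deal them out to disks round-robin."""
    disks = [[] for _ in range(n_disks)]
    for offset in range(0, len(data), block_size):
        block_no = offset // block_size
        disks[block_no % n_disks].append(data[offset:offset + block_size])
    return disks

def unstripe(disks):
    """Reassemble the original data by reading one row of blocks at a time."""
    blocks = []
    rows = max(len(disk) for disk in disks)
    for row in range(rows):
        for disk in disks:
            if row < len(disk):
                blocks.append(disk[row])
    return b"".join(blocks)

disks = stripe(b"ABCDEFGHIJ", n_disks=3, block_size=2)
print(disks)
print(unstripe(disks))
```

Because every disk holds a different slice of the data and there is no copy or parity, losing any one list in `disks` makes `unstripe` impossible.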

  20. RAID Level 1 • Uses a technique called “mirroring” • All data is written to at least two separate disks • If one disk fails, there’s a copy • Protects all data against the failure of a single disk • Write performance is compromised • All data is written twice • Read performance can be better than that of a single disk • Data can be read from multiple disks at once
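Mirroring can be illustrated with a toy in-memory model. The class below is a sketch only; `MirroredArray` and its methods are hypothetical names, and real mirrors also handle rebuilds and load-balanced reads, which are left out.

```python
class MirroredArray:
    """Toy RAID 1: every block is written to every working disk."""

    def __init__(self, n_copies=2):
        self.disks = [dict() for _ in range(n_copies)]
        self.failed = set()

    def write(self, block_no, data):
        # The same data is written once per mirror, hence the write cost.
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                disk[block_no] = data

    def fail(self, disk_no):
        """Simulate a disk failure: its contents are gone."""
        self.failed.add(disk_no)
        self.disks[disk_no].clear()

    def read(self, block_no):
        # Any surviving mirror can serve the read.
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                return disk[block_no]
        raise IOError("all mirrors have failed")

array = MirroredArray(n_copies=2)
array.write(7, b"payload")
array.fail(0)
print(array.read(7))  # still readable from the second disk
```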

  21. RAID Level 2 • Uses a technique similar to “striping” • Words are split at the bit level • Each bit is written to a separate disk • Hamming codes are generated for each word • Spread across separate Error Correcting Code disks • Data is cross-referenced with codes to ensure data integrity • Write performance is compromised since Hamming codes need to be calculated each time • No commercial implementation • Too expensive
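To show what "Hamming codes are generated for each word" can look like, here is a sketch using the classic Hamming(7,4) code: 4 data bits plus 3 check bits, able to correct any single flipped bit. The slides do not say which Hamming code a RAID 2 array would use, so the choice of (7,4) is an assumption, and the function names are illustrative.

```python
def hamming_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] as 7 bits.

    Layout by position 1..7: p1, p2, d1, p4, d2, d3, d4.
    Think of each position as one disk (4 data disks, 3 ECC disks).
    """
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming_decode(codeword):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # checks positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # checks positions 2, 3, 6, 7
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]   # checks positions 4, 5, 6, 7
    error_pos = s1 + 2 * s2 + 4 * s4  # 0 means no error detected
    if error_pos:
        c[error_pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

word = [1, 0, 1, 1]
stored = hamming_encode(word)
stored[4] ^= 1                        # one "disk" returns a wrong bit
print(hamming_decode(stored) == word)
```

The three extra check positions per four data bits hint at why the slide calls this level too expensive.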

  22. RAID Level 3 • Uses a technique called “bit-interleaved parity” • Words are split at the bit level • Each bit is written to a separate disk • Parity bits are generated for each word • Stored on a separate parity disk • Read and write performance is compromised since all the disks are used for every operation, so requests cannot be served in parallel
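The parity idea can be sketched directly: one parity bit per word is the XOR of its data bits, and any single lost bit is the XOR of the parity bit with the surviving bits. The function names below are made up for illustration, and the model assumes the array knows which disk failed.

```python
def split_bits(word, n_bits):
    """Split a word into bits, one per data disk (least significant first)."""
    return [(word >> i) & 1 for i in range(n_bits)]

def parity_bit(bits):
    """XOR of all data bits; this is what the parity disk stores."""
    p = 0
    for b in bits:
        p ^= b
    return p

def recover_lost_bit(bits, parity):
    """Rebuild the one bit marked None (the failed disk) from the rest."""
    value = parity
    for b in bits:
        if b is not None:
            value ^= b
    return value

bits = split_bits(0b1011, 4)   # four data disks
p = parity_bit(bits)           # one parity disk
damaged = list(bits)
damaged[2] = None              # disk 2 fails
print(recover_lost_bit(damaged, p) == bits[2])
```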

  23. RAID 4: Block Interleaved Parity • Writes data in blocks instead of bits. • Advantage: high read performance. • Disadvantages: dedicated parity drive causes a severe write bottleneck; requires a complex hardware controller. • Requires at least 3 disks to implement.
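The write bottleneck follows from how parity is kept up to date. A common approach, assumed here rather than taken from the slides, is the read-modify-write update: new parity = old parity XOR old data XOR new data. Every small write therefore touches the single parity disk. Function names are illustrative.

```python
def xor_blocks(a, b):
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def full_parity(blocks):
    """Parity block for a whole stripe: XOR of all data blocks."""
    parity = bytes(len(blocks[0]))
    for block in blocks:
        parity = xor_blocks(parity, block)
    return parity

def small_write_parity(old_parity, old_data, new_data):
    """Update parity after changing one block, without reading the others."""
    return xor_blocks(xor_blocks(old_parity, old_data), new_data)

stripe_blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = full_parity(stripe_blocks)

# Overwrite block 1: read old data and old parity, write new data and parity.
new_block = b"ZZZZ"
parity = small_write_parity(parity, stripe_blocks[1], new_block)
stripe_blocks[1] = new_block
print(parity == full_parity(stripe_blocks))
```

In this scheme two writes to different data disks still queue up behind the same parity disk.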

  24. RAID 5: Block Interleaved Distributed Parity • Solves RAID 4 bottleneck. • Parity is distributed over all drives, which allows multiple concurrent reads and writes and increases efficiency. • Advantage: most versatile overall; file, web, database, and internet servers can all use it. • Disadvantage: requires a complex controller. • Requires at least 3 disks to implement.
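Distributing parity means the parity block moves to a different disk on each stripe. The sketch below uses one simple rotation (parity starts on the last disk and moves left each stripe); real controllers use several different layouts, so this particular rotation is an assumption, as are the function names.

```python
def parity_disk(stripe_no, n_disks):
    """Disk index holding parity for a given stripe (rotating right to left)."""
    return (n_disks - 1 - stripe_no) % n_disks

def layout(n_stripes, n_disks):
    """Return rows of 'P' (parity) and 'D' (data), one row per stripe."""
    rows = []
    for stripe_no in range(n_stripes):
        p = parity_disk(stripe_no, n_disks)
        rows.append(["P" if disk == p else "D" for disk in range(n_disks)])
    return rows

for row in layout(n_stripes=4, n_disks=4):
    print(" ".join(row))
```

Over any run of `n_disks` consecutive stripes each disk holds parity exactly once, so parity updates are spread across the array instead of queuing on one drive.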

  25. RAID Level 6: Block Interleaved Striping with Dual Error Protection • Advantage: implements both parity (P) and Reed-Solomon codes (Q) to protect against the loss of up to two drives. • Can be thought of as an extension to RAID 5. • Disadvantage: requires a more complex controller with high overhead; requires N+2 disks.
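The P and Q values can be sketched for a single byte per disk. P is plain XOR parity; Q weights each disk's byte by a power of a generator in the finite field GF(2^8). The field polynomial 0x11D and generator 2 used below are a commonly used choice, not something the slides specify, and all function names are illustrative.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) with polynomial 0x11D (assumed)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result

def gf_pow(a, n):
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result

def gf_inv(a):
    """Inverse of a nonzero byte: a^254, since a^255 = 1 in GF(2^8)."""
    return gf_pow(a, 254)

def syndromes(data):
    """Return (P, Q) for one byte from each data disk."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(gf_pow(2, i), d)
    return p, q

def recover_two(data, x, y, p, q):
    """Rebuild the bytes of failed data disks x and y (None in `data`)."""
    pxy, qxy = p, q
    for i, d in enumerate(data):
        if i not in (x, y):
            pxy ^= d                       # leaves d_x ^ d_y
            qxy ^= gf_mul(gf_pow(2, i), d)  # leaves g^x*d_x ^ g^y*d_y
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    dx = gf_mul(qxy ^ gf_mul(gy, pxy), gf_inv(gx ^ gy))
    return dx, pxy ^ dx

data = [0x12, 0x34, 0x56, 0x78]
p, q = syndromes(data)
print(recover_two([0x12, None, 0x56, None], 1, 3, p, q) == (0x34, 0x78))
```

With only P, two unknown bytes cannot be separated; Q supplies the second independent equation.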

  26. Hybrid RAID: X+Y vs. Y+X • RAID 0+1: • Mirrored stripe set: minimum of 4 drives = $ • Good for imaging / general file server / areas where the highest reliability is not a concern. • RAID 1+0: • Striped mirror set: minimum of 4 drives = $ • Good for databases. • RAID 5+1: • Mirrored RAID 5 for the truly paranoid.
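The difference between 0+1 and 1+0 shows up when two drives fail. The sketch below enumerates every two-drive failure in a 4-drive array under a simplified model: RAID 0+1 survives only if one whole stripe set is untouched, and RAID 1+0 survives as long as each mirror pair keeps one drive. The disk numbering and function names are assumptions for illustration.

```python
from itertools import combinations

DISKS = range(4)
STRIPE_SETS = [{0, 1}, {2, 3}]    # RAID 0+1: two stripe sets, mirrored
MIRROR_PAIRS = [{0, 1}, {2, 3}]   # RAID 1+0: two mirror pairs, striped

def survives_0_plus_1(failed):
    """Data survives if at least one stripe set has no failed disk."""
    return any(not (stripe_set & failed) for stripe_set in STRIPE_SETS)

def survives_1_plus_0(failed):
    """Data survives if every mirror pair still has a working disk."""
    return all(pair - failed for pair in MIRROR_PAIRS)

def count_survivable(survives, n_failures=2):
    """Number of n-disk failure combinations the layout survives."""
    return sum(
        1 for combo in combinations(DISKS, n_failures) if survives(set(combo))
    )

print("RAID 0+1 survives", count_survivable(survives_0_plus_1), "of 6")
print("RAID 1+0 survives", count_survivable(survives_1_plus_0), "of 6")
```

Under this model both layouts survive any single failure, but 1+0 tolerates more of the two-drive combinations, which fits the slide's recommendation of 1+0 for databases.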

  27. RAID Z • Uses the 128-bit ZFS file system from Sun’s Solaris 10 OS • Available on Mac OS X Leopard • Advantage: OS calculates parity, no need for external controller, can correct errors that are impossible to correct in RAID 5. • Disadvantage: could take a performance hit if storage is close to full.

  28. Enterprise vs. Consumer Storage • Enterprise quality storage requires much more engineering • Environment plays a big role: • Chassis vibration, humidity, volatile solvents, heat, constant use…

  29. Enterprise vs. Consumer Storage • Seagate Barracuda: 7,200 RPM, 250 GB, SATA II drive, $75.00 (SATA connector) • Seagate Cheetah: 15,000 RPM, 147 GB, SCSI drive, $1,100 (80-pin SCSI cable)
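The price gap between the two drives is easier to see as cost per gigabyte. This is simple arithmetic on the prices and capacities quoted on the slide; the function name is made up for illustration.

```python
def cost_per_gb(price_usd, capacity_gb):
    """Price divided by capacity, in dollars per gigabyte."""
    return price_usd / capacity_gb

barracuda = cost_per_gb(75.00, 250)   # consumer SATA drive from the slide
cheetah = cost_per_gb(1100, 147)      # enterprise SCSI drive from the slide

print(f"Barracuda: ${barracuda:.2f}/GB")
print(f"Cheetah:   ${cheetah:.2f}/GB")
print(f"Enterprise premium: about {cheetah / barracuda:.0f}x per GB")
```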

  30. Demonstration • Old Sun software-based RAID unit. • Employs Fibre Channel connection. • Houses 22 SCSI disks. • Hard drive demonstration: see the arm move over the disk while writing a large file.
