Data Deduplication in Virtualized Environments

Data Deduplication in Virtualized Environments. Marc Crespi, ExaGrid Systems. http://blog.exagrid.com Twitter: @ExaGrid. About the speaker. Marc has over 20 years of software and hardware experience in the high technology sector. He is part of the ExaGrid team that drives


Presentation Transcript


  1. Data Deduplication in Virtualized Environments Marc Crespi, ExaGrid Systems http://blog.exagrid.com Twitter: @ExaGrid

  2. About the speaker • Marc has over 20 years of software and hardware experience in the high technology sector. • He is part of the ExaGrid team that drives product strategy and execution and is responsible for managing product operations. • Prior to joining the company, Marc was director of product management for security management products at Altiris.

  3. Objective of This Program • What is Deduplication? • Why Use Deduplication in Backup and Recovery? • Challenges of Deduplication in Virtualized Environments • Deduplication approaches (two camps) • Summary: Deduplication’s Role in Data Protection and Disaster Recovery

  4. Why Use Deduplication in Backup and Recovery? • Enhanced Speed/Performance: faster backup times due to lower volume of data to be backed up; data lands faster because it is targeted at disk • Dramatic Savings in Disk Costs: 20:1 reduction in amount of disk space required to store backups • Scalability: back up higher data volumes while maintaining the backup window • Offsite Disaster Recovery: efficient use of bandwidth via WAN-efficient replication

  5. Reduced storage footprint with deduplication: Eliminate Redundancies for More Efficient Virtual Server Backups • Reduce total amount of storage by as much as 1000:1 • Store only the bytes that change in your VMware virtual servers • Eliminate redundancy of typical VMware backups • Restore quickly from most recent VMware backup • Without deduplication: each virtual server image gets backed up in its entirety; large amount of storage consumed • With deduplication: backups are deduplicated to changed bytes; dramatic savings in disk and bandwidth; integrated replication

  6. Specific Challenges of Backups/Restores in Virtualized Environments • Management of backups: growing number of virtual machines (sprawl); inability to monitor backups on individual virtual machines • Handling the volume of backup data efficiently: more data to store as virtual machines proliferate; each change means the entire virtual server is backed up • Example: 10 guest OS instances x 50GB = 500GB of backed-up virtual images daily • These challenges are driving a need for better tools to more reliably and easily back up and restore virtual machines

  7. How Dedupe Works: Store Only Changed Bytes [Diagram: two columns of ten VM backups, most recent at top, oldest at bottom] • Standard Disk: every backup stored as a full 50GB image, 10 x 50GB = 500GB total • Data Deduplication: most recent backup stored optimized for read (2.5GB); each of the nine older backups stored as 100MB; 3.4GB total
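The totals on this slide can be checked with a few lines of arithmetic. This is a minimal sketch, assuming decimal units (100MB = 0.1GB) and assuming the diagram shows ten retained backups, with the deduplicated side holding one 2.5GB most-recent copy plus nine 100MB sets of changed bytes. The variable names are illustrative and do not come from the presentation.

```python
# Figures read from the slide; the unit convention (100MB = 0.1GB) is an assumption.
FULL_BACKUP_GB = 50.0
RETAINED_BACKUPS = 10
MOST_RECENT_DEDUPED_GB = 2.5
CHANGES_PER_OLDER_BACKUP_GB = 0.1  # 100MB

# Standard disk: every retained backup is a full image.
standard_total_gb = FULL_BACKUP_GB * RETAINED_BACKUPS

# Deduplicated: one read-optimized recent copy plus changed bytes for the rest.
older_backups = RETAINED_BACKUPS - 1
deduped_total_gb = MOST_RECENT_DEDUPED_GB + CHANGES_PER_OLDER_BACKUP_GB * older_backups

reduction_ratio = standard_total_gb / deduped_total_gb

print(f"standard disk: {standard_total_gb:.0f}GB")
print(f"deduplicated:  {deduped_total_gb:.1f}GB")
print(f"reduction:     about {reduction_ratio:.0f}:1")
```

With these inputs the sketch reproduces the slide's 500GB and 3.4GB totals, which works out to a reduction of roughly 147:1 for this particular example.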

  8. Where to Deploy Deduplication • Source-Based Data Reduction: removes data redundancies before transmission to the backup target • PROS: reduces impact on VM; shortens backup window (less data); reduced bandwidth needed to the backup target; reduction in storage usage • CONS: can be slower for large (multiple-TB) amounts of data; increased workload on servers • Target-Based Data Reduction: removes data redundancies after transmission to the backup target • PROS: shortens backup window (less data); reduced replication bandwidth; reduction in storage usage • CONS: must transfer the entire dataset to the device; no reduction in bandwidth needed to the backup target
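The difference between the two deployment points can be illustrated with a toy model. The slide does not say how redundancy is detected, so this sketch assumes fixed-size chunks identified by SHA-256 digests, a common general technique and not necessarily what any particular product does. `BackupTarget`, `source_based_backup`, `target_based_backup` and `CHUNK_SIZE` are hypothetical names, and the small amount of traffic needed to exchange digests is ignored.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical chunk size, chosen only for this example


def split_chunks(data: bytes, size: int = CHUNK_SIZE):
    """Yield fixed-size chunks of data (the last one may be shorter)."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size]


class BackupTarget:
    """Toy backup target that keeps one copy of each unique chunk."""

    def __init__(self):
        self.chunks = {}  # digest -> chunk bytes

    def has(self, digest: str) -> bool:
        return digest in self.chunks

    def store(self, chunk: bytes) -> None:
        # A chunk whose digest is already present is not stored again.
        self.chunks.setdefault(hashlib.sha256(chunk).hexdigest(), chunk)

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


def source_based_backup(data: bytes, target: BackupTarget) -> int:
    """Hash on the source server and send only chunks the target lacks.

    Returns the number of payload bytes sent to the target."""
    sent = 0
    for chunk in split_chunks(data):
        digest = hashlib.sha256(chunk).hexdigest()  # extra work on the source
        if not target.has(digest):
            target.store(chunk)
            sent += len(chunk)
    return sent


def target_based_backup(data: bytes, target: BackupTarget) -> int:
    """Send every chunk; the target removes redundancy after arrival.

    Returns the number of payload bytes sent to the target."""
    sent = 0
    for chunk in split_chunks(data):
        sent += len(chunk)   # the full dataset crosses the network
        target.store(chunk)  # duplicates are dropped on the target
    return sent


# Ten chunks, only two of them unique.
image = b"A" * (CHUNK_SIZE * 9) + b"B" * CHUNK_SIZE
src, tgt = BackupTarget(), BackupTarget()
src_sent = source_based_backup(image, src)
tgt_sent = target_based_backup(image, tgt)
print("source-based: sent", src_sent, "stored", src.stored_bytes())
print("target-based: sent", tgt_sent, "stored", tgt.stored_bytes())
```

In this toy run both approaches end up storing the same two unique chunks, but only the source-based path reduces what is sent to the target, at the cost of hashing on the source server. That mirrors the pros and cons listed on the slide.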

  9. Using Both Deduplication Techniques Provides Complementary Benefits • Source-Based PLUS Target-Based Data Deduplication: removes data redundancies before and after transmission to the backup target • Achieves an additional 80% data reduction (98% total) • Further reduction in bandwidth • Further reduction in storage usage • Further reduction in backup window • Integrated replication of virtual servers
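One way to read "an additional 80% data reduction (98% total)" is that the second stage removes 80% of whatever the first stage leaves behind. The slide does not state the first-stage figure; under that reading it works out to 90%, which is an inference and not a number from the presentation. A minimal sketch of the arithmetic, with illustrative function names:

```python
def combined_reduction(first_stage: float, second_stage: float) -> float:
    """Total fraction removed when the second stage acts on the first stage's output."""
    return 1.0 - (1.0 - first_stage) * (1.0 - second_stage)


def implied_first_stage(total: float, second_stage: float) -> float:
    """First-stage reduction implied by a known total and second-stage reduction."""
    return 1.0 - (1.0 - total) / (1.0 - second_stage)


# Figures from the slide: 80% additional, 98% total.
first = implied_first_stage(total=0.98, second_stage=0.80)
total = combined_reduction(first, 0.80)

print(f"implied first-stage reduction: {first:.0%}")
print(f"combined reduction:            {total:.0%}")
```

Under this assumption, 100GB of backup data would shrink to 10GB after the first stage and to 2GB after the second.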

  10. Architectural Considerations [Diagram: throughput and backup window as data grows from 10 TB to 60 TB] • Legacy Architecture - Single Controller: one deduplication engine with disks added as data grows; throughput stays at X TB/hr at every size from 10 TB to 60 TB • Scalable GRID Architecture: multiple deduplication engines, each rated at X TB/hr; aggregate throughput is X TB/hr at 10 TB, 2X at 20 TB, 3X at 30 TB, 4X at 40 TB, 5X at 50 TB and 6X at 60 TB
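The effect on the backup window follows from dividing the data size by the aggregate throughput. The sketch below assumes a hypothetical per-engine rate of 2 TB/hr (the slide only says "X") and assumes the GRID adds one engine per 10 TB of data, as the diagram appears to show. Real systems have other bottlenecks that this ignores.

```python
X_TB_PER_HOUR = 2.0  # hypothetical per-engine throughput; the slide only says "X"


def backup_window_hours(data_tb: float, engines: int,
                        tb_per_hour_per_engine: float = X_TB_PER_HOUR) -> float:
    """Hours needed to ingest data_tb when all engines work in parallel."""
    return data_tb / (engines * tb_per_hour_per_engine)


sizes_tb = [10, 20, 30, 40, 50, 60]

# Single controller: capacity grows, but there is still one engine.
legacy_windows = [backup_window_hours(size, engines=1) for size in sizes_tb]

# GRID: assumed to add one engine for every 10 TB of data.
grid_windows = [backup_window_hours(size, engines=size // 10) for size in sizes_tb]

for size, legacy, grid in zip(sizes_tb, legacy_windows, grid_windows):
    print(f"{size:>2} TB  single controller: {legacy:4.1f} h   GRID: {grid:4.1f} h")
```

With these assumptions the single-controller window grows in proportion to the data, while the GRID window stays constant.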

  11. Architectural Considerations • Legacy Architecture – Single Controller: one deduplication engine • Legacy Architecture – Appliance Sprawl: individual appliances • Scalable GRID Architecture: multiple deduplication engines • Scalable GRID Features: linear performance as data grows, stable backup window; capacity is virtualized across nodes; deduplication is shared across nodes; simplified management through single UI; system can be right-sized to current data size; avoids forklift upgrades

  12. GRID Architecture for Deduplication Performance [Diagram: backup servers send backup jobs at wire speed to the landing zones of Node 1 and Node 2, each labeled System Capacity, RAID6; other labels: Load Balancing, Deduplication Process, Repository] • Benefits • One-time division of data during installation (15 to 30 minutes) • GRID software manages placement of data • Revisit only during expansion (additional 15 to 30 minutes) • Eliminates the challenges of monolithic, primary-storage-like architectures

  13. What We Covered • What is Deduplication? • Why Use Deduplication in Backup and Recovery? • Challenges of Deduplication in Virtualized Environments • Overview Diagram of Major Components • Deduplication approaches (two camps) • Summary: Deduplication’s Role in Data Protection and Disaster Recovery

  14. Enjoy and share this material • Feel free to promote this material • Encourage your peers to pass certification • Blog, Tweet and share this material and your experience on Facebook • You’re an Expert? We will be happy to have you as a Backup Academy contributor. Apply here. • Web: http://www.backupacademy.com • E-mail: feedback@backupacademy.com • Twitter: BckpAcademy • Facebook: backup.academy
