
2010 EMEA CUSTOMER SUPPORT CONFERENCE


ronna




Presentation Transcript


  1. 2010 EMEA CUSTOMER SUPPORT CONFERENCE
  Supporting Our Customers’ Journey to the Private Cloud
  De-duplication Technologies

  2. Backup & Recovery Architectures Are in Transition from Tape to Disk
  [Diagram: backup clients (VM, application, DB, home) feed a backup/media manager, onsite backup storage, and disaster-recovery storage. The conventional (tape-centric) architecture uses backup software writing to tape or to a VTL and VTL/tape; the transformational (disk-centric) architecture uses de-duplication backup software and systems writing to de-duplication storage, on premise and off premise, under data protection management software.]

  3. Backup & Recovery Architectures Are in Transition from Tape to Disk
  [Diagram: the same transition mapped to EMC products. Conventional (tape-centric): NetWorker writing to tape, a Disk Library, or VTL/tape. Transformational (disk-centric): NetWorker and Avamar writing to Data Domain and Avamar systems for onsite backup and disaster-recovery storage, with Data Protection Advisor providing data protection management, on premise and off premise.]

  4. Product Timeline

  5. De-duplication Theory
  • Main purpose is to save space and time
  • De-duplication, AKA global compression
  • De-duplication vs. compression
  • Block-level vs. file-level de-duplication
  • Fixed vs. variable block-size de-duplication
  • Hashing: the common choice is SHA-1
  • What does not de-duplicate well:
    • Multiplexed data streams (does not apply to source-side de-duplication)
    • Client-side compressed files
    • Encrypted files
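The building blocks listed above (variable block-size chunking, SHA-1 hashing, a global store of unique segments) can be sketched in a few lines of Python. This is a toy illustration, not any product's algorithm: real systems use a proper rolling hash such as Rabin fingerprints, and the window, mask, and size limits below are arbitrary assumptions.

```python
import hashlib

def chunks(data, mask=0x3F, window=16, min_size=32, max_size=256):
    """Toy content-defined (variable-block) chunker.

    A boundary is declared wherever a simple additive hash of the last
    `window` bytes matches a bit pattern, or when a chunk hits max_size.
    """
    start = 0
    for i in range(len(data)):
        size = i - start + 1
        if size < min_size:
            continue
        h = sum(data[i - window + 1:i + 1])       # stand-in for a rolling hash
        if (h & mask) == mask or size >= max_size:
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]

def dedupe(data, store=None):
    """Store each unique chunk once, keyed by its SHA-1 fingerprint."""
    store = {} if store is None else store
    recipe = []                       # ordered fingerprints to rebuild the stream
    for c in chunks(data):
        fp = hashlib.sha1(c).hexdigest()
        store.setdefault(fp, c)       # write only if new ("global compression")
        recipe.append(fp)
    return recipe, store

backup1 = bytes(range(256)) * 40                          # first full backup
backup2 = backup1[:5000] + b"changed!" + backup1[5000:]   # small edit next day
r1, store = dedupe(backup1)
r2, store = dedupe(backup2, store)
stored = sum(len(v) for v in store.values())
print(f"logical {len(backup1) + len(backup2)} bytes, stored {stored}")
```

Because chunk boundaries are derived from content, a small insertion only disturbs the chunks around the edit; fixed-size blocks would shift every boundary after it, which is why variable block-size de-duplication factors better.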

  6. Source vs. Target-Side De-duplication
  Source:
  • Client software agents identify repeated sub-file data segments at the source
  • The backup application sends only new, unique segments (compressed) across the network to the storage device
  • Benefits: improved backup windows; reduced virtual-infrastructure stress; reduced backup client-server bandwidth
  • Cons: rip-and-replace solution, as it cannot be integrated with existing backup targets
  Target:
  • The backup application sends native data to a target storage device
  • Data is de-duplicated as it reaches the target
  • Benefits: plug-and-play with existing backup software (de-duplication is transparent); high throughput for large datasets and copy to tape; protocol support: VTL, NAS, BOOST
  • Cons: high network load
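The bandwidth difference between the two modes can be made concrete with a crude back-of-the-envelope model. The function and its numbers (1% daily change, 2:1 compression of what is sent) are illustrative assumptions, not measurements:

```python
def network_bytes(full_size, change_rate, days, mode, compress=0.5):
    """Rough bytes-on-the-wire model for `days` daily full backups.

    Assumes the change between fulls is `change_rate` of the data set
    and that the data sent shrinks by `compress` (0.5 means 2:1).
    """
    if mode == "target":    # native stream travels every day
        return full_size * days
    if mode == "source":    # first full, then only daily changes, compressed
        unique = full_size + full_size * change_rate * (days - 1)
        return int(unique * compress)
    raise ValueError(mode)

# 1 TB client, 1 % daily change, one week of daily fulls
tb = 10**12
ratio = network_bytes(tb, 0.01, 7, "target") / network_bytes(tb, 0.01, 7, "source")
print(f"target sends ~{ratio:.1f}x more over the network than source")
```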

  7. Primary De-duplication Technologies: EMC Product Portfolio
  • Data Domain: target; variable-block chunking, hashing, compression
  • Avamar: client; variable-block chunking, compression, hashing
  • Celerra: filer; file-level uniqueness, compression
  • Disk Library with De-duplication (DL3D)
    • Based on the Quantum product line; Quantum acquired the Rocksoft Blocklets technology through ADIC
    • Now EOL; the existing install base is being replaced by Data Domain

  8. De-duplication Factor
  The de-duplication ratio depends on:
  • Efficiency of commonality factoring: near-perfect for modern variable-block algorithms
  • Number of backup cycles inside the retention period
  • Frequency of full vs. incremental backups: some products behave as always-full, so the de-duplication ratio appears higher than with products that use incremental backups, although the end result is similar
  • Daily data change rate: commonality factoring is non-existent on new data
  • Data pattern: how well the data compresses
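The interplay of these factors can be sketched with a toy model. All numbers here are illustrative assumptions (perfect commonality factoring, a uniform daily change rate, a fixed local-compression ratio), not product claims:

```python
def dedup_ratio(full_size, change_rate, cycles, compress=2.0):
    """Toy model of the de-duplication factor over a retention period.

    logical  = what the backup application believes it wrote (N fulls)
    physical = the first full plus the daily new data, after 2:1
               local compression
    """
    logical = full_size * cycles
    unique = full_size * (1 + change_rate * (cycles - 1))
    physical = unique / compress
    return logical / physical

# 30 daily fulls of a 1 TB data set, 1 % daily change rate
print(f"{dedup_ratio(1.0, 0.01, 30):.1f}:1")
```

Note how the ratio grows with the number of cycles kept inside retention: the logical side grows by a full backup per cycle while the physical side grows only by the changed data, which is why always-full products report higher ratios for a similar end result.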

  9. De-duplication Performance
  • Inline vs. policy-based de-duplication
    • For inline appliances, aggregate performance is determined by the appliance specification
    • For policy-based appliances, peak intake is higher, but the additional post-processing eventually saturates the engine at lower sustained rates
  • Source vs. target de-duplication
    • Source-based de-duplication performance is limited by host performance: CPU usage on the host increases, though this is not noticeable at low change rates
    • Target-based de-duplication is limited by host-to-target bandwidth. Note that I/O movement itself can consume a lot of CPU resources; this is true of all backup types, not specific to de-duplication
    • Mixed mode (such as DD Boost) performs commonality factoring on the host but compression on the target; the result is a compromise between bandwidth utilization and host impact
  • Conclusion: de-duplication improves performance in all but extreme high-I/O cases or cases with a very high change rate

  10. Data Domain
  • Hardware appliance only, using a Red Hat Linux kernel
  • VTL over FC SAN, or CIFS/NFS target
    • Note that space is reclaimed only when a volume is relabeled, not when it expires
  • Target-side by default; source and middleware de-duplication with DD Boost (over IP networks only)
  • Replication between appliances over IP
  • New in the latest versions: global de-duplication across several DDs

  11. Data Domain (cont.)
  • SISL (Stream-Informed Segment Layout) minimizes disk accesses, making inline de-duplication possible
  • Data Invulnerability Architecture protects against data-integrity issues during backup and across the data lifecycle
  • Shelves are attached through SAS and configured as RAID-6
  • Data Domain terminology:
    • Global compression = de-duplication
    • Local compression = 4 types: lz, gzfast, gz, none
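The two-stage terminology above can be sketched as follows. This is a hypothetical illustration, not the appliance's code path; zlib merely stands in for the local-compression choices (lz, gzfast, gz):

```python
import hashlib
import zlib

def store_chunk(store, chunk, level=6):
    """Global compression first (drop duplicates by SHA-1 fingerprint),
    then local compression on whatever is actually new."""
    fp = hashlib.sha1(chunk).digest()
    if fp not in store:                          # global compression: dedupe
        store[fp] = zlib.compress(chunk, level)  # local compression: zlib here
    return fp

store = {}
data = b"abcdefgh" * 512                 # a highly compressible 4 KB chunk
fp1 = store_chunk(store, data)
fp2 = store_chunk(store, data)           # duplicate: nothing new is stored
print(len(store), len(store[fp1]), len(data))
```

The duplicate write costs only a hash lookup; the single stored copy then shrinks further under local compression, which is why the two stages are reported as separate ratios.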

  12. Avamar
  • Software (AVE) or hardware (ADS) appliance
  • Source-side de-duplication only
  • Replication over IP
  • Hardware uses the RAIN architecture (Redundant Array of Independent Nodes): minimum 2 + 1 active nodes, plus a utility and a spare node
  • GSAN vs. OS capacity view: GSAN is 60% of OS capacity
  • Blackout window
    • Requires a daily maintenance window, so not suitable for 24/7 backup operations
    • Can be skipped, but expect problems if skipped multiple days in a row
  • Checkpoint rollbacks
    • In case of server-side problems, a checkpoint rollback must be performed
    • Causes data loss from the point of the checkpoint onward

  13. Avamar (cont.)
  • Uses a client-side cache to minimize server lookups during backup; fully loaded into memory at backup start
    • f_cache: hashes of file metadata, used to determine backup candidates
    • p_cache: hashes of data chunks already sent to the Avamar server
  • NDMP backups require an NDMP Accelerator node; the first backup is full, then incremental forever
  • Server enhancements in Avamar 5.0:
    • Rolling HFSCHECK
    • Blackout and maintenance windows manageable from the Avamar UI and command line
    • Uses ConnectEMC to communicate securely with EMC Support
    • NAT support for multi-node data stores
    • Stripe pre-allocation on 3.3 TB nodes
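The effect of the client-side cache can be sketched with a toy class. The names and behavior are illustrative only, not Avamar's actual implementation:

```python
import hashlib

class ClientCache:
    """Toy stand-in for a client-side hash cache (p_cache-style):
    chunk fingerprints seen before are answered locally, so the
    client never asks the server about them again."""

    def __init__(self):
        self.known = set()          # loaded into memory at backup start
        self.server_lookups = 0     # round-trips we could not avoid

    def need_to_send(self, chunk):
        fp = hashlib.sha1(chunk).digest()
        if fp in self.known:
            return False            # cache hit: skip the server lookup
        self.server_lookups += 1    # would query the backup server here
        self.known.add(fp)
        return True

cache = ClientCache()
first = [cache.need_to_send(bytes([i]) * 64) for i in range(10)]
second = [cache.need_to_send(bytes([i]) * 64) for i in range(10)]
print(sum(first), sum(second), cache.server_lookups)
```

On the second pass every chunk is a cache hit, so nothing is sent and no server lookups occur, which is the behavior that makes the daily incremental-forever backups cheap.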

  14. Celerra Filesystem De-duplication
  • Uses Avamar and RecoverPoint technology
  • Files are selected based on age, size, and extension
  • Compression takes place (no chunking)
  • Files are compared for uniqueness (SHA-1 hash) and then stored or linked
  • Backup considerations:
    • NDMP-VBB (NVB) supports only full destructive restore
    • Latest versions back up in compressed format
    • MPFS falls back to CIFS/NFS if a file is de-duplicated

  15. Current Points of Integration
  • NetWorker + Data Domain: currently via VTL or CIFS/NFS; as of NW 7.6 SP1, preferred via BOOST
  • NetWorker + Avamar
  • NetWorker + DL3D: VTL, now EOL
  • Avamar + Data Domain: future
  • Data Domain + third-party backup software vendors: VTL, CIFS/NFS, BOOST

  16. EMC NetWorker & Data Domain Integration
  • Using VTL for FC/SAN backups, or an AFTD with an NFS or CIFS device as target
    • De-duplication is done on the DD appliance
  • DD device with DD Boost in NetWorker
    • NW 7.6 SP1 ships with the DD Boost 2.2.2 library
    • DD Boost performs DSP (distributed segment processing) functions: de-duplication is done on the NW server or storage node, and the unique segments are then sent to the DD appliance
    • Index stored in the NetWorker DB; transparent to the application

  17. EMC NetWorker & Data Domain Integration: EMC NetWorker 7.6 SP1
  • Seamless, optimized integration: an optimized storage node-to-DD system interface, unique to NetWorker and Data Domain
  • New Data Domain device type, created from NMC
  • Improved user experience and user interface for management of the Data Domain system
    • Wizards to help simplify configuration of the optimized DD system
    • Disk-centric views of media and targets
    • Unique monitoring and data-collection reports for Data Domain
  [Diagram: the NW server (with NMC) manages NetWorker and the DDR; clients and apps send data over the optimized data path through the storage node to the Data Domain device type.]

  18. EMC NetWorker & Avamar Integration
  • Avamar is installed as a new type of NetWorker storage node: a de-duplication node
  • Seamless integration of de-duplication into save and recover
  • The NetWorker client package is required on the Avamar server
  • Data lives on the Avamar de-dupe node; hash IDs live on the NW storage node
    • Use an AFTD and do not stage it out
    • Clone that AFTD: if this metadata is lost, it cannot be recreated
  [Diagram: the NW server (with NMC) manages NetWorker and reports on the Avamar data store; file systems and applications send de-duplicated data to Avamar and save-set metadata to the storage node.]

  19. EMC NetWorker & Avamar Integration (cont.)
  • For file-system backups:
    • Uses standard NetWorker full/incremental/level backups
    • NetWorker walks the filesystem; the Avamar component only performs de-duplication
    • Supports NetWorker directives, parallelism, browsing, and policies
    • Maximum parallelism per client is limited to 4
    • Non-ASCII data sets are supported for de-dupe backups
    • NDMP backups are not supported
  • Application module support:
    • NetWorker Module for Databases and Applications (NMDA)
    • NetWorker Module for Microsoft (NMM)

  20. EMC NetWorker & Avamar Integration (cont.)
  • Data expiration is under exclusive NW control
    • Requires that NW and the Avamar node stay in sync
    • Can cause problems in cases of NW disaster recovery or Avamar rollbacks
  • Avamar replication is supported for de-dupe clients, limited to one replica
  • A save set is registered as a de-dupe save set via an extended attribute
  • Separate p_cache and f_cache are created per client/save set
  • ADT: Avamar Data Transport, for transport to tape

  21. Future “De-duplication is a feature, not a product”

  22. Q & A
