Backing Up Network Appliance Filers using VERITAS NetBackup and NDMP Kelly Wyatt SAS Institute Inc. Kelly.Wyatt@SAS.com
SAS Institute Inc.: Who We Are • World’s largest privately held independent software vendor • 98 of the Fortune 100 and 90 percent of the Fortune 500 companies are customers of SAS Institute Inc. • 3.5 million users worldwide • Customers include retail, chemical, banking, pharmaceuticals, government and education • Products centered around Information Delivery: Data Warehouse / Decision Support Solutions
SAS Institute Inc.: Corporate Infrastructure • NT on desktop, UNIX and NT in data center • 300 UNIX servers and 125 NT servers • 17 Network Appliance file servers • 50GB NFS, 2TB AFS, 2.5TB NT, 1.5TB Network Appliance • Focus of this presentation is backing up the Network Appliance filers
Network Appliance Filer: What is it? • High performance, high availability network “appliance” optimized for file serving • Multiprotocol -- NFS and NTFS file systems, shared or separate • Supports HTTP and FTP • Up to 1.4 TB per filer, multiple volumes • ONTAP “operating system” with under 50 commands • More information at www.netapp.com
NDMP Protocol: What is it? • Developed by Network Appliance and Intelliguard Software (formerly PDC Software) • Open standard protocol for enterprise-wide backup of heterogeneous network-attached storage • NDMP Data Server reads data and creates an NDMP data stream • NDMP Tape Server reads or writes NDMP data stream to/from tape • Minimal code on client • More information at www.ndmp.org
Selection Criteria: Scalability • Back up 400GB incremental and 1.5TB full within backup windows (12-hour incremental window, 56-hour full window, allowing time for 3 tries) • Minimum 15GB per hour throughput • Minimum 35GB per tape • Meet 4-hour restore requirement
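The window arithmetic behind these criteria can be sketched as follows. This is one reading of the slide, not the presenter's own calculation: it assumes "3 tries" means three sequential attempts must fit inside each window, treats 1.5TB as 1500GB, and assumes the 15GB per hour minimum applies to each backup stream. The function names are made up for illustration.

```python
import math

def required_rate_gb_per_hour(data_gb, window_hours, tries):
    """Aggregate rate needed so each of `tries` sequential attempts fits in the window."""
    hours_per_try = window_hours / tries
    return data_gb / hours_per_try

def streams_needed(aggregate_rate, per_stream_rate):
    """Concurrent streams needed if each stream sustains `per_stream_rate` (assumption)."""
    return math.ceil(aggregate_rate / per_stream_rate)

# Figures from the slide; 1.5TB taken as 1500GB (assumption).
incremental_rate = required_rate_gb_per_hour(400, 12, 3)
full_rate = required_rate_gb_per_hour(1500, 56, 3)

print(f"incremental: {incremental_rate:.1f} GB/h, "
      f"{streams_needed(incremental_rate, 15)} streams at 15 GB/h")
print(f"full: {full_rate:.1f} GB/h, "
      f"{streams_needed(full_rate, 15)} streams at 15 GB/h")
print(f"tapes for one full at 35 GB/tape: {math.ceil(1500 / 35)}")
```

Under these assumptions the incremental window is the tighter constraint, which is consistent with attaching several tape drives across the filers.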
Selection Criteria: Functionality • Maintain NT and UNIX file attributes • Unattended operation • Hot swap tape drives • Automatic retry of failed backups • End user restore ability • Use of same solution for NT, UNIX and AFS backups a strong plus
Filer Backups: Alternate Solutions • Network Appliance solution: use filer dump and restore utilities to locally attached tape drive (eliminates bar-coded tape use; “grow your own” solution needed to manage backup media and images) • Use another backup software vendor’s solution (initial evaluation indicated it would not scale in our environment)
SAS Institute Inc.: Filer Backup Solution • Use VERITAS NetBackup with NDMP • Sun master server (NDMP not available on HP at that time) • StorageTek 9710, 10 DLT 7000 drives, 240 tape slots for main data center • Two tape drives per filer, directly attached • Limit volume sizes where possible
Network Appliance Backups: Main Data Center [Diagram. Labels: NetApp F760 filers F1, F2, F3, F4, F5 and RF1; SCSI-attached DLT 7000 drives in a StorageTek 9710; SCSI controller; master server; network carrying the NDMP protocol.]
Network Appliance Backups: Auxiliary Data Center [Diagram. Labels: NetApp F760 filer RF1; SCSI-attached DLT 7000 drives in a StorageTek 9730; SCSI controller; master server; network carrying the NDMP protocol.]
Criteria Met • Automatic retry of failed backups • Throughput average of 19GB per hour • Average 59GB per tape • Maintain NT and UNIX file attributes
Criteria Not Met • Hot swap failed drives • End user restore
20/20 Hindsight • One class per volume • Stagger backups and create “BPFSMAP_TMPDIR” • Increase CLIENT_TIMEOUT • Always run logs • Plan for database space • Attach tape drive to master for database backups
One class per volume: Classes define the groupings of Network Appliance file system volumes for the purpose of backups. • Originally: all volumes divided into two classes; classes “pre-populated” with volume names • Problem: if any volume backup fails, the entire stream is failed and retried; pre-populated volumes failed the entire stream with one release of ONTAP; confusion over which image to use for restores • Resolution: one class per volume, with the additional benefit of load balancing tape drives
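The retry behavior described above can be illustrated with a small sketch. It models only the slide's statement that a failure in one volume fails and retries everything in the same class; the class and volume names are hypothetical, and nothing here calls NetBackup.

```python
def volumes_to_retry(classes, failed_volume):
    """Return every volume that gets retried when `failed_volume` fails.

    `classes` maps a class name to the list of volumes it groups.
    A failure retries the whole class the volume belongs to.
    """
    for volumes in classes.values():
        if failed_volume in volumes:
            return list(volumes)
    return []

# Hypothetical layouts: two big classes versus one class per volume.
grouped = {"filers_a": ["vol0", "vol1", "vol2"], "filers_b": ["vol3", "vol4"]}
per_volume = {f"class_{v}": [v] for v in ["vol0", "vol1", "vol2", "vol3", "vol4"]}

print(volumes_to_retry(grouped, "vol1"))     # the whole class is retried
print(volumes_to_retry(per_volume, "vol1"))  # only the failed volume is retried
```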
Stagger backups / BPFSMAP_TMPDIR: NDMP implementation appears to stage file/directory information in inomap files on master server; this information then moves into the database. WE SUSPECT! ANYONE REALLY KNOW? • Originally: all backups scheduled for 18:30 • Problem: huge “inomap” files fill /tmp; backups fail trying to create inomap files • Resolution: stagger backups; create /usr/openv/netbackup/BPFSMAP_TMPDIR
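Staggering can be as simple as spreading start times out from the original 18:30 slot. The sketch below only computes a schedule; the 45-minute gap and the volume names are assumptions, since the slides do not say what spacing was used.

```python
from datetime import datetime, timedelta

def staggered_starts(volumes, first_start, gap_minutes):
    """Assign each volume a start time, `gap_minutes` apart, beginning at `first_start` (HH:MM)."""
    start = datetime.strptime(first_start, "%H:%M")
    return {
        volume: (start + timedelta(minutes=index * gap_minutes)).strftime("%H:%M")
        for index, volume in enumerate(volumes)
    }

# Hypothetical volumes and gap; 18:30 is the original start time from the slide.
schedule = staggered_starts(["vol0", "vol1", "vol2"], "18:30", 45)
for volume, start_time in schedule.items():
    print(volume, start_time)
```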
Increase CLIENT_TIMEOUT: Related to the inomap file creation, the time between connecting to the filer and writing to the tape requires a much longer timeout value. • Originally: NetBackup default timeouts (300 seconds) for both CLIENT_READ_TIMEOUT and CLIENT_CONNECT_TIMEOUT • Problem: backups fail due to connection timeouts; generating the list of files to back up takes a long time • Resolution: set CLIENT_CONNECT_TIMEOUT to 1800; set CLIENT_READ_TIMEOUT to 3600; no more connection failures
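A quick check that a configuration actually carries the raised timeouts might look like the sketch below. It assumes the settings live in a text file of `KEY = value` lines; the sample text is made up, and the helper names are hypothetical rather than part of NetBackup.

```python
def parse_conf(text):
    """Parse `KEY = value` lines into a dict, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings

def timeout_problems(settings, minimums, default=300):
    """List settings that fall below the wanted minimum (300 s default per the slide)."""
    problems = []
    for key, minimum in minimums.items():
        value = int(settings.get(key, default))
        if value < minimum:
            problems.append(f"{key} is {value}, expected at least {minimum}")
    return problems

# Values from the slide's resolution.
WANTED = {"CLIENT_CONNECT_TIMEOUT": 1800, "CLIENT_READ_TIMEOUT": 3600}

# Made-up sample in which the read timeout was never raised.
sample = "SERVER = master\nCLIENT_CONNECT_TIMEOUT = 1800\nCLIENT_READ_TIMEOUT = 300\n"
print(timeout_problems(parse_conf(sample), WANTED))
```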
Always use logging: Tons & tons of useful data is written to the logs - ALWAYS USE THEM, ALWAYS PRUNE THEM! • Originally: logging is not automatic • Problem: any problem reported to VERITAS requires logs • Resolution: logging turned on; most verbose setting in bp.conf (verbose = 5); logs backed up; logs deleted after 5 days (default = 28 days)
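The "5 days (default = 28 days)" wording suggests the pruning was done through NetBackup's own retention setting. Purely as a generic illustration of age-based pruning, and not as the presenter's method, a sketch could look like this. The function name and the idea of passing the log directory as an argument are assumptions.

```python
import os
import time

def prune_old_logs(log_dir, max_age_days=5, now=None):
    """Delete files under `log_dir` last modified more than `max_age_days` ago.

    Returns the sorted list of removed paths. Back the logs up first,
    as the slide recommends.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for root, _dirs, files in os.walk(log_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getmtime(path) < cutoff:
                os.remove(path)
                removed.append(path)
    return sorted(removed)
```

Passing `now` explicitly makes the cutoff reproducible when the function is exercised against a scratch directory before pointing it at real logs.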
Plan for Database space: Databases are CRITICAL to your backups and restores! • Originally: 23GB disk for backups; NetBackup defaults for image compression, log and job information retention • Problem: databases get large; logs get very large; job information accumulates quickly • Resolution: compress images after 15 days; keep job information 9 days; 60GB RAID array for databases
Tape Drive on Master for Database Backups: Databases are critical - back ‘em up! • Originally: backup through Network Appliance “not supported” but roughly 50% successful • Problem: databases are important!!! • Resolution: attach tape drive to master server
Plans for the Future -- Short Term • Build in robot redundancy -- attach filers to two drives in separate robots. Easiest with two tape adapters, which requires ONTAP 5.3. Underway. • Larger library for auxiliary data center.
Concerns • Size of databases -- disk space, database backups can’t span tapes • Future of NDMP, enhancements (next page), industry acceptance • Qualifying NetBackup on ONTAP releases. (There has been a history of problems with new releases and NetBackup.)
Living on the Edge • Evaluating problems/enhancements is complicated since it’s not clear whether it’s a function of NDMP or NetBackup • Many features of NetBackup are not available with NDMP
Enhancement Requests • Fragments to decrease restore time • Restore to another filer • Checkpoint restart • User initiated restores without breach in data security • Use of meta-characters in file list • Use of exclude and include lists • Fibre channel tape drives/tape adapter • Tape drive sharing between filers • Allow filer attached tape drives to be media servers for other backups
Miscellaneous Implementation Details • Job information fed into SAS to provide Web reports • Offsiting scripts generate lists of tapes for Operations • Changing retention level on the 1st full backup of the month from two months to 18 months • Implementing Web based tape management tool by feeding media lists, offsite lists, etc. to SAS
Web reports • Job statistics are generated daily via cron. • “bpdbjobs” is run with a specific format file on each master server. • The output is massaged by a Perl program to create a comma-delimited list for input to SAS. • SAS reads the data and updates existing observations and adds new observations. • Reporting is via the web.
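The "massage into a comma-delimited list" step can be sketched as below. The original used Perl; Python is used here only for illustration. The real `bpdbjobs` output depends on the format file, which the slides do not show, so the whitespace-separated column layout and the field names are hypothetical.

```python
import csv
import io

# Hypothetical column layout; the real one depends on the bpdbjobs format file.
FIELDS = ["jobid", "class", "client", "status", "kbytes"]

def to_csv(raw_text):
    """Turn whitespace-separated job records into CSV text with a header row.

    Lines that do not have exactly len(FIELDS) columns are skipped.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(FIELDS)
    for line in raw_text.splitlines():
        parts = line.split()
        if len(parts) == len(FIELDS):
            writer.writerow(parts)
    return out.getvalue()

# Made-up sample records.
print(to_csv("101 vol0 f1 0 2048\nnot a job record\n102 vol1 f2 1 0\n"))
```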
Offsiting • Perl script to list all tapes used for full backups from the “previous Friday 6PM through Monday 6PM” • Uses vmquery and bpimagelist to create a robot-sorted list of tapes and slot numbers • Uses available_media report to provide list of scratch tapes • One script for all master servers
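The "previous Friday 6PM through Monday 6PM" window is the only date logic the script needs. A minimal sketch of that calculation follows, in Python rather than the original Perl. It assumes the window of interest is the most recently completed one, and the function name is made up.

```python
from datetime import datetime, timedelta

def offsite_window(now):
    """Return (friday_6pm, monday_6pm) for the most recently completed window."""
    # weekday() is 0 on Monday, so this steps back to this week's Monday.
    monday = (now - timedelta(days=now.weekday())).replace(
        hour=18, minute=0, second=0, microsecond=0
    )
    if monday > now:
        # It is Monday before 6PM: the current window is still open,
        # so fall back to the previous week's window.
        monday -= timedelta(days=7)
    friday = monday - timedelta(days=3)
    return friday, monday

start, end = offsite_window(datetime(2024, 1, 2, 9, 0))  # a Tuesday morning
print(start, "through", end)
```

The resulting pair can then bound whatever image or media query the script runs.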
Changing Retention Levels • We run weekly full backups with a two month retention level • We want monthly full backups with an 18 month retention level • Originally ran two fulls during the weekend -- weekly full and monthly full • Too many tapes used, too many images taking space • Implementing Perl script to change retention level of 1st weekly full backup of the month to 18 month retention level
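The selection step, picking the first weekly full of each month, can be sketched as follows. This covers only the choice of which backups get the longer retention, not the NetBackup command that applies it; the dates and function names are hypothetical, and Python stands in for the Perl script mentioned above.

```python
from datetime import date

def first_fulls_of_month(full_backup_dates):
    """Return the earliest full backup date in each calendar month."""
    firsts = {}
    for backup_date in sorted(full_backup_dates):
        # setdefault keeps the first (earliest) date seen for each month.
        firsts.setdefault((backup_date.year, backup_date.month), backup_date)
    return sorted(firsts.values())

def retention_for(backup_date, monthly_firsts):
    """18 months for the first full of a month, two months otherwise (per the slide)."""
    return "18 months" if backup_date in monthly_firsts else "2 months"

# Made-up weekly full backup dates.
fulls = [date(2024, 1, 5), date(2024, 1, 12), date(2024, 2, 2),
         date(2024, 1, 26), date(2024, 2, 9)]
monthly = first_fulls_of_month(fulls)
for d in sorted(fulls):
    print(d, retention_for(d, monthly))
```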
Tape Management • Web based tape management system under development • NetBackup medialist fed to SAS • Help Desk can enter tape request for restores • Operators can manage tape requests • Operators can manage tape rotation
Summary • VERITAS NetBackup with NDMP very successfully backs up Network Appliance Filers • Help Desk pleased with ease of use and throughput of restores • Use of NetBackup expanded to NT and UNIX backups • AFS evaluation forthcoming