Analysis of Virtual Tape Subsystems Ned Diehl The Information Systems Manager, Inc. ned.diehl@perfman.com www.perfman.com 610-865-0300 NCACMG Vienna, Virginia 5 June 2002. ISM PerfMan IBM Parallel Sysplex RMF OS/390. z/OS Magstar VSM StorageTek HSC Nearline.
Analysis of Virtual Tape Subsystems • Ned Diehl • The Information Systems Manager, Inc. • ned.diehl@perfman.com • www.perfman.com • 610-865-0300 • NCACMG • Vienna, Virginia • 5 June 2002
Trademarks (Omissions are unintentional) ISM PerfMan IBM Parallel Sysplex RMF OS/390 z/OS Magstar VSM StorageTek HSC Nearline
Objectives • Discuss sources of performance analysis and capacity planning data for virtual tape subsystems (VT) in an OS/390 environment • Primary focus on hardware implementations • IBM VTS • StorageTek VSM • Present graphical examples • System rather than application focus
Contents • Introduction • Virtual Tape Structure • Data Sources • Key Performance Metrics • Recommendations • Summary • References
Introduction • Solves many traditional tape problems • Small data sets • Low activity open data sets • Allocation wait for tape drives • Weakest with “good” tape activity • Full volumes • High transfer rates • Exploits high capacity tapes • VT helps remove implementation impediments
Introduction • Consider disaster recovery • VTS Peer-to-Peer • Clustered VTSS Configuration • Mount management considerations • Deferred mount should generally be avoided • Premount can be desirable • Not yet everything for everyone – but getting there • Fewer problem datasets with current implementations
VT Structure General • Looks (almost) like 3490E to host • Combination of hardware and software • RISC processor • Library of high capacity tapes • RAID DASD buffer or CACHE • Tape drives • Amount of host software varies with implementation
VT Structure VSM - Physical Architecture (Diagram labels: HSC, VTCS, CDS, VTSS, VTV, MVC, Control Path, Control & Data Path, Migrate/Recall) • Physical tape drives • Physical tape volumes • Physical libraries & slots • Virtual tape drives • Virtual tape volumes
Data Sources • Application • SMF 30, 72, 14, & 15 • Hardware • SMF 94 - tape library statistics • STK user SMF record • SMF 14 & 15 – data set activity • SMF 21 – error statistics by volume • RMF 73 - channel path activity • RMF 74-1 - device activity • RMF 78-3 - I/O queuing • Real time controller interface
Data Sources SMF 94 • Easy to work with though not flexible • Hourly summary from VTS or Library • Difficult to synchronize (do not use SMF94HHI) • Identical data to all attached recording images • Base segments reflect all library statistics • ATL, mount, dismount, eject, & insert • Native and VTS activity • Identical data in multiple records with a different serial number if VTS and native drives attached • VTS segments reflect single VTS • VTS, import / export, enhanced statistics
Data Sources SMF 94 • One record produced for each • Library with native drives, identified by ATL serial (SMF94SNO). Contains base segments. • VTS, identified by VTS serial (SMF94SNO & SMF94VLS). Contains base and VTS segments. • With peer-to-peer, three (local) or four (remote) VTSs involved • Only two have physical tape • No easy way to associate them • Generally consistent with RMF • RMF allows finer detail of comparable metrics
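Because identical SMF 94 data is written to every attached recording image, merged data needs de-duplication before summarization. The following is a minimal sketch, assuming the records have already been parsed into dictionaries; the `image`, `interval_end`, and `mounts` keys are hypothetical names for illustration, and only SMF94SNO comes from the record itself. It does not handle the case noted above where the same library data appears under a different serial number.

```python
def dedupe_smf94(records):
    """Keep one record per (serial number, interval end) pair."""
    seen = {}
    for rec in records:
        key = (rec["SMF94SNO"], rec["interval_end"])
        # The first copy wins; copies from other images are treated as duplicates.
        seen.setdefault(key, rec)
    return list(seen.values())


# Hypothetical sample: two images recorded the same hourly interval.
records = [
    {"image": "SYSA", "SMF94SNO": "A1234", "interval_end": "2002-06-05 10:00", "mounts": 40},
    {"image": "SYSB", "SMF94SNO": "A1234", "interval_end": "2002-06-05 10:00", "mounts": 40},
    {"image": "SYSA", "SMF94SNO": "A1234", "interval_end": "2002-06-05 11:00", "mounts": 55},
]
unique = dedupe_smf94(records)
print(len(unique), sum(r["mounts"] for r in unique))
```

Without this step, the 10:00 interval would be counted once per recording image.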
Data Sources SMF 94 Data Issues • Hour index (SMF94HHI) is interval end • Drives available (SMF94VTA) not always reset after service • Drives used (SMF94VTV, VTN, & VTX) can be greater than available • Alignment problem with early Import/Export statistics (SMF94ACA & SMF94ACB) • Duplicate library data with a different serial number if VTS and native drives attached • Average age in cache (SMF94VCA) has changed definition over time
Data Sources SMF 94 Data Issues • Tape data transfer (SMF94VTR & SMF94VTW) reported as zero in some recent samples • Max cache volume age (S94MTVCA) recorded as seconds, documented as minutes • Backstore compression ratio (S94BSRAT) sometimes reported as less than one • Recall throttle percent (S94RCPRT) sometimes greater than 100
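Several of these data issues can be flagged mechanically before reporting. The sketch below assumes each SMF 94 record has been parsed into a dictionary keyed by field name, and that S94BSRAT has already been scaled to a plain ratio; both are assumptions for illustration, not a statement of the record layout. A zero transfer value can also be legitimate for an idle interval, so it is flagged only as something to review.

```python
def check_smf94(rec):
    """Return warnings for known SMF 94 data quirks in one parsed record."""
    warnings = []
    if rec.get("SMF94VTX", 0) > rec.get("SMF94VTA", 0):
        warnings.append("maximum drives used exceeds drives available")
    if rec.get("S94BSRAT", 1.0) < 1.0:
        warnings.append("backstore compression ratio below one")
    if rec.get("S94RCPRT", 0) > 100:
        warnings.append("recall throttle percent above 100")
    if rec.get("SMF94VTR", 0) == 0 and rec.get("SMF94VTW", 0) == 0:
        warnings.append("tape data transfer is zero (review)")
    return warnings


# Hypothetical records: one suspect, one clean.
suspect = {"SMF94VTA": 6, "SMF94VTX": 8, "S94BSRAT": 0.9,
           "S94RCPRT": 120, "SMF94VTR": 0, "SMF94VTW": 0}
clean = {"SMF94VTA": 6, "SMF94VTX": 4, "S94BSRAT": 2.5,
         "S94RCPRT": 10, "SMF94VTR": 100, "SMF94VTW": 200}
for warning in check_smf94(suspect):
    print(warning)
```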
Data Sources SMF 94 Data Issues • Reference Flash 10124 dated 20 November 2001. Recalls might be recorded as hits. Affected metrics: • S94MAXCH Maximum Cache Hit Mount Time • S94AVGCH Average Cache Hit Mount Time • SMF94VMH Number of Cache Hit Mounts • S94MAXRM Maximum Recall Mount Time • S94AVGRM Average Recall Mount Time • SMF94VMS Number of Recall Mounts • SMF94VPS Number of Physical Mounts for Recall
Data Sources STK User Record • Complex to work with • Many subtypes • Repeating segments • Allows detailed analysis • Configuration info • CU and device busy • Volume level data • VTSS interval of 15 minutes • Grouping and summarization required • Some logical metrics require multiple subtypes
Data Sources STK User Record • Interval metrics by VTSS (not image) to each recording OS/390 • Subsystem, channel interface, and RTD performance • Values vary with OS/390 recording time • Logically identical but physically different • Event metrics recorded once • VTV mount, dismount, delete, migrate, recall, movement, replicate • RTD mount, dismount, vary • MVC status
Data Sources STK Data Issues • Must process event records from all recording images • Mount events not necessarily recorded to original requesting image • Dismount events not necessarily recorded to same image as mount • Might be extra mounts or dismounts • Calculated times might be very large or negative • Some time values in complex formats
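One way to cope with these issues is to merge the event records from all images, sort them by time, and pair each mount with the next dismount of the same volume. The following is a rough sketch under stated assumptions: events are already parsed into dictionaries with hypothetical `image`, `type`, `volser`, and `time` (seconds) keys, and the one-day cutoff for implausible durations is an arbitrary illustrative value.

```python
def pair_mounts(events, max_seconds=86400):
    """Pair mounts with dismounts per VOLSER across all recording images."""
    open_mounts = {}
    pairs, unmatched = [], []
    for ev in sorted(events, key=lambda e: e["time"]):
        volser = ev["volser"]
        if ev["type"] == "mount":
            if volser in open_mounts:
                # Extra mount: the earlier one never saw a dismount.
                unmatched.append(open_mounts[volser])
            open_mounts[volser] = ev
        elif ev["type"] == "dismount":
            mount = open_mounts.pop(volser, None)
            if mount is None:
                unmatched.append(ev)  # extra dismount
                continue
            duration = ev["time"] - mount["time"]
            if duration <= max_seconds:
                pairs.append((volser, duration))
            else:
                unmatched.extend([mount, ev])  # implausibly long, set aside
    unmatched.extend(open_mounts.values())
    return pairs, unmatched


# Hypothetical events; the dismount for V00001 was recorded on another image.
events = [
    {"image": "SYSA", "type": "mount", "volser": "V00001", "time": 100},
    {"image": "SYSB", "type": "dismount", "volser": "V00001", "time": 400},
    {"image": "SYSB", "type": "dismount", "volser": "V00002", "time": 500},
    {"image": "SYSA", "type": "mount", "volser": "V00003", "time": 600},
]
pairs, unmatched = pair_mounts(events)
print(pairs, len(unmatched))
```

Sorting the merged stream first avoids negative durations; unmatched events are kept separately so they can be counted rather than silently dropped.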
Data Sources STK Data Issues • RMF consistency varies • Most metrics close across multiple samples • I/O Rates • Connect time • Allocated time • Total mount time • Significant variation with mount counts and thus average mount time • Probably a code level issue
Key Performance Metrics • Application • Service levels • Hardware • Data transfer • Device usage • Virtual mount time • Mount miss (recall) rate • Storage usage • Tape volume age (in DASD buffer or cache)
Key Performance Metrics Application Performance • Should have reasonable and measurable performance objectives • Objectives will vary with • Installation • Application • VT • Time & day
Key Performance Metrics Data Transfer • Good for reporting work performed • Mounts and I/O rates also options • Saturation varies with environment • SMF 14, 15, and 21 provide read and write • RMF 73 provides read and write for FICON. • VTS provides data transfer • Host read (SMF94VBR) and write (SMF94VBW) are good throughput indicators • Read (SMF94VTR) and write (SMF94VTW) between cache and real tapes
Key Performance Metrics Data Transfer • VTSS provides many data transfer metrics but assumptions are required • RTD bytes read (SMF20BTR) and written (SMF20BTW) • RTD connect (SMF20DCT) and utilization (SMF20DUT) • Host and RTD channel interface busy (SMF11CUB) • VTV dismount has VTV size (SMF14VSZ) • Probably best throughput indicator
Key Performance Metrics Device Usage • RMF 74-1 provides virtual by device or group. • Allows separation of mount and allocation components (e.g. connect, pend, wait). • VTS provides minimum, maximum, average, and configured for both virtual and physical devices. All values are integers. • VTS average (SMF94VTV) and maximum (SMF94VTX) physical are good for trending • While not normally a problem, virtual device use should not be ignored.
Key Performance Metrics Device Usage • Difficult to calculate VTSS minimum, maximum, average, and available for either virtual or physical devices • Mount, dismount, and vary subtypes must be grouped and summarized • RTDs can be statically shared (operator command) by multiple VTSSs and OS/390 • VTSS RTD connect and utilization, which were previously discussed, are useful
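Once mount and dismount events have been grouped into in-use intervals, minimum, maximum, and average devices in use can be derived with a simple sweep over the start and end times. This is only a sketch: the (mount, dismount) tuples in seconds are a hypothetical input format, and vary events and shared RTDs are not modeled.

```python
def concurrency_stats(intervals, start, end):
    """Time-weighted min, max, and average devices in use over [start, end)."""
    changes = []
    for mount, dismount in intervals:
        mount, dismount = max(mount, start), min(dismount, end)
        if mount < dismount:
            changes.append((mount, 1))
            changes.append((dismount, -1))
    changes.sort()
    in_use, last, busy = 0, start, 0.0
    levels = []  # usage level of every segment with nonzero duration
    for t, delta in changes:
        if t > last:
            levels.append(in_use)
            busy += in_use * (t - last)
            last = t
        in_use += delta
    if end > last:
        levels.append(in_use)
        busy += in_use * (end - last)
    return min(levels), max(levels), busy / (end - start)


# Hypothetical in-use intervals for three mounts within a 30-minute window.
low, high, average = concurrency_stats([(0, 600), (300, 900), (1200, 1500)], 0, 1800)
print(low, high, round(average, 2))
```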
Key Performance Metrics Virtual Mount Time • RMF 74-1 provides average mount time by device or group • VTS provides minimum, maximum, and average virtual and physical mount times • Average virtual (SMF94VRA) is good for trending • High maximum virtual (SMF94VRX) often correlates with problems
Key Performance Metrics Virtual Mount Time • VTSS mount time calculated from mount event records (SMF13MET - SMF13MST) • Min, max, average and counts require grouping and summarization • Average might be distorted by extra mounts • VTSS provides several potentially interesting identifiers • Job, step, DSN, VTV management class • Scratch, existing
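The grouping and summarization of SMF13MET - SMF13MST can be sketched as follows. The assumptions are that both time stamps have already been converted to seconds, that each event carries a hypothetical `vtss` key naming its subsystem, and that a one-hour ceiling is a reasonable cutoff for discarding the very large or negative values mentioned earlier; none of these come from the record documentation.

```python
from collections import defaultdict


def mount_time_stats(mount_events, max_plausible=3600):
    """Summarize mount time (SMF13MET - SMF13MST) per VTSS."""
    groups = defaultdict(list)
    for ev in mount_events:
        elapsed = ev["SMF13MET"] - ev["SMF13MST"]
        if 0 <= elapsed <= max_plausible:  # drop negative or implausible values
            groups[ev["vtss"]].append(elapsed)
    return {
        vtss: {
            "count": len(times),
            "min": min(times),
            "max": max(times),
            "avg": sum(times) / len(times),
        }
        for vtss, times in groups.items()
    }


# Hypothetical events; the third has a negative calculated time and is dropped.
mount_events = [
    {"vtss": "VTSS1", "SMF13MST": 100, "SMF13MET": 110},
    {"vtss": "VTSS1", "SMF13MST": 200, "SMF13MET": 260},
    {"vtss": "VTSS1", "SMF13MST": 300, "SMF13MET": 290},
    {"vtss": "VTSS2", "SMF13MST": 0, "SMF13MET": 20},
]
stats = mount_time_stats(mount_events)
print(stats["VTSS1"])
```

The same grouping could be keyed on job, data set name, or management class instead of VTSS.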
Key Performance Metrics Virtual Mount Time • Mount times tend to increase with: • Reduced tape volume age on DASD (SMF94VCA) • Increased mount miss rate (SMF94VMS) • Increased average real drives mounted (SMF94VTV) • Published targets: • Daily average less than 30 seconds • Hourly average less than 300 seconds • Maximum less than 900 seconds
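The published targets lend themselves to a simple daily check. The sketch below assumes one tuple per hour of (mount count, average mount seconds, maximum mount seconds), which is a hypothetical input shape, and weights the daily average by mount count; whether the published daily target is meant to be mount-weighted is also an assumption.

```python
def check_mount_targets(hours):
    """Compare one day of hourly mount times against the published targets."""
    total_mounts = sum(mounts for mounts, _, _ in hours)
    weighted = sum(mounts * avg for mounts, avg, _ in hours)
    daily_avg = weighted / total_mounts if total_mounts else 0.0
    return {
        "daily_avg_ok": daily_avg < 30,                        # daily average < 30 s
        "hourly_avg_ok": all(avg < 300 for _, avg, _ in hours),  # hourly average < 300 s
        "max_ok": all(mx < 900 for _, _, mx in hours),           # maximum < 900 s
    }


# Hypothetical day with two active hours; the second has one very slow mount.
hours = [(120, 12.0, 95.0), (80, 45.0, 1000.0)]
result = check_mount_targets(hours)
print(result)
```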
Key Performance Metrics Mount Miss Rate • VTS provides mounts by type • Fast ready (SMF94VFR) • Specific mount hits (SMF94VMH) • Specific mount misses (SMF94VMS) • Related to average time on DASD (SMF94VCA) and application cycles • Published target of miss percentage less than 20% • Miss % = 100 * Misses / Total Virtual Mounts • Miss % = 100 * VMS / (VFR + VMH + VMS)
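The miss percentage formula above translates directly into code. The function and argument names are illustrative; the three inputs correspond to SMF94VFR, SMF94VMH, and SMF94VMS for one interval, and the sample counts are made up.

```python
def miss_percent(vfr, vmh, vms):
    """Miss % = 100 * VMS / (VFR + VMH + VMS); 0.0 when there were no mounts."""
    total = vfr + vmh + vms
    return 100.0 * vms / total if total else 0.0


# Hypothetical interval: 600 fast ready, 300 hits, 100 misses.
pct = miss_percent(600, 300, 100)
print(pct, pct < 20)
```

Guarding against a zero denominator matters for quiet intervals with no virtual mounts.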
Key Performance Metrics Mount Miss Rate • VTSS mount hit indicator added in recent update (SMF13RCI) • Scratch mounts indicated (SMF13VMT) • If older data, could assume a hit if calculated mount time less than a threshold
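The fallback for older data can be sketched as a small classifier. It assumes the caller has already decoded the hit indicator (from SMF13RCI) and the scratch indication (from SMF13VMT) into plain Python values, since the field encodings are not covered here; the 10-second threshold is an arbitrary placeholder that would need tuning per installation.

```python
def classify_mount(mount_seconds, hit=None, scratch=False, hit_threshold=10.0):
    """Classify a VTV mount as 'scratch', 'hit', or 'miss'."""
    if scratch:
        return "scratch"
    if hit is not None:
        # Newer data: trust the recorded indicator.
        return "hit" if hit else "miss"
    # Older data: infer a hit from a short calculated mount time.
    return "hit" if mount_seconds < hit_threshold else "miss"


print(classify_mount(3.0))             # older data, fast mount
print(classify_mount(95.0))            # older data, slow mount
print(classify_mount(95.0, hit=True))  # indicator overrides the threshold
```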
Key Performance Metrics Storage Usage • VTS provides estimate of data on stacked volumes (SMF94VBA) and available capacity on empty cartridges (SMF94VEC) • Reconciliation will probably free space • High recalls (SMF94VMS) and low tape volume age on DASD (SMF94VCA) indicate need for more DASD • More DASD can relieve saturated physical tape drives
Key Performance Metrics Storage Usage • VTSS provides cache and DASD metrics • Configuration info • DASD capacity (SMF10TCP), cache & NVS size, channels • Used (SMF10FCP) and contiguous (SMF10CFP) free DASD space • No easy way to report VTSS MVC metrics • Must produce & process MVC status records
Key Performance Metrics Tape Volume Age • VTSS delete reference age can sometimes be related to application cycles • SMF delete time stamp – SMF15LTR • Consider selective summarization • Separate migrate immediate delete • Subtype 18 with SMF18MTI set, closely followed by related Subtype 15 (same virtual VOLSER) • Occasionally very high values
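Separating migrate immediate deletes can be sketched as a pass over time-ordered events that remembers the most recent subtype 18 record with SMF18MTI set for each virtual VOLSER. The `subtype`, `volser`, and `time` keys are hypothetical names for parsed fields, and the 300-second window is only an assumed reading of "closely followed".

```python
def split_deletes(events, window=300):
    """Split subtype 15 deletes into normal and migrate-immediate deletes."""
    last_immediate = {}  # VOLSER -> time of last migrate with SMF18MTI set
    normal, immediate = [], []
    for ev in sorted(events, key=lambda e: e["time"]):
        if ev["subtype"] == 18 and ev.get("SMF18MTI"):
            last_immediate[ev["volser"]] = ev["time"]
        elif ev["subtype"] == 15:
            migrated_at = last_immediate.pop(ev["volser"], None)
            if migrated_at is not None and ev["time"] - migrated_at <= window:
                immediate.append(ev)
            else:
                normal.append(ev)
    return normal, immediate


# Hypothetical events: V00001 is deleted right after an immediate migrate.
events = [
    {"subtype": 18, "volser": "V00001", "time": 1000, "SMF18MTI": True},
    {"subtype": 15, "volser": "V00001", "time": 1030},
    {"subtype": 15, "volser": "V00002", "time": 5000},
]
normal, immediate = split_deletes(events)
print(len(normal), len(immediate))
```

Only the `normal` deletes would then feed the reference age summary, so that immediate deletes do not pull the average down.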