Smart Storage and Linux: An EMC Perspective
Ric Wheeler, ric@emc.com
Why Smart Storage?
• Central control of critical data
  • One central resource to fail over in disaster planning
  • Banks, trading floors, and airlines want zero downtime
• Smart storage is shared by all hosts & OSes
  • Amortize the costs of high availability and disaster planning over all of your hosts
  • Use different OSes for different jobs (UNIX for the web, IBM mainframes for data processing)
  • Zero-time “transfer” from host to host when both are connected
  • Enables cluster file systems
Data Center Storage Systems
• Change the way you think of storage
  • Shared connectivity model
  • “Magic” disks
  • Scales to new capacity
  • Storage that runs for years at a time
• Symmetrix case study
  • Symmetrix 8000 architecture
  • Symmetrix applications
  • Data center class operating systems
Traditional Model of Connectivity
• Direct connect
  • Disk attached directly to the host
  • Private: the OS controls access and provides security
  • Storage I/O traffic only
  • A separate system is used to support network I/O (networking, web browsing, NFS, etc.)
Shared Models of Connectivity
• VMS Cluster
  • Shared disk & partitions
  • Same OS on each node
  • Scales to dozens of nodes
• IBM Mainframes
  • Shared disk & partitions
  • Same OS on each node
  • Handful of nodes
• Network Disks
  • Shared disk / private partition
  • Same OS
  • Raw/block access via the network
  • Handful of nodes
New Models of Connectivity
• Every host in a data center could be connected to the same storage system (diagram: FreeBSD, VMS, Linux, Solaris, IRIX, DGUX, NT, HPUX, and MVS hosts all attached to one shared storage system)
• Heterogeneous OS & data formats (CKD & FBA)
• Management challenge: no central authority to provide access control
Magic Disks
• Instant copy
  • Devices, files, or databases
• Remote data mirroring
  • Metropolitan area
  • 100’s of kilometers
• 1000’s of virtual disks
  • Dynamic load balancing
• Behind-the-scenes backup
  • No host involved
Scalable Storage Systems
• Current systems support
  • 10’s of terabytes
  • Dozens of SCSI, Fibre Channel, and ESCON channels per host
  • High availability (years of run time)
  • Online code upgrades
• Potentially 100’s of hosts connected to the same device
• Support for chaining storage boxes together locally or remotely
Longevity
• Data should be forever
  • Storage needs to overcome network failures, power failures, blizzards, asteroid strikes, …
  • Some boxes have run for over 5 years without a reboot or halt of operations
• Storage features
  • No single point of failure inside the box
  • At least 2 connections to a host
  • Online code upgrades and patches
  • Call home on error; ability to fix field problems without disruption
  • Remote data mirroring for real disasters
Symmetrix Architecture
• 32 PowerPC 750-based “directors”
• Up to 32 GB of central “cache” for user data
• Support for SCSI, Fibre Channel, ESCON, …
• 384 drives (over 28 TB with 73 GB units)
Prefetch is Key
• A read hit runs at RAM speed; a read miss runs at spindle speed
• What helps cached storage array performance?
  • Contiguous allocation of files (extent-based file systems) preserves the logical-to-physical mapping
  • Hints from the host could help prediction
• What might hurt performance?
  • Clustering small, unrelated writes into contiguous blocks (foils prefetch on a later read of the data)
  • Truly random read I/Os
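Below is a minimal Python sketch of the prefetch idea. Everything in it (the Cache class, PREFETCH_DEPTH, SEQ_THRESHOLD, the backing_read callable) is a hypothetical illustration of sequential-prefetch detection, not the Symmetrix microcode.

```python
# Minimal sketch of sequential-prefetch detection in a cached array.
# All names are illustrative, not the Symmetrix implementation.

PREFETCH_DEPTH = 8   # blocks to stage ahead once a sequence is seen
SEQ_THRESHOLD = 2    # consecutive blocks before prefetch kicks in


class Cache:
    def __init__(self, backing_read):
        self.backing_read = backing_read   # callable: block number -> data
        self.blocks = {}                   # cached blocks (RAM-speed hits)
        self.last_block = None
        self.run_length = 0

    def read(self, block):
        if block in self.blocks:           # read hit: RAM speed
            data = self.blocks[block]
        else:                              # read miss: spindle speed
            data = self.backing_read(block)
            self.blocks[block] = data

        # Track sequentiality; contiguous allocation on the host keeps
        # logical order close to physical order, so this heuristic works.
        if self.last_block is not None and block == self.last_block + 1:
            self.run_length += 1
        else:
            self.run_length = 0
        self.last_block = block

        # Once a run is detected, stage the following blocks into cache.
        if self.run_length >= SEQ_THRESHOLD:
            for b in range(block + 1, block + 1 + PREFETCH_DEPTH):
                self.blocks.setdefault(b, self.backing_read(b))
        return data


# Usage: sequential reads warm the cache; truly random reads do not.
cache = Cache(backing_read=lambda b: f"block-{b}")
for b in range(16):
    cache.read(b)   # after the first couple of reads, later ones are hits
```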
Symmetrix Applications
• Instant copy: TimeFinder
• Remote data copy: SRDF (Symmetrix Remote Data Facility)
• Serverless backup and restore: Fastrax
• Mainframe & UNIX data sharing: IFS
Business Continuance Problem: the “Race to Sunrise”
• “Normal” daily operations cycle: the online day ends, backup / DSS runs from roughly 2 am to 6 am, then the online day resumes
• That window means about 4 hours of data inaccessibility
TimeFinder
• Creation and control of a copy of any active application volume
• Capability to allow the new copy to be used by another application or system
• Continuous availability of production data during backups, decision support, batch queries, DW loading, Year 2000 testing, application testing, etc.
• Ability to create multiple copies of a single application volume
• Non-disruptive re-synchronization when the second application is complete
(Diagram: production application volumes paired with business continuance volumes used for sales backups, decision support, data warehousing, and Euro conversion. A BCV is a copy of real production data.)
Business Continuance Volumes
• A Business Continuance Volume (BCV) is created and controlled at the logical volume level
• Physical drive sizes can differ, but the logical sizes must be identical
• Several ACTIVE copies of the data can exist at once per Symmetrix
Using TimeFinder
• Establish the BCV (paired with mirrors M1/M2 of the production volume)
• Stop transactions to clear buffers
• Split the BCV
• Restart transactions
• Execute against the BCVs
• Re-establish the BCV
Re-Establishing a BCV Pair
• BCV pair “PROD” and “BCV” have been split
• Some tracks on “PROD” are updated after the split
• Some tracks on “BCV” are updated after the split
• The Symmetrix keeps a table of these “invalid” tracks after the split
• At re-establish of the BCV pair, the “invalid” tracks are written from “PROD” to “BCV”
• Synch complete
(Diagram: updated tracks marked on both volumes of the split pair, then copied PROD → BCV on re-establish.)
Restore a BCV Pair
• BCV pair “PROD” and “BCV” have been split
• Some tracks on “PROD” are updated after the split
• Some tracks on “BCV” are updated after the split
• The Symmetrix keeps a table of these “invalid” tracks after the split
• At restore of the BCV pair, the “invalid” tracks are written from “BCV” to “PROD”
• Synch complete
(Diagram: updated tracks marked on both volumes of the split pair, then copied BCV → PROD on restore. The bookkeeping is sketched below.)
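Both slides above hinge on the table of “invalid” tracks kept after a split. Below is a minimal Python sketch of that bookkeeping, with hypothetical names (Volume, BCVPair); the real Symmetrix does this in microcode at track granularity, but the logic is the same: both sides mark tracks changed after the split, re-establish copies them PROD → BCV, and restore copies them BCV → PROD.

```python
# Sketch of BCV split / re-establish / restore bookkeeping (illustrative only).
# Only the post-split phase is modeled; mirroring while established is omitted.

class Volume:
    def __init__(self, name, tracks):
        self.name = name
        self.data = {t: f"{name}-initial" for t in range(tracks)}


class BCVPair:
    def __init__(self, prod, bcv):
        self.prod, self.bcv = prod, bcv
        self.prod_invalid = set()   # PROD tracks updated after the split
        self.bcv_invalid = set()    # BCV tracks updated after the split
        self.split = False

    def do_split(self):
        self.split = True
        self.prod_invalid.clear()
        self.bcv_invalid.clear()

    def write(self, volume, track, value):
        volume.data[track] = value
        if self.split:                        # remember "invalid" tracks
            if volume is self.prod:
                self.prod_invalid.add(track)
            else:
                self.bcv_invalid.add(track)

    def reestablish(self):
        # Copy only changed tracks PROD -> BCV (incremental re-establish).
        for t in self.prod_invalid | self.bcv_invalid:
            self.bcv.data[t] = self.prod.data[t]
        self.prod_invalid.clear(); self.bcv_invalid.clear()
        self.split = False

    def restore(self):
        # Copy only changed tracks BCV -> PROD (incremental restore).
        for t in self.prod_invalid | self.bcv_invalid:
            self.prod.data[t] = self.bcv.data[t]
        self.prod_invalid.clear(); self.bcv_invalid.clear()
        self.split = False


pair = BCVPair(Volume("PROD", 10), Volume("BCV", 10))
pair.do_split()
pair.write(pair.prod, 3, "new-prod")     # updated on PROD after the split
pair.write(pair.bcv, 7, "scratch-bcv")   # updated on BCV after the split
pair.reestablish()                       # only tracks 3 and 7 are copied
```

Either resynchronization is incremental: only the tracks in the invalid table move, which is why a re-establish or restore after a short split is fast.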
Make as Many Copies as Needed
• Establish BCV 1, then split BCV 1
• Establish BCV 2, then split BCV 2
• Establish BCV 3, then split BCV 3
(Diagram: production mirrors M1/M2 paired in turn with BCV 1, BCV 2, and BCV 3, with splits taken at 4 PM, 5 PM, and 6 PM.)
The Purpose of SRDF
• Local data copies are not enough
• Maximalist: provide a remote copy of the data that will be as usable after a disaster as the primary copy would have been
• Minimalist: provide a means for generating periodic physical backups of the data
Synchronous Data Mirroring
1. A write is received from the host into the cache of the source
2. The I/O is transmitted to the cache of the target
3. An ACK is sent by the target back to the cache of the source
4. Ending status is presented to the host
• Both Symmetrix systems destage the writes to disk afterwards
• Useful for disaster recovery
Semi-Synchronous Mirroring
1. A write is received from the host/server into the cache of the source
2. Ending status is presented to the host/server
3. The I/O is transmitted to the cache of the target
4. An ACK is sent by the target back to the cache of the source
• Each Symmetrix system destages the writes to disk afterwards
• Useful for adaptive copy
(The ordering difference between the two modes is sketched below.)
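The only difference between the two modes is where the host’s ending status sits relative to the remote acknowledgement. A minimal sketch, with hypothetical stand-ins (source_cache, target_cache, send_to_target) for the caches and the SRDF link:

```python
# Sketch of synchronous vs. semi-synchronous remote mirroring ordering.
# The caches and link below are illustrative stand-ins, not SRDF internals.

import time

source_cache, target_cache = {}, {}

def send_to_target(block, data, latency=0.005):
    time.sleep(latency)               # simulated link round trip
    target_cache[block] = data        # write lands in the target cache
    return "ACK"

def synchronous_write(block, data):
    source_cache[block] = data        # 1. write received into source cache
    ack = send_to_target(block, data) # 2. transmit to target, 3. wait for ACK
    assert ack == "ACK"
    return "ENDING STATUS"            # 4. host sees status only after the ACK

def semi_synchronous_write(block, data):
    source_cache[block] = data        # 1. write received into source cache
    status = "ENDING STATUS"          # 2. host sees status immediately
    ack = send_to_target(block, data) # 3./4. copy and ACK happen afterwards
    assert ack == "ACK"
    return status

synchronous_write(0, "payroll record")      # host waits out the link latency
semi_synchronous_write(1, "batch record")   # host latency hides the link
```

In synchronous mode every host write pays the link round trip, which bounds the usable distance; semi-synchronous mode hides that latency from the host at the cost of the target briefly lagging behind the source.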
Backup / Restore of Big Data
• Exploding amounts of data cause backups to run too long
  • How long does it take you to back up 1 TB of data?
• Shrinking backup windows and constant pressure for continuous application up-time
• Avoid using the production environment for backup
  • No server CPU or I/O channels
  • No involvement of the regular network
• Performance must scale to match customers’ growth
• Heterogeneous host support
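To make the 1 TB question concrete, a back-of-the-envelope calculation; the throughput figures below are assumptions for illustration, not measured numbers.

```python
# Rough backup-window arithmetic for 1 TB of data (assumed throughputs).
TB = 10 ** 12  # bytes

for label, mb_per_sec in [("single tape drive stream", 30),
                          ("several streams over the LAN", 100),
                          ("direct-to-tape, many parallel streams", 400)]:
    hours = TB / (mb_per_sec * 10 ** 6) / 3600
    print(f"{label:40s} ~{hours:4.1f} hours")
```

At a few tens of MB/s the job simply does not fit a 4-hour window, which is the motivation for taking the backup data path off the production server and network entirely.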
Fastrax Overview
(Diagram: Fastrax-enabled backup/restore applications use SYMAPI to drive a Fastrax Data Engine that connects Symmetrix volumes (STD1/STD2, BCV1/BCV2, and R1/R2 pairs replicated over SRDF between Location 1 and Location 2) directly to tape libraries over Fibre Channel point-to-point links and SCSI, with UNIX and Linux hosts at both locations.)
Host to Tape Data Flow
(Diagram: data flows from the host’s Symmetrix volumes through Fastrax to the tape library.)
Fastrax Performance
• Performance scales with the number of data movers in the Fastrax box and the number of tape devices
• Restore runs as fast as backup
• No performance impact on the host during restore or backup
(Diagram: Symmetrix RAF directors linked over SRDF to the data movers (DM) in the Fastrax engine.)
InfoMover File System (IFS)
• Transparent availability of MVS data to UNIX hosts
  • MVS datasets available as native UNIX files
• Sharing of a single copy of MVS datasets
• Uses MVS security and locking
  • Standard MVS access methods for locking + security
IFS Implementation
• Mainframe: IBM MVS / OS390, attached via ESCON or parallel channel
• Open systems: IBM AIX, HP HP-UX, Sun Solaris, attached via FWD SCSI, Ultra SCSI, or Fibre Channel
• Both connect to a Symmetrix with ESP
• Minimal network overhead: no data transfer over the network!
Symmetrix API Overview
• SYMAPI Core Library
  • Used by “thin” and full clients
• SYMAPI Mapping Library
• SYMCLI Command Line Interface
Symmetrix APIs
• SYMAPI is the set of high-level functions
  • Used by EMC’s ISV partners (Oracle, Veritas, etc.) and by EMC applications
• SYMCLI is the “Command Line Interface”, which invokes SYMAPI
  • Used by end customers and some ISV applications
Basic Architecture
• User access to the Solutions Enabler is via SymCli or a storage management application
(Diagram: other storage management applications and the Symmetrix Command Line Interpreter (SymCli) both sit on top of the Symmetrix Application Programming Interface (SymAPI).)
Client-Server Architecture
• The SymAPI server runs on the host computer connected to the Symmetrix storage controller
• The SymAPI client runs on one or more host computers
(Diagram: a client host runs storage management applications on the SymAPI client and library; a thin client host runs storage management applications on the thin SymAPI client; both talk to the SymAPI server on the server host. A generic illustration of the split follows below.)
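As a generic illustration of the client-server split (not the real SymAPI wire protocol or function names), the sketch below forwards a request from a thin client, which has no local library, to a server process on the host that is attached to the storage.

```python
# Illustrative thin-client / server split; all names and the JSON protocol
# are hypothetical, not SymAPI.
import json
import socket
import socketserver
import threading


class StorageApiHandler(socketserver.StreamRequestHandler):
    # The server host is the one physically connected to the storage,
    # so it answers requests on behalf of thin clients.
    def handle(self):
        request = json.loads(self.rfile.readline())
        if request.get("op") == "list_devices":
            reply = {"devices": ["0001", "0002", "0003"]}   # placeholder data
        else:
            reply = {"error": "unknown op"}
        self.wfile.write((json.dumps(reply) + "\n").encode())


def thin_client_call(host, port, op):
    # The thin client holds no mapping library; it forwards every call.
    with socket.create_connection((host, port)) as s:
        s.sendall((json.dumps({"op": op}) + "\n").encode())
        return json.loads(s.makefile().readline())


server = socketserver.TCPServer(("127.0.0.1", 0), StorageApiHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
print(thin_client_call("127.0.0.1", server.server_address[1], "list_devices"))
server.shutdown()
```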
SymmAPI Components
• Initialization
• InfoSharing
• Gatekeepers
• Calypso Controls
• Discover and Update
• Optimizer Controls
• Configuration
• DeltaMark Functions
• Device Groups
• SRDF Functions
• Statistics
• TimeFinder Functions
• Mapping Functions
• Base Controls
Data Object Resolve
• Mapping chain: RDBMS data file → file system → logical volume → host physical device → Symmetrix device extents
File System Mapping
• File system mapping information includes:
  • File system attributes and host physical location
  • Directory attributes and contents
  • File attributes and host physical extent information, including inode information and fragment size
(Diagram: inodes, directories, and file extents.)
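A minimal sketch of the resolve chain from the two slides above, using hypothetical extent tables (file_extents, lvol_map, host_to_symm) in place of the inode, logical volume, and device metadata the mapping library actually reads.

```python
# Sketch of resolving a file down to Symmetrix device extents (illustrative).
# Each table is a stand-in for metadata a mapping library would gather.

# File system layer: file -> extents on a logical volume (offset, length in KB).
file_extents = {
    "/oracle/data01.dbf": [("lvol1", 0, 4096), ("lvol1", 8192, 4096)],
}

# Logical volume layer: logical volume -> (host physical device, base offset KB).
lvol_map = {"lvol1": ("c1t0d0", 1024)}

# Host device layer: host physical device -> Symmetrix device.
host_to_symm = {"c1t0d0": "Symm 0187 device 01A"}


def resolve(path):
    """Return (symm_device, start_kb, length_kb) extents for a file."""
    result = []
    for lvol, offset, length in file_extents[path]:
        host_dev, lvol_base = lvol_map[lvol]
        symm_dev = host_to_symm[host_dev]
        result.append((symm_dev, lvol_base + offset, length))
    return result


print(resolve("/oracle/data01.dbf"))
# [('Symm 0187 device 01A', 1024, 4096), ('Symm 0187 device 01A', 9216, 4096)]
```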
Solaris & Sun Starfire
• Hardware
  • Up to 62 I/O channels
  • 64 CPUs
  • 64 GB of RAM
  • 60 TB of disk
  • Supports multiple domains
• Starfire & Symmetrix
  • ~20% use more than 32 I/O channels
  • Most use 4 to 8 I/O channels per domain
  • Oracle instances usually above 1 TB
HPUX & HP 9000 Superdome
• Hardware
  • 192 I/O channels
  • 64 CPU cards
  • 128 GB RAM
  • 1 PB of storage
• Superdome and Symmetrix
  • 16 LUNs per target
  • Customers want us to support more than 4000 logical volumes!
Solaris and Fujitsu GP7000F M1000
• Hardware
  • 6-48 I/O slots
  • 4-32 CPUs
  • Crossbar switch
  • 32 GB RAM
  • 64-bit PCI bus
  • Up to 70 TB of storage
Solaris and Fujitsu GP7000F M2000
• Hardware
  • 12-192 I/O slots
  • 8-128 CPUs
  • Crossbar switch
  • 256 GB RAM
  • 64-bit PCI bus
  • Up to 70 TB of storage
AIX 5L & IBM RS/6000 SP
• Hardware
  • Scales to 512 nodes (over 8000 CPUs)
  • 32 TB RAM
  • 473 TB internal storage capacity
  • High-speed interconnect: 1 GB/sec per channel with SP Switch2
  • Partitioned workloads
  • Thousands of I/O channels
IBM RS/6000 pSeries 680 & AIX 5L
• Hardware
  • 24 CPUs (64-bit RS64 IV, 600 MHz)
  • 96 GB RAM
  • 873.3 GB internal storage capacity
  • 53 PCI slots (33 32-bit, 20 64-bit)
Really Big Data
• IBM (Sequent) NUMA
  • 16 NUMA “Quads”
  • 4-way / 450 MHz CPUs
  • 2 GB memory
  • 4 x 100 MB/s FC-SW
  • Oracle 8.1.5 with up to a 42 TB (mirrored) DB
• EMC Symmetrix
  • 20 small Symm 4’s
  • 2 medium Symm 4’s
Windows 2000 on IA32
• Usually lots of small (1U or 2U) boxes share a Symmetrix
• 4 to 8 I/O channels per box
• Qualified up to 1 TB per meta volume (although usually deployed with ½ TB or less)
• Management is a challenge
• Will Windows 2000 on IA64 handle big data better?