
SAN Disk Metrics

Presentation Transcript


  1. SAN Disk Metrics Measured on Sun Ultra & HP PA-RISC Servers, StorageWorks MAs & EVAs, using iozone V3.152

  2. Current Situation
  • UNIX External Storage has migrated to SAN
  • Oracle Data File Sizes: 1 to 36 GB (R&D)
  • Oracle Servers are predominantly Sun “Entry Level”
  • HPQ StorageWorks: 24 MAs, 2 EVAs
  • 2Q03 SAN LUN restructuring using RAID 5 only
  • Oracle DBAs continue to request RAID 1+0
  • A roadmap for the future is needed

  3. Purpose Of Filesystem Benchmarks
  • Find Best Performance
    • Storage, Server, HW options, OS, and Filesystem
  • Find Best Price/Performance
    • Restrain Costs
  • Replace “Opinions” with Factual Analysis
  • Continue Abbott UNIX Benchmarks
    • Filesystems, Disks, and SAN
    • Benchmarking began in 1999

  4. Goals
  • Measure Current Capabilities
  • Find Bottlenecks
  • Find Best Price/Performance
  • Set Cost Expectations For Customers
  • Provide a Menu of Configurations
  • Find Simplest Configuration
  • Satisfy Oracle DBA Expectations
  • Harmonize Abbott Oracle Filesystem Configuration
  • Create a Road Map for Data Storage

  5. Preconceptions
  • UNIX SysAdmins
    • RAID 1+0 does not vastly outperform RAID 5
    • Distribute Busy Filesystems among LUNs
    • At least 3+ LUNs should be used for Oracle
  • Oracle DBAs
    • RAID 1+0 is Required for Production
    • I Paid For It, So I Should Get It
    • Filesystem Expansion On Demand

  6. Oracle Server Resource Needs in 3D (diagram: CPU, Memory, and I/O axes; Web serving: small, integrated system; Database/CRM/ERP: storage)

  7. Sun Servers for Oracle Databases
  • Sun UltraSPARC UPA Bus Entry Level Servers
    • Ultra 2, 2x300 MHz Ultra SPARC-II, Sbus, 2 GB
    • 220R, 2x450 MHz Ultra SPARC-II, PCI, 2 GB
    • 420R, 4x450 MHz Ultra SPARC-II, PCI, 4 GB
  • Enterprise Class Sun UPA Bus Servers
    • E3500, 4x400 MHz Ultra SPARC-II, UPA, Sbus, 8 GB
  • Sun UltraSPARC Fireplane (Safari) Entry Level Servers
    • 280R, 2x750 MHz Ultra SPARC-III, Fireplane, PCI, 8 GB
    • 480R, 4x900 MHz Ultra SPARC-III, Fireplane, PCI, 32 GB
    • V880, 8x900 MHz Ultra SPARC-III, Fireplane, PCI, 64 GB
  • Other UNIX
    • HP L1000, 2x450 PA-RISC, Astro, PCI, 1024 MB

  8. Oracle UNIX Filesystems
  • Cooperative Standard between UNIX and R&D DBAs
  • 8 Filesystems in 3 LUNs
    • /exp/array.1/oracle/<instance> (binaries & config)
    • /exp/array.2-6/oradb/<instance> (data, index, temp, etc.)
    • /exp/array.7/oraarch/<instance> (archive logs)
    • /exp/array.8/oraback/<instance> (export, backup (RMAN))
  • Basic LUN Usage
    • Lun1: array.1-3
    • Lun2: array.4-6
    • Lun3: array.7-8 (Initially on “far” Storage Node)
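
To make the standard above concrete, the mapping can be sketched as data. This is illustrative only; the instance name is a placeholder, and the paths and LUN groupings come straight from the slide.

    # Illustrative mapping of the 8-filesystem / 3-LUN Oracle standard above.
    # "<instance>" stands for the Oracle SID; nothing beyond the slide is implied.
    ORACLE_FILESYSTEMS = {
        "/exp/array.1/oracle":  ("Lun1", "binaries & config"),
        "/exp/array.2/oradb":   ("Lun1", "data, index, temp"),
        "/exp/array.3/oradb":   ("Lun1", "data, index, temp"),
        "/exp/array.4/oradb":   ("Lun2", "data, index, temp"),
        "/exp/array.5/oradb":   ("Lun2", "data, index, temp"),
        "/exp/array.6/oradb":   ("Lun2", "data, index, temp"),
        "/exp/array.7/oraarch": ("Lun3", "archive logs"),
        "/exp/array.8/oraback": ("Lun3", "export, backup (RMAN)"),
    }

    def mounts_for_instance(instance):
        """Return the eight mount points for one Oracle instance."""
        return [f"{base}/{instance}" for base in ORACLE_FILESYSTEMS]

    # Example: mounts_for_instance("dev01") -> ["/exp/array.1/oracle/dev01", ...]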

  9. StorageWorks SAN Storage Nodes
  • StorageWorks: DEC -> Compaq -> HPQ
    • A traditional DEC Shop
    • Initial SAN equipment vendor
    • Brocade Switches resold under StorageWorks label
  • Only vendor with complete UNIX coverage (2000)
    • Sun, HP, SGI, Tru64 UNIX, Linux
    • EMC, Hitachi, etc. could not match UNIX coverage
  • Enterprise Modular Array (MA) – “Stone Soup” SAN
    • Buy the controller, then 2 to 6 disk shelves, then disks
    • 2-3 disk shelf configs have led to problem RAIDsets, which have finally been reconfigured in 2Q2003
  • Enterprise Virtual Array (EVA) – Next Generation

  10. MA 8000

  11. EVA

  12. 2Q03 LUN Restructuring – 2nd Gen SAN
  • “Far” LUNs pulled back to “near” Data Center
  • 6 disk, 6 shelf MA RAID 5 RAIDsets
    • LUNs are partitioned from RAIDsets
    • LUNs are sized as multiples of disk size (worked example below)
  • Multiple LUNs from different RAIDsets
    • Busy filesystems are distributed among LUNs
  • Server and Storage Node SAN Fabric Connections mated to common switch
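
A quick worked example of the sizing rule, assuming the 36 GB MA disks mentioned later in the deck (a sketch, not a measured configuration):

    # A 6-disk, 6-shelf RAID 5 RAIDset of 36 GB disks gives up one disk's
    # worth of capacity to parity; the remainder is carved into LUNs sized
    # as multiples of the disk size.
    DISK_GB = 36
    DISKS = 6

    usable_gb = (DISKS - 1) * DISK_GB   # 5 * 36 = 180 GB usable in RAID 5
    lun_sizes = [36, 72, 72]            # one possible partitioning into 3 LUNs
    assert sum(lun_sizes) == usable_gb  # every LUN is a multiple of the disk size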

  13. Results – Generalizations
  • Read Performance – Server Performance Baseline
    • Basic Measure of System Bus, Memory/Cache, & HBA
    • Good evaluation of dissimilar server I/O potential
  • Random Write – Largest Variations in Performance
    • Filesystem & Storage Node Selection are the Dominant Variables
  • Memory & Cache – Important
    • Processor Cache, System I/O Buffers, Virtual Memory
    • All boost performance for different data stream sizes
  • More Hardware, OS, & Filesystem selections to be evaluated

  14. IOZONE Benchmark Utility
  • File Operations
    • Sequential Write & Re-write
    • Sequential Read & Re-read
    • Random Read & Random Write
    • Others are available: record rewrite, read backwards, read strided, fread/fwrite, pread/pwrite, aio_read/aio_write
  • File & Record Sizes
    • Ranges or individual sizes may be specified
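
For context, this is roughly how an iozone sweep covering those operations might be launched. The target path, size range, and record-size cap below are illustrative, not the exact parameters used in these tests.

    import subprocess

    # iozone test selection: -i 0 = write/rewrite, -i 1 = read/reread,
    # -i 2 = random read/write.  -n/-g bound the file-size sweep in auto mode,
    # -q caps the record size, -b writes spreadsheet-style output.
    cmd = [
        "iozone", "-a",                         # automatic mode: sweep file and record sizes
        "-i", "0", "-i", "1", "-i", "2",        # write/rewrite, read/reread, random read/write
        "-n", "64m", "-g", "4g",                # minimum and maximum file size
        "-q", "1m",                             # maximum record size
        "-f", "/exp/array.4/oradb/iozone.tmp",  # hypothetical test file on a SAN LUN
        "-b", "iozone_results.xls",             # spreadsheet output for graphing
    ]
    subprocess.run(cmd, check=True)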

  15. IOZONE – Output: UFS Seq Read

  16. IOZONE – UFS Sequential Read

  17. IOZONE – UFS Random Read

  18. IOZONE – UFS Sequential Write

  19. IOZONE – UFS Random Write

  20. Results – Server Memory
  • Cache
    • Influences small data stream performance
  • Memory (I/O buffers and virtual memory)
    • Influences larger data stream performance
  • Large Data Streams need Large Memory
    • Past this limit, performance drops to synchronous rates

  21. Results – Server I/O Potential
  • System Bus
    • Sun: UPA replaced by SunFire
  • Peripheral Bus: PCI vs. SBus
    • SBus (older Sun only)
      • Peak Bandwidth (25 MHz / 64-bit): ~200 MB/sec
      • Actual Throughput: ~50-60 MB/sec (~25+% of peak)
    • PCI (Peripheral Component Interconnect)
      • Peak Bandwidth (66 MHz / 64-bit): ~530 MB/sec
      • Actual Throughput: ~440 MB/sec (~80+% of peak)
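
The peak figures above are simply bus width times clock rate; a quick sanity check of the arithmetic and the quoted efficiencies:

    # Peak bandwidth = clock (MHz) * bus width (bytes); efficiency = measured / peak.
    def peak_mb_per_sec(clock_mhz, width_bits):
        return clock_mhz * (width_bits / 8)

    sbus_peak = peak_mb_per_sec(25, 64)   # 200 MB/sec
    pci_peak  = peak_mb_per_sec(66, 64)   # 528 MB/sec, i.e. the "~530"

    sbus_eff = 55 / sbus_peak             # ~0.28 for the measured ~50-60 MB/sec
    pci_eff  = 440 / pci_peak             # ~0.83 for the measured ~440 MB/sec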

  22. Server – Sun, UPA, SBus

  23. Server – Sun Enterprise, Gigaplane/UPA, SBus

  24. Server – Sun, UPA, PCI

  25. Server – HP, Astro Chipset, PCI

  26. Server – Sun, Fireplane, PCI

  27. Results – MA vs. EVA
  • MA RAID 1+0 & RAID 5 vs. EVA RAID 5
  • Sequential Write
    • EVA RAID 5 is 30-40% faster than MA RAID 1+0
    • EVA RAID 5 is up to 2x faster than MA RAID 5
  • Random Write
    • EVA RAID 5 is 10-20% slower than MA RAID 1+0
    • EVA RAID 5 is up to 4x faster than MA RAID 5
  • Servers were SunFire 480Rs, using UFS + logging
  • EVA: 12 x 72 GB FCAL Disk RAID 5, partitioned LUN
  • MA: 6 x 36 GB SCSI Disk RAIDset

  28. RAID 0 / RAID 1

  29. RAID 3 / RAID 5

  30. RAID 1+0 / RAID 0+1

  31. Results – MA RAIDsets
  • Best: 3 mirror, 6 shelf RAID 1+0
    • 3 mirror RAID 1+0 on 2 shelves yields only 80% of the 6 shelf version
    • 2 disk mirror (2 shelves) yields 50%

  32. Results – MA RAIDsets
  • Best: 3 mirror, 6 shelf RAID 1+0
  • 6 disk, 6 shelf RAID 5:
    • Sequential Write: 75-80%
    • Random Write: 25-50% (2 to 4 times slower)
  • 3 disk, 3 shelf RAID 5:
    • Sequential Write: 40-60%
    • Random Write: 25-60%
    • Can outperform 6 disk RAID 5 on random write
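
The random write gap is consistent with the classic small-write penalty: RAID 1+0 issues two disk writes per logical write, while RAID 5 must read the old data and parity and write both back. A back-of-the-envelope sketch, not part of the original measurements:

    # Back-end disk I/Os generated by one small random write.
    RAID_1_0_IOS = 2    # write the block to both sides of the mirror
    RAID_5_IOS   = 4    # read old data + old parity, write new data + new parity

    # With the same back-end disk I/O budget, RAID 5 sustains roughly half the
    # small random writes of RAID 1+0, in line with the 2x-4x slowdown measured
    # above (controller write-back cache can narrow or widen the gap).
    relative_random_write = RAID_1_0_IOS / RAID_5_IOS   # 0.5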

  33. Results – LUNs from Partitions
  • 3 Simultaneous Writers
    • Partitions of the same RAIDset
    • Write performance (Sequential or Random): less than 50% of no-contention performance
  • No control test performed:
    • 3 servers writing to 3 different RAIDsets of the same Storage Node
  • Where is the Bottleneck?
    • RAIDset, SCSI channels, or Controllers?

  34. Results – Fabric Locality
  • In production, “far” LUNs underperform
    • Monitoring “sar” disk data, “far” LUN filesystems are 4 to 10 times slower
    • Fabric-based service disruptions are drawn into the server when any LUNs are not local
  • This round of testing did not show wide variations in performance whether the server was connected to its Storage Node's SAN Switch or 3 to 4 hops away

  35. Results – UFS Options
  • Logging (the journaling UFS filesystem)
    • Advised on large filesystems to avoid long-running “fsck”
    • Under Solaris 8, logging introduces a 10% write performance penalty
    • Solaris 9 advertises a much more efficient logging algorithm
  • Forcedirectio
    • No useful testing without an Oracle workload
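
As a point of reference, a minimal sketch of applying these UFS options on Solaris. The device and mount point are hypothetical, and in practice the options would normally be set in /etc/vfstab rather than scripted.

    import subprocess

    DEVICE = "/dev/dsk/c2t0d0s6"      # hypothetical LUN slice
    MOUNT_POINT = "/exp/array.4"      # hypothetical Oracle filesystem

    def mount_ufs(options):
        """Mount a UFS filesystem with the given comma-separated option list."""
        subprocess.run(["mount", "-F", "ufs", "-o", options, DEVICE, MOUNT_POINT],
                       check=True)

    # Journaling UFS: avoids long fsck runs on large filesystems, at the
    # ~10% Solaris 8 write penalty noted above.
    mount_ufs("logging")

    # forcedirectio bypasses the page cache; per the slide, only worth
    # judging under a real Oracle workload.
    # mount_ufs("logging,forcedirectio")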

  36. Results – UFS Tuning
  • bufhwm
    • Default: 2% of memory; Max: 20% of memory
    • Extends the I/O Buffer effect
    • Improves write performance on moderately large files
  • ufs:ufs_LW & ufs:ufs_HW
    • Solaris 7 & 8: 256K & 384K bytes
    • Solaris 9: 8M & 16M bytes
    • More data is held in the system buffer before being flushed
    • fsflush() effect on “sar” data: large service times
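
These tunables live in /etc/system on Solaris; the sketch below only shows the syntax, and the values are illustrative (bufhwm is specified in KB, the UFS watermarks in bytes), not recommendations from the study.

    # Render example /etc/system lines for the tunables discussed above.
    # Values are placeholders for illustration, not measured best settings.
    TUNABLES = {
        "bufhwm": 8000,                  # ~8 MB of I/O buffer cache (units: KB)
        "ufs:ufs_LW": 8 * 1024 * 1024,   # flush low-water mark (bytes)
        "ufs:ufs_HW": 16 * 1024 * 1024,  # flush high-water mark (bytes)
    }

    for name, value in TUNABLES.items():
        print(f"set {name}={value}")     # e.g. "set ufs:ufs_HW=16777216"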

  37. Results – VERITAS VxFS
  • Outstanding Write Performance (VxFS measured only on MA 6-disk RAID 5)
  • vs. UFS on MA 6-disk RAID 5
    • Sequential Write: VxFS is 15 times faster
    • Random Write: VxFS is 40 times faster
  • vs. UFS on MA 6-disk RAID 1+0
    • Sequential Write: VxFS is 10 times faster
    • Random Write: VxFS is 10 times faster
  • vs. UFS on EVA 12-disk RAID 5
    • Sequential Write: VxFS is 7 times faster
    • Random Write: VxFS is 12 times faster

  38. Results – Random Write
  • Hardware-only Storage Node Performance
    • MA RAID 1+0 = EVA RAID 5
    • EVA RAID 5 pro-rata cost similar to MA RAID 5
    • RAID 1+0 is Not Cost Effective
  • Improved Filesystem is Your Choice
    • Order-of-Magnitude Better Performance
    • Less expensive
  • Server Memory
    • Memory Still Is Important for Large Data Streams

  39. Random Write: UFS, MA, RAID 5

  40. Random Write: UFS, MA, RAID 1+0

  41. Random Write: UFS, EVA, RAID 5

  42. Random Write: VxFS, MA, RAID 5

  43. Closer Look: VxFS vs. UFS
  • Graphical Comparison: Sun Servers provided with RAID 5 LUNs
    • UFS on EMA, UFS on EVA, VxFS on EMA, VxFS on EVA
  • File Operations
    • Sequential Read
    • Random Read
    • Sequential Write
    • Random Write

  44. Sequential Read

  45. Random Read

  46. Sequential Write

  47. Random Write

  48. Results – VERITAS VxFS
  • Biggest Performance Gains
    • Everything else is of secondary importance
  • Memory Overhead for VxFS
    • Dominates Sequential Write of small files
    • Needs further investigation
  • VxFS & EVA RAID 1+0 not measured
    • Don't mention what you don't want to sell

  49. Implications – VERITAS VxFS
  • Where is the Bottleneck?
    • Changes at the Storage Node: Modest Increases in Performance
    • Changes within the Server: Dramatic Increases in Performance
  • The Bottleneck is in the Server, not the SAN
    • The relative cost is just good fortune: changing the filesystem is much less expensive

  50. Results – Bottom Line
  • Bottleneck Identified: it's the Server, not Storage
  • VERITAS VxFS: use it on UNIX Servers
  • RAID 1+0 is Not Cost Effective: VxFS is much cheaper (Tier 1 servers)
  • Server Memory: memory is cheaper than Mirrored Disk
  • Operating System I/O Buffers: configure as large as possible
