Experience with RAC/Linux on no-brand hardware

Presentation Transcript


  1. Experience with RAC/Linux on no-brand hardware DB Service meeting, 13th June Jacek Wojcieszuk, Luca Canali CERN

  2. Agenda • Motivations • Architecture • Implementation • Lessons learned Jacek Wojcieszuk, Luca Canali, CERN

  3. Goal • Main goal: to build a highly available and scalable database service while minimizing cost • Why? • High demand for resources from the experiments • Huge data volume expected • Need for high availability • Some DBs are mission critical • Reduce DB administration and HW costs Jacek Wojcieszuk, Luca Canali, CERN

  4. Software Architecture • Operating system -> Linux (RedHat ES) • Recommended choice for x86-based hardware and Oracle • Picking up momentum in many Oracle shops • Good ROI and stability • Sysadmin team on campus provides Linux expertise and support • Database software -> Real Application Cluster (RAC) 10g • Database cluster solution from Oracle • A few years of positive in-house experience with version 9i • 10g R2 – added stability and performance Jacek Wojcieszuk, Luca Canali, CERN
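As an aside to slide 4: the sketch below shows one common way a client application reaches a RAC service, using an Oracle Net descriptor with connect-time load balancing and Transparent Application Failover so that new sessions spread across the cluster nodes and in-flight queries can survive an instance failure. It is a minimal illustration only; the host names, service name, and credentials are hypothetical and not taken from the CERN setup.

```python
# Hedged sketch: connecting to a RAC service with connect-time load balancing
# and Transparent Application Failover (TAF). All names below are hypothetical.
import cx_Oracle

# Oracle Net descriptor listing both (hypothetical) cluster nodes; LOAD_BALANCE
# spreads new connections across the nodes, FAILOVER_MODE lets in-flight
# SELECTs resume on a surviving node if one instance goes down.
dsn = (
    "(DESCRIPTION="
    "  (ADDRESS_LIST=(LOAD_BALANCE=on)(FAILOVER=on)"
    "    (ADDRESS=(PROTOCOL=TCP)(HOST=rac-node1.example.cern.ch)(PORT=1521))"
    "    (ADDRESS=(PROTOCOL=TCP)(HOST=rac-node2.example.cern.ch)(PORT=1521)))"
    "  (CONNECT_DATA=(SERVICE_NAME=physdb)"
    "    (FAILOVER_MODE=(TYPE=SELECT)(METHOD=BASIC))))"
)

conn = cx_Oracle.connect("app_user", "app_password", dsn)
cur = conn.cursor()
# Show which instance actually served this session.
cur.execute("SELECT instance_name FROM v$instance")
print("Connected to instance:", cur.fetchone()[0])
conn.close()
```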

  5. Software Architecture (2) • Volume manager and cluster filesystem -> Automatic Storage Management (ASM) • Included in the Oracle distribution • Seems to be the main Oracle direction in the volume management area • Well integrated with Oracle RDBMS and RAC • Provides all the indispensable functionality for HA and performance (striping + mirroring) • But it is a new piece of software – still sometimes troublesome • Not yet widely used Jacek Wojcieszuk, Luca Canali, CERN
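As a practical complement to slide 5, here is a minimal monitoring sketch (not from the original slides) that queries the v$asm_diskgroup dynamic view for disk group state, redundancy type, and free space — the kind of routine check an operator might script against such clusters. The connection string and account are hypothetical.

```python
# Hedged sketch: checking ASM disk group state and free space via the
# v$asm_diskgroup dynamic view. Connection details are hypothetical.
import cx_Oracle

conn = cx_Oracle.connect("monitoring_user", "password",
                         "rac-node1.example.cern.ch/physdb")
cur = conn.cursor()
cur.execute("SELECT name, state, type, total_mb, free_mb FROM v$asm_diskgroup")
for name, state, redundancy, total_mb, free_mb in cur:
    # With NORMAL redundancy every extent is mirrored, so usable space is
    # roughly half of the raw total.
    print(f"{name}: {state}, {redundancy} redundancy, "
          f"{free_mb}/{total_mb} MB free")
conn.close()
```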

  6. Hardware Architecture General idea: clusters of many small HW components load balanced by software • Elonex/Megware ‘mid-range’ servers • 2 x86-compatible CPUs @ 3GHz, 4GB RAM • High performance/cost ratio compared to other platforms • Infortrend/Transtec disk arrays • 16- and 8-disk SATA arrays with FC controllers • High capacity/cost ratio + reasonable performance • SAN infrastructure (HBAs, FC switches) from Qlogic • 3COM and HP Gb Ethernet switches Jacek Wojcieszuk, Luca Canali, CERN

  7. Maintenance and responsibilities • Hardware: • Installation in racks and cabling: FIO + CS • FC switches initial configuration: FIO + PSS • Disk arrays initial configuration: PSS • FC switches and disk arrays re-configuration: PSS • Failure handling: FIO + vendors • Software: • OS - basic configuration, patches and problem handling: FIO • OS - cluster configuration: PSS • Time-consuming and error-prone part • Oracle software installation, patching and problem handling: PSS Jacek Wojcieszuk, Luca Canali, CERN

  8. Implementation – hardware layout • Cluster nodes and storage arrays are added to match the experiments’ demand. [Diagram: servers connected through the SAN to the storage arrays] Jacek Wojcieszuk, Luca Canali, CERN

  9. Implementation – ASM configuration [Diagram: ASM extents striped across the disks and mirrored between the RAC1 and RAC2 storage arrays] • ASM is a volume manager and cluster filesystem for Oracle DB files • Implements S.A.M.E. (stripe and mirror everything) • Similar to RAID 1+0: good for performance and HA • Online storage reconfiguration (ex: in case of disk failure) • Ex: ASM ‘filesystems’ -> disk groups: DataDiskGrp, RecDiskGrp Jacek Wojcieszuk, Luca Canali, CERN
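To make the S.A.M.E. idea on slide 9 concrete, the toy sketch below stripes logical extents round-robin across the disks of each failure group and mirrors every extent across two failure groups (e.g. two storage arrays behind a DataDiskGrp-like disk group). It is a conceptual illustration with assumed disk names, not Oracle's actual ASM allocation code.

```python
# Conceptual sketch of S.A.M.E. (stripe and mirror everything): every logical
# extent gets one copy in each failure group, and consecutive extents rotate
# round-robin across the disks of a group. Illustration only, not Oracle's
# actual ASM allocation logic.

def place_extents(failure_groups, n_extents):
    """failure_groups: dict mapping failure-group name -> list of disk names."""
    placement = []
    for extent in range(n_extents):
        # Mirroring: one copy per failure group. Striping: round-robin disk choice.
        copies = {group: disks[extent % len(disks)]
                  for group, disks in failure_groups.items()}
        placement.append(copies)
    return placement

# Hypothetical layout: two arrays acting as failure groups, 4 disks each.
diskgroup = {
    "FG_ARRAY1": ["array1_disk1", "array1_disk2", "array1_disk3", "array1_disk4"],
    "FG_ARRAY2": ["array2_disk1", "array2_disk2", "array2_disk3", "array2_disk4"],
}

for i, copies in enumerate(place_extents(diskgroup, 6)):
    print(f"extent {i}: " + ", ".join(f"{g} -> {d}" for g, d in copies.items()))
```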

  10. Positive experiences • High Availability • We took care to install and test RAC services to avoid single points of failure • Notably: • on-disk backups to leverage the high capacity of SATA • Multipathing on Linux with the Qlogic driver successfully implemented • Rolling Oracle CPU (Critical Patch Update) patches help for HA • Performance and scalability • PVSS optimization proves that CPU-bound applications can scale almost linearly up to 6 nodes and more • Very good results also on IO performance • IO subsystem scales well at least up to 64 disks • ~800MB/s sequential read for a 4-node + 64-disk RAC • ~100 random IOs/s per disk, 8000 small random IOPS shown in a benchmark on the COMPASS RAC Jacek Wojcieszuk, Luca Canali, CERN
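As a back-of-envelope consistency check of the figures on slide 10 (illustrative, not a measurement): with 64 SATA disks, assuming roughly 12.5 MB/s of sequential read and ~125 small random IOs/s per disk reproduces the ~800 MB/s and ~8000 IOPS quoted above.

```python
# Back-of-envelope check of the quoted IO figures (assumed per-disk rates).
n_disks = 64                      # e.g. 4 arrays x 16 SATA disks
seq_read_per_disk_mb_s = 12.5     # assumed per-disk sequential read rate
random_iops_per_disk = 125        # assumed per-disk small random IOPS

aggregate_seq_mb_s = n_disks * seq_read_per_disk_mb_s   # ~800 MB/s
aggregate_iops = n_disks * random_iops_per_disk         # ~8000 IOPS

print(f"Sequential read: ~{aggregate_seq_mb_s:.0f} MB/s")
print(f"Small random IO: ~{aggregate_iops} IOPS")
```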

  11. Positive experiences (2) • Installation of clusters, although time consuming, is pretty straightforward • See installation procedure on the wiki • Reliability • The majority of the hardware we use seems to be reliable enough • Mid-range servers – very few problems • FC switches and HBAs – very few problems • Disks - ~1 failure per month (~600 disks in use) • Disk array controllers – the weakest point of the infrastructure • Ethernet switches – no problems so far • Support and Oracle patching • Few tickets opened for RAC issues since 10.2 • The fact that the hardware we use is not ‘Oracle validated’ is not an issue for Oracle Support Jacek Wojcieszuk, Luca Canali, CERN

  12. Open issues • High availability • Cluster software failures can bring the system down. A few ‘freezes’ have been observed and fixed with a partial or total cluster reboot (rare) • Full cluster interconnect failure can bring the system down to a single node • ASM • Serious issues with ASM in version 10gR1 • Much better with 10.2, but storage reconfiguration is still not as straightforward as we would like it to be. Especially annoying are failures of disk array controllers (ASM architectural constraint) • Cannot apply OS kernel upgrades or Oracle patchsets in a rolling fashion Jacek Wojcieszuk, Luca Canali, CERN

  13. Open issues (2) • Hardware failure handling: • We experienced a few cases where repeated HW failures could not be proactively diagnosed by sysadmins until they escalated into broken HW • Room for improvement • Vendor calls usually take at least a few days • Necessity to keep spare hardware handy • Fixing problems with disks and disk array controllers is time consuming and troublesome • A lot of manual and error-prone work Jacek Wojcieszuk, Luca Canali, CERN

  14. Conclusions • Physics database services currently run: • ~50 mid-range servers and ~50 disk arrays (~600 disks) • In other words: 100 CPUs, 200GB of RAM, 200 TB of raw disk space • Half of the servers are in production, monitored 24x7 • Positive experience so far • A big step forward from the previous production architecture (IDE ‘diskservers’) • Can more easily grow to meet the demands of the experiments during LHC startup • Low-cost hardware + Oracle 10g RAC on ASM can be used to build highly available database services with a very good performance/price ratio More info: • http://www.cern.ch/phydb/ • https://twiki.cern.ch/twiki/bin/view/PSSGroup/HAandPerf Jacek Wojcieszuk, Luca Canali, CERN
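The headline figures on slide 14 line up with the per-box specifications on slide 6, as the rough tally below shows; the average disk size is an assumption chosen to be consistent with the ~200 TB total.

```python
# Rough tally behind the headline figures (assumed, rounded numbers).
servers = 50
cpus_per_server = 2        # dual-CPU x86 boxes (slide 6)
ram_per_server_gb = 4      # 4 GB RAM per server (slide 6)
disks = 600
disk_size_gb = 333         # assumed average SATA disk size (~1/3 TB)

print(f"CPUs:     {servers * cpus_per_server}")                  # ~100
print(f"RAM:      {servers * ram_per_server_gb} GB")             # ~200 GB
print(f"Raw disk: {disks * disk_size_gb / 1000:.0f} TB")         # ~200 TB
```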
