
Linux-HA Release 2 - World-Class Open Source HA Software


Presentation Transcript


  1. Alan Robertson Project Leader – Linux-HA project alanr@unix.sh IBM Linux Technology Center Linux-HA Release 2 - World-Class Open Source HA Software

  2. Agenda • What is High-Availability (HA) Clustering? • What is the Linux-HA project? • Linux-HA applications and customers • Linux-HA Release 1 / Release 2 feature comparison • Release 2 details • DRBD – an important component • Thoughts about cluster security

  3. What Is HA Clustering? • Putting together a group of computers which trust each other to provide a service even when system components fail • When one machine goes down, others take over its work • This involves IP address takeover, service takeover, etc. • New work comes to the remaining machines • Not primarily designed for high performance

  4. High Availability Through Redundancy and Monitoring • Redundancy eliminates Single Points Of Failure (SPOFs) • Monitoring determines when things need to change • Reduces the cost of planned and unplanned outages by reducing MTTR (Mean Time To Repair)

  5. Failover and Restart • Monitoring detects failures (hardware, network, applications) • Automatic Recovery from failures (no human intervention) • Managed restart or failover to standby systems, components

  6. The HA Continuum • Single-node HA system (monitoring without redundancy): provides application monitoring and restart; an easy, zero-cost entry point – the HA system starts init scripts instead of /etc/init.d/rc (or equivalent); addresses a Solaris / Linux functional gap • Multiple virtual machines on a single physical machine: adds OS crash protection and rolling upgrades of the OS and applications – good for security fixes, etc.; many possibilities for interactions with virtual machines exist • Multiple physical machines (a “normal” cluster): adds protection against hardware failures • Split-site (“stretch”) clusters: adds protection against site-wide failures (power, air conditioning, flood, fire)

  7. What Can HA Clustering Do For You? • It cannot achieve 100% availability – nothing can • HA clustering is designed to recover from single faults • It can make your outages very short – from about a second to a few minutes • It is like a magician's (illusionist's) trick: when it goes well, the hand is faster than the eye; when it goes not-so-well, it can be reasonably visible • A good HA clustering system adds a “9” to your base availability: 99 -> 99.9, 99.9 -> 99.99, 99.99 -> 99.999, etc. • Complexity is the enemy of reliability!

  8. Lies, Damn Lies, and Statistics Counting nines – downtime allowed per year
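  The downtime behind “counting nines” follows directly from (1 − availability) × one year; the standard figures are:

  Availability   Downtime allowed per year
  99%            ~3.7 days
  99.9%          ~8.8 hours
  99.99%         ~53 minutes
  99.999%        ~5.3 minutes
  99.9999%       ~32 seconds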

  9. The Desire for HA Systems • Who wants low-availability systems? • Why are so few systems high-availability?

  10. Why isn't everything HA? • Cost • Complexity

  11. How Does HA Work? Manage redundancy to improve service availability • Like a cluster-wide super-init with monitoring • Even complex services can now be “respawned”: • on node (computer) death • on “impairment” of nodes • on loss of connectivity • for services that aren't working (not necessarily stopped) • while managing complex dependency relationships

  12. Single Points of Failure (SPOFs) • A single point of failure is a component whose failure will cause near-immediate failure of an entire system or service • Good HA design adds redundancy to eliminate single points of failure • Non-obvious SPOFs can require deep expertise to spot

  13. The “Three R's” of High-Availability: Redundancy, Redundancy, Redundancy • If this sounds redundant, that's probably appropriate... • Most SPOFs are eliminated by redundancy • HA clustering is a good way of providing and managing redundancy

  14. Redundant Data Access • Replicated: copies of the data are kept updated on more than one computer in the cluster • Shared: typically Fibre Channel disk (SAN), sometimes shared SCSI • Back-end storage (“Somebody Else's Problem”): NFS, SMB, or a back-end database

  15. The Linux-HA Project • Linux-HA is the oldest high-availability project for Linux, with the largest associated community • The core piece of Linux-HA is called “Heartbeat” (though it does much more than heartbeat) • Linux-HA has been in production since 1999, and is currently in use at about ten thousand sites • Linux-HA also runs on FreeBSD and Solaris, and is being ported to OpenBSD and others • Linux-HA is shipped with every major Linux distribution except one

  16. Linux-HA Release 1 Applications • Database servers (DB2, Oracle, MySQL, others) • Load balancers • Web servers • Custom applications • Firewalls • Retail point-of-sale solutions • Authentication • File servers • Proxy servers • Medical imaging • Almost any type of server application you can think of – except SAP

  17. Linux-HA Customers • FedEx – truck location tracking • BBC – Internet infrastructure • The Weather Channel (weather.com) • Sony (manufacturing) • ISO New England manages its power grid using 25 Linux-HA clusters • MAN Nutzfahrzeuge AG – the truck manufacturing division of MAN AG • Karstadt and Circuit City use Linux-HA with databases in several hundred stores each • Citysavings Bank in Munich (infrastructure) • Bavarian radio station (Munich) – coverage of the 2002 Olympics in Salt Lake City • Emageon – medical imaging services • Incredimail bases their mail service on Linux-HA on IBM hardware • University of Toledo (US) – a 20,000-student Computer-Aided Instruction system

  18. Linux-HA Release 1 Capabilities • Supports 2-node clusters • Can use serial, UDP bcast, mcast, or ucast communication • Fails over on node failure • Fails over on loss of IP connectivity • Capable of failing over on loss of SAN connectivity • Limited command-line administrative tools to fail over, query current status, etc. • Active/active or active/passive • Simple resource group dependency model • Requires an external tool for resource (service) monitoring • SNMP monitoring

  19. Linux-HA Release 2 Capabilities • Built-in resource monitoring • Support for the OCF resource standard • Much larger clusters supported (>= 8 nodes) • Sophisticated dependency model with rich constraint support (resources, groups, incarnations, master/slave) – needed for SAP • XML-based resource configuration • Coming in 2.0.x: configuration and monitoring GUI; support for the GFS cluster filesystem; multi-state (master/slave) resource support • Initially no external IP or SAN monitoring
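  As a rough sketch, the XML resource configuration lives in a single document, the Cluster Information Base (CIB), whose top-level layout looks like this (element names follow the 2.0.x schema and details may vary by release):

  <cib>
    <configuration>
      <crm_config/>   <!-- cluster-wide options -->
      <nodes/>        <!-- one entry per cluster member -->
      <resources/>    <!-- primitives, groups, clones, multi-state resources -->
      <constraints/>  <!-- location, ordering, and co-location rules -->
    </configuration>
    <status/>         <!-- runtime state, maintained by the cluster itself -->
  </cib>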

  20. Linux-HA Release 1 Architecture

  21. Linux-HA Release 2 Architecture (add TE and PE)

  22. Resource Objects in Release 2 • Release 2 supports “resource objects” which can be any of the following: • Primitive Resources • Resource Groups • Resource Clones – “n” resource objects • Multi-state resources

  23. Classes of Resource Agents in R2 (resource primitives) • OCF – Open Cluster Framework – http://opencf.org/ • Heartbeat – R1-style heartbeat resources • LSB – standard LSB init scripts • Stonith – node reset capability

  24. An OCF primitive object

  <primitive id="WebIP" class="ocf" type="IPaddr" provider="heartbeat">
    <instance_attributes>
      <attributes>
        <nvpair name="ip" value="192.168.224.5"/>
      </attributes>
    </instance_attributes>
  </primitive>

  Attribute nvpairs are passed in the environment to the resource agent (here, the “ip” parameter arrives as OCF_RESKEY_ip).

  25. An LSB primitive resource object (i.e., an init script)

  <primitive id="samba-smb-rsc" class="lsb" type="smb">
    <instance_attributes>
      <attributes/>
    </instance_attributes>
  </primitive>

  LSB resource agents are ordinary init scripts; to be managed and monitored safely they must implement start, stop, and status correctly.

  26. Resource Groups • Resource groups provide a shorthand for creating ordering and co-location dependencies • Each resource object in the group is declared to have a linear start-after ordering relationship with its predecessor • Each resource object in the group is declared to have a co-location dependency on the others • This is an easy way of converting Release 1 resource groups to Release 2 (a filled-in sketch follows below)

  <group id="webserver">
    <primitive/>
    <primitive/>
  </group>
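  A sketch of what the webserver group might look like filled in, reusing the WebIP primitive from slide 24 together with a hypothetical LSB apache init script (ids and values are illustrative, not from the original slides):

  <group id="webserver">
    <primitive id="WebIP" class="ocf" type="IPaddr" provider="heartbeat">
      <instance_attributes>
        <attributes>
          <nvpair name="ip" value="192.168.224.5"/>
        </attributes>
      </instance_attributes>
    </primitive>
    <!-- hypothetical LSB init script for the web server itself -->
    <primitive id="apache-rsc" class="lsb" type="apache">
      <instance_attributes>
        <attributes/>
      </instance_attributes>
    </primitive>
  </group>

  Because both primitives are members of one group, apache-rsc starts after WebIP and the two always run on the same node.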

  27. Resource Clones • Resource clones allow one to have a resource object which runs multiple (“n”) times on the cluster • This is useful for managing: • load-balancing clusters where you want “n” of them to be slave servers • cluster filesystem mount points • cluster alias IP addresses • A cloned resource object can be a primitive or a group (see the sketch below)
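  A sketch of a cloned filesystem mount, in the same XML style as the earlier primitives; the clone_max / clone_node_max attribute names follow the 2.0.x-era configuration and, like the device/directory/fstype values, should be treated as indicative rather than definitive:

  <clone id="cluster-fs-clone">
    <instance_attributes>
      <attributes>
        <!-- total number of copies across the cluster -->
        <nvpair name="clone_max" value="2"/>
        <!-- maximum copies on any one node -->
        <nvpair name="clone_node_max" value="1"/>
      </attributes>
    </instance_attributes>
    <primitive id="cluster-fs" class="ocf" type="Filesystem" provider="heartbeat">
      <instance_attributes>
        <attributes>
          <nvpair name="device" value="/dev/sda1"/>
          <nvpair name="directory" value="/shared"/>
          <nvpair name="fstype" value="gfs"/>
        </attributes>
      </instance_attributes>
    </primitive>
  </clone>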

  28. Multi-State (master/slave) Resources (coming in approx. 2.0.1) • Normal resources can be in one of two stable states: running, stopped • Multi-state resources can have more than two stable states, for example: running-as-master, running-as-slave, stopped • This is ideal for modeling replication resources like DRBD (see the sketch below)
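  Since this feature was still being finalized, the exact syntax was subject to change; a sketch along the lines of the 2.0.x CIB, wrapping an assumed ocf drbd agent (the master_slave element, the master_max attribute, and the drbd_resource parameter are all assumptions here):

  <master_slave id="ms-drbd0">
    <instance_attributes>
      <attributes>
        <!-- one copy per replica -->
        <nvpair name="clone_max" value="2"/>
        <!-- at most one copy promoted to master -->
        <nvpair name="master_max" value="1"/>
      </attributes>
    </instance_attributes>
    <primitive id="drbd0" class="ocf" type="drbd" provider="heartbeat">
      <instance_attributes>
        <attributes>
          <!-- hypothetical DRBD resource name -->
          <nvpair name="drbd_resource" value="r0"/>
        </attributes>
      </instance_attributes>
    </primitive>
  </master_slave>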

  29. Basic Dependencies in Release 2 • Ordering Dependencies • start before (normally implies stop after) • start after (normally implies stop before) • Mandatory Co-location Dependencies • must be co-located with • cannot be co-located with
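  In the XML configuration these are expressed as rsc_order and rsc_colocation constraints. A sketch using the members of the earlier webserver group (element and attribute names follow the 2.0.x-era syntax and are indicative):

  <!-- start apache-rsc after WebIP (which normally implies stopping it before) -->
  <rsc_order id="web_after_ip" from="apache-rsc" type="after" to="WebIP"/>

  <!-- apache-rsc must be co-located with WebIP -->
  <rsc_colocation id="web_with_ip" from="apache-rsc" to="WebIP" score="INFINITY"/>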

  30. Resource Location Constraints • Mandatory constraints: resource objects can be constrained to run on any selected subset of nodes; the default depends on the setting of symmetric_cluster • Preferential constraints: resource objects can also be preferentially constrained to run on specified nodes by providing weightings for arbitrary logical conditions • The resource object is run on the node which has the highest weight (score)

  31. Advanced Constraints • Nodes can have arbitrary attributes associated with them in name=value form • Attributes have types: int, string, version • Constraint expressions can use these attributes, as well as node names, etc., in largely arbitrary ways • Operators: =, !=, <, >, <=, >= • defined(attrname), undefined(attrname) • colocated(resource id), notcolocated(resource id)

  32. Advanced Constraints (cont'd) • Each constraint is associated with a particular resource, and is evaluated in the context of a particular node • A given constraint has a boolean predicate, built from the expressions described above, and is associated with a weight and a condition • If the predicate is true, then the condition is used to compute the weight associated with locating the given resource on the given node • Conditions are given weights, positive or negative; additionally, there are special values for modeling must-have conditions: +INFINITY and -INFINITY

  33. rsc_location information • We prefer the webserver group to run on host node01:

  <rsc_location id="run_Webserver" group="webserver">
    <rule id="rule_webserver" score="100">
      <expression attribute="#uname" operation="eq" value="node01"/>
    </rule>
  </rsc_location>
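  The same syntax can express a mandatory “never on this node” constraint by giving a rule the score -INFINITY (node03 is a hypothetical node name, not from the original slides):

  <rsc_location id="avoid_node03" group="webserver">
    <rule id="rule_avoid_node03" score="-INFINITY">
      <expression attribute="#uname" operation="eq" value="node03"/>
    </rule>
  </rsc_location>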

  34. DRBD – RAID1 over the LAN • DRBD is a block-level replication technology • Every time a block is written on the master side, it is copied over the LAN and written on the slave side • Typically, a dedicated replication link is used • It is extremely cost-effective – common with xSeries • Worst-case around 10% throughput loss • Recent versions have very fast “full” resync

  35. Security Considerations • Cluster: A computer whose backplane is the Internet • If this isn't scary, you don't understand... • You may think you have a secure cluster network • You're probably mistaken now • You will be in the future

  36. Secure Networks are Difficult Because... • Security is often not well understood by admins • Security is well understood by “black hats” • Network security is easy to breach accidentally • Users bypass it • Hardware installers don't fully understand it • Most security breaches come from “trusted” staff • Staff turnover is often a big issue • Virus/worm/P2P technologies will create new holes, especially for Windows machines

  37. Security Advice • Good HA software should be designed to assume insecure networks • Not all HA software assumes insecure networks • Good HA installation architects use dedicated (secure?) networks for intra-cluster HA communication • Crossover cables are reasonably secure – all else is suspect ;-)

  38. References • http://linux-ha.org/ • http://linux-ha.org/download/ • http://linux-ha.org/SuccessStories • http://linux-ha.org/Certifications • http://linux-ha.org/NewHeartbeatDesign • http://www.linux-mag.com/2003-11/availability_01.html

  39. Legal Statements • IBM is a trademark of International Business Machines Corporation. • Linux is a registered trademark of Linus Torvalds. • Other company, product, and service names may be trademarks or service marks of others. • This work represents the views of the author and does not necessarily reflect the views of the IBM Corporation.
