High Availability and Fault-Tolerance in Real-Time Databases
Jan Lindström
University of Helsinki, Department of Computer Science
Overview
• The causes of downtime
• Availability solutions
• CASE 1: Clustra
• CASE 2: TelORB
• CASE 3: RODAIN
The Causes of Downtime
• Planned downtime
  • Hardware expansion
  • Database software upgrades
  • Operating system upgrades
• Unplanned downtime
  • Hardware failure
  • OS failure
  • Database software bugs
  • Power failure
  • Disaster
  • Human error
Traditional Availability Solutions
• Replication
• Failover
• Primary restart
CASE 1: Clustra
• Developed for telephony applications such as mobility management and intelligent networks.
• Relational database with location and replication transparency.
• Real-time data is locked in main memory, and the API provides precompiled transactions.
• NOT a real-time database!
How Clustra Handles Failures
• Real-time failover: hot-standby data is kept up to date, so failover occurs in milliseconds.
• Automatic restart and takeback: restarting the failed node and taking back its operations is automatic and, again, transparent to users and operators.
• Self-repair: if a node fails completely, its data is copied from the complementary node to a standby node. This is also automatic and transparent.
• Limited failure effects
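The failover, takeback, and self-repair behaviour above can be illustrated with a toy model. This is a minimal sketch, not Clustra's implementation or API: it assumes each data fragment has one primary replica and one hot-standby replica on a complementary node, and that every update is applied to all live replicas. The class `Fragment` and its methods are hypothetical.

```python
class Fragment:
    """A data fragment with a primary and a hot-standby replica (hypothetical model)."""

    def __init__(self, primary, standby):
        self.home = primary          # node that normally owns the primary role
        self.primary = primary       # node currently serving requests
        self.standby = standby       # complementary node with an up-to-date copy
        self.replicas = {primary: {}, standby: {}}  # live node -> its copy of the data

    def write(self, key, value):
        # Hot standby: each update is applied to every live replica,
        # so the standby is never behind the primary.
        for copy in self.replicas.values():
            copy[key] = value

    def read(self, key):
        return self.replicas[self.primary][key]

    def fail(self, node):
        # Failover: if the primary fails, the standby takes over immediately;
        # no log replay is needed because its copy is already current.
        self.replicas.pop(node, None)
        if node == self.primary:
            self.primary, self.standby = self.standby, node

    def takeback(self, node):
        # Restart/self-repair: the restarted node copies the data from the
        # serving replica; if it is the original owner it becomes primary again.
        self.replicas[node] = dict(self.replicas[self.primary])
        if node == self.home:
            self.primary, self.standby = node, self.primary


frag = Fragment("node1", "node2")
frag.write("subscriber:42", "cell-17")
frag.fail("node1")                       # failover: node2 serves from its hot copy
print(frag.primary, frag.read("subscriber:42"))  # node2 cell-17
frag.write("subscriber:42", "cell-18")   # only node2 is live now
frag.takeback("node1")                   # node1 catches up and resumes as primary
print(frag.primary, frag.read("subscriber:42"))  # node1 cell-18
```

Because the standby copy is always current, `fail` only swaps roles; there is no log to replay, which is what makes millisecond failover plausible.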
How Clustra Handles Upgrades
• Hardware, operating system, and database software can be upgraded without ever taking the database down.
• This process is called a “rolling upgrade”:
  • the required changes are performed node by node;
  • each upgraded node catches up to the state of its complementary node;
  • when this is complete, the upgrade proceeds to the next node.
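A rolling upgrade can be sketched as a loop over the nodes. This is a minimal sketch under assumed conditions: nodes come in complementary pairs, a node is taken down only while its partner is up, and "catching up" is modelled by copying the partner's list of committed updates. `Node` and `rolling_upgrade` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node model for illustrating a rolling upgrade."""
    name: str
    version: str
    up: bool = True
    log: list = field(default_factory=list)   # committed updates seen by this node


def rolling_upgrade(nodes, partner, new_version, writes_during_upgrade=()):
    """Upgrade nodes one by one while the complementary node keeps serving.

    partner maps a node name to its complementary Node (hypothetical pairing).
    writes_during_upgrade simulates work arriving while each node is down.
    """
    order = []
    for node in nodes:
        mate = partner[node.name]
        if not mate.up:
            # Both halves of a pair down at once would make data unavailable.
            raise RuntimeError(f"{mate.name} is down; cannot upgrade {node.name}")
        node.up = False                          # take the node out of service
        node.version = new_version               # install new HW/OS/DB software
        mate.log.extend(writes_during_upgrade)   # partner keeps accepting work
        node.log = list(mate.log)                # catch up to the partner's state
        node.up = True                           # back in service before moving on
        order.append(node.name)
    return order


a, b = Node("a", "v1"), Node("b", "v1")
print(rolling_upgrade([a, b], {"a": b, "b": a}, "v2", ["txn1"]))  # ['a', 'b']
print(a.version, b.version, a.log == b.log)                      # v2 v2 True
```

The key invariant is that at most one node of each pair is out of service at any moment, so every fragment always has a serving replica.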
CASE 2: TelORB
Characteristics
• Very high availability (HA); robustness implemented in software
• (Soft) real time
• Scalability through loosely coupled processors
Openness
• Hardware: Intel/Pentium
• Languages: C++, Java
• Interoperability: CORBA/IIOP, TCP/IP, Java RMI
• Third-party software: Java
TelORB Availability
• Real-time object-oriented DBMS supporting:
  • Distributed transactions
  • The ACID properties expected from a DBMS
  • Data replication (providing redundancy)
• Network redundancy
• Software configuration control
• Automatic restart, on working processors, of processes that were executing on a faulty processor
• Self-healing
• In-service software upgrade with no disturbance to operation
• Hot replacement of faulty processors
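Automatic restart of processes on working processors amounts to recomputing a process-to-processor placement when a processor fails. The sketch below assumes a least-loaded placement policy; TelORB's actual policy is not described here, and `reassign` and the process names are hypothetical.

```python
def reassign(placement, processors, failed):
    """Restart the processes of a failed processor on the working ones.

    placement: dict process -> processor (hypothetical configuration table)
    processors: all processor names; failed: the processor that has failed
    The least-loaded policy is an assumption made for illustration.
    """
    working = [cpu for cpu in processors if cpu != failed]
    if not working:
        raise RuntimeError("no working processors left")
    # Processes on healthy processors stay where they are.
    new = {proc: cpu for proc, cpu in placement.items() if cpu != failed}
    load = {cpu: 0 for cpu in working}
    for cpu in new.values():
        load[cpu] += 1
    # Restart each orphaned process on the currently least-loaded processor.
    for proc in sorted(p for p, cpu in placement.items() if cpu == failed):
        target = min(working, key=lambda cpu: load[cpu])
        new[proc] = target
        load[target] += 1
    return new


placement = {"hlr": "cpu1", "sms": "cpu1", "billing": "cpu2", "db": "cpu3"}
print(reassign(placement, ["cpu1", "cpu2", "cpu3"], "cpu1"))
# {'billing': 'cpu2', 'db': 'cpu3', 'hlr': 'cpu2', 'sms': 'cpu3'}
```

Spreading the orphaned processes rather than moving them all to one processor keeps a single failure from overloading a survivor.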
[Figure: Automatic reconfiguration (reloading)]
Software upgrade
• Smooth software upgrade when the old and new versions of the same process can coexist
• The application can arrange for state transfer between the old and new static process (if important state is not already stored in the database)
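The state transfer between coexisting versions can be sketched as follows. It assumes the old process can export its private state and the new one can import (and convert) it; when the relevant state already lives in the database, the transfer is skipped. All class and method names are hypothetical.

```python
class CallCounterV1:
    """Old version of a hypothetical process keeping private (non-database) state."""

    def __init__(self):
        self.calls = {}                      # subscriber -> call count

    def handle(self, subscriber):
        self.calls[subscriber] = self.calls.get(subscriber, 0) + 1

    def export_state(self):
        # Called during the upgrade while both versions are running.
        return {"calls": dict(self.calls)}


class CallCounterV2:
    """New version: adds a running total and converts the old state format."""

    def __init__(self):
        self.calls, self.total = {}, 0

    def import_state(self, state):
        self.calls = dict(state["calls"])
        self.total = sum(self.calls.values())   # derive the new field

    def handle(self, subscriber):
        self.calls[subscriber] = self.calls.get(subscriber, 0) + 1
        self.total += 1


def upgrade(old, new_cls, state_in_database=False):
    """Start the new version beside the old one and transfer state if needed."""
    new = new_cls()
    if not state_in_database:
        # Only needed when important state is not already in the database.
        new.import_state(old.export_state())
    return new  # the caller routes new requests to `new` and stops `old`


old = CallCounterV1()
for s in ["alice", "alice", "bob"]:
    old.handle(s)
new = upgrade(old, CallCounterV2)
new.handle("bob")
print(new.calls, new.total)  # {'alice': 2, 'bob': 2} 4
```

Keeping important state in the database makes the upgrade simpler, since the new version can read it directly and no export/import step is required.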
[Figure: Partitioning: Types and Data (partitions 17 to 22 placed on A and B)]
Advantages
• Standard interfaces through CORBA
• Standard languages: C++, Java
• Based on commercial hardware
• (Soft) real-time OS
• Fault tolerance implemented in software
• Fully scalable architecture
• Includes powerful middleware: a database management system and functions for software management
• Fully compatible simulated environment for development on Unix/Linux/NT workstations
CASE 3: RODAIN
• Real-Time Object-Oriented Database Architecture for Intelligent Networks
• Real-time main-memory database system
• Runs on a real-time OS: Chorus/ClassiX (and Linux)
[Figure: RODAIN Database Node. A Database Primary Unit and a Database Mirror Unit share a disk; each unit contains a User Request Interpreter Subsystem, an Object-Oriented Database Management Subsystem, a Distributed Database Subsystem, a Fault-Tolerance and Recovery Subsystem, and a Watchdog Subsystem.]
[Figure: ORD architecture, with components Index, OCC, Data, TRP, ORD, DDS, and FTRS]
Fault-Tolerance
• Based on logs and mirroring
• Logs are sent to the Mirror
• The Mirror stores the logs on disk in the SSS
• The Mirror maintains a copy of the main-memory database
• The Mirror makes disk copies of its database image
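The log-based mirroring above can be sketched with two small classes. This is an illustrative model under stated assumptions, not RODAIN code: the Primary ships each transaction's log records to the Mirror as part of commit, Python lists stand in for the disks in the SSS, and the image interval `checkpoint_every` is a hypothetical parameter.

```python
from dataclasses import dataclass, field


@dataclass
class LogRecord:
    lsn: int          # log sequence number (hypothetical record format)
    key: str
    value: object


@dataclass
class Mirror:
    checkpoint_every: int = 3                     # hypothetical image interval, in records
    mmdb: dict = field(default_factory=dict)      # Mirror's main-memory database copy
    disk_log: list = field(default_factory=list)  # stands in for the log on SSS disks
    disk_image: tuple = field(default_factory=lambda: (0, {}))  # (last LSN, image)
    applied_lsn: int = 0

    def receive(self, records):
        """Called by the Primary with the log records of one transaction."""
        for rec in records:
            self.disk_log.append(rec)         # store the log on disk
            self.mmdb[rec.key] = rec.value    # keep the main-memory copy current
            self.applied_lsn = rec.lsn
        if self.applied_lsn - self.disk_image[0] >= self.checkpoint_every:
            # Periodically write a disk copy of the database image.
            self.disk_image = (self.applied_lsn, dict(self.mmdb))


@dataclass
class Primary:
    mirror: Mirror
    mmdb: dict = field(default_factory=dict)
    next_lsn: int = 1

    def commit(self, updates):
        """Generate log records for a transaction and send them to the Mirror."""
        records = []
        for key, value in updates.items():
            records.append(LogRecord(self.next_lsn, key, value))
            self.next_lsn += 1
        self.mirror.receive(records)  # assumed: commit waits until the Mirror has the log
        self.mmdb.update(updates)


mirror = Mirror()
primary = Primary(mirror)
primary.commit({"a": 1, "b": 2})
primary.commit({"a": 3})
print(mirror.mmdb == primary.mmdb, mirror.disk_image)  # True (3, {'a': 3, 'b': 2})
```

In this model the Primary never touches disk while the Mirror is alive; durability comes from the Mirror's log and its periodic images.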
Recovery
• Based on role switching
• When the Primary fails:
  • the Mirror brings its MMDB up to date
  • the Mirror starts acting as the new Primary
  • active transactions are restarted or lost
• When the Mirror fails:
  • the Primary stores logs directly to the SSS
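Role switching can be sketched as follows. The assumptions are all hypothetical: the Mirror keeps a queue of received log records that it has not yet applied to its MMDB, transactions that were active on the failed Primary are simply reported to the caller (which decides whether to restart them), and with no Mirror available the Primary appends its log to a list standing in for the SSS.

```python
class DbNode:
    """One node in the hypothetical Primary/Mirror model."""

    def __init__(self, name, role):
        self.name = name
        self.role = role            # "primary" or "mirror"
        self.mmdb = {}              # main-memory database
        self.pending = []           # log records received but not yet applied
        self.active = set()         # ids of transactions running on this node


class Pair:
    """Primary/Mirror pair with role switching (hypothetical model)."""

    def __init__(self):
        self.primary = DbNode("n1", "primary")
        self.mirror = DbNode("n2", "mirror")
        self.sss_log = []           # stands in for logs written directly to the SSS

    def begin(self, txn_id):
        self.primary.active.add(txn_id)

    def log(self, key, value):
        if self.mirror is not None:
            self.mirror.pending.append((key, value))  # normal case: ship to Mirror
        else:
            self.sss_log.append((key, value))         # no Mirror: log straight to SSS
        self.primary.mmdb[key] = value

    def primary_failed(self):
        new = self.mirror
        if new is None:
            raise RuntimeError("no Mirror available to take over")
        for key, value in new.pending:    # bring the Mirror's MMDB up to date
            new.mmdb[key] = value
        new.pending.clear()
        new.role = "primary"              # Mirror starts acting as the new Primary
        lost = self.primary.active        # caller restarts these or drops them
        self.primary, self.mirror = new, None
        return lost

    def mirror_failed(self):
        self.mirror = None                # later logs go directly to the SSS


pair = Pair()
pair.begin("t7")
pair.log("x", 1)
lost = pair.primary_failed()
print(pair.primary.name, pair.primary.mmdb, lost)  # n2 {'x': 1} {'t7'}
pair.log("y", 2)                                   # no Mirror yet: goes to the SSS
print(pair.sss_log)                                # [('y', 2)]
```

After a switch the pair runs without a Mirror, so the new Primary falls back to the same direct-to-SSS logging used when a Mirror fails.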
Recovery II
• During recovery, the failed node:
  • always starts as a Mirror node
  • loads the most recent database image from the disks in the SSS
  • applies the log tail to the loaded image
  • receives the logs from the Primary node
  • continues as a normal Mirror node
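The recovery steps of the failed node can be sketched as a single function. It assumes, hypothetically, that log records are `(lsn, key, value)` tuples sorted by log sequence number and that the stored database image records the last LSN it includes, so any record already reflected in the image or the disk log is skipped.

```python
def _apply(mmdb, records, last):
    """Apply records newer than `last` to mmdb; return the new last LSN."""
    for lsn, key, value in records:
        if lsn > last:              # skip records already reflected in mmdb
            mmdb[key] = value
            last = lsn
    return last


def recover_as_mirror(disk_image, disk_log, primary_stream):
    """Rebuild a failed node's MMDB so it can continue as a Mirror (hypothetical).

    disk_image: (image_lsn, {key: value}), the most recent image in the SSS
    disk_log: (lsn, key, value) records stored in the SSS
    primary_stream: (lsn, key, value) records sent by the Primary
    """
    image_lsn, image = disk_image
    mmdb = dict(image)                               # load the most recent image
    last = _apply(mmdb, disk_log, image_lsn)         # apply the log tail
    last = _apply(mmdb, primary_stream, last)        # receive logs from the Primary
    return mmdb, last                                # continue as a normal Mirror


image = (2, {"a": 1, "b": 2})
log = [(1, "a", 1), (2, "b", 2), (3, "a", 5)]
stream = [(3, "a", 5), (4, "c", 7)]
print(recover_as_mirror(image, log, stream))  # ({'a': 5, 'b': 2, 'c': 7}, 4)
```

Filtering by LSN makes replay idempotent, so overlap between the disk log and the records streamed from the Primary does no harm.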