290 likes | 390 Vues
Building Tungsten Clusters with PostgreSQL Hot Standby and Streaming Replication. Linas Virbalas and Alex Alexander Continuent, Inc. Introductions PG 9 Hot Standby and Streaming Replication What is Tungsten? Tungsten + PostgreSQL Hot Standby and Streaming Replication Demo!
E N D
Building Tungsten Clusters with PostgreSQL Hot Standby and Streaming Replication Linas Virbalas and Alex Alexander Continuent, Inc.
Introductions PG 9 Hot Standby and Streaming Replication What is Tungsten? Tungsten + PostgreSQL Hot Standbyand Streaming Replication Demo! Questions and Comments Agenda
About Continuent Our Business: Continuous Data Availability Our Solution Continuent Tungsten (Master/Slave Database Replication) Our Value: Ensure data are available when and where you need them TCO less than 20% of comparable solutions Our Technical Expertise Database replication Database cluster management Application connectivity Software-as-a-Service (SaaS)
PostgreSQL 8.4 Warm Standby Master Standby PostgreSQL PostgreSQL Continuous recovery rsync to standby WAL Files Archived WAL Files WAL Files pg_xlogs Directory Archive Directory pg_xlogs Directory pg_standby
Limitations of Warm Standby Utilization -- Cannot open the standby To bring up the standby for queries you must end recovery Standby hardware is idle Difficult to track state of recovery since you cannot query log position Data Loss -- Warm standby transfers only full WAL files Can bound loss using archive_timeout Low values create large numbers of WAL files; complicate point-in-time recovery Workarounds using DRBD, etc. are complex
Introducing Hot Standby Allows users to connect to standby in read-only mode Allowed: SELECT, SET, LOAD, COMMIT/ROLLBACK Disallowed: INSERT, UPDATE, DELETE, CREATE, 2PC, SELECT … FOR SHARE/UPDATE, nextval(), LISTEN, LOCK, No admin commands: ANALYZE, VACUUM, REINDEX, GRANT Can come out of recovery while queries are running Thanks to Simon Riggs for this description
Introducing Log Streaming Master Standby PostgreSQL PostgreSQL Recovery WAL Sender WAL Receiver Archiving Archived WAL Files Continuous replication to standby Archive Directory
Configuration and Usage Log streaming layers on top of existing warm standby log shipping Multiple standby servers allowed Failure of one standby does not affect others Management is not simple - must coordinate provisioning & WAL shipping to set up/restart
What Is Tungsten? Tungsten implements master/slave clusters to: Protect data Maintain high availability Improve resource utilization Raise performance Install and set up in a few minutes Integrated backup/restore and data integrity checks Efficient failover operations Distributed, rule-driven management No/minimal application changes Highly pluggable No specialized hardware requirements
What’s Inside Tungsten? Replication - Making copies Tungsten Replicator -- Database-neutral, platform independent master/slave replication Connectivity -- Finding databases Tungsten Connector -- Fast MySQL/PostgreSQL client to JDBC proxying Tungsten SQL Router --JDBC wrapper for high-performance and transparent failover, load-balancing, and partitioning (no proxy required) Management -- Administering the database Tungsten Manager -- Distributed administration with autonomic, rule-based configuration and no single point of failure Tungsten Monitor -- Track resource status and
Tungsten Clustering In Action Application Server Application Server SQL Router/Connector SQL Router/Connector Replicator Monitor Manager Management Client Management Client Manager Manager Replicator Monitor Manager Master DB Slave DB Master Host Slave Host
Distributed Rule-Based Management Local Services Local Services Local Services Business Rules Admin Client Manager Group Communications Broadcast commands and monitoring data Manager (Coordinator) Admin Client Manager Admin Client
Multiple Routes to Databases Java App Server PHP Application Tungsten SQL Router libpq.a PostgreSQL JDBC Driver Admin & Monitoring Admin & Monitoring Virtual IP Address Tungsten Connector Tungsten Cluster
Moving Tungsten to PostgreSQL Problem: We can’t read PostgreSQL logs (yet) Tungsten solution is to manage: Warm Standby + WAL Shipping(for PostgreSQL <9) Good basic availability/fast failover Slaves open up for reads only after failover No load balancing/scaling Hot Standby + Streaming Replication(for PostgreSQL >=9) Slaves opened up for reads = Tungsten scaling facilities work Add Streaming Replication = minimal delay in replicating data A fully fledged clustering solution
What is Tungsten’s Added Value To PostgreSQL? 15 minute cluster installation Single commands to: View cluster status Provision a new standby Confirm liveness of replication Switch servers safely for maintenance Failover a dead server to most current replica Automatic discovery of new database replicas Automatic failover when databases fail Simple procedures for provisioning Transparent application routing Easy scaling
Streaming Replication Setup (By Hand) Configure master postgresql.conf and reboot archive_mode = on max_wal_senders = 10 recovery_connections = on archive_command =‘rsync -cz $1 ${STANDBY}:${PGHOME}/archive/$2 %p %f' archive_timeout = 60 Set up standby recovery.conf standby_mode = 'on’ primary_conninfo = 'host=${MASTER} port=5432 user=postgres’ trigger_file = '/tmp/pgsql_stop_recovery' Provision standby psql# select pg_switch_xlog(); psql# select pg_xlogfile_name(pg_start_backup('base_backup')); rsync –azv --delete --exclude=*pg_xlog*--exclude=postgresql.conf ${PGHOME}/ $STANDBY:$PGHOME/archive psql# select pg_xlogfile_name(pg_stop_backup()); Start standby, recovery starts
Manual Failover (By Hand) Standby’s postgresql.conf should be prepared from the start to act as a master when needed: archive_mode = on max_wal_senders = 10 Touch /tmp/pgsql_stop_recovery on a standby to snap out of recovery mode Wait for recovery to finish ERROR: recovery is in progress …LOG: database system is ready to accept connections Reroute applications to the new master And to return to original cluster’s state: Fix the failed master. Bring it online as a standby Switch master with standby roles
Gotchas (1/2) Wait for standby to finish initial recovery before routing applications to it “psql: FATAL: the database system is starting up” – bad check for that. E.g.: try under French locale After a switch/failover, need to wait for the new master to come up from recovery too rsync of master data folder might return a non-zero exit code: 24 - “Partial transfer due to vanished source files” Exclude pg_xlog, pg_log and postgresql.conf from rsync! Careful: queries might get canceled on a standby! (E.g. DROP TABLE on a master doesn’t wait for SELECT queries to finish on a standby)
Captchas (2/2) Need to know current progress? Use pg_current_xlog_location() on a master(“ERROR: recovery is in progress" if used on a standby),pg_last_xlog_receive_location() and pg_last_xlog_replay_location() on a standby. Using pg_standby? Ensure it is available in the path for the user you’re running If it’s for the root, ‘sudo which pg_standby’ is not enough! Running Debian? Check for PostgreSQL status before starting it (it fails, if server is already running): sudo /etc/init.d/postgresql-8.4 startStarting PostgreSQL 8.4:pg_ctl: another server might be running; trying to start server anywaypg_ctl: could not start server Etc.
Ultimately Hot Standby & Streaming Replication is awesome, but Set-up/management is harder than it looks. Monitoring is critical. Transparent application rerouting is essential.
How Tungsten covers all this? Tungsten Manager Replicator JMX Interface Monitor Replication State Model DBMS Checker Plugin Open Script Plugin pg_dump/ pg_restore Plug-In Backup Storage Plugin Pg-wal Scripts DBMS postgresql.conf recovery.conf pg_standby rsync
Contact Information HQ and Americas 560 S. Winchester Blvd., Suite 500 San Jose, CA 95128 Tel (866) 998-3642 Fax (408) 668-1009 e-mail: alex.alexander@continuent.com, linas.virbalas@continuent.com EMEA and APAC Lars Sonckin kaari 1602600 Espoo, FinlandTel +358 50 517 9059Fax +358 9 863 0060 Continuent Web Site: http://www.continuent.com