DM208 Global Peer-to-Peer Data Replication Utilizing Sybase Replication Server

DM208 Global Peer-to-Peer Data Replication Utilizing Sybase Replication Server • Mich Talebzadeh • Consultant • Peridale Systems, • mitch.talebzadeh@db.com

Full Familiarity with Sybase ASE Working knowledge of Sybase's Replication Server Products Knowledge of UNIX and shell scripting useful. Aimed At Audience with

Reasons for Peer-to-Peer Replication Project Issues to be Addressed Planning for a Replication System Issues to be Addressed Peer-to-Peer Replication Implementation Topics Covered

Tuning Replication Server for Better Performance Monitoring the Latency and Delivery of Data Use Replication Server to Measure the Volume of Replication Traffic on a Daily Basis Topics Covered

Need to Provide Business Solutions to a Community Scattered Around the Globe Limitations of Traditional Techniques Using a Central Database Repository in a Major Location with Applications Connected Remotely are: Limited Application Performance Because of the Geographical Distance Degradation in Network Response When the Traffic over the WAN Gets Heavy. For Example, Remote DSS Versus Local Transaction Inputs Reasons for Peer-to-Peer Replication

The Data Server Becomes a Bottleneck As a Larger Business Community Contends for Access Data Becomes Unavailable When There is a Network Failure. Impact on Maintenance Tasks When Remote Users Logged In. Reasons for Peer-to-Peer Replication

A typical third-party application may not access data in the most efficient way. For example, if the Application Makes a Large Number of Discreet Queries to the Database, Connection Latency between the Application and Data Server could Cause Start-Up Delays If the Application were to Crash, the System Becomes Unusable to Traders and Has an Unacceptable Business Impact. Reasons for Peer-to-Peer Replication

The Package Eligibility for Replication The Availability of Required Band-width The Technical Knowledge of the Package or Application to Be Replicated Local DBA Support for Sybase Replication Project Issues to Be addressed

Start A Pilot Project with One Way Replication Between Two Sites (Not to be confused with Sybase Warm Standby) Establish the volume of data growth and Deltas on a Daily Basis on the Replicated Database Establish the Bandwidth between the Two Replicate Sites. May have to Improve the Bandwidth Identifies External Feeders to the Database Planning For A Replication System

Best Option to set up the Pilot on the Local and the Remote Sites, otherwise use two different Servers Locally linked via a WAN Simulator Decide on What to Replicate Replicate Tables and Turn off Triggers for Replicated Transactions Not Replicate Tables and Let Triggers Insert the Records Replicate Stored Procedures Planning For A Replication System

If Table Replication Which Tables you are Replicating Does the Table Being Replicated Meets Criteria for Replication Does the Table Have Primary Key How is the Unique ID on the Table Generated Determine Rows and Columns to Be Replicated Issues to Be Addressed

Time Zones Dependency and Handling Time Stamps. Discuss and Agree with Business on How to Handle These. Remember Remote Locations Mean Different Time Zones. Serious Business Implication Managing Conflicts in Peer-to-Peer Replication Managing Inserts Conflicting Updates Let Us Take a Look at These Site Specific Solutions. Issues To Be Addressed

Managing Inserts A typical local table will include inserts from local application in addition to inserts delivered via replicated transactions. In a peer-to-peer set up both tables are bi-directional. Designers tend to use unique IDs to uniquely identify records in a table. Primary keys or unique clustered indexes are usually built on the unique ID. Issues To Be Addressed

The unique ID for a given table tends to be a monolithically incrementing number, a 32-bit integer, stored in a table, the so-called table_next_row_id and retrieved from this table. Inter site conflicts occur when rows are inserted in a local table and distributed to the remote table. If the remote table has already a record with the same unique ID, the replicated insert will be rejected, and the data at the remote table will be inconsistent with the local table. Possible solutions are: Issues To Be Addressed

Add a location key to the tables if not already there. Include the location key in the primary key for replicated tables. This is useful if the application is at the design stage. As replication implementation is normally an after thought, this approach may not be possible without a substantial change to database schema and code. Beware of Maintenance and Retrofitting. Localise the table_next_row_id and DO NOT replicate it. Allocate ranges for next_row_id column for each location. A 32-bit integer provides ability to store up to 2 billion unique values. For a lifetime of a typical application this range is far more than enough to cater for all sites. Issues To Be Addressed

For example, you can allocate the following ranges: Location Reserved range for next_row_id Column London 1-100,000,000 Hong Kong >100,000,000 and <= 150,000,000 Tokyo > 150,000,000 and <= 200,000,000 Issues To Be Addressed

Conflicting Updates The best way to handle conflicting updates from different sites is to construct the application and the environment so that conflicts cannot happen. However, in majority cases one needs to rely on business rules to reduce/eliminate the conflicts. These rules are application specific. For example for a trading system these could be: Issues To Be Addressed

On performing simultaneous new tradeon the same holding on different site, problem will occur on calculated field, such as quantity, P&L, Furthermore, there are no signals, which warn the users, when the problem occurs. The adopted solution is to recalculate these fields (nightly) so that the following day when the portfolio is loaded, they will have the correct figure. However, we have not yet encountered such problem. Issues To Be Addressed

On simultaneous update on the same order. This could happen due to a mistake. The business rule is whoever created the order should be the one who updates the order. If this happens the quantity (P&L) data will be out of synch. Again there will be no warning message to indicate this and it will be very difficult for IT support to detect it. The traders/ users will inform Application support that P&L or quantity is wrong. The Application support group needs to check the historical order and amend it appropriately. Once this is carried out, the correct details will be replicated to other sites and the databases will be in sync again. Issues To Be Addressed

Make Sure that the IT and Business Managers plus Application Support are Aware of the Fact that Technology is Not a Substitute for Business Rules and Ultimately they are Responsible For Consistency of the Data. Issues To Be Addressed

For Simplicity We Illustrate the Peer-to-Peer Implementation Among Three Sites. Although This Model Can be Expanded Further to Include Additional Sites We Have the Following: Peer-to-Peer Replication Implementation

Peer-to-Peer Replication Implementation

Peer-to-Peer Replication Set UP

Basics Create Replication Servers on all Three sites. Each Replication Server Will Handle the Local RSSD and the Local Database to Be Replicated Choose the ID Server for Replication Where the Replication and Support Knowledge is Highest Ideally All Sites Should Have Sybase “sa” and root Access to all Servers. As a Minimum the ID Server Site Should Have Access to All These Peer-to-Peer Replication Implementation

Create the Diagnostics Run Files for Replication Servers so the DBA May Observe Each Replicated Transaction Performed by the Server (Invaluable in Identifying Problems). This is Achieved by Replacing the Repserver Binary with the repserver.diag Binary in the Run File and Adding the Following Entry to the Replication Server .cfg File trace=DSI,DSI_CMD_DUMP Peer-to-Peer Replication Implementation

Create error class rs_sqlserver_error_class (default sql server error class) in ID replication server only. This will Handle Sql Server Errors in Replication Server. The default error action for all errors returned by Sql Server is to Stop Replication! You Can Assign Action to the Created Error Class etc. Very Important! Error Actions are Stored in Table rs_erroractions. Use rs_helperror error_no, v to Get Information About the Errors. Peer-to-Peer Replication Implementation

Turn off trigger settings For Each Replicated Database in the Repserver Controlling that Database. Use configure connection command with ‘dsi keep triggers’ option set to ‘off’. For example in lon_rep_server run the following command: configure connection to london_sql_server.db set dsi_keep_triggers to ‘off’ Create Interfaces Files with All Sql Server and Repserver Entries for All Sites. Ensure that the Application Servers also have All the Relevant Information for All Sql Servers Peer-to-Peer Replication Implementation

Creating Direct Routes In our triangle diagram, we need to create routes in order for our three replication servers to send messages to destination replication servers. A route is a one-way message stream that sends requests from one replication server to another, carrying data modification commands, replicated functions and stored procedures. In this design the routes are created as follows: Peer-to-Peer Replication Implementation

Route type Source destination Direct lon_rep_server hk_rep_server Direct lon_rep_server tyo_rep_server Direct hk_rep_server lon_rep_server Direct hk_rep_server tyo_rep_server Direct tyo_rep_server lon_rep_server Direct tyo_rep_server hk_rep_server Peer-to-Peer Replication Implementation

For example in lon_rep_server run the following command to create route to hk_rep_server and tyo_rep_server respectively: create route to hk_rep_server set username hk_rep_server_rsi set password hk_rep_server_rsi_ps create route to tyo_rep_server set username tyo_rep_server_rsi set password tyo_rep_server_rsi_ps Peer-to-Peer Replication Implementation

Where hk_rep_server_rsi is the RSI username already created by rs_init when you created the hk_rep_server. hk_rep_server_rsi_ps is the default password for such user etc. Use rs_helproute in any RSSD to check the status of the routes created. Keep all passwords below 30 characters. If you want to change password for RSI user etc, do so when creating the repserver. Makes life easier later. Peer-to-Peer Replication Implementation

Loading the Database to Be Replicated In order to perform the initial load of the database to be replicated, you may consider the following steps: Decide where you are going to load your initial database. In our case we chose London as the starting point. Otherwise you may have to merge databases. Requires a good migration plan dbcc the database and perform update statistics in London. Review all the primary keys for tables to be replicated. Peer-to-Peer Replication Implementation

Turn off all replication flags in the user tables. Use sp_setreptable table_name, false Do dbcc settrunc(ltm,ignore) on the database Dump transaction with truncate_only Dump database to the dump directory. Ftp the dump file to the remote servers. May consider zipping the file etc. On the BCP site load the database Peer-to-Peer Replication Implementation

Load the database from the dump file in remote locations Localise the so-called local tables. For example if you have table_next_row_id, set next_row_id column to the appropriate starting values for location etc Adding databases is quite straightforward. For example: Add the London Production database to the replication system using lon_rep_server Add the Hong Kong database to the replication system using hk_rep_server Add the Tokyo database to the replication system using tyo_rep_server Peer-to-Peer Replication Implementation

Creating Replication and Subscription Definitions All the Information you need is in the database to be replicated Use the Local Repserver and SQL Server to Create Replication Definitions for the Database to be Replicated You will need to extract information from system tables: sysobjects, syscolumns systypes and syskeys, provided that you have defined primary keys for your tables! Peer-to-Peer Replication Implementation

I have prepared shell scripts which will automatically generate replication definitions for all tables. This script takes the format: genrepdef.ksh -R <REP_SERVER_NAME> -S <SQL_SERVER_NAME> -D <DATABASE_NAME> where the parameters refer to the repserver controlling the database, the name of the SQL server and the database name respectively. In the same way, you can create subscription definitions using script gensubdef.ksh -R <REP_SERVER_NAME> -S <SUBSCRIBER_SERVER> -D <DATABASE_NAME> Peer-to-Peer Replication Implementation

Where the parameters refer to the repserver in which subscriptions are defined, the name of the SQL server which subscribes to the replication definitions and the database name respectively. Use activate.ksh, validate.ksh and checksub.ksh to complete the subscription process. Peer-to-Peer Replication Implementation

In summary for replications we have: Peer-to-Peer Replication Implementation

Likewise for subscriptions we have Peer-to-Peer Replication Implementation

At the end of subscription definitions you should have two subscription definitions for each replication definitions. In other words, doing sp_helpsub for each table should give you 3x2 subscriptions = 6 lines. This should be shown in any RSSD database. Use of Function Strings to Apply Local Timestamps Replication Server converts functions to commands for destination data servers, and submits them to these data servers. Peer-to-Peer Replication Implementation

For example, a new row inserted in the source table causes Replication Server to distribute an rs_insert function specific to that table to the subscriber databases. A possible solution for applying local timestamps at replicate database would be to modify rs_insert for a given source table to invoke an RPC at the destination database. The RPC in turn inserts local timestamp to the subscribed table as shown below: Peer-to-Peer Replication Implementation

alter function string tran_history_db_rd.rs_insert for rs_sqlserver_function_class output rpc 'execute ins_tran_history_sp @time_stamp = ?time_stamp!new?, @action = ?action!new?, @table_name = ?table_name!new?, @row_id = ?row_id!new?, @host_name = ?host_name!new?, @user_name = ?user_name!new?, @pid = ?pid!new?’ go Peer-to-Peer Replication Implementation

create procedure ins_tran_history_sp ( @time_stamp datetime = null, @action int = null, @table_name varchar(24) = null, @row_id int = null, @hos_tname varchar(24 ) = null, @user_name varchar(24) = null, @pid numeric(35) = null) as begin transaction insert into tran_history ( timestamp, action, table_name, row_id, host_name,user_name, pid) Peer-to-Peer Replication Implementation

values ( getdate(), @action, @table_name, @row_id, @hostname, @user_name, @pid) commit transaction go Peer-to-Peer Replication Implementation

There are some configuration parameters that can be altered in order to get better performance from the Replication Servers init_sqm_write_delay - stable queue manager waits for at least init_sqm_write_delay milliseconds for a block to fill before it writes the block to the correct queue on the stable device (default is 1000). Try decreasing this parameter. init_sqm_max_write_delay - a flush to the queue is guaranteed to happen after waiting for init_sqm_max_write_delay, if the DSI or RSI thread reading the queue is unable to connect to the target or has been suspended (default 10000). Decrease this parameter if required. Tuning Replication Server for Better Performance

sqt_max_cache_size - need to increase this value if there are a lot of open transactions and or large transactions. Memory for sqt_max_cache_size is taken from the global memory pool (default is 131072 bytes) batch_sz - the larger the batch_sz the less often the truncation point is updated (default 1000 commands). Try increasing this value Tuning Replication Server for Better Performance

It is a common practice for DBAs to set up a DBA specific table in the replicated database and try to check the latency and health of the replication system by updating data in this table. In its simplest form one can insert or update records in this table and see if the data is being replicated to the other sites. The time taken for data to get to the remote site will give an indication of latency. Monitoring the Latency and Delivery of Data

Although the above method can be used as a simple monitoring tool it suffers from the following drawbacks: It is not an indicator of whether the business transactions arrive in remote sites in a timely manner DBA table is normally used by one application, for example a cron job via a simple transaction. User applications create a larger number of transactions and the data delivery is impacted by the following: Monitoring the Latency and Delivery of Data

Concurrency Size of the user tables Locking mechanism employed by the application It should be noted here that the maintenance user trying to deliver the replicated data could be blocked by local users. If the statistics on the user tables are not current then the replicated data may take a longer time to be delivered resulting in remote users being blocked waiting for locks to be released. Monitoring the Latency and Delivery of Data

It is important to get a more realistic estimate of replication delivery. A possible solution would be to look at the entries in an audit or transact history table in the replicate database and check the delivery timestamp. By comparing the timestamps for records delivered and adjusting for servers clock difference, the latency can be estimated. Monitoring the Latency and Delivery of Data

DM208 Global Peer-to-Peer Data Replication Utilizing Sybase Replication Server