Channel: SCN: Message List - SAP Replication Server

Re: Delayed replication of 4 hours and no delay?


Mark A. Parsons wrote:

 

I thought I'd seen Luc Van der Veurst do a write-up on an environment where he'd replaced all of his warm standby configs with MSA configs ... and how he was able to perform his switches faster with MSA than with warm standby.  I can't recall/find where that post is (assuming I'm not imagining the write-up) so perhaps Luc will see this thread and jump in, eh ...

 

I mentioned it in a thread about the admin quiesce_force_rsi command earlier this year, but that thread has been deleted (http://scn.sap.com/message/14764886 ). I still find it difficult to find information on the SAP website :-). I also thought I wrote something about it recently, but I can't find it.

 

I posted the following on the ISUG SIG-Replication mailing list in October 2012:

 

____________________________________________________________________

 

I used to have 5 warm-standby pairs; one of them had 52 replicated databases.

For that server I defined 4 replication servers, each replicating 13 databases, so that the switch to standby could be done in parallel.

 

It took some investigation to make the switch to standby as fast as possible, because running things in parallel caused deadlocks in the rssd_db, and then manual intervention was necessary, adding extra minutes of downtime.

 

Putting sleeps between the commands increased the switch time too much, so I examined what happens in the system tables and wrote a script that checks certain status columns in a loop and continues when a given value is reached.
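The script itself isn't shown here; purely as a sketch, assuming a hypothetical `get_status` callable that stands in for querying the relevant RSSD status column, the wait loop could look like this:

```python
import time

def wait_for_status(get_status, expected, timeout=120, interval=0.5):
    """Poll a status source until it returns the expected value.

    get_status -- hypothetical callable returning the current status
                  (in the real script this would query the RSSD
                  system tables via isql)
    expected   -- the value that signals it is safe to continue
    Returns True when the value is reached, False on timeout.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        if get_status() == expected:
            return True
        time.sleep(interval)
    return False
```

This replaces fixed sleeps with a bounded poll: the loop continues the moment the status flips, instead of always paying the full sleep duration.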

 

Switching to standby took 2 to 3 minutes, and they were nerve-racking minutes, because there is always something that can go wrong.

    

About 2 years ago, I replaced all warm-standby pairs with MSA bi-directional replication.

We still use it in an active-passive setup, so none of the conflicts that could arise in an active-active situation are possible.

 

Switching to standby now takes at most 9 seconds, and nothing can go wrong since there is no switch command that needs to be executed.

 

The actions performed during a switch are:

- lock the users on the active server

- remove the logical IP address from the active server

- kill all user connections on the active server

- insert a row in a table in every replicated database on the active server and wait until the rows have arrived at the standby server

- unlock the users on the standby server

- add the logical IP address to the standby server, which now becomes the active server.
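The actual scripts aren't shown in the post; purely as an illustrative sketch, where every callable is a hypothetical stub for the real operation (locking logins, moving the logical IP, inserting marker rows via isql, ...), the sequence above could be expressed as:

```python
def switch_to_standby(active, standby, databases,
                      lock_users, move_ip, kill_connections,
                      insert_marker, wait_for_markers, unlock_users):
    """Orchestrate an MSA active/passive switch.

    All callables are hypothetical stand-ins for the operations
    described in the post; this only fixes their order.
    """
    lock_users(active)                          # lock users on the active server
    move_ip(remove_from=active, add_to=None)    # drop the logical IP from active
    kill_connections(active)                    # kill remaining user connections
    for db in databases:                        # one marker row per replicated db
        insert_marker(active, db)
    wait_for_markers(standby, databases)        # block until all markers arrived
    unlock_users(standby)                       # open the standby for users
    move_ip(remove_from=None, add_to=standby)   # standby becomes the active server
```

The marker-row wait is where almost all of the switch time goes; everything else is near-instant.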

 

Almost all of the 9 seconds are spent on the check that all statements in the queues have been replicated.

(I also run this check before the switch, so that the Replication Server has a connection to all databases and there is no delay in replicating.)

 

That 9-second switch time is for the server with 52 replicated databases.

Switching a server with only 10 replicated databases takes about 6 seconds.

 

Our application servers reconnect to the database server automatically, so a user who doesn't issue a database request within those 9 seconds doesn't notice that the database servers were switched.

I work in a hospital where we have 4 time windows of 3 hours per year to do maintenance.

If a host needs maintenance, I can now free it up in seconds, so this can be done without warning the users, thanks to MSA.

 

There are some disadvantages.

    

- There are more statements to set it up.

 

- Replication definitions, which are necessary to define a primary key, now require all columns, so every time you alter a table and add or modify a column, you also need to modify the replication definitions (you need 2, one on each side)

 

- We have DML replication switched on, so I have to be careful to switch replication off when I want to do some maintenance on the standby server

 

- We also have table replication from one MSA pair to another; this requires more setup, since after a switch the server receives last-commit information from another server. To address this, I copy the rs_lastcommit information from active to standby during the switch process.

 

- ...
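On the second point above: since an MSA replication definition must list every column, altering a table means regenerating the repdef on both sides. A hypothetical helper that builds the RCL statement from a column list could look like this (the RCL syntax here is a sketch, not verified against a specific Replication Server version, and all names are made up):

```python
def make_repdef(name, primary_conn, table, columns, pkey):
    """Build an RCL 'create replication definition' statement.

    columns -- list of (column_name, datatype) pairs; because the
               repdef must list every column, regenerate it after
               any ALTER TABLE and apply it on both sides.
    """
    cols = ",\n    ".join(f"{c} {t}" for c, t in columns)
    pk = ", ".join(pkey)
    return (
        f"create replication definition {name}\n"
        f"with primary at {primary_conn}\n"
        f"with all tables named '{table}'\n"
        f"(\n    {cols}\n)\n"
        f"primary key ({pk})\n"
    )
```

Generating the statement from one column list keeps the two repdefs from drifting apart.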

 
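The rs_lastcommit copy mentioned above isn't shown either; as a rough sketch, modelling each server's rs_lastcommit as a mapping from origin ID to last-commit marker (the real system table holds more columns, and in practice this is done with SQL over a cross-server connection):

```python
def copy_rs_lastcommit(active_rows, standby_rows):
    """Overwrite the standby's rs_lastcommit entries with the active's.

    Both arguments model rs_lastcommit as {origin_id: marker};
    this is a simplification of the real table. Returns the number
    of rows updated or inserted on the standby.
    """
    changed = 0
    for origin, marker in active_rows.items():
        if standby_rows.get(origin) != marker:
            standby_rows[origin] = marker
            changed += 1
    return changed
```

After the copy, the new active server recognizes the last-commit state of the streams coming from the other MSA pair.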

But I value the advantages much more highly:

 

- a faster switch, without stress :-), since basically it's just an IP address that needs to be moved to another system.

 

- the disadvantage of having to define all columns in a replication definition becomes an advantage when you want to remove columns from a table (that can be done at the standby site first, after removing the columns from the replication definition)

 

- and it's possible to add a 3rd (4th, ...) server to the setup, which makes it possible to test new versions of ASE with the ability to return to the previous version without breaking your MSA setup.

 

Our current situation is that we have 6 pairs of ASE 12.5.4 servers.

We have 2 datacenters, and each pair has one server running in each datacenter.
To each pair, I added a 3rd ASE 15.7 server with bi-directional replication to the other 2.

 

Any of the 3 can become the active server within at most 9 seconds and will then replicate to the other 2.

 

_________________________________________________________________________________

 

I'm still happy with this setup :-).

 

We don't have automatic fail-over (we also didn't have it when we were using warm standby).

There are too many unknown factors: is the server really down? What's the status of the replication system? Will there be data loss? ...

We prefer to make the decision ourselves whether to restart the server on the same host or to switch to the standby system.

 

 

Luc.

