Channel: SCN: Message List - SAP Replication Server

Re: Transaction log becomes full on primary


re: picture/description of how RS works ...

 

I'd probably start with the Design Guide and then the Admin guides.  The Troubleshooting guide may be of some help in matching an issue with how replication works.  Obviously (?) taking a class for Repserver DBAs would probably help, if you can get the boss to free up the resources for it.

 

re: PDB log filling up ...

 

First some background ...

 

The repagent reads the PDB log and, for every record it finds that's marked for replication, sends said record to the PRS.

 

The repagent maintains its own truncation point in the PDB log.  Periodically the PRS notifies the repagent that a group of *committed* transactions has been processed and that the repagent can move its truncation point in the log.  The repagent truncation point shows up in master..syslogshold with a name of '$replication_truncation_point' (sp?).

 

Just like a long-running open transaction can keep the PDB log from being truncated, the repagent truncation point keeps the log from being truncated past said truncation point (in both cases the trunc point provides a means of recovery from an outage).  Net result is that for a PDB you can have 2 different truncation points that keep the log from being truncated past said point.  [NOTE: RS 15.7.1 has introduced multipath replication which can include the use of multiple repagents in the PDB, and thus multiple replication truncation points in the PDB's log ... all of which are capable of blocking the truncation of the log past their individual trunc points.]
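
To make the "two truncation points" idea concrete, here's a minimal Python sketch.  It's a toy model, not anything ASE actually runs: the function name, the dict of holds and the page numbers are all made up for illustration.  The only thing taken from the description above is that the log can be truncated no further than the oldest hold.

```python
def truncatable_up_to(log_end_page, holds):
    """Return the log page up to which truncation is allowed (toy model).

    holds maps a label to the log page where that hold sits.  The labels
    and page numbers are illustrative, not real master..syslogshold output.
    """
    # The oldest (lowest) hold wins; with no holds the whole log can go.
    return min([log_end_page, *holds.values()])


holds = {
    "long_running_user_txn": 5200,
    "$replication_truncation_point": 1800,
}
# Here the repagent's point is the older hold, so it is the limit.
print(truncatable_up_to(9000, holds))

# Once the repagent's point advances, the user txn becomes the limit.
holds["$replication_truncation_point"] = 7000
print(truncatable_up_to(9000, holds))
```

With multiple repagents (multipath replication) you'd simply have more entries in the dict, and the oldest one would still decide.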

 

Like the PDB log, the RS queues have the concept of a truncation point which is based on the oldest open transaction.  So while transactions are normally applied to the DSI in commit order, the truncation of the RS queues can be delayed while waiting for a long-running/open transaction to complete in the PDB (and of course the sending of said transaction's commit/rollback marker from the PDB repagent).
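
A rough Python sketch of that queue behaviour, under heavy simplifying assumptions: the queue is modelled as a flat list of (txn_id, kind) entries, the function name is hypothetical, and real RS queues have details this ignores (eg, whether the DSI has actually applied the txn).  It only shows why one open transaction near the front pins everything behind it.

```python
def queue_truncation_index(entries):
    """Return the index of the first queue entry that must be kept.

    entries is a list of (txn_id, kind) tuples, where kind is one of
    "begin", "cmd", "commit" or "rollback".  Toy model only.
    """
    # A transaction is closed once its commit/rollback marker has arrived.
    closed = {txn for txn, kind in entries if kind in ("commit", "rollback")}
    for i, (txn, _kind) in enumerate(entries):
        if txn not in closed:
            return i  # oldest entry of the oldest open transaction
    return len(entries)


queue = [
    ("T1", "begin"),
    ("T2", "begin"),
    ("T2", "cmd"),
    ("T2", "commit"),
    ("T1", "cmd"),
]
# T2 has committed, but T1 is still open and sits at the front.
print(queue_truncation_index(queue))

# Once T1's commit marker arrives from the repagent, nothing is pinned.
queue.append(("T1", "commit"))
print(queue_truncation_index(queue))
```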

 

The RS queues can also back up for reasons having to do with the DSI and/or RDB (eg, DSI suspended, RDS and/or RDB down, DSI connection is blocked in the RDB, RDB log is full, RDB processing is experiencing performance degradations).  If the RDB was loaded from a PDB dump and the DBA forgot to clear the repagent truncation point (it's brought along with the dump) then the RDB log can fill up due to a static/non-moving repagent truncation point.  If the RDS is 15.x and is not configured to use statement cache (w/ literal autoparam=1) and the RRS is not configured to use dynamic SQL, then DSI/RDB query processing could be lagging due to the excessive overhead of compiling the thousands/millions of DML statements coming from the RRS.

 

So in regard to your questions ...

 

You cannot truncate the PDB log beyond any truncation points you see in master..syslogshold ... whether it be a long-running user transaction or a repagent's replication truncation point.

 

The PDB log can fill up if the repagent's truncation point keeps the log from being truncated.

 

The PDB's repagent truncation point can fail to move if a) the repagent is down, b) the repserver is down (or the repserver's repagent thread is down/suspended), c) the repserver queues are full and thus have no room for accepting new log records from the repagent.

 

The RS queues could be full due to a) waiting for a long-running user txn to commit in the PDB, b) RS queues sized too small, c) down DSI thread, d) blocked DSI/RDB thread, e) slow/poorly-performing DSI/RDB connection, f) some other process filling the queue (eg, bulk materialization of one or more largish tables, etc).

 

If you're using a route (ie, PRS + one or more RRSs) then any route outage would cause the PRS queues to back up, too.

 

The DSI/RDB connection could be slow to drain the RS queues due to any of a number of performance reasons (eg, missing index on RDB table causing table scans for DELETE/UPDATE, triggers enabled and poorly performing triggers, blocking, lack of statement cache for RDS=ASE 15.x, etc).

 

While I rarely see it used, it's possible to configure route and DSI queues with a save interval whereby already-played txns are kept around for some period of time in case they need to be re-applied to the downstream target (ie, RRS or DSI/RDB).  So if you have a (relatively) long save interval and the queues are too small ... then you can obviously fill up the RS queues ... which in turn can back up into the PDB log.

 

So, assuming no outages along the way (ie, repagent up, RS and threads up, DSI up, RDS/RDB up), good RS performance thus processing queues in a timely manner, RS queues are sized adequately, no large bulk materializations going on, and good performance for the DSI/RDB connection, I can think of 2 worst case scenarios:

 

1 - a full RDB log causes the RRS queues to fill up, which in turn causes the PRS queues to fill up, which in turn can cause the PDB log to fill up

 

2 - a long-running user transaction in the PDB keeps the PRS from truncating its queues, the PRS queues fill up thus leaving no room for new txns to be sent by the PDB repagent, which in turn means the repagent cannot a) process/drain the PDB log or b) move its truncation point in the PDB log, which means the PDB log fills up
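
If it helps to see scenario 1 play out, here's a small Python simulation of the backpressure chain.  Everything in it is an assumption for illustration only (the function name, the capacities, the "units" of data, the one-hop-per-tick forwarding); it just shows a stalled tail end eventually filling the PDB log at the head.

```python
def simulate(capacities, ticks, inflow=10, tail_drains=False):
    """Toy backpressure model for scenario 1.

    capacities: ordered sizes for [PDB log, PRS queue, RRS queue, RDB log].
    Each tick the primary writes `inflow` units into the PDB log, and every
    stage forwards as much as the next stage has room for.
    Returns the tick at which the PDB log is full, or None if it never fills.
    """
    used = [0] * len(capacities)
    for tick in range(1, ticks + 1):
        # The last stage (RDB log) only empties if something truncates it.
        if tail_drains:
            used[-1] = 0
        # Move data downstream, tail first, so space freed in a stage this
        # tick can be used by the stage upstream of it.
        for i in range(len(used) - 2, -1, -1):
            room = capacities[i + 1] - used[i + 1]
            moved = min(used[i], room)
            used[i] -= moved
            used[i + 1] += moved
        # New activity in the PDB.
        used[0] = min(capacities[0], used[0] + inflow)
        if used[0] >= capacities[0]:
            return tick
    return None


sizes = [100, 50, 50, 30]  # PDB log, PRS queue, RRS queue, RDB log
print("RDB log healthy:", simulate(sizes, ticks=1000, tail_drains=True))
print("RDB log stuck:  ", simulate(sizes, ticks=1000, tail_drains=False))
```

Scenario 2 would look the same from the PDB log's point of view, except the stall sits in the PRS queue (it can't truncate) rather than at the RDB end.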

 

Clear as mud? ;-)

