Friday, 19 April 2013

Setting up Enqueue Replication Server Failover

Summary
This document describes the steps required to install, configure, and test an ERS instance failover.
Applies to NetWeaver Web AS Java 2004s onwards (UNIX systems)



Introduction


The standalone enqueue server (SAP Central Services - SCS) is used in NetWeaver Web AS Java to provide a locking service based on the enqueue function. The enqueue clients (SAP application servers) and the enqueue server communicate directly, that is, the work process has a TCP connection to the enqueue server. They no longer communicate via the dispatchers and the message server.
The enqueue server keeps critical data (that is, all locks currently in use by users in the system) in the lock table in the main memory. If the host fails, this data is lost and cannot be restored even when the enqueue server is restarted. All transactions that have held locks must therefore be reset.
For this reason, the enqueue replication server (ERS) is started on another machine which together with the standalone enqueue server (SCS) provides a high availability solution.
This document contains all the steps required to prepare, configure, test, and troubleshoot an ERS instance.




1. Preparation



Profile Parameters for the Standalone Enqueue Server (SCS)


Activate replication by setting the parameter enque/server/replication = true in the instance profile of the standalone enqueue server (<SID>_SCS<Instance_no>_<hostname>).
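For example, assuming a hypothetical SID C11, SCS instance number 04 and host scshost (all placeholder values), the relevant excerpt of the SCS instance profile C11_SCS04_scshost would look like this:

SAPSYSTEMNAME = C11
SAPSYSTEM = 04
INSTANCE_NAME = SCS04
enque/server/replication = true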

Profile Parameters for the Enqueue Clients


Set the parameter enque/deque_wait_answer = TRUE for the enqueue clients (application server instances) in the default profile.
The parameter enque/deque_wait_answer determines whether dequeue (removal of locks) is done synchronously or asynchronously. The parameter can have the following values:
TRUE:  Waits for a response from the enqueue server (synchronous)
FALSE: Does not wait for a response (asynchronous)
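For example, the following line would be added to the default profile (typically /sapmnt/<SID>/profile/DEFAULT.PFL) so that it applies to all application server instances:

enque/deque_wait_answer = TRUE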
  
 

2. Setting Up the Replication Server


As the administrator user <sid>adm, perform the following steps on both physical servers.

Identify Executable Directory

  • If DIR_CT_RUN is set in the SCS instance profile, the executable directory is the one determined by DIR_CT_RUN. DIR_CT_RUN in the ERS instance profile to be created should then have the same value as in the SCS instance profile.
  • If DIR_CT_RUN is not set in the SCS instance profile, the executable directory is the one determined by DIR_EXECUTABLE. DIR_CT_RUN in the ERS instance profile to be created should then be set to the DIR_EXECUTABLE of the SCS instance.
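To see whether DIR_CT_RUN is set in the SCS instance profile, you can simply grep the profile; the path below follows the naming convention used in this document and must be adjusted to your system:

<sid>adm$ grep DIR_CT_RUN /sapmnt/<SID>/profile/<SID>_SCS<Instance_no>_<hostname>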

Create directory structure


Create the following directory structure on the enqueue replication server (ERS):

/usr/sap/<SID>/ERS<inst.no>
                  +--- exe
                  |     +--- servicehttp
                  |               +----- sapmc
                  +--- log
                  +--- data
                  +--- work
Where:
  • ERS: prefix for the enqueue replication server
  • <SID>: System ID of the SAP system to which the new instance belongs
  • <inst.no>: Instance number of the new instance (here 11)
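The structure can be created, for example, with the following commands (instance number 11 is assumed, as in the rest of this document):

<sid>adm$ mkdir -p /usr/sap/<SID>/ERS11/exe/servicehttp/sapmc
<sid>adm$ mkdir -p /usr/sap/<SID>/ERS11/log
<sid>adm$ mkdir -p /usr/sap/<SID>/ERS11/data
<sid>adm$ mkdir -p /usr/sap/<SID>/ERS11/work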

Copy executable files

  1. Copy the following files from the executable directory /sapmnt/<SID>/exe into directory /usr/sap/<SID>/ERS11/exe:
  • enqt
  • enrepserver
  • ensmon
  • libicudata.so.30
  • libicui18n.so.30
  • libicuuc.so.30
  • libsapu16_mt.so
  • librfcum.so
  • sapcpe
  • sapstart
  • sapstartsrv
  • sapcontrol

The files can have different extensions on different UNIX platforms. Depending on the platform and on whether Unicode is used, there may be fewer files. A combined shell example for all three copy steps is shown after the ers.lst listing below.

  2. Copy the following files from the executable directory /sapmnt/<SID>/exe/servicehttp/sapmc into directory /usr/sap/<SID>/ERS11/exe/servicehttp/sapmc:
  • sapmc.jar
  • sapmc.html
  • frog.jar
  • soapclient.jar

  3. Create a sapcpe list file with the name ers.lst in directory /usr/sap/<SID>/ERS11/exe. This file has the following content:

enrepserver
ensmon
enqt
libsapu16_mt.so
libsapu16.so
libicuuc30.a
libicui18n30.a
libicudata30.a
librfcum.so
sapcpe
sapstartsrv
sapstart
sapcontrol
servicehttp
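The three copy steps above could be scripted roughly as follows; the file names and extensions are taken from the lists above and may differ on your platform, so treat this as a sketch to adapt rather than an exact procedure:

<sid>adm$ cd /sapmnt/<SID>/exe
<sid>adm$ cp enqt enrepserver ensmon libicudata.so.30 libicui18n.so.30 libicuuc.so.30 \
             libsapu16_mt.so librfcum.so sapcpe sapstart sapstartsrv sapcontrol \
             /usr/sap/<SID>/ERS11/exe/
<sid>adm$ cp servicehttp/sapmc/sapmc.jar servicehttp/sapmc/sapmc.html \
             servicehttp/sapmc/frog.jar servicehttp/sapmc/soapclient.jar \
             /usr/sap/<SID>/ERS11/exe/servicehttp/sapmc/
<sid>adm$ vi /usr/sap/<SID>/ERS11/exe/ers.lst     # enter the file list shown above

At runtime the ers.lst file is evaluated by sapcpe via the Execute_00 line of the start profile created in the next step, so no manual sapcpe run is required.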




Create ERS Start & Instance profiles




  1. Create a new start profile (not a symbolic link to a common file – it does not work) for the ERS instance in the profile directory.

The SCS instance in this example has the following parameters:
SAPSYSTEMNAME = <SID>
SAPSYSTEM = <Instance_no>
INSTANCE_NAME = SCS<Instance_no>
SAPLOCALHOST = <SCS_Host_name>

The corresponding replication instance in this example uses:
ERS SAPSID   = <SID>
ERS INST.NO. = 11
ERS HOST     = <ERS_host_name>



The corresponding start profile START_ERS11_<Host_name> will look like:

SAPSYSTEM = <Instance_no>
SAPSYSTEMNAME = <SID>
INSTANCE_NAME = ERS11

#--------------------------------------------------------------------
# Special settings for this manually set up instance
#--------------------------------------------------------------------
SCSID = <Instance_no>
DIR_EXECUTABLE = $(DIR_INSTANCE)/exe
DIR_CT_RUN = /usr/sap/<SID>/SYS/exe/run
SETENV_00 = PATH=$(DIR_INSTANCE)/exe:%(PATH)
SETENV_01 = LD_LIBRARY_PATH=$(DIR_EXECUTABLE)
_PF = $(DIR_PROFILE)/<SID>_ERS11_<Host_name>

#-----------------------------------------------------------------------
# Copy SAP Executables
#-----------------------------------------------------------------------
_CPARG0 = list:$(DIR_EXECUTABLE)/ers.lst
Execute_00 = immediate $(DIR_EXECUTABLE)/sapcpe$(FT_EXE) $(_CPARG0) pf=$(_PF)

#--------------------------------------------------------------------
# start enqueue replication server
#--------------------------------------------------------------------
_ER = er.sap$(SAPSYSTEMNAME)_$(INSTANCE_NAME)
Execute_01 = immediate rm -f $(_ER)
Execute_02 = local ln -s -f $(DIR_EXECUTABLE)/enrepserver $(_ER)
Restart_Program_00 = local $(_ER) pf=$(_PF) NR=$(SCSID)


If you are using a local profile directory, insert the following parameter into the start profile of the ERS instance: DIR_PROFILE = $(DIR_INSTANCE)/profile



2. Create an instance profile (not a symbolic link to a common file – it does not work) in the profile directory.
Parameters for the SCS instance (example):
SCS SAPSID   = <SID>
SCS INST.NO. = 04
SCS HOST     = <scshostname>

Parameters for the replication instance (example):
ERS SAPSID   = <SID>
ERS INST.NO. = 11
ERS HOST     = <ershostname>

Associated instance profile <SID>_ERS11_ershostname:
SAPSYSTEM = 11
SAPSYSTEMNAME = <SID>
INSTANCE_NAME = ERS11

#--------------------------------------------------------------------
# Special settings for this manually set up instance
#--------------------------------------------------------------------
DIR_EXECUTABLE = $(DIR_INSTANCE)/exe
DIR_CT_RUN = /usr/sap/<SID>/SYS/exe/run

#--------------------------------------------------------------------
# Settings for enqueue monitoring tools (enqt, ensmon)
#--------------------------------------------------------------------
enque/process_location = REMOTESA
rdisp/enqname = $(rdisp/myname)

#--------------------------------------------------------------------
# standalone enqueue details from (A)SCS instance
#--------------------------------------------------------------------
SCSID = 04
SCSHOST = <scshostname>
enque/serverinst = $(SCSID)
enque/serverhost = $(SCSHOST)


If you are using a local profile directory, insert the following parameter into the instance profile of the ERS instance: DIR_PROFILE = $(DIR_INSTANCE)/profile


Configure the control mechanism for the replication server


The following options are available:

  1. Self-control with HA Polling
Here the replication server uses the HA software to periodically request information about the physical host on which the SCS instance is running. Depending on this information, the ERS instance is activated or deactivated. To do this, a tool (a script or a library from the HA hardware partner) is needed.

If you use an enqtest.sh script in directory DIR_EXECUTABLE, you must insert the following lines into the instance profile (an illustrative sketch of such a script follows this list):
#--------------------------------------------------------------------
# HA polling
#--------------------------------------------------------------------
enque/enrep/hafunc_implementation = script
enque/enrep/poll_interval = 10000
enque/enrep/hafunc_init =
enque/enrep/hafunc_check = $(DIR_EXECUTABLE)/enqtest.sh


  2. HA Software Control
With this solution the HA software starts the ERS instance whenever required. Monitoring of the ERS instance and, if necessary, its subsequent restart are also handled by the HA software.
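For option 1, a minimal sketch of such a check script is shown below. It only illustrates the idea of detecting whether the standalone enqueue server is currently running on the local host; the script name enqtest.sh, the process-name pattern and the exit codes are assumptions, and the exact contract expected by enque/enrep/hafunc_check must be taken from the SAP and HA partner documentation:

#!/bin/sh
# enqtest.sh - illustrative HA polling check (assumptions only, adapt to your cluster).
# The SCS start profile usually links the enqueue server binary to en.sap<SID>_SCS<no>,
# analogous to the er.sap... link in the ERS start profile above; we simply look for it.
if ps -ef | grep "[e]n.sap<SID>_SCS" > /dev/null 2>&1; then
    # enqueue server found on the local host
    exit 0
else
    # enqueue server not running locally
    exit 1
fi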


Configure the persistency of the replication table in the file system


Saving the replication table in the file system has the advantage that it can be distributed across several hosts by means of the cluster software (shared file system), and that after a failover the enqueue server does not necessarily have to be restarted on the same host on which the active replication was running beforehand.

To also save the replication table in a shadow file in the file system, insert the following lines in the instance profile:

#--------------------------------------------------------------------
# replica table file persistency
#--------------------------------------------------------------------
enque/repl/shadow_file_name = /usr/sap/<SID>/ERS11/data/SHM_FILESYS_BACKUP


Note: Storing the replication table in the file system can lead to a severe drop in enqueue server performance. You should therefore always check beforehand whether performance will be sufficient with this option.

Parameter for preventing automatic ERS restart


It was noticed that, on SCS failover, the SCS process would read the replication table and terminate the ERS process, as expected. However, within 15 seconds the ERS process would restart on its own. An analysis of the process IDs showed that the ERS process was being restarted by the sapstart process.

If a program managed by sapstart is restarted within 10 minutes, an internal counter is incremented. By default, sapstart no longer starts the program once this counter exceeds 5. This value can be changed using the parameter Max_Restart_Program=xx, where xx is the maximum number of restarts.

To prevent this automatic restart of the ERS, the parameter Max_Restart_Program=00 was added to the ERS start profile.
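In the ERS start profile shown earlier, the addition might look like this; placing it next to the Restart_Program_00 line is only a readability choice, the parameter itself is what matters:

#--------------------------------------------------------------------
# prevent sapstart from automatically restarting the ERS process
#--------------------------------------------------------------------
Max_Restart_Program = 00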


Start the ERS instance



Start the ERS instance with the following command:
<sid>adm$ startsap ERS11

Check the startup log with the command
<sid>adm$ cat /home/<sid>adm/startsap_ERS11.log

Check the ERS processes at the OS level
<sid>adm$ ps -ef | grep -i ERS11
This should list the running ERS processes.
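Alternatively, since sapcontrol was copied into the ERS executable directory, the process list can also be queried through the SAP start service; the path below follows the directory structure created earlier and is an example to adapt:

<sid>adm$ /usr/sap/<SID>/ERS11/exe/sapcontrol -nr 11 -function GetProcessList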

Setup automatic start of ERS instance


To start the ERS instance automatically when the system is rebooted, insert the following lines into the file /usr/sap/sapservices, depending on the UNIX shell used by the root user:
setenv LIBPATH /usr/sap/<SID>/ERS11/exe:$LIBPATH;              (CSH)
LIBPATH=/usr/sap/<SID>/ERS11/exe:$LIBPATH; export LIBPATH;     (BSH)
/usr/sap/<SID>/ERS11/exe/sapstartsrv pf=/usr/sap/<SID>/SYS/profile/START_ERS11_<ershostname> -D -u <SID>adm

This only works if:

  • NetWeaver AS version 7.00 has already been installed on this host
  • (Under UNIX) you have already performed the steps described in SAP Note 823941 (Configuring SAP Start Services as UNIX Daemons)

Repeat these steps for all the physical hosts in the HA failover cluster. If you want to make another SCS instance more fail-safe, you have to set up a separate set of ERS instances.




3. Checking the Installation (Replication Server)


Once the replication server has been set up, check that it functions properly so that it will take over correctly if the enqueue server fails. The following tests are performed on the host where the replication server is running.

Check the status of connection between the enqueue server and the replication server

Prerequisite: the SCS instance of the SAP system has been started. Start program ensmon on the host on which the replication server is installed. To query the replication server, enter the following command:
ensmon pf=/usr/sap/<SID>/SYS/profile/<SID>_ERS11_<hostname>

If ERS replication is running, this command returns the following information:

  • General information (configuration info)
  • Runtime statistics for the replication thread
  • Information from the replication server side
  • Replication table statistics
  • Request statistics
  • Network parameters and statistics

If the connection is OK, the output would look like:
Try to connect to host <Virtual (A)SCS host> service sapdp01 get replinfo request executed successfully
Replication is enabled in server, repl. server is connected
Replication is active
...


If the connection is not OK, the output would look like:
Try to connect to host <Virtual (A)SCS host> service sapdp01 get replinfo request executed successfully
Replication is enabled in server, but no repl. server is connected
...


If the connection is not OK, first check whether the replication server has been started at all (using the operating system or the cluster software).

If the replication server has been started, check files dev_enqrepl on the enqueue server or dev_enrepsrv on the replication server (in the work directory of the SCS or ERS instance). Use the error messages and profile files here to narrow down the cause of the problem.

Monitoring the Lock Table at Failover


Use program enqt to check the fill level of the lock table and the failover ID. Prerequisite: the SCS instance of the SAP system has been started.

Start program enqt on the server on which the replication server is installed, and use only the enqt options described here; otherwise you could damage the content of the lock table.

Monitoring the Fill Level of the Lock Table at Failover

...
  1. Enter the command below to fill the lock table of the enqueue server with 20 locks:
enqt pf=/usr/sap/<SID>/SYS/profile/<SID>_ERS11_<hostname> 11 20


  2. Monitor the fill level of the lock table by executing the command:
enqt pf=/usr/sap/<SID>/SYS/profile/<SID>_ERS11_<hostname> 20 1 1 9999

This command permanently reads the content of the lock table and shows the number of lock entries on the console.


Monitoring Lock Table ID at Failover

...
  1. Enter the following command on the ERS host to output the lock table ID before the failover:
enqt pf=/usr/sap/<SID>/SYS/profile/<SID>_ERS11_<hostname> 97


  2. Trigger a failover of the SCS instance.

  3. Enter the command again after the failover:

enqt pf=/usr/sap/<SID>/SYS/profile/<SID>_ERS11_<hostname> 97

The output row containing EnqTabCreaTime/RandomNumber should be exactly the same before and after the failover; if the replication table was not taken over, it will be different.


  4. Make sure that the lock table ID is the same before and after the failover. If it is not, the replica has not been copied.



4. Commands to check status of ERS/SCS instances


To monitor the status of ERS/SCS instances, the commands described below may be used (for example in HA scripts):

ERS Commands


To query the replication server, enter the following command:
  ensmon pf=/usr/sap/<SID>/SYS/profile/<SID>_ERS11_<hostname> 2
If the replication server is active and connected to the SCS, the following messages are displayed in the first few lines of the output:

  Replication is enabled in server, repl. server is connected
  Replication is active

The status can also be checked with the command "startsap check"; however, this only checks that the process exists at the OS level, not that it is functioning correctly.
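A minimal HA-script sketch based on this ensmon command might look like the following; it simply greps for the "Replication is active" line shown above, and the profile path is a placeholder to adapt:

#!/bin/sh
# Illustrative check: exit 0 if ensmon reports that replication is active.
PROFILE=/usr/sap/<SID>/SYS/profile/<SID>_ERS11_<hostname>
if ensmon pf=$PROFILE 2 | grep "Replication is active" > /dev/null 2>&1; then
    echo "ERS replication is active"
    exit 0
else
    echo "ERS replication is NOT active"
    exit 1
fi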

SCS Commands


Send a dummy request to the enqueue server to check whether it is alive:

  ensmon -H <scshostname> pf=/sapmnt/<SID>/profile/<SID>_SCS04_<scshostname> 1
The last line of the output contains the message "Dummy request executed successfully with rc=0".

The status can also be checked with the command "startsap check"; however, this only checks that the process exists at the OS level, not that it is functioning correctly.
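Correspondingly, here is a sketch of an aliveness check for the SCS instance based on the dummy request above; the host name and profile path are placeholders to adapt:

#!/bin/sh
# Illustrative check: exit 0 if the standalone enqueue server answers the dummy request.
if ensmon -H <scshostname> pf=/sapmnt/<SID>/profile/<SID>_SCS04_<scshostname> 1 \
     | grep "executed successfully" > /dev/null 2>&1; then
    exit 0
else
    exit 1
fi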







5. Log Files


Enqueue Server


You can find the following information in the enqueue server files:

  • dev_enqsrv: This file is written only when the enqueue server is started up. All problems occurring when the enqueue server is started (for example, when the replica is read or the lock table created) are analyzed with this file.
  • dev_enqio_*: The threads that handle communication with enqueue clients write to this file (there may be more than one file - they are numbered sequentially).
  • dev_enqwork: Problems arising from the actual enqueue processing are written to this file.
  • dev_enqrepl: Communication with the replication server is set up from this thread; replication problem messages can be found here.
  • dev_enqsig: This file does not exist on Windows. The processing of asynchronous signals from the operating system is recorded here. Shutdowns triggered from an external source can be found in this file.

There may be further dev* files, which are not usually important. You should nevertheless include these files when you open a problem message with support.


Replication Server


You can find the following information in the replication server files:

  • dev_enrepsrv: Almost all components of the replication server that do not write to one of the files specified below write to this file.
  • dev_enrepsig: As with the enqueue server, this file does not exist on Windows. Asynchronous signals from the operating system are processed here too.
  • dev_enrepha: All HA polling events (more information: Polling Concept) are written to this file.

  
