Procedure
1) Check if the endpoints are reachable (validate service)
2) Check if the container is running
If there are problems, the best option is to move the service to another node:
1) Warn users about the problem
2) Check that a spare machine is ready for production (validate the endpoints on the other machine)
3) Perform the DNS switch
4) Notify users that the service is back
Validate service
http://cern.ch/phydb/documents/old/Validate_service_.php
DNS Switch
http://phydb.web.cern.ch/phydb/documents/old/DNS_switch.php
Using command line tools
These operations can be performed while logged in to the machine as the oracle user.
- Start the container of a VO
dcmctl start -co VO
- Stop the container of a VO
dcmctl stop -co VO
- Start the HTTP server
dcmctl start -ct ohs
- List Applications of a VO
dcmctl listapplications -co VO
- Get status of the containers
dcmctl getstate -d -v
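- Start the Enterprise Manager (it needs the X virtual frame buffer)
nohup Xvfb :1 -pn &
export DISPLAY=localhost:1
emctl start
- Stop the services with the init script
sudo /etc/init.d/oracle_ias stop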
No services are running if the output of the following command
contains no opmn, java or httpd entries:
ps -aux
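The check above can be sketched as a small filter over the ps output. This is only an illustration: the function names ias_processes and ias_is_stopped are hypothetical, and matching on the bare strings opmn, java and httpd is an assumption that may also catch unrelated processes on a shared machine.

```shell
#!/bin/sh
# Hypothetical helper: reads `ps` output on stdin and prints the lines that
# look like Application Server processes (opmn, java, httpd).
ias_processes() {
    grep -E '(opmn|java|httpd)' | grep -v grep
}

# Hypothetical helper: succeeds (exit 0) when the ps output on stdin
# contains no such process.
ias_is_stopped() {
    [ -z "$(ias_processes)" ]
}
```

On the server this could be used as `ps aux | ias_is_stopped && echo "no services running"`.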
To stop everything, first stop EM:
emctl stop
and then:
dcmctl stop
opmnctl stopall
If that is not enough, kill the remaining processes by name:
sudo pkill -9 opmn
sudo pkill -9 java
sudo pkill -9 http
- Start the service manually
  - Log in as oracle
  - Check that all the processes linked to the Application Server are not running
  - Start the X virtual frame buffer
    - $ nohup Xvfb :1 -pn &
    - $ export DISPLAY=localhost:1
  - Start the Enterprise Manager
  - Go to http://<machine_name>:1810
  - Start the HTTP server component
  - Start the component with the name of the VO
- Stop the service manually
- Changing the IP address of a running 9iAS instance
RLS Problem Diagnosis
The following logs are available for diagnosing problems:
- Logs for the applications are put into the
/ORA/dbs00/oracle/log/ias/j2ee/$VO/
directory. Here $VO is the same as the container name
used in 9iAS, e.g. cms, atlas, lhcb.
Inside each of those directories there are two logs per application per
day (the applications are edg-replica-location-index, edg-replica-metadata-catalog
and edg-local-replica-catalog):
  - application-name_log: the debug log. It is not very verbose unless debug logging is enabled. See the intervention on changing the logging level for more details.
  - application-name_calls: the call log, which records each call that was executed, with its time and the caller host.
  - The logs of previous days have the date appended to the file name, in the form YYYY-MM-DD. The log files with no date are the logs of the current day.
- OC4J server logs are placed in a directory below
/ORA/dbs01/oracle/product/ias9.0.3.0/j2ee/home/log/default_island_1/
The logs are:
  - default-web-access.log: shows all accesses to the application server, along with response codes and the number of bytes returned.
  - server.log: contains information on the history of interventions on the container.
- The logs for opmn are placed in the directory
/ORA/dbs01/oracle/product/ias9.0.3.0/opmn/logs/
The relevant logs are in the files:
  - ons.log
  - ipm.log
  - $VO.default_island.1: contains the standard output from the application server. If logging is not configured correctly, log messages may end up here!
Dataguard for RLS
http://cern.ch/phydb/documents/old/Data_Guard.html
Change logging level of Oracle9iAS
The logging properties can be changed at run time without shutting down
or restarting the service. The log properties are reloaded automatically when
a change in the configuration is detected.
Logging properties file locations
The applications have a log4j properties file residing at
/ORA/dbs00/oracle/log/ias/j2ee/$VO/<application-shorthand>-log4j-server.properties
where application-shorthand can be rmc, lrc, rli.
The logs are written to the same directory.
There are two log files per application,
<application>_log.DATE
<application>_calls.DATE
where application can be one of edg-local-replica-catalog, edg-replica-metadata-catalog and edg-replica-location-index.
The DATE is appended for previous days; the current log has no DATE extension.
The call log simply lists the time and date of each call that
has been made. The _log file holds all the detailed information
if DEBUG logging is enabled.
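As an illustration of the naming scheme, the following sketch builds the path of a log file for a given VO, application and day. The function name rls_log_file is hypothetical, and it assumes that the date is joined to the file name with a dot, as the `<application>_log.DATE` pattern suggests, in the YYYY-MM-DD form.

```shell
#!/bin/sh
# Hypothetical helper: print the path of an RLS application log.
#   $1 = VO / container name (e.g. cms)
#   $2 = application name (e.g. edg-local-replica-catalog)
#   $3 = log kind: "log" or "calls"
#   $4 = optional day as YYYY-MM-DD; omit it for the current day's log
rls_log_file() {
    vo=$1; app=$2; kind=$3; day=${4:-}
    file="/ORA/dbs00/oracle/log/ias/j2ee/${vo}/${app}_${kind}"
    # Assumption: previous days carry a ".DATE" suffix.
    if [ -n "$day" ]; then
        file="${file}.${day}"
    fi
    printf '%s\n' "$file"
}
```

For example, `tail -f "$(rls_log_file cms edg-local-replica-catalog calls)"` would follow today's call log of the CMS local replica catalog.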
Changing the log level
The level of detail can be tuned by editing the log4j properties file and
setting the log levels for the different domains. The relevant domain for the
application is set using the
log4j.logger.org.edg.data
property. No restart is necessary: the log4j properties file is reloaded automatically
when a change is detected. By default, the log level is set to INFO.
Set the log level to DEBUG for more verbose logging in the
application log (_log).
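The edit itself can be scripted. The sketch below is not part of the original procedure: the function name set_rls_log_level is hypothetical, and it assumes the property is written on one line as `log4j.logger.org.edg.data=LEVEL`, optionally followed by appender names, which are left untouched.

```shell
#!/bin/sh
# Hypothetical helper: set the level of log4j.logger.org.edg.data.
#   $1 = path to the <application-shorthand>-log4j-server.properties file
#   $2 = new level (e.g. DEBUG or INFO)
set_rls_log_level() {
    props=$1; level=$2
    tmp="${props}.tmp.$$"
    # Replace only the level word that follows "=", keeping any appenders.
    sed "s/^\(log4j\.logger\.org\.edg\.data[[:space:]]*=[[:space:]]*\)[A-Za-z][A-Za-z]*/\1${level}/" \
        "$props" > "$tmp" || return 1
    # Write back into the original file so its owner and permissions are kept.
    cat "$tmp" > "$props" && rm -f "$tmp"
}
```

For example, `set_rls_log_level /ORA/dbs00/oracle/log/ias/j2ee/cms/lrc-log4j-server.properties DEBUG` would switch the CMS local replica catalog to DEBUG; calling it again with INFO restores the default.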
Reconfigure datasource using a script
This document describes how to use a script to change the database backend of a deployed application.
Index
- Getting the script files.
- Using the script
1. Obtaining and preparing the script files
The scripts are in CERN's CVS system, in the lcg-orat1 project under the Applications/RLS/as-deployment/ directory.
- Log on to the target machine as the oracle user
- Copy the following two scripts to a working directory on the target machine:
  - change-data-sources
  - Dcmctl.pm
- Set the scripts' execution mode to 755
2. Execute the script
The datasource configuration for a given VO can be checked manually by looking at the datasources configuration file: $ORACLE_HOME/j2ee/test/config/data-sources.xml
The script changes the datasources by modifying the data-sources.xml configuration file. The database host and database SID are replaced with the new ones for the given VO. The script then restarts the container.
The script usage is as follows:
usage: ./change-data-source --sid=SID --db_host=DBHOST --vo=VO [-v]
Options:
  --vo       name of the virtual organization (atlas, cms, ...)
  --db_host  hostname of the database server of LRC and RMC
  --sid      SID of the database of LRC and RMC
  -v         verbose mode
Example:
./change-data-source --sid=certrls4 --db_host=lxshare333d --vo=atlas
Checking the update
You can check that the update was successful by looking at the datasource now configured in the data-sources.xml file.
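This check can be sketched as a simple search of the configuration file. The function name datasource_points_to is hypothetical, and the sketch assumes that the database host and SID appear on the same line of the file (for instance inside one JDBC URL); if the file is laid out differently, the check needs adapting.

```shell
#!/bin/sh
# Hypothetical helper: succeed if some line of the datasource file mentions
# both the expected database host and the expected SID.
#   $1 = path to data-sources.xml
#   $2 = expected database host
#   $3 = expected SID
datasource_points_to() {
    file=$1; host=$2; sid=$3
    grep -F -- "$host" "$file" | grep -qF -- "$sid"
}
```

After the example run above, `datasource_points_to "$ORACLE_HOME/j2ee/test/config/data-sources.xml" lxshare333d certrls4 && echo updated` would confirm the change.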
Deploy old end points of CMS
For a limited period, CMS also needs the old endpoints (those used before the latest convention was established) to be deployed, so that clients with hard-coded endpoints can still reach the RLS services.
The fastest way is to deploy them from the command line AFTER the current RLS service endpoints have been deployed in the normal way.
1. Get the files
2. Deploy the applications
3. Change the URL binding
4. Redeploy the applications
5. (Re)Start the container
1. Get the files
The necessary files are the .ear files for the corresponding version, which can be found in:
/afs/cern.ch/project/grid/wp2/oracle-deployment/edg-local-replica-catalog//
/afs/cern.ch/project/grid/wp2/oracle-deployment/edg-replica-metadata-catalog//
For version 2.0.2 you can just run these commands:
scp @lxplus:/afs/cern.ch/project/grid/wp2/oracle-deployment/edg-local-replica-catalog/2.0.2/*.war .
scp @lxplus:/afs/cern.ch/project/grid/wp2/oracle-deployment/edg-replica-metadata-catalog/2.0.2/*.war .
2. Deploy the applications
To deploy the applications run the following commands:
dcmctl deployApplication -file ./edg-local-replica-catalog-2.0.2.war -a edg-local-replica-catalog -co cms -rc /edg-replica-location -d -v
dcmctl deployApplication -file ./edg-replica-metadata-catalog-2.0.2.war -a edg-replica-metadata-catalog -co cms -rc /edg-replica-metadata-catalog -d -v
3. (Re)Start the container
To start the container:
dcmctl start -co cms
To restart the container:
dcmctl restart -co cms
To start the HTTP server:
dcmctl start -ct ohs
4. Validation
To validate that everything went well, go to the URLs and ping the services:
http://.cern.ch:7777/edg-replica-location/
http://.cern.ch:7777/edg-replica-metadata-catalog/
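The validation can also be done from the command line. The sketch below is an assumption-laden illustration: the function names are hypothetical, the full host name is passed in by the caller because it is not given here, and a plain HTTP GET with curl stands in for "pinging" the service, which only shows that the URL answers.

```shell
#!/bin/sh
# Hypothetical helper: print the two old CMS endpoint URLs for a host.
#   $1 = full host name of the application server (supplied by the caller)
old_cms_endpoints() {
    host=$1
    for path in edg-replica-location edg-replica-metadata-catalog; do
        printf 'http://%s:7777/%s/\n' "$host" "$path"
    done
}

# Hypothetical helper: report for each endpoint whether it answers over HTTP.
check_old_cms_endpoints() {
    old_cms_endpoints "$1" | while read -r url; do
        if curl --silent --fail --max-time 10 --output /dev/null "$url"; then
            echo "OK   $url"
        else
            echo "FAIL $url"
        fi
    done
}
```

Running `check_old_cms_endpoints` with the host name of the machine prints one OK or FAIL line per endpoint.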
Retrieve backup data from TSM for RLS
This page describes the steps needed to retrieve the files required for a restore and recovery of the RLS1 database from TSM tape storage. At the moment, archived files must first be retrieved from TSM tape storage.
These steps should be executed as 'orapdm' on the PDB backup server, pdb-backup1, which currently is an alias to lxshare077d.
Create temporary directories for the retrieval:
Create a temporary directory to stage in files from TSM to pdb-backup1, if one does not already exist:
> sudo mkdir /data/pdb-backup1/restore/rls1
> sudo mkdir /data/pdb-backup1/restore/rls1/archivelogs
Copy the archive logs to the retrieval directory:
> cd /data/pdb-backup1/rls1/archivelogs
> sudo -s
> tar cvf - * | (cd /backup/restore/rls1/archivelogs/; tar xvf -)
> exit
Retrieve the data:
Start the Tivoli Storage Manager client and locate the file space name that you need to retrieve, i.e. /data/pdb-backup1:
sudo /usr/bin/dsmc
tsm> query filespace
tsm> query archive "/data/pdb-backup1/rls1/*" -subdir=yes
Retrieve the data. Mounting the required media may take a long time.
tsm> retrieve "/data/pdb-backup1/rls1/*" /data/pdb-backup1/restore/rls1/ -subdir=yes