FTS Third Level procedure

The following describes the 3rd level procedure for checking the FTS nodes. It contains more detailed procedures for checking and fixing problems, along with general debugging hints. It is designed to be followed by the 3rd level support.

It should be followed mostly in sequence.

The code examples below are designed, as much as possible, to be cut and pasted into the terminal window. Please report any errors you see.

The general procedure when action is needed is always:

1) Understand clearly from the 2nd level support the current situation

Ask the following questions:

  1. Did the backup FTS host get switched on and the DNS flipped?
  2. Is the problem with the primary machine, the backup machine or visible on both?
  3. Has any other action been taken so far (server reboots, daemon restarts, config file changes, etc.)?
  4. Did the smoke test show up anything strange in any of the logfiles (ORA errors, Java stack traces, etc.) for which no action was indicated in the 2nd level support procedures?
  5. Anything else that looks out of place or behaves differently from what you've seen before?

2) Be VERY sure that you are not running daemons on both the primary and backup FTS servers.

Check the daemons on both the primary and backup FTS servers. There should only be daemons running on one of the servers.

If daemons are running on both:

  • Kill the daemons on the primary machine.
  • Restart the daemons on the backup machine.
  • Attempt to fix any problems on the backup.
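
One way to make this check less error-prone is to run the same small script on each box and compare the answers. The following is a minimal sketch only: the process pattern in FTS_PATTERN and the function names are assumptions (they are not taken from this procedure) and must be replaced with the real FTS daemon names for your installation.

```shell
#!/bin/sh
# ASSUMPTION: FTS daemon command lines match this pattern. Adjust for your site.
FTS_PATTERN="${FTS_PATTERN:-transfer.*agent}"

# count_fts_daemons: read a process listing on stdin and print how many
# lines look like FTS daemons. Lines belonging to grep itself are excluded.
count_fts_daemons() {
    grep -E -- "$FTS_PATTERN" | grep -v -c 'grep' || true
}

# verdict PRIMARY_COUNT BACKUP_COUNT: say whether the situation is safe.
verdict() {
    if [ "$1" -gt 0 ] && [ "$2" -gt 0 ]; then
        echo "BOTH: kill the daemons on the primary"
    elif [ "$1" -gt 0 ] || [ "$2" -gt 0 ]; then
        echo "OK: daemons on one server only"
    else
        echo "NONE: no daemons running anywhere"
    fi
}
```

On each server, run "ps -eo pid,args | count_fts_daemons", then pass the primary count and the backup count to verdict. Anything other than "OK" needs attention before you go further.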

3) Repeat the FTS smoke test

Run the FTSSmokeTest if you haven't already. You may see something along the way that was missed the first time.

In the morning, check whether the SecondLevelProcedure or FTSSmokeTest can be updated to clarify what to look out for.

4) Search for weird things

Bad software versions, bad usernames, bad certificates, etc.

Log into the box as root, then download and run the fts-check.sh script found at the bottom of the FtsServerInstall112 page.

wget https://twiki.cern.ch/twiki/pub/LCG/FtsServerInstall/fts-check.sh
chmod u+x fts-check.sh
./fts-check.sh
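
It can help to keep the output of the check, both for comparison with a later run and for updating the procedures afterwards. A minimal sketch, assuming nothing about what fts-check.sh prints; the helper name run_and_log and the log file naming are hypothetical:

```shell
#!/bin/sh
# run_and_log CMD [ARGS...]: run a command, show its output on the terminal
# and keep a timestamped copy (stdout and stderr) in the current directory.
# The name of the log file is printed on the last line.
run_and_log() {
    log="$(basename "$1")-$(date +%Y%m%d-%H%M%S).log"
    "$@" 2>&1 | tee "$log"
    echo "Output saved in $log"
}
```

For example: run_and_log ./fts-check.sh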

TODO: make a proper suite of these diagnosis scripts and deploy them.

5) Log into the box and prod around

You're the expert. If you're not the expert, call him.

This section will be expanded as things occur to me. At the moment, it relies on the software developers' knowledge of the software.

Make sure that any fixes you find are put back into the FTSSmokeTestAndActions or FTSThirdLevelProcedure.
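
While prodding around, a quick first pass is to scan the logfiles for the ORA errors and Java stack traces mentioned in step 1. The following is a sketch under stated assumptions: the function name scan_log is hypothetical, the patterns are only a rough heuristic, and no logfile locations are assumed (pass whichever files you are looking at). The -H option is available in GNU and BSD grep.

```shell
#!/bin/sh
# scan_log FILE...: print, with file name and line number, every line that
# looks like an Oracle error (ORA-nnnnn) or part of a Java stack trace
# (a line mentioning "Exception", or an indented "at package.Class(" frame).
scan_log() {
    grep -HnE 'ORA-[0-9]{5}|Exception|^[[:space:]]+at [[:alnum:]_.$]+\(' "$@"
}
```

No output means nothing matched; it does not prove the logs are clean.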

6) Repeat the FTS clean procedure

This will give you a clean environment to start debugging.

Follow the FtsMeanAndClean15 procedure.

Useful notes

Log into the DB server FTS account and look around

You may need access to the DB schema to check for corruption or other problems. Look in the file:

/opt/glite/etc/config/glite-file-transfer-service-oracle.cfg.xml

The DB* attributes at the top give you the DB hostname, DB servicename, username and password. You can use these to log into the DB server:

sqlplus username/password@dbhostname/dbservicename
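
To avoid copying the values by hand, the relevant lines can be pulled out of the config file first. This is a minimal sketch: it makes no assumption about the exact XML layout and simply lists every line that mentions DB. The function names show_db_settings and connect_string are hypothetical.

```shell
#!/bin/sh
CFG="${CFG:-/opt/glite/etc/config/glite-file-transfer-service-oracle.cfg.xml}"

# show_db_settings [FILE]: list the lines of the config file that mention DB,
# with line numbers. Defaults to the FTS config file named above.
show_db_settings() {
    grep -n 'DB' "${1:-$CFG}"
}

# connect_string USER PASSWORD HOST SERVICE: build the sqlplus argument in
# the username/password@dbhostname/dbservicename form used in this procedure.
connect_string() {
    printf '%s/%s@%s/%s\n' "$1" "$2" "$3" "$4"
}
```

Run show_db_settings, read off the four values, then start the session with: sqlplus "$(connect_string username password dbhostname dbservicename)"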

-- GavinMcCance - 14 Jul 2005

Topic revision: r4 - 2007-04-10 - SteveTraylen
 