LFC Smoke-Test


1) Log into any LCG UI (version >= 2_4_0). You can also log into lxplus and source the file /afs/cern.ch/project/gd/LCG-share/2.4.0/sl3/etc/profile.d/grid_env.sh
2) Set the environment to point to the "problematic" LFC server:

$ export LFC_HOST=lfc-dteam.cern.ch


3) Try to ping the host. If you cannot ping the host, you are probably experiencing a hardware problem or a network problem, or the machine is in the middle of a reboot. Wait a few minutes and call the operator if the problem persists.
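
The ping-and-wait check in step 3 can be scripted. What follows is a minimal sketch, assuming the host answers ICMP echo requests and that the local ping accepts -c. The function name wait_for_host and the default retry count and delay are illustrative choices, not part of the original procedure.

```shell
# Ping a host a limited number of times, pausing between attempts.
# Usage: wait_for_host HOST [ATTEMPTS] [DELAY_SECONDS]
# (wait_for_host and its defaults are hypothetical, chosen for this sketch)
wait_for_host() {
    local host=$1
    local attempts=${2:-5}
    local delay=${3:-60}
    local i=1
    while [ "$i" -le "$attempts" ]; do
        # -c 1: send a single echo request
        if ping -c 1 "$host" >/dev/null 2>&1; then
            echo "$host is reachable"
            return 0
        fi
        echo "attempt $i/$attempts: $host not reachable" >&2
        # Do not sleep after the last attempt
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    echo "$host still unreachable, call the operator" >&2
    return 1
}
```

For example, wait_for_host "$LFC_HOST" 5 60 retries for about four minutes before giving up.
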
4) Try to list the contents of the root directory:

$ lfc-ls -l /   
drwxr-xr-x   5 root     root                      0 Jun 06 11:30 grid    

Possible problem: Authentication. You have no valid credentials to access the service. This could mean:

  • You forgot to get a proxy (check with grid-proxy-info). Get a new proxy (grid-proxy-init).

  • Your DN is not in /etc/grid-security/grid-mapfile of the LFC (the grid-mapfile is empty, corrupted, missing, failed to update, etc.). Try to refresh the grid-mapfile by running the following command on the LFC server, and look for possible problems in its output:

$ /opt/edg/sbin/edg-mkgridmap --output=/etc/grid-security/grid-mapfile --safe

  • The CRL of the CA which issued your certificate has expired on the LFC server. On the LFC server, look for errors in the log file /var/log/edg-fetch-crl-cron.log. Also, try to refresh the CRL by hand and look for possible errors:

$ /opt/edg/etc/cron/edg-fetch-crl-cron
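
Whether a given DN is present in the grid-mapfile can be checked directly on the LFC server. The following is a sketch only: it assumes the usual grid-mapfile layout, with each DN enclosed in double quotes and followed by the mapped account, and the function name dn_in_gridmap is hypothetical.

```shell
# Return 0 if the DN appears, quoted, in the grid-mapfile.
# Usage: dn_in_gridmap DN [MAPFILE]
# (dn_in_gridmap is a hypothetical helper; the quoted-DN layout is an assumption)
dn_in_gridmap() {
    local dn=$1
    local mapfile=${2:-/etc/grid-security/grid-mapfile}
    # -s: the file exists and is not empty
    if [ ! -s "$mapfile" ]; then
        echo "$mapfile is missing or empty" >&2
        return 2
    fi
    # -F: match the DN as a fixed string, not as a regular expression
    grep -qF "\"$dn\"" "$mapfile"
}
```

The function returns 0 when the DN is found, 1 when it is not, and 2 when the grid-mapfile is missing or empty, which separates the "DN not mapped" case from the "broken grid-mapfile" case.
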

P.S. To debug authentication (authorization) problems, it is useful to increase the verbosity level of the authentication by setting the appropriate environment variables:

$ export CSEC_TRACE=1                  # increases the verbosity level
$ export CSEC_TRACEFILE=somefile.log   # redirects the trace to somefile.log
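
To avoid leaving tracing switched on in the login shell, the two variables can also be set for a single command. The sketch below assumes a Bourne-compatible shell; the function name trace_lfc_ls and the default trace file location are illustrative.

```shell
# Run one listing of the catalog root with Csec tracing enabled for that command only.
# Usage: trace_lfc_ls [TRACEFILE]
# (trace_lfc_ls and the default trace file path are hypothetical)
trace_lfc_ls() {
    local tracefile=${1:-/tmp/csec-trace.$$.log}
    local rc=0
    # Variables given as a command prefix are visible to lfc-ls only
    CSEC_TRACE=1 CSEC_TRACEFILE="$tracefile" lfc-ls -l / || rc=$?
    echo "trace written to $tracefile (lfc-ls exit status $rc)" >&2
    return "$rc"
}
```
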

Possible problem: the lfcdaemon died.

  • Log into the LFC server. Check if the daemon is still alive:

$ /etc/init.d/lfcdaemon status
lfcdaemon (pid 27528) is running...                        [  OK  ] 

  • If NOT, try to restart the service and look for possible error messages. In addition, check whether the master process is running together with all the threads:

$ ps -auxm --forest | grep lfcd

  • You should see the master process (PID 27528 in this case) and 20 slaves. You can always look in the log file (/var/log/lfc/log) for hints.
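
Counting the entries by eye is error-prone, so the comparison can be scripted. This is a sketch under assumptions: the default of 21 entries is simply the master plus 20 slaves mentioned above, and on systems where ps does not list each thread with the command name the count will differ. The function names are hypothetical.

```shell
# Count lfcdaemon entries in a process listing read from stdin.
# (count_lfcdaemon and check_lfcdaemon are hypothetical helpers)
count_lfcdaemon() {
    # [l]fcdaemon matches "lfcdaemon" but not the grep command line itself.
    # grep -c prints 0 and exits 1 when nothing matches; keep the count only.
    grep -c '[l]fcdaemon' || true
}

# Compare the count with the expected number of entries.
# Usage: ps -auxm --forest | check_lfcdaemon [EXPECTED]
check_lfcdaemon() {
    local expected=${1:-21}
    local found
    found=$(count_lfcdaemon)
    if [ "$found" -eq "$expected" ]; then
        echo "OK: $found lfcdaemon entries"
        return 0
    fi
    echo "WARNING: found $found lfcdaemon entries, expected $expected" >&2
    return 1
}
```
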

Possible problem: there is no connection with the database, i.e. the connection with the backend has been lost. It could mean that the backend has some problem, or that the LFC lost the connection for some reason and did not re-establish it.

  • On the LFC server, look for any entry in the log file (/var/log/lfc/log) mentioning the Oracle database (entries like ORA:). Try to restart the service.
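
The log search can be narrowed to the most recent Oracle-related lines. A minimal sketch, assuming the entries contain either "ORA:" as noted above or the standard Oracle error prefix "ORA-"; the function name recent_ora_errors and the default of 20 lines are illustrative.

```shell
# Print the most recent Oracle-related lines from the LFC log.
# Usage: recent_ora_errors [LOGFILE] [MAX_LINES]
# (recent_ora_errors is a hypothetical helper)
recent_ora_errors() {
    local logfile=${1:-/var/log/lfc/log}
    local max=${2:-20}
    if [ ! -r "$logfile" ]; then
        echo "cannot read $logfile" >&2
        return 2
    fi
    # Match "ORA-" (Oracle error codes) as well as "ORA:"
    grep -E 'ORA[-:]' "$logfile" | tail -n "$max"
}
```

If the function prints nothing, the log contains no such entries and the database connection is less likely to be the cause.
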

5) Try to insert an entry into the catalog, e.g. by creating a directory, and verify that the directory is really there:

sh-2.05b$ lfc-mkdir /grid/dteam/simone-test
sh-2.05b$ lfc-ls -l /grid/dteam
drwxr-xr-x   3 dteam002 cg                        0 Jun 16 09:53 generated
drwxrwxr-x   0 dteamsgm cg                        0 Jun 28 18:29 simone-test
drwxr-xr-x   3 dteam001 cg                        0 Jun 16 11:53 tests    

Possible problem: you have no valid credentials to write into the area.

  • This most likely means you have no write access to the directory. Verify which access privileges you have (lfc-ls -l) and make sure you are writing in the right place.

Possible problem: something filled up the filesystem (dump of a process, log file ...)

  • Log into the LFC server and check this is really the case (df -l), identifying the full partition.

  • Try to figure out what filled up the partition. Good places to look are /tmp and /var/log (especially /var/log/lfc/log, which is where the lfcdaemon logs by default). Try to identify the source of the problem.

  • For the moment, old log files can be deleted, although it may be safer to copy them somewhere with free space, like /shift.

  • If a log file grew too much, make sure there is a correct logrotate entry set up, e.g.:

$ cat /etc/logrotate.d/lfcdaemon
/var/log/lfc/log {
    compress
    daily
    delaycompress
    missingok
    rotate 15
}
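
Spotting the full partition in the df listing can be automated. The sketch below assumes df accepts -P (POSIX one-line-per-filesystem output) in addition to the -l used above, and that mount points contain no spaces; the function name full_partitions and the 95% default threshold are illustrative.

```shell
# Read `df -lP` output on stdin and print mount points at or above a usage threshold.
# Usage: df -lP | full_partitions [THRESHOLD_PERCENT]
# (full_partitions and the 95% default are hypothetical)
full_partitions() {
    local threshold=${1:-95}
    # With -P, field 5 is the usage percentage and field 6 the mount point.
    # NR > 1 skips the header line.
    awk -v limit="$threshold" '
        NR > 1 {
            use = $5
            sub(/%/, "", use)
            if (use + 0 >= limit + 0) print $6, $5
        }'
}
```

For example, df -lP | full_partitions 90 lists every local partition that is at least 90% full, one per line, with its usage.
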

6) Remove the directory from the catalog (lfc-rm), so that you do not leave any garbage. If you successfully reached this point, removing the directory should work.
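
Steps 5 and 6 can be combined into one write test that cleans up after itself. This is a sketch, not a definitive implementation: the function name lfc_write_test and the scratch directory name are hypothetical, and the use of lfc-rm -r to remove a directory is an assumption about the installed client.

```shell
# Create, verify and remove a scratch directory in the catalog.
# Usage: lfc_write_test PARENT_DIR
# (lfc_write_test and the smoke-test-PID name are hypothetical)
lfc_write_test() {
    local parent=$1
    local name="smoke-test-$$"
    local dir="$parent/$name"
    local rc=0
    if ! lfc-mkdir "$dir"; then
        echo "lfc-mkdir failed for $dir" >&2
        return 1
    fi
    # The new entry must show up in the listing of its parent
    if lfc-ls "$parent" | grep -qx "$name"; then
        echo "OK: $dir was created and is listed"
    else
        echo "ERROR: $dir is not listed under $parent" >&2
        rc=1
    fi
    # Clean up; -r is assumed to be required for directories
    if ! lfc-rm -r "$dir"; then
        echo "WARNING: could not remove $dir" >&2
        rc=1
    fi
    return "$rc"
}
```

For example, lfc_write_test /grid/dteam exercises the same directory used in step 5.
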
If the problem does not fall into any of the above categories, have a look at the log file on the LFC server (/var/log/lfc/log).

If all else fails, call the appropriate level 3 support person.


-- SimoneCampana - 29 Jun 2005
Topic revision: r4 - 2005-08-16 - AntonioDelgadoPeris