Checking the ILCDirac Server Status
Dirac Web Interface
Look at the SystemAdministration page:
- Click on the machines and select "Show Errors". In many cases the error message is just a warning and can be ignored; telling the two apart takes some time to get used to.
Checking the status on the machines
Log on to the VOBOX of interest (voilcdirac01, voilcdirac02, voilcdirac03, etc.):
ssh voilcdirac01
Become the dirac user; you need to be dirac to start or stop services:
sudo su dirac
Source the dirac environment if needed:
source /opt/dirac/bashrc
Then go to
/opt/dirac/startup
to find the services and agents running on the machine.
Check the disk space with
df -h
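To check the same figure from a script rather than by eye, a minimal sketch follows. The function names and the default threshold of 90% are assumptions made for illustration, not part of the Dirac tooling.

```shell
# Print the used percentage (digits only) of the filesystem holding a path.
# "df -P" forces POSIX single-line output; column 5 is the capacity, e.g. "87%".
disk_usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Warn when usage reaches the limit (hypothetical default: 90 percent).
# Returns 0 when below the limit, 1 when at or above it, 2 if df gave no number.
check_disk() {
    path=$1
    limit=${2:-90}
    used=$(disk_usage_percent "$path")
    case $used in
        ''|*[!0-9]*) echo "could not read disk usage for $path" >&2; return 2 ;;
    esac
    if [ "$used" -ge "$limit" ]; then
        echo "WARNING: $path is ${used}% full (limit ${limit}%)"
        return 1
    fi
    echo "OK: $path is ${used}% full"
}
```

On a VOBOX this would be called as `check_disk /opt/dirac`, or with an explicit limit as `check_disk /opt/dirac 95`.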
/opt/dirac should never reach 100% usage; when it does, the services start to have problems. In the worst case, the web page fails because it cannot write anything to its cache. To "fix" the situation, restarting the services is usually enough: the MySQL cache is emptied and some disk space is recovered, which allows the agents (in particular the JobCleaningAgent) to work again. Now, how to do that?
You need to know that all services and agents run under the runit framework (http://smarden.org/runit/). Dirac comes with a set of handy commands for supervising them:
runsvctrl t path/to/service
restarts the service at path/to/service (example: DataManagement_FileCatalog). To restart an agent properly, create an empty file called stop_agent under /opt/dirac/control/System/Agent.
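Creating the stop_agent file can be wrapped in a small helper. This is only a sketch: the function name and the CONTROL_ROOT variable are hypothetical, and it assumes that System and Agent in the path above stand for the agent's system and agent names.

```shell
# Directory under which each agent has its control directory (from the text above).
# CONTROL_ROOT is a hypothetical variable, overridable for testing.
CONTROL_ROOT=${CONTROL_ROOT:-/opt/dirac/control}

# Create the empty stop_agent file for one agent.
# Usage: request_agent_stop <System> <Agent>
request_agent_stop() {
    system=$1
    agent=$2
    dir="$CONTROL_ROOT/$system/$agent"
    if [ ! -d "$dir" ]; then
        echo "no such control directory: $dir" >&2
        return 1
    fi
    touch "$dir/stop_agent"
}
```

For the JobCleaningAgent this might be `request_agent_stop WorkloadManagement JobCleaningAgent`, assuming that is the system the agent is installed under on the machine. The helper refuses to act when the control directory does not exist, so a mistyped name does not leave stray files behind.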
runsvctrl d path/to/service
takes down the service
runsvctrl u path/to/service
brings the service back up after it has been taken down with the previous command.
One can also use
runsvstat *
to see what is running and what is down. Everything on volcd01 should be running.
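With many services on a machine, it can help to filter the runsvstat output down to the entries that are not running. The sketch below assumes each line has the form "name: state ...", for example "name: run (pid 1234) 56 seconds" or "name: down 12 seconds, normally up"; the function name is hypothetical.

```shell
# Read runsvstat-style lines on stdin and print the names of all entries
# whose state is anything other than "run".
list_not_running() {
    awk -F': ' '$2 !~ /^run( |$)/ { print $1 }'
}
```

From /opt/dirac/startup this would be used as `runsvstat * | list_not_running`; empty output means every listed service and agent is reported as running.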
--
AndreSailer - 2014-10-30