Overview

The GridFTP Monitoring system takes entries from the GridFTP log files and inserts them into the Oracle database at CERN via R-GMA. This information can then be visualized using the GridView display. A daemon on the GridFTP node parses the GridFTP log file and publishes this information in the form of tuples via a Continuous Primary Producer.

The Primary Producer has two parts: a client on the GridFTP node and a server on the site MON node. This Primary Producer registers itself in the R-GMA Registry. The tuples from all the Primary Producers are consumed by a Secondary Producer. This Secondary Producer also has two parts: a server on the site MON node and a client on the node hosting GridView. GridView takes this information, inserts it into the Oracle database, and gives a visual representation of the data.

The nodes involved are:

  • All GridFTP nodes.
  • lxb2009.cern.ch (site MON: server for the producers and the archiver)
  • lcgic01.gridpp.rl.ac.uk (R-GMA Registry)
  • lxgate24.cern.ch (GridView and archiver of the information in Oracle)

  • lxn1193 and lxn1191 also archive the data in MySQL and republish it as secondary producers, but they are not critical (the Oracle DB is). They exist mainly for debugging and monitoring purposes (see the cron jobs under Monitoring).

The system can be split into four parts.

  • GridFTP publishing.
  • R-GMA Registry
  • Secondary Producers
  • GridView

Monitoring

There are currently three cron jobs running that do basic monitoring of the system. The cron jobs mail both James Casey and Laurence Field when there is a problem. Two identical cron jobs run on lxn1193 and lxn1191. They monitor the number of tuples received in roughly the past hour; if this number drops below a certain threshold, a mail is sent. A third cron job on lxn1193 monitors the number of producers registered. When the number of history, latest, or continuous producers falls below a certain threshold, a mail is sent.
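
The cron scripts themselves are not reproduced on this page. As a rough sketch of the kind of threshold check described above, assuming the count has already been obtained elsewhere, a helper might look like the following. The function name, the message format, and any threshold values are hypothetical.

```shell
# Hypothetical helper, not the actual cron job.
# Usage: check_threshold LABEL COUNT THRESHOLD
# Prints a warning and returns 1 when COUNT is below THRESHOLD,
# otherwise prints an OK line and returns 0.
check_threshold() {
    label=$1
    count=$2
    threshold=$3
    if [ "$count" -lt "$threshold" ]; then
        echo "WARNING: $label is $count, below threshold $threshold"
        return 1
    fi
    echo "OK: $label is $count"
    return 0
}
```

A cron wrapper could use the non-zero return status to decide whether to send a mail.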

There are no Lemon alarms set for the moment.

Troubleshooting

Part of the troubleshooting actually refers to R-GMA. Please also check the R-GMA procedure: https://twiki.cern.ch/twiki/bin/view/LCG/RGMASmokeTestAndActions

You can run the R-GMA client check script on any node to find the status of R-GMA.

/opt/glite/bin/rgma-client-check

You can run the R-GMA server check script on the site MON node to find the status of the R-GMA server.

/opt/glite/bin/rgma-server-check

The meaning of error messages from these scripts can be found in the Grid Wiki site MON node. If you think that there is a problem with the site MON node, you can try to restart tomcat.

/etc/rc.d/init.d/tomcat4 restart

Look in the log file for errors.

/var/tomcat4/logs/catalina.out
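
Reading a long log by eye is slow, so a small filter can help. The sketch below prints the most recent lines that look like errors. The search pattern (ERROR, SEVERE, Exception) is an assumption about what the log contains, and the function name is hypothetical.

```shell
# Hypothetical helper: print the last MAX lines of LOGFILE that look like errors.
# Usage: recent_log_errors LOGFILE [MAX]   (MAX defaults to 20)
recent_log_errors() {
    logfile=$1
    max=${2:-20}
    # The pattern is a guess at typical error markers; adjust as needed.
    grep -E 'ERROR|SEVERE|Exception' "$logfile" | tail -n "$max"
}
```

For example, `recent_log_errors /var/tomcat4/logs/catalina.out 50` would show up to 50 matching lines from the tomcat log.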

If restarting does not help you will need to contact ...

Registry

If there is a problem with the registry you will need to contact ...

GridFTP publishing

You can use the rgma command line tool to check the number of continuous producers.

/opt/glite/bin/rgma -c "show producers of gridftpmonitor" | grep continuous | wc -l

If there are no producers then there could be a problem with the site Mon node or the Registry.
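
A plain grep for "continuous" would also match a hostname or path that happens to contain that word. As a slightly stricter sketch, the function below counts only rows whose "Query types" column equals the requested type. It assumes the table layout shown in the sample output under Secondary Producers; the function name is hypothetical.

```shell
# Hypothetical helper: count producers of one query type.
# Reads "show producers" table output on stdin.
# Usage: ... | count_producers continuous
count_producers() {
    qtype=$1
    awk -F'|' -v t="$qtype" '
        NF >= 4 {
            # Field 4 is the "Query types" column; trim surrounding blanks.
            gsub(/^[ \t]+|[ \t]+$/, "", $4)
            if ($4 == t) n++
        }
        END { print n + 0 }
    '
}
```

It can be fed directly from the command above: `/opt/glite/bin/rgma -c "show producers of gridftpmonitor" | count_producers continuous`.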

You can see tuples as they are being published by doing the following.

rgma> set query continuous
Set query type to continuous

rgma> set timeout 0
Set timeout to 0 seconds

rgma> select * from GridftpMonitor

If there are no tuples being published then there might be a problem with the site Mon node.

You can check the status of the GridFTP publisher by using the following command.

/etc/rc.d/init.d/lcg-mon-gridftp status

If there is a problem you can try to restart the daemon.

/etc/rc.d/init.d/lcg-mon-gridftp restart

Take a look in the log file for errors.

/opt/lcg/var/log/lcg-mon-gridftp.log

On the castorgridsc cluster, this process is automatically monitored by LEMON, and restarted as necessary.
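
On nodes without such automatic monitoring, the status and restart steps above could be combined by hand. The following is only a sketch: it assumes the init script's status action exits with a non-zero code when the daemon is not running, which is not confirmed on this page. The function name is hypothetical.

```shell
# Hypothetical helper: restart a service only if its status check fails.
# Assumes "SCRIPT status" returns non-zero when the daemon is down.
# Usage: restart_if_down /etc/rc.d/init.d/lcg-mon-gridftp
restart_if_down() {
    script=$1
    if "$script" status >/dev/null 2>&1; then
        echo "running"
    else
        echo "not running, restarting"
        "$script" restart
    fi
}
```

After a restart it is still worth checking the log file for the reason the daemon stopped.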

Secondary Producers

If the number of tuples falls below a certain level, the Primary Producers could have stopped publishing or the Secondary Producer could have stopped consuming data.

Note: Only try to fix lxn1193. Leave lxn1191 as it is so that the R-GMA developers can investigate the problem.

You can check which producers are registered by running the following on any LCG UI:

/opt/glite/bin/rgma -c "show producers of gridftpmonitor"

You should see four secondary producers (one of type latest and one of type history on each of lxn1191 and lxn1193) and one primary producer per GridFTP server (many). Typical output follows:

+--------------------------------------------------------------------------+------------+-------------+
| Endpoint                                                                 | ID         | Query types |
+--------------------------------------------------------------------------+------------+-------------+
| http://lxn1191.cern.ch:8080/R-GMA/DBProducerServlet                      | 904336665  | history     |
| http://lxn1193.cern.ch:8080/R-GMA/DBProducerServlet                      | 2028626163 | history     |
| http://lxn1193.cern.ch:8080/R-GMA/LatestProducerServlet                  | 1595739184 | latest      |
| http://lxn1191.cern.ch:8080/R-GMA/LatestProducerServlet                  | 816998665  | latest      |
| http://grid-lcg.physik.uni-wuppertal.de:8080/R-GMA/StreamProducerServlet | 1221351875 | continuous  |
| http://a01-004-168.gridka.de:8080/R-GMA/StreamProducerServlet            | 571144787  | continuous  |
| http://fal-pygrid-17.lancs.ac.uk:8080/R-GMA/StreamProducerServlet        | 1699266066 | continuous  |
| http://bohr0002.tier2.hep.man.ac.uk:8080/R-GMA/StreamProducerServlet     | 1336858673 | continuous  |
| http://serv02.hep.phy.cam.ac.uk:8080/R-GMA/StreamProducerServlet         | 184502956  | continuous  |
| http://lxn1178.cern.ch:8080/R-GMA/StreamProducerServlet                  | 929831919  | continuous  |
| http://lxn1178.cern.ch:8080/R-GMA/StreamProducerServlet                  | 929831918  | continuous  |
| http://lxn1178.cern.ch:8080/R-GMA/StreamProducerServlet                  | 929831917  | continuous  |
| http://grid-rgma.desy.de:8080/R-GMA/StreamProducerServlet                | 684685523  | continuous  |
| http://lxn1178.cern.ch:8080/R-GMA/StreamProducerServlet                  | 929831916  | continuous  |
| http://lxn1178.cern.ch:8080/R-GMA/StreamProducerServlet                  | 929831915  | continuous  |
| http://lxn1178.cern.ch:8080/R-GMA/StreamProducerServlet                  | 929831914  | continuous  |
| http://lxn1178.cern.ch:8080/R-GMA/StreamProducerServlet                  | 929831913  | continuous  |
| http://lxn1178.cern.ch:8080/R-GMA/StreamProducerServlet                  | 929831912  | continuous  |
| http://lxn1178.cern.ch:8080/R-GMA/StreamProducerServlet                  | 929831911  | continuous  |
| http://lxn1178.cern.ch:8080/R-GMA/StreamProducerServlet                  | 929831910  | continuous  |
| http://lcgce01.nic.ualberta.ca:8080/R-GMA/StreamProducerServlet          | 1193222599 | continuous  |
| http://grid-se.ii.edu.mk:8080/R-GMA/StreamProducerServlet                | 1254562450 | continuous  |
| http://glenmorangie.epcc.ed.ac.uk:8080/R-GMA/StreamProducerServlet       | 1559985859 | continuous  |
| http://skurut18.cesnet.cz:8080/R-GMA/StreamProducerServlet               | 784204178  | continuous  |
| http://lcgmon01.phy.bris.ac.uk:8080/R-GMA/StreamProducerServlet          | 1561742277 | continuous  |
| http://lxn1178.cern.ch:8080/R-GMA/StreamProducerServlet                  | 929831909  | continuous  |
| http://lxn1178.cern.ch:8080/R-GMA/StreamProducerServlet                  | 929831908  | continuous  |
| http://lxn1178.cern.ch:8080/R-GMA/StreamProducerServlet                  | 929831907  | continuous  |
| http://lxn1178.cern.ch:8080/R-GMA/StreamProducerServlet                  | 929831906  | continuous  |
| http://lxn1178.cern.ch:8080/R-GMA/StreamProducerServlet                  | 929831905  | continuous  |
| http://lxn1178.cern.ch:8080/R-GMA/StreamProducerServlet                  | 929831904  | continuous  |
| http://lxn1178.cern.ch:8080/R-GMA/StreamProducerServlet                  | 929831903  | continuous  |
| http://grid-rgma.desy.de:8080/R-GMA/StreamProducerServlet                | 684685378  | continuous  |
| http://xg006.inp.demokritos.gr:8080/R-GMA/StreamProducerServlet          | 1425345492 | continuous  |
| http://lcg01.gsi.de:8080/R-GMA/StreamProducerServlet                     | 1475613543 | continuous  |
| http://lxn1178.cern.ch:8080/R-GMA/StreamProducerServlet                  | 929831928  | continuous  |
| http://lxn1178.cern.ch:8080/R-GMA/StreamProducerServlet                  | 929831927  | continuous  |
| http://lxn1178.cern.ch:8080/R-GMA/StreamProducerServlet                  | 929831926  | continuous  |
| http://lxn1178.cern.ch:8080/R-GMA/StreamProducerServlet                  | 929831925  | continuous  |
| http://lxn1178.cern.ch:8080/R-GMA/StreamProducerServlet                  | 929831924  | continuous  |
| http://lxn1178.cern.ch:8080/R-GMA/StreamProducerServlet                  | 929831923  | continuous  |
| http://lxn1178.cern.ch:8080/R-GMA/StreamProducerServlet                  | 929831922  | continuous  |
| http://lxn1178.cern.ch:8080/R-GMA/StreamProducerServlet                  | 929831921  | continuous  |
| http://ui-lcg.projects.cscs.ch:8080/R-GMA/StreamProducerServlet          | 1783660483 | continuous  |
| http://lxn1178.cern.ch:8080/R-GMA/StreamProducerServlet                  | 929831920  | continuous  |
+--------------------------------------------------------------------------+------------+-------------+
45 Rows in set

First of all, check the GridFTP publishing.

Check to see if the Secondary Producer is registered.

/opt/glite/bin/rgma -c "show producers of gridftpmonitor" | grep latest | wc -l

/opt/glite/bin/rgma -c "show producers of gridftpmonitor" | grep history | wc -l
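
The counts alone do not say which node has lost its producer. As a sketch, the function below reads the same table output and lists each expected host and type combination that is missing. The host names and the two types are taken from the sample output above; the function name and the output format are hypothetical.

```shell
# Hypothetical helper: list missing secondary producers.
# Reads "show producers" table output on stdin and prints one
# "HOST TYPE" line for every expected producer that is absent.
missing_secondary_producers() {
    awk -F'|' '
        NF >= 4 {
            qt = $4
            gsub(/[ \t]/, "", qt)
            if (qt != "history" && qt != "latest") next
            host = $2
            sub(/^[ \t]*http:\/\//, "", host)   # drop the scheme
            sub(/[:\/].*$/, "", host)           # drop port and path
            seen[host " " qt] = 1
        }
        END {
            n = split("lxn1191.cern.ch lxn1193.cern.ch", hosts, " ")
            m = split("history latest", types, " ")
            for (i = 1; i <= n; i++)
                for (j = 1; j <= m; j++)
                    if (!((hosts[i] " " types[j]) in seen))
                        print hosts[i], types[j]
        }
    '
}
```

No output means all four secondary producers are registered.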

If there is a problem, try to restart tomcat on the node.

If this does not fix the problem you will need to contact ...

GridView

Overview

Currently GridView runs on lxgate24.

The archiver process on lxgate24 is started by the init process (/etc/inittab) in respawn mode. If it gets killed for any reason, it is restarted by init.

For now, if you want to restart it explicitly, just kill the running process and init will automatically start a new one.

Logs are maintained in /opt/gridview/logs. If tomcat and the R-GMA consumer are restarted, the process should restart automatically. If the consumer exits, the GridView application starts a new consumer. If any runtime exception occurs, the archiver restarts itself (that is, it exits and is then respawned by init).

If GridView doesn't work you need to .....

-- DavidSmith - 21 Jul 2005

Topic revision: r9 - 2006-02-07 - PeterJones
 