RB/WMS Monitoring
Metrics
- Number of jobs in the state Running in the condor queue.
- Number of jobs in the states Running or Idle in the condor queue.
- Number of jobs treated in the previous day.
- Average load of the machine during the past 15 minutes.
- Number of entries in file /var/edgwl/workload_manager/input.fl (Workload Manager).
- Number of entries in file /var/edgwl/jobcontrol/queue.fl (Job Controller).
- Number of dg20logd_* files in directory /var/tmp.
- Size (in KB) of file /var/{edgwl,glite}/workload_manager/input.fl.
- Size (in KB) of file /opt/condor/var/condor/spool/job_queue.log.
- Number of file descriptors opened by process edg-wl-log_monitor.
- Occupancy (in %) of directory /var/{edgwl,glite}/SandboxDir.
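Several of the host-level metrics above can be read with the Python standard library alone. The sketch below is illustrative and is not the monitoring service's actual code. The function names are hypothetical, "occupancy" is assumed to mean the usage of the filesystem that holds the directory, and the file descriptor count assumes a Linux /proc filesystem.

```python
import glob
import os
import shutil


def load_avg_15min():
    """Return the 15-minute load average of the machine (Unix only)."""
    # os.getloadavg() returns the 1-, 5- and 15-minute averages.
    return os.getloadavg()[2]


def file_size_kb(path):
    """Return the size of a file in KB, or None if it cannot be read."""
    try:
        return os.path.getsize(path) / 1024.0
    except OSError:
        return None


def count_matching_files(directory, pattern):
    """Count the entries in `directory` whose name matches the glob `pattern`."""
    return len(glob.glob(os.path.join(directory, pattern)))


def occupancy_percent(path):
    """Return the occupancy (in %) of the filesystem holding `path`.

    Assumption: the metric is the filesystem usage, as reported by df.
    """
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def open_fd_count(pid):
    """Count the file descriptors opened by process `pid` (Linux /proc only)."""
    return len(os.listdir("/proc/%d/fd" % pid))
```

For example, count_matching_files("/var/tmp", "dg20logd_*") would give the number of dg20logd_* files, and file_size_kb("/opt/condor/var/condor/spool/job_queue.log") the size of the Condor job queue log. Finding the PID of edg-wl-log_monitor and counting Condor jobs are left out, since both depend on how the node is set up.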
Web server: lxb2007.cern.ch
The information published by the RB and WMS nodes is available at the following URL:
http://lxb2007.cern.ch/monitoring/monitoring.html
The information is updated every 30 minutes.
TODO list
- Run this service as a daemon.
- Add new metrics / sensors.
- Write Lemon sensors.