CMS
When experts have to be contacted
- If there is a SNOW or GGUS ticket, just send an SMS to Eddie (00306948878807) and Julia (164588).
- If you get alarms for any of the machines dashb-ai-581, dashb-ai-583, dashb-ai-584, dashb-ml01, dashb-ai-601 (UI) or dashb-ai-602 (UI), please send an SMS to Eddie and Julia.
Collectors (only for Eddie)
Collectors are running on the machines dashb-ai-581, dashb-ai-583, dashb-ai-584 and dashb-ml01.
In principle, everything should restart automatically in case of failure.
The first thing to check is that the ML log is being updated.
It is located in the directory /data/arda_cms_ml/MonaLisa/Service/myFarm/result_logs/
and the file name is
JStore_current_date_.log
If the ML log has not been updated for more than 20 minutes (ML should restart every 10 minutes if it fails), call Julia as soon as possible. This may again be a problem with the ML central services, in which case nothing can be done from our side.
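The freshness check described above can be sketched in Python. This is only an illustration, not an existing tool: the glob pattern JStore_*.log is an assumption (the exact date format in the file name is not spelled out here), and the function names are hypothetical. The directory and the 20-minute threshold come from the text.

```python
import glob
import os
import time

# Directory named in the text. The glob pattern is an ASSUMPTION, since the
# exact date format in "JStore_current_date_.log" is not given.
ML_LOG_DIR = "/data/arda_cms_ml/MonaLisa/Service/myFarm/result_logs/"
ML_LOG_PATTERN = "JStore_*.log"
MAX_AGE_MINUTES = 20  # threshold from the text


def newest_file(directory, pattern):
    """Return the most recently modified file matching pattern, or None."""
    matches = glob.glob(os.path.join(directory, pattern))
    return max(matches, key=os.path.getmtime) if matches else None


def minutes_since_update(path, now=None):
    """Return how many minutes ago the file was last modified."""
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) / 60.0


def ml_log_is_stale(directory=ML_LOG_DIR, pattern=ML_LOG_PATTERN,
                    max_age_minutes=MAX_AGE_MINUTES, now=None):
    """True if no matching log exists or the newest one is older than the limit."""
    path = newest_file(directory, pattern)
    if path is None:
        return True
    return minutes_since_update(path, now) > max_age_minutes


if __name__ == "__main__":
    if ml_log_is_stale():
        print("ML log not updated for more than %d minutes" % MAX_AGE_MINUTES)
    else:
        print("ML log is being updated")
```

Run on the collector machine, this prints one line saying whether the newest matching log is older than 20 minutes. A missing log is treated as stale, which seemed the safer choice for an alarm check.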
The other three collectors are under our control.
To see how they are run and where their log files are, look at
/data/dboard_cms/restart.crontab.sh
Over the last month or so, everything has worked smoothly, without any delays.
If an alarm arrives and ML is running fine, look in the directory:
/data/dboard_cms/ml-data/
If everything is fine and there is just a bit of a delay, this directory should contain recent ml_orig* and ml_dict* files,
and the directory /data/dboard_cms/tmp should contain recent ml_dict*log files. These are the log files of the last collector step, which writes data into the DB.
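To see quickly how recent those files are, something like the following Python sketch could be used. The directories and file patterns are the ones given above. The helper names are hypothetical, and the script only reports ages without deciding what counts as "recent", since no threshold is stated here.

```python
import glob
import os
import time

# Directories and file patterns are taken from the text above.
CHECKS = [
    ("/data/dboard_cms/ml-data/", "ml_orig*"),
    ("/data/dboard_cms/ml-data/", "ml_dict*"),
    ("/data/dboard_cms/tmp", "ml_dict*log"),
]


def newest_age_minutes(directory, pattern, now=None):
    """Age in minutes of the newest file matching pattern, or None if no match."""
    now = time.time() if now is None else now
    paths = glob.glob(os.path.join(directory, pattern))
    mtimes = [os.path.getmtime(p) for p in paths]
    if not mtimes:
        return None
    return (now - max(mtimes)) / 60.0


def report(checks=CHECKS, now=None):
    """Return a list of (directory, pattern, age_in_minutes_or_None) tuples."""
    return [(d, pat, newest_age_minutes(d, pat, now)) for d, pat in checks]


if __name__ == "__main__":
    for directory, pattern, age in report():
        if age is None:
            print("%s %s: no files found" % (directory, pattern))
        else:
            print("%s %s: newest file is %.0f minutes old"
                  % (directory, pattern, age))
```

If the newest ml_orig* and ml_dict* files are only slightly old, the collectors are probably just delayed. If no files are found, or they are much older, follow the contact instructions at the top of this page.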
UI
There are two UIs. If the servers are not responding, you can restart them from openstack.cern.ch > Instances. Call Eddie and/or Julia in case of a problem.
ATLAS
The PanDA collector is running on dashb-ai-566. If you get the following alarm:
URGENT: PANDA collector is an hour or more behind!!!
Call Eddie!
--
JuliaAndreeva - 2014-12-09