Lemon for the CMS SAM client

Abnormal situations to be trapped

Test submission has stopped (i.e. the cron scripts do not run)

Detection

At least one of the SAM log files is older than 2 hours.

file.sslmtime gives the age of a file in seconds, or -1 if the file does not exist. A separate metric is needed for each log file.
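The check described above can be sketched in Python as follows. This is only an illustration of the logic, not the Lemon sensor itself: the function names are hypothetical, and treating a missing log file as an abnormal situation is an assumption that the text does not state explicitly.

```python
import os
import time

MAX_AGE_SECONDS = 2 * 3600  # the 2-hour threshold from the text


def file_age_seconds(path, now=None):
    """Return the age of `path` in seconds, or -1 if it does not exist.

    Mirrors the behaviour described for file.sslmtime; hypothetical helper.
    """
    now = time.time() if now is None else now
    try:
        # Clamp at 0 so a file with a future mtime is never confused with -1.
        return max(0, int(now - os.path.getmtime(path)))
    except OSError:
        return -1


def is_stale(path, max_age=MAX_AGE_SECONDS, now=None):
    """True if the file is missing (assumption) or older than `max_age`."""
    age = file_age_seconds(path, now)
    return age < 0 or age > max_age
```

Calling is_stale once per log file matches the requirement of one metric per file; the alarm condition is then that is_stale returns True for at least one of them.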

Action

Raise an alarm which triggers a call to the CMS SAM support.

Configuration

The file is customization/cms/jrobots_sam/sam_metrics.tpl.

The relevant metrics are 4120, 4121 and 4122 and the exception is 30691.

One of the cron scripts results in a fatal error (e.g. cannot create a proxy)

Detection

A FATAL error was logged in the SAM log files in the last 2 hours.
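A minimal sketch of this detection in Python is given below. The format of the SAM log files is not described here, so the sketch assumes that each line starts with a timestamp of the form YYYY-MM-DD HH:MM:SS and that fatal errors contain the word FATAL; both the format and the function name are assumptions made for illustration.

```python
from datetime import datetime, timedelta

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed log timestamp layout
WINDOW = timedelta(hours=2)             # the 2-hour window from the text


def recent_fatal_lines(lines, now, window=WINDOW):
    """Return lines containing 'FATAL' that were logged within `window` before `now`."""
    hits = []
    for line in lines:
        if "FATAL" not in line:
            continue
        try:
            # The assumed timestamp occupies the first 19 characters.
            stamp = datetime.strptime(line[:19], TIMESTAMP_FORMAT)
        except ValueError:
            continue  # skip lines without a parsable timestamp
        if timedelta(0) <= now - stamp <= window:
            hits.append(line.rstrip("\n"))
    return hits
```

The alarm condition is then simply that recent_fatal_lines returns a non-empty list for any of the log files.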

Action

Raise an alarm which triggers a call to the CMS SAM support.

Configuration

The file is customization/cms/jrobots_sam/sam_metrics.tpl.

The relevant metrics are 5370, 5371 and 5372 and the exception is 30692.

Publication to the SAM database fails

Detection

More than 20 error messages about database problems were logged in the last 2 hours. For comparison, see what the IT-GT SAM team does.
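Since the configuration for this case is still to be done, the following Python sketch only illustrates the counting rule. The regular expression used to recognise database error messages and the timestamp format are hypothetical; the real SAM log messages may look different.

```python
import re
from datetime import datetime, timedelta

# Hypothetical pattern for database-related error messages.
DB_ERROR_PATTERN = re.compile(r"ERROR.*database", re.IGNORECASE)
THRESHOLD = 20  # from the text: alarm when the count is larger than 20


def count_recent_db_errors(lines, now, window=timedelta(hours=2)):
    """Count database error lines logged within `window` before `now`."""
    count = 0
    for line in lines:
        if not DB_ERROR_PATTERN.search(line):
            continue
        try:
            # Assumed timestamp layout at the start of each line.
            stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # skip lines without a parsable timestamp
        if timedelta(0) <= now - stamp <= window:
            count += 1
    return count


def publication_failing(lines, now, threshold=THRESHOLD):
    """True if more than `threshold` recent database errors were logged."""
    return count_recent_db_errors(lines, now) > threshold
```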

Action

Raise an alarm which triggers a call to the CMS SAM support.

Configuration

To be done.

Known issues

  1. The actuator runs as root, so the e-mail is sent by root@vocms36.cern.ch, which is rejected by the e-group. As a temporary workaround, my e-mail address is used instead.

-- AndreaSciaba - 15-Mar-2010

Topic revision: r9 - 2010-03-23 - AndreaSciaba
 