Machine / Job Features tests and monitoring

The Python script WN-mjf is a SAM/ETF probe that tests the Machine/Job Features functionality from within jobs. The definitive copy is currently kept in the mjf-scripts GitLab repo.

You can run this script manually on worker nodes or submit it yourself in test jobs. It is also run across WLCG sites by the ETF preprod service, and the per-CE output of the probe is shown on the etf-lhcb-preprod dashboard.

The probe produces an Error response if either the $MACHINEFEATURES or the $JOBFEATURES variable is not set. A missing key/value produces an error, a warning or an informational message, according to this dictionary in WN-mjf:

    'MACHINEFEATURES' : { 'hs06'         : self.WarningCode,
                          'total_cpu'    : self.ErrorCode,
                          'shutdowntime' : self.InfoCode,
                          'grace_secs'   : self.InfoCode },
    'JOBFEATURES'     : { 'allocated_cpu'       : self.ErrorCode,
                          'hs06_job'            : self.WarningCode,
                          'shutdowntime_job'    : self.InfoCode,
                          'grace_secs_job'      : self.InfoCode,
                          'jobstart_secs'       : self.ErrorCode,
                          'job_id'              : self.ErrorCode,
                          'wall_limit_secs'     : self.ErrorCode,
                          'cpu_limit_secs'      : self.ErrorCode,
                          'max_rss_bytes'       : self.WarningCode,
                          'max_swap_bytes'      : self.WarningCode,
                          'scratch_limit_bytes' : self.WarningCode }
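
As a rough illustration of how a table like this can drive a check, here is a minimal sketch. It is not the WN-mjf code. It assumes that $MACHINEFEATURES and $JOBFEATURES each point at a local directory containing one file per key (URL-valued variables are not handled). The numeric values of INFO, WARNING and ERROR, the name check_features and the shortened key table are all hypothetical.

```python
import os

# Hypothetical severity codes; the real values used by WN-mjf are not shown here.
INFO, WARNING, ERROR = 0, 1, 2

# Shortened copy of the severity table, for illustration only.
EXPECTED_KEYS = {
    "MACHINEFEATURES": {"hs06": WARNING, "total_cpu": ERROR, "shutdowntime": INFO},
    "JOBFEATURES": {"allocated_cpu": ERROR, "hs06_job": WARNING, "wall_limit_secs": ERROR},
}


def check_features(environ=None):
    """Return (worst_severity, messages) for unset variables and missing keys."""
    environ = os.environ if environ is None else environ
    worst = INFO
    messages = []
    for variable, keys in EXPECTED_KEYS.items():
        directory = environ.get(variable)
        if not directory:
            # An unset variable is always an error.
            messages.append((ERROR, "$%s is not set" % variable))
            worst = ERROR
            continue
        for key, severity in keys.items():
            # Assumption: each key is a file inside the directory.
            if not os.path.isfile(os.path.join(directory, key)):
                messages.append((severity, "%s/%s is missing" % (variable, key)))
                worst = max(worst, severity)
    return worst, messages
```

Calling check_features() with no arguments reads the real environment of the job; passing a dictionary instead makes the function easy to try out on a test directory.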

On the ETF page, errors are shown in red, warnings in yellow, and probes that produce only informational messages in green. At this stage, both yellow and green count as passing the test. For example, if you do not publish HS06 values, the probe gives a yellow warning rather than a red error.
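
The colour and pass/fail rule described above can be sketched as follows. This is only an illustration of the rule as stated on this page, not code from WN-mjf or ETF. The severity values and the names COLOURS and summarise are hypothetical.

```python
# Hypothetical severity codes, ordered from least to most severe.
INFO, WARNING, ERROR = 0, 1, 2

# Dashboard colour for the worst severity a probe run produced.
COLOURS = {INFO: "green", WARNING: "yellow", ERROR: "red"}


def summarise(severities):
    """Return (colour, passed) for the severities reported by one probe run."""
    worst = max(severities, default=INFO)
    # Only red fails; yellow and green both count as passing at this stage.
    return COLOURS[worst], worst != ERROR
```

For a site that publishes everything except HS06 values, the run would contain a warning but no error, so summarise([WARNING]) gives yellow and a pass.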

-- AndrewMcNab - 2016-06-02

Topic revision: r2 - 2016-11-19 - AndrewMcNab
 