Site Support Team - Metric Scripts

This twiki page documents the metric scripts that feed the Site Status Board (SSB). The Site Support Team (SST) maintains many SSB metrics, both to monitor sites and to support other teams.

Usable Analysis Metric For CRAB3

No permission to view CMS.SiteSupportCrabStatus

AAA Federation Metrics

Introduction

This twiki page explains the details of the AAA federation metrics, which are used by the Transfer Team. It does not describe the AAA project itself; it only gives technical details about these metrics.

Conditions

The two metrics (AAA Transitional Federation and AAA Production Federation) share the same conditions:
  • AAA-related ticket open for longer than two weeks
  • SAM access test < 50% for two weeks
  • HammerCloud test success rate < 80% for two weeks
A site is marked as bad if any one of these conditions is true for it.
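
The rule above can be sketched in Python as a logical OR over the three conditions. The `SiteWindow` record and its field names are hypothetical, and the sketch assumes each value has already been aggregated over the two-week window; the actual metric scripts may compute these inputs differently.

```python
from dataclasses import dataclass


@dataclass
class SiteWindow:
    """Hypothetical two-week summary for one site (field names are assumptions)."""
    ticket_open_days: int   # age in days of the oldest open AAA-related ticket, 0 if none
    sam_access_pct: float   # SAM access test result over the window, in percent
    hc_success_pct: float   # HammerCloud test success rate over the window, in percent


def is_bad_site(w: SiteWindow) -> bool:
    """Return True if at least one of the three conditions holds."""
    return (
        w.ticket_open_days > 14      # ticket open for longer than two weeks
        or w.sam_access_pct < 50.0   # SAM access test below 50%
        or w.hc_success_pct < 80.0   # HammerCloud success rate below 80%
    )
```

A site with good test results but an AAA ticket open for 15 days would still be flagged, because a single failing condition is enough.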

Dataflow

Quick Access

| Content | Path, link, machine name, etc. | Notes |
| Developer | cms-comp-ops-site-support-team[at]cern.ch | SST will be responsible for running the scripts |
| Scripts | github |  |
| Machine | vocms077 | SST is responsible for this machine |
| Output Path | /afs/cern.ch/user/c/cmst1/www/SST/ {aaaTrans.txt, aaaProd.txt} | Outputs generated by the scripts are placed here |
| Metrics | AAA Transitional Federation, AAA Production Federation |  |

-- AliMehmetAltundag - 2015-12-15


SAM, HammerCloud and Storage Metrics

SSB scripts

The following scripts are used to generate information for the Site Status Board. site_avail_sum.pl and jr_successrate.pl can be found at SSBScripts (for the rest: SSBScripts@git.cern).

| Script | Short Description | Machine | User | Purpose | Requires | Syntax | Notes |
| site_avail_sum.pl | feeds SAM metric | vocms077 | cmssst | calculates the site SAM availability in the last 24 hours | X.509 RFC proxy | site_avail_sum.pl <outfile> | MUST be run AFTER 00:00 UTC |
| jr_successrate.pl | feeds HammerCloud metric | vocms077 | cmssst | calculates the job success rate of a given activity in the last N hours | X.509 RFC proxy | jr_successrate.pl <activity> <outfile> <hours> | MUST be run around 00:00 UTC |
| cms-storage-info.pl | feeds most of the metrics in the storage view | ??? | SST is NOT responsible for this | calculates several storage-related values for the SSB |  | cms-storage-info.pl | SHOULD be run once per hour |
| pledged.pl | feeds Disk pledge (TB) metric and Tape Pledge (TB) metric in the storage view | ??? | SST is NOT responsible for this | calculates pledged disk and tape resources |  | pledged.pl | SHOULD be run once per hour |

Note: an RFC proxy is generated by adding -rfc to the voms-proxy-init command.

Detailed description

site_avail_sum.pl

This script is used for the Site Readiness, to calculate the SAM availability of the previous day. The time range used to extract the numerical values and generate the URLs is [day-1 00:00, day 00:00] in UTC, where day is the day on which the script is run.
  • It is recommended to run the script at 2:10 local time to ensure that it is run after 00:00 UTC even during daylight saving time.
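
As an illustration of the time range described above, the following Python sketch computes [day-1 00:00, day 00:00] in UTC from a timezone-aware run time. The function name `previous_day_window` is hypothetical and this is not the logic of site_avail_sum.pl itself, only a sketch of the same interval.

```python
from datetime import datetime, timedelta, timezone


def previous_day_window(now):
    """Return (start, end) of the previous UTC day for a timezone-aware `now`."""
    now_utc = now.astimezone(timezone.utc)
    # 00:00 UTC of the day on which the script runs
    end = now_utc.replace(hour=0, minute=0, second=0, microsecond=0)
    start = end - timedelta(days=1)
    return start, end


# A run at 02:10 CEST (UTC+2) on 2016-07-15 is 00:10 UTC on the same day,
# so the window covers all of 2016-07-14 in UTC.
cest = timezone(timedelta(hours=2))
start, end = previous_day_window(datetime(2016, 7, 15, 2, 10, tzinfo=cest))
```

If the same function were called shortly before 00:00 UTC, the window would cover the day before the intended one, which is why the run has to happen after 00:00 UTC.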

If the outfile is not given, it defaults to /afs/cern.ch/cms/LCG/SiteComm/site_avail_sum.txt, which is the path used by the Site Status Board (it can be changed in the SSB column configuration).

If it is run on SLC6, it has no special dependencies. On SL5, the JSON.pm module should be installed.

The typical cron job should look like

10 2 * * * (/data/cmssst/MonitoringScripts/init.sh && perl /data/cmssst/MonitoringScripts/SiteComm/SSBScripts/site_avail_sum.pl /afs/cern.ch/user/c/cmssst/www/ssb/site_avail_sum.txt)

The time above is in Central European Summer Time (CEST), unless it is possible to use UTC for cron jobs.

CEST = UTC +2
CET = UTC +1
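
The offsets above can be used to check a local cron time against the 00:00 UTC requirement. This is a minimal Python sketch using fixed offsets; the helper name `cron_time_in_utc` and the placeholder date are assumptions made for illustration.

```python
from datetime import datetime, time, timedelta, timezone

CEST = timezone(timedelta(hours=2), "CEST")  # UTC +2
CET = timezone(timedelta(hours=1), "CET")    # UTC +1


def cron_time_in_utc(hour, minute, tz):
    """Convert a local cron wall-clock time to the UTC time of day."""
    # The date is an arbitrary placeholder; only the time of day matters here.
    local = datetime(2016, 6, 1, hour, minute, tzinfo=tz)
    return local.astimezone(timezone.utc).time()


summer = cron_time_in_utc(2, 10, CEST)  # 00:10 UTC
winter = cron_time_in_utc(2, 10, CET)   # 01:10 UTC
```

With the 02:10 local entry, the job runs after 00:00 UTC under both offsets, so the entry does not need to change between summer and winter time.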

jr_successrate.pl

This script is used for the Site Readiness, to calculate the HammerCloud job success rate. It can calculate the rate for the past N hours (the default is 24, which is what the Site Readiness needs). Because time ranges are resolved to the second, there is no special need to run it before 00:00 UTC, but it should still be run within minutes of that time because the Site Readiness metrics refer to whole days. The activity must be hctest, which for historical reasons corresponds to glidein HC jobs.
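
The "past N hours" range can be pictured with the short Python sketch below. The function name `last_hours_window` is hypothetical and the sketch only illustrates the interval, under the assumption that the range ends at the run time; it is not taken from jr_successrate.pl.

```python
from datetime import datetime, timedelta, timezone


def last_hours_window(now, hours=24):
    """Return (start, end) in UTC covering the `hours` before a timezone-aware `now`."""
    end = now.astimezone(timezone.utc).replace(microsecond=0)
    start = end - timedelta(hours=hours)
    return start, end


# Run at 23:55 UTC: the 24-hour window covers nearly the whole UTC day,
# while a 6-hour window covers only the most recent hours.
run_time = datetime(2016, 7, 14, 23, 55, tzinfo=timezone.utc)
daily = last_hours_window(run_time, 24)
recent = last_hours_window(run_time, 6)
```

The further the run time drifts from 00:00 UTC, the less the 24-hour window overlaps with a calendar day in UTC.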

If the outfile is not given, it defaults to /afs/cern.ch/cms/LCG/SiteComm/successrate_<activity>.txt, which is the path used by the Site Status Board (it can be changed in the SSB column configuration).

If it is run on SLC6, it has no special dependencies. On SL5, the JSON.pm module should be installed.

The typical cron job should look like

55 0 * * * (/data/cmssst/MonitoringScripts/init.sh && perl /data/cmssst/MonitoringScripts/SiteComm/SSBScripts/jr_successrate.pl hctest /afs/cern.ch/user/c/cmssst/www/ssb/successrate_hctest.txt 24)
55 * * * * (/data/cmssst/MonitoringScripts/init.sh && perl /data/cmssst/MonitoringScripts/SiteComm/SSBScripts/jr_successrate.pl hctest /afs/cern.ch/user/c/cmssst/www/ssb/successrate_hctest6.txt 6)

The times above are in Central European Summer Time (CEST), unless it is possible to use UTC for cron jobs.

  • During winter time (CET), the execution time must be changed to 55 0 (00:55 CET = 23:55 UTC). This has been a frequent source of problems in the past.
CEST = UTC +2
CET = UTC +1

The first cron job is for the Site Readiness, while the second generates the information for the default view of the SSB.

cms-storage-info.pl

This script is used to feed the 'storage' view of the SSB with information taken from PhEDEx and the BDII. It is hard coded to write its output to files in /afs/cern.ch/cms/LCG/SiteComm/.

Other Metric Scripts

| Script/Name | Short Description | Machine | User | Output Path | Output URL | Scripting Structure | Status |
| Phedex Version | feeds phedex version metric | vocms077 | cmssst | /afs/cern.ch/user/c/cmssst/www/phedex_version | cmssst web area/phedex_version | new | running |
| ggus | feeds ggus metric | vocms077 | cmssst | /afs/cern.ch/user/c/cmssst/www/aaa | cmssst web area/aaa | new | running |
| space_monitoring | feeds space monitoring metric | vocms077 | cmssst | /afs/cern.ch/user/c/cmssst/www/space_mon/ | cmssst web area | new | running |
| LifeStatus | feeds LifeStatus | vocms077 | cmssst | /afs/cern.ch/user/c/cmssst/www/newLifeStatus | cmssst web area | new | running |
| New Lifestatus??? | ??? | vocms077 | cmssst | ??? | ??? | new | running |
| SiteReadiness | feeds the site readiness ranking and site status metrics; these two are the basic inputs of the waiting room and morgue concepts (to be confirmed) | vocms077 | cmssst | /afs/cern.ch/user/c/cmssst/www/sreadiness/toSSB | cmssst web area/sreadiness/toSSB {SiteReadiness_SSBfeed.txt, SiteReadinessRanking_SSBfeed_last15days.txt} | software engineering of this script set needs improvement | running |
| Waitingroom Controller | not used anymore | vocms077 | cmssst | /afs/cern.ch/user/c/cmssst/www/others/ | cmssst web area {WaitingRoom_Sites.txt, wr_log.txt} | old | running* |
| WRDays | depends on the old WR metric (#153), so it will fail once that metric is no longer fed; feeds metrics 154, 155 and 156, which were used by the SST members one generation before Memet to investigate the WR history of sites | vocms077 | cmssst | /afs/cern.ch/user/c/cmssst/www/others/ | cmssst web area {WaitingRoom_*MonthSum.txt} | old | running |
| Morgue | not used anymore | vocms077 | cmssst | /afs/cern.ch/user/c/cmssst/www/others/ | cmssst web area {morgue.txt, morgue.log} | old | running* |
| drain (Prod Status) | 158 | vocms077 | cmssst | /afs/cern.ch/user/c/cmssst/www/others/ | cmssst web area {drain_log.txt, drain.txt} | old | running |
| prod_cores | 159 | vocms077 | cmssst | /afs/cern.ch/user/c/cmssst/www/others/ | cmssst web area {prod.txt, prod.json} | old | running |

*: these metric scripts may be stopped soon (not yet confirmed).

-- AliMehmetAltundag - 2015-12-15

Topic revision: r9 - 2017-01-25 - StephanLammel
 