Site Support Team - Documentation
Production Related Activities

Relevant SAM tests

After reviewing the different metrics, these are the SAM tests that matter for MC production:
  • org.cms.WN-env
  • org.cms.SRM-GetPFNFromTFC
  • org.cms.SRM-VOGet
  • org.cms.SRM-VOPut
  • org.cms.WN-basic
  • org.cms.WN-frontier
  • org.cms.WN-mc
  • org.cms.WN-squid
  • org.cms.WN-swinst
  • org.cms.WN-xrootd-fallback ---> Not a requirement at the moment, but it will be really important in the near future.
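
For scripted checks, the list above can be kept as data. The following is a minimal sketch, not the SAM or SSB API: it assumes test results are available as a mapping from test name to a status string, and that "OK" marks a passing test. The function name, the input shape and the status value are all assumptions made for illustration.

```python
# SAM tests listed above as important for MC production.
REQUIRED_TESTS = [
    "org.cms.WN-env",
    "org.cms.SRM-GetPFNFromTFC",
    "org.cms.SRM-VOGet",
    "org.cms.SRM-VOPut",
    "org.cms.WN-basic",
    "org.cms.WN-frontier",
    "org.cms.WN-mc",
    "org.cms.WN-squid",
    "org.cms.WN-swinst",
]

# Listed above as not yet a requirement.
OPTIONAL_TESTS = ["org.cms.WN-xrootd-fallback"]


def failing_required_tests(results, passing_status="OK"):
    """Return the required tests that are missing or not passing.

    `results` maps test name -> status string (assumed shape).
    `passing_status` is an assumed value, adjust to the real data source.
    """
    return [t for t in REQUIRED_TESTS if results.get(t) != passing_status]
```

An empty return value means every required test in the list passed; the optional xrootd-fallback test is deliberately not checked.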

Site Pledges

  • For Production WMAgents, use the pledge information from:
    • SSB - Pledges view
    • The workflow team should update this information manually.
      • However, this is only a reference for monitoring usage and, as far as I know, has no influence on how much we actually run at a site.

How to update the Pledges for Production

  1. Find the number of slots (cores) available at the site
  2. Log in to SSB
    • You need to be an SSB admin with Modify Metrics privileges; if you are not, ask dashboard-support.cern.ch
  3. Go to the metric history plot
  4. Right-click the row in the plot whose value you want to change
  5. Update the Value and Status

How to find the Pledges per Site

  1. Check http://gstat-wlcg.cern.ch/apps/pledges/resources/
  2. Hover the mouse over a federated site to see the resources for its individual sites
  3. Convert the HS06-normalized CPU performance to slots
    • 1 slot = 10 HS06 (on average)
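
The conversion in step 3 can be sketched as a small helper. The function name is hypothetical, and rounding down to whole slots is an assumption; the only figure taken from this page is the average of 10 HS06 per slot.

```python
HS06_PER_SLOT = 10  # average quoted above: 1 slot = 10 HS06


def hs06_to_slots(hs06, hs06_per_slot=HS06_PER_SLOT):
    """Convert a pledge in HS06 to a number of slots (cores).

    Rounds down to whole slots (an assumption, not stated on this page).
    """
    if hs06 < 0:
        raise ValueError("HS06 pledge cannot be negative")
    return int(hs06 // hs06_per_slot)
```

For example, a pledge of 25000 HS06 corresponds to `hs06_to_slots(25000)`, that is 2500 slots.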

Sites out of Production (Waiting Room)

  • When a site comes OUT of the waiting room (WR)
    1. Communicate with the workflow team.
    2. Check whether the site needs to be re-commissioned for workflows (workflow team)
      • If the site has been in the WR for more than 8 weeks, it should be re-commissioned according to the procedures.
      • If the site was re-commissioned before and has been in the WR for less than 8 weeks, it just needs to be taken out of drain.
    3. Take the site out of drain (workflow team)
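
The 8-week rule above can be written down as a small decision helper. This is only a sketch: the function name and return strings are hypothetical. The page does not say what to do at exactly 8 weeks or for a site that was never commissioned, so the code assumes that exactly 8 weeks counts as "not more than 8" and that a never-commissioned site always needs commissioning.

```python
WR_RECOMMISSION_WEEKS = 8  # threshold quoted above


def action_on_leaving_waiting_room(weeks_in_wr, commissioned_before):
    """Decide what the workflow team does when a site leaves the WR.

    Assumptions (not stated on this page): exactly 8 weeks is treated
    as "not more than 8", and a site that was never commissioned is
    always sent through commissioning.
    """
    if weeks_in_wr > WR_RECOMMISSION_WEEKS or not commissioned_before:
        return "re-commission"
    return "remove from drain"
```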

How to Add/Remove sites in Drain

  1. Log in to any vocms machine
  2. sudo -u cmst1 /bin/bash
  3. vim ~cmst1/www/site-limits.conf --> requires AFS permissions to access the file (ask Edgar Fajardo and Jorge Amando Molina-Perez)
    1. To add: write drain next to the site name
    2. To remove: delete drain next to the site name, leaving it blank
    • drain = finish running jobs and don't send any more jobs to the site
    • skip = site has never been commissioned and jobs will not be sent
    • down = site was previously commissioned but is not going to be used anymore
    • [blank] = jobs will be sent to the site
  • A script running on all agents reads this text file. The cron jobs run every 15 minutes and update the agents by running the following command for each site in the file:
[cmssrv94] /data/srv/wmagent/current > ./config/wmagent/manage execute-agent wmagent-resource-control --drain --site-name=T1_US_FNAL
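
To illustrate what such a script might do, here is a minimal sketch, not the script actually deployed on the agents. The file layout is an assumption: one site per line, the site name followed by an optional state separated by whitespace, with `#` comment lines ignored. Only the `--drain` command shown above is generated, since that is the only invocation documented on this page; the function names are hypothetical.

```python
# States described above; the empty string stands for [blank].
VALID_STATES = {"drain", "skip", "down", ""}


def parse_site_limits(text):
    """Parse site-limits.conf content into {site_name: state}.

    Assumed layout (not confirmed by this page): "SITE_NAME [state]"
    per line, whitespace-separated, '#' lines treated as comments.
    """
    states = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        site = parts[0]
        state = parts[1].lower() if len(parts) > 1 else ""
        if state not in VALID_STATES:
            raise ValueError(f"unknown state {state!r} for site {site}")
        states[site] = state
    return states


def drain_commands(states):
    """Build the documented drain command for every site marked drain."""
    return [
        "./config/wmagent/manage execute-agent wmagent-resource-control "
        f"--drain --site-name={site}"
        for site, state in states.items()
        if state == "drain"
    ]
```

With a file containing `T1_US_FNAL drain`, `drain_commands` returns the single command shown above; sites with a blank state produce no command in this sketch.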

How to re-commission a site for Workflows

Commission a site in testbed by assigning WFs via scripts:

  1. Follow the procedure CompOpsWorkflowComissionT2Site
    • The assigned WF is a long one (>8,000 jobs), intended to take a long time to complete so that we don't have to keep assigning workflows until the testing period (a few days) is over.
  2. Follow the assigned WF: Procedures

Sites in Scheduled Downtime

Topic revision: r19 - 2015-02-04 - JulianBadillo
 