Tier-0 Operations - Useful commands

Contents:

  • WMAgent
  • Condor

Get information about the jobs sent by the agent running at a site (for example, T0_CH_CERN)

$manage execute-agent wmagent-resource-control --site-name=T0_CH_CERN -p

Example output

T0_CH_CERN - 0 running, 28 pending, 10000 running slots total, 5000 pending slots total, Site is Normal:
  Cleanup - 0 running, 0 pending, 160 max running, 80 max pending, priority 5
  Merge - 0 running, 4 pending, 1000 max running, 400 max pending, priority 5
  Harvesting - 0 running, 0 pending, 80 max running, 40 max pending, priority 3
  Skim - 0 running, 0 pending, 1 max running, 1 max pending, priority 3
  LogCollect - 0 running, 1 pending, 80 max running, 40 max pending, priority 3
  Processing - 0 running, 23 pending, 9000 max running, 5000 max pending, priority 0
  Repack - 0 running, 0 pending, 2500 max running, 500 max pending, priority 0
  Production - 0 running, 0 pending, 1 max running, 1 max pending, priority 0
  Express - 0 running, 0 pending, 9000 max running, 500 max pending, priority 0
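
The per-task counters in this output should add up to the site totals on the first line. A quick sanity check over saved output (a sketch using the example numbers above; `sum_pending` is just an illustrative helper name):

```shell
# Sum the per-task "pending" counters (5th field) from saved
# wmagent-resource-control output; the result should match the
# site-level pending total on the first line.
sum_pending() {
  awk '{sum += $5} END {print sum}'
}

# Example with two of the task lines above:
printf '%s\n' \
  "Merge - 0 running, 4 pending, 1000 max running, 400 max pending, priority 5" \
  "Processing - 0 running, 23 pending, 9000 max running, 5000 max pending, priority 0" |
  sum_pending    # prints 27
```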

Get the status of everything running in the pool (from the schedd)

This command should be executed from one of the schedds in the pool.
condor_status -schedd

Example output

Name                 Machine    TotalRunningJobs TotalIdleJobs TotalHeldJobs 

cmsgwms-submit1.fnal cmsgwms-su            19276         14202              9
cmsgwms-submit2.fnal cmsgwms-su             9322          9032              0
cmssrv113.fnal.gov   cmssrv113.                1             2              0
cmssrv218.fnal.gov   cmssrv218.              110          1329              0
cmssrv219.fnal.gov   cmssrv219.                0             0              0
vocms001.cern.ch     vocms001.c                8            40              0
vocms015.cern.ch     vocms015.c                0             0              0
vocms0230.cern.ch    vocms0230.                0             0              0
vocms0303.cern.ch    vocms0303.                0             0              0
vocms0308.cern.ch    vocms0308.            12413          3173              2
vocms0309.cern.ch    vocms0309.            24639         15850              8
vocms0310.cern.ch    vocms0310.            21581         18317            526
vocms0313.cern.ch    vocms0313.               27             1              0
vocms0314.cern.ch    vocms0314.               17             0              0
vocms039.cern.ch     vocms039.c                0             0              0
vocms047.cern.ch     vocms047.c                0             0              0
vocms053.cern.ch     vocms053.c                6             0              0
vocms074.cern.ch     vocms074.c             1113         19830              0
                      TotalRunningJobs      TotalIdleJobs      TotalHeldJobs
               Total             88513              81776                545
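
If you want pool-wide totals without reading the table by eye, the counters can be pulled with `-autoformat` and summed. A sketch: the `condor_status` call itself has to run on a host in the pool, so the summing step is shown here on sample numbers taken from the table above:

```shell
# Sum per-schedd running/idle counters. Live usage (on a pool host):
#   condor_status -schedd -af TotalRunningJobs TotalIdleJobs | sum_schedds
sum_schedds() {
  awk '{run += $1; idle += $2} END {print run, "running,", idle, "idle"}'
}

# Sample "running idle" pairs from the table above:
printf '%s\n' "19276 14202" "9322 9032" "24639 15850" | sum_schedds
# prints: 53237 running, 39084 idle
```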

Get the information about the VMs in a site (for example T0_CH_CERN) from a particular pool (-pool vocms007 refers to the collector of the pool)

condor_status -const 'ParentSlotId is undefined && GLIDEIN_CMSSite=?="T0_CH_CERN"' -totals -pool vocms007

WARNING: This queries the Central Collector of the pool; using this command in a cronjob may harm the Collector's performance.

Example output:

        X86_64/LINUX     1684     0       0        70       0          0

               Total     1684     0       0        70       0          0

This is a filtered version for when you only need the numeric value (it is used in the monitoring scripts):

condor_status -const 'GLIDEIN_CMSSite=?="T0_CH_CERN" && ParentSlotId is undefined' -totals | grep "X86_64/LINUX" | awk '{print $2}'

Example output:

1684

Turn off condor service

Stop condor first if you want your data to still be available. Then copy your spool directory to disk:

cp -r /mnt/ramdisk/spool /data/

When you reboot the machine, copy it back. Otherwise, just create an empty spool directory.
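
The steps above can be sketched as a small script. Paths are the ones from the example (/mnt/ramdisk/spool backed up to /data); stopping and starting the condor service itself is left as comments, since the exact command depends on the host:

```shell
#!/bin/sh
# Back up and restore the condor spool across a reboot.
# Stop condor before backing up, e.g.:  service condor stop
SPOOL=${SPOOL:-/mnt/ramdisk/spool}
BACKUP=${BACKUP:-/data}

backup_spool() {
  # copy the spool directory to persistent disk
  cp -r "$SPOOL" "$BACKUP/"
}

restore_spool() {
  # after the reboot, copy the spool back into place
  cp -r "$BACKUP/spool" "$(dirname "$SPOOL")/"
}
```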

Change the thresholds for jobs in the queue

(used by the agent to decide how many jobs to send to the queue).

The number of jobs per task type has to be consistent with the global number of jobs:

  • $manage execute-agent wmagent-resource-control --site-name=T2_CH_CERN_T0 --task-type=Repack --pending-slots=600
  • $manage execute-agent wmagent-resource-control --site-name=T0_CH_CERN --pending-slots=1600 --running-slots=1600 --apply-to-all-tasks
  • $manage execute-agent wmagent-unregister-wmstats `hostname -f`:9999

Number of Jobs per number of cores running in the pool

 condor_q -const 'JobStatus=?=2' -name vocms0314.cern.ch -af:h RequestCpus | sort | uniq -c

Details of the command

  • Basic condor command to check the queue: condor_q
  • Add constraints to the query of the queue: -const
  • Check only running jobs: 'JobStatus=?=2'
  • Check only the jobs sent from the vocms0314 schedd: -name vocms0314.cern.ch
  • Choose which ClassAd to show: -af:h RequestCpus
  • Sort and count the output: sort | uniq -c
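
The same histogram can be turned into a total-core count. A sketch with invented sample counts (the live version would pipe the condor_q output above into `total_cores`, an illustrative helper name):

```shell
# Each line from `... | sort | uniq -c` is "count cores"; multiply
# and sum to get the total number of cores in use by running jobs.
total_cores() {
  awk '{total += $1 * $2} END {print total}'
}

# Sample histogram: 950 single-core, 20 four-core, 7 eight-core jobs.
printf '%s\n' "950 1" "20 4" "7 8" | total_cores    # prints 1086
```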


How to choose a good run to get tarballs from?

Pending items:

  • Automate run selection for replays
  • Confirm whether the whole run is cleaned up, and with what periodicity
  • New plot for used cores
  • New plot for HighIO slot usage

--
-- Schedd: vocms0314.cern.ch : <128.142.209.169:4080?...

977 jobs; 0 completed, 0 removed, 0 idle, 977 running, 0 held, 0 suspended

<verbatim>
condor_q -w -const 'MATCH_GLIDEIN_CMSSite=?="T0_CH_CERN" && JobStatus=?=2 && RequestIoslots > 0 && CMS_JobType != "Merge"' -totals
</verbatim>
887 jobs; 0 completed, 0 removed, 0 idle, 887 running, 0 held, 0 suspended

<verbatim>
condor_q -w -const 'MATCH_GLIDEIN_CMSSite=?="T0_CH_CERN" && JobStatus=?=2 && RequestIoslots > 0 && CMS_JobType == "Merge"' -totals
</verbatim>

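
If you only need the running-job count from these summaries (e.g. for monitoring), the totals line can be parsed directly. A sketch over the sample output above (`running_jobs` is an illustrative helper name):

```shell
# Pull the "running" counter out of a condor_q totals line, e.g.
# "887 jobs; 0 completed, 0 removed, 0 idle, 887 running, 0 held, 0 suspended"
running_jobs() {
  sed -n 's/.* \([0-9][0-9]*\) running.*/\1/p'
}

echo "887 jobs; 0 completed, 0 removed, 0 idle, 887 running, 0 held, 0 suspended" |
  running_jobs    # prints 887
```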
Topic revision: r3 - 2016-07-11 - JohnHarveyCasallasLeon
 