Alan Malta Twiki Page

Useful links and documents (TODO: need to review and clean up these links)

General links

Important Links

Production Statistics

Grid and Transfer Issues

  • Downtimes - CMS scheduled and unscheduled downtimes
  • SiteReadiness - T1 Site Availability
  • NagiosT2Tests - SAM Availability based on Nagios tests (latest results for T2 sites)
  • SiteDB - Information about Sites and Site Admins
  • PledgeSlots - To check the pledge number of slots

Useful Monitoring Links

RelVal Validation

Debugging GlideIns

  • Check if your jobs are matching in the FE, i.e. whether the frontend is requesting glideins for them:
condor_status -any -const 'MyType=="glideresource"&&GLIDEIN_CMSSite=="T1_US_FNAL"' -format '%5i' GlideFactoryMonitorRequestedIdle -format '%s\n' Name | grep main
  • Check if new glideins are getting into the queues (if you find 0s, ask the FE expert):
condor_status -any -const 'MyType=="glideresource"&&GLIDEIN_CMSSite=="T1_US_FNAL"' -format '%5i' GlideFactoryMonitorStatusPending -format '%s\n' Name | grep main
  • Check if there are running Glideins (notify the FE expert in case of discrepancy):
condor_status -any -const 'MyType=="glideresource"&&GLIDEIN_CMSSite=="T1_US_FNAL"' -format '%5i' GlideFactoryMonitorStatusRunning -format '%s\n' Name | grep main

SVN commands

  • Pulling an SVN tree:
svn co svn+ssh://svn.cern.ch/reps/WmAgentScripts ./WmAgentScripts
  • Adding a file/directory to SVN; note that it won't be stored on the server until you run svn commit afterwards:
svn add <file/dir>
  • Recursively commit all changed, added and deleted files to the SVN server:
svn commit -m "<your message>"

GIT commands

  • Amending files from an existing commit in a PR:
git reset --soft HEAD~1
git reset FILENAME    ### to unstage a file
git checkout -- FILENAME ### to rollback that file to its original state before that commit
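The three commands above can be exercised end-to-end in a throwaway repository (assuming git is on the PATH; the file name and commit messages below are just illustrations):

```shell
set -e
REPO=$(mktemp -d) && cd "$REPO"
git init -q
echo v1 > FILENAME
git add FILENAME && git -c user.name=t -c user.email=t@t commit -qm "first"
echo v2 > FILENAME
git add FILENAME && git -c user.name=t -c user.email=t@t commit -qm "second"
git reset --soft HEAD~1     # drop the last commit, keep FILENAME staged
git reset FILENAME          # unstage the file
git checkout -- FILENAME    # roll the file back to the "first" content
cat FILENAME                # prints: v1
```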

LSF commands

  • Getting information about LSF host configuration, including host type, host model, CPU normalization factor, number of CPUs, maximum memory, and available resources:
lshosts -l HOST
  • Getting information about the server hosts in the LSF Batch System:
bhosts -l HOST
  • Getting a list of hosts that fulfill the resource requirements:
bhosts -R "select[type==SLC6_64]"
  • Submitting an LSF job to SLC6 worker nodes:
bsub -R "select[type==SLC6_64]" -q cmsrelval -J "$job" < $job

Condor job status

  • 0 - Unexpanded U
  • 1 - Idle I
  • 2 - Running R
  • 3 - Removed X
  • 4 - Completed C
  • 5 - Held H
  • 6 - Submission_err E
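The mapping above can be wrapped in a tiny helper for decoding the JobStatus integer that condor_q reports (the function name is just an illustration, not a Condor tool):

```shell
# Translate a Condor JobStatus integer into its one-letter code,
# following the table above.
job_status_letter() {
  case "$1" in
    0) echo U ;;  # Unexpanded
    1) echo I ;;  # Idle
    2) echo R ;;  # Running
    3) echo X ;;  # Removed
    4) echo C ;;  # Completed
    5) echo H ;;  # Held
    6) echo E ;;  # Submission_err
    *) echo "?" ;;
  esac
}

job_status_letter 2   # prints: R
```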

Hints for cmsDriver.py

This tool creates production configuration files. For example, the following command:
    # cmsDriver.py SingleMuPt1_cfi.py -s GEN,FASTSIM --conditions=MC_3XY_V10::All --eventcontent=RECOSIM  -n 10 --no_exec
  • SingleMuPt1_cfi.py: the configuration fragment (placed in Configuration/Generators/) whose settings are imported into the config file to be created.
  • GEN,FASTSIM: runs these CMSSW steps: generation followed by fast simulation.
  • MC_3XY_V10::All: the global tag used to apply the alignment and calibration constants.
  • RECOSIM: the output file of this job will contain RECOSIM content.
  • 10: the number of events to generate

Hints for WMStats

One could use the following couch query to get the status timestamps and calculate the time that the workflow took to complete:
https://cmsweb.cern.ch/couchdb/wmstats/_all_docs?keys=%5B%22jen_a_ACDC_Fall53_2011A_METBTag_Run2011A-v1_Prio1_5312p1_131101_030858_2617%22%5D&include_docs=true
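Once the status timestamps are extracted from the couch document (they are epoch seconds; the exact field layout depends on the wmstats schema), the elapsed time is simple shell arithmetic. The timestamps below are made up:

```shell
# Hypothetical epoch timestamps for the first and last status
# transitions of a workflow, taken from the wmstats document.
START=1383270538
END=1383356938
ELAPSED=$(( END - START ))
printf '%d days %02d:%02d:%02d\n' $(( ELAPSED / 86400 )) \
  $(( (ELAPSED % 86400) / 3600 )) $(( (ELAPSED % 3600) / 60 )) $(( ELAPSED % 60 ))
# prints: 1 days 00:00:00
```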

Other general hints

Getting CERN users info

phonebook -lo username
phonebook full_name

Hints for PhEDEx Data Service

  • To get information regarding the dataset/block/file, lfn, checksum, etc:
    https://cmsweb.cern.ch/phedex/datasvc/xml/prod/data?file=/store/data/Run2010A/EG/RAW-RECO/WZEG-Nov4Skim_v1/0042/7A527E82-31ED-DF11-855E-003048D439C6.root
  • Checking requests information about a given dataset:
    https://cmsweb-testbed.cern.ch/phedex/datasvc/xml/prod/requestlist?dataset=/EG/Run2010A-WZEG-Nov4Skim_v1/RAW-RECO
  • Checking requests information about a given dataset, transfer only type and to T2_DE_DESY only:
    https://cmsweb-testbed.cern.ch/phedex/datasvc/xml/prod/requestlist?dataset=/ReggeGribovPartonMC_EposLHC_pPb_4080_4080_DataBS/pPb816Spring16GS-MB_80X_mcRun2_pA_v4-v2/GEN-SIM&type=xfer&node=T2_DE_DESY
  • To get information regarding the block (filtering according to the node):
    https://cmsweb.cern.ch/phedex/datasvc/xml/prod/blockreplicas?dataset=/Jet/Run2010B-RelValRawSkim-v1/RAW&node=T1_US_FNAL*
  • Checking the TFC for a given site:
    https://cmsweb.cern.ch/phedex/datasvc/xml/prod/tfc?node=T2_CH_CERN
  • To get the PFN of a given file:
    https://cmsweb.cern.ch/phedex/datasvc/perl/prod/lfn2pfn?node=T1_FR_CCIN2P3_MSS&lfn=/store/unmerged/logs/prod/2012/12/13/spinoso_HIG-Summer12-00976_R2087_B222_00_LHE_121202_135447_6236/MonteCarloFromGEN/0160/0/013c8d40-4238-11e2-8352-003048f0e3f4-26796-0-logArchive.tar.gz&protocol=srmv2
  • Checking for all DQM samples at T2_CH_CERN:
    https://cmsweb.cern.ch/phedex/datasvc/xml/prod/data?node=T2_CH_CERN&dataset=*4_4_2*DQM
  • Getting the total number of files for a given dataset:
    curl -ks --cert $X509_USER_PROXY --key $X509_USER_PROXY "https://cmsweb.cern.ch/phedex/datasvc/perl/prod/blockreplicas?dataset=/QCD_Pt-30to50_MuEnrichedPt5_TuneZ2_7TeV_pythia6/Summer11Leg-START53_LV4-v1/GEN-SIM" | grep -A1 NAME | grep FILES | sed 's/,//g' | awk '{FILES+=$3}END{printf("total files: %d\n",FILES)}'
  • Getting all samples at T2_CH_CERN that match a specific regex:
    curl -k -H 'Accept: text/xml' 'https://cmsweb.cern.ch/phedex/datasvc/perl/prod/blockreplicas?dataset=/SingleElectron/CMSSW_7_4_0_pre8*/*&node=T2_CH_CERN'

Hints for DBS3 API

  • Querying for a dataset and its status:
    # https://cmsweb.cern.ch/dbs/prod/global/DBSReader/datasets?dataset=/RelValSingleMuPt100/CMSSW_7_0_0_pre11-START70_V4_FastSim_TEST_Agent0988_Validation_RelVal-v1/GEN-SIM-DIGI-RECO&dataset_access_type=*
  • Requesting dataset/block details, such as number of files, number of events and total file size:
    # https://cmsweb-testbed.cern.ch/dbs/prod/global/DBSReader/blocksummaries?dataset=/RelValSingleMuPt100/CMSSW_7_0_0_pre11-START70_V4_FastSim_TEST_Agent0988_Validation_RelVal-v1/GEN-SIM-DIGI-RECO
or
    # https://cmsweb-testbed.cern.ch/dbs/prod/global/DBSReader/blocks?dataset=/RelValSingleMuPt100/CMSSW_7_0_0_pre11-START70_V4_FastSim_TEST_Agent0988_Validation_RelVal-v1/DQM&detail=True

Git hints

  • Getting stats from the diff between two branches:
    # git diff --stat --color comp..comp_gcc481

  • Retrieving the diff of a file between two branches:
    # git diff --color comp..comp_gcc481 reqmon.spec

RPM hints

  • List all RPMs installed in the system and grep for expat packages:
    # rpm -qa | grep expat

  • Show all files installed by a given package:
    # rpm -ql expat-2.0.1-11.el6_2.x86_64

  • Listing symbols provided by a given library:
    # nm -a libexpat.so

  • Listing files stored in an rpm file:
    # rpm -qpl file.rpm

Hints for T1_US_FNAL

  • Copying a file from FNAL to your machine:
    # dccp dcap://cmsdca3.fnal.gov:24145/pnfs/fnal.gov/usr/cms/WAX/11/store/data/Run2010A/EG/RAW-RECO/WZEG-Nov4Skim_v1/0043/6AFF1EFA-3FED-DF11-AAF0-002481E1512E.root .

Troubleshooting Links

Linux Administration

  • Tracing a process and sending its system calls/logs to a file:
    # strace -r -p PID -f -o outputfile
  • Collecting overall IOPS over a 30-minute interval (for the vdb device), then computing the average of these values (a) over all readings and (b) discarding 0s, and finally listing the largest values:
    # iostat 1 1800 -d vdb | grep vdb | awk '{ print $2; }' > iops_30min
    # cat iops_30min | awk '{ sum += $1 } END { if (NR > 0) print sum / NR }'
    # egrep -v '^0.00' iops_30min | awk '{ sum += $1 } END { if (NR > 0) print sum / NR }'
    # cat iops_30min | sort -n | tail
  • Rsync a file with a less costly encryption algorithm:
    # rsync -e "ssh -c arcfour" /data/srv/logs/workqueue/old-logs-20141102-0107.zip amaltaro@alan-wmacloud:/data/amaltaro/
  • For the sake of my memory:
    # for comp in `cat ../components`; do echo $comp; tail -n100 install/wmagent/$comp/ComponentLog; echo ""; done > currentLogs
  • Querying CMSWEB from SL6 nodes:
    # wget -q --certificate=$X509_USER_PROXY --private-key=$X509_USER_PROXY --ca-certificate=$X509_USER_PROXY "https://cmsweb.cern.ch/sitedb/data/prod/whoami" -O -
  • Applying WMAgent patches:
    # wget https://github.com/dmwm/WMCore/pull/$PR_NUM.patch -O - | patch -d apps/wmagent/lib/python2.6/site-packages -p 3
  • Removing applied WMAgent patches:
    # wget https://github.com/dmwm/WMCore/pull/$PR_NUM.patch -O - | patch -d apps/wmagent/lib/python2.6/site-packages -p 3 -R
    or
    # patch -p0 -R < patch_file_path.patch
  • Component monitoring with `ps` as done in WMCore:
    # ps -p PID_NUMBER -o pid,ppid,rss,vsize,pcpu,pmem,cmd -ww | grep PID_NUMBER
  • Listing the background linux processes:
    # jobs
  • Converting an ugly PSet.py into a readable one :):
    # python -i PSet.py
    >>> print process.dumpPython()

  • Adding and committing a file to svn:
    # svn add CheckWorkQueueElements.py
    # svn commit mc/CheckWorkQueueElements.py

  • Cleaning up old MySQL files:
     mysql> PURGE BINARY LOGS BEFORE '2012-11-22 22:46:26'; 

  • To list all files greater than ~10G:
    # find . -type f -size +10G -exec ls -lh {} \; | awk '{ print $9 ": " $5 }'
  • Find a given wmbs_id in the local agent:
    # find . -name \*job_342678\*
  • To check which kind of shell you are using:
    # echo $0

  • Compiling python files:
    # python -m compileall <DIR>

  • To remove lines that contain a string:
    sed '/<string>/d' <Input_file> > <Output_file>

  • To replace a string with a new one in several files:
    sed -i 's/<old_string>/<new_string>/g' <Files>*

  • Replacing a comma separated list of runs by a pipe separated one:
    perl -p -i -e 's/([0-9]{6}),(?=[0-9]{6})/\1|/g' cmsDriver_standard_hlt.txt

  • To print only directories:
    # ls -d */

  • To convert an epoch time into human-readable time:
    # date -d @<epoch_time>

  • To check the used space at T2_CH_CERN for the RelVal team (logical bytes only):
    # eoscms quota | grep -A 5 /relval

  • List content in CMS EOS space at CERN:
    # eoscms ls /store/logs/prod/2015/10/WMAgent/

  • Copying files over with XROOTD (you need your proxy and the PFN from storage.xml):
    # xrdcp root://cmsxrootd.fnal.gov//store/unmerged/logs/prod/2016/10/21/mewu_RVCMSSW_8_1_0_pre13RunZeroBias2015D__RelVal_zb2015D_161019_140327_9153/HLTDR2_25ns/0000/1/025cb918-9755-11e6-ab8f-02163e00d7b3-0-1-logArchive.tar.gz .

  • Copy a file from CMS EOS storage to your local disk:
    # xrdcp root://eoscms.cern.ch//eos/cms//store/logs/prod/2015/10/WMAgent/amaltaro_MonteCarlo_Ext_alan_TEST_StageOut_v15_151001_131202_7326/amaltaro_MonteCarlo_Ext_alan_TEST_StageOut_v15_151001_131202_7326-LogCollect-s1wn80-1-logs.tar .

  • Checking the AFS quota:
    # fs lq ~

  • To check information related to a given file in EOS:
    # eos fileinfo $LFN

  • There are two output streams: stdout for normal output and stderr for warnings/errors.
    # command > myfile    # redirects the stdout of command to myfile
    # command 2> myfile   # redirects the stderr of command to myfile
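A quick way to see the two streams separated, using a compound command that writes to both (file names are arbitrary):

```shell
TMP=$(mktemp -d)
# One line goes to stdout, one to stderr; redirect each to its own file.
{ echo "normal output"; echo "a warning" >&2; } > "$TMP/out.txt" 2> "$TMP/err.txt"
cat "$TMP/out.txt"    # prints: normal output
cat "$TMP/err.txt"    # prints: a warning
# Merging both streams into a single file:
{ echo "normal output"; echo "a warning" >&2; } > "$TMP/both.txt" 2>&1
```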

  • Show all output lines of command except those containing the string str.
    # command | grep -v <str>

  • To check what services should be automatically started:
    # ntsysv
    # crontab -l  # minute hour day month week_day command_line

  • To logout a user:
    # pkill -KILL -u <user>

  • Command line to comment out all lines in a file (using vim):
      :1,$s/^/#/g

  • Command to erase the first word in all lines of the file (using vim):
      :%norm dw

  • Command to find all files whose names end in .ch:
    # find . -name "*.ch"

  • To check the selinux status on SLC5:
    # cat /etc/sysconfig/selinux

  • To list the active internet connections:
    # netstat -atunp

  • To check if a service is set up in the startup of the machine:
    # chkconfig --list | grep <service>
    # chkconfig --add <service>
    # chkconfig --level <run_levels> <service> on          // e.g.: chkconfig --level 2345 pool on
    # chkconfig <service> off

  • Network commands:
    # route add default gw <ip>
    # route -n

  • How to list, remove and install a new kernel:
    # rpm -q kernel
    # rpm -ev <kernel_returned_above>
    # yum install kernel kernel-devel

  • Checking the state of a hard disk (in a non-destructive way):
    # badblocks -o badblocks_found.dat -n -v /dev/sda4
        or
    # fsck.ext3 -n -v /dev/sda4

  • sed: Text manipulation such as replace/delete/etc. :
    # sed 's/old_string/new_string/g' <file_in.job >file_out.job
    # echo $1 | sed 's/\*//g'                # removing the stars from the string

  • awk: print only selected columns of the file (here columns 1, 2 and 5, with a space between the second and fifth):
    # awk '{print $1 $2 " " $5}' <filename>

  • nohup: Keeps a program running in background even after you log off from the terminal. Example:
    # nohup program &
  • grep: looking for a pattern (in this case, X86_64):
    • -R: recursively
    • -H: print the filename for each match
    • -n: Prefix each line of output with the line number within its input file.
    • -s: Suppress error messages about nonexistent or unreadable files.
    • -I: Process a binary file as if it did not contain matching data.
    • -i: Ignore case distinctions in both the PATTERN and the input files.
    # grep -R -H -I -n -s X86_64 *

  • networking: location of the files used to configure the network interfaces:
    # cd /etc/sysconfig/network-scripts/

  • Cleaning the bash history:
    # history -c

  • Checking out a given CVS branch:
    # cvs checkout -r branch_52X Configuration/PyReleaseValidation

Grid Administration

  • Commands related to proxy:
    # voms-proxy-init -debug -voms cms       # the cert/key pair of files should be placed under ~/.globus
    # voms-proxy-init --valid 192:00 -voms cms:/cms/Role=production
    # globusrun -a -r osgce.hepgrid.uerj.br   # testing authentication
    # voms-proxy-info -all -file <your proxy>

  • Copying the stageout file into your localhost
    # globus-url-copy -vb gsiftp://osgce.hepgrid.uerj.br//share/apps/scratch/<user>-stageout/<file> file://localhost//`pwd`/<file>
    # srmcp -debug -2 "srm://se-dcache.hepgrid.uerj.br:8443//srm/managerv2?SFN=/pnfs/hepgrid.uerj.br/data/cms/store/user/amaltaro/outfile.root" file:///`pwd`/Alan_output.root
    # lcg-cp -b -D srmv2 --vo cms "srm://se-dcache.hepgrid.uerj.br:8443//srm/managerv2?SFN=/pnfs/hepgrid.uerj.br/data/cms/store/user/amaltaro/outfile.root" file:///`pwd`/Alan_output.root
    # lcg-cp -b -D srmv2 --vo cms "srm://cmssrm.fnal.gov:8443/srm/managerv2?SFN=/11/store/data/Run2012A/MET/RAW/v1/000/191/226/1859BA1D-DA85-E111-ABB7-003048F118DE.root" file:///`pwd`/Alan_output.root
    # lcg-ls -b -D srmv2 --vo cms "srm://se-dcache.hepgrid.uerj.br:8443/srm/managerv2?SFN=/pnfs/hepgrid.uerj.br/data/cms/store/user/amaltaro/"
    # lcg-cp -v -b -D srmv2 srm://storm-fe-cms.cr.cnaf.infn.it:8444/srm/managerv2?SFN=/cms$tarball_lfn file:////`pwd`/logArchive.tar.gz
    # srmls "srm://se-dcache.hepgrid.uerj.br:8443/srm/managerv2?SFN=/pnfs/hepgrid.uerj.br/data/cms"

  • Manually updating CRLs (SE and GridFTP, respectively):
    # /opt/vdt/fetch-crl/sbin/fetch-crl --loc /opt/vdt/globus/TRUSTED_CA/ --out /opt/vdt/globus/TRUSTED_CA/
    # /opt/vdt-client/fetch-crl/sbin/fetch-crl --loc /opt/vdt-client/globus/TRUSTED_CA/ --out /opt/vdt-client/globus/TRUSTED_CA/

  • Submitting jobs via globus:
    # globus-job-run osgce.hepgrid.uerj.br:/jobmanager-condor /bin/uname -a
    # globus-job-run red.unl.edu:/jobmanager-condor /bin/uname -a
    # globus-job-run cit-gatekeeper.ultralight.org:/jobmanager-condor /bin/uname -a

  • Condor commands:
    # condor_hold -reason 'alan messing around' 6502.11
    # condor_rm -const 'CMS_JobType =?= "Merge"'
    # condor_rm -const 'WMAgent_RequestName == "amaltaro_ACDC3_task_SUS-RunIIWinter15wmLHE-00039__v1_T_150817_164946_9264"'
    # condor_submit <submit.file>
    # watch -n 1 'condor_q'
    # condor_qedit 763.0 RequestMemory 2200
    # condor_qedit 261.0 DESIRED_Sites "\"T1_US_FNAL\""
    # condor_qedit -const 'JobStatus=?=1 && (CMS_JobType =?= "Production" || CMS_JobType =?= "Processing")' JobPrio 999999
    # condor_qedit -const 'JobStatus=?=1 && isUndefined(REQUIRED_OS)' REQUIRED_OS "\"any\""
    # condor_q -af:h MaxWallTimeMins
    # condor_q -af:h WMAgent_SubTaskName RequestCpus RequestMemory RequestDisk MachineCount AccountingGroup MaxWallTimeMins
    # condor_rm -name analysis4@analysis4.local64 <user>
    # condor_q -global | grep running
    # condor_q -hold
    # condor_status -submitters
    # condor_status -run
    # condor_rm <job_id>
    # condor_q -bet <job_id>
    # condor_q -global -better-analyze <job_id>
    # condor_prio -p 20 <job_id>      ### Giving the maximum priority in the local queue
    # condor_history -const 'regexp("_150903_",WMAgent_RequestName)' -format "%s " MaxWallTimeMins  -format "%s " RequestDisk -format "%s \n" WMAgent_SubTaskName
    # condor_userprio -all -allusers
    # condor_history -const 'WMAgent_RequestName == "amaltaro_RV620pre5slc6_TEST_SL6_FNAL_SingleMuPt100_131010_150812_226"' -format "%s " LastRemoteHost  -format "%s " REQUIRED_OS -format "%s \n" WMAgent_SubTaskName
  • Getting the total number of autoclusters or listing them in a given schedd:
    # condor_status -schedd -af:h name Autoclusters | grep submit
    # condor_q -name vocms0250.cern.ch -autocluster
  • Checking where a workflow ran:
    # condor_history -const 'WMAgent_RequestName == "amaltaro_ReDigi_cmsRun2_LumiMask_DMWM_TEST_ALAN_location_v4_160210_100338_3679"' -af:h DESIRED_Sites MachineAttrGLIDEIN_CMSSite0
  • Looking for condor jobs running in a specific site:
    # condor_q -w -constraint 'MachineAttrGLIDEIN_CMSSite0 == "'T2_DE_DESY'"'
  • Condor history to get the exact reason for condor remove:
    # condor_history -b -m 1 15824.1 -format '%s ' DiskUsage -format '%s ' NumShadowStarts -format '%s\n' NumJobStarts
  • Get the number of running jobs per site and per schedd (run it from any agent):
    # condor_q -g -const 'MachineAttrGLIDEIN_CMSSite0=?="T2_CH_CERN"' -format '%s\n' User | sort | uniq -c
  • Query for jobs pending with T3_US_NERSC in the DESIRED_Sites list:
    # condor_q -const 'JobStatus==1 && stringListIMember("T3_US_NERSC", DESIRED_Sites,",")' -af WMAgent_SubTaskName DESIRED_Sites
  • Changing a WMAgent schedd priority in the condor collector (vocms97):
    # condor_userprio -all -allusers
    # condor_userprio -setfactor cmst1@vocmsxxx.cern.ch 100000.00

  • CRAB / CMSSW commands:
    # scramv1 list CMSSW
    # scramv1 project -n <project_name> CMSSW CMSSW_4_1_2_patch1
    # eval `scramv1 runtime -sh`                                 // or just "cmsenv"
    # crab -create
    # crab -validateCfg
    # crab -submit
    # crab -status
    # crab -status -c <project_dir>
    # crab -getoutput
    # crab -listMatch
    # crab -uploadLog

  • Checking the validity of the certificates (CEs):
    # openssl x509 -in  /data/srv/current/auth/proxy/proxy.cert -noout -dates -subject
    # openssl x509 -in /etc/grid-security/hostcert.pem -dates -noout
    # openssl x509 -noout -subject -in /etc/grid-security/hostcert.pem | sed -e 's/^subject= //'
    # openssl x509 -text -in /etc/grid-security/hostcert.pem | grep Subject:   # to get the hosts' DN
    # openssl x509 -dates -in /etc/grid-security/http/httpcert.pem -noout
    # openssl x509 -dates -in /etc/grid-security/rsvcert.pem -noout
    # openssl x509 -dates -in /etc/grid-security/containercert.pem -noout

  • Starting/stopping services on OSGCE:
    # source /opt/osg/setup.sh
    # vdt-control -off
    # vdt-control -list
    # vdt-control -<on/off> <service>

  • Logfiles to check the GUMS service:
    # tail -f /opt/gums/tomcat/v55/logs/gums-service-admin.log
    # vim /etc/grid-security/grid-mapfile

  • Extracting the *.pem certificate/key through the *.p12 file.
    # openssl pkcs12 -clcerts -nokeys -in cert.p12 -out usercert.pem
    # openssl pkcs12 -nocerts -in cert.p12 -out userkey.pem

  • Making a DBS query using the CLI (don't forget to initialize your proxy):
    # dbs listBlocks --report --path=/MinimumBias/Commissioning10-Sep17ReReco_v2/RECO
    # find dataset, block, block.size where dataset=/HIAllPhysics/HIRun2010-v1/RAW and run=152698

  • Making a PhEDEx XML query through the URL (you can replace "xml" with "json"):
    # https://cmsweb.cern.ch/phedex/datasvc/xml/prod/blockreplicas?node=T2_CH_CERN&dataset=*4_4_2*DQM

Workload Management (WMCore stuff)

Locally killing (aborting/force-completing) a workflow inside an agent

$manage execute-agent kill-workflow-in-agent pdmvserv_task_HIN-pPb816Spring16GS-00123__v1_T_170913_201506_4093

Pushing couchapps to couch

$manage execute-agent wmagent-couchapp-init

Renewing myproxy in the production agents

From one of the agents, check the timeleft for the long term proxy:
(export X509_USER_CERT=/data/certs/myproxy.pem; export X509_USER_KEY=/data/certs/myproxy.pem; myproxy-info -v -l amaltaro -s "myproxy.cern.ch" -k amaltaroCERN)

Renewing the "amaltaroFNAL" long term proxy. Connect to a FNAL node or cmslpc cluster and run:

myproxy-init -v -l amaltaro -c 720 -t 168 -s "myproxy.cern.ch" -x -Z "/DC=org/DC=(opensciencegrid|incommon)/*/CN=(wmagent/|)wmagent.fnal.gov" -n -k amaltaroFNAL

Renewing the "amaltaroCERN" proxy. Connect to vocms0192 and run:

myproxy-init -v -l amaltaro -c 720 -t 168 -s "myproxy.cern.ch" -x -Z "/DC=ch/DC=cern/OU=computers/CN=wmagent/(vocms0192|vocms0280).cern.ch" -n -k amaltaroCERN

Adding a site to the WMAgent resource-control database

Method 1: if the site has a proper registration in SiteDB, including a proper PSN and PNN:

$manage execute-agent wmagent-resource-control --plugin=SimpleCondorPlugin --pending-slots=100 --running-slots=200 --add-one-site T3_US_Colorado

Method 2: mostly for T3s or opportunistic resources that don't have a PNN (storage). Be *careful*: wrong information can break AgentStatusWatcher.

$manage execute-agent wmagent-resource-control --site-name=T1_US_FNAL --cms-name=T1_US_FNAL --ce-name=T1_US_FNAL --pnn=T1_US_FNAL_Disk --plugin=SimpleCondorPlugin --pending-slots=1500 --running-slots=4000
$manage execute-agent wmagent-resource-control --site-name=T1_US_FNAL --task-type=Processing --pending-slots=1500 --running-slots=4000
$manage execute-agent wmagent-resource-control --site-name=T1_US_FNAL --task-type=Production --pending-slots=1500 --running-slots=4000
$manage execute-agent wmagent-resource-control --site-name=T1_US_FNAL --task-type=Merge --pending-slots=50 --running-slots=50
$manage execute-agent wmagent-resource-control --site-name=T1_US_FNAL --task-type=Cleanup --pending-slots=50 --running-slots=50
$manage execute-agent wmagent-resource-control --site-name=T1_US_FNAL --task-type=LogCollect --pending-slots=50 --running-slots=50
$manage execute-agent wmagent-resource-control --site-name=T1_US_FNAL --task-type=Skim --pending-slots=50 --running-slots=50
$manage execute-agent wmagent-resource-control --site-name=T1_US_FNAL --task-type=Harvesting --pending-slots=10 --running-slots=20

Updating the thresholds for a site, setting the same values for ALL task types:

$manage execute-agent wmagent-resource-control --site-name=T1_US_FNAL --pending-slots=100 --running-slots=1000 --apply-to-all-tasks

Handy database queries

Getting the number of files available for unfinished subscriptions (given a workflow name):

SELECT wmbs_subscription.id AS subId, wmbs_workflow.task AS wfName, wmbs_workflow.injected As wfInj, wmbs_fileset.name AS fsetName,
  wmbs_fileset.open AS fsetOpen, wmbs_subscription.finished AS subFin, wmbs_sub_types.name AS subType, count(wmbs_sub_files_available.fileid) AS filesAvail
  FROM wmbs_subscription
    INNER JOIN wmbs_sub_types ON wmbs_subscription.subtype = wmbs_sub_types.id
    INNER JOIN wmbs_workflow ON wmbs_subscription.workflow = wmbs_workflow.id
    INNER JOIN wmbs_fileset ON wmbs_subscription.fileset = wmbs_fileset.id
    INNER JOIN wmbs_sub_files_available ON wmbs_sub_files_available.subscription = wmbs_subscription.id
  WHERE wmbs_workflow.name='pdmvserv_task_MUO-RunIIFall18GS-00027__v1_T_190627_203802_3573'
    AND wmbs_subscription.finished = 0
 GROUP BY wmbs_subscription.id, wmbs_workflow.task, wmbs_workflow.injected, wmbs_fileset.name, wmbs_fileset.open,
   wmbs_subscription.finished, wmbs_sub_types.name;

Getting the number of jobs in the 'created' status by workflows

SELECT wmbs_workflow.name, count(wmbs_job.id)
FROM wmbs_job
INNER JOIN wmbs_jobgroup ON
wmbs_job.jobgroup = wmbs_jobgroup.id
INNER JOIN wmbs_subscription ON
wmbs_jobgroup.subscription = wmbs_subscription.id
INNER JOIN wmbs_sub_types ON
wmbs_subscription.subtype = wmbs_sub_types.id
INNER JOIN wmbs_job_state ON
wmbs_job.state = wmbs_job_state.id
INNER JOIN wmbs_workflow ON
wmbs_subscription.workflow = wmbs_workflow.id
WHERE wmbs_job_state.name = 'created'
GROUP BY wmbs_workflow.name;

Getting a map of workflow -> subscriptions -> fileset for a specific workflow:

  # SELECT wmbs_subscription.id AS subId, wmbs_workflow.task AS wfName, wmbs_workflow.injected As wfInj, wmbs_fileset.name AS fsetName,
  wmbs_fileset.open AS fsetOpen, wmbs_subscription.finished AS subFin, wmbs_sub_types.name AS subType 
  FROM wmbs_subscription
    INNER JOIN wmbs_sub_types ON wmbs_subscription.subtype = wmbs_sub_types.id
    INNER JOIN wmbs_workflow ON wmbs_subscription.workflow = wmbs_workflow.id
    INNER JOIN wmbs_fileset ON wmbs_subscription.fileset = wmbs_fileset.id
  WHERE wmbs_workflow.name='sagarwal_task_TOP-RunIIFall18wmLHEGS-00109__v1_T_181105_200726_3792'
    AND wmbs_fileset.open = 1
    AND wmbs_subscription.finished = 0
  ORDER BY fsetName;

Investigating held jobs in bossAir and wmbs

SELECT rj.id, rj.wmbs_id, rj.grid_id, rj.status, rj.sched_status, rj.status_time,
  wmbs.state AS wmbs_state, wmbs.state_time AS wmbs_state_time, wmbs.outcome AS wmbs_outcome, wmbs.cache_dir
  FROM bl_runjob rj INNER JOIN wmbs_job wmbs ON rj.wmbs_id = wmbs.id
  WHERE sched_status = 5;

Getting the number of jobs in each status for a specific workflow

SELECT wmbs_job_state.name, count(wmbs_job.id)
FROM wmbs_job
INNER JOIN wmbs_jobgroup ON
wmbs_job.jobgroup = wmbs_jobgroup.id
INNER JOIN wmbs_subscription ON
wmbs_jobgroup.subscription = wmbs_subscription.id
INNER JOIN wmbs_sub_types ON
wmbs_subscription.subtype = wmbs_sub_types.id
INNER JOIN wmbs_job_state ON
wmbs_job.state = wmbs_job_state.id
INNER JOIN wmbs_workflow ON
wmbs_subscription.workflow = wmbs_workflow.id
WHERE wmbs_workflow.name = 'pdmvserv_task_HIG-RunIIWinter15wmLHE-00561__v1_T_151207_101950_7247'
GROUP BY wmbs_job_state.name;

Given a workflow task and an input file, get all the WMBS job IDs that processed part of that input file and failed:

SELECT wj.id AS jobid, wfd.id AS fileid, wfd.events, wfd.first_event, wfd.merged
                 FROM wmbs_file_details wfd
                 INNER JOIN wmbs_job_assoc wja ON wja.fileid = wfd.id
                 INNER JOIN wmbs_job wj ON wj.id = wja.job
                 INNER JOIN wmbs_jobgroup wjg ON wjg.id = wj.jobgroup
                 INNER JOIN wmbs_subscription ws ON ws.id = wjg.subscription
                 INNER JOIN wmbs_workflow ww ON ww.id = ws.workflow
                 WHERE wfd.lfn = 'MCFakeFile-pdmvserv_SUS-RunIISpring16FSPremix-00055_00034_v0__160929_122033_2363-Production-fa3a91bc780de492218c2fe34047a1c3'
                 AND wj.outcome=0 AND ww.task = '/pdmvserv_SUS-RunIISpring16FSPremix-00055_00034_v0__160929_122033_2363/Production';

Check number of jobs in each job state

SELECT wmbs_job_state.name, COUNT(*) FROM wmbs_job
JOIN wmbs_job_state ON (wmbs_job.state = wmbs_job_state.id)
GROUP BY wmbs_job.state, wmbs_job_state.name;

Given a workflow, get an overview of the number of jobs in each state and job type:

SELECT wmbs_sub_types.name AS job_type, wmbs_job_state.name as state, COUNT(*) AS count FROM wmbs_job
            INNER JOIN wmbs_jobgroup ON wmbs_job.jobgroup = wmbs_jobgroup.id
            INNER JOIN wmbs_subscription ON wmbs_jobgroup.subscription = wmbs_subscription.id
            INNER JOIN wmbs_workflow ON wmbs_workflow.id = wmbs_subscription.workflow
            INNER JOIN wmbs_sub_types ON wmbs_subscription.subtype = wmbs_sub_types.id
            INNER JOIN wmbs_job_state ON wmbs_job.state = wmbs_job_state.id
            WHERE wmbs_workflow.name = 'cmsunified_ACDC0_task_TOP-RunIISummer15wmLHEGS-00341__v1_T_191210_085250_7580'
            GROUP BY wmbs_sub_types.name, wmbs_job_state.name; 

Given a workflow, get an overview of the number of jobs for each task, in each state and job type, together with the job outcome (1 is success):

SELECT wmbs_workflow.task, wmbs_sub_types.name AS job_type, wmbs_job_state.name as state, wmbs_job.outcome, COUNT(*) AS count FROM wmbs_job
            INNER JOIN wmbs_jobgroup ON wmbs_job.jobgroup = wmbs_jobgroup.id
            INNER JOIN wmbs_subscription ON wmbs_jobgroup.subscription = wmbs_subscription.id
            INNER JOIN wmbs_workflow ON wmbs_workflow.id = wmbs_subscription.workflow
            INNER JOIN wmbs_sub_types ON wmbs_subscription.subtype = wmbs_sub_types.id
            INNER JOIN wmbs_job_state ON wmbs_job.state = wmbs_job_state.id
            WHERE wmbs_workflow.name = 'pdmvserv_task_TOP-RunIISummer15wmLHEGS-00427__v1_T_191009_224926_441'
            GROUP BY wmbs_workflow.task, wmbs_sub_types.name, wmbs_job_state.name, wmbs_job.outcome
            ORDER BY wmbs_workflow.task;

Given a workflow, list the WMBS job ID, state and job type for every job:

SELECT wmbs_sub_types.name AS job_type, wmbs_job_state.name as state, wmbs_job.id FROM wmbs_job
            INNER JOIN wmbs_jobgroup ON wmbs_job.jobgroup = wmbs_jobgroup.id
            INNER JOIN wmbs_subscription ON wmbs_jobgroup.subscription = wmbs_subscription.id
            INNER JOIN wmbs_workflow ON wmbs_workflow.id = wmbs_subscription.workflow
            INNER JOIN wmbs_sub_types ON wmbs_subscription.subtype = wmbs_sub_types.id
            INNER JOIN wmbs_job_state ON wmbs_job.state = wmbs_job_state.id
            WHERE wmbs_workflow.name = 'cmsunified_ACDC0_task_TOP-RunIISummer15wmLHEGS-00341__v1_T_191210_085250_7580'
            ORDER BY wmbs_sub_types.name, wmbs_job_state.name; 

Getting an overview of wmbs_job: number of jobs in each wmbs_job state, grouped by the job type and state.

SELECT wmbs_sub_types.name AS job_type, wmbs_job_state.name as state, COUNT(*) AS count FROM wmbs_job
            INNER JOIN wmbs_jobgroup ON wmbs_job.jobgroup = wmbs_jobgroup.id
            INNER JOIN wmbs_subscription ON wmbs_jobgroup.subscription = wmbs_subscription.id
            INNER JOIN wmbs_sub_types ON wmbs_subscription.subtype = wmbs_sub_types.id
            INNER JOIN wmbs_job_state ON wmbs_job.state = wmbs_job_state.id
            GROUP BY wmbs_sub_types.name, wmbs_job_state.name; 

Getting an overview of bl_runjob: number of jobs in each sched_status and their status in the agent (active or not):

SELECT bl_status.name AS sched_status, status AS active, count(*) AS count
            FROM bl_runjob
            INNER JOIN bl_status ON bl_runjob.sched_status = bl_status.id
            LEFT OUTER JOIN wmbs_users ON wmbs_users.id = bl_runjob.user_id
            INNER JOIN wmbs_job ON wmbs_job.id = bl_runjob.wmbs_id
            GROUP BY sched_status, active;

Count the job types in created status (in the JobSubmitter queue)

SELECT wmbs_sub_types.name, COUNT(*) FROM wmbs_job
INNER JOIN wmbs_jobgroup ON wmbs_job.jobgroup = wmbs_jobgroup.id
INNER JOIN wmbs_subscription ON wmbs_jobgroup.subscription = wmbs_subscription.id
INNER JOIN wmbs_sub_types ON wmbs_subscription.subtype = wmbs_sub_types.id
WHERE wmbs_job.state = (SELECT id FROM wmbs_job_state WHERE name = 'created')
GROUP BY wmbs_subscription.subtype, wmbs_sub_types.name;

Get acquired files, subscription and location for a specific workflow id:

SELECT wmbs_sub_files_acquired.fileid, wmbs_sub_files_acquired.subscription, wpnn.pnn AS pnn
  FROM wmbs_sub_files_acquired
  INNER JOIN wmbs_subscription ON wmbs_sub_files_acquired.subscription = wmbs_subscription.id
  INNER JOIN wmbs_workflow ON wmbs_subscription.workflow = wmbs_workflow.id
  INNER JOIN wmbs_file_location ON wmbs_sub_files_acquired.fileid = wmbs_file_location.fileid
  INNER JOIN wmbs_pnns wpnn ON wmbs_file_location.pnn = wpnn.id
  WHERE wmbs_workflow.id=207 ORDER BY pnn;

Get the job mask and the amount of input files each wmbs job is supposed to process:

SELECT wmbs_job.id, wmbs_job_mask.firstlumi, wmbs_job_mask.lastlumi, count(wmbs_job_assoc.fileid) AS num_files
  FROM wmbs_job INNER JOIN wmbs_job_mask ON wmbs_job.id=wmbs_job_mask.job
  INNER JOIN wmbs_job_assoc ON wmbs_job.id=wmbs_job_assoc.job 
  WHERE cache_dir like '%TOP-RunIISummer16DR80Premix-00266_1/TOP-RunIISummer16MiniAODv2-00263_0/JobCollection%'
  GROUP BY wmbs_job.id, wmbs_job_mask.firstlumi, wmbs_job_mask.lastlumi
  ORDER BY num_files DESC, wmbs_job.id;

Get the job mask for every job in a workflow, ordered by workflow task:

SELECT wmbs_job_mask.*, wmbs_workflow.task FROM wmbs_job_mask
  INNER JOIN wmbs_job ON wmbs_job.id = wmbs_job_mask.job
  INNER JOIN wmbs_jobgroup ON wmbs_job.jobgroup = wmbs_jobgroup.id
  INNER JOIN wmbs_subscription ON wmbs_jobgroup.subscription = wmbs_subscription.id
  INNER JOIN wmbs_workflow ON wmbs_subscription.workflow = wmbs_workflow.id
  WHERE wmbs_workflow.name = 'amaltaro_TC_PreMix_Apr2019_Validation_190416_212623_616' ORDER BY wmbs_workflow.task;

Given a file name, find all its run/lumi mapping:

SELECT * FROM wmbs_file_runlumi_map
  WHERE fileid=(SELECT id from wmbs_file_details where lfn='MCFakeFile-pdmvserv_task_HIG-RunIISummer15wmLHEGS-02325__v1_T_190324_065355_1119-HIG-RunIISummer15wmLHEGS-02325_0-a7eaaaa702533cd0213c409632df836f');

Given a block name (and file names if needed), find out which workflow created them:

SELECT dbsbuffer_block.id AS blockid, dbsbuffer_block.blockname AS blockname,
 dbsbuffer_file.id AS fileid, dbsbuffer_file.lfn AS lfn,
 dbsbuffer_workflow.id AS wfid, dbsbuffer_workflow.name AS name FROM dbsbuffer_block
 INNER JOIN dbsbuffer_file ON dbsbuffer_block.id = dbsbuffer_file.block_id
 INNER JOIN dbsbuffer_workflow ON dbsbuffer_file.workflow = dbsbuffer_workflow.id
 WHERE dbsbuffer_block.blockname
 IN ('/TTJets_SingleLeptFromT_TuneCP5_13TeV-madgraphMLM-pythia8/RunIIAutumn18MiniAOD-102X_upgrade2018_realistic_v15-v1/MINIAODSIM#db434bf2-1827-484d-93b6-2317a2a9d422',
 '/TTJets_SingleLeptFromT_TuneCP5_13TeV-madgraphMLM-pythia8/RunIIAutumn18MiniAOD-102X_upgrade2018_realistic_v15-v1/MINIAODSIM#6070c12c-9db4-4a2e-bc06-31210e06d6dc')
 AND dbsbuffer_file.lfn
 IN ('/store/mc/RunIIAutumn18MiniAOD/TTJets_SingleLeptFromT_TuneCP5_13TeV-madgraphMLM-pythia8/MINIAODSIM/102X_upgrade2018_realistic_v15-v1/70000/78504BE6-7D0C-504D-AC33-4633FB046646.root',
 '/store/mc/RunIIAutumn18MiniAOD/TTJets_SingleLeptFromT_TuneCP5_13TeV-madgraphMLM-pythia8/MINIAODSIM/102X_upgrade2018_realistic_v15-v1/70000/99D23191-62CC-8A4B-8DDF-175BB2786FFC.root');

Unregistering a WMAgent instance from the WMStats watch

The smart way is:
$manage execute-agent wmagent-unregister-wmstats `hostname -f`

The harder way. First, source the agent environment:

source /data/srv/wmagent/current/apps/wmagent/etc/profile.d/init.sh

Then, in a Python interpreter, run the following (adjusting the agent and WMStats URLs):
from WMCore.Services.WMStats.WMStatsWriter import WMStatsWriter
wmstats = WMStatsWriter("https://cmsweb-testbed.cern.ch/couchdb/wmstats")

agentUrl = "cmsgwms-submit1.fnal.gov:9999"
wmstats.deleteDocsByIDs([agentUrl])

WMAgent deployment

1. Once the agent is fully drained (no jobs in the system, all blocks and files injected, etc.), you can then connect to the machine and stop the agent and services. Also cross-check that all processes have been stopped:
ps aux | egrep 'wmcore|couch|beam|mysql'
2. Check the validity of the service certificate and remove (or not) the old WMAgent version:
openssl x509 -in /data/certs/servicecert.pem -dates -subject -noout
3. Start a fresh new terminal and update the secrets file:
vim /data/admin/wmagent/WMAgent.secrets
cd /data/srv
4. Get the deployment script from GH:
wget --no-check-certificate https://raw.githubusercontent.com/amaltaro/scripts/master/deployProd.sh
5. Run the deployment script providing arguments in command line:
./deployProd.sh -w 0.9.95b -c HG1406e -s slc5_amd64_gcc461 -t reproc_lowprio -f mysql -n 3 | tee log
6. Cross-check all the logs to make sure everything succeeded
7. Source the new WMA environment
source /data/admin/wmagent/env.sh
8. Double check the agent configuration
vim config/wmagent/config.py
9. Provided everything is ok, you can start the agent
$manage start-agent

Injecting a PSet configuration file into CouchDB

In order to upload a PSet.py configuration file to central CouchDB, we need both the CMSSW and WMCore libraries in the path.
1. log in to lxplus or aiadm
2. create a CMSSW project area:
cd workarea/
source /cvmfs/cms.cern.ch/cmsset_default.sh
export SCRAM_ARCH=slc6_amd64_gcc630
scram project CMSSW_9_3_7
cd CMSSW_9_3_7/src
cmsenv
3. download and change the PSet.py file you want to upload
4. clone the wmcontrol repo:
git clone https://github.com/cms-PdmV/wmcontrol
5. access the wmcontrol/modules dir. Then create your proxy and source the wmcontrol script:
export X509_USER_PROXY=/tmp/x509up_u5514
source /afs/cern.ch/cms/PPD/PdmV/tools/wmclient/current/etc/wmclient.sh
6. finally, open the python interpreter and run something like

from wma import upload_to_couch

step_cfg_name = "/afs/cern.ch/user/a/amaltaro/workarea/CMSSW_9_3_7/src/pset.py"
section_name = "alan_test"
user_name = "amaltaro"
group_name = "DATAOPS"
url = "https://cmsweb-testbed.cern.ch/couchdb"

res = upload_to_couch(step_cfg_name, section_name, user_name, group_name, url=url)

Unpickling a WMAgent job config

Once you have a logArchive tarball, you will find two PSet files in it: PSet.py is just a wrapper, while PSet.pkl holds the real job information. You first need a CMSSW environment (to use the FWCore module):
amaltaro@lxplus135:~ $ cd workarea/trustPU/
source /cvmfs/cms.cern.ch/cmsset_default.sh
export SCRAM_ARCH=slc6_amd64_gcc530
scram project CMSSW_8_1_0
cd CMSSW_8_1_0/src
cmsenv
cp ~/workarea/trustPU/cmsRun1/PSet.pkl .

then you can unpickle and dump its configuration with dumpPython() on the process object.

amaltaro@lxplus135:~/workarea/trustPU/CMSSW_5_3_19/src $ python
import pickle
with open('PSet.pkl', 'rb') as pf:  # binary mode, required for pickle
    procObj = pickle.load(pf)

with open('myPSet.py', 'w') as f:
    f.write(procObj.dumpPython())

Looking for a specific LFN in the agent condor logs

First, we need to get the list of files available for a workflow, inside a specific agent, e.g.:
tree -f -i ../install/wmagent/JobArchiver/logDir/p/pdmvserv_task_SUS-RunIIFall15DR76-00098__v1_T_160223_044155_6562/ | grep tar > alan
Then, we extract the condor.out log file from each of those tarballs and grep for the LFN with:
for f in `cat alan`; do echo "$f"; tar -jxf "$f" -O --wildcards '*/condor.*.out' | grep "2673FE5D-DCDC-E511-A120-001E67396ACC.root"; done > alanLog

Now check the text file alanLog and you should see which tarball has it.
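The same search can be done in Python with the tarfile module. A sketch assuming the `*/condor.*.out` member layout from the shell loop above; the function name is mine:

```python
import tarfile

def find_lfn_in_tarballs(tarballs, lfn):
    """Return the tarballs whose condor stdout logs mention the given LFN."""
    matches = []
    for path in tarballs:
        with tarfile.open(path, "r:*") as tar:  # "r:*" autodetects compression
            for member in tar.getmembers():
                # only inspect condor stdout logs, like */condor.*.out above
                if "condor." not in member.name or not member.name.endswith(".out"):
                    continue
                payload = tar.extractfile(member).read()
                if lfn.encode() in payload:
                    matches.append(path)
                    break  # one hit per tarball is enough
    return matches
```

Calling it with the list of tarball paths (the content of the `alan` file) and the LFN returns only the tarballs that contain it.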

Running a production job interactively

Connect to a machine that has access to EOS at CERN and grab a logArchive tarball. Then you can create a CMSSW environment and run the job like:
cd /data/amaltaro/
source /cvmfs/cms.cern.ch/cmsset_default.sh  # to get scram in the path
export SCRAM_ARCH=slc5_amd64_gcc462
scram project CMSSW_5_3_2_patch5
cd CMSSW_5_3_2_patch5/src
cmsenv
#git cms-addpkg Configuration/PyReleaseValidation
scramv1 b
### untar the logarchive
tar xvzf 6754b712-fab8-11e3-b35b-00155dffc454-0-0-logArchive.tar.gz
cmsRun -e PSet.py 2>err.txt 1>out.txt &

Changing schedd prio factor in the production pool

  1. Connect to vocms97 as _condor user
  2. Check the schedd priorities:
    condor_userprio -all -allusers
  3. And change it to the number you want, e.g.:
    condor_userprio -setfactor cmst1@vocms85.cern.ch 15000.00

Update block (and file) location:

SQL> SELECT * FROM dbsbuffer_block WHERE status!='Closed';
  ID  BLOCKNAME                                                                                                                          LOCATION  CREATE_TIME  STATUS  STATUS3
1611  /ZPrimeToTTJets_M1000GeV_W100GeV_Tune4C_13TeV-madgraph-tauola/Fall13-POSTLS162_V1-v1/GEN-SIM#f3b9fc9c-1bc6-11e4-b97e-003048f02c8a  217       1407150203   InDBS   Pending
1606  /RPV_SM2_Muon_TuneZ2star_8TeV-madgraph-tauola/Summer12-START53_V19_FSIM_PU_S12-v1/AODSIM#3eec6722-1be5-11e4-b97e-003048f02c8a      217       1407163215   InDBS   Pending

SQL> SELECT * FROM dbsbuffer_location WHERE id=217;
	ID SE_NAME
---------- -------------------------------------------------------------------------------------------------
       217 ntugrid6.phys.ntu.edu.tw

SQL> SELECT * FROM dbsbuffer_location WHERE se_name='ntugrid4.phys.ntu.edu.tw';
	ID SE_NAME
---------- -------------------------------------------------------------------------------------------------
       161 ntugrid4.phys.ntu.edu.tw

SQL> UPDATE dbsbuffer_block SET location=161 WHERE location=217 AND status='InDBS';
2 rows updated.

SQL> SELECT dbsbuffer_file.id, dbsbuffer_file_location.location 
FROM dbsbuffer_file INNER JOIN dbsbuffer_file_location 
ON dbsbuffer_file.id = dbsbuffer_file_location.filename 
WHERE dbsbuffer_file.block_id=1611 OR dbsbuffer_file.block_id=1606;
	ID   LOCATION
---------- ----------
    222253	  217
    222502	  217

SQL> UPDATE (SELECT dbsbuffer_file.id, dbsbuffer_file_location.location
FROM dbsbuffer_file INNER JOIN dbsbuffer_file_location
ON dbsbuffer_file.id = dbsbuffer_file_location.filename
WHERE dbsbuffer_file.block_id=1611 OR dbsbuffer_file.block_id=1606) 
SET location=161;
2 rows updated.
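The Oracle updatable-join-view UPDATE above can be written more portably with a plain subquery. A toy sqlite3 sketch (table names mirror DBSBuffer, the data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE dbsbuffer_file (id INTEGER PRIMARY KEY, block_id INTEGER);
CREATE TABLE dbsbuffer_file_location (filename INTEGER, location INTEGER);
INSERT INTO dbsbuffer_file VALUES (222253, 1611), (222502, 1606), (300000, 999);
INSERT INTO dbsbuffer_file_location VALUES (222253, 217), (222502, 217), (300000, 217);
""")

# Portable equivalent of the updatable-join-view UPDATE: move every file
# belonging to blocks 1611/1606 from location 217 to location 161.
cur.execute("""
    UPDATE dbsbuffer_file_location
       SET location = 161
     WHERE filename IN (SELECT id FROM dbsbuffer_file
                         WHERE block_id IN (1611, 1606))
""")
print(cur.rowcount)  # 2 rows updated; files in other blocks are untouched
```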

Retrieving logs from castor@CERN:

  1. First log in to aiadm (since some clients are not available in vocms049) and request a stage of the file:
    stager_get -M /castor/cern.ch/cms/store/logs/prod/2015/04/WMAgent/whatever-1-logs.tar
  2. Then you can check the status of this request by (STAGED means you can copy it):
    stager_qry -M /castor/cern.ch/cms/store/logs/prod/2015/04/WMAgent/whatever-1-logs.tar
  3. Finally, you can copy this file with xrdcp:
    xrdcp root://castorcms//castor/cern.ch/cms/store/logs/prod/2015/04/WMAgent/whatever-1-logs.tar .

  1. If retrieving data from EOS at CERN instead, just:
    xrdcp root://eoscms.cern.ch//store/lhe/10643/8TeV_GoGo_2j_gluino1550_chargino1025_run22892_unwt_aw_winclusive.lhe .

Injecting and running a test workflow with reqmgr.py

  1. Get the ReqMgr client and the template you want to inject:
    wget --no-check-certificate https://raw.githubusercontent.com/dmwm/WMCore/master/test/data/ReqMgr/reqmgr.py
    wget --no-check-certificate https://raw.githubusercontent.com/dmwm/WMCore/master/test/data/ReqMgr/requests/MonteCarlo.json
  2. Then create the request, pointing it to the correct ReqMgr URL (usually in cmsweb). You also need to provide new values for all those "OVERRIDE-ME" fields. E.g.:
    python2.6 reqmgr.py -u https://cmsweb-testbed.cern.ch -f MonteCarlo.json -j '{"createRequest": {"Campaign": "DMWM_Test", "RequestString" : "MonteCarlo_Alan_25Jul2014"}}' -i
  3. The command above will create a new request in ReqMgr (which re-uses a configuration previously injected into couchdb). You should get the request name in the output of that command. E.g.:
    INFO:root:Approving request 'amaltaro_MonteCarlo_Alan_25Jul2014_140725_170927_3909' ...
  4. Now with the request name in hands, you can assign this workflow (again overriding all "OVERRIDE-ME" fields). E.g.
    python2.6 reqmgr.py -u https://cmsweb-testbed.cern.ch -f MonteCarlo.json -r amaltaro_MonteCarlo_Alan_25Jul2014_140725_170927_3909 -j '{"assignRequest": {"SiteWhitelist": ["T1_US_FNAL"], "Team": "testbed-dmwm", "dashboard": "test", "AcquisitionEra": "DMWM_Test", "ProcessingString": "MC_TEST_Alan_25Jul2014"}}' -g
  5. That's all, now you have to monitor your workflow...
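The "OVERRIDE-ME" replacement that reqmgr.py performs via the -j JSON argument can be sketched as a recursive dictionary overlay. The helper name is mine, not part of reqmgr.py:

```python
import json

def merge_overrides(template, overrides):
    """Recursively overlay override values onto a request template dict."""
    result = dict(template)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_overrides(result[key], value)  # merge nested dicts
        else:
            result[key] = value  # scalar or new key: plain replacement
    return result

# Fields not overridden (here CMSSWVersion) keep their template value.
template = {"createRequest": {"Campaign": "OVERRIDE-ME",
                              "RequestString": "OVERRIDE-ME",
                              "CMSSWVersion": "CMSSW_10_6_4"}}
overrides = {"createRequest": {"Campaign": "DMWM_Test",
                               "RequestString": "MonteCarlo_Alan_25Jul2014"}}
merged = merge_overrides(template, overrides)
print(json.dumps(merged, indent=2))
```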

SQL hints

  • Updating rogue jobs in the database that are still marked as executing but have long since disappeared:
    # mysql> UPDATE wmbs_job SET state=(SELECT id from wmbs_job_state WHERE name='cleanout') WHERE state=(SELECT id from wmbs_job_state WHERE name='executing');

  • Selecting only duplicate values:
    # mysql> SELECT lfn, COUNT(*) FROM dbsbuffer_file GROUP BY lfn HAVING COUNT(*) > 1;
  • Getting table constraints information (Oracle):
    # SQL> SELECT cons.constraint_type, cons.constraint_name, cols.column_name
       FROM all_constraints cons, all_cons_columns cols
       WHERE cols.table_name = 'DBSBUFFER_FILE'
       AND cons.constraint_name = cols.constraint_name;
  • Getting the PhEDEx subscription status for a given dataset:
    # mysql> SELECT * FROM dbsbuffer_dataset_subscription WHERE id=1252;
  • Then you can see which dataset this id belongs to:
    # mysql> SELECT * FROM dbsbuffer_dataset WHERE id=211;
  • In case you want to update the site to which this subscription should be made (e.g. changing T0_CH_CERN_MSS to T0_CH_CERN):
    # mysql> UPDATE dbsbuffer_dataset_subscription SET site='T0_CH_CERN' WHERE id=1252 AND subscribed=0;
  • Selecting all subscriptions_id, dataset name and site for subscriptions that need to be made to T0_CH_CERN_MSS:
    # mysql> SELECT dbsbuffer_dataset_subscription.id, dbsbuffer_dataset_subscription.site, dbsbuffer_dataset.path
       FROM dbsbuffer_dataset INNER JOIN dbsbuffer_dataset_subscription 
       ON dbsbuffer_dataset.id=dbsbuffer_dataset_subscription.dataset_id 
       WHERE dbsbuffer_dataset_subscription.site='T0_CH_CERN_MSS';
  • Select and group dataset_id (print how many times they appear in the table):
    # mysql> SELECT dataset_id,count(*) FROM dbsbuffer_dataset_subscription WHERE site='T0_CH_CERN_MSS' OR site='T0_CH_CERN' GROUP BY dataset_id;
  • Getting all wmbs subscriptions and their status for a specific workflow:
    # mysql> SELECT wmbs_workflow.id AS wf_id, wmbs_subscription.id AS sub_id, wmbs_workflow.task, wmbs_subscription.finished
       FROM wmbs_workflow INNER JOIN wmbs_subscription 
       ON wmbs_workflow.id=wmbs_subscription.workflow
       WHERE wmbs_workflow.name='amaltaro_ReReco_Agent105_CDB16_forceComplete_150413_202829_507';
  • Checking files available (jobs) and their location for a specific subscription:
    # SELECT wmbs_sub_files_available.fileid, wlpnn.pnn AS pnn
                    FROM wmbs_sub_files_available
               INNER JOIN wmbs_file_location ON
                 wmbs_sub_files_available.fileid = wmbs_file_location.fileid
               INNER JOIN wmbs_location_pnns wlpnn ON
                 wmbs_file_location.location = wlpnn.location
             WHERE wmbs_sub_files_available.subscription = 24125;
  • Getting a map of PSN to PNN for a given list of PNNs:
    # select wmbs_location.id as site_id, wmbs_location.site_name, wmbs_location.state, wmbs_pnns.pnn from wmbs_location_pnns
         inner join wmbs_pnns on wmbs_location_pnns.pnn = wmbs_pnns.id
         inner join wmbs_location on wmbs_location_pnns.location = wmbs_location.id
         where wmbs_pnns.pnn in ('T2_UK_London_IC', 'T2_UK_SGrid_RALPP', 'T2_UK_London_Brunel');
  • Grouping available files by subscription:
    # select subscription,count(*) from wmbs_sub_files_available group by subscription;
  • Show all tables in MySQL:
    # mysql> SHOW tables;
  • Getting thresholds per site per task:
    # mysql> SELECT wmbs_location.site_name as site, wmbs_sub_types.name AS task, rc_threshold.max_slots, rc_threshold.pending_slots 
       FROM wmbs_location INNER JOIN rc_threshold ON rc_threshold.site_id = wmbs_location.id
       INNER JOIN wmbs_sub_types ON wmbs_sub_types.id = rc_threshold.sub_type_id ORDER BY site;
  • Describing a table in MySQL:
    # mysql> DESCRIBE table_name;
  • Printing a general overview of tables and their column names and sizes in MySQL:
    # mysql> SELECT TABLE_NAME,COLUMN_NAME,DATA_TYPE,CHARACTER_MAXIMUM_LENGTH,COLUMN_TYPE FROM information_schema.columns WHERE table_schema = 'wmagent';
  • Changing a table to increase a column's length:
    # mysql> ALTER TABLE wmbs_fileset MODIFY name varchar(650);
  • Find subscriptions not finished:
    # SQL> select wmbs_subscription.workflow, wmbs_subscription.id, wmbs_subscription.fileset, wmbs_fileset.open FROM wmbs_subscription
       INNER JOIN wmbs_fileset ON wmbs_subscription.fileset = wmbs_fileset.id  where wmbs_subscription.finished = 0 ORDER BY wmbs_subscription.workflow;
  • Find files available for a subscription not finished:
    # SQL> select wmbs_subscription.workflow, wmbs_subscription.id, wmbs_subscription.fileset, wmbs_sub_files_available.fileid 
       FROM wmbs_sub_files_available INNER JOIN wmbs_subscription ON wmbs_sub_files_available.subscription = wmbs_subscription.id
       WHERE wmbs_subscription.finished = 0;
  • Find files acquired for a subscription not finished:
    # SQL> select wmbs_subscription.workflow, wmbs_subscription.id, wmbs_subscription.fileset, wmbs_sub_files_acquired.fileid 
       FROM wmbs_sub_files_acquired INNER JOIN wmbs_subscription ON wmbs_sub_files_acquired.subscription = wmbs_subscription.id
       WHERE wmbs_subscription.finished = 0 and rownum < 3;
  • Getting available LFNs for subscriptions not finished:
    # SQL> select wmbs_subscription.workflow, wmbs_subscription.id, wmbs_subscription.fileset, wmbs_sub_files_available.fileid, wmbs_file_details.lfn
       FROM wmbs_sub_files_available INNER JOIN wmbs_subscription ON wmbs_sub_files_available.subscription = wmbs_subscription.id
       AND wmbs_subscription.finished = 0 
       INNER JOIN wmbs_file_details ON wmbs_sub_files_available.fileid = wmbs_file_details.id and rownum < 3;
  • Insert a row into an oracle table:
    # SQL> INSERT INTO REQMGR_REQUEST_TYPE
       (TYPE_ID, TYPE_NAME) VALUES (122, 'StepChain');
  • Show all tables in Oracle SQL:
    # SQL> SELECT tname FROM tab;
  • Show all indexes and columns of a table in Oracle:
    # SQL> SELECT b.uniqueness, a.index_name, a.table_name, a.column_name 
         FROM  all_ind_columns a, all_indexes b
         WHERE a.index_name=b.index_name 
         AND a.table_name = upper('WMBS_PNNS')
        ORDER by a.table_name, a.index_name, a.column_position;
  • Describing a table in Oracle SQL:
    # SQL> DESC table_name;
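The duplicate-LFN query from the hints above can be exercised on a toy sqlite3 table (the data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE dbsbuffer_file (id INTEGER PRIMARY KEY, lfn TEXT)")
cur.executemany("INSERT INTO dbsbuffer_file (lfn) VALUES (?)",
                [("/store/a.root",), ("/store/b.root",), ("/store/a.root",)])

# Same HAVING pattern: keep only LFNs that appear more than once.
dups = cur.execute("""
    SELECT lfn, COUNT(*) FROM dbsbuffer_file
     GROUP BY lfn HAVING COUNT(*) > 1
""").fetchall()
print(dups)  # [('/store/a.root', 2)]
```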

Workflow of request injection into ReqMgr

First we call the putRequest() method from (L480):
WMCore/HTTPFrontEnd/RequestManager/ReqMgrRESTModel.py
then a makeRequest() call is made (L468):
WMCore/HTTPFrontEnd/RequestManager/ReqMgrWebTools.py
which will build and validate(?) the request schema (L367):
WMCore/HTTPFrontEnd/RequestManager/ReqMgrWebTools.py
falling into this buildWorkloadForRequest() call (L99):
WMCore/RequestManager/RequestMaker/Registry.py
which finally instantiates the correct request spec file and it's parent (StdBase usually):
WMCore/WMSpec/StdSpecs/StdBase.py
WMCore/WMSpec/StdSpecs/StepChain.py

Debugging MySQL (MariaDB) issues ('lock wait timeout exceeded'):

* Connect to the database and first check what the lock wait timeout is:
    # mysql> show variables like 'innodb_lock_wait_timeout';
* Now we can see the list of locked tables:
    # mysql> show open tables where in_use>0;
* Now list the current processes; one of them must be locking our table (time is in secs):
    # mysql> show processlist;
* Kill the greedy process:
    # mysql> kill put_process_id_here;

* If we want more detail about these events, we can issue a:

    # mysql> show engine innodb status\G 

My Personal Preferences

  • Show tool-tip topic info on mouse-over of WikiWord links, on or off:
    • Set LINKTOOLTIPINFO = off

  • More preferences TWiki has system wide preferences settings defined in TWikiPreferences. You can customize preferences settings to your needs: To overload a system setting, (1) do a "raw view" on TWikiPreferences, (2) copy a Set VARIABLE = value bullet, (3) do a "raw edit" of your user profile page, (4) add the bullet to the bullet list above, and (5) customize the value as needed. Make sure the settings render as real bullets (in "raw edit", a bullet requires 3 or 6 spaces before the asterisk).

Related Topics

UserForm
First Name Alan
Last Name Malta Rodrigues
Titles

Email alan.malta@cernNOSPAMPLEASE.ch
Telephone 022 76 75815
Mobile 076 576 7467
Skype ID

Department PH
Organization CERN
URL http://amaltaro.web.cern.ch/amaltaro/
Location Segny
Region Pays des Gex
Country France
Status Update

Topic revision: r217 - 2019-12-18 - AlanMalta
 