Condor Commands

Is Condor running?

To check whether Condor, or any particular Condor daemon, is running, search the process list:

ps aux | grep -E 'condor_(master|schedd|collector|negotiator)'

For example, to check only the master daemon:

ps aux | grep condor_master
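The same check can be wrapped in a small shell function that reports which daemons appear in a process listing. This is only a sketch: the function name running_daemons is hypothetical and not part of HTCondor.

```shell
# running_daemons: read a process listing on stdin and print the name of
# each Condor daemon found in it, once per daemon.
# (Hypothetical helper; not part of HTCondor.)
running_daemons() {
    grep -oE 'condor_(master|schedd|collector|negotiator)' | sort -u
}

# Usage: feed it the live process list when ps is available.
if command -v ps >/dev/null 2>&1; then
    ps aux | running_daemons
fi
```

An empty result means none of the four daemons was found. The pattern cannot match the grep process itself, because in grep's own command line "condor_" is followed by "(" rather than by a daemon name.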

Checking pool status

condor_status

| command | description | example |
| condor_status -master | List machines by name only (status and slots are not shown) |  |
| condor_status -avail | List the slots that are not busy and could run HTCondor jobs at this moment |  |
| condor_status -run | List the slots that are currently running jobs and show related information (owner of each job, machine it was submitted from, etc.) |  |
| condor_status -state -total | List a summary according to the state of each slot |  |
| condor_status -submitters | Show information about the current general status, such as the number of running, idle and held jobs (and submitters) |  |
| condor_status machine | Show the status of a specific machine |  |
| condor_status -sort Memory | Sort slots by Memory; other attributes can be used as well |  |
| condor_status -server | List attributes of slots, such as memory, disk, load, flops, etc. |  |
| condor_status -schedd | Show the list of schedds attached to the collector. Can be run from either a schedd or a collector. NOTE: if more than one collector is attached, a random collector is chosen for the query! |  |
| condor_status -schedd -pool (collector) | Show only the schedds attached to that collector | condor_status -schedd -pool neut.cern.ch |
| condor_status -neg | Show the negotiator currently in use; can be used to identify which pool you are in |  |

Submitting Jobs

  • The submit directory has to be accessible from all machines.
  • Make sure you run condor_submit from a directory found on all machines (such as your own home directory tree).
  • Once you find that your job is running on a particular machine, it is tempting to log in and see how it is doing (with ps, top, or vmstat). Resist! If Condor sees a user log in to its machine, it will suspend your Condor job there and won't restart it until 15 minutes after you have logged out.

condor_submit

| command | description | example |
| condor_submit submit_file -dry-run dest_file | Parse the submit file and save all the related information (names and locations of input and output files after expanding all variables, value of requirements, etc.) to dest_file, without submitting any real jobs. Highly recommended when debugging, or before a real submission if you have modified your submit file and are not sure the changes will work. |  |
| condor_submit submit_file 'var=value' | Add or modify variables at submission time without changing the submit file. Several var=value pairs can be given. | If your submit file uses queue $(N), then condor_submit submit_file 'N = 10' submits 10 jobs |

For more information, see Checking and managing submitted jobs.
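As an illustration of the two condor_submit options above, the following sketch writes a small submit file that uses queue $(N) and, if condor_submit is installed, does a dry run with N set on the command line. The file names (sleep.sub, dry_run.txt), the /bin/sleep executable and the value N=10 are assumptions chosen for the example; the argument order follows the usage shown in this section.

```shell
# Write a minimal submit file; every name in it is illustrative.
# N is deliberately not defined here, so it must be given at submission time.
cat > sleep.sub <<'EOF'
executable = /bin/sleep
arguments  = 60
output     = sleep.$(Cluster).$(Process).out
error      = sleep.$(Cluster).$(Process).err
log        = sleep.log
queue $(N)
EOF

# Dry run: parse the file and write the resulting job descriptions to
# dry_run.txt without submitting anything (skipped if HTCondor is absent).
if command -v condor_submit >/dev/null 2>&1; then
    condor_submit sleep.sub 'N=10' -dry-run dry_run.txt
fi
```

Inspecting dry_run.txt before the real submission shows how the variables were expanded for each of the queued jobs.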

Checking and managing submitted jobs

condor_q

| command | description | example |
| condor_q | Show the current queue on this machine (schedd): my jobs in the HTCondor queue with their IDs (cluster.process), info and status. Status codes: I: idle (waiting for a machine to execute on), R: running, H: on hold (there was an error, waiting for the user's action), S: suspended, C: completed, X: removed, <: transferring input, >: transferring output |  |
| condor_q -name (schedd_name) | Run on a collector: show the queue of that schedd | condor_q -name neut.cern.ch |
| condor_q -global | Show the jobs in every queue (schedd) in the pool, not only the local one (use -allusers to see every user's jobs) |  |
| condor_q -analyze | Analyse a specific job and show why it is in its current state. Useful for jobs in Idle status: Condor shows how many slots match our restrictions and may give suggestions |  |
| condor_q -better-analyze | Analyse a specific job and show why it is in its current state, giving extended information |  |
| condor_q -run | Show your running jobs and related information, such as how long they have been running and on which machine |  |
| condor_q -currentrun | Show the time consumed in the current run only; the cumulative time from previous executions is not counted. Can be combined with -run to see only the processes running at this moment |  |
| condor_q -hold | Show only the jobs in the "on hold" state and the reason for the hold. Held jobs are those that hit an error and could not finish. The user is expected to solve the problem and then run condor_release so that the job is tried again |  |
| condor_q -l (job #) | List the ClassAd attributes of that job. Useful for grepping (see the example below the table) |  |
| condor_q -const 'condor_var=?="string"' | Show only the jobs matching the constraint, where condor_var equals a string | condor_q -const 'DESIRED_Sites=?="T1_US_FNAL"' shows only jobs that ask to run at FNAL |
| condor_q -const 'condor_var==5' | Show only the jobs matching the constraint, where condor_var equals a number | condor_q -const 'jobstatus==5' shows all jobs with jobstatus==5, meaning held jobs; then grep -i for "^holdreason = " in the condor_q -l output |
| condor_q -format '\n%s' condor_var | Print condor_var for every job in the queue in the given format (like "\n%s") |  |

Example for condor_q -l: condor_q -l 413.1 | grep -i glideinentryname
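The numeric JobStatus values used in constraints such as jobstatus==5 correspond to the status letters listed for condor_q. The helper below is a small sketch of that mapping, using the standard values of the JobStatus job attribute; the function name job_status_name is hypothetical.

```shell
# job_status_name: translate a numeric JobStatus code into its name.
# (Hypothetical helper; the codes are those of the JobStatus attribute.)
job_status_name() {
    case "$1" in
        1) echo "Idle" ;;
        2) echo "Running" ;;
        3) echo "Removed" ;;
        4) echo "Completed" ;;
        5) echo "Held" ;;
        6) echo "Transferring Output" ;;
        7) echo "Suspended" ;;
        *) echo "Unknown" ;;
    esac
}

# List held jobs as "cluster.process  reason" (only if condor_q is installed).
if command -v condor_q >/dev/null 2>&1; then
    condor_q -constraint 'JobStatus == 5' \
        -format '%d.' ClusterId -format '%d  ' ProcId \
        -format '%s\n' HoldReason
fi
```

For instance, job_status_name 5 prints Held, which is the state selected by the constraint in the second half of the sketch.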

condor_tail, condor_release and condor_hold

| command | description | example |
| condor_tail (job #) | Display the last lines of the stdout (screen output) of a job running on a remote machine. Use it to check whether your job is working properly; it can also show the error output (stderr) or output files created by your program |  |
| condor_tail -f (job #) | Keep displaying new content until interrupted with Ctrl+C |  |
| condor_tail -no-stdout (job #) output_file | Show the content of an output file (output_file has to be listed in the transfer_output_files command in the submit file) |  |
| condor_release -constraint constraint | Release all my held jobs that satisfy the constraint |  |
| condor_release -all | Release all my held jobs |  |
| condor_hold cluster_id | Hold all jobs of a specific submission |  |
| condor_hold -constraint constraint | Hold all jobs that satisfy the constraint |  |
| condor_hold -all | Hold all my jobs in the queue |  |

Note: Jobs in the on-hold state are those that HTCondor was not able to execute properly, usually due to problems with the executable, paths, etc. If you can solve the problem by changing the input files and/or the executable, you can use condor_release to run your program again, since it sends all files to the remote machines again. If you need to change the submit file to solve the problem, condor_release will NOT work, because it does not evaluate the submit file again. In that case you can use condor_qedit, or cancel all held jobs and resubmit them.

condor_rm

| command | description | example |
| condor_rm (job #) | Remove job # | condor_rm 417.0, or condor_rm 417 for the whole cluster |
| condor_rm (cluster #) | Remove all jobs of a specific submission |  |
| condor_rm -const (expr) | Remove all jobs matching the constraint expr | condor_rm -const 'jobstatus==5' removes all held jobs; condor_rm -const 'desired_sites=?="T2_CH_CERN"' removes jobs asking to go to CERN (only) |
| condor_rm -all | Remove all my jobs from the queue |  |
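Because condor_rm with a constraint can remove many jobs at once, it can help to build the constraint once, preview the matching jobs with condor_q, and only then remove them. The following is a sketch of that habit; the function name site_constraint is hypothetical, and the site name is the one used in the example above.

```shell
# site_constraint: build a ClassAd constraint selecting jobs whose
# DESIRED_Sites attribute is exactly the given site name.
# (Hypothetical helper; not part of HTCondor.)
site_constraint() {
    printf 'DESIRED_Sites =?= "%s"' "$1"
}

constraint=$(site_constraint T2_CH_CERN)
echo "$constraint"

# Preview the matching jobs first; uncomment the condor_rm line only
# once the list looks right (skipped entirely if HTCondor is absent).
if command -v condor_q >/dev/null 2>&1; then
    condor_q -constraint "$constraint"
    # condor_rm -constraint "$constraint"
fi
```

Keeping the constraint in a variable ensures that condor_q and condor_rm select exactly the same jobs.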

Getting info from logs

condor_history

| command | description | example |
| condor_history | Show all jobs completed to date. It has to be run on the same machine where the submission was done. It can be used like condor_q (-l, -const, -format...) |  |
| condor_history -userlog file.log | List the basic information recorded in the log file; also accepts -l, -const, -format... (use condor_logview file.log to see the information graphically) |  |
| condor_userlog file.log | Show and summarize job statistics from job log files (those created by the log command in the submit file) |  |
| condor_logview file.log | Not an original HTCondor command: we have created this link to a script that displays the information in the log of your executions graphically |  |

condor_history -long XXX.YYY | grep LastRemoteHost: show machine where job XXX.YYY was executed
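To use the host name in a script rather than read it off the screen, the value can be extracted from the -long output. This is a minimal sketch, assuming the attribute is printed in the usual ClassAd form LastRemoteHost = "..."; the function name last_remote_host and the job ID 413.1 are illustrative.

```shell
# last_remote_host: read `condor_history -long` output on stdin and print
# the value of LastRemoteHost without the surrounding quotes.
# (Hypothetical helper; not part of HTCondor.)
last_remote_host() {
    sed -n 's/^LastRemoteHost = "\(.*\)"$/\1/p'
}

# Usage with a real job ID (skipped if HTCondor is absent).
if command -v condor_history >/dev/null 2>&1; then
    condor_history -long 413.1 | last_remote_host
fi
```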

Note: There is also an online tool to analyze your log files and get more information:
HTCondor Log Analyzer (http://condorlog.cse.nd.edu/)

Other Commands

condor_submit_dag dag_file: Submit a DAG file, used to describe jobs with dependencies

condor_version: Print the version of HTCondor.

condor_qedit: Modify the attributes of a job that is already in the queue. This is useful when you need to change some of the parameters specified in the submit file without resubmitting the jobs.

condor_compile: Relink a program with the HTCondor libraries so it can be used in the standard universe, where checkpoints are enabled. Relinked programs can also be executed as standalone checkpointing executables, which means that you can run them directly in your shell (no HTCondor submission is needed) and create specific or periodic checkpoints that let you recover the execution in case of problems.

Changing Priorities

Note: condor_userprio lists the priorities of machines; it is not an actual representation of which machines are in the pool, or even exist. condor_userprio holds on to old entries for machines that are no longer needed (in another pool, another user, decommissioned, etc.). To remove an old machine's priority listing, use condor_userprio -delete (user@old_machine).
| command | description | example |
| condor_userprio -setfactor user@machine.cern.ch xxxx.xx | Set the priority factor of user@machine.cern.ch to xxxx.xx | condor_userprio -setfactor @neut.cern.ch 1000.00 |
| condor_userprio -all -allusers | Show the priority of all machines in the pool |  |

HTCondor manual

HTCondor manual link



Topic revision: r7 - 2017-03-28 - NectarB
 