BigPanDA HTCondor API

Introduction

BigPanDA HTCondor API Requirements

  • Basic job monitoring interface for Condor jobs
    • Define DB schema and curl API for sending Condor status info to a PanDA monitoring database
    • Produce the curl commands for adding a job record, updating a job record and removing a job record in the DB.
    • Make this API, backed by a MySQL DB, and the corresponding monitoring web interface available to the Condor team.
  • Unique Condor job ID
    • Yes, a job in Condor has two IDs, one local and one global. The global one is a concatenation of the SchedD name, the job id and maybe something else that Todd knows ...
      • Correct: upon submission into Condor, the job is assigned an attribute 'GlobalJobId', which is a concatenation of the SchedD name, the job id, and the submission time (as an epoch).
  • Doc: http://research.cs.wisc.edu/htcondor/manual/current/condor_q.html
  • README, schema, example call script
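
The GlobalJobId layout described above can be taken apart with plain shell parameter expansion. The sketch below assumes the three parts are separated by '#' characters, as HTCondor commonly prints them; the function name parse_global_job_id and the sample ID are hypothetical and used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: split a GlobalJobId into its three parts.
# Assumption: the parts are joined with '#', e.g. name#cluster.proc#epoch.
parse_global_job_id() {
    gid=$1
    schedd=${gid%%#*}     # everything before the first '#'
    rest=${gid#*#}        # everything after the first '#'
    job=${rest%%#*}       # job id (cluster.proc)
    epoch=${rest#*#}      # submission time as an epoch
    printf 'schedd=%s job=%s submitted=%s\n' "$schedd" "$job" "$epoch"
}

# Sample value, made up for this example
parse_global_job_id 'schedd.example.org#1234.0#1384548939'
```

On a machine with HTCondor installed, the attribute itself can typically be listed with condor_q -format '%s\n' GlobalJobId.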

API Requirements - Requests for comment

  • Any suggestions for the data types in the schema (README)? Is the internal HTCondor schema for data storage available, please?

Monitoring

  • http://pandawms.org/bigpandamon/htcondorjobs/
  • As of 2013-12-06 it contains modified data taken from several ATLAS Pilot Factory machines on 22nd November 2013.
  • Features:
    • table with basic list of columns
    • show/hide full details of an HTCondor job by clicking on the +/- image in the first column
    • URL points to a stdout and stderr log record from condor_q. Note that not all URL records were correct in the Condor job information, and not all aipanda0XY machines are reachable from outside CERN.

Monitoring - Requests for comment

  • What should be the default view time span once the data is updated regularly? "Last 24 hrs"?
  • Is the list of visible columns OK? Should other columns be added?
  • What should be the behaviour of "removed" jobs? Show them by default with "removed" flag, or add filter field to show/not show removed jobs?
  • What summary information is good to have? We plan to introduce summary level info such that one can get a high level summary of e.g. schedd within the scope of the monitoring and drill down from there, but we would like to hear your guidance as to what summaries you would like to see and how you would like to see them organized.

API Description

  • Three methods are available: addjob, updatejob, removejob.
  • Bulk operation is available for all 3 methods.
  • The example script htcondorapi2.sh, which calls all 3 methods, is available in SVN.
  • The example script testFillCondorData.sh, which inserts jobs from data stored in a json file, is also available in SVN.
  • Send a POST request via HTTPS with VOMS proxy authentication.
  • Responses:
    • Successful request:
      • Returned data: an echo of the data that was inserted
      • Return code:
        • HTTP_201_CREATED: User provided correct information for all jobs inserted.
        • HTTP_202_ACCEPTED: User provided correct information for properties of all jobs updated/flagged as removed.
    • Partially unsuccessful request:
      • Returned data: list of dictionaries with internal error codes and error message
        • dictionary format: {'ret': returnCode, 'desc': errorMessage}
        • internal return error codes:
          • Methods addjob, updatejob, removejob: internal error code HTCAPI_MISSING_FIELD=10
          • Methods addjob, updatejob, removejob: internal error code HTCAPI_CANNOT_SAVE=20
          • Methods updatejob, removejob: internal error code HTCAPI_CANNOT_UPDATE=30
      • Return code:
        • HTTP_401_UNAUTHORIZED: User did not use HTTPS or is using a limited VOMS proxy.
        • HTTP_403_FORBIDDEN: User is banned from calling this method of the PanDA HTCondor API.
        • HTTP_400_BAD_REQUEST: User provided incorrect information for some of the jobs inserted/updated/removed. The jobs with correct description were successfully inserted.
  • A single job is represented as a dictionary with mandatory and optional keys.
    • Mandatory keys:
      • Methods addjob, updatejob: mandatory fields globaljobid and wmsid.
      • Method removejob: only globaljobid is mandatory.
    • Optional keys for methods addjob, updatejob: condorid, owner, submitted, run_time, st, pri, size, cmd, host, status, manager, executable, goodput, cpu_util, mbps, read_, write_, seek, xput, bufsize, blocksize, cpu_time, p_start_time, p_end_time, p_modif_time, p_factory, p_schedd, p_description, p_stdout, p_stderr.
      • removejob method does not recognise any field other than mandatory globaljobid.
    • See the README file for data types and field meaning description.
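
As an illustration of the job dictionary and bulk format described above, here is a minimal sketch that assembles a JSON list of job records carrying only the mandatory addjob/updatejob keys. The helper names job_record and bulk_payload are hypothetical, the IDs are sample values, and no JSON escaping is attempted, so it only suits simple identifiers.

```shell
#!/bin/sh
# Hypothetical helper: print one job record with only the mandatory keys.
# $1 = globaljobid (string), $2 = wmsid (number)
job_record() {
    printf '{"globaljobid": "%s", "wmsid": %s}' "$1" "$2"
}

# Hypothetical helper: join the records passed as arguments into a JSON list.
bulk_payload() {
    payload='['
    sep=''
    for record in "$@"; do
        payload="${payload}${sep}${record}"
        sep=', '
    done
    printf '%s]\n' "$payload"
}

# Sample data, replace with your own job IDs
API_POST_DATA=$(bulk_payload \
    "$(job_record 'sched.131115.1384548939' 1676)" \
    "$(job_record 'sched.131115.1384549649' 1796)")
echo "${API_POST_DATA}"
```

The resulting string can be passed to curl with -d "${API_POST_DATA}"; optional keys would be added to each record in the same way.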

Prerequisites

API Call to add HTCondor jobs addjob

  • http://pandawms.org/bigpandamon/api-auth/htcondorapi/addjob/
  • Send a POST request via HTTPS with VOMS proxy authentication.
  • Responses:
    • Successful insertion: HTTP_201_CREATED.
    • Partially unsuccessful insertion:
      • internal return error codes: HTCAPI_MISSING_FIELD, HTCAPI_CANNOT_SAVE.
      • Return code: HTTP_401_UNAUTHORIZED, HTTP_403_FORBIDDEN, HTTP_400_BAD_REQUEST.
  • Mandatory keys: globaljobid and wmsid.
  • addjob example call
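
A caller will usually want to branch on the HTTP return code listed above. The sketch below maps the documented addjob codes to short messages; the function name describe_addjob_status and the message wording are assumptions made for this example, not part of the API.

```shell
#!/bin/sh
# Hypothetical helper: translate an addjob HTTP status into a short message.
describe_addjob_status() {
    case "$1" in
        201) echo "created: all jobs were inserted" ;;
        400) echo "bad request: some jobs had incorrect information" ;;
        401) echo "unauthorized: HTTPS missing or limited VOMS proxy" ;;
        403) echo "forbidden: user is banned from this method" ;;
        *)   echo "unexpected HTTP status: $1" ;;
    esac
}

# With a real call, the status could be captured roughly like this (not run here):
#   HTTP_CODE=$(curl --silent --output response.json --write-out '%{http_code}' ...)
#   describe_addjob_status "${HTTP_CODE}"
describe_addjob_status 201
```

Keeping the response body in a separate file leaves the list of error dictionaries available for inspection when the status is 400.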

API Call to update HTCondor jobs updatejob

  • http://pandawms.org/bigpandamon/api-auth/htcondorapi/updatejob/
  • Responses:
    • Successful update: HTTP_202_ACCEPTED.
    • Partially unsuccessful update:
      • internal return error codes: HTCAPI_MISSING_FIELD, HTCAPI_CANNOT_SAVE, HTCAPI_CANNOT_UPDATE.
      • Return code: HTTP_401_UNAUTHORIZED, HTTP_403_FORBIDDEN, HTTP_400_BAD_REQUEST.
  • Mandatory keys: globaljobid and wmsid.
  • updatejob example call
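
The internal error codes returned for a partially unsuccessful update can be turned back into their symbolic names. This is a minimal sketch: the function name htcapi_error_name, the sample response line, and its 'desc' text are made up for illustration, and the sed pattern assumes the {'ret': returnCode, 'desc': errorMessage} layout described in the API description.

```shell
#!/bin/sh
# Hypothetical helper: map an internal return code to its documented name.
htcapi_error_name() {
    case "$1" in
        10) echo "HTCAPI_MISSING_FIELD" ;;
        20) echo "HTCAPI_CANNOT_SAVE" ;;
        30) echo "HTCAPI_CANNOT_UPDATE" ;;
        *)  echo "UNKNOWN($1)" ;;
    esac
}

# Sample error dictionary; the 'desc' text is invented for this example.
SAMPLE_RESPONSE="{'ret': 30, 'desc': 'sample error message'}"

# Extract the numeric 'ret' value and translate it.
RET=$(printf '%s\n' "${SAMPLE_RESPONSE}" | sed -n "s/.*'ret': *\([0-9][0-9]*\).*/\1/p")
htcapi_error_name "${RET}"
```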

API Call to remove HTCondor jobs removejob

  • https://pandawms.org/bigpandamon/api-auth/htcondorapi/removejob/
  • Responses:
    • Successful removal (job flagged as removed): HTTP_202_ACCEPTED.
    • Partially unsuccessful removal:
      • internal return error codes: HTCAPI_MISSING_FIELD, HTCAPI_CANNOT_SAVE, HTCAPI_CANNOT_UPDATE.
      • Return code: HTTP_401_UNAUTHORIZED, HTTP_403_FORBIDDEN, HTTP_400_BAD_REQUEST.
  • Mandatory key: globaljobid.

API Calls - Requests for comment

  • DigiCertGrid support on grid UI - in early 2014 an OSG host certificate will be installed on the machine and used.
  • Are mandatory fields OK for each API call method?
  • Are the response exit/error codes descriptive enough?
  • Please provide DN of the certificate which will be used to interact with PanDA HTCondor API.

Example call of API

Example call of API -- add HTCondor jobs addjob

  • addjob API description
  • echo "Add HTCondor job - multiple jobs"
    
    ### Add HTCondor job ###
    ### Multiple jobs
    ### Expected fields: 'globaljobid', 'wmsid'
    ### Optional fields: 'condorid', 'owner', 'submitted', 'run_time', 'st', 'pri', 'size', 'cmd', 'host', 'status', 'manager', 'executable', 'goodput', 'cpu_util', 'mbps', 'read_', 'write_', 'seek', 'xput', 'bufsize', 'blocksize', 'cpu_time', 'p_start_time', 'p_end_time', 'p_modif_time', 'p_factory', 'p_schedd', 'p_description', 'p_stdout', 'p_stderr'
    
    ### Proxy config
    PROXY="/data/${USER}/testing/proxy_up"
    CAPATH="/etc/pki/tls/certs"
    CACERT="/etc/pki/tls/certs/ca-bundle.crt"
    
    ### API config
    API_HOST="https://pandawms.org/bigpandamon"
    API_URI="/api-auth/htcondorapi/addjob/"
    
    ### Fake data generation, feel free to replace by your data
    RANGE=150 
    GLOBAL_JOB_ID="\"globaljobid\": \"sched.$(date +%y%m%d).$(date +%s -u)\""
    WMSID="\"wmsid\": $(($RANDOM % $RANGE + 1650))"
    ### Offset by one second so that the second job gets a distinct globaljobid
    GLOBAL_JOB_ID_1="\"globaljobid\": \"sched.$(date +%y%m%d).$(( $(date +%s -u) + 1 ))\""
    WMSID_1="\"wmsid\": $(($RANDOM % $RANGE + 1650))"
    API_POST_DATA='['"{${GLOBAL_JOB_ID}, ${WMSID},"' "condorid": "2.0", "p_stderr": "n/a", "write_": 0, "xput": 0, "blocksize": 0, "pri": 0, "manager": "n/a", "p_description": "n/a", "owner": 0, "bufsize": 0, "seek": 0, "size": 0, "executable": "n/a", "p_start_time": "2013-11-14T23:00:00Z", "cpu_util": 0,  "goodput": 0, "read_": 0, "status": "PENDING", "p_schedd": "n/a", "host": "n/a", "run_time": 0, "removed": 0, "cpu_time": "0", "p_modif_time": "1970-01-01T00:00:00Z", "mbps": 0, "cmd": "n/a", "submitted": "1970-01-01T00:00:00Z", "st": "I", "p_stdout": "n/a", "p_factory": "n/a", "p_end_time": "1970-01-01T00:00:00Z"}'",""{${GLOBAL_JOB_ID_1}, ${WMSID_1},"' "condorid": "2.0", "p_stderr": "n/a", "write_": 0, "xput": 0, "blocksize": 0, "pri": 0, "manager": "n/a", "p_description": "n/a", "owner": 0, "bufsize": 0, "seek": 0, "size": 0, "executable": "n/a", "p_start_time": "2013-11-14T23:00:00Z", "cpu_util": 0,  "goodput": 0, "read_": 0, "status": "PENDING", "p_schedd": "n/a", "host": "n/a", "run_time": 0, "removed": 0, "cpu_time": "0", "p_modif_time": "1970-01-01T00:00:00Z", "mbps": 0, "cmd": "n/a", "submitted": "1970-01-01T00:00:00Z", "st": "I", "p_stdout": "n/a", "p_factory": "n/a", "p_end_time": "1970-01-01T00:00:00Z"}'"]"
    
    ### API call to add jobs
    curl --silent --compressed -X POST -H "Content-Type: application/json" --capath "${CAPATH}" --cacert "${CACERT}" --cert "${PROXY}" --key "${PROXY}" -m 600 -d "${API_POST_DATA}" "${API_HOST}${API_URI}"
    
    

Example call of API -- update HTCondor jobs updatejob

  • updatejob API description
  • echo "Update HTCondor jobs"
    
    ### Update HTCondor job ###
    ### Multiple jobs
    ### Expected fields: 'globaljobid', 'wmsid'
    ### Optional fields: 'condorid', 'owner', 'submitted', 'run_time', 'st', 'pri', 'size', 'cmd', 'host', 'status', 'manager', 'executable', 'goodput', 'cpu_util', 'mbps', 'read_', 'write_', 'seek', 'xput', 'bufsize', 'blocksize', 'cpu_time', 'p_start_time', 'p_end_time', 'p_modif_time', 'p_factory', 'p_schedd', 'p_description', 'p_stdout', 'p_stderr'
    
    ### Proxy config
    PROXY="/data/${USER}/testing/proxy_up"
    CAPATH="/etc/pki/tls/certs"
    CACERT="/etc/pki/tls/certs/ca-bundle.crt"
    
    ### API config
    API_HOST="https://pandawms.org/bigpandamon"
    API_URI="/api-auth/htcondorapi/updatejob/"
    
    ### Fake data generation, feel free to replace by your data
    RANGE=150 
    GLOBAL_JOB_ID="\"globaljobid\": \"sched.131115.1384548939\""
    WMSID="\"wmsid\": 1676"
    NEWSTATUS="RUNNING"
    GLOBAL_JOB_ID_1="\"globaljobid\": \"sched.131115.1384549649\""
    WMSID_1="\"wmsid\": 1796"
    NEWSTATUS_1="HOLDING"
    API_POST_DATA="[""{${GLOBAL_JOB_ID}, ${WMSID},"' "condorid": "2.0", "p_stderr": "n/a", "write_": 0, "xput": 0, "blocksize": 0, "pri": 0, "manager": "n/a", "p_description": "n/a", "owner": 0, "bufsize": 0, "seek": 0, "size": 0, "executable": "n/a", "p_start_time": "2013-11-14T23:00:00Z", "cpu_util": 0,  "goodput": 0, "read_": 0, "status": "'"${NEWSTATUS}"'", "p_schedd": "n/a", "host": "n/a", "run_time": 0, "removed": 0, "cpu_time": "0", "p_modif_time": "'$(date +%FT%H:%M:%SZ -u)'", "mbps": 0, "cmd": "n/a", "submitted": "1970-01-01T00:00:00Z", "st": "I", "p_stdout": "n/a", "p_factory": "n/a", "p_end_time": "1970-01-01T00:00:00Z"}'",""{${GLOBAL_JOB_ID_1}, ${WMSID_1},"' "condorid": "2.0", "p_stderr": "n/a", "write_": 0, "xput": 0, "blocksize": 0, "pri": 0, "manager": "n/a", "p_description": "n/a", "owner": 0, "bufsize": 0, "seek": 0, "size": 0, "executable": "n/a", "p_start_time": "2013-11-14T23:00:00Z", "cpu_util": 0,  "goodput": 0, "read_": 0, "status": "'"${NEWSTATUS_1}"'", "p_schedd": "n/a", "host": "n/a", "run_time": 0, "removed": 0, "cpu_time": "0", "p_modif_time": "'$(date +%FT%H:%M:%SZ -u)'", "mbps": 0, "cmd": "n/a", "submitted": "1970-01-01T00:00:00Z", "st": "I", "p_stdout": "n/a", "p_factory": "n/a", "p_end_time": "1970-01-01T00:00:00Z"}'"]"
    
    ### API call to update jobs
    curl --silent --compressed -X POST -H "Content-Type: application/json" --capath "${CAPATH}" --cacert "${CACERT}" --cert "${PROXY}" --key "${PROXY}" -m 600 -d "${API_POST_DATA}" "${API_HOST}${API_URI}"
    
    

Example call of API -- remove HTCondor jobs removejob

  • removejob API description
  • echo "Remove HTCondor job"
    
    ### Remove HTCondor job ###
    ### Expected field: 'globaljobid'
    ### Optional fields: none.
    
    ### Proxy config
    PROXY="/data/${USER}/testing/proxy_up"
    CAPATH="/etc/pki/tls/certs"
    CACERT="/etc/pki/tls/certs/ca-bundle.crt"
    
    ### API config
    API_HOST="https://pandawms.org/bigpandamon"
    API_URI="/api-auth/htcondorapi/removejob/"
    
    ### Fake data generation, feel free to replace by your data
    API_POST_DATA='[{"globaljobid": "sched.131115.1384541922"}, {"globaljobid": "sched.131115.1384541928"}]'
    
    ### API call to flag jobs removed
    curl --silent --compressed -X POST -H "Content-Type: application/json" --capath "${CAPATH}" --cacert "${CACERT}" --cert "${PROXY}" --key "${PROXY}" -m 600 -d "${API_POST_DATA}" "${API_HOST}${API_URI}"
    
    






Major updates:

-- JaroslavaSchovancova - 06-Dec-2013






Responsible: JaroslavaSchovancova

Topic revision: r3 - 2014-01-16 - JaroslavaSchovancova
 