Things That Go Wrong

Pilots Aborted at Certain Sites or CEs

When the PilotSummary shows a high abort rate for some CE, look at the pilot output, e.g.:

******  JobID=[https://grid-cr2.desy.de:8443/CREAM429777624]
   Current Status = [ABORTED]
   Working Dir    = [[reserved]]
   ExitCode       = []
   FailureReason  = [BLAH error: submission command failed (exit code = 1) (stdout:) (stderr:qsub: Unknown queue MSG=cannot locate queue-) N/A (jobId = CREAM429777624)]
   Description    = [submission to BLAH failed [retry count=3]]
   Grid JobID     = [N/A]
   LRMS Abs JobID = [[reserved]]
   LRMS JobID     = [[reserved]]
   Deleg Proxy ID = [D54E25F8-B54E-5554-E958-FFB4B3397F60]
   DelegProxyInfo = [[ isRFC="false"; valid from="10/29/14 9:18 PM (GMT)"; valid to="11/3/14 8:06 PM (GMT)"; holder DN="CN=Andre Sailer,CN=683529,CN=sailer,OU=Users,OU=Organic Units,DC=cern,DC=ch"; holder AC issuer="CN=proxy,CN=proxy,CN=proxy,CN=proxy,CN=Andre Sailer,CN=683529,CN=sailer,OU=Users,OU=Organic Units,DC=cern,DC=ch"; VO="ilc"; AC issuer="CN=host/grid-voms.desy.de, OU=DESY, O=GermanGrid, C=DE"; VOMS attributes={ /ilc/Role=NULL/Capability=NULL, /ilc/fcal/Role=NULL/Capability=NULL } ]]
   Worker Node    = [N/A]
   Local User     = [ilcusr053]
   CREAM ISB URI  = [gsiftp://grid-cr2.desy.de/var/cream_sandbox/ilcusr/CN_Andre_Sailer_CN_683529_CN_sailer_OU_Users_OU_Organic_Units_DC_cern_DC_ch_ilc_Role_NULL_Capability_NULL_ilcusr053/42/CREAM429777624/ISB]
   CREAM OSB URI  = [gsiftp://grid-cr2.desy.de/var/cream_sandbox/ilcusr/CN_Andre_Sailer_CN_683529_CN_sailer_OU_Users_OU_Organic_Units_DC_cern_DC_ch_ilc_Role_NULL_Capability_NULL_ilcusr053/42/CREAM429777624/OSB]
   JDL            = [[ StdOutput = "d4HYF8.out"; BatchSystem = "pbs"; QueueName = "emi2-sl6"; Executable = "DIRAC_iSOJXR_pilotwrapper.py"; JobType = "Normal"; OutputSandboxBaseDestUri = "gsiftp://localhost"; OutputSandbox = { "d4HYF8.out","d4HYF8.err" }; InputSandbox = { "/opt/dirac/data/work/SiteDirector/DIRAC_iSOJXR_pilotwrapper.py" }; StdError = "d4HYF8.err" ]]
   Type           = [Normal]

Notice the "Unknown queue" message in the FailureReason line and the QueueName in the JDL (the second-to-last line of the output). Together they suggest that the queue the pilot was submitted to no longer exists or is currently unavailable. Check gstat for the queues at this CE, or run the lcg-infosites --vo ilc ce command and grep for the given site.
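When many pilots have to be checked, this lookup can be scripted. The following is a minimal Python sketch, assuming the status output has been saved as text in the "Key = [value]" layout shown above. The function names parse_pilot_status and unknown_queue are hypothetical helpers written for this illustration and are not part of DIRAC or the CREAM client tools; the sample text is abbreviated from the output above.

```python
import re

# Abbreviated copy of the status output shown above (most fields omitted).
SAMPLE = '''******  JobID=[https://grid-cr2.desy.de:8443/CREAM429777624]
   Current Status = [ABORTED]
   FailureReason  = [BLAH error: submission command failed (exit code = 1) (stdout:) (stderr:qsub: Unknown queue MSG=cannot locate queue-) N/A (jobId = CREAM429777624)]
   JDL            = [[ StdOutput = "d4HYF8.out"; BatchSystem = "pbs"; QueueName = "emi2-sl6"; Executable = "DIRAC_iSOJXR_pilotwrapper.py" ]]
   Type           = [Normal]
'''


def parse_pilot_status(text):
    """Collect the 'Key = [value]' lines of a status dump into a dict."""
    fields = {}
    for line in text.splitlines():
        # Key is everything before the first '=' that is followed by '['.
        match = re.match(r"\s*([^=\[]+?)\s*=\s*\[(.*)\]\s*$", line)
        if match:
            key = match.group(1).strip("* ")  # drop the '******' job marker
            fields[key] = match.group(2)
    return fields


def unknown_queue(fields):
    """Return the JDL QueueName if the failure reports an unknown queue."""
    if "Unknown queue" not in fields.get("FailureReason", ""):
        return None
    queue = re.search(r'QueueName\s*=\s*"([^"]*)"', fields.get("JDL", ""))
    return queue.group(1) if queue else None


if __name__ == "__main__":
    print(unknown_queue(parse_pilot_status(SAMPLE)))
```

The returned queue name is the one to compare against the queues that gstat or lcg-infosites currently lists for the CE.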

-- AndreSailer - 2014-10-30

Topic revision: r2 - 2014-12-08 - AndreSailer
 