Things That Go Wrong

Pilots Aborted at Certain Sites or CEs

When the PilotSummary shows a high abort rate for some CE, look at the pilot output, e.g.:

******  JobID=[https://grid-cr2.desy.de:8443/CREAM429777624]
   Current Status = [ABORTED]
   Working Dir    = [[reserved]]
   ExitCode       = []
   FailureReason  = [BLAH error: submission command failed (exit code = 1) (stdout:) (stderr:qsub: Unknown queue MSG=cannot locate queue-) N/A (jobId = CREAM429777624)]
   Description    = [submission to BLAH failed [retry count=3]]
   Grid JobID     = [N/A]
   LRMS Abs JobID = [[reserved]]
   LRMS JobID     = [[reserved]]
   Deleg Proxy ID = [D54E25F8-B54E-5554-E958-FFB4B3397F60]
   DelegProxyInfo = [[ isRFC="false"; valid from="10/29/14 9:18 PM (GMT)"; valid to="11/3/14 8:06 PM (GMT)"; holder DN="CN=Andre Sailer,CN=683529,CN=sailer,OU=Users,OU=Organic Units,DC=cern,DC=ch"; holder AC issuer="CN=proxy,CN=proxy,CN=proxy,CN=proxy,CN=Andre Sailer,CN=683529,CN=sailer,OU=Users,OU=Organic Units,DC=cern,DC=ch"; VO="ilc"; AC issuer="CN=host/grid-voms.desy.de, OU=DESY, O=GermanGrid, C=DE"; VOMS attributes={ /ilc/Role=NULL/Capability=NULL, /ilc/fcal/Role=NULL/Capability=NULL } ]]
   Worker Node    = [N/A]
   Local User     = [ilcusr053]
   CREAM ISB URI  = [gsiftp://grid-cr2.desy.de/var/cream_sandbox/ilcusr/CN_Andre_Sailer_CN_683529_CN_sailer_OU_Users_OU_Organic_Units_DC_cern_DC_ch_ilc_Role_NULL_Capability_NULL_ilcusr053/42/CREAM429777624/ISB]
   CREAM OSB URI  = [gsiftp://grid-cr2.desy.de/var/cream_sandbox/ilcusr/CN_Andre_Sailer_CN_683529_CN_sailer_OU_Users_OU_Organic_Units_DC_cern_DC_ch_ilc_Role_NULL_Capability_NULL_ilcusr053/42/CREAM429777624/OSB]
   JDL            = [[ StdOutput = "d4HYF8.out"; BatchSystem = "pbs"; QueueName = "emi2-sl6"; Executable = "DIRAC_iSOJXR_pilotwrapper.py"; JobType = "Normal"; OutputSandboxBaseDestUri = "gsiftp://localhost"; OutputSandbox = { "d4HYF8.out","d4HYF8.err" }; InputSandbox = { "/opt/dirac/data/work/SiteDirector/DIRAC_iSOJXR_pilotwrapper.py" }; StdError = "d4HYF8.err" ]]
   Type           = [Normal]

Notice the "Unknown queue" message in the FailureReason and the QueueName in the JDL (the second-to-last line). This might mean that the queue no longer exists or is currently unavailable. Check gstat for the queues at this CE, or run the lcg-infosites --vo ilc ce command and grep for the given site.
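
When many pilots abort, reading each status dump by hand gets tedious. The following is a minimal sketch, not part of DIRAC or CREAM, that pulls the three interesting fields out of a status dump like the one above. The function name summarize_abort and the SAMPLE variable are made up for illustration, and the parsing assumes the "Name = [value]" layout shown in the example output.

```python
import re

def summarize_abort(status_text):
    """Extract status, failure reason and queue name from a status dump.

    Assumes each field is on one line in the form 'Name = [value]',
    as in the example output above (hypothetical helper, not a DIRAC API).
    """
    fields = {}
    for line in status_text.splitlines():
        # Key is everything before the first '=', value is inside the outer [].
        match = re.match(r"\s*([^=\[\]]+?)\s*=\s*\[(.*)\]\s*$", line)
        if match:
            fields[match.group(1)] = match.group(2)
    # The queue the pilot was submitted to is stored inside the JDL field.
    queue = re.search(r'QueueName\s*=\s*"([^"]*)"', fields.get("JDL", ""))
    return {
        "status": fields.get("Current Status"),
        "failure_reason": fields.get("FailureReason"),
        "queue_name": queue.group(1) if queue else None,
        "unknown_queue": "Unknown queue" in fields.get("FailureReason", ""),
    }

# Three lines copied from the example output above.
SAMPLE = """
   Current Status = [ABORTED]
   FailureReason  = [BLAH error: submission command failed (exit code = 1) (stdout:) (stderr:qsub: Unknown queue MSG=cannot locate queue-) N/A (jobId = CREAM429777624)]
   JDL            = [[ StdOutput = "d4HYF8.out"; BatchSystem = "pbs"; QueueName = "emi2-sl6"; Executable = "DIRAC_iSOJXR_pilotwrapper.py"; JobType = "Normal"; OutputSandboxBaseDestUri = "gsiftp://localhost"; OutputSandbox = { "d4HYF8.out","d4HYF8.err" }; InputSandbox = { "/opt/dirac/data/work/SiteDirector/DIRAC_iSOJXR_pilotwrapper.py" }; StdError = "d4HYF8.err" ]]
"""

summary = summarize_abort(SAMPLE)
if summary["unknown_queue"]:
    print("Queue", summary["queue_name"], "is not known to the batch system")
```

If unknown_queue is set, the queue_name value is the one to look for in the gstat or lcg-infosites listing for that CE.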

-- AndreSailer - 2014-10-30


Topic revision: r2 - 2014-12-08 - AndreSailer
 