Things That Go Wrong

Pilots Aborted at Certain Sites or CEs

When the PilotSummary shows a high abort rate for some CE, look at the pilot output, e.g.:

******  JobID=[]
   Current Status = [ABORTED]
   Working Dir    = [[reserved]]
   ExitCode       = []
   FailureReason  = [BLAH error: submission command failed (exit code = 1) (stdout:) (stderr:qsub: Unknown queue MSG=cannot locate queue-) N/A (jobId = CREAM429777624)]
   Description    = [submission to BLAH failed [retry count=3]]
   Grid JobID     = [N/A]
   LRMS Abs JobID = [[reserved]]
   LRMS JobID     = [[reserved]]
   Deleg Proxy ID = [D54E25F8-B54E-5554-E958-FFB4B3397F60]
   DelegProxyInfo = [[ isRFC="false"; valid from="10/29/14 9:18 PM (GMT)"; valid to="11/3/14 8:06 PM (GMT)"; holder DN="CN=Andre Sailer,CN=683529,CN=sailer,OU=Users,OU=Organic Units,DC=cern,DC=ch"; holder AC issuer="CN=proxy,CN=proxy,CN=proxy,CN=proxy,CN=Andre Sailer,CN=683529,CN=sailer,OU=Users,OU=Organic Units,DC=cern,DC=ch"; VO="ilc"; AC issuer="CN=host/, OU=DESY, O=GermanGrid, C=DE"; VOMS attributes={ /ilc/Role=NULL/Capability=NULL, /ilc/fcal/Role=NULL/Capability=NULL } ]]
   Worker Node    = [N/A]
   Local User     = [ilcusr053]
   CREAM ISB URI  = [gsi]
   CREAM OSB URI  = [gsi]
   JDL            = [[ StdOutput = "d4HYF8.out"; BatchSystem = "pbs"; QueueName = "emi2-sl6"; Executable = ""; JobType = "Normal"; OutputSandboxBaseDestUri = "gsiftp://localhost"; OutputSandbox = { "d4HYF8.out","d4HYF8.err" }; InputSandbox = { "/opt/dirac/data/work/SiteDirector/" }; StdError = "d4HYF8.err" ]]
   Type           = [Normal]

Notice the "Unknown queue" message in the FailureReason and the QueueName in the JDL (the second-to-last line). This might mean that the queue no longer exists at the site or is currently unavailable. Check gstat for the queues published by this CE, or run lcg-infosites --vo ilc ce and grep the output for the site in question.
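If many aborted pilots need to be checked, the two things to look for (the "Unknown queue" text in FailureReason and the QueueName inside the JDL) can be pulled out with a small script. The sketch below is a hypothetical helper, not part of DIRAC or CREAM: it assumes only the "Key = [value]" layout of the status dump shown above. The second function, queue_published, additionally assumes CREAM-style CE IDs of the form host:port/cream-<batch>-<queue> (e.g. as listed by lcg-infosites); the hostnames used are made-up examples.

```python
import re

# Matches 'Current Status = [...]' and 'FailureReason = [...]' lines of the dump.
FIELD_RE = re.compile(r"^\s*(FailureReason|Current Status)\s*=\s*\[(.*)\]\s*$")
# QueueName sits inside the JDL attribute, e.g. QueueName = "emi2-sl6";
QUEUE_RE = re.compile(r'QueueName\s*=\s*"([^"]*)"')


def diagnose_pilot(status_text):
    """Hypothetical helper: extract status, failure reason and queue name,
    and flag whether the batch system reported an unknown queue."""
    info = {"status": None, "failure_reason": None, "queue": None}
    for line in status_text.splitlines():
        m = FIELD_RE.match(line)
        if m:
            key = "status" if m.group(1) == "Current Status" else "failure_reason"
            info[key] = m.group(2)
        q = QUEUE_RE.search(line)
        if q:
            info["queue"] = q.group(1)
    info["unknown_queue"] = "Unknown queue" in (info["failure_reason"] or "")
    return info


def queue_published(ce_ids, queue):
    """Hypothetical helper: check whether `queue` appears in a list of CE IDs.

    Assumes IDs like 'host:port/cream-<batch>-<queue>'; adapt for other formats.
    """
    for ce_id in ce_ids:
        _, _, path = ce_id.partition("/")   # e.g. 'cream-pbs-emi2-sl6'
        parts = path.split("-", 2)          # ['cream', 'pbs', 'emi2-sl6']
        if len(parts) == 3 and parts[2] == queue:
            return True
    return False


# Trimmed copy of the pilot output above.
sample = """******  JobID=[]
   Current Status = [ABORTED]
   FailureReason  = [BLAH error: submission command failed (exit code = 1) (stdout:) (stderr:qsub: Unknown queue MSG=cannot locate queue-) N/A (jobId = CREAM429777624)]
   JDL            = [[ BatchSystem = "pbs"; QueueName = "emi2-sl6"; JobType = "Normal" ]]
"""
info = diagnose_pilot(sample)
print(info["queue"], info["unknown_queue"])  # emi2-sl6 True
```

A pilot with unknown_queue set whose queue is not found by queue_published in the CE list for that site is a good candidate for a queue that was removed or renamed; the CE configuration can then be updated accordingly.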

-- AndreSailer - 2014-10-30

