Things That Go Wrong
Pilots Aborted at Certain Sites or CEs
When the PilotSummary shows a high abort rate for some CE, look at the pilot output, e.g.:
****** JobID=[https://grid-cr2.desy.de:8443/CREAM429777624]
Current Status = [ABORTED]
Working Dir = [[reserved]]
ExitCode = []
FailureReason = [BLAH error: submission command failed (exit code = 1) (stdout:) (stderr:qsub: Unknown queue MSG=cannot locate queue-) N/A (jobId = CREAM429777624)]
Description = [submission to BLAH failed [retry count=3]]
Grid JobID = [N/A]
LRMS Abs JobID = [[reserved]]
LRMS JobID = [[reserved]]
Deleg Proxy ID = [D54E25F8-B54E-5554-E958-FFB4B3397F60]
DelegProxyInfo = [[ isRFC="false"; valid from="10/29/14 9:18 PM (GMT)"; valid to="11/3/14 8:06 PM (GMT)"; holder DN="CN=Andre Sailer,CN=683529,CN=sailer,OU=Users,OU=Organic Units,DC=cern,DC=ch"; holder AC issuer="CN=proxy,CN=proxy,CN=proxy,CN=proxy,CN=Andre Sailer,CN=683529,CN=sailer,OU=Users,OU=Organic Units,DC=cern,DC=ch"; VO="ilc"; AC issuer="CN=host/grid-voms.desy.de, OU=DESY, O=GermanGrid, C=DE"; VOMS attributes={ /ilc/Role=NULL/Capability=NULL, /ilc/fcal/Role=NULL/Capability=NULL } ]]
Worker Node = [N/A]
Local User = [ilcusr053]
CREAM ISB URI = [gsiftp://grid-cr2.desy.de/var/cream_sandbox/ilcusr/CN_Andre_Sailer_CN_683529_CN_sailer_OU_Users_OU_Organic_Units_DC_cern_DC_ch_ilc_Role_NULL_Capability_NULL_ilcusr053/42/CREAM429777624/ISB]
CREAM OSB URI = [gsiftp://grid-cr2.desy.de/var/cream_sandbox/ilcusr/CN_Andre_Sailer_CN_683529_CN_sailer_OU_Users_OU_Organic_Units_DC_cern_DC_ch_ilc_Role_NULL_Capability_NULL_ilcusr053/42/CREAM429777624/OSB]
JDL = [[ StdOutput = "d4HYF8.out"; BatchSystem = "pbs"; QueueName = "emi2-sl6"; Executable = "DIRAC_iSOJXR_pilotwrapper.py"; JobType = "Normal"; OutputSandboxBaseDestUri = "gsiftp://localhost"; OutputSandbox = { "d4HYF8.out","d4HYF8.err" }; InputSandbox = { "/opt/dirac/data/work/SiteDirector/DIRAC_iSOJXR_pilotwrapper.py" }; StdError = "d4HYF8.err" ]]
Type = [Normal]
Notice the "Unknown queue" message in the FailureReason and the QueueName ("emi2-sl6") in the JDL on the second-to-last line. This might mean that the queue no longer exists at this CE or is currently unavailable.
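When many pilots abort, reading each status dump by hand gets tedious. The following is a minimal Python sketch, not part of DIRAC, that pulls the CE host, the requested queue and the "Unknown queue" marker out of a status dump like the one above. The function names parse_status and diagnose are hypothetical, and the sketch assumes every field is printed on one line as "Key = [value]".

```python
import re


def parse_status(text):
    """Collect 'Key = [value]' lines of a pilot status dump into a dict."""
    fields = {}
    for line in text.splitlines():
        # The JobID line starts with a run of asterisks; drop them first.
        line = line.lstrip("* ")
        match = re.match(r"([^=\[]+?)\s*=\s*\[(.*)\]\s*$", line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def diagnose(fields):
    """Extract the CE host, the queue from the JDL, and the unknown-queue flag."""
    ce = re.search(r"https?://([^:/\]]+)", fields.get("JobID", ""))
    queue = re.search(r'QueueName\s*=\s*"([^"]*)"', fields.get("JDL", ""))
    return {
        "ce": ce.group(1) if ce else None,
        "queue": queue.group(1) if queue else None,
        "unknown_queue": "Unknown queue" in fields.get("FailureReason", ""),
    }
```

Calling diagnose(parse_status(dump)) on the output above would report the CE grid-cr2.desy.de, the queue emi2-sl6, and the unknown-queue flag set, which is the combination to check against the information system.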
Check gstat for the queues at this CE, or run the command
lcg-infosites --vo ilc ce
and grep for the given site.
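The "grep for the site" step can also be scripted. The sketch below is an illustration under assumptions: the helper names lines_for_site and queues_at are made up here, lcg-infosites must be available on the PATH, and nothing is assumed about the column layout of its output beyond the CE host name appearing somewhere on the relevant lines.

```python
import subprocess


def lines_for_site(listing, site):
    """Return the lines of a text listing that mention the given site or CE host."""
    return [line for line in listing.splitlines() if site in line]


def queues_at(site, vo="ilc"):
    """Run 'lcg-infosites --vo <vo> ce' and keep only the lines for one site."""
    # Raises CalledProcessError if the command exits with a non-zero status.
    result = subprocess.run(
        ["lcg-infosites", "--vo", vo, "ce"],
        capture_output=True,
        text=True,
        check=True,
    )
    return lines_for_site(result.stdout, site)
```

For the example above, queues_at("grid-cr2.desy.de") would list what the information system publishes for that CE, and the QueueName from the JDL can then be compared against it.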
--
AndreSailer - 2014-10-30