This page lists the error codes and diagnostics reported for Panda jobs.
| transExitCode | diagnostics |
| 1 | Athena release is not installed in the CE, or trf failed due to "Unknown Problem" (see checklog.txt) |
| 2 | Athena core dump |
| 6 | TRF_SEGVIO - Segmentation violation |
| 10 | ATH_FAILURE - Athena non-zero exit |
| 26 | TRF_ATHENACRASH - Athena crash |
| 30 | TRF_PYT - transformation python error |
| 31 | TRF_ARG - transformation argument error |
| 32 | TRF_DEF - transformation definition error |
| 33 | TRF_ENV - transformation environment error |
| 34 | TRF_EXC - transformation exception |
| 40 | Athena crash - consult log file |
| 41 | TRF_OUTFILE - output file error |
| 42 | TRF_CONFIG - transform config file error |
| 50 | Athena crash - consult log file (can be "VKalVrtPrim ERROR Primary vertex not found") |
| 51 | TRF_DBREL_TARFILE - Problems with the DBRelease tarfile |
| 60 | TRF_GBB_TIME - GriBB - output limit exceeded (time, memory, CPU) |
| 79 | Copying input file failed (Can't open source file : Invalid file name) |
| 80 | file in trf definition not found, using the expandable syntax |
| 81 | file in trf definition not found, using the expandable syntax -- pileup case |
| 85 | analysis output merge crash - consult log file |
| 98 | Oracle error - session limit reached |
| 99 | Unknown transform error (69999, TRF_UNKNOWN) -- consult log file |
| 102 | One of the output files did not get produced by the job |
| 104 | Copying the output file from the worker node to the local SE failed (md5sum mismatch, or size mismatch, or LFNnonunique) |
| 126 | trf is not executable - consult log file |
| 127 | trf is not installed in the CE |
| 134 | Athena core dump, or Athena time out, or ConditionsDB exception caught: MySQL error (database load problem), or Error ORA-03114: not connected to ORACLE |
| 141 | No input file is available - input dataset is broken or doesn't exist at WN's site |
| 200 | no Athena log file produced |
| 220 | Proot: An exception occurred in the user analysis code |
| 221 | Proot: Framework decided to abort the job due to an internal problem |
| 222 | Proot: Job completed without reading all input files |
| 223 | Proot: Input files cannot be opened |
| 2100 | MyProxyError: server name not specified (not really trf error) |
| 2101 | MyProxyError: voms attributes not specified (not really trf error) |
| 2102 | MyProxyError: user DN not specified (not really trf error) |
| 2103 | MyProxyError: pilot owner DN not specified (not really trf error) |
| 2104 | MyProxyError: invalid path for the delegated proxy (not really trf error) |
| 2105 | MyProxyError: invalid pilot proxy path (not really trf error) |
| 2106 | MyProxyError: no path to delegated proxy specified (not really trf error) |
| 2200 | MyProxyError: myproxy-init not available in PATH (not really trf error) |
| 2201 | MyProxyError: myproxy-logon not available in PATH (not really trf error) |
| 2202 | MyProxyError: myproxy-init version not valid (not really trf error) |
| 2203 | MyProxyError: myproxy-logon version not valid (not really trf error) |
| 2300 | MyProxyError: proxy delegation failed (not really trf error) |
| 2301 | MyProxyError: proxy retrieval failed (not really trf error) |
| 2999 | Unknown transExitCode error code (most likely a pilot script error, consult batch log) |
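As a rough illustration of how the transExitCode table above might be consulted from a script, the following Python sketch maps a few codes to their diagnostics and flags the MyProxyError codes, which the table marks as not really trf errors. Only an excerpt of the table is included, and the names `TRANS_EXIT_DIAGNOSTICS`, `MYPROXY_CODES` and `describe_trans_exit_code` are hypothetical; they are not part of any Panda or pilot API.

```python
# Excerpt of the transExitCode table; extend with further rows as needed.
TRANS_EXIT_DIAGNOSTICS = {
    2: "Athena core dump",
    6: "TRF_SEGVIO - Segmentation violation",
    10: "ATH_FAILURE - Athena non-zero exit",
    127: "trf is not installed in the CE",
    2300: "MyProxyError: proxy delegation failed",
}

# MyProxyError codes listed in the table (2100-2106, 2200-2203, 2300-2301).
MYPROXY_CODES = frozenset(
    list(range(2100, 2107)) + list(range(2200, 2204)) + [2300, 2301]
)


def describe_trans_exit_code(code):
    """Return (diagnostics, is_myproxy_error) for a transExitCode.

    diagnostics is None when the code is not in the excerpt above
    (hypothetical helper, for illustration only).
    """
    diagnostics = TRANS_EXIT_DIAGNOSTICS.get(code)
    return diagnostics, code in MYPROXY_CODES


if __name__ == "__main__":
    print(describe_trans_exit_code(6))
    print(describe_trans_exit_code(2300))
```

Keeping the MyProxyError codes in a separate set makes it easy to exclude them when counting genuine transformation failures.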
Recoverable error codes: 1101, 1114, 1122, 1131, 1132, 1133, 1134, 1135, 1136, 1137, 1138, 1140, 1141, 1142, 1152, 1154, 1155, 1157, 1181, 1185
(stranded jobs and output files are recovered by a later pilot on sites with schedconfig.retry = true)
Resubmission error codes: 1008, 1098, 1099, 1110, 1113, 1114, 1115, 1116, 1117, 1137, 1139, 1151, 1152, 1171, 1172, 1177, 1179, 1180, 1181, 1182, 1188, 1189
(the pilot instructs the server to retry the job)
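The two code lists above can be turned into a small classifier. The sketch below copies both lists verbatim and reports which category, if any, a given pilotErrorCode falls into; the function and constant names are hypothetical and do not reflect how the pilot itself implements recovery or resubmission.

```python
# Codes copied from the "Recoverable error codes" list.
RECOVERABLE_CODES = frozenset([
    1101, 1114, 1122, 1131, 1132, 1133, 1134, 1135, 1136, 1137,
    1138, 1140, 1141, 1142, 1152, 1154, 1155, 1157, 1181, 1185,
])

# Codes copied from the "Resubmission error codes" list.
RESUBMISSION_CODES = frozenset([
    1008, 1098, 1099, 1110, 1113, 1114, 1115, 1116, 1117, 1137, 1139,
    1151, 1152, 1171, 1172, 1177, 1179, 1180, 1181, 1182, 1188, 1189,
])


def classify_pilot_error(code):
    """Return the set of categories a pilotErrorCode belongs to."""
    labels = set()
    if code in RECOVERABLE_CODES:
        labels.add("recoverable")
    if code in RESUBMISSION_CODES:
        labels.add("resubmission")
    return labels


if __name__ == "__main__":
    for code in (1008, 1101, 1114, 1100):
        print(code, sorted(classify_pilot_error(code)))
```

A set is returned rather than a single label because four codes (1114, 1137, 1152, 1181) appear in both lists.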
| pilotErrorCode | diagnostics |
| 1008 | General pilot error, consult batch log |
| 1097 | Get function can't be called for staging input file |
| 1098 | No space left on local disk |
| 1099 | Get error: Staging input file failed |
| 1100 | Get error: Replica not found |
| 1101 | LRC registration error: Connection refused |
| 1103 | Get error: No such file or directory |
| 1104 | User work directory too large |
| 1105 | Put error: Failed to add file size and checksum to LFC |
| 1106 | Payload stdout file too big |
| 1107 | Get error: Missing DBRelease file |
| 1108 | Put error: LCG registration failed |
| 1109 | Required CMTCONFIG incompatible with WN |
| 1110 | Failed during setup |
| 1111 | Exception caught by runJob |
| 1112 | Exception caught by pilot |
| 1113 | Get error: Failed to import LFC python module |
| 1114 | Put error: Failed to import LFC python module |
| 1115 | NFS SQLite locking problems |
| 1116 | Pilot could not download queuedata |
| 1117 | Pilot found non-valid queuedata |
| 1118 | Pilot could not curl space report |
| 1119 | Pilot aborted due to DDM space shortage |
| 1122 | Bad replica entry returned by lfc_getreplicas(): SFN not set in LFC for this guid |
| 1123 | Missing guid in output file list |
| 1124 | Output file too large |
| 1130 | Get error: Failed to get POOL file catalog |
| 1131 | Put function cannot be called for staging out |
| 1132 | LRC registration error (consult log file) |
| 1133 | Put error: Fetching default storage URL failed |
| 1134 | Put error: Error in mkdir on localSE, not allowed or no available space |
| 1135 | Could not get file size in job workdir |
| 1136 | Error running md5sum on the file in job workdir |
| 1137 | Put error: Error in copying the file from job workdir to localSE |
| 1138 | Put error: could not get the file size on localSE |
| 1139 | Put error: Problem with copying from job workdir to local SE: size mismatch |
| 1140 | Put error: Error running md5sum on the file on localSE |
| 1141 | Put error: Problem with copying from job workdir to local SE: md5sum mismatch |
| 1143 | Failed to chmod trf |
| 1144 | This job was killed by panda server |
| 1145 | Get error: md5sum mismatch on input file |
| 1146 | Trf installation dir does not exist and could not be installed |
| 1148 | Put error: Failed to remove readOnly file in dCache |
| 1149 | wget command failed to download trf |
| 1150 | Looping job killed by pilot |
| 1151 | Get error: Input file staging timed out |
| 1152 | Put error: File copy timed out |
| 1153 | Lost job was not finished |
| 1154 | Failed to register log file |
| 1155 | Failed to move output files for lost job |
| 1156 | Pilot could not recover job |
| 1158 | Reached maximum number of recovery trials |
| 1159 | Job recovery could not read PoolFileCatalog.xml file (guids lost) |
| 1160 | LRC registration error: file name string size exceeded limit of 250 |
| 1161 | Job recovery could not generate xml for remaining output files |
| 1162 | LRC registration error: Non-unique LFN |
| 1163 | Grid proxy not valid |
| 1164 | Get error: Local input file missing |
| 1165 | Put error: Local output file missing |
| 1166 | Put error: File copy broken by SIGPIPE |
| 1167 | Get error: Input file missing in PoolFileCatalog.xml |
| 1168 | Get error: Total file size too large |
| 1169 | Put error: LFC registration failed |
| 1170 | Error running adler32 on the file in job workdir |
| 1171 | Get error: adler32 mismatch on input file |
| 1172 | Put error: adler32 mismatch on output file |
| 1173 | PandaMover staging error: File is not cached |
| 1174 | PandaMover transfer failure |
| 1175 | Get error: Problem with copying from local SE to job workdir: size mismatch |
| 1176 | Pilot has no child processes (job wrapper has either crashed or did not send final status) |
| 1177 | Voms proxy not valid |
| 1178 | Get error: No input files are staged |
| 1179 | Get error: Failed to get LFC replicas |
| 1180 | Get error: Globus system error |
| 1181 | Put error: Globus system error |
| 1182 | Get error: Failed to get LFC replica |
| 1183 | LRC registration error: Guid-metadata entry already exists |
| 1184 | Put error: PoolFileCatalog could not be found in workdir |
| 1186 | Software directory does not exist |
| 1187 | Athena metadata is not available |
| 1188 | lcg-getturls failed |
| 1189 | lcg-getturls timed out |
| 1190 | LFN too long (exceeding limit of 150 characters) |
| 1199 | Could not create directory |
| 1200 | Job terminated by unknown kill signal |
| 1201 | Job killed by signal: SIGTERM |
| 1202 | Job killed by signal: SIGQUIT |
| 1203 | Job killed by signal: SIGSEGV |
| 1204 | Job killed by signal: SIGXCPU |
| 1206 | Job killed by signal: SIGBUS |
| 1207 | Job killed by signal: SIGUSR1 |
| 1210 | No athena output |
| 1211 | Missing installation |
| 1212 | Payload ran out of memory |
| 1213 | Reached batch system time limit |
| 1214 | Site does not allow requested direct access or file stager |
| 1215 | Failed to open TCP connection to localhost (worker node network problem) |
| 1216 | Pilot TCP server has died |
| 1217 | Mismatch between core count in job and queue definition |
| 1218 | Exception caught by RunJobEvent |
| 1219 | uuidgen failed to produce a guid |
| 1220 | Job failed due to unknown reason (consult log file) |
| 1221 | File already exists |
| 1222 | Failed to get security key pair |
| 1223 | TRF failed due to bad_alloc |
| 1224 | Recoverable Event Service Merge error |
| 1225 | Recoverable Event Service error |
| 1226 | gLExec related error |
| 1227 | AthenaMP ended Event Service job prematurely |
| 1228 | Fatal Event Service error |
| 1229 | Fatal Token Extractor error |
| 1230 | Token Extractor error: Host name could not be resolved |
| 1231 | Token Extractor error: Bad URL |
| 1232 | Token Extractor error: Invalid GUID length |
| 1233 | Token Extractor error: No tokens for this GUID |
| 1234 | Already executed clone job |
| 1235 | Payload exceeded maximum allowed memory |
| 1236 | Failed by server |
| 1237 | Event Service job killed by server |
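The signal-related rows of the pilotErrorCode table (1200 to 1207) suggest a simple mapping from Unix signals to pilot codes. The Python sketch below expresses that mapping with the standard `signal` module. Treating 1200 as the fallback for any other signal is an assumption read off the "unknown kill signal" wording, and `pilot_code_for_signal` is a hypothetical helper, not the pilot's actual code. The listed signals are only defined on Unix-like systems.

```python
import signal

# Signal-to-code pairs taken from the pilotErrorCode table.
SIGNAL_TO_PILOT_CODE = {
    signal.SIGTERM: 1201,
    signal.SIGQUIT: 1202,
    signal.SIGSEGV: 1203,
    signal.SIGXCPU: 1204,
    signal.SIGBUS: 1206,
    signal.SIGUSR1: 1207,
}

# Assumed fallback: "Job terminated by unknown kill signal".
UNKNOWN_KILL_SIGNAL_CODE = 1200


def pilot_code_for_signal(signum):
    """Map a signal number to the pilotErrorCode listed in the table."""
    return SIGNAL_TO_PILOT_CODE.get(signum, UNKNOWN_KILL_SIGNAL_CODE)


if __name__ == "__main__":
    print(pilot_code_for_signal(signal.SIGTERM))
    print(pilot_code_for_signal(signal.SIGHUP))
```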