IMPORTANT:

Any change made here must also be made in: https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/WMExceptions.py

  • Range(1 - 512)
    • Standard Unix exit codes; they indicate a CMSSW abort that cmsRun did not catch as an exception
  • Range(7000 - 9000)
    • cmsRun (CMSSW) exit codes
  • Range(10000 - 19999)
    • Failures related to the environment setup
  • Range(50000 - 59999)
    • Failures related to the executable file
  • Range(60000 - 69999)
    • Failures related to staging-OUT
  • Range(70000 - 79999)
    • Failures specific to WMAgent (which do not fit into the ranges above)
  • Range(80000 - 89999)
    • Failures specific to CRAB3 (which do not fit into the ranges above)
  • Range(90000 - 99999)
    • Other problems which do not fit into any of the ranges above.
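
The ranges above can be expressed as a small lookup table. What follows is a minimal sketch, not the WMCore implementation: the names `EXIT_CODE_RANGES` and `classify_exit_code` are hypothetical, and the labels are paraphrased from this page.

```python
# Hypothetical helper: the bounds are copied from the range list on this page,
# the labels are paraphrased and are not official names.
EXIT_CODE_RANGES = [
    (1, 512, "Standard Unix exit code (CMSSW abort not caught by cmsRun)"),
    (7000, 9000, "cmsRun (CMSSW) exit code"),
    (10000, 19999, "Environment setup failure"),
    (50000, 59999, "Executable file failure"),
    (60000, 69999, "Stage-out failure"),
    (70000, 79999, "WMAgent-specific failure"),
    (80000, 89999, "CRAB3-specific failure"),
    (90000, 99999, "Other problem"),
]


def classify_exit_code(code):
    """Return the label of the range containing `code`, or None if no range matches."""
    for low, high, label in EXIT_CODE_RANGES:
        if low <= code <= high:
            return label
    return None


print(classify_exit_code(8021))
print(classify_exit_code(60307))
```

Codes that fall between the documented ranges (for example 600 or 30000) return `None` in this sketch, which keeps unknown codes visible instead of forcing them into a category.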

Error codes currently sent from CMS jobs to the dashboard

ALERT! indicates a site error

  • Error exit code of the cmsRun application itself - range 0-10000
  • Exit codes in 1-255 are standard Unix ones and indicate a CMSSW abort that cmsRun did not catch as an exception

  • cmsRun (CMSSW) exit codes - range 7000-9000. These codes may depend on the specific CMSSW version; the list is maintained here, and you should look at the tags there to find out what is appropriate for a given CMSSW release. The situation as of 5_0_X is below.
    • // The first three are specific categories of CMS Exceptions.
    • 7000 - Exception from command line processing
    • 7001 - Configuration File Not Found
    • 7002 - Configuration File Read Error
    • 8001 - Other CMS Exception
    • 8002 - std::exception (other than bad_alloc)
    • 8003 - Unknown Exception
    • 8004 - std::bad_alloc (memory exhaustion)
    • 8005 - Bad Exception Type (e.g. throwing a string)
    • // The rest are specific categories of CMS Exceptions.
    • 8006 - ProductNotFound
    • 8007 - DictionaryNotFound
    • 8008 - InsertFailure
    • 8009 - Configuration
    • 8010 - LogicError
    • 8011 - UnimplementedFeature
    • 8012 - InvalidReference
    • 8013 - NullPointerError
    • 8014 - NoProductSpecified
    • 8015 - EventTimeout
    • 8016 - EventCorruption
    • 8017 - ScheduleExecutionFailure
    • 8018 - EventProcessorFailure
    • 8019 - FileInPathError
    • ALERT! 8020 - FileOpenError (Likely a site error) (see HERE.)
    • ALERT! 8021 - FileReadError (May be a site error)
    • 8022 - FatalRootError
    • 8023 - MismatchedInputFiles
    • 8024 - ProductDoesNotSupportViews
    • 8025 - ProductDoesNotSupportPtr
    • 8026 - NotFound (something other than a product or dictionary not found)
    • 8027 - FormatIncompatibility
    • ALERT! 8028 - FileOpenError with fallback
    • 8030 - Exceeded maximum allowed VSize (ExceededResourceVSize).
    • 8031 - Exceeded maximum allowed RSS (ExceededResourceRSS).
    • 8032 - Exceeded maximum allowed time (ExceededResourceTime).
    • ALERT! 8033 - Could not write output file (FileWriteError) (usually local disk problem)
    • 9000 - cmsRun caught a SIGINT or SIGUSR2 signal.
  • ALERT! Failures related to the environment setup - range 10000-19999
    • 10001 - Connectivity problems
    • 10002 - CPU load is too high
    • 10003 - CMS software initialisation script cmsset_default.sh failed
    • 10004 - CMS_PATH not defined
    • 10005 - CMS_PATH directory does not exist
    • 10006 - scramv1 command not found
    • 10007 - Some CMSSW files are corrupted or unreadable
    • 10008 - Scratch dir was not found
    • 10009 - Less than 5 GB/core of free space in scratch dir
    • 10010 - Could not find X509 certificate directory
    • 10011 - Could not find X509 proxy certificate
    • 10012 - Unable to locate the glidein configuration file
    • 10013 - No sitename detected! Invalid SITECONF file
    • 10014 - No PhEDEx node name found for local or fallback stageout
    • 10015 - No LOCAL_STAGEOUT section in site-local-config.xml
    • 10016 - No frontier-connect section in site-local-config.xml
    • 10017 - No calib-data section in site-local-config.xml
    • 10018 - site-local-config.xml was not found
    • 10019 - TrivialFileCatalog string missing
    • 10020 - event_data section is missing
    • 10021 - no proxy string in site-local-config.xml
    • 10022 - Squid test failed
    • 10023 - Clock skew is bigger than 60 sec
    • 10031 - Directory VO_CMS_SW_DIR not found
    • ALERT! 10032 - Failed to source a CMS environment setup script such as cmsset_default.sh, or a grid system or site equivalent script
    • ALERT! 10034 - Required application version is not found at the site (see HERE.)
    • ALERT! 10040 - failed to generate cmsRun cfg file at runtime
    • ALERT! 10042 - Unable to stage-in wrapper tarball.
    • ALERT! 10043 - Unable to bootstrap WMCore libraries (most likely site python is broken).
    • ALERT! 10050 - WARNING test_squid.py: One of the load balance Squid proxies
    • ALERT! 10051 - WARNING less than 20 GB of free space per core in scratch dir
    • ALERT! 10052 - WARNING less than 10MB free in /tmp
    • ALERT! 10053 - WARNING CPU load of last minutes + pilot cores is higher than number of physical CPUs
    • ALERT! 10054 - WARNING proxy shorter than 6 hours
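
Codes 10009 and 10051 both concern free scratch space per core. The following is a minimal sketch of such a check, not the actual site test: the function names are hypothetical, "GB" is assumed to mean 2**30 bytes, the failure is assumed to take precedence over the warning, and returning 0 for "no problem" is a convention of this example only.

```python
import shutil

GIB = 1024 ** 3  # assumption: "GB" in the list is treated as 2**30 bytes


def scratch_space_code(free_bytes, cores):
    """Map free scratch space per core to a code from the list above.

    Returns 10009 below 5 GB/core, 10051 below 20 GB/core, and 0 otherwise
    (0 is this sketch's own "no problem" value, not a documented code).
    """
    per_core = free_bytes / cores
    if per_core < 5 * GIB:
        return 10009
    if per_core < 20 * GIB:
        return 10051
    return 0


def check_scratch_dir(path, cores):
    """Apply the same check to a real directory via shutil.disk_usage."""
    return scratch_space_code(shutil.disk_usage(path).free, cores)
```

Separating the threshold logic from the disk query keeps the thresholds testable without touching a real file system; `check_scratch_dir("/tmp", 8)` would then evaluate an actual directory.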

  • Executable file related failures - range 50000-59999
    • 50110 - Executable file is not found
    • 50111 - Executable file has no execute permissions
    • 50113 - Executable did not get enough arguments
    • 50115 - cmsRun did not produce a valid job report at runtime (often means cmsRun segfaulted)
    • 50116 - Could not determine exit code of cmsRun executable at runtime
    • 50513 - Failure to run SCRAM setup scripts
    • 50660 - Application terminated by wrapper because using too much RAM (RSS)
    • 50661 - Application terminated by wrapper because using too much Virtual Memory (VSIZE)
    • 50662 - Application terminated by wrapper because using too much disk
    • 50663 - Application terminated by wrapper because using too much CPU time
    • 50664 - Application terminated by wrapper because using too much Wall Clock time
    • 50665 - Application terminated by wrapper because it stayed idle too long
    • 50669 - Application terminated by wrapper for an undefined reason
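
The wrapper termination codes 50660-50669 follow a simple reason-to-code pattern. A minimal sketch of that mapping is shown here; the reason keys and the function name are hypothetical and do not come from the job wrapper itself.

```python
# Hypothetical reason keys; the numeric codes are the ones listed above.
WRAPPER_KILL_CODES = {
    "rss": 50660,        # too much RAM (RSS)
    "vsize": 50661,      # too much virtual memory (VSIZE)
    "disk": 50662,       # too much disk
    "cpu_time": 50663,   # too much CPU time
    "wall_time": 50664,  # too much wall clock time
    "idle": 50665,       # idle for too long
}
UNDEFINED_REASON_CODE = 50669


def wrapper_kill_code(reason):
    """Return the exit code for a kill reason, or 50669 if the reason is not known."""
    return WRAPPER_KILL_CODES.get(reason, UNDEFINED_REASON_CODE)
```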

  • Staging-OUT related troubles - range 60000-69999
    • 60302 - Output file(s) not found (see HERE.)
    • 60307 - Failed to copy an output file to the SE (sometimes caused by a timeout issue, or by the issues mentioned HERE).
    • ALERT! 60311 - Local Stage Out Failure using site specific plugin
    • 60312 - Failed to get file TURL via lcg-lr command
    • ALERT! 60315 - ProdAgent StageOut initialisation error (Due to TFC, SITECONF etc)
    • 60316 - Failed to create a directory on the SE
    • 60317 - Forced timeout for stuck stage out
    • 60318 - Internal error in Crab cmscp.py stageout script
    • 60319 - Failure to do AlcaHarvest stageout (WMAgent)
    • 60320 - Failure to communicate with ASO server
    • ALERT! 60321 - Site related issue: no space, SE down, refused connection.
    • 60322 - User is not authorized to write to destination site.
    • 60323 - User quota exceeded.
    • 60324 - Other stageout exception.
    • 60401 - Failure to assemble LFN in direct-to-merge by size (WMAgent)
    • 60402 - Failure to assemble LFN in direct-to-merge by event (WMAgent)
    • 60403 - Timeout during attempted file transfer - status unknown (WMAgent)
    • 60404 - Timeout during staging of log archives - status unknown (WMAgent)
    • 60405 - General failure to stage out log archives (WMAgent)
    • 60407 - Timeout in staging in log files during log collection (WMAgent)
    • 60408 - Failure to stage out of log files during log collection (WMAgent)
    • 60409 - Timeout in stage out of log files during log collection (WMAgent)
    • 60450 - No output files present in the report
    • 60451 - Output file lacked adler32 checksum (WMAgent)

  • Failures specific to WMAgent - range 70000-79999
    • 71101 - No sites are available to submit the job because the location of its input(s) does not pass the site whitelist/blacklist restrictions (WMAgent) (was 61101)
    • 71102 - The job can only run at a site that is currently in Aborted state (WMAgent)
    • 71103 - The JobSubmitter component could not load the job pickle (WMAgent)
    • 71104 - The job can run only at a site that is currently in Draining state (WMAgent)
    • 71300 - The job was killed by the WMAgent, reason is unknown (WMAgent)
    • 71301 - The job was killed by the WMAgent because the site it was running at was set to Aborted (WMAgent)
    • 71302 - The job was killed by the WMAgent because the site it was running at was set to Draining (WMAgent)
    • 71303 - The job was killed by the WMAgent because the site it was running at was set to Down (WMAgent)
    • 71304 - The job was killed by the WMAgent for using too much wallclock time (WMAgent) Job status was Running.
    • 71305 - The job was killed by the WMAgent for using too much wallclock time (WMAgent) Job status was Pending.
    • 71306 - The job was killed by the WMAgent for using too much wallclock time (WMAgent) Job status was Error.
    • 71307 - The job was killed by the WMAgent for using too much wallclock time (WMAgent) Job status was Unknown.
    • 70318 - Failure in DQM upload.
    • 70452 - No run/lumi information in file (WMAgent)

  • Failures specific to CRAB3 - range 80000-89999
    • 80000 - Internal error in CRAB job wrapper
    • 80001 - No exit code set by job wrapper.
    • 80453 - Unable to determine pset hash from output file (CRAB3).

  • Other problems which do not fit into any of the ranges above - range 90000-99999
    • 90000 - Error in CRAB3 post-processing step (currently this mostly covers errors in stage out and file metadata upload).
    • 99109 - Uncaught exception in WMAgent step executor
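
The ALERT! markers in the list can be collected into a quick site-error check. This is a minimal sketch with hypothetical names; it assumes that the ALERT! on the 10000-19999 heading applies to every code in that range, which the page does not state explicitly.

```python
# Individual codes marked ALERT! in the list above.
SITE_ERROR_CODES = {8020, 8021, 8028, 8033, 60311, 60315, 60321}


def is_site_error(code):
    """Return True if the page marks `code` as a (likely) site error."""
    # Assumption: the ALERT! on the environment setup heading covers the whole range.
    if 10000 <= code <= 19999:
        return True
    return code in SITE_ERROR_CODES
```

Note that some of these entries are only flagged as likely site errors (for example 8021, "May be a site error"), so a `True` result is a hint for triage, not a verdict.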
Topic revision: r89 - 2019-07-26 - JeanrochVlimant
 