SWGuideTroubleShootingMore
This page collects and prepares further error messages for SWGuideTroubleShooting. Consider it permanently in flux. It is meant for developers rather than users, but you may still find some useful information here.
How to determine which line number your CMSSW code crashed at
If your CMSSW job crashes, the error traceback will usually tell you only which module it crashed in. When developing code, it is therefore a good idea to add the following line to your BuildFile:
<Flags CXXFLAGS="-g"/>
(see https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideScram#CmsswSCRAMBuildFlags). This compiles your code with the compiler option "-g". Although this may marginally slow down your code, it ensures that if your job crashes with an error traceback, the traceback gives the exact line number where the error occurred.
Note that CMSSW catches some errors and handles them itself. Because this prevents the job from crashing, you get no error traceback and are told only which module the error occurred in. To determine the line number in this case, run the debugger:
gdb cmsRun
catch throw
run MyAnalysis_cfg.py
where
This sequence tells the debugger to stop as soon as a C++ exception is thrown ("catch throw") and then to print the call stack at that point ("where"), which shows where in the code the error occurred.
Python Errors in gdb?
When trying to run gdb cmsRun, you may get errors like
Could not find platform independent libraries <prefix>
Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
'import site' failed; use -v for traceback
----- Begin Fatal Exception 16-Jan-2012 16:48:28 CET-----------------------
An exception of category 'ConfigFileReadError' occurred while
[0] Processing the python configuration file named cmsRun3.py
Exception Message:
python encountered the error: <type 'exceptions.ImportError'>
No module named os
----- End Fatal Exception -------------------------------------------------
You can get around this by either resetting your shell:
env SHELL=/bin/sh gdb cmsRun
or by copying the Python location from scram:
scram tool info python
and then setting $PYTHONHOME to wherever scram thinks it is.
Memory leaks (St9bad_alloc)
A 'bad_alloc' exception means the program has run out of memory. Usually this means your job has a memory leak. The symptom of the problem is an exception whose message body contains the somewhat obscure string "St9bad_alloc". There is no cure other than fixing your memory leak. This is the suggested course of action:
- Add the following to your cfg:
SimpleMemoryCheck = cms.Service("SimpleMemoryCheck",ignoreTotal = cms.untracked.int32(1) )
For reference, the "SimpleMemoryCheck" service is described on the offline page SWGuideEDMTimingAndMemory. Useful information can also be found in SWGuideFrameWork#Coding_tools_and_instructions, under the guidelines for using pointers.
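As a rough illustration of where the service line above goes, here is a minimal configuration sketch. The process name "MEMCHECK" is an arbitrary placeholder, and attaching the service as process.SimpleMemoryCheck is an assumption about how your configuration is organised; the service name and the ignoreTotal parameter are taken from the line above.

```python
import FWCore.ParameterSet.Config as cms

# "MEMCHECK" is a placeholder process name, not something the framework requires.
process = cms.Process("MEMCHECK")

# Same service and parameter as in the line above, attached to the process
# so that it is picked up when the job is run with cmsRun.
process.SimpleMemoryCheck = cms.Service("SimpleMemoryCheck",
    ignoreTotal = cms.untracked.int32(1)
)
```

The rest of your configuration (source, modules, paths) stays unchanged; only the service is added.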
Getting more information from the traceback
There are many options available to obtain better debug output to help track down problems.
Some of these options are briefly described below.
Tracer
The service Tracer helps by identifying what module is called and when. The usage is explained elsewhere in the WorkBook in:
https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookWriteFrameworkModule#WatcH
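A minimal sketch of switching the Tracer on in a configuration file might look as follows. The process name "TRACE" is an arbitrary placeholder, and the service is used here with its default settings only; see the WorkBook link above for the documented usage.

```python
import FWCore.ParameterSet.Config as cms

# "TRACE" is a placeholder process name chosen for this sketch.
process = cms.Process("TRACE")

# Add the Tracer service with default settings so that the framework
# reports which module is called and when.
process.Tracer = cms.Service("Tracer")
```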
Message Logger
The service MessageLogger does logging and provides messages, including warnings and errors. For basic usage, import the MessageLogger configuration fragment at the top of your configuration file:
from FWCore.MessageService.MessageLogger_cfi import *
In addition, it is strongly recommended (for consistency with the way all services are used) that the configuration file contain at least the line
MessageLogger = cms.Service("MessageLogger")
Configuration options and more usage instructions for the MessageLogger service are documented in:
Memory Checker
The service SimpleMemoryCheck does very basic memory checking (it can sometimes show memory leaks). Some notes on usage can be found in:
https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideEDMTimingAndMemory
EventContentAnalyzer
The module EventContentAnalyzer dumps all products stored in an event to the screen. Usage of this module is explained elsewhere in the WorkBook in:
https://twiki.cern.ch/twiki/bin/view/CMS/WorkBookWriteFrameworkModule#SeE
Include it as a module in your configuration file:
dump = cms.EDAnalyzer("EventContentAnalyzer")
Then include dump in your path.
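Put together, a small stand-alone configuration that dumps the content of the first event could be sketched as below. The process name "DUMP", the input file name and the path name p are placeholders invented for this example; replace the input file with one you actually want to inspect.

```python
import FWCore.ParameterSet.Config as cms

# "DUMP" is a placeholder process name.
process = cms.Process("DUMP")

# Placeholder input file; replace with a real file of your own.
process.source = cms.Source("PoolSource",
    fileNames = cms.untracked.vstring("file:myInput.root")
)

# One event is usually enough to see which products are stored.
process.maxEvents = cms.untracked.PSet(input = cms.untracked.int32(1))

# The module that prints all products stored in the event.
process.dump = cms.EDAnalyzer("EventContentAnalyzer")

# Include dump in a path so that it is actually run.
process.p = cms.Path(process.dump)
```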
Other help
Benedikt Hegner has a script for helping with errors in BuildFiles. The path and filename on lxplus for this script are:
~hegner/public/cmsfilt.py
Problems when reading files from CASTOR
If you suspect that you have trouble accessing data with CASTOR, you can try the following:
- to check that the file exists
nsls -l /castor/cern.ch/...
(complete with the file name)
- to check the staging status
stager_qry -M /castor/cern.ch/...
(it may be that your file is being staged and it may take a while)
- to check that the file is really available you can try to copy it locally
rfcp /castor/cern.ch/... /tmp
If the problem persists, you can contact cms.support@cernNOSPAMPLEASE.ch, specifying the file and the output of the above commands.
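The three checks above can be wrapped in a small helper that runs them in order and keeps their output, for example to paste into a support request. This is only a sketch: the function names castor_check_commands and run_castor_checks are hypothetical, and only the commands and options already shown above (nsls -l, stager_qry -M, rfcp) are used.

```python
import subprocess


def castor_check_commands(castor_path, local_dir="/tmp"):
    """Return the three checks listed above as argument lists, in order."""
    return [
        ["nsls", "-l", castor_path],        # does the file exist?
        ["stager_qry", "-M", castor_path],  # what is its staging status?
        ["rfcp", castor_path, local_dir],   # can it be copied locally?
    ]


def run_castor_checks(castor_path, local_dir="/tmp"):
    """Run every check and collect (command line, exit code, output).

    The exit code is None when the command itself is not installed on
    the machine where this helper runs.
    """
    results = []
    for argv in castor_check_commands(castor_path, local_dir):
        try:
            proc = subprocess.run(
                argv,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                universal_newlines=True,
            )
            results.append((" ".join(argv), proc.returncode, proc.stdout))
        except OSError as err:
            # Typically raised when the CASTOR client tools are missing.
            results.append((" ".join(argv), None, str(err)))
    return results
```

Calling run_castor_checks with the full path of your file returns one entry per command, so the complete output can be printed and forwarded as it is.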
Information Sources
Review status
Responsible:
SudhirMalik
Last reviewed by:
SudhirMalik - 20 Jan 2010
--
RogerWolf - 15-Sep-2010