EDM Exception Analysis

Complete: 3

Condition Reporting: Using the "out of band" channel or nonstandard path to report information to interested parties outside the context of the current active code. Examples of why reporting is necessary include announcing failures, anomalous conditions, or producing execution traces.

Goal

The goal here is first to define the various types of condition reporting that code must do, how the users of the code will obtain this information, and how they will control the actions taken on receipt of it. Here is a list of phases to this project:

  1. define the various types of reporting
  2. write some requirements for and define the exception interface, describe correct use of it, describe the output that will be produced, and decide how it will be configured and used by the framework.
  3. design and implement the exception processing subsystem.
  4. produce the requirements for the other condition reporting APIs and how these will be used.

Types of condition reporting

Contained here are the different ways that developer-written code communicates information to external entities (within the same program or outside of it). This list reflects how things ought to be, not how they currently are. The guidelines for determining which category a particular encountered problem lies in will be explained in later sections. The policy for handling information from each of these categories will also be explained in a later section. The information in this document should cover the definition, nature, properties, and usage policy for each of these categories.

No change in current program flow

Debugging and progress announcements are included in this category. This is text announcing where the program is, and it includes state information. With appropriate compiler options, it should be possible to remove this code from the program. A configurable level should allow more or less information to appear. It is likely true that the text coming out needs to be decorated in a similar way to the other reporting methods, but the interface and controls are likely to be different.
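A minimal sketch of how such an announcement facility could look, assuming a hypothetical macro SKETCH_DEBUG, a hypothetical compile flag SKETCH_NO_DEBUG, and an invented decoration format. None of these names come from the framework; they only illustrate compile-time removal combined with a runtime verbosity level.

```cpp
#include <ostream>
#include <sstream>
#include <string>

namespace sketch {

// Runtime verbosity threshold (hypothetical): messages whose level is
// greater than this value are suppressed.
inline int& debugLevel() {
  static int level = 1;
  return level;
}

// Decorates one announcement with its level and origin before writing it.
inline void emitDebug(std::ostream& os, int level, const char* where,
                      const std::string& text) {
  os << "%DEBUG-" << level << " [" << where << "] " << text << '\n';
}

}  // namespace sketch

#ifdef SKETCH_NO_DEBUG
// Compiling with -DSKETCH_NO_DEBUG turns every announcement into a no-op.
#define SKETCH_DEBUG(level, os, expr) do { } while (false)
#else
// Formats the message only when the runtime level permits it.
#define SKETCH_DEBUG(level, os, expr) \
  do { \
    if ((level) <= sketch::debugLevel()) { \
      std::ostringstream sketchBuffer; \
      sketchBuffer << expr; \
      sketch::emitDebug((os), (level), __func__, sketchBuffer.str()); \
    } \
  } while (false)
#endif
```

A call such as SKETCH_DEBUG(1, std::cerr, "hits=" << nHits) would then print only when the configured level is at least 1, and would vanish from the binary when the flag is defined.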

Nonlocal change of standard program flow possible

Pathologies or "important notices" are included in this category. This is the Message/Error Logger facility. When code discovers an interesting condition and decides to continue without issuing an exception, perhaps producing a poorer quality or truncated result, it will want to announce that this has happened. The error or message logger is the facility that captures this information and delivers it to a destination.
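As an illustration of this category, here is a sketch in which an algorithm runs to completion with a degraded result and announces the defect through a logger instead of throwing. The MessageLog class, its warning() method, and the "NotConverged" category are hypothetical stand-ins, not the interface of the actual logger facility.

```cpp
#include <cmath>
#include <string>
#include <vector>

namespace sketch {

// One captured announcement (hypothetical record layout).
struct LogRecord {
  std::string severity;
  std::string category;
  std::string text;
};

// Stand-in for a message logger: it only collects records in memory.
class MessageLog {
 public:
  void warning(const std::string& category, const std::string& text) {
    records_.push_back({"warning", category, text});
  }
  const std::vector<LogRecord>& records() const { return records_; }

 private:
  std::vector<LogRecord> records_;
};

// Newton iteration for sqrt(x), x > 0, capped at maxIter steps. If the cap
// is reached, the best estimate so far is returned and the defect is logged.
inline double approxSqrt(double x, int maxIter, MessageLog& log) {
  double guess = x > 1.0 ? x : 1.0;
  for (int i = 0; i < maxIter; ++i) {
    const double next = 0.5 * (guess + x / guess);
    if (std::fabs(next - guess) < 1e-12) return next;
    guess = next;
  }
  log.warning("NotConverged", "approxSqrt stopped after " +
                                  std::to_string(maxIter) + " iterations");
  return guess;
}

}  // namespace sketch
```

The caller still receives a value in both cases; only the log shows that one of them is of poorer quality.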

Local change of standard program flow

Exceptions issued locally are included in this category. The example is an exception throw. This is triggered by doing a throw in C++. An exception occurs when a piece of code cannot continue and cannot produce the things that it promised it would. The exception contains the reason why the algorithm could not continue.

Brief Survey of various reporting software

HLT

CDF/D0

Unix

Handling Exceptions

The places where something bad can happen boil down to:

  • user code - algorithms our developers write
  • infrastructure - exceptions that we know can happen
  • system level - things from std
  • external libraries
    • tools we know about - examples are root and boost
    • lower-level utilities that developers use, external to us

These categories of exceptions correspond to what the framework will catch.

 try {
   invokeSomeUserCode();
 }
 catch(edm::Exception& e) { ... }
 catch(cms::Exception& e) { ... }
 catch(std::exception& e) { ... }
 catch(...) { ... }

In addition, the infrastructure software will catch "char*" and std::string and kill the program if one of these is caught.
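The full catch ladder, including the raw-string cases, might be sketched as follows. CmsLikeException and EdmLikeException are hypothetical stand-ins for the two framework classes, and the function only reports which handler matched; real infrastructure code would terminate the program in the raw-string handlers.

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <utility>

namespace sketch {

// Stand-in for cms::Exception: carries a string category.
class CmsLikeException : public std::exception {
 public:
  explicit CmsLikeException(std::string category)
      : category_(std::move(category)) {}
  const char* what() const noexcept override { return category_.c_str(); }

 private:
  std::string category_;
};

// Stand-in for edm::Exception: derived, identified by an integer code.
class EdmLikeException : public CmsLikeException {
 public:
  explicit EdmLikeException(int code) : CmsLikeException("edm"), code_(code) {}
  int code() const { return code_; }

 private:
  int code_;
};

enum class Caught { None, Edm, Cms, Std, RawString, Unknown };

// Handlers run from most derived to least derived; a base-class handler
// placed first would hide the ones after it.
template <typename F>
Caught classify(F&& userCode) {
  try {
    userCode();
  } catch (EdmLikeException&) {
    return Caught::Edm;
  } catch (CmsLikeException&) {
    return Caught::Cms;
  } catch (std::exception&) {
    return Caught::Std;
  } catch (const char*) {
    // A thrown string literal has type const char*, so catch it as such.
    return Caught::RawString;
  } catch (std::string&) {
    return Caught::RawString;
  } catch (...) {
    return Caught::Unknown;
  }
  return Caught::None;
}

}  // namespace sketch
```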

The action or control-flow change associated with each of these exceptions is configured at runtime from the following set:

  • skip this event altogether
  • continue as though nothing happened
  • stop processing in the current path
  • rethrow the exception (stop event processing)

In the future we can consider additional features like running user-written functions that decide what to do.
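One way to picture the runtime configuration is a lookup table from exception category to action. The sketch below uses hypothetical names (Action, ActionTable) and assumes that an unconfigured category falls back to rethrowing; the source does not state what the default is.

```cpp
#include <map>
#include <string>

namespace sketch {

// The four actions listed above (enumerator names are hypothetical).
enum class Action {
  SkipEvent,  // skip this event altogether
  Ignore,     // continue as though nothing happened
  FailPath,   // stop processing in the current path
  Rethrow     // rethrow the exception (stop event processing)
};

// Maps an exception category to the action chosen in the job configuration.
class ActionTable {
 public:
  void set(const std::string& category, Action action) {
    table_[category] = action;
  }

  // Assumption: categories without an entry are rethrown.
  Action find(const std::string& category) const {
    const auto it = table_.find(category);
    return it == table_.end() ? Action::Rethrow : it->second;
  }

 private:
  std::map<std::string, Action> table_;
};

}  // namespace sketch
```

The framework's catch handler would then call find() with the category of the caught exception and branch on the result.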

The cms::Exception class will be derived from std::exception. The edm::Exception class will be derived from cms::Exception.

Developers like to write out information in a form accepted by cout. It would be nice to capture the information that would normally be sent to cout in the exception and have that information also sent to a log. The cms::Exception will provide a stringstream to allow data to be added to it in a nice form.

The cms::Exception will allow an identification string to be used for runtime configuration of the action that is to take place as a result of catching the exception.

The edm::Exception will use an integer identifier for a similar purpose to the cms::Exception identifier.

The constructor of cms::Exception will take a string category and an optional string message.
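The design points above (derivation from std::exception, a string category, an optional message, and stream-style appending) can be sketched as below. This is not the real cms::Exception; the class sketch::Exception and its members are illustrative only, and it keeps the text in a std::string so that the object stays copyable when thrown.

```cpp
#include <exception>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

namespace sketch {

class Exception : public std::exception {
 public:
  explicit Exception(std::string category, std::string message = "")
      : category_(std::move(category)), message_(std::move(message)) {}

  const std::string& category() const { return category_; }

  const char* what() const noexcept override { return message_.c_str(); }

  // Appends anything that can be written to a std::ostream. Each insertion
  // uses a fresh stream, so formatting flags do not persist between calls.
  template <typename T>
  Exception& operator<<(const T& value) {
    std::ostringstream os;
    os << value;
    message_ += os.str();
    return *this;
  }

  // Accepts stream manipulators such as std::endl, which are function
  // templates and cannot be deduced by the generic overload above.
  Exception& operator<<(std::ostream& (*manip)(std::ostream&)) {
    std::ostringstream os;
    manip(os);
    message_ += os.str();
    return *this;
  }

 private:
  std::string category_;
  std::string message_;
};

}  // namespace sketch
```

With this shape, a statement like throw sketch::Exception("DataCorrupt") << "bad word " << 42; compiles because the throw expression applies to the whole chain of insertions.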

The rules for using the exceptions:

  • cms::Exception - Can be thrown as is without derivation. Derived types must give a string category that matches the derived class name. This is the type of exception that the user is allowed to propagate through a module boundary. Infrastructure code expects all other exceptions to be caught and dealt with within the developer code. If these exceptions are rethrown, new location/state information will be appended. Unique actions can be assigned to the category.
  • edm::Exception - Similar to cms::Exception, but its actions are distinct.
  • std::exception - rethrown after printing additional state information
  • ... - rethrow after printing additional state information

The developers propagate high-level announcements of what has happened. They should not throw any sort of resolution - this action is up to the user configuring the job, not up to the code developer.

Any cms::Exception that passes through infrastructure code will have context information added that includes the module description and perhaps the event and any store/runsegment information. The Exception object could go to the environment to locate the currently active module to inject the module description and other state information.
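A sketch of how infrastructure code might add context as an exception passes through it. The class ContextualException, its addContext() method, and the runModule() wrapper are all hypothetical, and the module label is passed in explicitly instead of being looked up in the environment.

```cpp
#include <exception>
#include <string>
#include <utility>

namespace sketch {

// Stand-in exception that can accumulate context lines.
class ContextualException : public std::exception {
 public:
  ContextualException(std::string category, std::string message)
      : category_(std::move(category)), message_(std::move(message)) {}

  const std::string& category() const { return category_; }
  const char* what() const noexcept override { return message_.c_str(); }

  void addContext(const std::string& context) {
    message_ += "\n  in " + context;
  }

 private:
  std::string category_;
  std::string message_;
};

// Runs one module's code. On failure, the module label is attached and the
// same exception object continues to propagate.
template <typename F>
void runModule(const std::string& moduleLabel, F&& body) {
  try {
    body();
  } catch (ContextualException& e) {
    // Catching by non-const reference lets the handler modify the object
    // in flight; a bare "throw;" rethrows it without copying or slicing.
    e.addContext("module '" + moduleLabel + "'");
    throw;
  }
}

}  // namespace sketch
```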

In summary, each cms::Exception should contain a category specifying the recognized problem, contextual information containing things like module type and label, and a user-supplied string. If the exception rippled through layers of infrastructure, then the user-supplied string will be a concatenation of the "what" information of the previously caught exceptions. In other words, the original exception objects go away and only their string data remains. The category of the root cause will be maintained in the thrown exception (this is worded poorly - the exception caught will contain the category that was last thrown).

Multi-line output (messages containing many newlines) poses a problem for automated tools inspecting output. One solution is to decorate each line of output with the category and part of the contextual information. Another is to put a clear "start block" indicator in the stream and an "end block" indicator after the message. This may be a problem only in the output logger driver, and many solutions may be available for use.
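Both options can be sketched in a few lines. The "%" prefix and the BEGIN/END markers below are invented for illustration; the source does not prescribe any particular decoration.

```cpp
#include <sstream>
#include <string>

namespace sketch {

// Option 1: prefix every line of the message with the category, so a
// line-oriented tool can attribute each line without tracking state.
inline std::string decorateLines(const std::string& category,
                                 const std::string& text) {
  std::istringstream in(text);
  std::string line;
  std::string out;
  while (std::getline(in, line)) {
    out += "%" + category + ": " + line + "\n";
  }
  return out;
}

// Option 2: leave the message untouched and bracket it with explicit
// start and end markers.
inline std::string wrapBlock(const std::string& category,
                             const std::string& text) {
  std::string out = "%BEGIN " + category + "\n" + text;
  if (!text.empty() && text.back() != '\n') out += '\n';
  out += "%END " + category + "\n";
  return out;
}

}  // namespace sketch
```

Per-line decoration survives tools such as grep, at the cost of longer output; block markers keep the message readable but require the inspecting tool to remember which block it is in.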

All exceptions will be caught by reference (nonconst).

Should exception information be sent to the logger automatically upon construction, or should it be sent by the catcher? The catcher must send it, because only then is the exception complete.

Examples of what to throw

As mentioned earlier, the system expects to see cms::Exceptions that are a diagnosis or recognition of a problem, not a prescription or remedy for an encountered problem.

Here is a list of examples:

  • data corrupt
  • too many hits in a detector
  • failed to converge on a solution
  • infeasible solution calculated
  • invalid detector component

There is a problem of conflicting categories across different modules or even algorithms. The module context or description information can be used to disambiguate them. Using a string category can also minimize the chances of this happening.

Types of problems encountered

Cannot continue

Pathologies

Discussion with developers

Here we list some of the opinions of algorithm developers about presentation of output and about when exceptions or logging will be used.

Calorimetry group

Labeling data where fishy conditions are observed is good, but so prescribed action.

Always run to some completion - build something, even if it is somewhat defective (an example is an algorithm that does not converge). Use an auxiliary channel to note the defect or use provenance labeling for this purpose.

No calotower -> no jet collection. This is an exceptional condition. In this case, the framework event "get" will throw and exit this current algorithm.

Will need an example of how and when to log data from an exception and how and when to propagate exception information out of a module.

Tracking group

Observation in code

Much of the code appears to print things to cout and cerr.

Some code throws runtime_error.

Some code throws GenTerminate for any type of error, such as a failure parsing input configuration data.

Genexception is thrown in a few places for reasons like dynamic_cast failed. Also for missing event data or missing event i.e. null pointers.

Calorimetry has many exceptions derived from CMSexception. They are used for things like a null pointer, a bad dynamic_cast, or an illegal detector part. The basic one is CalException, which is used for a bad cell index.

Found things like CARFSkipEventexception, used if dynamic_cast failed because wrong buffer type and for bad CaliHit found.

Found places where a "char *" is thrown and also std::string.

Found DetLogicError, which is a CMSexception - used for null pointer found. Used in lots of places for various things like "cannot propagate to an arbitrary cylinder". Used all over CommonDet.

Muon code throws char* for lots of things, like missing data.

LogicError is used extensively (really DetLogicError). In fact, in a majority of the code. So is Genexception.

Open Questions

  1. What needs to be done to the existing code that is moved over?
  2. Where should the new exception base class live?

Proposed Interface

Most promising:

 throw cms::Exception(category) << "This is a test: " << some_value << endl;

first alternative

 cms::Exception e(category);
 throw e << "This is a test: " << some_value << endl;

The interface must permit derivation from this basic type and allow catching as the derived type. The category corresponds to a classifier for this exception. If the exception is a concrete derived type, then the classifier can just be the derived class name. Otherwise, the classifier is any string that can be used to determine an action based on configuration parameters at runtime.
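A sketch of derivation and of catching as the derived type, using hypothetical classes BaseException and InvalidDetectorComponent (the latter named after one of the example categories). In this sketch the streaming operator returns a reference to the base class, so the message is built first and the named object is thrown afterwards; throwing the result of the insertion chain directly would copy only the base part.

```cpp
#include <exception>
#include <sstream>
#include <string>
#include <utility>

namespace sketch {

// Stand-in for the basic exception type with a category and a message.
class BaseException : public std::exception {
 public:
  explicit BaseException(std::string category)
      : category_(std::move(category)) {}

  const std::string& category() const { return category_; }
  const char* what() const noexcept override { return message_.c_str(); }

  template <typename T>
  BaseException& operator<<(const T& value) {
    std::ostringstream os;
    os << value;
    message_ += os.str();
    return *this;
  }

 private:
  std::string category_;
  std::string message_;
};

// Concrete derived type: its category is simply its own class name.
class InvalidDetectorComponent : public BaseException {
 public:
  InvalidDetectorComponent() : BaseException("InvalidDetectorComponent") {}
};

// Hypothetical helper showing the throw site.
inline void checkCell(int cellIndex, int cellCount) {
  if (cellIndex < 0 || cellIndex >= cellCount) {
    InvalidDetectorComponent e;
    e << "cell " << cellIndex << " out of range";
    throw e;  // static type is the derived class, so nothing is sliced
  }
}

}  // namespace sketch
```

A handler can then catch sketch::InvalidDetectorComponent& for specific treatment, while a generic handler for sketch::BaseException& still sees the same category string.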

See the documentation in the header file for examples of how to use the class cms::Exception.

Discussion

Review Status

Reviewer/Editor and Date (copy from screen) Comments
Main.jbk - 02 Sep 2005 page author
Main.tomalini - 09 Oct 2006 page last content editor (Ian Tomalin)
JennyWilliams - 07 Feb 2007 editing to include in SWGuide
Main.wmtan - 09 Oct 2008 editing to remove references to SEAL

Responsible: Main.tomalini
Last reviewed by: Reviewer

Topic revision: r8 - 2008-10-09 - WilliamTanenbaum



 