Recommendations (and a few rules) for logging in JRA1 Code

Version 1.1

Introduction

Logging serves multiple purposes:

This last use is emphasized in the document Middleware Security Audit Logging Guidelines (version 0.8). As far as possible we aim to be consistent with the intent of that document. All stakeholders require well thought out logging calls with easy control over the logging.

The code that is currently maintained by JRA1 is an outcome of development done by loosely-coupled groups originally. The overall programming approaches and coding conventions are diverse, and it is felt to be unproductive to require strong unification in the approach to logging. Therefore this document contains mostly recommendations rather then rules. Its main goals are:

Issues on implementation

Logging interface

It is recommended that each subsystem (or preferably set of subsystems) uses a chosen logging interface such as that provided by Jakarta commons or a high level API such as one of those provided within the apache logging framework consistently, not calling syslog() directly. The main benefit is the flexibility of configuration to define which parts of the code are logged at which severity. In addition the library may deal with such things as escaping non-ASCII characters.

JRA1 wide agreement on using a single logging interface is not required due to rather diverse requirements, programming environments, and coding practice of the various developers groups. However, the number of different logging interfaces should be kept small and it would be best to agree upon just one for each programming language.

These interfaces make use of a set of loggers with operations such as trace(), warn() and error() to log messages at different levels and can be configured to send output in different formats to different outputs. One output format that must be supported is syslog. The developer will make a call to a logger and the system will be configured to ensure that these developer messages, augmented by other information will be made available as a syslog message.

In the complex and actively developed software stack (which gLite is) we need flexible configuration which lets the sysadmin enable verbose logging (up to debug level) of a very specific piece of software, while keeping logging of other parts normal in order not to flood the logs. This can't be done with syslog.conf but finer granularity is required. In addition there are known performance problems with syslog so we do not recommend using syslog directly.

We do not currently recommend any specific libraries, but some to consider are listed below. They all follow the same principles but differ significantly in level of support and code quality.

Java
either log4j or Jakarta commons
C++
log4cxx The existing released version 0.9.7 is not recommended on their web site and the HEAD version has an outstanding bug. The situation is being investigated.
C
log4c For the moment, log4c uses specific GCC extensions, like __attribute__, so you will need GCC to compile it but not to run it.
Python
log4py or logger. Note that logger is part of the standard Python distribution since 2.3 and includes syslog support and so should probably be used instead of log4py.

The loggers have names which are used to define a tree structure. Configuration files control processing of each logging call in the application code. These actions are defined for all messages above a specified level for all loggers at or below a specified node in the tree. The logging mechanism used by the code must have this flexibility.

The reason to move to a common configuration mechanism is that sys-admins determine what is logged at their site and, suspecting a problem, may want to increase the logging though they are unlikely to have detailed knowledge of the middleware. Unfortunately the logging file format for the various packages listed above is slightly different. The long term aim is that all JRA1 components will use a common configuration file format modeled on the log4j one.

Once the supported logging libraries are defined a configuration file format can be defined and configuration components written. This will have no impact on all the logging calls in the code.

It should be noted that having a large number of loggers does not imply have a large number of log files - it just provides flexibility.

Set of Loggers

security
A logger with a name of "security" must be used for security related messages. Sensitive information MUST NOT be sent to any other loggers.
access
A logger with a name of "access" must be used to log initial access to a service. If you are using something like tomcat then this will generate an access log.
control
A logger with a name of "control" must be used for service startup, configuration and shutdown
others ...
Other loggers should be named in a shallow hierarchy - not simply one logger per class.

Message content and properties

Logging levels

Though it is hard to make precise rules, we recommend the following levels and their associated meanings:

FATAL The server/service/component is shutting down
ERROR An unanticipated failure (i.e. a bug), or where the system is unable to behave according to its specification
WARN Anticipated failure typically caused by bad user input . Note that from a system perspective bad user input is not an error.
INFO Reporting of an operation executed by the server. The level of detail should be approximately the same as the specification and where appropriate use the same terminology. Any given operation should only be reported once at this level (after completion).
DEBUG Reporting of an implementation-level operation. This may include details beyond what is described in the specification and an operation may generate more than one message at this level. Try to avoid having too much at the DEBUG level as it renders the code unreadable.

However, finer severity levels (for example as defined by syslog) may be used when appropriate.

Format of message string

Events to be logged

Client APIs

Client APIs should not do logging as it introduces dependencies, makes the API code more complex and troubles the user with logging messages.

Service start up, configuration and termination

All of these messages MUST be sent to the control logger

On service start up, configuration and any subsequent re-configuration, a message should be logged at level INFO. The message should identify the configuration or the file with the configuration information.

On normal service termination a message should be logged at level INFO. If it terminates abnormally the service should attempt to send a security logger message which would typically be at level FATAL and should include the reason for termination.

Authorization logging

All of these messages MUST be sent to the security logger

  1. The list of attributes (e.g. obtained from VOMS attribute certificates, or other sources), must be logged at level INFO.
  2. The result of an authorization decision that results in granting access should be logged at level INFO. If a local identity is assigned to this session or request, the local mapping, including the numeric uid, the numeric gid, and the list of numeric supplementary gids, must be logged at level INFO.
  3. The result of a denied authorization must be logged at level WARN. The message should include the connection source information.
  4. Any identifiers that are required to link the authorization state to subsequent specific requests must be logged at level INFO.

Initial service access, session establishment and termination

  1. Initial access to a service should log a message to the access logger with level INFO containing the (network) source address and port, or the process id and owner of the process connecting to the socket.
  2. If the connection is TLS-authenticated with client credentials, the certificate subject DN of the client must be logged.
  3. If the TLS handshake fails, a message including the reason, and if possible the client subject name, should be sent with level WARN. This message should include the connection source information already reported.

Other Logging

Other messages should be logged as the developer sees fit. However:

Message context

Identifiers

If the request processing etc. is related to an externally identifiable entity (jobid, file guid, etc.) this identifier should be logged. This should be done consistently even if the request generates multiple (non-debug) messages - they may get directed to different appenders and the context with respect to the external identifier could be lost.

Whenever there is a logical notion of session, request, etc., and multiple messages related to one such entity instance can be logged, it is recommended that a unique id is assigned to such an entity and it is logged in all the messages consistently.

When a service calls service, it is also recommended that the session/request etc. id's are exchanged between the services in the communication protocol and logged on both sides, allowing unique message interrelation.

Since adding identifiers to the log messages may require some interface or protocol changes this recommendation mainly applies to new code or major revisions.

Message accuracy and log completeness

In case a service contacts other services to fulfill a request, or initiates processes to which processing is delegated, log messages should be generated and sent out even if the delegation or invocation fails, even if such a failed attempt will cause the service itself to terminate. Note that this may imply wrapping any such invocations in code that traps and recovers from exceptions to ensure the log message is generated and sent.

Efficiency

If the logging system provides a call to see if a particular logger and action is currently enabled (e.g. isDebugEnabled() and similar calls with log4j) please use it if the argument string is constructed dynamically. Note that production service operation might have easily produced hundreds debugging messages per second if enabled; even their formatting and subsequent throwing away would represent a considerable overhead.

Configuring the system for syslog

The configuration should attempt to make the output conform to that described in Middleware Security Audit Logging Guidelines.

In the case of tomcat, a custom AccessLogValve will need to be written to deal with the access log

The following pattern can be used with log4j and tomcat

log4j.appender.syslog.layout.ConversionPattern = %c [%t]: %d{yyyy-MM-dd'T'HH:mm:ss.SSSZ} %-5p %x - %m%n

to provide:

<category> [<thread name>]: <UTC Date> <truncated log level> <NDC> - <developer's message> <end-of-line>