_Refer to initial document compiled by Alvin and Kuba at CERN on 13.09.2006 : GangaCleanShutdown _

The issue

Currently, Ganga (4.2/4.3) does not shut down the monitoring service cleanly. This sometimes leaves repository locks behind, leads to attempts to download job output into the AFS area without write permission, etc. A clean shutdown of services is needed in two situations:

  • ganga process is terminated (e.g. user types ^D in the text shell)
  • credentials expire (such as AFS token or Grid proxy)

Ganga defines an atexit handler which makes sure that the repository is flushed, but it makes no attempt to shut down the monitoring loop. A commit operation performed by the monitoring loop may therefore be interrupted by an abrupt Ganga shutdown. Any ongoing output downloads or job status queries may also be aborted, leaving Ganga in an inconsistent state.

Proxy checking is also not optimal: the LCG handler does a proxy-init when the LCG module is loaded. The proxy created as a side effect is used by the remote repository (in authenticated mode). The current bootstrap procedure is as follows:

  • load system plugins (e.g. LCG -> ask for passphrase to create grid-proxy)
  • load custom (extension) plugin (e.g. GangaLHCb)
  • initialize repository
  • start monitoring (if enabled in the config file)
  • start user interface (e.g. IPython, GUI, ...)


Internal Services

We introduce the concept of an InternalService, which explicitly supports correct runtime behaviour at bootstrap, at shutdown and on credential expiry.

We define an InternalService as a representation of a runtime entity which may be started, stopped, or temporarily enabled or disabled. Examples of internal services include:

  • repository
    • Local repository on AFS depends on AFS token
    • Remote repository in authenticated mode depends on grid-proxy (or voms-proxy in the future)
  • workspace
    • Workspace on AFS depends on AFS token
  • monitoring subsystem
    • monitoring threads require the repository and workspace to be started and all required credentials to be active.
      • if either the repository or the workspace is shut down, monitoring does not start at all.
      • if a required credential is about to expire (or is not valid), the monitoring loop is disabled

The InternalService supports the following public methods:

  • start() : start the service; has no effect if the service is already running or is not enabled()
  • stop(timeout=None) : stop the service; return True once the service has stopped, or False if the timeout was exceeded. If timeout is None, block until the operation has terminated
  • disable() : temporarily disable the service (i.e. when one of the required credentials is not valid)
  • enable() : enable the service once all the required constraints are met (e.g. called automatically by cred.renew())

It is legal to start and stop the service multiple times.
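
The interface above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class body, the running() helper and the _do_start()/_do_stop() hooks are hypothetical and are not taken from the Ganga code base. Whether disable() should also stop a running service is not specified here, so the sketch leaves a running service untouched.

```python
import threading


class InternalService:
    """Hypothetical sketch of the InternalService contract described above."""

    def __init__(self, name):
        self.name = name
        self._enabled = True
        self._running = False
        self._lock = threading.Lock()

    def enabled(self):
        return self._enabled

    def running(self):
        # Helper for illustration only; not part of the listed interface.
        return self._running

    def start(self):
        # No effect if the service is already running or is not enabled.
        with self._lock:
            if self._running or not self._enabled:
                return
            self._do_start()
            self._running = True

    def stop(self, timeout=None):
        # True once stopped, False if the timeout was exceeded.
        with self._lock:
            if not self._running:
                return True  # stopping twice is legal
            if not self._do_stop(timeout):
                return False
            self._running = False
            return True

    def disable(self):
        self._enabled = False

    def enable(self):
        self._enabled = True

    # Hooks for concrete services (assumed names).
    def _do_start(self):
        pass

    def _do_stop(self, timeout):
        return True
```

A repository service, for instance, might override _do_stop() to flush uncommitted changes.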

Credential providers and Credential consumers

Currently the Credentials package provides a common interface for creating generic credentials that may be used by different components of Ganga (LCG backend, AFS repository, proxy-authenticated remote repository). Internally, the GridProxy and AfsToken are created and monitored. However, other packages (such as GangaNG) may require credentials, possibly different ones, to be created and monitored. We want to extend the existing mechanism to support a dynamic list of credentials that can be created by any plugin and shared with other components. This list of credentials is exposed in the GPI and is also monitored in the monitoring loop.

Proposal: We define two new interfaces that may be implemented by plugins

  • Credential Provider : any Ganga plugin can provide a factory class for an ICredential and register it, along with a meta-description, in a global credentials manager (e.g. the Credentials module may act as a container for this)
  • Credential Consumer : when initialised, any Ganga plugin may register its credential requirements with the manager (to be more flexible, this registration can take the form of a description, which allows compatible credentials to be shared and minimizes the list of active credentials)

The credential manager is the match-maker between credential providers and consumers.
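
A minimal sketch of such a match-maker is given below. All names (CredentialsManager, register_provider, register_consumer, create_required) are assumptions made for illustration; the proposal does not fix an API, and requirements are matched here by plain name rather than by a richer description.

```python
class CredentialsManager:
    """Hypothetical match-maker between credential providers and consumers."""

    def __init__(self):
        self._factories = {}  # credential name -> factory callable
        self._required = {}   # credential name -> set of consumer names
        self._active = {}     # credential name -> created credential

    def register_provider(self, name, factory):
        # A plugin offers a way to create a credential; nothing is created yet.
        self._factories[name] = factory

    def register_consumer(self, consumer, name):
        # A plugin declares that it needs the named credential.
        self._required.setdefault(name, set()).add(consumer)

    def required(self):
        return sorted(self._required)

    def create_required(self):
        # Create each required credential once, shared by all its consumers.
        missing = sorted(n for n in self._required if n not in self._factories)
        if missing:
            raise KeyError("no provider for: " + ", ".join(missing))
        for name in self._required:
            if name not in self._active:
                self._active[name] = self._factories[name]()
        return dict(self._active)
```

Credentials that are offered by a provider but required by no consumer are never created, which keeps the list of active credentials short.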

Bootstrap of Ganga

The new bootstrap procedure would look as follows:

  • load plugins (this populates the list of available, but not yet created, credentials):
    • load system plugins (e.g. LCG -> DO NOT ask for passphrase to create grid-proxy)
    • load custom (extension) plugins (e.g. GangaLHCb)
  • get the list of all required credentials and create each of these.
  • start repository
    • depending on Remote/Local, AFS/nonAFS, authenticated/nonauthenticated case check for proxy validity
    • if not all the credentials are valid then do not start() the service i.e. do not connect to the repository
  • create workspace internal service
    • check for token depending on AFS/nonAFS case
  • start monitoring (if enabled in the config file)
    • main monitoring loop: if repository or workspace services are not enabled then stop() itself, i.e.:
      • do not insert any jobs for updateMonitoringInformation()
      • "broadcast a signal" (by setting an appropriate global flag) that all ongoing updateMonitoringInformation() methods should terminate ASAP (see Threading model and checkpoints)
    • select the backends that are not enabled() and apply the same stop procedure to them
    • stop repository and workspace (i.e. flush uncommitted repository changes)
  • start user interface (e.g. IPython, GUI, ...)

The main monitoring loop periodically checks the time left on each credential. When an alarm threshold is reached (e.g. 10 minutes before credential expiry), the user is notified that services will stop automatically when the stop threshold is reached (e.g. 5 minutes before expiry). Unless the credentials are renewed, the services are stopped accordingly.
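
The two thresholds amount to a simple decision on the time left. The function below is a sketch only: its name, the returned strings and the defaults (taken from the 10 and 5 minute examples above, measured as time before expiry) are assumptions.

```python
def credential_action(time_left, alarm=600, stop=300):
    """Decide what the monitoring loop should do for one credential.

    time_left, alarm and stop are in seconds before expiry (assumed units).
    """
    if time_left <= stop:
        return "stop"   # stop the dependent services now
    if time_left <= alarm:
        return "warn"   # notify the user that services will stop soon
    return "ok"
```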

User interactions

Credential expiration

We also agreed that we need a smarter way to deal with expiring tokens, and to deactivate any interactions triggered by the monitoring loop in this case (backend monitoring, repository, workspace). We already monitor the active credentials, so a reasonable solution is the following: if one of the active credentials is about to expire, automatically disable the monitoring loop (if it is running), flush the current changes to the repository, and mark the workspace and the repository as "disabled".

At alarm time the user should get a message like "AfsToken is going to expire in 10 minutes and services which use it will be stopped automatically in 5 minutes. Do AfsToken.renew() to re-enable the services." In addition, the CLI prompt can be modified to prepend a field for each credential that has expired. So if there is no proxy:
[No proxy][1]
and if there is no proxy and no AFS token (and you need one):
[No afs][No proxy][1]
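
Building such a prompt could look like the sketch below. The function name, the LABELS table and the assumption that the trailing [1] is the shell's input counter are illustrative only.

```python
# Assumed mapping from credential name to prompt field, in display order.
LABELS = (("AfsToken", "No afs"), ("GridProxy", "No proxy"))


def build_prompt(expired, counter):
    """Prepend one field per expired (and required) credential to the prompt."""
    fields = "".join("[%s]" % label for name, label in LABELS if name in expired)
    return "%s[%d]" % (fields, counter)
```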

By default the monitoring loop should NOT open xterms etc. to ask for a passphrase. The user may call AfsToken.renew() or GridProxy.renew(). The side effect of renew() should be a restart of all services which depend on that credential.

Ganga shutdown

When Ganga is shut down, any ongoing monitoring service must be stopped. This may take some time, as the monitoring loops should finish correctly (e.g. complete the download of output files).

There are 3 ways to proceed:

  • interactive mode - after a certain timeout (e.g. 5 seconds) Ganga should issue a message like this: "N remaining jobs in the monitoring loop. Aborting the monitoring loop may lead to inconsistent jobs (e.g. partially retrieved output). Do you want to abort the monitoring loop (y/n)?"
  • forced mode - wait for a certain timeout and shut down anyway
  • safe mode - wait until the monitoring has really shut down

These options may be configurable.
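
The three modes could be sketched as follows, assuming the monitoring loop runs in a single threading.Thread. The function name, the mode strings and the ask callback (which returns True when the user chooses to abort) are hypothetical.

```python
import threading


def shutdown_monitoring(worker, mode="interactive", timeout=5.0, ask=None):
    """Wait for the monitoring thread according to the chosen shutdown mode.

    Returns True if the thread finished, False if it was abandoned.
    """
    if mode not in ("interactive", "forced", "safe"):
        raise ValueError("unknown shutdown mode: %r" % (mode,))
    if mode == "safe":
        worker.join()  # block until monitoring has really shut down
        return True
    worker.join(timeout)
    if not worker.is_alive():
        return True
    if mode == "forced":
        return False  # timeout exceeded: shut down anyway
    # Interactive mode: keep asking until the thread ends or the user aborts.
    while worker.is_alive():
        if ask("Do you want to abort the monitoring loop (y/n)? "):
            return False
        worker.join(timeout)
    return True
```

Abandoning the thread only works if it is daemonic; otherwise the interpreter would still wait for it at exit.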

Threading model and checkpoints

Daemonic threads are used for all internal services (e.g. monitoring). The reason is that non-daemonic Python threads cannot be killed, and Ganga should give the user the possibility of forcing the shutdown. This is how Ganga 4.2 is implemented, and it is OK.

However, the threads are requested to cooperate with the Core system in order to react to a shutdown request as soon as possible. This means that threads should frequently check whether a shutdown has been requested and, if so, terminate promptly in a clean way.

There are conceptual checkpoints i.e. places in the code where it is safe to terminate the thread. For example, the backend monitoring has a loop:

for j in jobs: # <-- checkpoint

If a shutdown has been requested, a thread will run until the next checkpoint and then exit. It is not possible to cleanly terminate the thread in between checkpoints. In the particular case of the backend.updateMonitoringInformation() method, a special iterator over the jobs collection will have the checkpointing mechanism built in transparently.
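
One way such an iterator could work is sketched below. The names checkpointed and shutdown_requested are assumptions, and a threading.Event stands in for the global flag mentioned earlier.

```python
import threading

# Assumed global flag, set by the Core system when a shutdown is requested.
shutdown_requested = threading.Event()


def checkpointed(jobs, stop_event=shutdown_requested):
    """Yield jobs one by one, stopping at the next checkpoint after a request."""
    for job in jobs:
        if stop_event.is_set():  # <-- checkpoint
            return
        yield job
```

A backend would then write "for j in checkpointed(jobs):" and needs no explicit shutdown test of its own; a job that is already being processed is always completed before the loop exits.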

Is a job repository an Internal Service ?

Yes. The job repository however does not have separate threads of control:

  • stop() flushes the uncommitted changes;
  • start() is a no-op unless the repository was not initially connected.

Monitoring Loop as an Internal Service

The monitoring component for Ganga 4.4 has enhanced behaviour: it adheres to the Internal Service interface. Comments on this are welcome. There is a more flexible way to control the monitoring loop (enable/disable/run-on-demand) and also a nicer way to stop the internal threads:

Definition:     MC.enableMonitoring(self)
    Run the monitoring loop continuously
Definition:     MC.disableMonitoring(self)
    Temporarily disable the monitoring loop
Definition:     MC.runMonitoring(self, steps=1, timeout=60)
    Enable and run on demand the monitoring loop if this is not already running.
    This method is meant to be used in Ganga scripts or interactive sessions to request monitoring on demand. 
      steps:   number of monitoring steps to run
      timeout: how long to wait for monitor steps termination
      returns:
        False, if the loop cannot be started or the timeout occurred while waiting for monitoring termination
        True, if the monitoring steps were successfully executed
Definition:     MC.stop(self, fail_cb=None)
    Shut down the monitoring loop. This method should be called automatically when Ganga exits in order to do
        a clean shutdown of all the resources used by the monitoring loop (threads/task queue)
     fail_cb: if not None, this callback is called when the internal threads fail to terminate within the timeout period.
                Externally this might be a function that asks the user interactively whether to wait longer for the cleanup of these resources
-- JakubMoscicki - 27 Jun 2006

-- AdrianMuraru - 19 Apr 2007

Topic revision: r3 - 2007-04-20 - AdrianMuraru