Directory Service

Overview of DIANE

  • When you have a job to execute, you start a master and submit some workers:
    • diane.startjob -job autodock.job --ganga -w 32@lcg,32@pbs

  • You can then submit additional workers for this job:
    • diane.ganga.submitworkers -job autodock.job -nw=100 -bk=lcg
  • When you start the job, the application creates a directory in your home directory: ~/diane.workspace/ with two subdirectories:
    • applications and jobs
  • The directory jobs contains a directory for each job executed
  • The name of the directory is the job identifier
  • Each job directory contains a file with the IOR (Interoperable Object Reference) of the Master
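
The workspace layout described above can be inspected with a few lines of Python. This is only a sketch, not part of DIANE: the function name list_job_ids is hypothetical, and it assumes nothing beyond the ~/diane.workspace/jobs layout described in the list above (one subdirectory per job, named after the job identifier).

```python
import os


def list_job_ids(workspace="~/diane.workspace"):
    """Return the sorted job identifiers found under <workspace>/jobs.

    Hypothetical helper: it only relies on the layout described above,
    where each job has its own directory named after the job identifier.
    """
    jobs_dir = os.path.join(os.path.expanduser(workspace), "jobs")
    if not os.path.isdir(jobs_dir):
        return []
    # Keep directories only; plain files stored in jobs/ are skipped.
    return sorted(
        name
        for name in os.listdir(jobs_dir)
        if os.path.isdir(os.path.join(jobs_dir, name))
    )
```

Calling list_job_ids() with no argument looks at the default workspace in your home directory and returns an empty list if no job has been started yet.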

Current limitation

  • Each Worker is hardwired to one Master (the address is hardcoded in the input sandbox)
  • Workers cannot be shared or transferred between Masters at runtime
  • Workers with no tasks left should not have to die; they could join another Master when one is available

Overview of DIANE Directory Service

ds1.png

  • The Directory Service is a server containing a list of all the masters
  • Each Master registers itself with the Directory Service
  • The Workers obtain a Master through the Directory Service
  • The Directory Service uses an algorithm to load-balance the Workers and to prioritize the Masters

How it works

  • Start the Directory Service by typing:
    • diane.directoryservice
  • This command creates a file called DSOID containing the IOR of the DirectoryService in the following location:
    • ~/diane.workspace/jobs/
  • Only one instance of the DS can be running at a time
  • The DSOID file is used as a lock file to detect a DS that is already running
  • To kill the previous DS and start a new one, type:
    • diane.directoryservice --kill
  • To run the Master and the Workers in DS mode, add the --ds flag to the usual commands:
    • diane.startjob, diane.ganga.submitworkers, diane.startclient
  • To obtain a summary of the Masters and Workers, ping the Directory Service with the command:
    • diane.ds.command ping
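
The use of the DSOID file as a lock file can be illustrated as follows. This is a minimal sketch of the general lock-file technique, not the actual DIANE code: the function names acquire_ds_lock and release_ds_lock are hypothetical, and the atomic "create only if absent" open is an assumption about how such a check could be done safely.

```python
import os


def acquire_ds_lock(path, ior):
    """Create the lock file atomically and store the IOR string in it.

    Returns False if the file already exists, which is taken to mean
    that another Directory Service is already running.
    """
    try:
        # O_EXCL makes the creation fail if the file is already there.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as lock_file:
        lock_file.write(ior)
    return True


def release_ds_lock(path):
    """Remove the lock file; do nothing if it is already gone."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass
```

One drawback of any lock file is that it stays behind if the process dies without cleaning up, which is presumably why a way to kill the previous DS and start a new one is provided.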

New features

  • The lifetime of a Worker depends on the tasks available to run
  • Once the Worker is notified by the Master that the job has finished, it goes back to the DS to join another Master
  • If no Master is available, the Worker dies
  • The Master periodically sends a message to the DS to signal that it is still alive; the message contains the number of Workers currently assigned to it
  • The Directory Service keeps track of all the Masters and the number of Workers assigned to each of them
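
The bookkeeping described above (periodic alive messages carrying a Worker count) could look roughly like the sketch below. It is an illustration only: the class MasterRegistry, its method names and the 30-second timeout are assumptions, not taken from DIANE.

```python
import time


class MasterRegistry:
    """Tracks which Masters are alive and how many Workers each one has."""

    def __init__(self, timeout=30.0, clock=time.monotonic):
        self.timeout = timeout  # assumed value; seconds without a message
        self.clock = clock      # injectable so the logic can be tested
        self._masters = {}      # master_id -> (worker_count, last_seen)

    def heartbeat(self, master_id, worker_count):
        """Record an alive message from a Master with its Worker count."""
        self._masters[master_id] = (worker_count, self.clock())

    def alive(self):
        """Drop Masters that have been silent too long; return the rest."""
        now = self.clock()
        self._masters = {
            master_id: (count, seen)
            for master_id, (count, seen) in self._masters.items()
            if now - seen <= self.timeout
        }
        return {m: count for m, (count, _) in self._masters.items()}
```

A Master that stops sending messages simply disappears from alive() after the timeout, so Workers coming back to the DS are not sent to a dead Master.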

First implementation

  • The first working version of the Directory Service is based on the following assumptions:
    • it is used only among Masters of the same user
    • it can only have Masters running the same application
    • the DS algorithm for matching Masters and Workers is a simple round-robin queue
  • You can use this version by sourcing the following file: /afs/cern.ch/project/asddat/lhcxx/kuba/DIANE/specific/slc3_gcc323/1.4.9-AvianFluDC2-2/DIANE/etc/environment.csh
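
A round-robin queue for matching Workers to Masters, as mentioned in the assumptions above, can be sketched as follows. The class RoundRobinMatcher and its method names are hypothetical and only illustrate the technique; the real DS code may differ.

```python
from collections import deque


class RoundRobinMatcher:
    """Hands out registered Masters to Workers in round-robin order."""

    def __init__(self):
        self._queue = deque()

    def register(self, master_id):
        """Add a Master to the end of the queue (ignored if present)."""
        if master_id not in self._queue:
            self._queue.append(master_id)

    def unregister(self, master_id):
        """Remove a Master, for example when its job has finished."""
        try:
            self._queue.remove(master_id)
        except ValueError:
            pass

    def next_master(self):
        """Return the next Master, or None if no Master is available."""
        if not self._queue:
            return None
        master_id = self._queue[0]
        self._queue.rotate(-1)  # move the chosen Master to the back
        return master_id
```

With Masters A, B and C registered, successive calls to next_master() return A, B, C, A and so on. A None result corresponds to the case where no Master is available and the Worker dies.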

Work to do

  • Have the Directory Service write a log file with the following information for each Master, updated every 10 seconds:
    • timestamp masterid userid #workers #task(completed/all)
  • Write a Python module to determine the user ID of the user running the Master:
    • the DN (distinguished name) of the proxy certificate
    • the Unix user ID
  • Provide a static DSOID, so that even if the Directory Service process dies, a new Directory Service can be restarted with a consistent state
  • Provide a Master matching algorithm (that could also be plugged into the application on the fly)
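
As a starting point for the log file item above, one record per Master could be formatted like this. The sketch follows the field order given in the list (timestamp masterid userid #workers #task(completed/all)); the function name format_master_record, the space separator and the use of Unix epoch seconds for the timestamp are assumptions.

```python
import time


def format_master_record(master_id, user_id, workers, completed, total,
                         timestamp=None):
    """Build one log line: timestamp masterid userid #workers done/all."""
    if timestamp is None:
        timestamp = time.time()  # assumed: seconds since the Unix epoch
    return "%d %s %s %d %d/%d" % (
        int(timestamp), master_id, user_id, workers, completed, total)
```

The Directory Service would append one such line per Master every 10 seconds. The user_id argument could hold either the certificate DN or the Unix user ID, depending on which of the two is finally chosen.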

User scenario

-- PaolaDiMarcello - 18 June 2007
