PandaJobSchedulerV000301

Introduction

The PandaJobScheduler is a component of PanDA, the OSG executor being developed to replace CaponE in the ATLAS production system ProdSys. Refer to the PanDA or ProdSys wiki pages for further details about those projects. This document describes the status of the job scheduler for the new system: its features, how to install it, and how to configure it. It refers to version 0.3.1.

Features

The Panda Job Scheduler sends pilots or test probes to different CEs. pilot2 jobs are currently the main job type: they include timeouts, multiple retries during file transfers, and error recovery. Enabling the multipilot feature (pilot2m) allows analysis jobs to run in parallel with production jobs.

Installation

The following command installs the current Panda Job Scheduler:
    pacman -get GCL:PandaJS
It is also included in the full Panda package (pacman -get GCL:Panda).

Use

Invoke the main program in the panda/jobsubmitter directory with the -h option to get a help page:
    python pusher.py -h
The current version is PandaJS 0.3.1. Its help output is:
    usage: python pusher.py  -w  -p  [-i ] [-e ][-c ]
    where:
        the -w argument is the URL of the http web server that the pilot job should connect to
        the -p argument is the port on which the web server listens
        the -i argument is a unique identifier that Panda can use to find the owner of a pilot (random number if not provided)

    options:
     h : help
     w  : central Panda server URL, used by pilotX jobs
     p  : central Panda server port, used by pilotX jobs
     i   : ID for the panda job submitter (default is PJS_XXXX)
     e  : pilot executable file name (path absolute or relative to the current dir)
     c  : sends the pilot to CE instead of using scheduling
     t N : max number of iterations
     j  : selects pilots of type  instead of the default production pilot2
           e.g. 'test' 'pilot', 'pilot2', 'pilot1a';
     x  : arguments to pass to the pilot (in the form p_value[,p2_value2])

Available Job Types: pilot3 SimpleJob Job xfertest2 test pilot2 pilot1a pilot2m pilot


Examples:
    python pusher.py -t 2 -c MWT2_IU -j xfertest2 -x j=http://iut2-grid1.mwt2.org:8000/dq2/,q=http://iut2-grid1.mwt2.org:8000/dq2/,i=DQ2ProdClient2.py,z=wo -w https://gridui000.usatlas.bnl.gov -p 26443 >& t2iux37.out
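
A command line like the one above can also be assembled programmatically. The following is a minimal sketch, not part of PandaJS: the function name build_pusher_command and its parameter names are hypothetical, and only the flags documented in the help page (-t, -c, -j, -x, -w, -p) are used. It assumes that -x takes comma-separated key=value pairs, as in the example.

```python
import shlex


def build_pusher_command(server_url, port, ce=None, job_type=None,
                         max_iterations=None, pilot_args=None):
    """Assemble an argv list for pusher.py from the documented options.

    Hypothetical helper for illustration; it is not shipped with PandaJS.
    """
    argv = ["python", "pusher.py"]
    if max_iterations is not None:
        argv += ["-t", str(max_iterations)]   # max number of iterations
    if ce:
        argv += ["-c", ce]                    # send pilots to this CE
    if job_type:
        argv += ["-j", job_type]              # e.g. 'pilot2', 'xfertest2'
    if pilot_args:
        # Assumption: -x takes comma-separated key=value pairs,
        # as in the example command above.
        pairs = ",".join(f"{key}={value}" for key, value in pilot_args.items())
        argv += ["-x", pairs]
    argv += ["-w", server_url, "-p", str(port)]
    return argv


cmd = build_pusher_command(
    "https://gridui000.usatlas.bnl.gov", 26443,
    ce="MWT2_IU", job_type="xfertest2", max_iterations=2,
    pilot_args={"j": "http://iut2-grid1.mwt2.org:8000/dq2/", "z": "wo"},
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run from the panda/jobsubmitter directory. Building a list rather than a single string avoids shell quoting problems with the URLs in the -x argument.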

Configuration

The file panda/jobsubmitter/siteinfo.py lets you set and modify information about existing Computing Elements (CEs). Other options are available on the command line. More in-depth changes (such as selecting a different scheduling algorithm) require modifying the source files. Start from the main program (panda/jobsubmitter/pusher.py) and read the comments in the source code.

Questions

How many pilots will run on my CE?

The system submits up to GRIDMANAGER_MAX_SUBMITTED_JOBS_PER_RESOURCE jobs (a CondorG parameter, currently set to 400), but keeps no more than 50 pending at a time so as not to overload the gatekeeper. If there are no free CPUs, no new pilot is submitted once the number of pending jobs reaches 50 (even if 0 are running). If the CE has available CPUs, it will receive up to 400 pilots. The limit of 50 pending pilots is approximate ('about 50') because of the monitoring (feedback) delay.
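
The two limits described above can be illustrated with a small sketch. This is not the scheduler's actual code: the function name pilots_to_submit and its parameters are hypothetical, and in the real system the cap of 400 is a CondorG setting rather than something the scheduler necessarily computes itself. The sketch simply combines both limits to show how many new pilots could be sent to one CE at a given moment.

```python
def pilots_to_submit(running, pending, max_submitted=400, max_pending=50):
    """Return how many new pilots could be sent to one CE.

    Hypothetical illustration of the two limits in the text:
    - max_submitted: total pilots (running + pending) allowed per CE,
      i.e. GRIDMANAGER_MAX_SUBMITTED_JOBS_PER_RESOURCE (400 in the text)
    - max_pending: pending pilots allowed per CE (50 in the text)
    """
    room_total = max_submitted - (running + pending)
    room_pending = max_pending - pending
    # Never return a negative number if a limit is already exceeded.
    return max(0, min(room_total, room_pending))


# No free CPUs: 50 pilots are pending and none are running, so nothing is sent.
print(pilots_to_submit(running=0, pending=50))
# Busy CE close to the overall cap: only the remaining slots are filled.
print(pilots_to_submit(running=380, pending=10))
```

Because the pending count comes from monitoring with some delay, a real scheduler working from such a rule would overshoot slightly, which matches the 'about 50' noted above.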


Major updates:
-- MarcoMambelli - 01 Mar 2007



Responsible: MarcoMambelli

Never reviewed

Topic revision: r2 - 2007-06-11 - StefanoAntonelli
 