PanDA share distribution and monitoring

Work in progress, under construction.

What is the idea behind it?

Currently, PanDA job assignment is done according to priorities: higher-priority jobs are first in line to be sent to a site. In parallel, sites keep shares at batch level to control the ratio between production and analysis jobs (based on APF sending two different pilots to the sites). On occasion, the current implementation is not sufficient to let the operations team control the job traffic and prioritize urgent campaigns without manually reshuffling priorities.

Therefore the request from ADC Ops is to have a multi-level/nested share system that allows changing the fraction of resources assigned to the different activities. One example:

  • 50% of resources go to Analysis and 50% to Production.
  • Out of the Production share, the operator wants to define that 50% goes to MC Production and 50% to Group Production, and so on.
  • If MC Production is not consuming its 50% share, then Group Production can overflow and fill it up, but Analysis cannot. Analysis can only overflow if Production as a whole is not filling the Production share.
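The overflow rule in the example above can be sketched in a few lines of Python. This is only an illustration, not the planned implementation: the `Share` class, the `allocate` function and the notion of a per-activity `demand` (how many slots an activity could currently fill) are assumptions introduced here. Unused slots are first offered to siblings inside the same parent share, and only what the whole parent cannot use flows to the other top-level shares.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Share:
    """Hypothetical node of the nested share tree."""
    name: str
    weight: float                      # fraction of the parent's share
    demand: int = 0                    # slots the activity could fill (leaves only)
    children: List["Share"] = field(default_factory=list)

    def total_demand(self) -> int:
        # A parent can use as many slots as all of its children together.
        if not self.children:
            return self.demand
        return sum(child.total_demand() for child in self.children)


def allocate(node: Share, slots: float, out: Dict[str, float]) -> None:
    """Split `slots` among the children of `node` by weight.

    A child never receives more than its demand; what it leaves unused is
    re-offered to the siblings that still have unmet demand.
    """
    if not node.children:
        out[node.name] = slots
        return
    grants = {child.name: 0.0 for child in node.children}
    active = list(node.children)
    remaining = slots
    while active and remaining > 1e-9:
        total_weight = sum(child.weight for child in active)
        still_hungry = []
        distributed = 0.0
        for child in active:
            offer = remaining * child.weight / total_weight
            room = child.total_demand() - grants[child.name]
            take = min(offer, room)
            grants[child.name] += take
            distributed += take
            if room - take > 1e-9:
                still_hungry.append(child)
        remaining -= distributed
        if len(still_hungry) == len(active):
            break  # nobody saturated, so every offer was taken in full
        active = still_hungry
    # Recurse: each child splits its own grant among its sub-shares.
    for child in node.children:
        allocate(child, grants[child.name], out)
```

With 1000 slots, an Analysis demand of 800, an MC Production demand of 100 and a Group Production demand of 600, this sketch gives Analysis 500, MC Production 100 and Group Production 400: the 150 slots MC Production leaves idle stay inside the Production share.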

Share_system.png

Design

The proposal for implementation is an independent, asynchronous service that communicates with JEDI and the PanDA server.

  • The PanDA Distributor:
    1. Figures out how many slots are available to ATLAS (e.g. 150,000)
    2. Figures out the capabilities of the slots
    3. Splits the slots according to the shares and capabilities
  • PanDA server brokerage:
    1. Queries the Distributor service
    2. Brokers the jobs
    3. Updates the Distributor service
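The interaction between the Distributor and the brokerage could look roughly like the sketch below. It is an in-memory illustration under several assumptions: the `Distributor` class, its `available`/`update` methods, the `broker` function and the capability labels are hypothetical names, the share fractions are given flat per activity, and the asynchronous communication with JEDI and the PanDA server is not modelled.

```python
from collections import defaultdict
from typing import Dict, Tuple

Key = Tuple[str, str]  # (share name, slot capability)


class Distributor:
    """Hypothetical in-memory stand-in for the Distributor service."""

    def __init__(self, slots_by_capability: Dict[str, int],
                 share_fractions: Dict[str, float]) -> None:
        # Split the slots of each capability according to the share fractions.
        self.quota: Dict[Key, int] = {}
        self.used: Dict[Key, int] = defaultdict(int)
        for capability, n_slots in slots_by_capability.items():
            for share, fraction in share_fractions.items():
                self.quota[(share, capability)] = int(n_slots * fraction)

    def available(self, share: str, capability: str) -> int:
        """Slots still free for this share and capability."""
        key = (share, capability)
        return self.quota.get(key, 0) - self.used[key]

    def update(self, share: str, capability: str, n_jobs: int) -> None:
        """Record that the brokerage assigned `n_jobs` jobs."""
        self.used[(share, capability)] += n_jobs


def broker(distributor: Distributor, share: str, capability: str,
           waiting_jobs: int) -> int:
    """Return how many of the waiting jobs get assigned."""
    free = distributor.available(share, capability)   # 1. query the Distributor
    n_assigned = max(0, min(free, waiting_jobs))      # 2. broker
    distributor.update(share, capability, n_assigned) # 3. update the Distributor
    return n_assigned
```

For example, with 100,000 slots of a capability and a 50% Analysis share, a request to broker 60,000 analysis jobs would assign 50,000 and leave the rest waiting. Overflow between shares is deliberately left out of this sketch.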

There will be a Web interface to monitor and manage shares.

Open questions/things to discuss

  • How will the batch shares and the Distributor live together? What would be the consequences of having only one pilot?
  • The DB schema cannot be flat, as discussed last week.
Topic revision: r2 - 2015-04-07 - FernandoHaraldBarreiroMegino
 