PanDA share distribution and monitoring
WORK IN PROGRESS
What is the idea behind it?
Currently, PanDA job assignment is done according to priorities: higher-priority jobs are first in line to be sent to a site. On occasion, this implementation is not sufficient to let the operations team control job traffic and prioritize urgent campaigns without manually reshuffling priorities.
Therefore the request from ADC Ops is to have a multi-level/nested share system that allows changing the fraction of resources assigned to the different activities. One example:
- 50% of resources go to Analysis and 50% to Production.
- Out of the Production share, the operator wants to define that 50% goes to MC Production and 50% to Group Production, and so on.
- If MC Production is not consuming its 50% share, then Group Production can overflow and fill it up, but Analysis cannot. Analysis can only overflow if Production as a whole is not filling the Production share.
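The overflow rule in this example can be sketched as a recursive split over a tree of shares. This is only an illustration, not the actual PanDA implementation: the `Share` class, the `allocate` function, the per-activity `demand` values and the figure of 1000 slots are all hypothetical. Unused slots are first offered to siblings under the same parent, and only move up a level when no sibling can use them.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Share:
    """Hypothetical node in a nested share tree."""
    name: str
    fraction: float                 # fraction of the parent's slots (assumed > 0)
    demand: int = 0                 # leaves only: slots the activity could fill
    children: list[Share] = field(default_factory=list)

    def total_demand(self) -> int:
        if not self.children:
            return self.demand
        return sum(child.total_demand() for child in self.children)


def allocate(node: Share, slots: float, result: dict | None = None) -> dict:
    """Split `slots` below `node`; unused slots overflow to siblings first."""
    if result is None:
        result = {}
    if not node.children:
        result[node.name] = min(slots, node.demand)
        return result
    grants = {child.name: 0.0 for child in node.children}
    hungry = [child for child in node.children if child.total_demand() > 0]
    remaining = float(min(slots, node.total_demand()))
    while hungry and remaining > 1e-9:
        weight = sum(child.fraction for child in hungry)
        handed_out, still_hungry = 0.0, []
        for child in hungry:
            offer = remaining * child.fraction / weight
            need = child.total_demand() - grants[child.name]
            take = min(offer, need)
            grants[child.name] += take
            handed_out += take
            if need - take > 1e-9:
                still_hungry.append(child)
        remaining -= handed_out
        hungry = still_hungry
    for child in node.children:
        allocate(child, grants[child.name], result)
    return result


# Assumed demands: MC Production cannot fill its half of the Production share.
root = Share("ATLAS", 1.0, children=[
    Share("Analysis", 0.5, demand=800),
    Share("Production", 0.5, children=[
        Share("MC Production", 0.5, demand=100),
        Share("Group Production", 0.5, demand=600),
    ]),
])
print(allocate(root, 1000))
# {'Analysis': 500.0, 'MC Production': 100.0, 'Group Production': 400.0}
```

With these assumed numbers, Group Production takes the 150 slots that MC Production leaves idle, while Analysis stays at its 500 because Production as a whole still fills its share.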
Shares at batch level
Sites keep shares at batch level to control the ratio between production and analysis jobs, based on APF sending pilots with two different roles (the production role and the pilot role) to the sites. For security reasons (e.g. an analysis user accidentally deleting data), this situation is unlikely to change in the short term. Therefore, the PanDA distributor system should only keep track of production shares.
Optimising resources
Sites usually have sets of different machines available (e.g. 8 core, 16 GB | 16 core, 32 GB | 16 core, 48 GB) and ATLAS needs to match the different job types to them:
- MCORE: 8 core, 16 GB
- MCORE_HM: 8 core, 24 GB
- SCORE: 1 core, 2 GB
- SCORE_HM: 1 core, 4 GB
Depending on the combination of jobs on a machine, we can be using the resources in a suboptimal manner. Obviously it is outside ATLAS'/PanDA's control which machine a job is going to end up on. However, PanDA can, for example, try to keep the flow of MCORE jobs to a site constant, to avoid draining machines to make room for MCORE jobs.
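To make the "suboptimal" case concrete, the arithmetic can be sketched with the job types and machine sizes listed above. The helper `idle_resources` and the chosen job mixes are hypothetical, for illustration only.

```python
# (cores, memory in GB) per job type, as listed above.
JOB_TYPES = {
    "MCORE": (8, 16),
    "MCORE_HM": (8, 24),
    "SCORE": (1, 2),
    "SCORE_HM": (1, 4),
}


def idle_resources(machine, mix):
    """Return (idle cores, idle GB) left on `machine` by a job `mix`.

    `machine` is a (cores, GB) tuple; `mix` maps job type to job count.
    """
    cores, memory = machine
    used_cores = sum(JOB_TYPES[job][0] * count for job, count in mix.items())
    used_memory = sum(JOB_TYPES[job][1] * count for job, count in mix.items())
    if used_cores > cores or used_memory > memory:
        raise ValueError("job mix does not fit on this machine")
    return cores - used_cores, memory - used_memory


# Two MCORE jobs fill a 16 core, 32 GB machine exactly.
print(idle_resources((16, 32), {"MCORE": 2}))                 # (0, 0)
# One MCORE_HM job plus SCORE jobs runs out of memory with 4 cores idle.
print(idle_resources((16, 32), {"MCORE_HM": 1, "SCORE": 4}))  # (4, 0)
# Eight SCORE_HM jobs use all the memory but only half the cores.
print(idle_resources((16, 32), {"SCORE_HM": 8}))              # (8, 0)
```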
Design
The proposed implementation is an independent, asynchronous service that communicates with JEDI and the PanDA server.
- The PanDA Distributor:
  - Figures out how many slots are available to ATLAS (e.g. 150,000)
  - Figures out the capabilities of the slots
  - Splits the slots according to the shares and capabilities
- PanDA server brokerage:
  - Queries the Distributor service
  - Brokers
  - Updates the Distributor service
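The query, broker, update cycle above could look roughly like the following. This is a minimal in-process sketch under stated assumptions, not the proposed service itself: the `Distributor` class, its `available` and `record` methods, the `broker` function and all numbers are hypothetical, and the real service would be asynchronous and reached over the network.

```python
class Distributor:
    """Hypothetical stand-in for the Distributor service."""

    def __init__(self, slots_by_type, shares):
        # Split the slots of each capability (e.g. MCORE, SCORE) by share.
        self.targets = {
            (share, rtype): int(slots * fraction)
            for share, fraction in shares.items()
            for rtype, slots in slots_by_type.items()
        }
        self.assigned = {key: 0 for key in self.targets}

    def available(self, share, rtype):
        """Slots of this type the share may still use."""
        return self.targets[(share, rtype)] - self.assigned[(share, rtype)]

    def record(self, share, rtype, n_jobs):
        """Called by the brokerage after it has assigned jobs."""
        self.assigned[(share, rtype)] += n_jobs


def broker(distributor, share, rtype, queued_jobs):
    """Query the Distributor, broker what fits, then report back."""
    n_jobs = min(distributor.available(share, rtype), queued_jobs)
    distributor.record(share, rtype, n_jobs)
    return n_jobs


distributor = Distributor(
    slots_by_type={"MCORE": 1000, "SCORE": 4000},            # assumed slot counts
    shares={"MC Production": 0.5, "Group Production": 0.5},
)
print(broker(distributor, "MC Production", "MCORE", 700))    # 500
print(distributor.available("MC Production", "MCORE"))       # 0
print(distributor.available("Group Production", "SCORE"))    # 2000
```

Overflow between shares is deliberately left out of this sketch to keep the query and update steps visible.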
There will be a Web interface to monitor and manage shares.
Open questions/things to discuss
- How will the batch shares and the Distributor live together? Focus only on production?
- Interactions between JEDI and Distributor.