Environment variables for multi-core jobs - Job to Machine channel
DRAFT 0.9 (31/01/2014)
Abstract
This document provides a proposal for the definition of a communication channel from the Job to the Machine.
The objective of this communication channel is to give the machine owner enough information to pick "the best job" to vacate when needed.
The specification is optimized with the pilot use case in mind, but checkpointable user jobs may be a good fit as well.
Introduction
The proposed schema builds on the work done for the Machine to Job communication channel, but targets the opposite data flow.
The proposed attributes cover the information that multi-job pilots own as part of their job scheduling activity.
Below you can see a schematic view of this information; for more details, refer to the CHEP paper (published paper, talk).
Most pilot jobs will typically start more jobs, if possible, thus extending any job termination estimates. There are, however, times when a pilot is ready to be retired and does not expect to fetch more user jobs. We provide an attribute to communicate this to the resource owner.
In addition, we want to allow a user to express a preference for one job versus any other job, in the form of a priority number.
Definitions
Environment variables
For each job, one environment variable has to be set, with the following name:
$JOBSTATUS
This environment variable is the base interface for the user payload. It must be set inside the job environment.
Directories
The directory to which the environment variable points contains job-specific information. Each file name is a key, and the contents of that file are its value.
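The key/value layout can be read with a few lines of code. The following is a minimal sketch, assuming the variable is the $JOBSTATUS referenced in the list of requirements and that each file holds a single value as text; the function name `read_job_status` is hypothetical and not part of the proposal.

```python
import os
from pathlib import Path


def read_job_status(status_dir=None):
    """Return {file name: contents} for every regular file in the directory.

    Assumption: the directory is the one pointed to by $JOBSTATUS, and each
    file holds one value as text, possibly followed by a newline.
    """
    if status_dir is None:
        status_dir = os.environ["JOBSTATUS"]
    values = {}
    for entry in sorted(Path(status_dir).iterdir()):
        if entry.is_file():
            # The file name is the key, the stripped contents are the value.
            values[entry.name] = entry.read_text().strip()
    return values
```

This sketch does no locking; a reader that needs a consistent view of several values should follow the locking rule given in the notes on the list of requirements.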
Use cases
The use cases to be covered are:
| Identifier | Actors | Pre-conditions | Scenario | Outcome | (Optional) What to avoid |
|---|---|---|---|---|---|
| 9. | site | site batch system | The site wants to know how much longer the job will be running | | |
| 10. | site | site batch system | The site wants to know the amount of draining waste, if the job was asked to drain | | |
| 11. | site | site batch system | The site wants to know the amount of waste, if the job was killed | | |
| 12. | site | site batch system | The site wants to pick the job that is the least critical for the user | | |
Requirements
- The proposed schema must be unambiguous and leave no room for interpretation of the values provided.
- For this reason, only basic information that is well defined across sites is used.
- The information is expected to be dynamic.
- Files will be owned by the user and reside in a /tmp-like area.
List of requirements
Job-specific information files are:
- found in the directory pointed to by $JOBSTATUS
- owned by the user who is executing the original job. In the case of pilots this would be the pilot user at the site.
- created by the job, and will be updated several times during its lifetime
| Identifier | File Name (key) | Originating use cases | Value | (Optional) Comments |
|---|---|---|---|---|
| 3.1 | used_CPU | 10,11,12 | Number of cores used by the job. | Must be locked before any of the other files in this section are either read or written to. Must be less than or equal to allocated_CPU. |
| 3.2 | last_job_start | 11 | UNIX time (integer) | |
| 3.3 | first_exp_job_end | 10 | UNIX time (integer) | Good faith estimate |
| 3.4 | last_exp_job_end | 9,10,12 | UNIX time (integer) | Good faith estimate |
| 3.5 | last_max_job_end | 9,10,12 | UNIX time (integer) | Enforced limit |
| 3.6 | add_uncom_time | 9,11 | CPU seconds (integer) | |
| 3.7 | add_final_exp_waste | 10 | CPU seconds (integer) | Good faith estimate |
| 3.8 | can_postpone_last_job | 9,10 | string, either "True" or "False" | If the job decides to revert from "False" to "True", it should not update any of the other values for a significant amount of time. |
| 3.9 | priority_factor | 12 | Integer, higher is better | The semantics are user specific, and the value should not be used to compare jobs of different users. |
Notes:
Most of the above values are meant to be used together, so both readers and writers are requested to lock the used_CPU file before either reading or writing any of the files.
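The locking rule can be illustrated with a short sketch. The document does not say which locking mechanism to use, so the example below assumes POSIX advisory locks taken with `fcntl.flock` (Unix only): exclusive for writers, shared for readers. The helper names `locked_status`, `write_values` and `read_values` are hypothetical.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def locked_status(status_dir, exclusive):
    """Hold an advisory lock on used_CPU while the caller touches the files.

    Assumption: flock-style advisory locking; the proposal does not name
    a specific mechanism.
    """
    lock_path = os.path.join(status_dir, "used_CPU")
    # Writers may need to create the file; readers only need read access.
    flags = os.O_RDWR | os.O_CREAT if exclusive else os.O_RDONLY
    fd = os.open(lock_path, flags, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)
        yield
    finally:
        # Closing the descriptor also releases the lock.
        os.close(fd)


def write_values(status_dir, values):
    """Writer side (the job): update several files under one exclusive lock."""
    with locked_status(status_dir, exclusive=True):
        for key, value in values.items():
            with open(os.path.join(status_dir, key), "w") as f:
                f.write(f"{value}\n")


def read_values(status_dir, keys):
    """Reader side (the site): read several files under one shared lock."""
    with locked_status(status_dir, exclusive=False):
        result = {}
        for key in keys:
            with open(os.path.join(status_dir, key)) as f:
                result[key] = f.read().strip()
        return result
```

Because advisory locks only work when every party takes them, a reader and a writer would have to agree on the same mechanism for this to give a consistent snapshot.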
Conversion formulas
While the above values may be useful on their own, they are usually combined to produce the numbers used to make decisions.
Below are the formulas used to satisfy the originating use cases.
| Originating use case | Symbolic name | Formula | (Optional) Comments |
|---|---|---|---|
| 9 | remaining_time | $last_exp_job_end-$now or $last_max_job_end-$now | There are two possible formulas, depending on the level of confidence the user has in the estimate |
| 10 | draining_waste | ($allocated_CPU-$used_CPU)*($first_exp_job_end-$now)+$add_final_exp_waste | |
| 11 | kill_waste | $add_uncom_time+$used_CPU*($now-$last_job_start) | |
External Resources
TBD
Impact
Pilot frameworks and site batch system configurations will have to be modified in order to profit from the above declarations.
Recommendations
Conclusions
The new mechanism makes it possible to propagate basic information from the user payload to the machine owner. The interface is independent of the pilot framework and/or batch system in use.
References
Igor's presentation at CHEP 2013
--
IgorSfiligoi - 31 Jan 2014