Different nature of non-pledged resources
Non-pledged resources can be of different kinds:
- Commercial clouds
- HPC
- BOINC
- So-called mid-SLA
- ...
Required actions and known issues
To integrate the usage of such resources into APEL, a few important matters need to be addressed:
- Describe these resources from the topology point of view
- Define how the accounting report to APEL is generated. There are several alternatives:
  - Enable the resource provider to generate the report
  - Since experiments already account for the usage of all available resources via their workload management systems, inject information from the experiment accounting system into APEL (summary records)
- Benchmarking. Accounting of such resources in terms of wallclock time is normally available, at least via the experiment workload management systems. Benchmarking them can be problematic, however, if the experiments have not performed this task themselves.
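The alternative of injecting experiment accounting information can be illustrated with a minimal Python sketch that aggregates per-job records into per-site monthly summaries. The record fields (`site`, `month`, `wallclock_s`, `cores`, `benchmark_per_core`) and the output layout are assumptions made for this example; they do not reflect the actual APEL summary record format. Usage without a benchmark value is kept in a separate counter rather than being normalized with a guessed factor.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobRecord:
    """Hypothetical per-job record from an experiment accounting system."""
    site: str
    month: str                                  # "YYYY-MM"
    wallclock_s: float                          # wallclock time in seconds
    cores: int = 1
    benchmark_per_core: Optional[float] = None  # None if not benchmarked


def summarize(jobs):
    """Aggregate job records into per-(site, month) summaries."""
    totals = defaultdict(lambda: {
        "wallclock_core_hours": 0.0,
        "normalized_core_hours": 0.0,      # core hours scaled by the benchmark
        "unbenchmarked_core_hours": 0.0,   # usage with no benchmark available
    })
    for job in jobs:
        core_hours = job.wallclock_s * job.cores / 3600.0
        entry = totals[(job.site, job.month)]
        entry["wallclock_core_hours"] += core_hours
        if job.benchmark_per_core is None:
            entry["unbenchmarked_core_hours"] += core_hours
        else:
            entry["normalized_core_hours"] += core_hours * job.benchmark_per_core
    return dict(totals)
```

Keeping the unbenchmarked share explicit makes it visible how much of the reported wallclock could not be converted into normalized work.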
Issues to be solved:
- How do we tag these resources to indicate that they are non-pledged, even though they may be provided by a T1 or T2?
- How do we avoid double counting? Suppose non-pledged resources are reported by site A, while for site B this reporting is not enabled. If we enable the import of accounting information for non-pledged resources into APEL from the experiment-specific system, we need to make sure that the non-pledged usage at site A is accounted only once.
- Agree on the policy that determines against which site these resources are accounted. For example, it could be either the site that provides the hardware where the jobs run or the site that sets up the queue to which these resources are attached. Ensure that our data flow maps usage to sites according to the agreed policy. The most evident solution is to resolve this at the queue level and to aggregate the accounting information at the queue level accordingly. However, in some cases this can be misleading.
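The double-counting and site-mapping issues can be sketched together in Python. This is only an illustration under stated assumptions: the function name `merge_usage`, the input shapes, and the rule "site-published usage takes precedence over experiment-imported usage" are hypothetical choices for the example, not an agreed policy. Queues are mapped to sites through a lookup table that would encode whatever policy is eventually agreed.

```python
from collections import defaultdict


def merge_usage(site_reports, experiment_records, queue_to_site):
    """Combine site-published and experiment-imported non-pledged usage.

    site_reports:       {site: wallclock_hours} published directly by sites
    experiment_records: iterable of (queue, wallclock_hours) from the
                        experiment-specific accounting system
    queue_to_site:      {queue: site}, encoding the agreed mapping policy

    Returns (totals, unmapped): per-site totals and the usage of queues
    that have no site in the mapping.
    """
    totals = dict(site_reports)
    imported = defaultdict(float)
    unmapped = defaultdict(float)

    for queue, hours in experiment_records:
        site = queue_to_site.get(queue)
        if site is None:
            # No site under the current policy: keep it visible, do not guess.
            unmapped[queue] += hours
        elif site in site_reports:
            # The site already publishes this usage itself: skip the
            # experiment-side copy so it is counted only once.
            continue
        else:
            imported[site] += hours

    totals.update(imported)
    return totals, dict(unmapped)
```

With site A publishing its own report and site B not doing so, only site B's usage is taken from the experiment records. Usage on queues without a mapping is returned separately, which reflects the case where queue-level resolution is not sufficient.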
Experiment input
Experiments were asked to provide input on the following questions:
- Would your experiment be interested in having opportunistic resources accounted in APEL? (We already collected this input at the beginning of the task force activities; this is just to confirm that nothing has changed.)
- If yes, what are the possible scenarios?
- Are these opportunistic resources already accounted in the experiment-specific systems?
- Is benchmarking of such resources performed, and if so, how?
- How are these resources described in terms of topology?
- Would it be possible to retrieve accounting data for the opportunistic resources from the experiment-specific systems via APIs?
The answers are summarized in the table below.
| Experiment | Interested | Accounted already in the experiment-specific systems? | Benchmarked? | Topology | API from the experiment-specific systems |
|---|---|---|---|---|---|
| ALICE | Not interested | Yes | Not yet, but will be benchmarked with DB12 | ALICE topology in ML | - |
| ATLAS | Interested | Wallclock is recorded for every type of resource | HPC/cloud/BOINC not benchmarked yet. The plan is to run the short benchmark (DB12) in the pilots, but this doesn't work on all resources. Cloud resources in Canada are already using the CERN benchmark suite asynchronously. HPC and BOINC are still under discussion. We don't yet have a way to store the results, which is also still under discussion. | It depends on the resources. Over-pledged resources installed in the grid way are declared in the usual way. Other resources such as HPC/clouds and BOINC aren't, because they can come and go. This was discussed at length in the Information System TF. Many of these resources don't have explicitly declared endpoints and sometimes don't even have PandaQueues, as they get dynamically added to existing pledged resources. | Dashboard API |
| CMS | Interested | Yes, wallclock is accounted through the dashboard and the HTCondor global pool | Benchmarking is not yet implemented. The plan is to benchmark resources with a fast benchmark and to keep the results on the dashboard in every job record. | These resources are mapped to sites created in SiteDB with the extension 'Opportunistic' | Dashboard API |
| LHCb | Interested | Accounted already in DIRAC. The resources we have in mind would be managed by Vcycle or Vac, so we would just use their existing support for publishing to APEL. | It’s only done with DB12-in-job for our internal purposes. The Vac resources use the site’s HS06 measurements, but they are in GOCDB, so they look like conventional grid sites to APEL. The Vcycle resources don’t have HS06 benchmarks. If we were using DB12 for APEL accounting, then we would use the measured value (maybe at the job end too?) in the job records. | Topology is described inside DIRAC. APEL requires sites to be registered in GOCDB, which could be a big overhead for commercial providers. | A DIRAC API exists, though LHCb’s preference would be to use Vcycle or Vac native APEL reporting |
--
JuliaAndreeva - 2017-03-08