Classification of ATLAS jobs by data interaction
The goal: determine, for the average {MC / user analysis / reconstruction} job, how many IOPS it performs and how much input, output, storage, and bandwidth it uses.
Steps:
- look at PANDAARCH table
- remove all the jobs where PRODUSERNAME is gangarbt (HC tests)
- remove all the jobs where PRODSOURCELABEL is prod_test, rc_test, test, or ptest (then split the remainder into user and production jobs)
- get number of input files, size of the input and output
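The steps above can be sketched as an in-memory filter and summary. The column names PRODUSERNAME and PRODSOURCELABEL come from the notes; the record layout and the size/count field names (NINPUTFILES, INPUTSIZE, OUTPUTSIZE) are assumptions for illustration, not the actual PANDAARCH schema.

```python
# Hypothetical filter mirroring the PANDAARCH selection steps.
# Field names other than PRODUSERNAME/PRODSOURCELABEL are assumed.

EXCLUDED_LABELS = {"prod_test", "rc_test", "test", "ptest"}

def select_jobs(jobs):
    """Drop HC test jobs (gangarbt) and test-labelled jobs."""
    return [
        j for j in jobs
        if j.get("PRODUSERNAME") != "gangarbt"
        and j.get("PRODSOURCELABEL") not in EXCLUDED_LABELS
    ]

def summarize(jobs):
    """Average number of input files and input/output sizes per job."""
    n = len(jobs)
    if n == 0:
        return {}
    return {
        "avg_n_input_files": sum(j["NINPUTFILES"] for j in jobs) / n,
        "avg_input_bytes": sum(j["INPUTSIZE"] for j in jobs) / n,
        "avg_output_bytes": sum(j["OUTPUTSIZE"] for j in jobs) / n,
    }
```

In practice the same selection would be done server-side with a WHERE clause on the PANDAARCH table rather than in Python.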
Questions:
- there are no measurements of IOPS (measure them via our own test jobs?)
- Brokering of jobs for FAX data access does not take latency into account (it is not even measured, as ping is disabled on most sites). Can it be measured?
- Do the "stagein" and "stageout" entries in "pilotTiming" give the time it takes for the data to be transferred?
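To investigate the pilotTiming question, one could parse the field from archived jobs. The sketch below assumes pilotTiming is a pipe-separated string of integer seconds; the positions of the stage-in and stage-out entries are an assumption and would need to be checked against the pilot documentation.

```python
# Assumption: pilotTiming is a "|"-separated string of integer seconds.
# The default field positions for stage-in/stage-out are hypothetical.

def parse_pilot_timing(timing, stagein_idx=3, stageout_idx=4):
    """Split a pilotTiming string; field positions are assumptions."""
    fields = [int(x) for x in timing.split("|")]
    return {
        "stagein_s": fields[stagein_idx],
        "stageout_s": fields[stageout_idx],
    }
```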
Overview: How do Cloud Providers provide Data?
- Local Disk on VM, Block Storage ...
- Result should be related to their pricing models.
- Market changes quickly, results may not be valid in a couple of months
So far:
- ran HC test jobs (Helix Nebula template) on all "Cloud" sites (specified as such in AGIS, i.e. "Panda resource type": HelixNebula)
- created HC template for Azure (Panda-Queue is available)
- looking into FAX test jobs (by Ilija)
Overprovisioning
Goal: submit a job to a 4-core machine that then runs 5, 6, 7, … threads simultaneously, to see whether this overprovisioning yields an improvement.
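A minimal harness sketch for such a sweep. The payload here is a placeholder sleep; for a real CPU-bound measurement one would run the actual job payload, and in Python one would need processes rather than threads (the GIL prevents threads from using multiple cores for pure-Python compute).

```python
import threading
import time

def payload():
    # Placeholder workload; a real test would run the actual job here.
    time.sleep(0.1)

def run_threads(n_threads):
    """Start n_threads simultaneously and return the wall-clock time."""
    start = time.perf_counter()
    threads = [threading.Thread(target=payload) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    # Sweep 4..7 threads on a 4-core machine to look for gains.
    for n in range(4, 8):
        print(f"{n} threads: {run_threads(n):.2f} s")
```

Comparing the wall-clock times across thread counts (and against 4 threads as the baseline) would show whether overprovisioning helps or hurts.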
--
GerhardFerdinandRzehorz - 2015-05-20
Topic revision: r2 - 2015-05-27 - unknown