Information for CMS T2s concerning the deployment of multicore resources
CMS will employ multicore pilots to allocate resources at its computing sites. These pilots are partitionable, meaning that they can internally rearrange to schedule multiple single-core payloads, multicore payloads, or a combination of the two. CMS is starting the deployment of multicore resources at its supporting T2s in 2016, having completed the transition for the T0 and T1s. For that, the glideinWMS pilot factories will be reconfigured to stop submitting single-core pilots and use multicore pilots exclusively. As both production and analysis jobs are part of the global pool, they will be executed as payloads of the same multicore pilots.
What needs to be set up on the site side?
CMS does not require any specific configuration of the site in order to run multicore: the only requirement is that the multicore request for resources be allowed. However, site admins may decide, for example, to create additional queues in order to provide an alternative gatekeeper to the one used by single-core pilots. See below for configuration guidelines specific to the most popular batch systems, based on our experience from the deployment at the T1s.
If the site supports VOs which use single-core jobs, dynamic partitioning, defragmentation or reservation of the farm slots may be needed in order to allocate CMS multicore pilots (again, see the information below on T1 experiences). A static partitioning (devoting some WNs to multicore pilots and a separate set of WNs to single-core slots) is not recommended, being less flexible and potentially inefficient; however, it may be easier to configure as a starting point.
What info does CMS require from the site?
In order to set up and tune the new multicore entries in the glideinWMS pilot factories, CMS needs the following information from the site:
* Gatekeeper to which multicore pilots should be submitted: it may or may not be the same as the one used for single-core pilots
* Information on the CE and batch system technologies, in order to adapt the multicore RSL request to each CE syntax
* Typical number of cores per WN and memory per core at the farm: useful in order to set up proper pilot parameters
CMS will be using as default values those currently in use for the multicore pilots running at the T1s, namely: 8 cores per pilot, a minimum of 16 GB of memory, and a minimum of 30 hours of running time per pilot.
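For illustration only, the fragment below is a minimal, abbreviated sketch of how such defaults could appear in a glideinWMS factory entry. The entry name, gatekeeper string and values are placeholders, and the attr elements are abbreviated (real entries carry additional XML attributes); the actual production entries are maintained by the glideinWMS factory operators, not by the site.

  <entry name="CMS_T2_Example_mcore" enabled="True" gridtype="cream"
         gatekeeper="(site CE endpoint)"
         rsl="WholeNodes = False; HostNumber = 1; CPUNumber = 8">
     <attrs>
        <attr name="GLIDEIN_CPUS" value="8"/>              <!-- cores per pilot -->
        <attr name="GLIDEIN_MaxMemMBs" value="16000"/>     <!-- ~16 GB of memory -->
        <attr name="GLIDEIN_Max_Walltime" value="108000"/> <!-- ~30 h, in seconds -->
     </attrs>
  </entry>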
Initially, a multicore entry will be created in the CMS glideinWMS ITB environment (pilot testbed), using similar working entries from T1 sites as templates. It will be used to submit pilots with test payloads to the site. Once the tests are successful, that configuration will be moved to the production environment.
Information on batch system configuration
Depending on the VO composition of the site, as well as the requests (memory, number of cores, etc.) these users pass to the farm, site admins may need to tune their configuration in order to provision such requests. A site with a simple configuration, where resources are used mostly or exclusively by CMS, may not require any advanced customization of the setup.
However, in more complex cases, techniques such as dynamic partitioning of the farm, dynamic scheduling or node draining may be needed in order to allocate N cores to the CMS multicore pilots while also serving other VOs. The following information, organized by batch system technology, also provides examples of how CMS T1s are dealing with the complex task of handling CMS multicore pilots while serving other VOs efficiently.
PBS/Torque:
There is no need to configure multicore slots: CMS pilots will request multiple single-core slots on the same machine from the batch system (nodes=1:ppn=8), passing the request through the CE (CREAM syntax: rsl = "WholeNodes = False; HostNumber = 1; CPUNumber = 8").
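As an illustration only, that RSL roughly corresponds to the following direct Torque submission; the walltime is just an example matching the pilot defaults mentioned above, and pilot_wrapper.sh is a placeholder script name.

  # Hypothetical direct qsub equivalent of the pilot request: 8 cores on one node, ~30 h walltime
  qsub -l nodes=1:ppn=8,walltime=30:00:00 pilot_wrapper.sh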
* general info: https://twiki.cern.ch/twiki/bin/view/LCG/Torque_Maui
* PIC configuration: https://twiki.cern.ch/twiki/bin/view/LCG/PICConfiguration
* See also the contribution to CHEP2015 on the mcfloat algorithm
HTCondor:
* general info: https://twiki.cern.ch/twiki/bin/view/LCG/HTCondor
* RAL configuration: https://www.gridpp.ac.uk/wiki/RAL_HTCondor_Multicore_Jobs_Configuration
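For sites running HTCondor locally, the configuration discussed in the links above typically combines partitionable slots with the condor_defrag daemon, which drains nodes to recreate 8-core openings. The fragment below is only a minimal sketch of that kind of setup with illustrative values; the actual knobs, expressions and thresholds should be taken from the RAL documentation linked above and tuned to the farm.

  # Partitionable slots: one big slot per WN that can be split among payloads
  NUM_SLOTS = 1
  NUM_SLOTS_TYPE_1 = 1
  SLOT_TYPE_1 = cpus=100%, mem=100%, disk=100%
  SLOT_TYPE_1_PARTITIONABLE = True

  # condor_defrag: drain a few machines at a time to free contiguous cores
  DAEMON_LIST = $(DAEMON_LIST) DEFRAG
  DEFRAG_DRAINING_MACHINES_PER_HOUR = 1.0
  DEFRAG_MAX_CONCURRENT_DRAINING = 4
  DEFRAG_MAX_WHOLE_MACHINES = 8
  # Illustrative expression: consider a machine "whole enough" once 8 cores are free
  DEFRAG_WHOLE_MACHINE_EXPR = Cpus >= 8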
SGE:
* general info: https://twiki.cern.ch/twiki/bin/view/LCG/SGE
* KIT configuration: https://twiki.cern.ch/twiki/bin/view/LCG/KITMulticoreConfig, in particular these slides about reservation
* CCIN2P3 setup: https://indico.cern.ch/event/339461/contribution/0/attachments/665495/914792/StatusOfMulticoreCCIN2P3.pdf
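At Grid Engine sites, multicore requests are commonly mapped onto a parallel environment that places all requested slots on a single host, combined with resource reservation so that 8-core requests are not starved by single-core jobs. The sketch below is an assumed example (names and values are placeholders, and several PE fields are omitted); refer to the KIT and CCIN2P3 material above for working configurations.

  # Hypothetical parallel environment for multicore pilots (qconf -ap mcore)
  pe_name            mcore
  slots              9999
  allocation_rule    $pe_slots      # all requested slots on the same host
  control_slaves     FALSE
  job_is_first_task  TRUE

  # Scheduler configuration (qconf -msconf): allow reservation for large requests
  max_reservation    32
  default_duration   36:00:00

A multicore request would then be submitted as, e.g., "-pe mcore 8 -R y".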
LSF:
* CNAF setup: https://indico.cern.ch/event/340994/contribution/0/attachments/669550/920369/INFN-T1_Multicore_Sep2014.pdf
* See also the contribution to CHEP2015 on LSF dynamic partitioning by S. Dal Pra (CNAF)
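For LSF sites, the CE-level multicore request is ultimately translated into a standard LSF submission asking for 8 slots on a single host. The command below is only an illustrative equivalent with placeholder values (pilot_wrapper.sh is hypothetical), not what the CE literally executes.

  # Hypothetical direct bsub equivalent of the pilot request: 8 slots on one host, ~30 h walltime
  bsub -n 8 -R "span[hosts=1]" -W 30:00 pilot_wrapper.sh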
SLURM:
* CSCS setup: https://indico.cern.ch/event/305623/contribution/0/attachments/581126/799985/20140415_MC_TaskForce_CSCS-LCG2_MCjobs_SLURM.pdf
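For SLURM sites, allocating multicore pilots generally requires scheduling at the level of cores (and memory) rather than whole nodes, after which the request maps onto a standard allocation. The lines below are an illustrative sketch with placeholder values, not the CSCS configuration linked above.

  # slurm.conf: schedule cores and memory as consumable resources
  SelectType=select/cons_res
  SelectTypeParameters=CR_Core_Memory

  # Hypothetical sbatch equivalent of the pilot request (pilot_wrapper.sh is a placeholder):
  sbatch --nodes=1 --ntasks=1 --cpus-per-task=8 --mem=16000 --time=30:00:00 pilot_wrapper.sh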
Publishing queues, accounting, etc.
If new gatekeepers are set up in order to allocate multicore resources, they should be published to the information layer (BDII) so that they can be tested by SAM jobs. SAM jobs, even if single-core, are much faster in execution and of higher priority, so they should not interfere with the multicore operations.
Accounting of the resource usage needs to be done properly; hence the APEL parser accounting algorithm requires the option "parallel = true" to be set.
--
AntonioPerezCalero - 2016-02-17