-- Main.AdriaCasajus - 13 Feb 2009

---+ DIRAC Priority schema

%TOC{title="Contents:"}%

This page describes how DIRAC handles priorities.

---++ Scenario

There are two user profiles:

   * Users that submit jobs on behalf of themselves, for instance normal analysis users.
   * Users that submit jobs on behalf of a group, for instance production users.

In the first case users compete for resources; in the second case users share them. But these two profiles also compete against each other, so DIRAC has to provide a way to share the available resources between them.

On top of that, users want to assign a "UserPriority" to their jobs: they want to tell DIRAC which of their own jobs should run first and which should run last.

DIRAC implements a priority schema to decide which user gets to run at each moment, so that a fair share of CPU is kept between users.

---++ Priority implementation

DIRAC handles jobs using _TaskQueues_. Each _TaskQueue_ contains all the jobs that have the same requirements for a given user/group combination. To prioritize user jobs, DIRAC only has to prioritize _TaskQueues_.

To handle users competing for resources, DIRAC implements a group priority. Each DIRAC group has a defined priority. This priority can be shared or divided amongst the users in the group depending on the group properties: if the group has the *JOB_SHARING* property the priority is shared, otherwise it is divided amongst the users.

Each _TaskQueue_ gets a priority based on the group and user it belongs to:

   * If it belongs to a *JOB_SHARING* group, it gets 1/N of the group priority, N being the number of _TaskQueues_ that belong to the group.
   * If it does *NOT*, it gets 1/(N*U), U being the number of users in the group with waiting jobs and N the number of _TaskQueues_ of that user/group combination.

On top of that, users can assign a "UserPriority" to their jobs. To reflect it, DIRAC modifies the _TaskQueue_ priorities depending on the "UserPriority" of the jobs in each _TaskQueue_. Each _TaskQueue_ priority becomes P*J, P being the _TaskQueue_ priority computed above and J the sum of the "UserPriorities" of the jobs inside the _TaskQueue_ divided by the sum over all the _TaskQueues_ of the same scope: the whole group if it has *JOB_SHARING*, or that user/group combination otherwise. (A small illustrative sketch of this split is given just before the configuration example below.)

---+ Dynamic share corrections

DIRAC includes a priority correction mechanism. The idea behind it is to look at the past history and alter the assigned priorities based on it. It can have multiple plugins, but currently only one exists.

All correctors are configured in a CS section under /Operations/Scheduling/<setup>/ShareCorrections. The option /Operations/Scheduling/<setup>/ShareCorrections/ShareCorrectorsToStart defines which correctors will be used in each iteration.

---++ WMSHistory corrector

This corrector looks at the running jobs of each entity and corrects the priorities to try to maintain the shares defined in the CS. For instance, if an entity has been running three times more jobs than its current share allows, the priority assigned to that entity will be one third of the nominal one. The correction is the inverse of the proportional deviation from the expected share.

Multiple timespans can be taken into account by the corrector. Each timespan is weighted in the final correction by a factor defined in the CS. A maximum correction can also be defined for each timespan.
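Before going into the corrector configuration, the following minimal Python sketch illustrates the static priority split described in the _Priority implementation_ section above. It is illustrative only: the function name and the data layout are assumptions made for the example, not actual DIRAC code.

<verbatim>
from collections import defaultdict

def computeTaskQueuePriorities(groupPriority, groupHasJobSharing, taskQueues):
    """Split a group priority over its TaskQueues and weight by "UserPriority".

    taskQueues maps a TaskQueue id to a dict with:
      'owner'          : user owning the TaskQueue
      'userPriorities' : list with the "UserPriority" of each waiting job
    Returns a dict {tqId: priority}.
    """
    priorities = {}
    if groupHasJobSharing:
        # JOB_SHARING group: each TaskQueue gets 1/N of the group priority,
        # N being the number of TaskQueues belonging to the group.
        N = len(taskQueues)
        for tqId in taskQueues:
            priorities[tqId] = groupPriority / float(N)
        # UserPriority weighting is then done over the whole group
        scopes = dict((tqId, "group") for tqId in taskQueues)
    else:
        # No JOB_SHARING: each TaskQueue gets 1/(N*U), U being the number of
        # users with waiting jobs and N the number of TaskQueues of that
        # user/group combination.
        tqsPerUser = defaultdict(list)
        for tqId, tq in taskQueues.items():
            tqsPerUser[tq["owner"]].append(tqId)
        U = len(tqsPerUser)
        for owner, tqIds in tqsPerUser.items():
            for tqId in tqIds:
                priorities[tqId] = groupPriority / float(len(tqIds) * U)
        # UserPriority weighting is then done per user/group combination
        scopes = dict((tqId, tq["owner"]) for tqId, tq in taskQueues.items())

    # Each priority P becomes P*J: J is the sum of the UserPriorities of the
    # jobs in the TaskQueue divided by the total of its sharing scope.
    totalPerScope = defaultdict(float)
    for tqId, tq in taskQueues.items():
        totalPerScope[scopes[tqId]] += sum(tq["userPriorities"])
    for tqId, tq in taskQueues.items():
        total = totalPerScope[scopes[tqId]]
        if total > 0:
            priorities[tqId] *= sum(tq["userPriorities"]) / total
    return priorities

# Example: two analysis users in a group without JOB_SHARING
# computeTaskQueuePriorities(10.0, False,
#     {1: {"owner": "alice", "userPriorities": [1, 1]},
#      2: {"owner": "alice", "userPriorities": [2]},
#      3: {"owner": "bob",   "userPriorities": [1]}})
# -> {1: 1.25, 2: 1.25, 3: 5.0}
</verbatim>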
The next example defines a valid WMSHistory corrector:

<verbatim>
ShareCorrections
{
  ShareCorrectorsToStart = WMSHistory
  WMSHistory
  {
    GroupsInstance
    {
      MaxGlobalCorrectionFactor = 3
      WeekSlice
      {
        TimeSpan = 604800
        Weight = 80
        MaxCorrection = 2
      }
      HourSlice
      {
        TimeSpan = 3600
        Weight = 20
        MaxCorrection = 5
      }
    }
    lhcb_userInstance
    {
      Group = lhcb_user
      MaxGlobalCorrectionFactor = 3
      WeekSlice
      {
        TimeSpan = 604800
        Weight = 80
        MaxCorrection = 2
      }
      HourSlice
      {
        TimeSpan = 3600
        Weight = 20
        MaxCorrection = 5
      }
    }
  }
}
</verbatim>

The previous example starts the WMSHistory corrector with two instances. The only difference between them is that the first one tries to maintain the shares between user groups, while the second one tries to maintain the shares between the users of the _lhcb_user_ group. It makes no sense to create a third instance for the users in the _lhcb_prod_ group, because that group has *JOB_SHARING*: the priority is assigned to the whole group, not to individual users.

Each WMSHistory corrector instance will correct the priorities by at most a factor of 3 (and at least 1/3). That is defined by the _MaxGlobalCorrectionFactor_ option. Each instance checks two timespans: the last week and the last hour. The last week timespan carries 80% of the total correction, the last hour the remaining 20%. Each timespan can have its own maximum correction. By doing so we can boost the first hour of any new entity but still try to maintain the shares over longer periods. The final formula is:

<verbatim>
hourCorrection  = max( min( hourCorrection, hourMax ), 1/hourMax )
weekCorrection  = max( min( weekCorrection, weekMax ), 1/weekMax )
finalCorrection = hourCorrection * hourWeight + weekCorrection * weekWeight
finalCorrection = max( min( finalCorrection, globalMax ), 1/globalMax )
</verbatim>
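Putting the pieces together, the following Python sketch computes such a correction for one entity from its observed usage and expected share, following the formula above. It is again only a sketch: the variable names follow the verbatim block, the default values are taken from the example configuration, and the way the observed shares are obtained (here passed in directly) is an assumption, not how the WMSHistory corrector actually queries the accounting.

<verbatim>
def computeFinalCorrection(usedShare, expectedShare,
                           hourMax=5.0, weekMax=2.0, globalMax=3.0,
                           hourWeight=0.2, weekWeight=0.8):
    """Correction factor for one entity, following the formula above.

    usedShare     : {'hour': fraction used in the last hour,
                     'week': fraction used in the last week}
    expectedShare : fraction of the resources the entity is entitled to.
    """
    def rawCorrection(used):
        # Inverse of the proportional deviation from the expected share: an
        # entity that ran three times its share gets a 1/3 correction.
        return expectedShare / used if used > 0 else float("inf")

    hourCorrection = rawCorrection(usedShare["hour"])
    weekCorrection = rawCorrection(usedShare["week"])

    # Clamp each timespan to its MaxCorrection, weight them, clamp globally
    hourCorrection = max(min(hourCorrection, hourMax), 1.0 / hourMax)
    weekCorrection = max(min(weekCorrection, weekMax), 1.0 / weekMax)
    finalCorrection = hourCorrection * hourWeight + weekCorrection * weekWeight
    return max(min(finalCorrection, globalMax), 1.0 / globalMax)
</verbatim>

With the example values above, an entity that consumed three times its share over the last week but nothing during the last hour would get weekCorrection = 0.5 (1/3 clamped to the week maximum of 2), hourCorrection = 5 (clamped to the hour maximum), and a final correction of 5*0.2 + 0.5*0.8 = 1.4, which shows how a new entity is boosted during its first hour while the weekly share is still enforced over longer periods.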