Minutes20140204 (2014-02-20, AlessandraForti)
T. Hartmann, A. Forti, G. Roy, J. Belleman, D. Traynor, Carles, A. McCrea, A. Perez Calero Yzquierdo, A. Filipcic, A. Lahiff, D. Crooks, A. Sedov, Andrea, Stefan Roiser, J. Hernandez Calama, S. Skipsey, J. Templon, A. McNab, C. Walker, A. DiGirolamo, C. Wissing, Rod Walker, Manfred Alef

Comments after the CMS presentation

Simone (question): This mechanism works better if the pilot is much longer than the executable. Time is wasted at the end of the pilot, so the longer the pilot, the less you lose. Based on CMS jobs, what is the optimal or the minimum lifetime for the pilot job?

Antonio: We need to tune it. If the last job has to be killed, the relative loss is small. The tuning has to take into account what the sites want.

Chris: On slide 29, once you have got the 8 slots... it means sites have already done a significant chunk of the scheduling. Multi-VO support is not compatible with long pilots, aside from the issue of draining to apply changes. Exchanges with sites are needed. Long pilots from CMS would interfere with the ATLAS workload. At an ATLAS site that gives spare cycles to CMS, like QMUL, this can result in ATLAS not getting the resources when needed.

Rod: That's easy, you can confine CMS to a shorter queue. But that means the CMS model still needs work on the batch system to add extra queues.

Jeff: One of the things that makes scheduling easier is entropy; reducing entropy makes scheduling more difficult. Requiring longer jobs or more resources reduces entropy. Predictability is being confused with the ability to schedule. You also need predictability. Predictability doesn't matter at all for single core. It might help with multicore, but reduced entropy hits at every level. If you have large entropy you can fill the gaps, but to fill the gaps you need to know how long the job will be. If there are peaks and valleys, that is site capacity going to waste, and that is thousands of euros wasted.
The efficiency of the CMS model depends on filling the pilots and on the ability to guarantee that pilots are full. It doesn't help with other VOs: high predictability would force system administrators to allocate resources so as to avoid interference with other users' different patterns. Even if the experiments are free to fill the pilot as they want, having pilots all of the same length doesn't help the batch system scheduler. High entropy helps the batch system scheduler. If pilots cannot be guaranteed to be full, a pilot could kill itself if it doesn't receive any workload after some time. However, this would result in degraded predictability. Short MC jobs can be used to mop up the wasted space inside the pilot. Time is guaranteed by the batch system, so queries to the machine-job features are not necessary: a job knows the time it has left.

Antonio: The number of cores should be tested by the application people. They should say what is best for the application.

Simone: 8 is a magic number in ATLAS. It wasn't chosen randomly; it was a compromise between reducing memory consumption (too few cores) and avoiding the serial component taking over (too many).

Jeff: Can you mix different streams, multicore and single core? It would be useful if experiments could turn the knob and increase one or the other as needed.

What is the ATLAS pilot lifetime in general? Do the pilots have a predictable length? No: analysis jobs are typically short, mostly below 4h; production is a mixture depending on the application.

Discussion on advantages of predictability vs high entropy: high entropy helps the scheduler fill the gaps in any case; high predictability helps with multicore but doesn't work well with a mixture that includes single core. In general it is recognised that high entropy is preferable as it works in any case.

-- Main.AntonioPerezCalero - 03 Feb 2014
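Simone's point about pilot length versus job length can be illustrated with a minimal sketch. It assumes a single slot that runs fixed-length jobs back to back and never starts a job it cannot finish. The function name idle_fraction and the pilot lengths used below are hypothetical and were not discussed at the meeting; the 4h job length only echoes the figure quoted above for ATLAS analysis jobs.

```python
def idle_fraction(pilot_hours: float, job_hours: float) -> float:
    """Fraction of a pilot's wall time left unused at the end.

    Simplified model (an assumption, not a description of any experiment's
    pilot framework): jobs all have the same length, run one after another
    on a single slot, and a job is only started if it can finish in time.
    """
    if pilot_hours <= 0 or job_hours <= 0:
        raise ValueError("pilot_hours and job_hours must be positive")
    completed_jobs = int(pilot_hours // job_hours)  # whole jobs that fit
    used_hours = completed_jobs * job_hours
    return (pilot_hours - used_hours) / pilot_hours


if __name__ == "__main__":
    job_hours = 4  # hypothetical fixed job length
    for pilot_hours in (6, 14, 50):  # hypothetical pilot lifetimes
        frac = idle_fraction(pilot_hours, job_hours)
        print(f"pilot {pilot_hours:>2}h, job {job_hours}h: idle fraction {frac:.3f}")
```

In this model the unused tail is always shorter than one job, so the idle fraction is bounded by job_hours / pilot_hours, which is why a longer pilot loses relatively less. Real workloads with mixed job lengths would behave differently.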
Topic revision: r2 - 2014-02-20 - AlessandraForti