WMTEGInformationServiceSummary
(2012-02-03, MaartenLitmaath)
---++ Use of the IS by WLCG experiments

The baseline for usage of the IS in WLCG comes from the document _WLCG Information System Use Cases_ [1]. Updates were then provided primarily at the May 2011 GDB [2].

---+++ ALICE

Each ALICE site has an ALICE VO-box installed within the site boundary, which is responsible for job submission to the local CEs. The VO-box queries the resource BDII of each CE to regulate the flow of jobs to the site through the pilot-based AliEn framework. The list of CEs to be used is hard-coded in the AliEn LDAP service. The BDII is also used to identify which CEs are in production mode by looking at the CE status; jobs are submitted only to production CEs. For SAM tests ALICE depends on the WMS, which relies on the top-level BDII.

---+++ ATLAS

ATLAS maintains a cache of experiment-specific information for its software components (PanDA, dashboards, etc.). This information is collected from the BDII and from other sources (GOCDB, the OSG Information Management System, other services) and involves static and quasi-static information such as downtimes, queues being set offline and blacklisted sites. PanDA uses the BDII to discover the list of known endpoints/sites and to keep it current. The most dynamic part is SoftwareRunTimeEnvironment and the status of CEs. The BDII is periodically scanned and its information cached.

ATLAS maintains a site configuration database in Oracle. Site attributes may be added to this database when needed; for example, fields to control many-core queues were recently defined.

A common source of problems is the publication of disk space: it would be desirable to know how much is _in use_ and how much is _available_, which does not necessarily coincide with how much is _installed_.

ATLAS relies on services such as FTS and, for the time being, the WMS (SW installation/validation jobs, SAM tests); these services query the BDII, which must therefore work reliably.

---+++ CMS

The BDII information used by CMS is quasi-static.
For example, in CRAB (WMAgent) there are queries for CE status, SoftwareRunTimeEnvironment, CEUniqueId or Close SE [static match for inclusion/exclusion] and OS version, but typically with a "trust but verify" model. For pilot factories, the list of sites is not automatically updated based on BDII information. Site attributes are not auto-updated either (and, like ATLAS, CMS may define custom site attributes).

The requirements on the IS are for relatively basic items, which should be easy for the sites to operate and not error-prone. CMS does not use dynamic information (slot utilization, system usage), which was frequently found to be unreliable or out of date. CMS suggests that it is better to have a fast, simple, reliable, quasi-static IS. It is also questionable how much benefit pilots would gain from dynamic information. For example, CMS validates nodes directly before the glideins start on a given node, which avoids most black-hole problems. CMS also uses services like FTS and WMS, which rely on the BDII.

---+++ LHCb

DIRAC, the LHCb software framework, essentially does not use the BDII and incorporates its own workload management system. Endpoint information (e.g. the list of CEs) is statically defined in the DIRAC Configuration Service. Like the others, LHCb uses services like FTS and WMS, which rely on the BDII.

---+++ WMS

Requirements and ranking may be specified in the JDL to an extent that allows the WMS to work without querying the BDII. The matchmaking and its corresponding IS dependency can also be bypassed by providing the "-r" option with the designated CE as argument to the job submission command. Furthermore, the WMS can use its _replanning_ feature [3], which removes a job from a queue after some timeout and automatically resubmits it to another queue, to build its own resource ranking without querying the BDII.
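
The idea of deriving a resource ranking from replanning outcomes, rather than from the BDII, can be illustrated with a minimal sketch. This is not the WMS implementation: the class names, the smoothed scoring formula and the queue names (=ce-a=, =ce-b=, =ce-c=) are all assumptions made for illustration. The sketch only records, per queue, how many jobs started in time and how many had to be replanned, and prefers queues with the better record.

```python
from dataclasses import dataclass


@dataclass
class QueueStats:
    """Observed outcomes for one queue (hypothetical bookkeeping)."""
    started: int = 0    # jobs that began running before the timeout
    replanned: int = 0  # jobs removed from the queue after the timeout

    def score(self) -> float:
        # Smoothed fraction of jobs that started in time; a queue with
        # no history gets a neutral 0.5 instead of 0 or 1.
        return (self.started + 1) / (self.started + self.replanned + 2)


class ReplanningRanker:
    """Ranks queues from observed outcomes only, with no IS query."""

    def __init__(self, queues):
        self.stats = {q: QueueStats() for q in queues}

    def record_started(self, queue):
        self.stats[queue].started += 1

    def record_replanned(self, queue):
        self.stats[queue].replanned += 1

    def ranked(self, exclude=()):
        # Best score first; ties keep the original queue order.
        candidates = [q for q in self.stats if q not in exclude]
        return sorted(candidates, key=lambda q: self.stats[q].score(),
                      reverse=True)

    def next_queue(self, timed_out_queue):
        # A job timed out: penalise that queue and pick another one.
        self.record_replanned(timed_out_queue)
        ranked = self.ranked(exclude={timed_out_queue})
        return ranked[0] if ranked else None


if __name__ == "__main__":
    ranker = ReplanningRanker(["ce-a", "ce-b", "ce-c"])
    ranker.record_started("ce-b")
    ranker.record_started("ce-b")
    print(ranker.next_queue("ce-a"))
    print(ranker.ranked())
```

In this sketch a job that times out on one queue is resubmitted to the best-scoring other queue, so the ranking improves as outcomes accumulate. A real system would also need to age out old observations; that is omitted here.
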
---+++ Conclusions

WLCG experiments have all developed (totally or in part) their own software frameworks and tend to use the BDII for static/quasi-static information and, in general, for limited purposes; the general pattern is often "use for bootstrap, then refine with our own heuristics".

Quality control of the IS content is important and needs to be automated. WLCG experiments have learnt by experience that having no information is at least no worse than having unreliable dynamic information. Reliable storage information would certainly be desirable, but it is currently not available. Having cached information in the IS is considered vital to overcome possible service outages.

Other, future services, related e.g. to the integration of Cloud resources, might use the IS; however, it is still early days in that area and it is difficult to draw conclusions. It is also not clear to what extent a (more) dynamic IS would benefit pilot-job based frameworks, which rely on late binding of jobs to slots. It is likely that in the future WLCG experiments will continue to need mostly a simple discovery service. A summary of initiatives and ideas for a common service registry across heterogeneous infrastructures can be found in [4].

---+++ Notes

[1] WLCG Information System Use Cases, https://twiki.cern.ch/twiki/pub/LCG/WLCGISArea/WLCG_IS_UseCases.pdf

[2] May 2011 GDB, https://indico.cern.ch/conferenceDisplay.py?confId=106644

[3] EMI 1 WMS v.3.3.0, http://www.eu-emi.eu/products/-/asset_publisher/z2MT/content/wms

[4] Towards an Integrated Information System, Amsterdam, Dec 1, 2011, https://www.egi.eu/indico/conferenceDisplay.py?confId=654

-- Main.DavideSalomoni - 03-Feb-2012
Topic revision: r2 - 2012-02-03 - MaartenLitmaath