Maria Dimou (chairperson), Maarten Litmaath (advisor & ALICE), Andrea Manzi (MW Officer), Lionel Cons (monitoring expert, developer), David Cameron (ATLAS), Andrea Sciabà (CMS), Joel Closier (LHCb), Jeremy Coles (GridPP), Simone Campana (WLCG Ops Coord. chairman), [Alberto Aimar, Alessandro di Girolamo, Stefan Roiser, Oliver Keeble, Nicolo Magini, Markus Schulz] (CERN/IT-SDC), João Pina (EGI staged rollout manager), Cristina Aiftimiei (EGI operations officer & release manager), Peter Solagna (EGI operations manager), Rob Quick (OSG), [Helge Meinhard, Maite Barroso, Manuel Guijarro] (Tier0 service managers), [Ben Jones, Alberto Rodriguez Peon] (HEPiX configuration WG), [Patrick Fuhrmann, Andrea Ceccanti, Andy Hanushevsky, Gerardo Ganis, Lukasz Janyst, Michail Salichos, Andreas Peters, Massimo Sgaravatto] (Product Team contacts). Better check the members of the corresponding e-group: this static listing may be out of date.
| Task | Deadline | Progress | Affected VOs | Affected Sites | Comment |
| CREAM-CE and BDII Readiness verification | End of September | The latest CREAM-CE and BDII updates have been installed at the LNL-T2 site (Legnaro). The configuration for job submission is ongoing (Andrea S. is following this up). | CMS | LNL-T2 | 30% |
| MW Readiness "dashboard" design and first prototype | End of September | A first design/prototype to be delivered before the next meeting (1st October). Activity not yet started. | n/a | n/a | Lionel and Andrea M., under Maarten's supervision. Lionel is working on the visualisation of the reported info (the "dashboard"), JIRA:MW12. Andrea M. is testing the design of a DB to hold the info on the rpms running at each site, JIRA:MW15. |
| MW clients' deployment in grid.cern.ch | a.s.a.p. | In the long run, all clients in the PT table | LHCb | CERN | Feedback from LHCb, given their experience, would be very useful. The work is done by Andrea M. (MW Officer). |
| Action List from the July 2nd meeting | The next meeting, October 1st | 20% | ATLAS, CMS, LHCb | OSG, Edinburgh, GRIF, CERN, ? | Read the Actions. |
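The DB of rpms running at each site (JIRA:MW15) can be illustrated with a small sketch. This is a hypothetical design, not the one Andrea M. is actually testing: the table name, columns and sample report lines below are all assumptions. The input format is what `rpm -qa --qf '%{NAME} %{VERSION} %{RELEASE}\n'` would print on a site node.

```python
# Hypothetical sketch of the JIRA:MW15 idea: a small DB holding the rpms
# deployed at each site. Schema and field names are assumptions, not the
# actual design under test.
import sqlite3

# Sample lines as produced by: rpm -qa --qf '%{NAME} %{VERSION} %{RELEASE}\n'
# (illustrative package versions, not a real site report)
SAMPLE_RPM_REPORT = """\
dcache-server 2.6.25 1
dpm 1.8.8 1.el6
fts-server 3.2.26 1.el6
"""

def load_site_report(conn, site, report):
    """Insert one site's rpm inventory into the 'deployed' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS deployed ("
        " site TEXT, name TEXT, version TEXT, release TEXT)")
    for line in report.splitlines():
        name, version, release = line.split()
        conn.execute("INSERT INTO deployed VALUES (?, ?, ?, ?)",
                     (site, name, version, release))

conn = sqlite3.connect(":memory:")
load_site_report(conn, "LNL-T2", SAMPLE_RPM_REPORT)
rows = conn.execute(
    "SELECT name, version FROM deployed WHERE site = 'LNL-T2'").fetchall()
print(rows)  # [('dcache-server', '2.6.25'), ('dpm', '1.8.8'), ('fts-server', '3.2.26')]
```

A real implementation would of course feed reports from many sites into one central DB, so the "dashboard" (JIRA:MW12) can query which versions run where.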
| Product | Contact | Test Repository | Production Repository | Puppet modules available? y/n (*) | Used by which experiments | Tested at sites | Comments |
| dCache | P. Fuhrmann | same as prod repo | dCache site (recommended versions appear GREEN!) + EMI repo + EGI-UMD repo | ready but not yet published | All 4 | IN2P3, PIC, NL_T1, NDGF, FNAL, BNL, Triumf, RWTH-Aachen (1 of a set of T2s) | Green versions mean internal tests are done by the developers, and functional tests plus 2- or 7-day stress tests are done at FNAL as part of the dCache.org collaboration. For a month, the sites test, with their own configuration, the version they intend to upgrade to (not necessarily a green one). |
| StoRM | A. Ceccanti | same as prod repo | StoRM site + EMI repo + EGI-UMD repo | n - still using YAIM | ATLAS, CMS, LHCb | INFN-T1, some T2s, e.g. QMUL and others (INFN Milano T2) | |
| EOS | A. Peters | doesn't exist | git repo, rpm repo | y | All 4 | CERN, FNAL, ASGC, SPBSU.ru (ALICE T2) | ATLAS tests via HammerCloud |
| xrootd | A. Hanushevsky, G. Ganis, L. Janyst | yum testing | yum stable | y - here | All 4 | OSG sites (SLAC, UCSD & Duke) and CERN | xrootd is a basic dependency of EOS and also a dependency of CASTOR. It is also used by DPM and FTS3 to extend their functionality, and some sites run it on top of GPFS or HDFS. |
| DPM | O. Keeble | EPEL testing | EPEL stable | y - here & here | All 4 | DPM Collaboration (ASGC, Edinburgh, Glasgow, Napoli...) | DPM needs an extra repo for components that do not qualify for EPEL, e.g. YAIM or an Argus client. This is currently EMI and could become the WLCG repo in the future, in which case a testing area would be beneficial. |
| LFC | O. Keeble | EPEL testing | EPEL stable | y | ATLAS, LHCb | CERN | ATLAS and LHCb have both stated their plans to retire the LFC. For lfc-oracle (used only at CERN) the CERN koji/mash repo is used instead of EPEL. |
| FTS3 | O. Keeble | EPEL testing | EPEL stable | y - here | ATLAS, CMS, LHCb | CERN, RAL (in prod), PIC, KIT, ASGC, BNL (for test) | The non-WLCG VOs snoplus.snolab.ca, ams02.cern.ch, vo.paus.pic.es, magic and T2K also use FTS3. For the moment, all sites and experiments take the code from the grid-deployment AFS directory, because fixes are available there much faster than via the mandatory EPEL cycle. Recent FTS3 presentation. |
| VOMS client | A. Ceccanti | same as prod repo | VOMS site + EMI repo + MAVEN for the Java components | y - here & here & here | All 4 | INFN-T1 | |
| HTCondor | Tim Theisen (HTCondor), Tim Cartwright (OSG Middleware), Rob Quick (OSG Ops) | downloads (Development Release) | downloads (Stable Release) | y in OSG | ALICE, ATLAS, USCMS, LHCb | OSG (FNAL & ?) | Tested internally by the HTCondor team, then in production at the Univ. of Wisconsin, then by OSG before inclusion in their release. CMS position: the testing that OSG, and in particular the glideinWMS developers, do is enough. CMS experts discuss with them which version should be deployed on the pilot factories, etc. From this point of view, HTCondor is seen as an experiment service, because its usage as a batch system is of course not tested by CMS or the glideinWMS team. So the feeling is that the current interaction between CMS and the HTCondor team is good enough, and there is no strong motivation to set up a new testing system as we do for the MW Readiness verification of other services. |
| CVMFS | J. Blomer, P. Buncic, D. Dykstra, R. Meusel | here (disabled by default) | cvmfs site | y - here | All 4 | RAL & CERN mostly, but also BNL, SFU, TRIUMF, HPC2N, AGLT2 and others | Technical paper. Release procedure. D. Dykstra runs the OSG tests. |
| ARC CE | Balazs Konya (hep.lu.se) | EPEL testing (but also final tags in OS-specific repos) & ARC's own repos for yum & apt release candidates | Nordugrid web site, includes prod and test variants | y - here (supplied by UK T2s, untested at CERN) | All 4 | NDGF-T1, SiGNET, RAL-LCG2 | D. Cameron is the WLCG ARC expert. Info provided by O. Smirnova. |
| CREAM CE | M. Verlato | INFN products' testing area | EMI, UMD | y - here | All 4 | INFN-Padova & the EGI early adopters | Project leader: L. Zangrando |
| BDII | M. Alandes | listed in this index, separate dir per OS | gridinfo web site | y - here & here & (soon) here | All 4 | CERN & many others (who?) | 2014/06/04 presentation. Trac dir with test cases found, though it is 3 years old... |
| ARGUS | Valery Tschopp (SWITCH) | none? | in GitHub & EMI/UMD (the repos used by the sites) | y - here | All 4? | none regularly | Info provided by Andrea Ceccanti on June 10. The future of ARGUS support will be brought to the GDB & MB in Sept 2014. Valery & Andrea C. made this twiki about its components. |
| UI | C. Aiftimiei | INFN products' testing area | EMI, UMD | n | All 4 | INFN-Padova | Cristina does the testing |
| WN | C. Aiftimiei | INFN products' testing area | same as the UI | y - here & here | All 4 | INFN-Padova | Cristina does the testing |
| gfal/lcg_utils | O. Keeble | EPEL testing and WLCG Application Area AFS space | EPEL stable | n | ATLAS, CMS, LHCb | none | Being replaced by gfal2 |
| gfal2 (incl. gfal2-util) | O. Keeble | EPEL testing and WLCG Application Area AFS space | EPEL stable | n | ATLAS, CMS, LHCb | none | The developers are Alejandro Alvarez Ayllon & Adrien Devresse |
(*) With help from Ben Jones.
| _Volunteer_ Site | Experiment VO | Middleware product to test | Experiment application | Set-up at the site for the MW Readiness effort | Other Comments |
| Triumf | ATLAS | dCache SE | Panda pilot/DQ2/Rucio | Simon wrote on 2014/05/12: for TRIUMF, the setup is done, dCache 2.6.25. Di explained that the infrastructure for MW Readiness verification still uses the same site name; of course it is defined as a different resource queue in the ATLAS Panda production system. | Contacts: Simon Liu, Di Qing. Site used by the PT too. |
| NDGF-T1 | ATLAS | dCache SE | Panda pilot/DQ2/Rucio | Gerd wrote on 2014/05/14: we made a separate installation, but with a configuration similar to the production system. All services are concentrated on a single physical machine (no VM), except for the actual storage. A few tens of TBs of storage have been diverted from the production system, but no tape system is attached. Currently running dCache 2.9, but we will deploy new software versions as they become available - sometimes even before they are officially released. | Contact: Gerd Behrmann. Site used by the PT too. |
| Edinburgh | ATLAS | DPM SE | Panda pilot/DQ2/Rucio | This is a separate installation, with all services concentrated on a single physical machine (no VM), including the actual storage. Around 10 TB of storage is available. The machine has the epel-testing repository enabled and auto-updates DPM from it. It currently has DPM 1.8.8. | Contact: Wahid Bhimji. Site used by the PT too, as part of the DPM collaboration. |
| QMUL | ATLAS | StoRM SE | Panda pilot/DQ2/Rucio | Chris wrote on 2014/05/19: a test SE is running (se01.esc.qmul.ac.uk) with the same backend storage, accessible by the production worker nodes; it is used for initial testing. se04 is used as a production GridFTP node, for production load-testing of GridFTP. se03, the production SE, is used for the final production test. | Contact: Chris Walker. Site used by the PT too. |
| INFN-T1 | ATLAS | StoRM SE | Panda pilot/DQ2/Rucio | Salvatore wrote on 2014/05/20: at CNAF we are going to set up a VM where the endpoint will be hosted, with production-like resources, and it is going to be declared in the GOCDB. This test instance will host the FE, BE and GridFTP for StoRM. | Contact: Salvatore Tupputi. Site used by the PT too. |
| OSG | ATLAS | Xrootd | Panda pilot | RobQ is at the EGI CF now. We shall have OSG participation at the next meeting. This is not urgent (see also our Mandate). | Contact: Rob Quick. Site used by the PT too. |
| CERN_T0 | ATLAS | FTS3 | DQ2/Rucio | Michail's input: a parallel infrastructure exists that runs all new FTS3 versions for 1 month from within experiment workflows before they are considered 'production'. Endpoints (gSoap): https://fts-pilot.cern.ch:8443 , (REST): https://fts-pilot.cern.ch:8446 , Monitoring: https://fts-pilot.cern.ch:8449 . Steve's input: two FTS3 instances exist at CERN, fts3.cern.ch and fts3-pilot.cern.ch. The fts3-pilot runs rolling updates of the FTS service, operating system and configuration. At suitable times a current fts3-pilot version is installed on the production fts3 service. The pilot service is continuously tested by CMS and ATLAS. | Contact: Michail Salichos & Steve Traylen. Michail "is" the PT. |
| T1_ES_PIC | CMS | dCache | PhEDEx, HC, SAM | Info is coming a.s.a.p.; the site managers are now at the EGI CF & HEPiX. | Contact: Antonio Perez. Site used by the PT too. |
| T2_FR_GRIF_LLR | CMS | CREAM CE/WN, DPM | PhEDEx, HC, SAM | Andrea wrote on 2014/05/13: we installed a preproduction DPM cluster (which is meant to be kept up to date with the latest MW releases). This is a small resource running on VMs, but enough for tests that do not involve load. The cluster has been set up to run PhEDEx LoadTest transfers to/from the production storage of T2_FR_GRIF_IRFU. | Contact: Andrea Sartirana. Site used by the PT too, as part of the DPM collaboration. |
| CERN_T0 | CMS | EOS | PhEDEx, HC, SAM | CERN/IT-DSS uses a well-sized preproduction service (EOSPPS) for final release validation before deployment. EOSPPS is monitored and configured in the same way as the five production instances: EOSPUBLIC, EOSALICE, EOSATLAS, EOSCMS, EOSLHCB. | Contact: Andreas Peters. He is the main PT member. |
| CERN_T0 | CMS | FTS3 | PhEDEx, ASO? | Same set-up as for the ATLAS FTS3 entry above: the fts-pilot endpoints run all new FTS3 versions, with rolling updates of service, operating system and configuration, continuously tested by CMS and ATLAS before installation on the production fts3 service. | Contact: Michail Salichos & Steve Traylen. Set-up identical to the one for ATLAS higher in this table. |
| T2_IT_Legnaro | CMS | CREAM CE | HC, glidein factory?, SAM | Massimo wrote on 2014/05/13: a virtual machine is allocated on which to install the CREAM CE MW to be verified. Guidelines are needed for this machine's installation (e.g. which MW repo should we consider? Should the CE appear in the site BDII?). This CREAM CE will use the same WNs and the same batch system installation (LSF in our case) as the production CREAM CEs. Should a separate LSF queue be used for these activities? | Contact: Massimo Sgaravatto. |
| Site Name | If, how and where you publish the MW versions you run in production | How you use the Baseline versions' table, given that the "baseline version" number doesn't necessarily reflect all individual updates of the packages in the dependencies | Info provided by | Comment |
| CERN | BDII | As a general reference for deploying services for WLCG: we keep our production services up to date with the versions referenced there, if not higher. | Maite Barroso | |
| ASGC | We publish version information only via the BDII. | We basically follow the baseline versions for service upgrades, after finishing internal tests on our testbed. | Jhen-Wei Hwang | |
| BNL | Since we are an OSG site, MW versions are captured in the OSG version we run. Info is published via the OSG BDII, which is fed to the WLCG interop BDII: http://is.grid.iu.edu/cgi-bin/status.cgi . The content for the T1 is there, which includes GlueHostApplicationSoftwareRunTimeEnvironment: OSG 3.2.7. There is also some info about the Globus CE, dCache and SRM versions in the BDII info for those services, e.g. GlueCEImplementationName: Globus, GlueCEImplementationVersion: 4.0.6, etc. | We don't use the table. Minimum platform standards get discussed and decided upon between OSG and WLCG; OSG then includes the necessary packages in the OSG middleware release. | John Hover | |
| FNAL | The versions are published in the BDII and on the twiki page for the WLCG Ops Coordination meeting. | We don't. We rely on OSG where possible to deliver the baseline. | Burt Holzman | |
| JINR | In the BDII, via our site BDII. | We do not use this table. We follow updates in multiple repositories (EMI, EPEL, dCache, WLCG), so our installed MW usually has slightly newer versions than the baseline. In practice this scheme has advantages and disadvantages, but we chose it a long time ago and do not have big problems. | Valery Mitsyn | We (SCODs) had forgotten to include the site in the wlcg-tier1-contacts e-group till 2014/05/12, which is why MariaD emailed the questions late and separately. |
| IN2P3 | At CC-IN2P3 the information system provided by the middleware is used to publish the versions in production. | The page is checked frequently and the updates are done, most of the time, following the recommendations. | Vanessa Hamar | |
| INFN-T1 | BDII | Our installed versions are always equal to or greater than the ones listed in the baseline. We upgrade from time to time, particularly to fix security problems. | Andrea Chierici | |
| KIAE | Our site BDIIs | We usually don't use it: we try to keep up with the various recommendations straight from the vendors (EMI, UMD, dCache) and with the experience gained by other sites or on our testbeds. And we try to stay aligned with the WLCG/EGI/VO requirements for the (minimal) needed versions. | Eygene Ryabinkin | We (SCODs) had forgotten to include the site in the wlcg-tier1-contacts e-group till 2014/05/12, which is why MariaD emailed the questions late and separately. |
| KISTI | We, KISTI, are not sure whether the MW versions are also being published via the BDII, but currently we do not publish the information anywhere (we keep it up to date on an internal twiki). | The link has been useful for us, e.g. for the EMI-2 to EMI-3 migration. We usually do not keep the latest version of the middleware unless there is a security issue. | Sang-Un Ahn | |
| KIT | Like many others, we publish the MW versions in BDIIs only. | Almost all our MW services run baseline versions or higher. We perform updates to the recommended versions following the release cycles, but sometimes also upon request from user communities. Updates go into production after thorough tests and validation by all supported VOs. For our dCache instance we try to stick to a so-called "Golden Release" in order to avoid frequent upgrades. If needed, we could provide the exact versions of all services we are running. | Pavel Weber | |
| NDGF | Automatically in the information system. The MW versions not published in the infosys we don't publish anywhere else. | Not at all. | Mattias Wadenstein | |
| NL_T1 (NIKHEF) | Isn't this information now amply published in the BDII on a per-service basis? In any case we do not publish this information publicly anywhere else. | We follow the EGI UMD release cycles, with the caveat that services which are working, and for which no service change requests have been explicitly made by users (including those outside WLCG), are upgraded based on personnel availability. Also, any changes are made only after internal validation and re-testing at Nikhef before they are deployed to 'production'. Deployed services are available to all our user communities, which makes EGI the more obvious place to coordinate this, through UMD. I personally don't remember referring to the wiki above after 2009 or so... | David Groep | |
| NL_T1 (SARA) | For SURFsara, almost the same applies as for Nikhef. We publish our MW versions in the BDII. | We follow the UMD release cycles, with the exception of dCache, which we get directly from www.dcache.org. We regularly report our dCache version on the twiki pages of the WLCG Ops Coordination meeting. | Onno Zweers | |
| PIC | BDII | We install versions equal to or greater than the ones listed in the baseline (with some exceptional, widely discussed cases where the interoperability of some underlying services is affected). We react promptly and upgrade, particularly to fix security problems. | Pepe Flix | |
| RAL-LCG2 | BDII | We don’t really use this twiki page with any regularity. We carry out more or less rolling updates following the MW release cycles. We do our own testing (and participate in staged rollout and early adopter activities) before deploying new MW versions to production. | Ian Collier | |
| Triumf | The only place we publish the MW versions is the information system (BDII). | We always try to follow the Baseline versions' table. However, we do some tests before upgrading the middleware, and only upgrade to the version listed in the table, or newer, after our tests have passed. | Di Qing | It is unclear to MariaD how the individual updates are reflected in the global version number. |
This allows information to be updated in case of absences.
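Since nearly every site above publishes its MW versions only via the BDII, checking deployments against the Baseline versions' table can in principle be automated: query the site BDII with ldapsearch and compare the published GlueServiceVersion values to the baseline. The sketch below is an illustration only, not an official tool; the sample LDIF, hostnames and baseline numbers are made up, and a real run would parse the output of something like `ldapsearch -x -LLL -h <site-bdii>:2170 -b o=grid '(objectClass=GlueService)' GlueServiceType GlueServiceVersion`.

```python
# Illustrative sketch (assumptions throughout): parse GlueService entries
# from ldapsearch LDIF output and check each published version against a
# baseline. The sample LDIF and baseline versions below are invented.
SAMPLE_LDIF = """\
dn: GlueServiceUniqueID=srm.example.org,mds-vo-name=resource,o=grid
GlueServiceType: SRM
GlueServiceVersion: 2.2.0

dn: GlueServiceUniqueID=fts3.example.org,mds-vo-name=resource,o=grid
GlueServiceType: org.glite.FileTransfer
GlueServiceVersion: 3.2.26
"""

# Hypothetical baseline table: service type -> minimum version tuple
BASELINE = {"SRM": (2, 2, 0), "org.glite.FileTransfer": (3, 2, 20)}

def parse_versions(ldif):
    """Yield (service_type, version_tuple) pairs from LDIF text."""
    svc_type = None
    for line in ldif.splitlines():
        if line.startswith("GlueServiceType:"):
            svc_type = line.split(":", 1)[1].strip()
        elif line.startswith("GlueServiceVersion:") and svc_type:
            version = tuple(int(x) for x in
                            line.split(":", 1)[1].strip().split("."))
            yield svc_type, version
            svc_type = None

# "Equal to or greater than the baseline" is how most Tier-1s above say
# they interpret the table; tuple comparison implements exactly that.
report = {t: v >= BASELINE[t] for t, v in parse_versions(SAMPLE_LDIF)}
print(report)  # {'SRM': True, 'org.glite.FileTransfer': True}
```

Such a check is also roughly what the planned MW Readiness "dashboard" would need per site, though the real data source and schema are still being designed.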