Middleware Readiness WG Archive
Note: this is the archive of the WG's first years. A more recent state of affairs is described at MiddlewareReadiness.
Mandate
- To ensure that operations won't suffer when a site is asked to upgrade to version X of Middleware (MW) package Y, i.e. that versions are not only tested and certified internally by the Product Teams (PT) and validated by the EGI staged rollout, but also ready to be used in operation at the sites and integrated into the experiment workflows.
- The Working Group (WG) concentrates on European WLCG sites, as the OSG sites are covered by an experiment expert at each site who takes care of the software to be deployed. Any OSG-EGI interoperability issues are handled at the monthly meeting (3rd Friday of each month at 16:00 CET/CEST), from which we can draw advice.
Members
Maria Dimou (chairperson 2013-2016), Andrea Manzi (chairperson 2017, MW Officer), Lionel Cons (Monitoring expert, developer until March 2016), Maarten Litmaath (advisor & ALICE), David Cameron (ATLAS), Andrea Sciabà (CMS), Marcelo Vilaca Pinheiro Soares (LHCb), Jeremy Coles (GridPP), João Pina (EGI staged rollout manager), Peter Solagna (EGI operations manager), Rob Quick (OSG), [Helge Meinhard, Jerome Belleman] (Tier0 service managers), [Ben Jones, Alberto Rodriguez Peon] (HEPiX configuration WG), [Patrick Fuhrmann, Andrea Ceccanti, Andy Hanushevsky, Gerardo Ganis, Lukasz Janyst, Andreas Peters, Massimo Sgaravatto] (Product Team Contacts). This static listing may be out of date; it is better to check the members of the e-group wlcg-ops-coord-wg-middleware@cern.ch.
Coming Events
Our JIRA tracker
Please click HERE! An alternative Dashboard view is also available.
Meetings
Product Teams
NB!! This product list is not exhaustive! Some products were suggested at the 3rd meeting of this WG, at the LHCb Computing Workshop of 2014/05/22 and at the 2014/05/22 WLCG Ops Coord meeting.
| Product | Contact | Test Repository | Production Repository | Puppet modules' availability? y/n (*) | Used by which experiment | Tested at site | Comments |
| dCache | P. Fuhrmann | same as prod repo | dCache site (recommended versions appear GREEN!) + EMI repo + EGI-UMD repo | ready but not yet published | All 4 | IN2P3, PIC, NL_T1, NDGF, FNAL, BNL, Triumf, RWTH-Aachen (1 of a set of T2s) | Green versions mean that internal tests have been done by the developers, and that functional tests plus 2- or 7-day stress tests have been done at FNAL as part of the dCache.org collaboration. The sites test, with their own configuration, the version they intend to upgrade to for a month (not necessarily one of the green ones). |
| StoRM | A. Ceccanti | same as prod repo | StoRM site + EMI repo + EGI-UMD repo | n - still using yaim | ATLAS, CMS, LHCb | INFN-T1, some T2s, e.g. QMUL and others (INFN Milano T2) | |
| EOS | A. Peters, L. Mascetti | doesn't exist | git repo, rpm repo | y | All 4 | CERN, FNAL, ASGC, SPBSU.ru (ALICE T2) | ATLAS tests via HammerCloud |
| xrootd | A. Hanushevsky, G. Ganis, L. Janyst | yum testing | yum stable | y - here | All 4 | OSG sites (SLAC, UCSD & Duke) and CERN | xrootd is a basic dependency for EOS and also a dependency for CASTOR. It is also used by DPM and FTS3 to extend their functionality, and some sites run it on top of GPFS or HDFS. |
| DPM | O. Keeble | EPEL testing | EPEL stable | y - here | All 4 | DPM Collaboration (ASGC, Edinburgh, Glasgow, Napoli...) | DPM needs an extra repo for packages that don't qualify for EPEL, e.g. yaim or an Argus client. This is currently EMI and could be the WLCG repo in the future, in which case a testing area would be beneficial. |
| LFC | O. Keeble | EPEL testing | EPEL stable | y | ATLAS, LHCb | CERN | ATLAS and LHCb have both stated their plan to retire the LFC. For lfc-oracle (used only at CERN) the CERN koji/mash repo is used instead of EPEL. |
| FTS3 | O. Keeble | EPEL testing | EPEL stable | y - here | ATLAS, CMS, LHCb | CERN, RAL (in prod), PIC, KIT, ASGC, BNL (for test) | The non-WLCG VOs snoplus.snolab.ca, ams02.cern.ch, vo.paus.pic.es, magic and T2K also use FTS3. For the moment, all sites and experiments take the code from the grid-deployment AFS directory, because fixes are available there much faster than through the mandatory EPEL cycle. Recent FTS3 presentation |
| VOMS client | A. Ceccanti | same as prod repo | VOMS site + EMI repo + Maven for the Java components | y - here & here & here | All 4 | INFN-T1 | |
| HTCondor | Tim Theisen (HTCondor), Tim Cartwright (OSG Middleware), Rob Quick (OSG Ops) | downloads (Development Release) | downloads (Stable Release) | y in OSG | ALICE, ATLAS, USCMS, LHCb | OSG (FNAL & ?) | Tested internally by the HTCondor team, then in production at the University of Wisconsin, then by OSG before inclusion in their release. CMS position: the testing done by OSG, and in particular by the glideinWMS developers, is enough. CMS experts discuss with them which version should be deployed on the pilot factories, etc. From this point of view, HTCondor is seen as an experiment service, because its usage as a batch system is of course not tested by CMS or the glideinWMS team. So the feeling is that the current interaction between CMS and the HTCondor team is good enough, and there is no strong motivation to set up a new testing system like the one used for the MW Readiness verification of other services. ATLAS position HERE |
| CVMFS | J. Blomer, P. Buncic, D. Dykstra, R. Meusel | here (disabled by default) | cvmfs site | y here | All 4 | RAL & CERN mostly, but also BNL, SFU, TRIUMF, HPC2N, AGLT2 and others | Technical paper. Release procedure. D. Dykstra runs the OSG tests. |
| ARC CE | Balazs.Konya (hep.lu.se) | EPEL testing (but also final tags in OS-specific repos) & ARC's own repos for yum & apt release candidates | NorduGrid web site, includes prod and test variants | y here (supplied by UK T2s, untested at CERN) | All 4 | NDGF-T1, SiGNET, RAL-LCG2 | D. Cameron is the WLCG ARC expert. Info provided by O. Smirnova |
| CREAM CE | M. Verlato | INFN products' testing area | EMI, UMD | y here | All 4 | INFN-Padova & the EGI early adopters | Project leader L. Zangrando |
| BDII | M. Alandes | listed in this index, separate dir per OS | gridinfo web site | y here & here & (soon) here | All 4 | CERN & many (who?) | 2014/06/04 Presentation. Trac dir with test cases found, 3 yrs old though... |
| ARGUS | Valery Tschopp (SWITCH) | none? | in GitHub & EMI/UMD (repo used by the sites) | y here | All 4? | none regular | Info provided by Andrea Ceccanti on June 10. The future of ARGUS support is to be brought to the GDB & MB in Sept 2014. Valery & Andrea C. made this twiki about its components |
| UI | C. Aiftimiei | INFN products' testing area | EMI, UMD | n | All 4 | INFN-Padova | Cristina does the testing |
| WN | C. Aiftimiei | INFN products' testing area | same as the UI | y here & here | All 4 | INFN-Padova | Cristina does the testing |
| gfal/lcg_utils | O. Keeble | EPEL testing and WLCG Application Area AFS space | EPEL stable | n | ATLAS, CMS, LHCb | none | being replaced by gfal2 |
| gfal2 (inc gfal2-util) | O. Keeble | EPEL testing and WLCG Application Area AFS space | EPEL stable | n | ATLAS, CMS, LHCb | none | Developers are Alejandro Alvarez Ayllon & Adrien Devresse |
(*) Info taken from https://github.com/cernops with help from Ben Jones.
- List of Supported Products with contacts and QoS levels, maintained by GGUS.
- (*) Puppet-related modules in THIS LIST are prefixed with puppet-. Input by Ben Dylan Jones (CERN/IT-PES & HEPiX).
Experiment workflows
Volunteer Sites
- The Volunteer Sites' rewarding method proposal was presented to the 2014/03/18 WLCG MB. Document HERE.
- Their tasks are listed in the Procedure Guidelines for VOs & Sites below.
- The first group of Volunteer sites is assembled below for presentation to the 2014/04/15 WLCG MB. Information is taken from the Documentation column of the Experiment workflows table.
- More UK sites than the ones listed below are keen/available to be involved when needed. E.g. Brunel University undertakes various MW testing as part of GridPP operations work.
- LHCb has a well-established testing procedure. Nevertheless, the names of sites participating in the MW Readiness verification effort through the LHCb workflows do not yet appear in their documentation.
- ALICE will eventually set up a parallel validation infrastructure for Xrootd, after having dealt with important Spring 2014 priorities.
- The table of Volunteer Sites was presented at the 2014/04/15 WLCG MB (Slides). The MB approved this activity and the list of sites. It also endorsed the latest version of the document on Volunteer Sites' rewarding. There was a suggestion to add a column about the existence of a parallel infrastructure for readiness verification at the site. This column has now been added.
| _Volunteer_ Site | Experiment VO | Middleware product to test | Experiment application | Set-up at the site for the MW Readiness effort | Other Comments |
| TRIUMF | ATLAS | dCache SE | Panda pilot/DQ2/Rucio | Simon wrote on 2014/05/12: For TRIUMF, the setup is done, dCache 2.6.25. dCache was upgraded to 2.6.34 on Sep. 25 2014. Di explained that the infrastructure for MW Readiness verification keeps the same site name; of course it is defined as a different resource queue in the ATLAS Panda production system. | Contacts: Simon Liu, Di Qing. Site used by the PT too |
| NDGF-T1 | ATLAS | dCache SE | Panda pilot/DQ2/Rucio | Gerd wrote on 2014/05/14: We made a separate installation, but with a configuration similar to the production system. All services are concentrated on a single physical machine (no VM), except for actual storage. A few tens of TBs of storage have been diverted from the production system, but no tape system is attached. Currently running dCache 2.9, but we will deploy new software versions as they become available - sometimes even before they are officially released. | Contact: Gerd Behrmann. Site used by the PT too |
| Edinburgh | ATLAS | DPM SE | Panda pilot/DQ2/Rucio | This is a separate installation, with all services concentrated on a single physical machine (no VM), including actual storage. Around 10 TB of storage is available. The machine has the EPEL testing repository enabled and auto-updates DPM from it. It currently runs DPM 1.8.8. | Contact: Wahid Bhimji. Site used by the PT too, as part of the DPM collaboration |
| QMUL | ATLAS | StoRM SE | Panda pilot/DQ2/Rucio | Chris wrote on 2014/05/19: Test SE running (se01.esc.qmul.ac.uk) with the same backend storage, accessible by production worker nodes. Used for initial testing. se04 is used as a production GridFTP node, for production load testing of GridFTP. se03 is the production SE, for the final production test. | Contact: Chris Walker. Site used by the PT too |
| INFN-T1 | ATLAS | StoRM SE | Panda pilot/DQ2/Rucio | Salvatore wrote on 2014/05/20: At CNAF we're going to set up a VM hosting the endpoint, with production-wise resources, and it's going to be declared in GOCDB. This test instance will host the StoRM FE, BE and GridFTP. | Contact: Salvatore Tupputi. Site used by the PT too |
| OSG | ATLAS | Xrootd | Panda pilot | RobQ is at the EGI CF now. We shall have OSG participation at the next meeting. This is not urgent (see also our Mandate) | Contact: Rob Quick. Site used by the PT too |
| CERN_T0 | ATLAS | FTS3 | DQ2/Rucio | Michail's input: A parallel infrastructure exists that runs all new FTS3 versions for 1 month from within experiment workflows before they are considered 'production'. Endpoints (gSOAP): https://fts-pilot.cern.ch:8443, (REST): https://fts-pilot.cern.ch:8446, Monitoring: https://fts-pilot.cern.ch:8449. Steve's input: Two FTS3 instances exist at CERN, fts3.cern.ch and fts3-pilot.cern.ch. The fts3-pilot runs rolling updates of the FTS service, operating system and configuration. At suitable times a current fts3-pilot version is installed on the production fts3 service. The pilot service is continuously tested by CMS and ATLAS. | Contact: Michail Salichos & Steve Traylen. Michail "is" the PT |
| T1_ES_PIC | CMS | dCache | PhEDEx, HC, SAM | Info is coming a.s.a.p. The site managers are now at the EGI CF & HEPiX | Contact: Antonio Perez. Site used by the PT too |
| T2_FR_GRIF_LLR | CMS | CREAM CE/WN, DPM | PhEDEx, HC, SAM | Andrea wrote on 2014/05/13: We installed a preproduction DPM cluster (which is meant to be kept up to date with the latest MW releases). These are small resources running on VMs, but enough to run tests that do not involve load. The cluster has been set up to run PhEDEx LoadTest transfers to/from the production storage of T2_FR_GRIF_IRFU. | Contact: Andrea Sartirana. Site used by the PT too, as part of the DPM collaboration |
| CERN_T0 | CMS | EOS | PhEDEx, HC, SAM | CERN/IT-DSS uses a well-sized preproduction service (EOSPPS) for final release validation before deployment. EOSPPS is monitored and configured in the same way as the five production instances: EOSPUBLIC, EOSALICE, EOSATLAS, EOSCMS, EOSLHCB | Contact: Andreas Peters. He is the main PT member. |
| CERN_T0 | CMS | FTS3 | PhEDEx, ASO? | Michail's input: A parallel infrastructure exists that runs all new FTS3 versions for 1 month from within experiment workflows before they are considered 'production'. Endpoints (gSOAP): https://fts-pilot.cern.ch:8443, (REST): https://fts-pilot.cern.ch:8446, Monitoring: https://fts-pilot.cern.ch:8449. Steve's input: Two FTS3 instances exist at CERN, fts3.cern.ch and fts3-pilot.cern.ch. The fts3-pilot runs rolling updates of the FTS service, operating system and configuration. At suitable times a current fts3-pilot version is installed on the production fts3 service. The pilot service is continuously tested by CMS and ATLAS. | Contact: Michail Salichos & Steve Traylen. Set-up identical to the one for ATLAS earlier in this table. |
| T2_IT_Legnaro | CMS | CREAM CE | HC, glidein factory?, SAM | Massimo wrote on 2014/05/13: A virtual machine has been allocated on which to install the CREAM CE MW to be verified. Guidelines are needed for this machine's installation (e.g. which MW repo should we consider? Should the CE appear in the site BDII?). This CREAM CE will use the same WNs as the production CREAM CEs and the same batch system (LSF in our case) installation. Should a separate LSF queue be used for these activities? | Contact: Massimo Sgaravatto. |
| T2_IN2P3_LAPP Annecy | ATLAS | SRM-less DPM | Verifying whether the ATLAS workflow can work with only gridftp+http access. | This is an additional, testbed-style DPM SE (lapp-se99.in2p3.fr), separate from the production instance. Composition: a virtualised separate head node, a single disk server (physical machine) and 7 TB available. The atlas and ops VOs are supported. http, xrootd and gridftp with redirection are configured. SRM is disabled. | Contact: Frederique Chollet |
MW versions in production (T0+T1s) & usage of the Baseline versions' table
This table was produced following a poll of the sites in May 2014. It was presented at the 4th meeting of the WG.
| Site Name | If, How and Where you publish the MW versions you run in production | How you use the Baseline versions' table, given that the "baseline version" number doesn't necessarily reflect all individual updates of the packages in the dependencies | Info provided by | Comment |
| CERN | BDII | As a general reference to deploy services for WLCG, we keep our production services up to date with the versions referenced there, if not higher. | Maite Barroso | |
| ASGC | We publish version information only via the BDII. | We basically follow the baseline versions for service upgrades after we finish internal tests on our testbed. | Jhen-Wei Hwang | |
| BNL | Since we are an OSG site, MW versions are captured in the OSG version we run. Info is published via the OSG BDII, which is fed to the WLCG interop BDII. http://is.grid.iu.edu/cgi-bin/status.cgi Content for the T1 is here, which includes GlueHostApplicationSoftwareRunTimeEnvironment: OSG 3.2.7. There is also some info about Globus CE, dCache and SRM versions in the BDII info for those services, e.g. GlueCEImplementationName: Globus, GlueCEImplementationVersion: 4.0.6 etc... | We don't use the table. Minimum platform standards are discussed and decided upon between OSG and WLCG, then OSG includes the necessary packages in the OSG middleware release. | John Hover | |
| FNAL | The versions are published in the BDII and on the twiki page for the WLCG Ops Coordination meeting. | We don't. We rely on OSG where possible to deliver the baseline. | Burt Holzman | |
| JINR | In the BDII, via our site BDII | We do not use this table. We follow updates in multiple repositories (EMI, EPEL, dCache, WLCG), so our installed MW usually has slightly newer versions than the Baseline. In practice, this scheme has advantages and disadvantages, but we have used this method for a long time and have not had big problems. | Valery Mitsyn | We (SCODs) had forgotten to include the site in the wlcg-tier1-contacts e-group until 2014/05/12; this is why MariaD emailed the questions late and separately. |
| IN2P3 | At CC-IN2P3 the information system provided by the middleware is used to publish the versions in production | The page is checked frequently and updates are done most of the time following the recommendations. | Vanessa Hamar | |
| INFN-T1 | BDII | Our installed versions are always equal to or greater than the ones listed in the baseline. We upgrade from time to time, particularly to fix security problems. | Andrea Chierici | |
| KIAE | Our site BDIIs | We usually don't use it: we try to keep up with the various recommendations straight from the vendors (EMI, UMD, dCache) and with experience gained by other sites or on our testbeds. And we try to be aligned with the WLCG/EGI/VO requirements for the (minimal) needed versions. | Eygene Ryabinkin | We (SCODs) had forgotten to include the site in the wlcg-tier1-contacts e-group until 2014/05/12; this is why MariaD emailed the questions late and separately. |
| KISTI | We, KISTI, are not sure whether the MW versions are also published by the BDII, but currently we do not publish the information anywhere (we keep the information up to date on our internal twiki) | The link has been useful for us, e.g. for the EMI-2 to EMI-3 migration. We usually do not keep the latest version of the middleware unless there is a security issue. | Sang-Un Ahn | |
| KIT | Like many others, we publish the MW versions in BDIIs only. | Almost all our MW services run baseline versions or higher. We perform updates to the recommended versions following the release cycles, but sometimes also upon requests from user communities. Updates go into production after thorough tests and validation by all supported VOs. For our dCache instance we try to stick to a so-called "Golden Release" in order to avoid frequent upgrades. If needed, we could provide exact versions for all services we are running. | Pavel Weber | |
| NDGF | Automatically in the infosystem. MW versions not published in the infosystem are not published anywhere else. | Not at all. | Mattias Wadenstein | |
| NL_T1 (NIKHEF) | Isn't this information now amply published in the BDII on a per-service basis? In any case we do not publish this information publicly anywhere else. | We follow the EGI UMD release cycles, with the caveat that services which are actually working, and for which users (including those outside WLCG) have not explicitly requested a change, are upgraded based on personnel availability. Also, any changes are made only after internal validation and re-testing at Nikhef before they are deployed to 'production'. Deployed services are available to all our user communities, which makes EGI the more obvious place to coordinate that through UMD. I personally don't remember referring to the wiki above after 2009 or so ... | David Groep | |
| NL_T1 (SARA) | For SURFsara, almost the same applies as for Nikhef. We publish our MW versions in the BDII. | We follow the UMD release cycles, with the exception of dCache, which we get directly from www.dcache.org. We regularly report our dCache version on the twiki pages of the WLCG ops coordination meeting. | Onno Zweers | |
| PIC | BDII | We install versions equal to or greater than the ones listed in the baseline (with some exceptional cases, which are widely discussed, where interoperability of some underlying services is affected). We react promptly and upgrade, particularly to fix security problems. | Pepe Flix | |
| RAL-LCG2 | BDII | We don't really use this twiki page with any regularity. We carry out more or less rolling updates following the MW release cycles. We do our own testing (and participate in staged rollout and early adopter activities) before deploying new versions of MW to production. | Ian Collier | |
| Triumf | The only place we publish the MW versions is the information system (BDII) | We always try to follow the Baseline versions' table. However, we do some tests before upgrading the middleware, and only upgrade to the version listed in the Baseline versions' table, or newer, after our tests have passed. | Di Qing | It is unclear to MariaD how the individual updates are reflected in the global version number. |
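Several sites above describe the same rule: installed versions should be equal to or greater than the baseline. The Python sketch below shows one way such a check could be automated, assuming that both the baseline and the installed versions are available as plain dotted numeric strings; the service names and version numbers in it are illustrative only and are not taken from the real Baseline versions' table. Versions are compared as integer tuples, so that e.g. 2.10.x correctly sorts after 2.6.x (a naive string comparison gets this wrong). Real RPM comparison (epoch, release, alphanumeric segments) is more involved, and, as the table header notes, a single baseline number does not capture every dependency update.

```python
"""Sketch: flag services whose installed version is below the baseline.

Assumption: versions are purely numeric dotted strings such as "2.6.34".
RPM epoch/release fields and alphanumeric segments are NOT handled.
"""


def parse_version(version: str) -> tuple[int, ...]:
    # "2.10.0" -> (2, 10, 0); tuples compare element by element,
    # so (2, 10, 0) > (2, 6, 34), unlike the strings "2.10.0" < "2.6.34".
    return tuple(int(part) for part in version.split("."))


def below_baseline(
    installed: dict[str, str], baseline: dict[str, str]
) -> dict[str, tuple[str, str]]:
    """Return {service: (installed, baseline)} for services running an older version.

    Services listed in the baseline but not installed at the site are skipped.
    """
    behind = {}
    for service, minimum in baseline.items():
        current = installed.get(service)
        if current is not None and parse_version(current) < parse_version(minimum):
            behind[service] = (current, minimum)
    return behind


if __name__ == "__main__":
    # Illustrative values only (hypothetical site inventory).
    baseline = {"dcache": "2.6.34", "dpm": "1.8.8", "storm": "1.11.4"}
    installed = {"dcache": "2.6.25", "dpm": "1.8.9"}
    print(below_baseline(installed, baseline))
    # → {'dcache': ('2.6.25', '2.6.34')}
```

Versions equal to the baseline are accepted, matching the "equal to or greater than" wording used by INFN-T1 and PIC.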
Procedure Guidelines for VOs & Sites
Actions for the VOs
- Each VO to select:
- The MW products from the Product Teams table which are relevant to their workflows and which the VO is willing to verify for readiness. Rate them by order of importance.
- Their own applications to be tried with these MW products.
- Volunteer sites (approved by the WLCG MB) where:
- The PTs already test their products' new versions before releasing AND
- There are expert VO contacts.
- The VO clearly lists their above choices in a table. Here is the ATLAS and CMS set-up.
- The VO writes and maintains a document with concrete instructions to the Volunteer sites on how to configure their parallel infrastructure for readiness verification to use the designated experiment applications with new MW versions. This document should remain linked from the Experiment workflows table.
Actions for the Volunteer Sites
- Each Volunteer Site to:
- Set up an environment with a sufficient amount of space (>= 1 TB) where validation jobs run routinely. This can be an infrastructure parallel to production, but that is not a strict requirement.
- Find a way for the monitoring results to appear separately, so as not to disturb the site availability figures. This could be done by giving the validation infrastructure a separate site name, e.g. the-usual-sitename-Volunteer, if the VO and site processes allow this (the 4th WG meeting in May 2014 showed this is not possible for ATLAS).
- Take the new client versions (after PT internal testing) from the test trees of the CVMFS area grid.cern.ch (the WLCG MW Officer will maintain this area).
- Use HammerCloud (HC) for all automatic testing (clients & services).
- Points concluded at the WG's 7th meeting of 2014/11/19:
- When a MW package version is proved to work via the workflow of a given experiment and a new version is out for verification, other experiments which started later should go directly to the most recent version at hand.
- The validation of a new version will (need to) have a deadline in practice, beyond which the affected MW may (need to) get deployed anyway e.g. to fix issues experienced by sites.
- When a product runs at very few sites, it can be considered de facto verified, as no large-scale operation risks suffering from unexpected bugs in production.
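The three points concluded at the 7th meeting can be read as a small decision procedure. The Python sketch below encodes them under explicit assumptions: the function and parameter names are hypothetical, versions are assumed to be dotted numeric strings, and the "very few sites" threshold is a made-up parameter, since the WG did not fix a number.

```python
"""Hypothetical sketch of the verification rules agreed at the 7th WG meeting."""

from datetime import date


def parse_version(version: str) -> tuple[int, ...]:
    # Assumes purely numeric dotted versions, e.g. "2.11.4".
    return tuple(int(part) for part in version.split("."))


def version_to_verify(available: list[str]) -> str:
    # Rule 1: an experiment starting later goes directly to the most recent version at hand.
    return max(available, key=parse_version)


def readiness_decision(
    n_sites: int, today: date, deadline: date, few_sites_threshold: int = 3
) -> str:
    # Rule 3: a product running at very few sites is considered verified de facto.
    # few_sites_threshold is an assumed value; the WG did not define one.
    if n_sites <= few_sites_threshold:
        return "de facto verified"
    # Rule 2: past the deadline, the version may be deployed anyway
    # (e.g. to fix issues experienced by sites).
    if today > deadline:
        return "may be deployed without completed verification"
    return "continue verification"


if __name__ == "__main__":
    print(version_to_verify(["2.6.35", "2.6.38", "2.6.4"]))
    print(readiness_decision(10, date(2014, 12, 1), date(2014, 11, 15)))
```

Checking the few-sites rule before the deadline is a design choice of this sketch: a product that is de facto verified never needs a deadline exception.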
WLCG MW Officer's Tasks
See WLCGMWOfficerTasks.
Please use wlcg-middleware-officer@cern.ch to contact the MW Officer! This allows information to be updated in case of absences.
The WLCG MW Package Reporter
- Documentation update following the March 2016 WG meeting. The rest is historical information which explains the rationale behind the development:
- First presentation at the WG's 4th meeting on 2014/05/15 by Lionel Cons here.
- Update presentation at the WG's 5th meeting on 2014/07/02, including detailed documentation by Lionel Cons with links to the installation manual HERE.
- Rationale for the non-use of Pakiti prepared on GDB request on 2015/06/11:
- The use of Pakiti for the needs of the MW Readiness WG has been considered. It has not been selected for the following reasons:
- Pakiti v.2 can't be used as it is today. It does not scale (as acknowledged on their web site here: "current version of the Pakiti reaches its limits").
- 90% of Pakiti's code is in the GUI which is security-centric and not suited to our needs.
- The data collection part is not well developed in Pakiti, but it is the part we need most.
- Nobody actively works on Pakiti now, but Lionel is in touch with its developers, so we hope to converge in the autumn if Pakiti evolves to cover our requirements.
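The data-collection part mentioned above essentially means gathering, per host, the list of installed packages and their versions. The Python sketch below illustrates that step, assuming that the listing comes from `rpm -qa --queryformat '%{NAME} %{VERSION}-%{RELEASE}\n'` (one "name version-release" pair per line) and that a JSON report is an acceptable output. The report layout and function names are hypothetical; this is not the format used by the WLCG MW Package Reporter or Pakiti.

```python
"""Sketch of the data-collection step: turn an rpm listing into a per-host report."""

import json
import shutil
import subprocess


def collect_rpm_listing() -> str:
    # Requires the rpm CLI; returns one "NAME VERSION-RELEASE" line per installed package.
    if shutil.which("rpm") is None:
        raise RuntimeError("rpm command not found")
    result = subprocess.run(
        ["rpm", "-qa", "--queryformat", "%{NAME} %{VERSION}-%{RELEASE}\n"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def parse_rpm_listing(text: str) -> dict[str, list[str]]:
    # A package name may appear several times (e.g. multiple kernels),
    # so every installed version is kept, in lexicographic order.
    packages: dict[str, list[str]] = {}
    for line in text.splitlines():
        name, _, version = line.strip().partition(" ")
        if name and version:
            packages.setdefault(name, []).append(version)
    return {name: sorted(versions) for name, versions in packages.items()}


def build_report(host: str, listing: str) -> str:
    # Hypothetical JSON layout: {"host": ..., "packages": {name: [versions]}}.
    return json.dumps({"host": host, "packages": parse_rpm_listing(listing)}, sort_keys=True)


if __name__ == "__main__":
    sample = "dpm 1.8.9-1.el6\nkernel 2.6.32-504.el6\nkernel 2.6.32-431.el6\n"
    print(build_report("test-node.example.org", sample))
```

On a real node, `build_report(socket.gethostname(), collect_rpm_listing())` would produce the same kind of report from the live package database; sending and aggregating such reports centrally is outside this sketch.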
Presentations
Completed Tasks' overview (archive)
| Task | Deadline | Progress | Affected VO | Affected Sites | Comment |
| Cream-CE and BDII Readiness verification | Middle of November | The latest Cream-CE and BDII updates have been installed at the LNL-T2 site (Legnaro). The configuration for job submission is ongoing (Andrea S is following this up). The latest Cream-CE has also been installed at INFN-NAPOLI for ATLAS, and Panda jobs have been configured | CMS and ATLAS | LNL-T2 and INFN-NAPOLI | 50%. ATLAS installation ready. Waiting for jobs. CMS to verify v. 1.16.4 at Legnaro. |
| dCache Readiness verification | Middle of November | Setting up the verification workflow at TRIUMF and NDGF for ATLAS (2.6.x and 2.10.x versions) and at PIC for CMS (2.6.x). TRIUMF and NDGF installations are ready, PIC is ongoing | CMS and ATLAS | TRIUMF, NDGF, PIC | 75%. dCache 2.6.35 and 2.11.0 verified. No update from PIC yet. Now verifying 2.6.38 at TRIUMF and 2.11.4 at NDGF |
| StoRM Readiness verification | Middle of November | QMUL and INFN-T1. QMUL installation ready, INFN-T1 is ongoing. | ATLAS | QMUL and INFN-T1 | 50% |
| MW package reporter integration with Pakiti | End of January | After the discussion with the EGI security team, the WLCG package reporter has been rebranded as Pakiti v3 and released to EPEL (under review). It will be used both by the EGI security team and by MW Readiness. | n/a | n/a | Pakiti v.3 under EPEL review |
| MW Readiness Collector REST API | End of November | A first design and prototype are ready | n/a | n/a | Lionel is working on it. Things will change due to the new Collector/Reporter-Pakiti integration design. |
| MW Readiness "dashboard" design and first prototype | End of November | This effort can start after the tool's redesign following the Collector/Reporter-Pakiti integration discussions of mid-November 2014. | n/a | n/a | Lionel and Andrea M. under Maarten's supervision. Lionel is working on the visualisation of the reported info ("dashboard"), JIRA:MW12. AndreaM is testing the design of a DB to hold the info on rpms running at the site, JIRA:MW15 |
| MW clients' deployment in grid.cern.ch | a.s.a.p. | In the long run, all clients in the PT table | LHCb | CERN | Feedback from LHCb, given their experience, would be very useful. The work is done by AndreaM (MW Officer) |
| Action List from the July 2nd meeting | The next meeting, October 1st | 20% | ATLAS, CMS, LHCb | OSG, Edinburgh, GRIF, CERN, ? | Read the Actions |
| DPM Readiness verification | End of July | The full 1.8.9 release was postponed, so it was decided to use 1.8.8 + the recently released memcache plugin 1.6.4. It was successfully installed at GRIF and Edinburgh. At GRIF we have successfully verified the CMS workflows (PhEDEx + HC) and at Edinburgh the ATLAS production Panda jobs | ATLAS/CMS | DPM Collaboration sites | Done |
| "WLCG MW Package Reporter" | July 2014 | The tool is now installed at Edinburgh, GRIF and LNL-T2. Following their feedback, fixes were applied to improve the code. | ATLAS & CMS at first | CERN at first | We look forward to CERN running the Package Reporter. Their feedback will be very useful. The work is done by Lionel (MW Package Reporter developer) |
| Editing & Planning | 2 July 2014 meeting | 100% | All 4 | the Volunteer Sites | MariaD in the WG twiki. Done |
| HTCondor | 2 July 2014 meeting | 50% | ATLAS & CMS | OSG | MariaD, Maarten & AndreaM to prepare with RobQ and the experiment contacts in the WG (DaveC, AndreaS) the Readiness verification procedure for HTCondor. The ATLAS and CMS positions are ready and attached to the HTCondor line of the PT Table. |
| CVMFS Client upgrade | End of July | 100% | ALL | ALL | Andrea Manzi opened 75 tickets to sites; the vast majority of them have already upgraded and some are planning to do so before the servers upgrade in August. |
| DPM Readiness verification | Middle of November | DPM 1.8.9 installed at GRIF and Edinburgh | CMS and ATLAS | GRIF and Edinburgh | completed |
Background material:
WLCG ops coord reports:
2018-01-18
This is the status of JIRA ticket updates since the last Ops Coord of 20171207:
- MWREADY-152: DPM 1.9.1/1.9.2 tested at GRIF. A regression found in 1.9.1 was fixed and released in v1.9.2.
2017-12-07
NTR
2017-11-02
This is the status of JIRA ticket updates since the last Ops Coord of 20171005:
2017-10-05
NTR
2017-09-14
This is the status of JIRA ticket updates since the last Ops Coord of 20170714:
- MWREADY-128: ongoing; the CC7 UI is planned to be included in UMD4 in November.
- MWREADY-145: completed. The CC7 WN metapackage is included in UMD4.
- MWREADY-147: ARC-CE 5.3.2 under testing at Brunel.
- MWREADY-148: new CREAM-CE for C7 under testing at LNL for CMS. Some issues were found related to the puppet modules (the new supported configuration method) and the bnotifier component; they have already been fixed by the devs.
- MWREADY-149: FTS 3.7.1 verification at CERN completed. Several bugs were discovered (3.7.2, 3.7.3 and 3.7.4 have been installed and tested as well).
- MWREADY-150: new StoRM release (1.11.12) tested at QMUL. Nothing to report, apart from one issue with the version of lcmaps available in UMD4; they used the one from UMD3 (this issue is also present in the current production installation).
2017-07-06
This is the status of JIRA ticket updates since the last Ops Coord of 20170518:
- MWREADY-146: dCache 2.16.34 verification for ATLAS @ TRIUMF, with IPv6 as well - completed (there was a problem when TRIUMF updated production, which unfortunately was not spotted on the testing instance)
- MWREADY-145: the latest version of the WN metapackage for C7 has been released (v4.0.5, renamed to wn) and tested by Liverpool. The metapackage is being included in UMD4 (GGUS:128753).
- MWREADY-147: ARC-CE 5.3.1 under testing at Brunel.
- MWREADY-148: new CREAM-CE for C7: we agreed with M. Sgaravatto to do the testing for CMS at LNL.
2017-05-18
This is the status of JIRA ticket updates since the last Ops Coord of 20170406:
- MWREADY-146: dCache 2.16.34 verification for ATLAS @ TRIUMF, with IPv6 as well - ongoing
- MWREADY-128: a new version of the UI bundle, with the new CREAM-CLI for C7, has been released to EGI preview. Tested successfully at TRIUMF
- MWREADY-145: dependency clash between the WN bundle and the latest HTCondor (classads vs condor-classads). We will most probably remove the LB libs to solve this issue.
- MWREADY-9: /cvmfs/grid.cern.ch/Grid now mirrors the AFS WLCG Grid Applications area. Requested by LHCb
2017-04-06
This is the status of JIRA ticket updates since the last Ops Coord of 20170302:
- MWREADY-143: FTS 3.6.x for ATLAS, CMS and LHCb at CERN - completed
- MWREADY-140: ARC-CE 5.2.2 on C7 for CMS at Brunel - completed
- MWREADY-141: dCache 3.x at PIC for CMS. Testing, together with the dCache devs, a new version of dCache which should fix an issue with IPv6
Regarding the UI/WN bundle for C7:
- New wiki pages created: (EL7 WN, EL7 UI)
- A new version of the WN bundle is ready, including the LB deps (needed by CREAM CE jobs). We opened a ticket to CREAM in order to remove these deps in future CREAM releases (GGUS:127020). This is also needed because one of those deps conflicts with the latest HTCondor versions.
- We asked EGI to also include the cvmfs and yaim-clients rpms in the preview repo, so that they can be added as deps to the bundle
- Liverpool and Manchester have joined the testing activity for ATLAS and LHCb (MWREADY-145)
- Brunel is also planning to do tests for CMS and ATLAS (MWREADY-144)
2017-03-02
This is the status of jira ticket updates since the last Ops Coord of 20170126:
- MWREADY-142
FTS 3.5.8 for ATLAS & CMS at CERN - completed
- MWREADY-143
FTS 3.6.0 for ATLAS, CMS and LHCb at CERN - ongoing; LHCb is also performing the verification. LHCb discovered a backward incompatibility between the previous version of the FTS client and the new server (fixed).
- MWREADY-140
ARC-CE 5.2.2 on C7 for CMS at Brunel - ongoing; the 5.2.1 verification was stopped because the developers discovered a blocking bug
- MWREADY-135
WN for C7/SL7 at TRIUMF for ATLAS - on-going; some discussions with CREAM-CE/LB are needed. TRIUMF found that CREAM-CE jobs require the LB client libs to be installed on the WN, but the LB clients are not supported on C7.
- MWREADY-128
UI for C7/SL7 at TRIUMF for ATLAS - on-going; an upgrade to FTS 3.5 is needed because of a broken dependency. EGI has been contacted.
- MWREADY-141
dCache 3.0.4 at PIC for CMS - on-going. Also testing the new OpenID Connect interface. Test with Ceph postponed.
2017-01-26
This is the status of actions from the 19th MW Readiness WG meeting of 20161102.
- 20161102-05: Christoph to investigate EL7 UI testing by CMS. Keep Andrea S. informed as maintainer of the workflow twiki.
- 20161102-01: Andrea S. to update the CMS workflow twiki.
This is the status of jira ticket updates since the last Ops Coord of 20161201:
- MWREADY-138
DPM 1.9.0 on C7 at GRIF_LLR for CMS - completed
- MWREADY-104
DPM 1.9.0 SRM-less for ATLAS at LAPP Annecy - on-going. ATLAS verified the modification to the pilot to use DAV for stage-in/out.
- MWREADY-142
FTS 3.5.8 for ATLAS & CMS at CERN - on-going
- MWREADY-140
ARC-CE 5.2.1 on C7 for CMS at Brunel - on-going (ARC-CE 5.2.0 completed)
- MWREADY-135
WN for C7/SL7 at TRIUMF for ATLAS - on-going
- MWREADY-141
dCache 3.0.2 at PIC for CMS - on-going
2016-12-01
This is the status of actions from the 19th MW Readiness WG meeting of 20161102.
- 20161102-05: Christoph to investigate EL7 UI testing by CMS. Keep Andrea S. informed as maintainer of the workflow twiki.
- 20161102-04: Andrea M. to update the pakiti documentation.
- 20161102-03: Maria to remove the out-of-date Tasks overview from the WG twiki. DONE: twiki up-to-date and announced on 20161201.
- 20161102-02: Stefan to appoint an LHCb member to join the WG. DONE: Marcelo has been appointed.
- 20161102-01: Andrea S. to update the CMS workflow twiki.
- 20160518-02: EL7 experiments' intentions. DONE via Ops Coord on Sep 1st; see details in the agenda.
This is the status of jira ticket updates since the last Ops Coord of 20161103:
- MWREADY-104
DPM 1.9.0 SRM-less verification for ATLAS at LAPP Annecy - on-going
- MWREADY-139
FTS 3.5.7 for ATLAS & CMS at CERN - completed
- MWREADY-140
ARC-CE 5.2.0 on EL7 for CMS at Brunel - on-going
- MWREADY-30
ARGUS for CMS at Brunel - no update for a month - status unknown
Maria Dimou is changing duties in CERN IT and leaves this WG in the competent hands of the MW Officer Andrea Manzi. Grateful for the excellent work by the Volunteer Sites and the experiment contacts!
2016-11-03
The 19th MW Readiness WG meeting took place yesterday, Nov. 2nd. Agenda, Minutes. Summary:
- The pakiti client is in cvmfs now. Details here.
- LHCb will participate in the FTS verification effort, as a way to avoid, as much as possible, surprises like the checksum problem (GGUS:124136) met on Sept. 28th. They will also participate in the verification of the CE and storage types that they use.
- CMS will discuss internally participation in the EL7 UI bundle/rpm testing.
- The experiment plans around EL7 migration will be discussed in this WG. Today's situation is mostly as reported at the dedicated Ops Coord meeting of Sept. 1st, with the exception of an ATLAS update.
- The WG Mandate was reviewed and confirmed as still valid.
- The date for the next meeting is not yet defined. Please email the e-group of the WG as soon as a vidyo meeting is desirable, and please accelerate exchanges in jira. Our tracker is https://its.cern.ch/jira/projects/MWREADY . The jira dashboard view always shows a snapshot of open tickets.
- Please observe the actions and communicate progress to the e-group.
2016-09-29
- The agenda of the 2/11 meeting http://indico.cern.ch/e/MW-Readiness_19 is taking shape and the twiki is reachable from there. Maria will prepare the table of jira tickets' status closer to the date, so please record all progress in jira or email the e-group wlcg-ops-coord-wg-middleware at cern.
- WN and UI rpms for EL7 have been prepared (with the clients/libs now available on EL7) and pushed to the UMD preview repo for testing (MWREADY-135 and MWREADY-128). We are looking for sites available for the validation.
- We'd like to debate at this meeting the future of the WG. It completes 3 years of life in December. Some products are now verified for Readiness "by default", see examples here. Other products and 2/4 experiments never embarked on this effort. Participation is declining. It is a good moment to review the continuation/transformation/dissolution of the WG.
- This idea was circulated by email on 22/9. Alessandra's feedback is that the WG should remain alive even if meetings are not very frequent. Example reason: "CentOS7 will require some coordination and it seems you are the bridge with EGI. The MW Readiness jira tickets are useful, e.g. https://its.cern.ch/jira/browse/MWREADY-128 and https://its.cern.ch/jira/browse/MWREADY-135"
2016-09-01
- Not many activities during the summer holidays
- MWREADY-133
FTS 3.4.7 verification for ATLAS & CMS completed at CERN
- MWREADY-136
FTS 3.5.0 for ATLAS & CMS at CERN - on-going
- MWREADY-135
WN bundle for C7 (in preparation, most probably ready by the end of the month)
- MWREADY-137
ARC-CE 5.1.1 on C7 to be tested by Raul at Brunel. The fix for openldap on C7 is breaking the ARC-CE InfoSys.
- Due to summer holidays, WLCG Workshop & CHEP preparations and aftermath, the proposed date for the next meeting is Wed Nov. 2nd @ 4pm CET. Please email the e-group of the WG for comments.
- We miss very much LHCb participation in the WG meetings and email exchanges. E.g. we know nothing on their CentOS7 plans, despite 2 reminders during this meeting's preparation.
2016-07-07
The 18th meeting of the WG took place yesterday. Full minutes here. Summary:
- Old and inactive tickets in jira will be closed by the MW Officer after last verification with Product owners and/or Volunteer site managers. See here which ones.
- We need to know about upcoming MW products' releases, we also have a permanent poll open for new Volunteer sites especially for MW Readiness verification on CentOS7. See here what we know so far for the near future.
- The agenda topic on WG Mandate review and meeting frequency was postponed due to lack of participants in this meeting. To get more people to join this effort, we hope to get some publicity at the WLCG Workshop in October.
- Due to summer holidays, WLCG Workshop & CHEP preparations and aftermath, the proposed date for the next meeting is Wed Nov. 2nd @ 4pm CET. Please email the e-group of the WG for comments.
2016-06-02
- The last WG meeting took place on May 18th; see the Minutes here.
- Based on the T0 and T1 feedback, the meeting concluded that the testbeds at the volunteer sites will be the only ones required to run the pakiti client. This was also presented at the May 24th MB. The need for a tool that securely reveals what is installed at a site will be followed up by the Information Systems' Evolution TF. Detailed replies from sites are available here.
- Information on the CentOS7 experiments' intentions is needed. Please prepare for the next meeting on...
- proposed date July 6th at 4pm CEST.
2016-04-28
- JIRA:MWREADY-122
ATLAS & CMS: CERN FTS 3.4.3 verification completed (on both SL6 and CentOS7); a couple of small issues were found, to be fixed in 3.4.4
- JIRA:MWREADY-123
CMS: dCache 2.15.4 verification at PIC ongoing
- JIRA:MWREADY-30
Argus 1.7.0 on CentOS7 verification at CERN ongoing. Half of the production cluster is running this new version.
- Input received so far on the pakiti client installation expansion (by CERN, JINR, RAL, GRIF and NL_T1) is now included in a dedicated section for discussion at the next WG meeting of May 18th. Then we'll conclude on the issue.
- Background: Tier1s were invited at the MWR and the Ops Coord meetings on 16 & 17/3 to tell the e-group wlcg-ops-coord-wg-middleware at cern.ch whether they agree to install the pakiti client on their production service nodes, so that the MW versions running at the site are known to authorised DNs (site managers taken from GOCDB and expert operations supporters). The developer Lionel Cons is stopping further work on the tool. Site replies on their intention to expand the use of the pakiti client can be found here. Maria D. will add 2 reminders to sites in the twikis of the April 7th and 28th Ops Coord meetings, as part of the MW Readiness WG report.
2016-04-07
Tier1s were invited at the MWR and the Ops Coord meetings on 16 & 17/3 to tell the e-group wlcg-ops-coord-wg-middleware at cern.ch whether they agree to install the pakiti client on their production service nodes, so that the MW versions running at the site are known to authorised DNs (site managers taken from GOCDB and expert operations supporters). The developer Lionel Cons is stopping further work on the tool. Site replies on their intention to expand the use of the pakiti client can be found here. Maria D. will add 2 reminders to sites in the twikis of the April 7th and 28th Ops Coord meetings, as part of the MW Readiness WG report. All the input received will be concatenated and attached to the MW Readiness WG agenda of May 18th. Then we'll conclude on the issues.
2016-03-17
- The 16th MW Readiness WG meeting took place yesterday, March 16th 2016. Agenda https://indico.cern.ch/e/MW-Readiness_16 , notes MWReadinessMeetingNotes20160316. Summary:
- Tier1s are invited to tell the e-group wlcg-ops-coord-wg-middleware at cern.ch whether they agree to install the pakiti client on their production service nodes, so that the MW versions running at the site are known to authorised DNs (site managers taken from GOCDB and expert operations supporters).
- SRM-less DPM test on-hold until ATLAS pilot code is changed as per JIRA:MWR-104
- Excellent progress with gfal2 testing (various configurations) as per JIRA:MWR-101 & JIRA:MWR-117
- Proposed date for the next meeting is Wed May 18th at 4pm CEST.
2016-02-18
The JIRA dashboard shows per experiment and per site the product versions pending for Readiness verification. Changes since the Ops Coord. meeting of Jan. 21st are:
- The 15th MW Readiness WG meeting took place on Jan. 27th. Please read the minutes' summary here.
- JIRA:MWREADY-107
CMS and ATLAS: CERN FTS 3.4.1 verification completed, some issues identified and fixed in 3.4.2
- JIRA:MWREADY-114
CMS and ATLAS: CERN FTS 3.4.2 verification started
- JIRA:MWREADY-108
CMS: GRIF-LLR dpm-xrootd 3.6.0 verification completed.
- JIRA:MWREADY-109
ATLAS: INFN-T1 Storm 1.11.10 verification ongoing, everything looks ok
- JIRA:MWREADY-101
CMS: GRIF-LLR gfal2 verification, lots of progress on testing gfal2 + Phedex + stageout
- Suggested date for our WG meeting is Thursday March 17th 2016 at 15:30 CET. NB!!! Different day-of-the-week & different time!!! The WLCG Ops Coord slot is free. Comments?
2016-01-21
The JIRA dashboard shows per experiment and per site the product versions pending for Readiness verification. Changes since the Ops Coord. meeting of Jan. 7th are:
2016-01-07
The JIRA dashboard shows per experiment and per site the product versions pending for Readiness verification. Changes since the Ops Coord. meeting of Dec. 17th are few due to the year-end holidays. Details:
2015-12-17
The JIRA dashboard shows per experiment and per site the product versions pending for Readiness verification. Changes since the Ops Coord. meeting of Dec. 3rd:
- JIRA:MWREADY-91
CMS: PIC completed the dCache 2.13.12 verification, now switched to dCache 2.14.5.
- JIRA:MWREADY-97
ATLAS: BRUNEL and GRIF-IRFU completed the BDII 5.2.23 verification for CENTOS7. EGI is releasing this BDII version in UMD.
- JIRA:MWREADY-99
ATLAS & CMS: FTS 3.4.0 verification using the FTS CERN pilot is ongoing
- JIRA:MWREADY-102
CMS: PIC started the dCache 2.14.4 verification. They reported a problem to dCache, which is already fixed, and have now installed 2.14.5
- JIRA:MWREADY-103
ATLAS: Triumf to start dCache 2.10.47 verification.
In order to push for the transition from lcgutils to gfal2/gfal2-util (lcgutils has been deprecated for 2 years), we have also started discussing with ATLAS & CMS the usage of gfal2 in production. Still, 99% of the sites are using lcgutils for stage-in/out of data.
We therefore decided to start verifications of gfal2/gfal2-utils:
- JIRA:MWREADY-100
ATLAS: Napoli will verify gfal2 and gfal2-utils. Some tests already done demonstrate that everything is working fine.
- JIRA:MWREADY-101
CMS: Grif is a good candidate to verify gfal2 and gfal2-utils.
Reminder: Next meeting January 20th 2016 at 4pm CET. Agenda http://indico.cern.ch/e/MW-Readiness_15
2015-12-03
The http://indico.cern.ch/e/MW-Readiness_14 meeting yesterday, Dec. 2nd, was virtual. Summary:
- New MW versions are now under test via the ATLAS workflow. The dCache v.2.10.44 and v.2.14.0 and HTCondor v.8.4.1 verifications are already completed.
- BDII v. 5.2.23 is the new MW product being verified at Brunel on CentOS7 and completed at GRIF-IRFU.
- The ARC-CE v.5.0.3 verification is completed for CMS.
- The ARGUS Collaboration met twice since our last meeting: on Nov. 6th and Dec. 2nd.
- Please comment on the suggested date for the next meeting: Wednesday 20th January 2016 at 4pm CET. Objections with alternative dates should be sent to wlcg-ops-coord-wg-middleware at cern.ch
2015-11-19
- DPM 1.8.10 installed and verified @ Edinburgh for ATLAS
- dCache 2.10.44 installed and verified @ TRIUMF for ATLAS
- EOS testing @ CERN is paused. The new version Citrine has been installed in pre-prod, but is not yet ready for testing.
- BDII verification on CENTOS7 will start next week @ Brunel
- MW readiness app v0.3 deployed in prod https://wlcg-mw-readiness.cern.ch/releasenotes/
- Next meeting Wednesday 2nd December at 4pm CET.
2015-11-05
Summary of the 13th WG meeting held on 28 October:
- The xrootd monitoring plugin for dCache v. 2.13.x was installed and tested at PIC together with dCache 2.13.9
- One host of the CERN FTS-3 pilot will be running CentOS7. In this way ATLAS and CMS will be able to test FTS3, via their workflows, on both OS environments.
- DESY has been contacted to arrange ATLAS and CMS tests on their nightly rebuilt dCache instance
- The failing PIC-CERN PhEDEx transfers are not yet understood; they are possibly due to a bug in one of the involved MW components (EOS, dCache, FTS-3, Globus, ...) or a misconfiguration somewhere. Experts at CERN are looking into this.
- The MW Readiness App v.3 will move to production real soon now. Volunteer Sites will be called to comment on its functionality.
- There was an ARGUS Collaboration meeting on Oct. 9th. The next one will be on Nov. 6th. The periodic sudden bursts of high load on the CERN ARGUS servers still persist and are not yet explained. A number of other issues from which CMS suffered are now understood. There is one more FTE now working in the ARGUS dev. team. Moving to the upcoming release for CentOS7 may help solve issues, if any, due to historical dependencies: the latest builds use more recent versions of jetty etc.
- Fewer participants than usual joined this meeting from the Volunteer Sites. Suggested date for the next one is Wednesday 2nd December at 4pm CET. Objections with alternative dates should be sent to wlcg-ops-coord-wg-middleware at cern.ch
2015-10-22
- This JIRA dashboard shows per experiment and per site the product versions pending for Readiness verification. Details:
- JIRA:MWREADY-84
CMS: at Brunel ARC-CE 5.0.3 test stalled - no jobs displayed.
- JIRA:MWREADY-85
ATLAS: at NDGF Rucio tests have been failing since IPv6 was enabled, so dCache verifications are pending at NDGF.
- JIRA:MWREADY-90
ATLAS: at TRIUMF dCache 2.10.42 verification completed.
- JIRA:MWREADY-91
CMS: at PIC dCache 2.13.9 verification ongoing.
- JIRA:MWREADY-87
ATLAS: at Edinburgh for DPM 1.8.10 verification - site manager silent.
- JIRA:MWREADY-82
ATLAS: at GLASGOW DPM 1.8.10 CentOS7 verification - missing tests' set-up.
- JIRA:MWREADY-81
CMS: at CERN EOS 0.3.129-aquamarine installed on the PPS - verification pending.
- Issue found:
- JIRA:MWREADY-89. When moving transfer tests to the FTS CERN Pilot (with a different gfal2 configuration w.r.t. FTS@RAL) we discovered an issue with DPM deployed with gridftp redirection. Reported to the developers.
- Reminder: Next meeting October 28th at 4pm CET. Agenda http://indico.cern.ch/e/MW-Readiness_13
2015-10-01
- A puppet-pakiti module configuring the pakiti-client cron with the parameters needed for WLCG MW Readiness is available at https://gitlab.cern.ch/wlcg-mw-readiness/puppet-pakiti
- StoRM testing for ATLAS in progress at INFN_T1. Failing jobs under investigation. JIRA:MWREADY-61.
- The dCache 2.13.x xrootd monitoring plugin has been prepared by the developer Ilija Vukotic, tested by PIC and pushed to the WLCG repo.
- DPM 1.8.10 verification for CMS completed at GRIF. JIRA:MWREADY-83.
- DPM 1.8.10 verification on CentOS7 started at GLASGOW. Now arranging a test set-up for the ATLAS workflow JIRA:MWREADY-82.
- EOS 0.3.129-aquamarine verification pending for CMS at CERN JIRA:MWREADY-81.
- Next meeting reminder October 28th at 4pm CET. Agenda http://indico.cern.ch/e/MW-Readiness_13
2015-09-17
The WG met yesterday. Full minutes in MWReadinessMeetingNotes20150916. Summary:
- The new DPM version is being tested via the ATLAS workflow by the Edinburgh Volunteer site.
- Many new sites showed interest in participating in MW Readiness testing with CentOS7. It is useful to anticipate the MW behaviour in the event of new HW purchases. DPM validation on CentOS/SL7 is already ongoing at Glasgow.
- ATLAS and CMS are asked to declare whether the xrootd 4 monitoring plugin is important for them or not. As it is now, it doesn't work with dCache v. 2.13.8
- Despite the fact that FTS3 runs at very few sites we decided to test it for Readiness. In this context, ATLAS & CMS are now using the CERN FTS3 pilot in their transfer test workflows.
- PIC successfully tested dCache v.2.13.8 for CMS.
- CNAF has obtained Indigo-DataCloud effort to strengthen the ARGUS development team. The ARGUS collaboration will meet again early October. The problems faced at CERN with a CMS VOBOX are being investigated in ticket GGUS:116092.
- The next MW Readiness WG vidyo meeting will take place on Wednesday 28 October at 4pm CET.
2015-09-03
- Not much to report about verifications, because few MW versions were made available during the summer
- Development of the MW Readiness App continued in the past weeks, with graphical enhancements to the site, SSB integration and access to host packages. The dev version is at https://mw-readiness-dev.cern.ch/ (CERN only)
- We would like to remind volunteer sites that the deadline to move to the new pakiti-client + server configuration has passed (end of August) and some upgrades are still missing. See the instructions sent by mail in July at:
- An agenda of our next (16/9 at 4pm CEST) meeting is on page http://indico.cern.ch/e/MW-Readiness_12 . Please send additional items to the e-group wlcg-ops-coord-wg-middleware at cern...
- An update on the WG activities will be presented at next week's GDB.
2015-07-30
- Thanks to Edinburgh and Grif for offering to verify selected MW products for Readiness over CentOS7/SL7 this autumn. Progress/comments via our JIRA tracker.
- A provisional agenda of our next (16/9 at 4pm CEST) meeting is on page http://indico.cern.ch/e/MW-Readiness_12 . Please send additional items to the e-group wlcg-ops-coord-wg-middleware at cern...
PAST REPORTS
2015-07-16
- The ATLAS and CMS Volunteer sites are active verifying new MW versions, especially dCache v.2.13.3 and StoRM.
- The MW Officer Andrea M. is now polling sites' interest in testing CentOS7/SL7 versions of MW.
- The new pakiti-client v.3.0.1 is now available. It contains a new tag to distinguish publishing of packages run for MW Readiness. See the explanation in jira ticket MWREADY-67. All documentation is linked from our twiki. Direct link HERE.
- Next vidyo meeting on Wed September 16th at 4pm CEST.
2015-07-02
- Argus Future meeting tomorrow July 3rd, at 11am CEST, focused on progress with EL7 support. Agenda.
- The latest version of pakiti-client (v3.0.1), with tag support, has been pushed to EPEL stable; we will soon contact the volunteer sites for the upgrade.
- dCache testing for ATLAS at Triumf paused to perform a re-configuration that will fix a problem with the SRM space token.
- PIC and Brunel are collaborating on PhEDEx tests. This allows PIC to better compare one site against another.
- Reminder: The next MW Readiness WG vidyo meeting will take place on Wednesday September 16th at 4pm CEST. Please comment a.s.a.p. if this date is not good!
2015-06-18
- The 11th meeting of the WG took place yesterday.
- A lot of good work is on-going from most Volunteer sites.
- Special credit is due to Edinburgh and GRIF for their detailed DPM testing on behalf of ATLAS and CMS respectively.
- Similarly, great effort is invested by Triumf and NDGF for multiple dCache versions testing for ATLAS and from PIC for dCache testing for CMS.
- Fine-tuning configuration at CNAF for StoRM testing for ATLAS.
- New pakiti-client version 3.0.1 is imminent in EPEL Stable. The updated documentation is available to all Volunteer Sites, together with a new configuration file to be used due to the deployment of new PKG DB servers. This new pakiti-client version gives the possibility to specify a tag (--tag option). MW Readiness nodes should start publishing their packages with the tag MWR. Andrea M. will contact the sites for this upgrade.
- The MW Readiness App https://wlcg-mw-readiness.cern.ch/ is now available on a production instance. Check here the Baseline MW versions' management view.
- EL7 support and the move to Java 8 are now urgent for ARGUS. The CERN testbed will be available real soon now for testing under heavy load and other scenarios.
- The next MW Readiness WG vidyo meeting will take place on Wednesday September 16th at 4pm CEST. Please comment a.s.a.p. if this date is not good!
2015-06-04
- The MW Readiness App moved to the production node https://wlcg-mw-readiness.cern.ch/ . Remember this presentation from the last WG meeting on May 6th. This tool will, eventually, replace today's manually maintained Baseline table and more.
- The pakiti client is installed on the dCache CMS instance for MW Readiness at PIC. The nodes are correctly published in the MW Package Collector and viewable from authorised people only.
- NB!! Next vidyo meeting on June 17th at 4pm CEST. Draft agenda http://indico.cern.ch/e/MW-Readiness_11
2015-05-21
- Very good work in the Readiness verification for dCache v.2.10.28 by Triumf. This is how this version is now part of the Baseline.
- Progress is being made for the StoRM verification for ATLAS at CNAF. The workflow documentation is at hand. All relevant experts are in sync and follow-up is in jira ticket MWREADY-61.
- The MW Officer records in jira ticket MWREADY-59 the progress on the MW Readiness App development. Remember this presentation from the last WG meeting on May 6th. This tool will, eventually, replace today's manually maintained Baseline table.
2015-05-07
- The 10th MW Readiness meeting took place yesterday, May 6th. Agenda.
- After one year of full Readiness Verification activity, the WG is making a check-point of goals and priorities.
- ATLAS and CMS were invited to review their workflow twikis for possible changes in the MW products to verify.
- LHCb and ALICE are invited to declare if and for which products they plan to contribute to the MW Readiness WG.
- Possible stress tests were discussed, for products already verified from within experiment workflows. The decision was to leave this to the MW Officer, the experiment involved and the site on a case-by-case basis, as per the original definition in the WG documentation, point 2.5.
- EOS test for CMS at CERN just started.
- ARGUS testbed at CERN is set-up and ready to start.
- NDGF, PIC, CNAF and Triumf are reminded to install the pakiti client. Instructions.
- The presentation on software being developed for our activities intends to show the product versions under test and their matching to the relevant rpms and, in addition, an automated way to display Baseline versions instead of the current, manually updated, table.
- The next vidyo MW Readiness WG meeting will take place on Wednesday June 17th at 4pm CEST
2015-04-23
- Please remember the actions from our 18 March meeting.
- Work by the MW Officer Andrea Manzi and the Pakiti client (Package Reporter) expert developer Lionel Cons is starting on the collection and visualisation of product versions installed per Volunteer Site. Details in JIRA:56
- Please give input to Maria Dimou for the preparation of the next vidyo meeting on Wed May 6th at 4pm CEST.
2015-04-02
Excellent progress made by the Volunteer sites for selected MW product versions, as planned at our last meeting on March 18th. Thanks to the MW Officer Andrea Manzi for testing with the sites and to the pakiti client developer Lionel Cons for the good technical collaboration with the other EGI-funded experts. Details:
- The dCache problem found in v.2.10.18 is solved in v.2.10.23. dCache versions 2.11.14 and 2.12.3 also contain the fix. Triumf is now testing dCache 2.10.23 and NDGF 2.12.3 (both for ATLAS).
- StoRM 1.11.8 has been verified at QMUL for ATLAS (only one small issue found).
- CREAM-CE 1.16.5 is tested at INFN-Napoli for ATLAS.
- EOS is in the pipeline for testing at CERN for CMS. Please add news in JIRA:MWREADY-40.
- Please do the Actions from our March 18th meeting.
- Remember Wednesday May 6th at 4pm CEST is the next MW Readiness WG meeting date.
2015-03-19
- The 9th WG meeting took place yesterday. The Summary is available HERE.
- Please book your calendars for the next meeting on May 6th at 4pm CEST at CERN and with vidyo!
2015-03-05
- Verified Middleware since the previous meeting:
- Storm 1.11.6 for ATLAS
- dCache 2.11.8 for ATLAS
- DPM-Xrootd 3.5.2 both for ATLAS and CMS
- Ongoing verifications:
- CREAM-CE 1.16.5 for CMS
- Storm 1.11.7 for ATLAS
- dCache 2.10.18 for ATLAS (possible issue found)
- dCache 2.12.0 for ATLAS
- Tests being set up:
- ARC-CE 5.0.0 (under release) for CMS
- Remember our next meeting in a month Wed March 18th at 4pm CET. Please check the Action List.
2015-02-19
- Thanks to the new sites that installed the MW Package Reporter. Instructions here.
- The more Package Reporter installations we have, the better we can prototype and suggest a functional results' display view.
- Very good progress is being made at QMUL for the StoRM Readiness verification via the ATLAS workflow.
- The MW Officer is working on the MW database view that will fetch from the repositories, and display, the release versions that are candidates for Readiness verification.
- Remember our next meeting in a month Wed March 18th at 4pm CET. Please check the Action List.
2015-02-05
- Full minutes of the last MW Readiness WG meeting are now available from MWReadinessMeetingNotes20150121
- Thanks to the MW Officer, ATLAS, CMS and the Volunteer Sites a lot of progress is being made for all Actions, namely:
- Condor-G verification is in the pipeline via the ATLAS pilot factory. JIRA:MWREADY-39
- Brunel and Liverpool Universities offered to participate in the Readiness verification of the ARC CE. JIRA:MWREADY-37.
- The MW Officer starts investigating with ATLAS & CMS the use of the Prometheus test system, a small dCache instance (currently a single node), offered by DESY, for people who want to help test whether the next major release of dCache (currently 2.12) has any problems. JIRA:MWREADY-36.
- StoRM 1.11.6 is in the pipeline for testing at QMUL. JIRA:MWREADY-18.
- Grif starts DPM testing with xrootd 4 for CMS.
- dCache 2.11.8 testing for ATLAS starts at NDGF JIRA:MWREADY-38. Triumf will probably soon follow.
- EOS testing has entered the MW Readiness activity for CMS: JIRA:MWREADY-40 (https://its.cern.ch/jira/browse/MWREADY-40).
- Thanks to the Volunteer Sites which installed the new version of the MW Package Reporter. More are very welcome.
2015-01-22
- The MW Readiness WG met yesterday Jan 21st. Agenda http://indico.cern.ch/e/MW-Readiness_8
- Excellent participation and follow-up by the Volunteer Sites (Edinburgh, Napoli, Legnaro, QMUL, CNAF, Triumf, NDGF) and the MW Officer Andrea Manzi. Please follow the slides for details.
- The new version of the Package Reporter is ready, within the deadlines. The new design principles are in line with EGI security requirements, and as much code as possible is shared with Pakiti. The site is offered configuration options for the reporting. Please follow the presentation here by the developer Lionel Cons for details. Very simple installation instructions are documented here.
- Next meeting Wed 18 March at 4pm CET. Please note!
2014-12-18
- The MW Readiness WG status presentation by the MW officer at the December 2014 GDB gives the status of the effort up to last week.
- A slow-down in participation by sites is observed.
- The MW Readiness WG will participate in the ARGUS testing under load and/or with peculiar CA attributes.
2014-12-04
- DPM 1.8.9 verification completed; the bug discovered in the dmlite lib has been fixed and dmlite 0.7.2 is now in EPEL stable.
- dCache 2.11.0 verification for ATLAS completed
- dCache 2.6.38 and 2.11.4 verifications for ATLAS are ongoing
- Progress can now also be followed on the MW readiness jira dashboard
- The WLCG package reporter has been rebranded as Pakiti v3, after the discussion with the EGI security team, and released to EPEL (under review). It will be used both by the EGI security team and by MW Readiness.
- Our Tasks overview is updated.
- Full minutes of our last meeting are now published here. Please observe the actions.