WLCG MW Readiness WG 19th meeting Minutes - November 2nd 2016

WG twiki

Agenda

Summary

  • The pakiti client is in cvmfs now. Details here.
  • LHCb will participate in the FTS verification effort, to avoid as far as possible surprises like the checksum problem (GGUS:124136) encountered on Sept. 28th. They will also participate in the verification of the CE and storage types that they use.
  • CMS will discuss internally participation in the EL7 UI bundle/rpm testing.
  • The experiment plans around EL7 migration will be discussed in this WG. Today's situation is mostly as reported at the dedicated Ops Coord meeting of Sept. 1st, with the exception of an ATLAS update.
  • The WG Mandate was reviewed and confirmed as still valid.
  • The date for the next meeting is not yet defined. Please email the e-group of the WG as soon as a vidyo meeting is desirable, and please accelerate exchanges in jira. Our tracker is https://its.cern.ch/jira/projects/MWREADY. The jira dashboard view always shows a snapshot of open tickets.
  • Please observe the actions and communicate progress to the e-group.

Attendance

  • local: Maria Dimou (chair & notes), Maarten Litmaath (ARGUS report), Andrea Manzi (MW Officer), Vincent Brillault (WLCG Security), Julia Andreeva (WLCG Ops), Stefan Roiser (LHCb).
  • remote: Christoph Wissing & Daniele Bonacorsi (CMS), Matt Doidge (Lancaster), Vincenzo Spinoso (EGI), Frederique Chollet (LAPP).
  • apologies: Andrea Sciabà (CMS), Raul Lopes (Brunel), Jeremy Coles (GridPP).

Minutes of previous meeting

The minutes of the last (18th) meeting HERE are accepted.

Verification status report

The MWREADY JIRA dashboard shows the latest status of open tickets. A summary of progress since our last meeting is in the tables below. Maria closed JIRA:MWR-36 and JIRA:MWR-100, as per last meeting's decision to close tickets that have been idle for many months.

ATLAS workflow Readiness Verification Status:

| MW Product | version | Volunteer Site(s) | Comments | Verification status |
| DPM | 1.9.0 | IN2P3_LAPP | JIRA:MWR-104 | On-going |
| dCache | 2.16.x | NDGF | JIRA:MWR-131 NDGF run 2.15.x for now | ticket invalid |
| dCache | 2.13.31 | Triumf | JIRA:MWR-130 | completed |
| WN bundle & rpm for EL7 | 4.0.0 | Triumf? | JIRA:MWR-135 good technical exchanges in the ticket. rpm and bundle avail. UMD and cvmfs | On-going |

CMS workflow Readiness Verification Status

| MW Product | version | Volunteer Site(s) | Comments | Verification status |
| EOS | 4.0.12-citrine | CERN | JIRA:MWR-121 re-installation | Pending |
| dCache | 2.16.4 | PIC | JIRA:MWR-134 | Completed |
| ARC-CE | 5.1.1 (EL7) | Brunel | JIRA:MWR-137 (see Site Report) | Completed |
| DPM | 1.9.0 (EL7) | Grif_LLR | JIRA:MWR-138 | On-going |
| ARGUS | 1.7.0 | Brunel | JIRA:MWR-30 Raul is now also involved for testing on EL7 (see Site Report) | On-going |

Verifications for both ATLAS & CMS

| MW Product | version | Volunteer Site(s) | Comments | Verification status |
| FTS | 3.5.7 | CERN | JIRA:MWR-139 started last week | On-going |
| EL7 UI bundle/rpm | 4.0.0 | ? | JIRA:MWR-128 rpm in UMD and tarball in cvmfs since end of Sept | unknown |

During the discussion about this table:

  • LHCb is encouraged to participate in the MW Readiness verification effort, e.g. with FTS testing, to start with. The way to go is to:
    • Announce to our WG a contact person in the experiment
    • This person will contact those Volunteer Sites which support LHCb and prepare the test environment (dedicated batch queues, announcement of the end-points, as appropriate).
    • The set-up will be sent to the WG chair, Maria, to update the Experiment workflows' section of the WG twiki.
  • Christoph will discuss the EL7 UI testing internally in CMS. Some MW components are still missing from the bundle: Data Management parts are included but, for example, CREAM CE is missing. Andrea M., MW Officer, will inform the e-group as additions arise.

During this discussion Stefan noticed that the section Tasks overview of the twiki is out-of-date. Maria will move this section to the archive part of the same twiki.

Discussions around EL7 following the Sept 1st Ops Coord theme

  • ATLAS update from 25 October
  • Stefan said LHCb has been using SL6 binaries on EL7 in operation for a long time already (simulation workflow).
  • Maarten confirmed that ALICE has the same approach. They built every package on SL5 and this works on EL7.
  • Christoph said there are no changes in CMS to the ones presented at WLCG Ops Coord on Sept 1st (all slides linked from the agenda).
  • Maarten said that in case pure EL7 builds cannot be used for quite a while, CMS experts are looking into containers that present an SL6 environment to the jobs.
  • Julia said the MW Readiness WG should be the forum where updates from the experiments on EL7 migration are reported.
  • Maarten said we cannot hide behind the original statement that SL6 would be the official OS until the end of Run 2, because some sites (e.g. NDGF, IN2P3) will need to run EL7 on new HW and/or feel a steadily mounting pressure from other customers asking for the OS to be upgraded.

WLCG MW Readiness Software Status

  • No more developments are planned. Info by Vincent and Andrea M.:
The pakiti client is now also available via CVMFS (grid.cern.ch). To send data to the MW Readiness collector, site managers can mount the cvmfs repository grid.cern.ch and use this command in their cron:
    /cvmfs/grid.cern.ch/pakiti/bin/pakiti-client --site <site_name> --conf /cvmfs/grid.cern.ch/pakiti/conf/WLCG-MWR.conf
Andrea M. will update the Pakiti documentation accordingly. See also GGUS:124207.
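
As an illustration only, the command above could be wrapped in a small script before being scheduled from cron, so that a missing cvmfs mount produces a clear error instead of a silent failure. The function name `report_to_mwr`, the environment overrides and the exit codes below are assumptions made for this sketch; only the two /cvmfs paths and the `--site` and `--conf` options come from the command quoted above.

```shell
#!/bin/sh
# Sketch of a cron wrapper around the pakiti client command quoted above.
# The function name, the environment overrides and the exit codes are
# illustrative assumptions, not part of the pakiti distribution.

PAKITI_BIN="${PAKITI_BIN:-/cvmfs/grid.cern.ch/pakiti/bin/pakiti-client}"
PAKITI_CONF="${PAKITI_CONF:-/cvmfs/grid.cern.ch/pakiti/conf/WLCG-MWR.conf}"

report_to_mwr() {
    site_name="$1"
    if [ -z "$site_name" ]; then
        echo "usage: report_to_mwr <site_name>" >&2
        return 2
    fi
    # If cvmfs is not mounted the client is not visible; report that
    # explicitly so the cron mail explains what went wrong.
    if [ ! -x "$PAKITI_BIN" ]; then
        echo "pakiti-client not found at $PAKITI_BIN (is /cvmfs/grid.cern.ch mounted?)" >&2
        return 1
    fi
    "$PAKITI_BIN" --site "$site_name" --conf "$PAKITI_CONF"
}
```

A site would end the script with a call such as `report_to_mwr <site_name>` and choose its own cron schedule; neither the schedule nor the script location is prescribed by the WG.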

Sites' feedback

  • Brunel
    • 3 issues encountered with ARC-CE 5.1.1 + HTCondor 8.5.6
      • GGUS:123947: HTCondor 8.5.6 changed the default condor_q output such that only the current user's jobs are returned. This broke the job monitoring in ARC. Setting CONDOR_Q_ONLY_MY_JOBS=false fixes the issue.
      • GGUS:124253 (investigation on-going): it seems that, in the presence of job flocking, the ARC-CE that initially receives the job submission removes the job's directory. This affects APEL Accounting.
      • http://bugzilla.nordugrid.org/show_bug.cgi?id=3604, reported by Thomas Hartmann: job submission breaks when updating globus-gssapi-gsi from 11.22-1 to 12.5-2. The problem is fixed but needs a rebuild of ARC-CE.
    • 1 issue reported to ARGUS
      • GGUS:124315: Configuration problem when using a pure IPv6 WN
    • moved production DPM DB to MariaDB on EL7, no issues so far
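
For sites applying the CONDOR_Q_ONLY_MY_JOBS workaround from GGUS:123947, the setting can be kept in a separate local configuration file rather than edited into the main HTCondor configuration. The sketch below is a hedged illustration: the function name, the file name `99-arc-condor-q.conf` and the default directory `/etc/condor/config.d` are assumptions and should be adapted to the local installation.

```shell
#!/bin/sh
# Sketch: drop the condor_q workaround into a local HTCondor config file.
# The file name and the default directory are assumptions for illustration.

write_condor_q_override() {
    config_dir="${1:-/etc/condor/config.d}"
    override_file="$config_dir/99-arc-condor-q.conf"
    # Write the single knob named in GGUS:123947, with a comment for later readers.
    cat > "$override_file" <<'EOF'
# Make condor_q list the jobs of all users again (workaround from GGUS:123947)
CONDOR_Q_ONLY_MY_JOBS = false
EOF
    # Print the path so the caller can log or inspect it.
    echo "$override_file"
}
```

After writing the file, `condor_config_val CONDOR_Q_ONLY_MY_JOBS` can be used to check which value HTCondor actually picks up.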

Special topic

Major releases coming out this year (that we are aware of)

  • CREAM-CE (with EL7 support)
  • dCache 3.0.0 (already released, running at NDGF-T1)
  • DPM 1.9.0 (already released)

MW readiness mandate review and products review?

Feedback received until 24/10 is:

  • The MW Readiness WG is still useful and should be kept alive with a meeting frequency 'dictated' by the MW products' changes.
  • It is via this group that the info on EGI package status gets to the sites.
  • The WLCG Ops community counts on the Volunteer Sites of the MW Readiness WG for CentOS7 testing
  • We've been lucky to have a calm MW development in the past few months but we wouldn't be able to do without this verification process under a future intense release activity.

During the discussion, Maria re-read the WG Mandate, which was agreed to be still valid. About active participation by the experiments, Stefan said that LHCb only tests services, as clients come from cvmfs. Individual package versions come at random times, so we can't fix the frequency of this meeting in advance because the testing process is continuous. The major milestones that need to be achieved should dictate the frequency of this meeting. Julia and all agreed on this. Maria and Andrea said that our reports at the monthly WLCG Ops Coord meeting and at the Monday 3pm meeting regularly communicate progress on our work. Meetings will be called when there is something special to discuss.

Report from recent ARGUS meetings

  • main items for MW Readiness:
    • CERN has been running Argus 1.7 in production since Aug 11
    • Release notes have been provided
    • EGI has verified the update for inclusion in the upcoming UMD 4.3.0 (Nov)
    • Staged Rollout reports have been provided by Brunel and CERN

Future releases will continue to be tested on the QA nodes at CERN, but development will mostly concern new functionality that does not necessarily concern us, so the tests will simply make sure that what we need still works. Tests of IPv6-only deployment take place at Brunel.

Actions

Action items Done from past meetings can be found HERE.

  • 20161102-05: Christoph to investigate EL7 UI testing by CMS. Keep Andrea S. informed as maintainer of the workflow twiki.
  • 20161102-04: Andrea M. to update the pakiti documentation.
  • 20161102-03: Maria to remove the out-of-date Tasks overview from the WG twiki. DONE twiki up-to-date and announced on 20161201.
  • 20161102-02: Stefan to appoint an LHCb member to join the WG. DONE Marcello is appointed.
  • 20161102-01: Andrea S. to update the CMS workflow twiki.
  • 20160518-02: EL7 experiments' intentions Done via Ops Coord on Sep 1st - see details on the agenda

Next meeting

  • No meeting planned for now. MW Releases, updates in Ops Coord and the Mon 3pm will dictate when we should fix a date for a meeting.

AOB

Stefan said the existing Volunteer Sites which happen to support LHCb should be approached to also take care of LHCb services' verification. This is minuted in the relevant sections earlier in these minutes to keep the whole issue complete.

-- MariaDimou - 2016-09-21

Topic revision: r127 - 2018-02-28 - MaartenLitmaath
 