DRAFT

WLCG Operations Coordination Minutes, March 2nd 2017

Highlights

Agenda

Attendance

TO BE FIXED AFTER THE MEETING

  • local: Alberto (monitoring), Alejandro (FTS), Alessandro (ATLAS), Andrea M (MW Officer + data management), Andrea S (IPv6), Andrew (LHCb + Manchester), Jérôme (T0), Julia (WLCG), Kate (WLCG + databases), Maarten (WLCG + ALICE), Maria (FTS), Marian (networks + SAM), Vincent (security)
  • remote: Alessandra (ATLAS + Manchester), Catherine (IN2P3 + LPSC), Christoph (CMS), David B (IN2P3-CC), Di (TRIUMF), Frédérique (IN2P3 + LAPP), Gareth (RAL), Kyle (OSG), Marcelo (LHCb), Oliver (data management), Renaud (IN2P3-CC), Ron (NL-T1), Stephan (CMS), Thomas (DESY-HH), Vincenzo (EGI), Xin (BNL)
  • Apologies: Nurcan (ATLAS), Ulf (NDGF-T1)

Operations News

  • This year's WLCG workshop will be held June 19-22 in Manchester

  • The WLCG Data Management steering group had a kick-off meeting
    • The mandate, list of tasks and priorities are being finalized

  • PIC is developing an APEL parser for HTCondor
    • Easily adaptable by other sites
    • More in the next meeting

  • To avoid inconsistencies in the naming of service types along the WLCG IS chain (GocDB, OIM, SAM and experiment-specific systems such as DIRAC), a policy for introducing new service types was agreed with representatives of OSG, EGI, the GocDB team and the IS Evolution Task Force. Requests to introduce a new service type should be sent to the is-approvals@cernNOSPAMPLEASE.ch list. The members of this list agree on the name of the new service type, which can then be introduced in GocDB, OIM, etc. Information about new service types will then be broadcast to the experiments and to the members of the IS Evolution Task Force, which includes other concerned parties such as the monitoring team.

  • The next Ops Coordination meeting will be on April 6

Middleware News

  • Useful Links:
  • Baselines/News:
    • Baselines updated: removed dCache 2.10, moved the dCache 2.13 baseline to 2.13.51, which fixes an issue with RFC proxies for certain CAs and improves bulk deletions; FTS moved to 3.5.7
    • Support for dCache 2.10.x ended in 2016. We discussed this with EGI and prepared a broadcast together (already sent); 16 instances are still running this version. EGI will open tickets soon.
  • Issues:
    • High-risk CVE-2017-6074 Linux kernel privilege escalation vulnerability (https://wiki.egi.eu/wiki/SVG:Advisory-SVG-CVE-2017-6074). Sites should apply the kernel patches or the mitigations described in the advisory.
    • Two issues, affecting both the client and the server side, were discovered in the latest XRootD release (4.6.0). Sites and experiments are advised not to upgrade to this version and to wait for 4.6.1, which is in preparation.
  • T0 and T1 services
    • ASGC
      • DPM upgrade to v 1.8.11
    • BNL
      • Enabled dual stack FTS
    • CERN
      • check T0 report
      • FTS upgrade to v. 3.5.8 and gfal2 2.13.1
    • IN2P3
      • Migration of the core dCache servers to CentOS7 (Postgres 9.5, dCache 2.13.54)
    • JINR
      • dCache minor upgrade 2.13.51 -> 2.13.54, Postgres upgrade 9.4.9 -> 9.5.1
    • KIT
      • FAX decommissioned and dCache updated to 2.13.51 for ATLAS on 1st Feb
    • NL-T1
      • SURFsara upgraded dCache from 2.13.49 to 2.13.51 on Feb 2.
    • PIC
      • Enstore upgraded, dCache upgrade planned for March 3rd
    • RAL
      • Castor 2.1.15-20 update recently completed. All data now on T10KD drives/media.
      • gfal2 upgraded to v 2.13.1 on FTS nodes
    • TRIUMF
      • dCache upgraded to v 2.13.51

Tier 0 News

Tier 1 Feedback

Tier 2 Feedback

Experiments Reports

ALICE

  • Activity levels have typically been high or very high
    • New record of 111k concurrent jobs reached on Feb 15
  • No major issues on the grid
  • A new version of AliEn was put into production
    • In particular it checks and logs the CVMFS status on the WN
      • To ensure that user jobs have the required SW available

ATLAS

CMS

  • finished DIGI-RECO campaign for Moriond17 (total volume 10B evts)
  • completed re-miniAOD of 2016 data during February
  • Phase 1 and 2 upgrade Monte Carlo generations in progress
  • analysis backlog worked down thanks to lower production activity
  • we are preparing to move the central services of the CMS global pool / workflow management system back to CERN
    • new HTCondor version is less resource-demanding
    • high-performance VMs provided by CERN still under evaluation
  • EOS issues: file metadata-inode association lost during a nameserver failover at T0_CH_CERN (GGUS:126358), and metadata pointing to the wrong inode at T2_CH_CERN
  • moving forward on IPv6 and CentOS 7
    • we are now technically ready to run Release Validation on native CentOS 7
  • Recently some sites ran into issues with the checksums used by FTS3 transfers
    • Seems to be an issue with the Globus GridFTP server not properly handling the requested adler32 checksum
    • Some relation to recent FTS3 updates/re-configurations at CERN?
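For reference, adler32 is the standard zlib Adler-32 checksum, commonly reported by storage systems as an 8-digit hexadecimal string. The Python sketch below is not the FTS3 or GridFTP implementation; the helper names `adler32_hex` and `checksums_match` are hypothetical. It computes the value over a binary stream in chunks and compares two reported values after normalizing case and leading zeros.

```python
import io
import zlib
from typing import BinaryIO


def adler32_hex(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Return the Adler-32 checksum of a binary stream as 8 lowercase hex digits."""
    value = 1  # zlib's Adler-32 starting value
    # Read in chunks so that large files do not have to fit in memory.
    while chunk := stream.read(chunk_size):
        value = zlib.adler32(chunk, value)
    return f"{value:08x}"


def checksums_match(reported_a: str, reported_b: str) -> bool:
    """Compare two hex Adler-32 strings, ignoring case and dropped leading zeros."""
    # Parsing as integers treats "0000abcd", "ABCD" and "abcd" as equal.
    return int(reported_a, 16) == int(reported_b, 16)


if __name__ == "__main__":
    # Example on an in-memory "file"; for a real file use open(path, "rb").
    print(adler32_hex(io.BytesIO(b"Wikipedia")))
```

Comparing integer values rather than raw strings is one defensive choice when two systems might format the same checksum differently. Whether formatting plays any role in the reported FTS3 issues is not established in these minutes.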

LHCb

  • High activity levels (70K to 90K jobs on average)
  • VOMS suspended the membership of most of our users because their “Acceptable Use Policy” (AUP) signatures expired on Tuesday. This was caused by a bug in VOMS. An ALARM ticket was raised, and the AUP signatures were restored for all users whose signatures had expired on that day.
  • The Oracle DB migration and security patch went smoothly, without any significant consequences.

Discussion

Ongoing Task Forces and Working Groups

Accounting TF

  • The latest meeting was held on 9 February. The main topic was the possibility of integrating accounting information for opportunistic resources into APEL. LHCb is quite advanced in this respect. ATLAS and CMS might use a different approach, importing summary data from their experiment-specific accounting systems. However, several issues must still be resolved before more progress can be made. The main one is benchmarking. Another is the topology description of opportunistic resources, which the EGI accounting portal might consume from CRIC. A common problem to address is how to avoid double counting when information comes both from the site and from an experiment-specific system. ALICE does not appear to be interested in having opportunistic usage accounted by APEL.
  • The main topic of next Thursday's meeting is a review of the possible implications for accounting if the DB12 benchmark is introduced.
  • In parallel, a discussion on implementing WLCG storage space accounting has started with representatives of the DM steering group and the WLCG monitoring team

Information System Evolution TF

  • The latest meeting was held on 23 February. A policy for introducing new service types in the WLCG IS chain, ensuring naming consistency, was agreed with the EGI and OSG colleagues. The proposal for the storage service description structure in CRIC was discussed.


IPv6 Validation and Deployment TF


Machine/Job Features TF

  • See the talk in the Indico agenda.

Monitoring

  • NTR. Apologies for not being able to attend the meeting; please contact A. Aimar for any monitoring-related matters.

MW Readiness WG


Network and Transfer Metrics WG


Squid Monitoring and HTTP Proxy Discovery TFs

  • http://grid-wpad/wpad.dat is in production at CERN. It supports IPv6 addresses, but IPv6 cannot yet be enabled on the servers: although most of CMS has switched to using it, some use cases still rely on an old PAC file, served from the same servers, that only supports IPv4. The largest remaining case is expected to be migrated by 9 March.
  • http://wlcg-wpad.cern.ch/wpad.dat does not yet support IPv6; support is planned to be added later this year.

Traceability and Isolation WG

Last meeting on 2017/03/01 (https://indico.cern.ch/event/610915/):

  • OSG has made significant progress on testing/integrating/using Singularity:
    • Singularity deployed at 15 OSG sites, used in more than 1M jobs this week
    • CMS integration to follow; a solution for RHEL7 worker nodes
  • Early discussion started on user data workflows for VOs

Theme: SLAs and usage of different kinds of computing resources

Theme: Machine/Job Features update

Action list

  • Collect plans from sites to move to EL7
    • Creation date: 01 Sep 2016; Responsible: WLCG Operations; Status: Ongoing
    • Comments: The EL7 WN is ready (see MW report of 29.09.2016). ALICE and LHCb can use it. NDGF plans to use EL7 for new HW as of early 2017. Other ATLAS sites, e.g. TRIUMF, are working on a container solution that could mask the EL7 environment for the experiments that cannot use it. Maria said that GGUS tickets are a clear way to collect the sites' intentions. Alessandra said we should not ask a vague question. Andrea M. said the UI bundle is also making progress.
    • Jan 26 update: this matter is tied to the EL7 validation statuses for ATLAS and CMS, which were reported in that meeting.
    • March 2 update: the EMI WN and UI metapackages are planned for UMD 4.5, to be released in May
  • Review VO ID Card documentation and make sure it is suitable for multicore
    • Creation date: 03 Nov 2016; Responsible: WLCG Operations; Status: Pending
    • Comments: Jan 26 update: needs to be done in collaboration with EGI
  • Check status, action items and reporting channels of the Data Management Working Group
    • Creation date: 03 Nov 2016; Responsible: WLCG Operations; Status: Pending
  • Create long-downtimes proposal v3 and present it to the MB
    • Creation date: 26 Jan 2017; Responsible: WLCG Operations; Status: Pending

Specific actions for experiments

  • Unify HTCondor CE type name in experiments' VO feeds
    • Creation date: 29 Apr 2016; Affected VO: all; Affected TF/WG: InfoSys; Completion: Ongoing
    • Comments: Proposal to use HTCONDOR-CE.

Specific actions for sites

Creation date Description Affected VO Affected TF/WG Comments Deadline Completion

AOB


Topic revision: r13 - 2017-03-02 - ChristophWissing
 