Week of 160229

WLCG Operations Call details

  • At CERN the meeting room is 513 R-068.

  • For remote participation we use the Vidyo system. Instructions can be found here.

General Information

  • The purpose of the meeting is:
    • to report significant operational issues (i.e. issues which can or did degrade experiment or site operations) which are ongoing or were resolved after the previous meeting;
    • to announce or schedule interventions at Tier-1 sites;
    • to inform about recent or upcoming changes in the experiment activities or systems having a visible impact on sites;
    • to provide important news about the middleware;
    • to communicate any other information considered interesting for WLCG operations.
  • The meeting should run from 15:00 until 15:20, exceptionally to 15:30.
  • The SCOD rota for the next few weeks is at ScodRota.
  • General information about the WLCG Service can be accessed from the Operations Web.
  • Whenever a particular topic that requires information from sites or experiments needs to be discussed at the daily meeting, it is highly recommended to announce it by email to wlcg-operations@cern.ch, so that the relevant parties have time to collect the required information or to invite the right people to the meeting.

Tier-1 downtimes

Experiments may experience problems if two or more of their Tier-1 sites are inaccessible at the same time. Therefore Tier-1 sites should do their best to avoid scheduling a downtime classified as "outage" in a time slot overlapping with an "outage" downtime already declared by another Tier-1 site supporting the same VO(s). The following procedure is recommended:
  1. A Tier-1 should check the downtimes calendar to see whether another Tier-1 already has an "outage" downtime in the desired time slot.
  2. If there is a conflict, another time slot should be chosen.
  3. If stronger constraints do not allow another time slot to be chosen, the Tier-1 should point out the conflict on the SCOD mailing list and at the next WLCG operations call, to discuss it with the representatives of the experiments involved and the other Tier-1.

As an additional precaution, the SCOD will check the downtimes calendar for Tier-1 "outage" downtime conflicts at least once during his/her shift, for the current and the following two weeks; in case a conflict is found, it will be discussed at the next operations call, or offline if at least one relevant experiment or site contact is absent.
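The conflict check that the SCOD performs can be sketched as a pairwise interval-overlap test. This is a minimal illustration only, assuming a hypothetical `Downtime` record with a site name, the set of VOs the site supports, start/end times and a severity string; it does not read the real downtimes calendar. "The current and the following two weeks" is approximated here as a three-week horizon starting at `now`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import combinations


@dataclass(frozen=True)
class Downtime:
    """Hypothetical record for one entry of the downtimes calendar."""
    site: str             # Tier-1 site name
    vos: frozenset        # VOs supported by the site, e.g. frozenset({"ATLAS"})
    start: datetime
    end: datetime
    severity: str         # "outage" or "warning"


def overlaps(a: Downtime, b: Downtime) -> bool:
    # Two half-open intervals [start, end) overlap iff each starts before the other ends.
    return a.start < b.end and b.start < a.end


def find_conflicts(downtimes, now, horizon=timedelta(weeks=3)):
    """Return (site_a, site_b, shared_vos) for every pair of "outage" downtimes
    at different Tier-1 sites that share a VO and overlap in time, considering
    only downtimes that intersect [now, now + horizon)."""
    window_end = now + horizon
    outages = [d for d in downtimes
               if d.severity == "outage" and d.start < window_end and d.end > now]
    conflicts = []
    for a, b in combinations(outages, 2):
        shared = a.vos & b.vos
        if a.site != b.site and shared and overlaps(a, b):
            conflicts.append((a.site, b.site, sorted(shared)))
    return conflicts


if __name__ == "__main__":
    # Illustrative data only: site names and dates are made up.
    calendar = [
        Downtime("T1_A", frozenset({"ATLAS", "CMS"}),
                 datetime(2016, 3, 8, 7), datetime(2016, 3, 9, 17), "outage"),
        Downtime("T1_B", frozenset({"ATLAS", "LHCb"}),
                 datetime(2016, 3, 9, 8), datetime(2016, 3, 9, 12), "outage"),
        Downtime("T1_C", frozenset({"CMS"}),
                 datetime(2016, 3, 8, 9), datetime(2016, 3, 8, 11), "warning"),
    ]
    # prints [('T1_A', 'T1_B', ['ATLAS'])]
    print(find_conflicts(calendar, now=datetime(2016, 2, 29)))
```

Only "outage" entries are compared, matching the procedure above; "warning" downtimes such as T1_C's are ignored even when they overlap.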

Links to Tier-1 downtimes

| ALICE | ATLAS | CMS  | LHCb |
|       | BNL   | FNAL |      |

Monday

Attendance:

  • local: Jiri (ATLAS), Maria D (SCOD), Maria A (WLCG), Julia (WLCG), Maarten (ALICE), Xavi (Storage), Nils (T0 services).
  • remote: Francesco (CNAF), Rolf (IN2P3), Sang Un (KISTI), Dmytro (NDGF), Onno (NL-T1), Kyle (OSG), Di Qing (TRIUMF), Michael (BNL), Pavel (KIT), Tiju (RAL), Stefano (CMS).

Experiments round table:

  • ATLAS reports (raw view) -
    • Amazon EC2 scale test to be continued this week
    • Problems at the Taiwan site, probably due to the network: we observe problems with CVMFS there and with data transfers to/from NDGF.
    • Reprocessing finished; another one (HI data) will start today.

Jiri will open a GGUS ticket summarising the current case and referencing other tickets from similar recent cases. As soon as the ticket is created, the submitter can move it from the default support unit (TPM) to the 3rd Level Support Unit "WLCG Network Monitoring". If the submitter lacks the necessary permissions, Maarten or Maria D. can do this if they are put in Cc. Marian Babik and his "Network metrics WG" now have good methods for getting data from perfSONAR and are behind this GGUS SU (which is also interfaced to SNOW).

There was a discussion about the mass email notifications generated by voms-admin concerning VO membership expiration and AUP re-signing. The current voms-admin implementation sends a notification once a VO member's expiration date is reached; the member then has 2 weeks to re-sign the AUP.

On the VOMS notification issues, Stefano said that the developer Andrea Ceccanti should be asked to implement a reminder randomisation algorithm so that the VO Admins don't get all the users' questions at the same time.
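The randomisation idea can be illustrated with a small scheduling function. This is a hypothetical sketch, not voms-admin code: the minutes do not specify a design, so it assumes that each member's reminder is sent at a time drawn uniformly at random from a window (here 7 days, an arbitrary choice) before the expiration date. Members who share an expiration date are then notified at different moments, which spreads their questions to the VO admins over time.

```python
import random
from datetime import datetime, timedelta


def schedule_reminders(expirations, spread=timedelta(days=7), seed=None):
    """Hypothetical sketch: pick a reminder time for each member uniformly at
    random in [expiration - spread, expiration], so that members sharing an
    expiration date are not all notified at the same moment."""
    rng = random.Random(seed)  # seedable for reproducible schedules
    spread_s = spread.total_seconds()
    schedule = {}
    for member, expires in expirations.items():
        # Random offset (in seconds) back from the expiration date.
        offset = timedelta(seconds=rng.uniform(0, spread_s))
        schedule[member] = expires - offset
    return schedule


if __name__ == "__main__":
    same_day = datetime(2016, 3, 1)
    members = {f"user{i}": same_day for i in range(5)}
    for member, when in sorted(schedule_reminders(members, seed=42).items()):
        print(member, when.isoformat(timespec="minutes"))
```

A seeded `random.Random` instance keeps the schedule reproducible for testing while still spreading the reminders.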

  • ALICE -
    • NTR

  • LHCb reports (raw view) -
    • Data Processing
      • Active pre-staging, no other processing activity at the moment
      • MC and User jobs

Sites / Services round table:

  • ASGC: not connected
  • BNL: NTR
  • CNAF: We completed the upgrade of the glibc library to fix some vulnerabilities.
  • FNAL: not connected
  • GridPP: not connected
  • IN2P3: NTR
  • JINR: not connected
  • KISTI: NTR
  • KIT: Downtime registered in GOCDB for a TSM upgrade, starting tomorrow at 07:00 UTC and lasting until 17:00 the day after (no tape access possible).
    • The downtime has been postponed to Tuesday next week because the responsible operator fell ill.
  • NDGF: Computing capacity is slightly reduced because the Linköping site has a SLURM upgrade tomorrow and is draining its resources now. Job submission will reopen tomorrow evening, after the upgrade.
  • NL-T1: NTR
  • NRC-KI: not connected
  • OSG: NTR
  • PIC: not connected
  • RAL: We are planning to upgrade FTS3 to v3.4.2 this week; test instance on Wednesday 10:00 - 11:00 UTC and production instance on Thursday 10:00 - 11:00 UTC
  • TRIUMF: NTR

  • CERN batch and grid services:
    • FTS service upgraded to v3.4.2; everything looks OK.
  • CERN storage services: NTR
  • Databases: no report
  • GGUS: NTR
  • Grid Monitoring: no report
  • MW Officer: Based on the CERN experience reported above, it is safe to run FTS v3.4.2.

AOB:

  • Next meeting will be on Monday 7th March. Remember that there will be no more meetings on Thursdays!
  • The T0 services, including OpenStack, will be represented in this meeting.
  • The next Ops Coord meeting will exceptionally take place on Thu 17th March, due to clashes with ATLAS and CMS events this week. Sites will be asked, in due time, to comment on the 2016 pledges.

Topic revision: r14 - 2016-03-01 - XavierMol
 