Week of 150914

WLCG Operations Call details

  • At CERN the meeting room is 513 R-068.

  • For remote participation we use the Vidyo system. Instructions can be found here.

General Information

  • The purpose of the meeting is:
    • to report significant operational issues (i.e. issues which can or did degrade experiment or site operations) which are ongoing or were resolved after the previous meeting;
    • to announce or schedule interventions at Tier-1 sites;
    • to inform about recent or upcoming changes in the experiment activities or systems having a visible impact on sites;
    • to provide important news about the middleware;
    • to communicate any other information considered interesting for WLCG operations.
  • The meeting should run from 15:00 until 15:20, exceptionally to 15:30.
  • The SCOD rota for the next few weeks is at ScodRota
  • General information about the WLCG Service can be accessed from the Operations Web
  • Whenever a particular topic needs to be discussed at the daily meeting requiring information from sites or experiments, it is highly recommended to announce it by email to wlcg-operations@cern.ch so that the relevant parties have time to collect the required information or invite the right people to the meeting.

Monday

Attendance:

  • local: Luca (SCOD+Storage), Maarten (ALICE), Christoph (CMS), Ulrich (Batch), Kate (DB), Xavi
  • remote: Andrzej (ATLAS), Vladimir (LHCb), Matteo (CNAF), Lisa (FNAL), John (RAL), Onno (NL-T1), Michael (BNL), Renaud (IN2P3), Dimitri (KIT), Sang (KISTI), Christian (NDGF), Di Qing (TRIUMF), Rob (OSG)

Experiments round table:

  • ATLAS reports (raw view) -
    • T0/Central services
      • Large production volume, but results are slow to transfer out to T1s in the UK, IT and NL clouds.
      • Status of FTS software update? BNL on Monday. CERN confirmed update too. What is planned at RAL?
        Update should allow more efficient use of FTS services by ATLAS.
      • Low safety margin in transatlantic connectivity. Is there any planning in the WLCG networking working group in case the last working link is lost?
    • Comments during the meeting:
      • Maarten said the network experts are already doing what they can. Michael stressed that repairing a transatlantic link is time-consuming and labor-intensive, and that the work is also heavily affected by the weather. Michael will get an update soon on the status of this activity. ESnet is currently repairing the link; if they have a problem with the last 100Gb link, they will reroute the traffic to other links or to other providers.
      • For FTS-related questions, Maarten suggested following up on the FTS mailing list.

  • CMS reports (raw view) -
    • Slot utilization ~80k jobs in the Global Pool over the weekend
    • Otherwise no major issues to report

  • ALICE -
    • NTR

Sites / Services round table:

  • ASGC:
  • BNL: NTR
  • CNAF: problem with a perimeter router (CPU fully loaded): the CPU was busy deleting and applying security ACLs. The problem is fixed; investigation of the root cause is ongoing with Cisco.
  • FNAL: NTR
  • GridPP:
  • IN2P3: declared full day downtime on Tuesday 22.09.2015
  • JINR:
  • KISTI: NTR
  • KIT: NTR
  • NDGF: NTR
  • NL-T1: NTR
  • NRC-KI:
  • OSG: NTR
  • PIC:
  • RAL: follow up on FTS
  • TRIUMF: NTR

  • CERN batch and grid services: NTR
  • CERN storage services: NTR
  • Databases: DB replication of the AMI database to IN2P3 is stopped due to an ongoing intervention on the database, so the ATLAS AMI-tag is not up to date. This issue will be solved once the DB intervention is over (17:00 CEST).
  • GGUS:
  • Grid Monitoring:
  • MW Officer:

  • Networking (update provided after the meeting by Michael Ernst): Update as of Monday Morning US Eastern Time:
    • ESNET-20150807-002 40G Boston to Amsterdam
      The fiber has been repaired and the circuit has been up and stable since 10:30 UTC this morning.
    • ESNET-20150910-005 100G from Washington to CERN
      Fiber testing has completed and confirmed a fiber cut approximately 260 km from the French coast. The cable ship is expected to be at the site on Tuesday September 15th. Repairs typically take several days.
    • ESNET-20150910-001 100G from NY to London
      The ESnet circuit continues to experience occasional brief flaps, but apparently other customers on the optical system are not being affected. ESnet is continuing to work with the vendor in an attempt to get them to accelerate the replacement of a partially failing amplifier. The replacement amplifier is on-site, so ESnet does not anticipate extended downtime if it fails completely before it is replaced.
    • NEWY-LOND 100G circuit continues to function normally.

AOB:

Thursday

Attendance:

  • local:
  • remote:

Experiments round table:

  • ATLAS reports (raw view) -
    • New transfers were moved from RAL FTS to RAL test FTS (with different DB backend). The FTS DB was struggling with ~1M transfers.
    • Found some sites still using the old VOMS server to generate gridmap files; this will be followed up in the ops coord meeting

  • CMS reports (raw view) -
    • Git at CERN became unresponsive in the night from Tuesday to Wednesday, GGUS:116219
      • CRAB Server seems to depend on proper functioning of Git
    • Ongoing investigations of "UNKNOWN" SAM results for CEs at CERN, GGUS:116069
      • No obvious issues with job submission for production or analysis though

  • ALICE -
    • continued high analysis activity in preparation for Quark Matter 2015, Sep 27 - Oct 3

  • LHCb reports (raw view) -
    • Data Processing:
      • Validation of data reconstruction, MC and user jobs
    • T0

Sites / Services round table:

  • ASGC:
  • BNL:
  • CNAF:
  • FNAL:
  • GridPP:
  • IN2P3:
  • JINR:
  • KISTI:
  • KIT:
  • NDGF:
  • NL-T1:
  • NRC-KI:
  • OSG:
  • PIC:
  • RAL:
  • TRIUMF:

  • CERN batch and grid services:
  • CERN storage services:
  • Databases:
  • GGUS:
  • Grid Monitoring:
  • MW Officer:

AOB:

Topic attachments
  • MB-Sep-15.pptx (r1, 2865.7 K, 2015-09-14 10:28, PabloSaiz)
Topic revision: r17 - 2015-09-17 - DavidCameron
 