Week of 150323

WLCG Operations Call details

  • At CERN the meeting room is 513 R-068.

  • For remote participation we use the Vidyo system. Instructions can be found here.

General Information

  • The purpose of the meeting is:
    • to report significant operational issues (i.e. issues which can or did degrade experiment or site operations) which are ongoing or were resolved after the previous meeting;
    • to announce or schedule interventions at Tier-1 sites;
    • to inform about recent or upcoming changes in the experiment activities or systems having a visible impact on sites;
    • to provide important news about the middleware;
    • to communicate any other information considered interesting for WLCG operations.
  • The meeting should run from 15:00 until 15:20, exceptionally to 15:30.
  • The SCOD rota for the next few weeks is at ScodRota
  • General information about the WLCG Service can be accessed from the Operations Web
  • Whenever a particular topic that requires information from sites or experiments needs to be discussed at the daily meeting, it is highly recommended to announce it by email to wlcg-operations@cern.ch, so that the relevant parties have time to collect the required information or invite the right people to the meeting.

Monday

Attendance:

  • local: Luca (SCOD+Storage), Ian (Batch and Grid), Lorena (Databases), Maarten (ALICE), Stefan (LHCb), Alessandro (ATLAS)
  • remote: Christoph (CMS), Felix (ASGC), Dmytro (NDGF), Onno (NLT1), Rolf (IN2P3), Matteo (CNAF), Tiju (RAL)

Experiments round table:

  • ATLAS reports (raw view) -
    • FTS lost messages: moved some transfers to the CERN pilot. We still observe lost messages from BNL.

  • CMS -
    • Cosmic run with magnetic field about to start: CRAFT (Cosmic Run At Four Tesla)
    • Not much production/processing in the system
      • Scaling exercises taking quite some CPU slots at the sites though
    • Trouble with staging from CASTOR at CERN since Friday, GGUS:112490
      • Experts looking at it

  • ALICE -
    • CASTOR at CERN: staged files appear to be garbage-collected prematurely
      • being looked into

  • LHCb reports (raw view) -
    • operations dominated by MC jobs
    • no issues to report for this meeting

Sites / Services round table:

  • ASGC: NTR
  • BNL:
  • CNAF: NTR
  • FNAL:
  • GridPP:
  • IN2P3: NTR
  • JINR:
  • KISTI:
  • KIT: NTR
  • NDGF: NTR
  • NL-T1: Planned 2h downtime next Thursday for SRM due to maintenance on power feeds.
  • NRC-KI:
  • OSG:
  • PIC:
  • RAL: NTR
  • RRC-KI:
  • TRIUMF: NTR

  • CERN batch and grid services: SAM tests failed during the weekend for the CREAM CE. Experts are investigating the problem, which is causing an increase in memory (and swap) usage.
  • CERN storage services: we are following up on several CASTOR issues
  • Databases: NTR
  • GGUS:
  • Grid Monitoring:
  • MW Officer:

AOB:

  • OSG apologizes for not being able to join for the entire week due to its "All Hands Meeting".

Thursday

Attendance:

  • local:
  • remote:

Experiments round table:

  • CMS -
    • Quite some trouble with overloaded Frontier infrastructure
      • Cause not fully clear; might be due to a scaling exercise running many jobs
    • Trouble with staging from CASTOR at CERN since Friday, GGUS:112490
      • Any updates?

  • ALICE -
    • CASTOR at CERN: instabilities affecting both online and offline activities
      • Oracle row lock contention due to concurrent activities
      • bunch size of staging requests was not optimal (fixed)
      • disk pool was not being rebalanced, which led to unnecessary garbage collection (fixed)
      • garbage collection removed newly staged files because they had not (yet) been accessed recently!
        • will be improved
      • good support from the CASTOR team!

Sites / Services round table:

  • ASGC:
  • BNL:
  • CNAF:
  • FNAL:
  • GridPP:
  • IN2P3:
  • JINR:
  • KISTI:
  • KIT:
  • NDGF:
  • NL-T1:
  • OSG:
  • PIC:
  • RAL:
  • TRIUMF:

AOB:

  • NOTE: European summer time (CEST) starts on Sun March 29.

-- AndreaSciaba - 2015-02-27


Topic revision: r9 - 2015-03-26 - MaartenLitmaath
 