Week of 151019

WLCG Operations Call details

  • At CERN the meeting room is 513 R-068.

  • For remote participation we use the Vidyo system. Instructions can be found here.

General Information

  • The purpose of the meeting is:
    • to report significant operational issues (i.e. issues which can or did degrade experiment or site operations) which are ongoing or were resolved after the previous meeting;
    • to announce or schedule interventions at Tier-1 sites;
    • to inform about recent or upcoming changes in the experiment activities or systems having a visible impact on sites;
    • to provide important news about the middleware;
    • to communicate any other information considered interesting for WLCG operations.
  • The meeting should run from 15:00 until 15:20, exceptionally to 15:30.
  • The SCOD rota for the next few weeks is at ScodRota
  • General information about the WLCG Service can be accessed from the Operations Web
  • Whenever a particular topic that requires information from sites or experiments needs to be discussed at the daily meeting, it is highly recommended to announce it by email to wlcg-operations@cern.ch, so that the relevant parties have time to collect the required information or invite the right people to the meeting.

Monday

Attendance:

  • local: Herve (storage), Maarten (SCOD + ALICE)
  • remote: Alessandro (ATLAS), Asa (ASGC), Christian (NDGF), Di (TRIUMF), Francesco (CNAF), Gareth (RAL), Lisa (FNAL), Michael (BNL), Oliver (CMS), Onno (NLT1), Pepe (PIC), Rolf (IN2P3), Sang-Un (KISTI), Stefano (LHCb)

Experiments round table:

  • ATLAS reports (raw view) -
    • FTS "stalled connection": the problem has now also manifested itself with the RAL instance. The FTS developers suggested a patch, which is now being applied.
    • Tier-0 TZDISK hit an issue with the number of inodes; an ALARM ticket was created and the number was increased immediately.

  • CMS reports (raw view) -
    • 25ns Data taking, normal operation
    • Production activity slowly tailing off
    • CERN:
      • GGUS:117001 : all SAM pilot tests fail with the same error on all CEs, at CERN only. Possibly the same issue as GGUS:116468? Argus?
        • Maarten: being looked into

  • ALICE -
    • high activity

  • LHCb reports (raw view) -
    • Data Processing:
      • Data processing of pp data at T0/1 sites, Monte Carlo mostly at T2, user analysis at T0/1/2D sites
      • Stable number of running jobs for processing of data at T0/1
      • Data from pHe fully processed.
    • T0
      • Ticket about FTS409 stalled: solved after some logs were reported (GGUS:116965)
      • Tickets about aborted SRM at CERN (GGUS:116897 and GGUS:116877) should be closed. We are waiting for some more data transfers from the pit before confirming that the problem is solved.
      • Maarten: there was also the alarm ticket GGUS:116973, opened on Sunday because of a problem with EOS
    • T1
      • SARA storage is in scheduled downtime
    • T2
      • FTS transfers failing at UKI-SOUTHGRID-RALPP (GGUS:116995). Investigation ongoing.

Sites / Services round table:

  • ASGC: ntr
  • BNL: ntr
  • CNAF:
    • Some problems on ATLAS disk (one ddl currently offline); 350 TB of data are temporarily unavailable. More news as soon as possible (we are still performing some checks).
  • FNAL: ntr
  • GridPP:
  • IN2P3: ntr
  • JINR:
  • KISTI: ntr
  • KIT:
  • NDGF:
    • A tape robot lost its arm during the night. An IBM technician arrived on site recently and the spare parts will arrive at 17:30 CEST. For now the disks are keeping up with the writes, and the whole thing should be transparent if we get it fixed tonight. There might be some long load times for ATLAS and ALICE data.
    • Tomorrow there will be a short downtime while dCache pools are updated. Only short interruptions during reboots. Some ATLAS and ALICE data will be unavailable.
    • Both are registered in GOCDB.
  • NL-T1:
    • SARA downtime for network maintenance being used also for moving some dCache components to cure space manager timeout errors
    • Vidyo first broke our connection to this meeting, then connected us to another room; on the 3rd attempt we were back in the right room
      • Maarten: we will report this to the Vidyo team
  • NRC-KI:
  • OSG:
    • the bad version of JGlobus that broke BeStMan and EOS installations last week has been replaced with the correct rpm, so it should again be safe to upgrade
  • PIC:
    • we uploaded a revised version of the Nagios Plugin for SAM to github. Some bugs were fixed, the rpm can be found here:
    • we are planning a downtime for the 24th of November to upgrade dCache to the latest golden release. Is this ok for the experiments?
  • RAL:
    • during the weekend there were some problems with disk servers used by ATLAS and LHCb, who were subsequently notified
  • TRIUMF: ntr

  • CERN batch and grid services:
  • CERN storage services:
    • Nothing to report
  • Databases:
  • GGUS:
    • GGUS was temporarily unavailable Mon afternoon due to a power cut at KIT
  • Grid Monitoring:
  • MW Officer:

AOB:

Thursday

Attendance:

  • local:
  • remote:

Experiments round table:

  • ALICE -

Sites / Services round table:

  • ASGC:
  • BNL:
  • CNAF:
  • FNAL:
  • GridPP:
  • IN2P3:
  • JINR:
  • KISTI:
  • KIT:
  • NDGF:
  • NL-T1:
  • NRC-KI:
  • OSG:
  • PIC:
  • RAL:
  • TRIUMF:

  • CERN batch and grid services:
    • HTCondor Grid pool becomes production on the 2nd of November.
    • Starting on 02/11/2015 we will begin the official decommissioning of the ARC CEs leading to our HTCondor pool, leaving just one CE type, the HTCondorCE.
    • The decommissioning phase will be lengthened at the request of a VO, to give them extra time.
  • CERN storage services:
  • Databases:
  • GGUS:
  • Grid Monitoring:
  • MW Officer:

AOB:

Topic attachments
  • MB-Oct-15.pptx (pptx, r1, 2856.7 K), attached 2015-10-19 09:55 by PabloSaiz
Topic revision: r10 - 2015-10-22 - IainBradfordSteers