Week of 140210

WLCG Operations Call details

  • At CERN the meeting room is 513 R-068.

  • For remote participation we use the Alcatel system. At 15.00 CE(S)T on Monday and Thursday (by default) do one of the following:
    1. Dial +41227676000 (just 76000 from a CERN office) and enter access code 0119168, or
    2. To have the system call you, click here

  • In case of problems with Alcatel, we will use Vidyo as backup. Instructions can be found here. The SCOD will email the WLCG operations list in case the Vidyo backup should be used.

General Information

  • The SCOD rota for the next few weeks is at ScodRota
  • General information about the WLCG Service can be accessed from the Operations Web

Monday

Attendance:

  • local: Andrew (LHCb), Stefan (SCOD), Maarten (ALICE), Belinda (Storage), Alessandro (ATLAS), Alexandre (Monitoring)
  • remote: Alexander (NL-T1), Lisa (FNAL), Rolf (IN2P3), Eric (CMS), Dimitri (KIT), Tiju (RAL), Christian (NDGF), Sang-Un (KISTI), Matteo (CNAF), Kyle (OSG), Felix (ASGC)
  • apologies: Pepe (PIC)

Experiments round table:

  • ATLAS
    • Central Services
      • One hypervisor (HV) had trouble (INC:492443). We discovered that many of our APF instances at CERN were hosted on the same HV, which is not optimal for reliability. We have asked CERN IT for the possibility to place services explicitly on different HVs (RQF:0302736).

  • CMS
    • T1/T2/Others: Business as usual. Smooth running before this weekend.
    • Production is shut down for the DBS upgrade, beginning today
    • Analysis will continue during this time
    • We are encouraging all our sites to switch to the FTS3 server at RAL for load testing, which begins in a week or so.

  • ALICE
    • Job efficiencies will fluctuate with the amount of analysis jobs being run in preparation for the Quark Matter 2014 conference (May 19-24)

  • LHCb
    • Mostly simulation and user jobs. Smooth running over most of the grid.
    • T0: Problems with CASTOR user mapping. SRMv2 service failing SUM tests.
    • T1: NTR

Sites / Services round table:

  • ASGC: Our RAC service was quite unstable during the weekend, which made our CASTOR SRM unstable. It was fixed on Sunday.
  • NL-T1: NTR
  • FNAL: NTR
  • IN2P3-CC: NTR
  • KIT: NTR
  • RAL: NTR
  • NDGF: NTR
  • KISTI: NTR
  • CNAF: NTR
  • OSG: Was there a BDII server down over the weekend? Maarten: Not that we know of, but please open a ticket to have it investigated.
  • PIC: NTR
  • Storage: The user mapping issue for CASTOR has been resolved. The problem was caused by hardcoded LDAP server IP addresses, and those machines had been changed. This was affecting SL6 Quattor-managed machines.
  • Monitoring: NTR

AOB:

Thursday

Attendance:

  • local: Stefan (SCOD), Andrew (LHCb), Felix (ASGC), Alexandre (Monitoring), Belinda (Storage), Maarten (ALICE), Ulrich (Grid Services)
  • remote: Dennis (NL-T1), John (RAL), Lisa (FNAL), Saverio (CNAF), Sang-Un (KISTI), Kyle (OSG), Jeremy (GridPP), Pavel (KIT), Stefano (CMS), Rolf (IN2P3), Christian (NDGF), Pepe (PIC), Alessandro (ATLAS)

Experiments round table:

  • ATLAS
    • Central Services
      • NTR
    • T1
      • NTR

  • CMS
    • CMS main Data Bookkeeping System migrated from DBS2 to DBS3
    • T1/T2/Others: Production restarted and is ramping up; analysis never stopped
    • No new problems

  • ALICE
    • NTR

  • LHCb
    • Mostly simulation and user jobs. Smooth running over most of the grid.
    • T0: Problems from Monday seem to be resolved
    • T1:
      • Pilots being killed at SARA
      • RAL SRMv2 SUM failures, but jobs seem OK.

Sites / Services round table:

  • NL-T1: NTR
  • RAL: Brief network glitch this morning; it lasted an hour and is fixed now
  • FNAL: NTR
  • CNAF: NTR
  • KISTI: Problems on CREAM-CE with 512-bit proxies were fixed
  • OSG: NTR
  • GridPP: NTR
  • KIT: NTR
  • IN2P3: NTR
  • NDGF: Downtime tomorrow morning for network equipment; data will be unavailable for a short period.
  • PIC: Commissioned tape/disk separation for CMS. Since yesterday, data is being moved to the new endpoint (70 TB so far); all working OK
  • ASGC: NTR
  • Storage: NTR
  • Monitoring: WLCG Google Earth has been updated and uses the new transfer dashboard, including ALICE traffic. Question to ATLAS: We see heavy transfers for ATLAS since 8 pm yesterday, with accesses at up to 1 kHz. Alessandro: Could be related to renaming, but this should be via the WebDAV port. Need to check further offline.
  • Grid Services: NTR
  • CERN: Upcoming upgrade of the WMS servers to EMI3 Update 13 -- ITSSB entry

AOB:

Topic revision: r14 - 2014-02-13 - StefanRoiser
 