Week of 140602

WLCG Operations Call details

  • At CERN the meeting room is 513 R-068.

  • For remote participation we use the Alcatel system. At 15.00 CE(S)T on Monday and Thursday (by default) do one of the following:
    1. Dial +41227676000 (just 76000 from a CERN office) and enter access code 0119168, or
    2. To have the system call you, click here

  • In case of problems with Alcatel, we will use Vidyo as backup. Instructions can be found here. The SCOD will email the WLCG operations list in case the Vidyo backup should be used.

General Information

  • The SCOD rota for the next few weeks is at ScodRota
  • General information about the WLCG Service can be accessed from the Operations Web

Monday

Attendance:

  • local: Stefan (SCOD), Felix (ASGC), Ken (CMS), Michail (LHCb), Maria (WLCG), Maarten (ALICE)
  • remote: Dimitri (KIT), Sang-Un (KISTI), Alexander (NL-T1), Lisa (FNAL), Rolf (IN2P3), Saerda (NDGF), Elisabeth (OSG), Gareth (RAL)

Experiments round table:

  • ATLAS
    • Central Services
    • T0/T1s
      • FZK transfer failures GGUS:105803; there was no site representative at the last WLCG meeting.
        • Maria asks whether the note about no representative means that the issue is not being handled. Stefan and Maarten say the ticket was last updated at 12:40 today, so it is most probably under control.
      • TRIUMF tape staging problems GGUS:105886.
      • SARA-MATRIX read from DATATAPE problem GGUS:105898.

  • CMS
    • The weekend has been quiet too.
    • A brief disturbance on Friday afternoon when Savannah went down, GGUS:105895. It was back after a few hours, but we received no explanation on the ticket of what went wrong or how it was fixed.

  • ALICE
    • NDGF: yet more jobs referencing files that do not exist in dCache; being debugged

  • LHCb
    • Main activity: MC and user jobs
    • T0: NTR
    • T1:
      • GRIDKA:
        • File access problems were reported for user jobs but could not be reproduced at the site; investigation ongoing.
        • The CVMFS issue on one WN is fixed.

Sites / Services round table:

  • KIT: DT last week, including a router replacement. All production services, dCache and WNs were updated to SL 6.5. After the DT, network issues were found; these are fixed now.
  • KISTI: NTR
  • NL-T1: NTR
  • ASGC: NTR
  • FNAL: NTR
  • IN2P3: NTR
  • NDGF: NTR
  • OSG: NTR
  • RAL: DT on Tuesday next week for the CASTOR upgrade

AOB:

Thursday

Attendance:

  • Local: Joel (LHCb), Stefan (SCOD), Giuseppe (CMS), Belinda (Storage), Felix (ASGC), Maarten (ALICE), Andrea (WLCG), Marcin (Databases), Maria (WLCG), Gavin (Grid Services)
  • Remote: Pavel (KIT), Antonio (CNAF), Lisa (FNAL), Rolf (IN2P3), Dennis (NL-T1), Salvatore (CNAF), Sang-Un (KISTI), Rob (OSG), Pepe (PIC)
  • A Posteriori/Email: Jeremy (GRIDPP), John (RAL)

Experiments round table:

  • ATLAS
    • Central Services
    • T0/T1s

  • CMS
    • Very quiet period
    • MC production: also starting to use HLT and T0 resources (4k + 5k cores) to speed up CSA14 production
    • Smooth CMSWEB services upgrade on Tuesday 2nd
    • All CMS sites (T1+T2) now support SHA-2 certificates
    • Write access to CASTOR was closed for users on June 2nd. This unintentionally also disabled tape recall for users; the CASTOR team is deploying a hotfix.
    • T0
      • NTR
    • T1
      • FNAL: scheduled short EOS interruption on June 5th, 10:00-11:00 CDT

  • ALICE
    • NDGF: the mismatches between the AliEn file catalog and dCache were due to an unexpected side effect of a change in the AliEn code (there is a special case for dCache SEs); should be OK now

  • LHCb
    • Main activity: MC and user jobs. Moving VOBOXes (from physical machines to VMs)
    • T0: request to add the new VOBOX to the trusted list for LFC
    • T1: CNAF: storage problem.

Sites / Services round table:

  • ASGC: Incident yesterday afternoon, DNS server unavailable for 5 hours.
  • KIT: Problem with ATLAS jobs, with an error rate of 20%; this is fixed. It was due to wrong routing in the center, and jobs should now be fine when contacting the SRM. There are still problems with ATLAS transfers: tests are working but production is not, and ~2400 files that were expected to be seen are not.
  • CNAF: Yesterday's scheduled WARNING DT for storage went badly, and the site was obliged to declare an UNSCHEDULED DT. After a few hours the problem was thought to be solved, but yesterday evening another problem was found. Currently storage is working for ALICE, but still in DT for LHCb. The site hopes to recover today.
  • FNAL: NTR
  • IN2P3: NTR
  • NL-T1: SARA dCache upgrade on the 16th (Monday) will take the whole day. The NIKHEF UMD upgrade caused errors for ATLAS/LHCb jobs; fixed now.
  • KISTI: DT on Sunday for 8 hours starting from midnight, due to a network intervention between Amsterdam and New York; services could be unavailable during that time.
  • OSG: NTR
  • PIC: Reminder: next week Tuesday 10th, DT for dCache upgrade
  • BNL: NR
  • JINR: NR
  • NDGF: NR
  • PIC: NR
  • RAL: Castor upgrade scheduled for next week (10 June, 7.50 - 15.00)
  • RRC-KI: NR
  • TRIUMF: NR
  • GRIDPP: UK CA moved to issuing SHA2 certificates as of 28 May, no issues reported so far.
  • CERN batch and grid services:
    • The FTS3 pilot fts3-pilot.cern.ch will be reconfigured from 08:00 UTC on Friday 6th June to support IPv6 connectivity in addition to IPv4 connectivity (OTG0010930). The intervention should be largely transparent except for a network interface restart. Duration 2 hours.
    • CVMFS: cvmfs-stratum-one.cern.ch will migrate from hardware/Quattor to OpenStack/Puppet on Monday 9th June from 08:00 UTC (OTG0010991). Should be transparent to all.
    • The CERN Argus service will be upgraded today to the latest pre-release patch, which has been in QA for a while.
      • This fixes instabilities seen recently with the service.
  • CERN storage services: Planned updates went well: CASTOR (ALICE, ATLAS, LHCb), EOS (ATLAS, ALICE). An EOS upgrade for LHCb is planned for next week; the VO will be contacted.
  • Grid Monitoring: NR
  • GGUS: NR
  • Databases: NTR
  • MW Officer: NTR

AOB:

  • ATTENTION: next meeting on Tuesday June 10!
Topic revision: r17 - 2014-06-05 - StefanRoiser
 