Week of 140303

WLCG Operations Call details

  • At CERN the meeting room is 513 R-068.

  • For remote participation we use the Alcatel system. At 15.00 CE(S)T on Monday and Thursday (by default) do one of the following:
    1. Dial +41227676000 (just 76000 from a CERN office) and enter access code 0119168, or
    2. To have the system call you, click here

  • In case of problems with Alcatel, we will use Vidyo as backup. Instructions can be found here. The SCOD will email the WLCG operations list in case the Vidyo backup should be used.

General Information

  • The SCOD rota for the next few weeks is at ScodRota
  • General information about the WLCG Service can be accessed from the Operations Web

Monday

Attendance:

  • local: Simone (SCOD), Ken (CMS), Raja (LHCb), Alex (CERN - Monitoring), Zbignev (CERN - Databases)
  • remote: Onno (NL-T1), Sang-Un (KISTI), Lisa (FNAL), Rolf (IN2P3), Thomas (NDGF), Lucia (CNAF), Tiju (RAL), Pepe (PIC), Antonio (CNAF), Dimitri (KIT), Rob (OSG).

Experiments round table:

  • ATLAS reports (raw view) -
    • T1s
      • INFN-T1: file transfer errors with "open error: Permission denied"; ongoing (GGUS:101528)
      • SARA-MATRIX: squid service was unavailable because the disk filled up due to a logrotate misconfiguration; solved (GGUS:101730)
      • RU/RRC-KI-T1_DATADISK: many transfers failed with error "SRM_FILE_UNAVAILABLE"; ongoing (GGUS:101694)
      • NDGF-T1: many transfer failures with "[SRM_FILE_UNAVAILABLE] File is not online", caused by a power outage that damaged a power supply; repaired and solved (GGUS:101693)

  • CMS reports (raw view) -
    • Another very quiet period, I struggle with coming up with something to say....
    • CERN
      • GGUS:101718 -- held glideins, advice given to use additional CEs
    • T1 sites
      • GGUS:101716 -- FTS3 is down right now at RAL, we are waiting for an update
      • GGUS:101741 -- SAM SRM test failed at KIT over the weekend, but KIT appears to see no problem. (Contrary to what is written in the ticket, the log file from the test does indicate that the transfer failed.)
      • GGUS:101739 -- trouble with HammerCloud jobs at CNAF over the weekend, apparently resolved.
      • GGUS:101723 -- HC problems at RAL too, but it seems to have gone away without diagnosis, so ticket closed.
      • GGUS:101731 -- transfer problems at FNAL, probably related to the big disk migration
      • And also we remember that FNAL has a downtime today for the disk migration!

  • LHCb
    • MC simulation, user jobs and Stripping.
    • T0: Aborting pilots at ce203 (GGUS:101670)
    • T1:
      • CNAF : StoRM problem on Saturday night (GGUS:101737). Fixed by a restart - many thanks for the quick response! The root cause is still to be understood.
    • Other : GGUS (GGUS:101768) - lhcb-geoc mailing list no longer added when team ticket is created. Fixed now - thanks!

Sites / Services round table:

  • CERN storage:
    • CASTOR nameserver DB update (combined with upgrade to latest CASTOR release) tomorrow 2014-03-04, whole morning
    • upcoming per-experiment outages (also DB updates) next week

  • FNAL: storage downtime today and tomorrow for intervention
  • NDGF: two power outages on Thursday and Friday at two different sites. Everything OK now.
  • RAL: intermittent problems with virtualization cluster, affecting many services (including FTS3).
  • KIT: dCache team investigating the issue reported by CMS.

Databases: tomorrow CASTOR DB intervention for the namespace, as announced last week. 2h downtime from 8:00 AM.

AOB: none

Thursday

Attendance:

  • local: Simone (SCOD), Ben (CERN Grid Services), Jan (CERN Storage), Alessandro (ATLAS), Raja (LHCb), Marcin (CERN Databases)
  • remote: Michael (BNL), Kyle (OSG), Dennis (NL-T1), Gareth (RAL), Sonia (CNAF)

NB: the meeting had to be run over Vidyo as Alcatel was not working properly. Apologies to those who could not attend because of this.

Experiments round table:

  • Central Services
    • T0/T1:
      • INFN-T1 GGUS:101528 file export issue. Seems to be due to ACL problems. In the GGUS ticket on Thursday morning INFN-T1 reported the problem as fixed; when this page was updated at 9:30 CET no effect was visible yet, but we have to wait longer.
    • ATLAS internal
      • IN2P3-CC full: this is due to a task which is producing too much data. Working on it.
      • Work on global Data Placement to solve the disk space crisis is ongoing.

  • LHCb reports (raw view) -
    • MC simulation, user jobs and Stripping.
    • T0: Aborting pilots at ce203 (GGUS:101670) - CE now in downtime.
    • T1:
      • GridKa : Networking problem (GGUS:101807) caused problems writing to SE. Problem solved by reverting the change.
    • Other : GGUS (GGUS:101768) - lhcb-geoc mailing list no longer added when team ticket is created and verification not working. Fixed now - and waiting for another ticket to be verified before marking this ticket.

Sites / Services round table:

  • NL-T1: SARA had a crashed pool node since yesterday, spotted only today. Fixed.

  • RAL: the problem with the virtualization infrastructure has been worked out and it now seems fine. The FTS3 server was also impacted and the ATLAS traffic was mostly diverted to the CERN FTS. Suggestion to keep it as it is until next week, then increase the load progressively.

  • CERN Storage: next week CASTOR intervention for CMS, ATLAS and ALICE. LHCb the week after. The public stager was updated today, quicker than expected. No problem to report.

  • Kyle: ATLAS T3 tickets about Storage Space availability are still open. Was there any follow-up? No, but the request will be reiterated.

  • CERN Grid Services: on Tuesday the number of CERN FTS3 nodes was increased.

  • Databases: CASTOR nameserver and public databases migrated. See CERN Storage report for next upgrades.

  • GGUS: (MariaD) Savannah:142369 will take care of the ALARM-related configuration and testing for the new Russian T1.

AOB: -- SimoneCampana - 20 Feb 2014

Topic revision: r9 - 2014-03-06 - SimoneCampana
 