Week of 140224

WLCG Operations Call details

  • At CERN the meeting room is 513 R-068.

  • For remote participation we use the Alcatel system. At 15.00 CE(S)T on Monday and Thursday (by default) do one of the following:
    1. Dial +41227676000 (just 76000 from a CERN office) and enter access code 0119168, or
    2. To have the system call you, click here

  • In case of problems with Alcatel, we will use Vidyo as backup. Instructions can be found here. The SCOD will email the WLCG operations list in case the Vidyo backup should be used.

General Information

  • The SCOD rota for the next few weeks is at ScodRota
  • General information about the WLCG Service can be accessed from the Operations Web

Monday

Attendance:

  • local: MariaD (SCOD), Felix (ASGC), Luca (DM), Belinda (Storage), Eddy (Dashboards), Zbyszek (DB), Maarten (ALICE).
  • remote: Onno (NL_T1), Oli (CMS), Lisa (FNAL), Sang-Un (KISTI), Rob (OSG), Ulf (NDGF), Peppe (PIC), Tiju (RAL), Bart (?)

The audioconference didn't work on Monday. MariaD booked new call codes at the last minute and reported the problem here. It is not clear whether the report is viewable by everybody. Nevertheless, thanks to all participants for their patience!

Experiments round table:

  • ATLAS reports (raw view) -
    • apologies: people from ATLAS can't connect today
    • T1s
      • Many Tier1s are full due to a particular workflow that consumes a lot of disk space, e.g. IN2P3-CC (GGUS:101467). The issue is on the ATLAS side.
      • SARA-MATRIX squid down GGUS:101495, solved by the site
      • NIKHEF-ELPROD file not accessible (GGUS:101494). From the site: "These and other files were lost due to an earlier system failure which was reported about to ATLAS already in ticket GGUS:100542". We will take care of this and close the ticket.

  • CMS reports (raw view) -
    • CERN
      • EOS had some downtime on Sunday afternoon CERN time. What went wrong? https://sls.cern.ch/sls/service.php?id=EOSCMS Only a few users noticed, and when we checked, EOS was back again. Luca commented that the incident revealed a glibc bug that causes a crash when a given semaphore is triggered. It was reported to Red Hat. In addition, a new xrootd release will appear in 2 days and will require a new build, so a better and more stable situation can be expected in 1 week.
      • Indico lost all CMS permission settings on Monday morning.
      • The main Vidyo server was down in the morning as well, so "join now" links on Indico didn't work.

  • ALICE -
    • NTR

  • LHCb reports (raw view) -
    • MC simulation, user jobs and Stripping.
    • T0: NTR
    • T1: NTR

Sites / Services round table:

  • ASGC: ntr
  • BNL: not connected
  • CNAF: not connected
  • IN2P3: ntr
  • KISTI: ntr
  • FNAL: ntr
  • OSG: ntr
  • KIT: not connected
  • RAL: ntr
  • NDGF: ntr
  • NL_T1: Squid is now OK, but the cause of the problem is still unclear.

AOB:

Thursday

Attendance:

  • local: MariaD (SCOD), Steve (Grid Services), Belinda (Storage), Ken (CMS), Maarten (ALICE), Zbyszek (Databases)
  • remote: Rolf (IN2P3), Ulf (NDGF), Sang-Un (KISTI), John K. (RAL), Guenter (GGUS), Lisa (FNAL), Saverio (CNAF), Rob (OSG), Dennis (NL_T1), Peppe (PIC).

Experiments round table:

  • ATLAS reports (raw view) -
    • apologies: people from ATLAS can't connect today

  • CMS reports (raw view) -
    • It's actually been very quiet!
    • CERN
      • CMSR intervention today is completed, no major complaints noted.
      • Still working on the global xrootd redirector issue (GGUS:101414). I think we're talking past each other on this; I need to sit down with Jan so we can discuss what we want. MariaD suggested not closing the ticket until the proper monitoring is in place or, if this development takes time, closing it with a reference to the relevant development ticket (GGUS, JIRA, Savannah...).
    • T1 sites
      • GGUS:101643 -- this came from a user and looks like it should now go to a different site. (Why is the user using gLite to submit rather than CRAB and its glide-ins?) Ken will try to re-assign it to the right site or to the TPM with a comment.
      • GGUS:101522 -- glide-ins being held at IN2P3. Rolf said this is a T2 and shouldn't be discussed in this meeting.

  • ALICE -
    • NTR

  • LHCb reports (raw view) - Nobody connected.
    • MC simulation, user jobs and Stripping.
      • T0: NTR
      • T1: NTR

Sites / Services round table:

  • BNL: Nobody connected.
  • KIT: Nobody connected.
  • NDGF: ntr
  • KISTI: Ongoing collaboration with the CERN IT CS group to test the configuration of their PerfSONAR service installation.
  • RAL: ntr
  • FNAL: Suffering for weeks from a network issue that causes switches to crash, including the ones for the Tier1 services. A network fix is planned for next Monday 2014/03/03. A downtime is announced between 11am and 10pm CST on that day to do an upgrade of the FNAL EOS installation and dCache.
  • IN2P3: Actively working on the network problem reported in GGUS:101637. It affects BNL-French and BNL-Italian transfers.
  • OSG: ntr
  • NL_T1:
    • SARA announced a short downtime for tomorrow 2014/02/28 for work on the dCache file servers.
    • NIKHEF: Storage nodes and DPM were down yesterday for a short time. DPM work is planned for Tuesday 2014/03/04.
  • CNAF: ntr
  • PIC: ntr

  • GGUS: Major Release yesterday 2014/02/26, especially due to the GGUS-xGUS (interfaces originally developed for the NGIs). Please read https://ggus.eu/?mode=didyouknow#2014-02-26 and follow the links to the detailed documentation, including the list of decommissioned, long-unused features. Bookmarks of long search patterns will no longer work. Most other bookmarks will work, as the GGUS developers put redirects in place for them. Some of yesterday's ALARM tests had to be repeated because a bug in the script that launches the ALARMs made it impossible to reply to the email notifications. One remaining issue, GGUS:101640, is being investigated.

AOB:

Topic revision: r17 - 2014-02-27 - MariaDimou