Week of 150112

WLCG Operations Call details

  • At CERN the meeting room is 513 R-068.

  • For remote participation we use the Vidyo system. Instructions can be found here.

General Information

  • The purpose of the meeting is:
    • to report significant operational issues (i.e. issues which can or did degrade experiment or site operations) which are ongoing or were resolved after the previous meeting;
    • to announce or schedule interventions at Tier-1 sites;
    • to inform about recent or upcoming changes in the experiment activities or systems having a visible impact on sites;
    • to provide important news about the middleware;
    • to communicate any other information considered interesting for WLCG operations.
  • The meeting should run from 15:00 until 15:20, exceptionally to 15:30.
  • The SCOD rota for the next few weeks is at ScodRota
  • General information about the WLCG Service can be accessed from the Operations Web
  • Whenever a particular topic that requires information from sites or experiments needs to be discussed at the daily meeting, it is highly recommended to announce it by email to wlcg-operations@cern.ch, so that the relevant parties have time to collect the required information or to invite the right people to the meeting.

Monday

Attendance:

  • Local: Akos (Grid Services), Alessandro F (Storage), Alessandro DiG (ATLAS), Maarten (ALICE), Zbigniew (Databases), Xavier (SCOD)
  • Remote: Antonio (CNAF), Felix (ASGC), John (RAL), Michael (BNL), Pavel (KIT), Sang-Un (KISTI), Rolf (IN2P3), Kyle (OSG), Di (TRIUMF), Pepe (PIC), Alex (NL-T1), Jens (NDGF), Christoph (CMS), Alexei (LHCb)

Experiments round table:

  • ATLAS (Alessandro) reports (raw view) -
    • CentralServices/T0
      • CERN-PROD auth issue (GGUS:111079): it is not clear to ATLAS why the directory is owned by atlascdr; to be discussed between ATLAS and the storage experts at CERN.
      • INFN-T1 storage issue, GGUS:111097 solved
      • BNL-ATLAS deletion errors, GGUS:111096, under investigation
    • OB: ATLAS managed to almost fill the grid with the new ProdSys2 and Rucio (150k job slots running in parallel) in the last few days. Final validation of the analysis results coming from ProdSys2 is ongoing from the physics point of view.

  • CMS (Christoph) -
    • Disks rather full at Tier1 sites
      • Started a deletion campaign, no special load expected on SRMs at T1 sites.
      • KIT and PIC being included into dynamic data management
    • CPU accounting:
      • Looks like KIT has no values reported for Nov and Dec 2014 - known issue?

  • ALICE (Maarten) - NTR

  • LHCb (Alexei) reports (raw view) -
    • "Legacy Run1 Stripping" campaign running full steam and progressing well + MC and user jobs.
    • T0:
    • T1:

Sites / Services round table:

  • ASGC: ntr
  • BNL: ntr
  • CNAF: ntr
  • FNAL: np
  • GridPP: np
  • IN2P3:
    • 50 nodes had to be rebooted yesterday; disk servers and batch nodes were involved. This could have had some impact on the experiments' activities.
  • JINR: np
  • KISTI:
    • Wrong accounting in APEL since April/2014 due to wrong benchmarking of nodes. Accounting data will be recomputed and corrected retroactively.
    • Worker node exposing wrong information about available resources (always reporting free slots). Being fixed.
  • KIT: nta
  • NDGF: ntr
  • NL-T1: ntr
  • NRC-KI: np
  • OSG:
    • some BDII information not propagating from OSG to WLCG since last Friday (GGUS:111106)
  • PIC: ntr
  • RAL:
    • Electrical power checks/interventions scheduled for this week. No service disruptions expected.
  • TRIUMF: ntr
  • CERN batch and grid services:
  • CERN storage services:
  • Databases: ATLAS patching of online/offline databases finished at 15:20. Intervention was transparent (rolling). Replication was disabled during the intervention.
  • GGUS: ntr
  • Grid Monitoring: np
  • MW Officer: np

AOB:

Thursday

Attendance:

  • Local: Akos (Grid Services), Alessandro (IT/DSS), Luca (IT/DSS), Andrea (IT/SDC), Maarten (IT/SDC+ALICE), Pablo (IT/SDC+GGUS), Xavier (IT/DSS+SCOD)
  • Remote: Michael (BNL), John (RAL), Felix (ASGC), Rolf (IN2P3), Dennis (NL-T1), Kyle (OSG), Pepe (PIC), Thomas (KIT)

Experiments round table:

  • CMS (Christoph, twiki notes)
    • No one from CMS can join the call today
    • (Upgrade) MC campaign launched during this week
      • Mainly running at T2s
      • T1s continue processing DIGI-RECO
    • Some troubles with CERN EOS read and write access

  • ALICE -
    • an issue with the central AliEn services led to a sharp decrease in the number of jobs running on the grid
      • should be ramping up again later today

  • LHCb reports (raw view) -
    • "Legacy Run1 Stripping" campaign running full steam + MC and user jobs.
    • T0:
    • T1:

Sites / Services round table:

  • ASGC: ntr
  • BNL: ntr
  • CNAF: np
  • FNAL: np
  • GridPP: np
  • IN2P3: ntr
  • JINR: np
  • KISTI: np
  • KIT:
    • Low activity in the batch system, more than 3k slots idle. ALICE confirmed they will ramp up again after solving the issue with the AliEn services.
  • NDGF: np
  • NL-T1:
    • Currently working on an issue with a fileserver holding ALICE, ATLAS and LHCb data
  • NRC-KI: np
  • OSG:
    • The BDII information propagation problem from OSG to WLCG has been resolved
    • Unclear root cause of the problem reported by ATLAS in GGUS:111078 (end-of-file reached during transfers)
    • Will not connect next Monday (holiday in the US)
  • PIC: ntr
  • RAL:
    • No impact during the electrical power checks; more of them next week
  • TRIUMF: np

  • CERN batch and grid services:
  • CERN storage services:
    • The ticket mentioned by CMS (GGUS:111170) was assigned to the wrong unit. It is being looked at now.
  • Databases: np
  • GGUS:
  • Grid Monitoring: np
  • MW Officer:
    • DPM vulnerability broadcast message sent; sites asked to update to the latest configuration.

AOB:

-- AndreaSciaba - 2014-12-16

Topic revision: r16 - 2015-01-15 - MaartenLitmaath
 