Week of 131118

WLCG Operations Call details

To join the call, at 15.00 CE(S)T, by default on Monday and Thursday (at CERN in 513 R-068), do one of the following:

  1. Dial +41227676000 (Main) and enter access code 0119168, or
  2. To have the system call you, click here

The SCOD rota for the next few weeks is at ScodRota

WLCG Availability, Service Incidents, Broadcasts, Operations Web

  • VO Summaries of Site Usability: ALICE, ATLAS, CMS, LHCb
  • SIRs: WLCG Service Incident Reports
  • Broadcasts: Broadcast archive
  • Operations Web: Operations Web

General Information

  • General Information: CERN IT status board, WLCG Baseline Versions, WLCG Blogs
  • GGUS Information: GgusInformation
  • LHC Machine Information: Sharepoint site, LHC Page 1


Monday

Attendance:

  • local: AndreaV/SCOD, Alessandro/ATLAS, Maarten/ALICE, Felix/ASGC, Przemek/Databases, Pablo/Dashboard, Belinda/Storage, MariaD/GGUS
  • remote: Sang-Un/KISTI, Michael/BNL, Onno/NLT1, Xavier/KIT, Tiju/RAL, Kyle/OSG, Rolf/IN2P3, Sonia/CNAF, Lisa/FNAL, Ulf/NDGF, Stefano/CMS, Vladimir/LHCb

Experiments round table:

  • ATLAS reports (raw view) -
    • Central services
      • On Saturday many ATLAS Quattor-managed SLC6 VOboxes lost the ability to connect. The problem appears to have been that Kerberos authentication was not working. CERN IT solved it within a few hours.
    • T0/T1
    • ATLAS internal
      • Early on Sunday morning SAAB did not act for a while, so all auto-blacklisting was off. Salvatore answered that the problem was somewhere else, most probably on the node running SAAB itself. Waiting for further clarification from him.

  • CMS reports (raw view) -
    • No more data from detector until next Spring.
    • Central Production and Analysis running normally.
    • Nothing to report

  • ALICE -
    • NTR

  • LHCb reports (raw view) -
    • Main activity is simulation at all sites.
    • T0:
    • T1:
      • GRIDKA: the downtime to upgrade the SE can start at any time (GGUS:98571).

Sites / Services round table:

  • Sang-Un/KISTI: ntr
  • Michael/BNL: ntr
  • Onno/NLT1: about the issue reported by ATLAS, waiting for the controller to be replaced and following up with the vendor; meanwhile working to install new hardware that will get rid of these issues
  • Xavier/KIT:
    • lost 28 CMS files due to a broken tape, will prepare SIR
    • also having some problems with disk-only files for CMS
  • Tiju/RAL: ntr
  • Kyle/OSG: ntr
  • Rolf/IN2P3: announce one-day outage on December 10 for major network upgrades
  • Sonia/CNAF: ntr
  • Lisa/FNAL: ntr
  • Ulf/NDGF:
    • OPN link to CERN is down, link to SARA works normally, using backup link to CERN
    • next week robot maintenance on one of three ATLAS tape systems, this is being followed up with ATLAS
    • resources of one cluster are still down after a storm
  • Felix/ASGC: still working on SIR, it will take some more days to be ready

  • Przemek/Databases: ntr
  • Belinda/Storage:
    • transparent Castor updates ongoing till Wednesday
    • EOS ATLAS short intervention at 10 tomorrow
  • Pablo/Dashboard:
    • three sites (BNL, SARA, NDGF) have not yet moved to the new DNS alias for the message broker for FTS monitoring, even though they were contacted one month ago on the FTS mailing lists; the old system should be switched off today
      • Alessandro: was not aware of this (it was not discussed at this meeting); this would be a major problem for ATLAS, which would become blind to these FTS transfers, so the old system cannot be switched off today
      • Michael/BNL: was also not aware of this
      • MariaD: GGUS tickets should be opened for the sites
      • Pablo: ok, will postpone switching off the old system
      • Alessandro: this is probably an easy operation; please post the instructions here so that the representatives from the three sites at today's meeting can follow up
      • Andrea: next time please discuss this at this meeting in advance so that it can be escalated
      • Maarten, after the meeting: opened three GGUS tickets for BNL (GGUS:98967), NDGF (GGUS:98970) and SARA (GGUS:98969)
  • MariaD/GGUS: ntr

AOB: none

Thursday

Attendance:

  • local: MariaD/SCOD, Pablo/Dashboard, Belinda/Storage, Ben/GridServices, Felix/ASGC.
  • remote: Lisa/FNAL, IanF/CMS, Michal Svatos/Prague, Elisabeth/OSG, Vladimir/LHCb, Saverio/CNAF, Xavier/KIT, Sang-Un/KISTI, Ulf/NDGF, Gareth/RAL, Dennis/NL_T1, Pepe/PIC, Michael/BNL, Rolf/IN2P3.

Experiments round table:

  • ATLAS reports (raw view) - This report is from Wednesday 20 November. Nobody from ATLAS connected, so MariaD read the report. Apologies were emailed a posteriori by Ale Di Gi.

  • CMS reports (raw view) - IanF was connected but couldn't be heard - MariaD read the report.
    • not much to say, Global Run closed, back to standard data processing
    • no issues from the network intervention on Tuesday, possibly a couple of failed acrontab runs

  • ALICE -
    • NTR

  • LHCb reports (raw view) -
    • Main activity is simulation at all sites.
      • T0:
      • T1: Vladimir reported the SARA_MATRIX problem is solved.

Sites / Services round table:

  • ASGC: Monday 25/11 intervention around 2am-4pm CET for DPM update and work on the network at the same time. GOCDB is up-to-date.
  • BNL: Big intervention scheduled for December 16-17. This will be a full shutdown of Tier1 services. The network will be re-configured with 100Gbps capabilities and a new routing protocol in their LAN. The necessary changes for SHA-2 compliance will take place at the same time, as well as a dCache upgrade.
  • FNAL: ntr
  • OSG: ntr
  • NDGF: dCache upgrade on Monday 25/11. Yesterday there was a very big incident with the power feed at a major NDGF site that destroyed about 2500 fuses. All services are now recovered.
  • NL_T1: ntr
  • IN2P3: ntr
  • RAL: Intervention on Tuesday 26/11 am affecting the FTS2 service and the ATLAS 3D Frontier.
  • PIC: ntr
  • KIT: ATLAS pools were down last night between 4 and 8am. On Thursday 28/11 there will be a dCache upgrade for LHCb.
  • CNAF: ntr
  • KISTI: ntr

  • CERN:
    • Grid Services: CERN VOMS Intervention and Upgrade - On Wednesday 27th November from 08:30 to 11:00 the CERN VOMS service on voms.cern.ch and lcg-voms.cern.ch will be updated to a new version. The intervention will be transparent and voms-proxy-init requests will be unaffected. The only visible sign is that new registrations will not be processed during the intervention. ITSSB and GOCDB entries will be posted imminently. Please note that this is not the removal of VOMRS from the architecture, which will happen at a later date.
    • Storage: EOS LHCb will be upgraded on Tuesday 26/11 between 9-11am CET. Service will be unavailable.
    • Dashboards: The new DNS alias for the message broker for FTS monitoring is now in use, a good outcome of last Monday's discussion.

  • GGUS: Release on Wednesday 2013/11/27 with the ALARM test round as usual.

AOB:

Topic attachments
  • MB-Nov.pptx (2864.1 K, 2013-11-19 09:09, PabloSaiz)
Topic revision: r14 - 2013-11-22 - MariaDimou
 