Week of 130603

Daily WLCG Operations Call details

To join the call, at 15.00 CE(S)T Monday to Friday inclusive (in CERN 513 R-068) do one of the following:

  1. Dial +41227676000 (Main) and enter access code 0119168, or
  2. To have the system call you, click here

The scod rota for the next few weeks is at ScodRota

WLCG Availability, Service Incidents, Broadcasts, Operations Web

  • VO Summaries of Site Usability: ALICE ATLAS CMS LHCb
  • SIRs: WLCG Service Incident Reports
  • Broadcasts: Broadcast archive
  • Operations Web: Operations Web

General Information

  • General Information: CERN IT status board, WLCG Baseline Versions, WLCG Blogs
  • GGUS Information: GgusInformation
  • LHC Machine Information: Sharepoint site - LHC Page 1


Monday

Attendance:

  • local: Simone - SCOD, Ignacio - PES, Luc - ATLAS, Belinda - DSS, Ivan - DASHBOARD, Stefan - LHCb, MariaD - GGUS
  • remote: Xavier - KIT, David - CMS, Wei-Jen - ASGC, Lisa - FNAL, Gareth - RAL, Ulf - NDGF, Paolo - CNAF, Onno - NL-T1, Rolf - IN2P3, Rob - OSG, Alexey - LHCb

Experiments round table:

  • ATLAS reports (raw view) -
    • NTR for this meeting
  • CMS reports (raw view) -
    • Production activity at moderate levels at the T1s, with occasional MC reprocessing as we prepare for the final legacy re-reco of the 2011 data
    • Otherwise quiet -- no GGUS tickets of note, though that may be in part due to issues with Savannah-GGUS bridging: SAV:137920
  • ALICE -
    • CNAF: the site got drained of ALICE jobs on Sunday because one of the CEs (ce04) was reporting stale job numbers, which gave the appearance that too many jobs were already queued. The bad CE was excluded late Sunday evening and job submission resumed.
  • LHCb reports (raw view) -
    • Incremental stripping campaign in progress and MC productions ongoing
    • T0: (GGUS:94346) cvmfs doesn't update local cache correctly
    • General: some sites are publishing 99999 in the BDII for the CPU time limit (used to calculate the queue length for a job). It might be related to the upgrade to SL6. LHCb will be contacting the sites. The issue has been discussed with Maria Alandes, who is checking. (A sketch of how such values can be checked is given after this list.)
    • KIT for LHCb: one tape is not accessible, as was communicated to LHCb. Should KIT try to recover it, or can it be given up? (In principle the files are fine, but the tape library software refuses to read the tape.)
      • Answer: some files are only at GridKa, so please try recovering them. KIT will increase the pressure on the external support to solve the issue.
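
    For reference, a minimal sketch (in Python, using the ldap3 library) of how the suspicious CPU-time values could be cross-checked against the information system, assuming the GLUE 1.3 attribute GlueCEPolicyMaxCPUTime (minutes) and anonymous access to a top-level BDII; the endpoint lcg-bdii.cern.ch:2170 is used purely as an example, and this is an illustration rather than the tool LHCb is using:

        # Sketch: list CEs publishing 99999 as GlueCEPolicyMaxCPUTime in a top-level BDII.
        # The endpoint, the attribute names and the use of ldap3 are assumptions for illustration.
        from ldap3 import Connection, Server

        server = Server('lcg-bdii.cern.ch', port=2170)   # example top-level BDII endpoint
        conn = Connection(server, auto_bind=True)        # BDIIs accept anonymous binds

        # A full subtree search over a top-level BDII returns many entries; a real check
        # might page the results or query the site BDIIs directly.
        conn.search(search_base='o=grid',
                    search_filter='(objectClass=GlueCE)',
                    attributes=['GlueCEUniqueID', 'GlueCEPolicyMaxCPUTime'])

        for entry in conn.entries:
            # Normalise attribute-name case before looking up values
            attrs = {k.lower(): v for k, v in entry.entry_attributes_as_dict.items()}
            max_cpu = attrs.get('gluecepolicymaxcputime', [None])[0]
            if str(max_cpu) == '99999':
                print(attrs.get('glueceuniqueid', ['unknown CE'])[0], max_cpu)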

Sites / Services round table:

  • NDGF: a network problem during the weekend went unnoticed
  • IN2P3-CC: two outages are foreseen. The first, for the batch system, is on Wednesday: queues will be drained at 10:00 PM CEST on Tuesday (tomorrow) and re-opened at 3:00 PM CEST on Wednesday. A longer outage follows the week after, on the 11th of June, when many services will be down. In particular, the MSS will go through a long maintenance and have an even longer outage (from Sunday the 9th to Wednesday the 12th at noon).
  • MariaD has a question for all: there is a thread with PIC concerning notifications of scheduled downtimes. There are two Twiki pages, one from 2009 (https://twiki.cern.ch/twiki/bin/view/EGEE/SA1_USAG#Site_scheduled_interventions) and one from EGI (https://wiki.egi.eu/wiki/MAN02#How_to_manage_an_intervention). Maria would like to know whether sites actually make their announcements differently from what these pages suggest (and if so, how), and whether the experiments have any comments on these procedures. The question will be re-iterated on Thursday.
  • CERN Storage: a series of rolling interventions for the Oracle upgrade of the CASTOR backend is underway. Transparent.
  • GGUS: important GGUS release this Wednesday 2013/06/05, especially due to the number of Support Units being decommissioned. Check the development items' list at http://bit.ly/14Jhw0C for details.

AOB:

Thursday

Attendance:

  • local: Simone - SCOD, Luc - ATLAS, Belinda - DSS, Ignacio - PES
  • remote: Xavier - KIT, Ulf - NDGF, Salvatore - CNAF, John - RAL, Ronald - NL-T1, Kyle - OSG, Rolf - IN2P3, Wei-Jen - ASGC, Pepe - PIC

Experiments round table:

  • ATLAS reports (raw view) -
    • T0/Central services
      • LFC node failures (GGUS:94529 & INC:10002). DB cursors were closed due to a DB intervention, and the logic in Cns_list_rep_entry could not handle this. A rolling restart of the LFC daemons on all nodes was performed. Fixed.
      • New host certificates needed for the DQ2 Central Catalogs, with the hostname included. https://its.cern.ch/jira/browse/CSOPS-65
    • T1
      • INFN-T1: lack of space on DATADISK. ATLAS is looking for a solution (some space was found). In addition, SRM issues: GGUS:94653
  • CMS reports (raw view) -

  • ALICE -
    • NTR

  • LHCb reports (raw view) -
    • Incremental stripping campaign in progress and MC productions ongoing

Sites / Services round table:
  • NDGF: scheduled downtime yesterday for a reboot of dCache. Some FTS transfers failed.
  • CNAF: the GGUS ticket mentioned by ATLAS is a side effect of the shortage of space. As a temporary solution, some more space will be added until ATLAS cleans something up.
  • RAL: intervention today on the 3D database (overrunning a bit).
  • IN2P3: the downtime for the batch system is over. No issues with the new version of the batch system have been observed. The next downtime (tape libraries) will start on Monday (and not Sunday, as previously announced).
  • OSG: alarm tests sent yesterday were fine.
  • PIC: on Monday there will be a scheduled downtime to upgrade the dCache head node and for a network intervention. Queues will be drained on Sunday.
  • CERN storage: the Oracle security update for the CASTOR backend has concluded. No major issues.

AOB:
