Week of 080811

Open Actions from last week:

Daily WLCG Operations Call details

To join the call, at 15.00 CE(S)T Monday to Friday inclusive (in CERN 513 R-068) do one of the following:

  1. Dial +41227676000 (Main) and enter access code 0119168, or
  2. To have the system call you, click here

General Information

See the weekly joint operations meeting minutes

Additional Material:

Monday:

Attendance: local(Harry, Jean-Philippe, Miguel, Ewan, Luca, Daniele, Simone);remote(Derek, Michael).

elog review:

Experiments round table:

CMS (DB): Following up on mostly CMS-related issues and preparing for CRUZET 4, which starts on 18 August. One observation: about 6-7 hours after ATLAS started exporting data this weekend, we noticed a slowdown in the CMS data export rate for no obvious reason. HR will look for correlations in Lemon.

ATLAS (SC): Had a continuous data run this weekend (12 hours until a luminosity block was written), hence 1 GB/second into CASTOR for 12 hours, all into a single ATLAS 'data set'. This splits into 16 streams, of which 5-6 are particularly big, so there were high-rate transfers (successful, in fact) to the sites receiving those parts. Functional tests were stopped on Sunday but have now resumed. We saw SRM problems at RAL before their LFC went down as scheduled. D. Ross reported that RAL had some recurrence of the load issues seen last Friday, and that they were performing a global CASTOR upgrade today, followed by an ATLAS-only one on Wednesday. SC reported a problem at 2 of the 4 ATLAS muon calibration sites: both Naples and Nikhef report that their disks are full when they should not be. Naples was cleaned up overnight, so it should have only a few MB of disk occupied.

Sites round table:

Core services (CERN) report: Today's scheduled rollout of the Oracle security patch to the public databases was cancelled for a second look. A new date will be discussed tomorrow.

DB services (CERN) report (LC):

- ATLAS offline DB node 1 crashed on Saturday night at 2:50 AM due to a hardware problem. The node rebooted and returned to production. Services were not affected.

- ATLAS offline DB node 2 crashed on Sunday night because of a core dump of the Oracle clusterware. The issue is under investigation and seems related to bug 7187896. The node was rebooted by the operators and returned to production. Services were not affected.

- LHCb offline database node 3 is currently down because of an issue that appeared after applying CPU JUL08. This issue has not been observed before and is currently under investigation. Services are not affected, as they keep running on the remaining 2 nodes of the LHCBR cluster.

- As scheduled, ATLR and ATONR will be patched with CPU JUL08 tomorrow (rolling upgrade).

Monitoring / dashboard report:

Release update:

AOB: SC asked about the status of the creation of two new requested ATLAS pools. Miguel said they are about to discuss the strategy of analysis pools so are not creating any before then. SC said one, of 10 TB, is not for analysis and he agreed to send a reminder to Miguel.

Tuesday:

Attendance: local(Jacek, Simone, Jean-Philippe, Andrea, Jamie, Harry, Miguel);remote(Michael, Gonzalo, Jeremy).

elog review:

Experiments round table:

  • ATLAS (Simone) - just 1 point: acron at CERN stopped working at 17:00 yesterday and has been ok since ~10:30 today (the cause was a network switch!). Side effect: functional tests stopped for this period. Otherwise all ok. Sites performing well. Still a tail of jobs from the cosmic data taking this weekend.

  • CMS (Andrea) - similar! Also affected by the acron outage: submissions of SAM tests and some monitoring info in SLS (frontier, DBS). Now ok. 2nd point: Daniele discussed elog for CMS. Harry - James is away! Follow up with Julia... Has arranged backup for James.

Sites round table:

Core services (CERN) report:

  • (Miguel) CASTORCMS upgrade this morning - went ok. LHCb tomorrow.

DB services (CERN) report:

  • (Jacek) - LHCb offline cluster problem, just a few minutes before the meeting. The 3rd node was down; on trying to add it back, the 1st node went down. Could not log in for a few minutes as all services were down. Investigating...

Monitoring / dashboard report:

Release update:

AOB:

Wednesday:

Attendance: local();remote().

elog review:

Experiments round table:

Sites round table:

Core services (CERN) report:

DB services (CERN) report:

Monitoring / dashboard report:

Release update:

AOB:

Thursday:

Attendance: local();remote().

elog review:

Experiments round table:

Sites round table:

Core services (CERN) report:

DB services (CERN) report:

Monitoring / dashboard report:

Release update:

AOB:

Friday:

Attendance: local();remote().

elog review:

Experiments round table:

Sites round table:

Core services (CERN) report:

DB services (CERN) report:

Monitoring / dashboard report:

Release update:

AOB:

Topic revision: r3 - 2008-08-12 - JamieShiers