Week of 080908

Open Actions from last week:

Daily WLCG Operations Call details

To join the call, at 15.00 CE(S)T Monday to Friday inclusive (in CERN 513 R-068) do one of the following:

  1. Dial +41227676000 (Main) and enter access code 0119168, or
  2. To have the system call you, click here

General Information

See the weekly joint operations meeting minutes

Additional Material:

Monday:

Attendance: local(Andrea, Daniele, Roberto, Jean-Philippe, Julia, Jan, Miguel, Maria);remote(Derek, Jeremy, Michael).

elog review: #460

Experiments round table:

  • CMS (Daniele): cosmics still flowing from Cessy to Tier-0 and the Tier-1 sites; data corruption on some disk servers, temporarily worked around by writing to the local disks of the machines running the Storage Manager; effort under way to strengthen the processes and communication between online and offline (see #460).
  • LHCb (Roberto): both the past and the current week are generally quiet, but there are a few problems:
    • a bug was found in the LHCb SAM "test" that installs LHCb software at sites, causing corruption of the software area; a fix is coming soon, but some critical test failures, and therefore low availability, are expected at the affected sites;
    • MC production: stripping jobs are failing due to crashes of the DaVinci software, which are being investigated;
    • "CCRC-like" production: test jobs were sent to Tier-1 sites and succeeded everywhere except NIKHEF. Several production jobs are hanging at CERN; Miguel explains that this is because they try to open too many connections.

Sites round table: no report

Core services (CERN) report:

  • Miguel reports a new bug in CASTOR, affecting all instances, which might cause data corruption in a daemon (not yet observed). The fix has already been deployed on the public instance. For the experiment instances, it was agreed to deploy the fix tomorrow morning (it should be fully transparent) and to send out a description of the bug and of the intervention today.

DB services (CERN) report:

  • Maria reports a rolling intervention (kernel upgrade) today on PDBR (also affecting ALICE) and CMSR (the CMS offline database). Tomorrow the upgrade will be performed on the WLCG database server and on the ATLDSC and LHCBDSC downstream databases.

Monitoring / dashboard report: no report

Release update: no report

AOB: none

Tuesday:

Attendance: local(Daniele, Simone, Roberto, Patricia, Jean-Philippe, Miguel Dos Santos, Maria, Julia, Gavin);remote(Michael, Derek).

Experiments round table:

  • CMS (Daniele): Transparent intervention at CASTOR@CERN. Yesterday's problems:
    1. The T0 monitoring system caused a problem on a VOBOX.
    2. From some time in the afternoon, a problem in the SLS monitoring of the LSF queues. A ticket was submitted and the problem has been solved, but CMS would like clarification.
  • LHCb (Roberto): running stripping activities. Yesterday many jobs failed (the cause was the Dirac configuration services being down). CCRC08-like exercise: sending test jobs to check CondDB access and direct POSIX access to storage (CNAF and PIC are OK, CERN is pending, all other sites fail). Dummy Monte Carlo production: a problem in the Dirac "optimizer" prevents jobs from being submitted only to T2 and T3 sites.
  • ALICE (Patricia): a new AliEn version is being rolled out to every site. Sites have been informed; they can upgrade their own installations and should do so. If needed, ALICE central operations can help.
  • ATLAS (Simone): some problems exporting data from CERN to the T1s, due to missing files in CASTOR. See Miguel's report below.

Sites round table:

  • BNL (Michael):
    1. Licence problem with the load balancer in front of the Panda server.
    2. The network provider would like to run an inclusive test that brings down the primary CERN-BNL link. BNL is very uneasy about this, especially since the timescale has been neither announced nor discussed with BNL.

Core services (CERN) report:

  • Miguel: yesterday's transparent CASTOR upgrade went through with no problems observed. A network problem yesterday affected a CASTOR instance, with degradation between 7:30 and 10:30.

DB services (CERN) report:

  • The RH4 Update 7 upgrade has been completed. A problem at SARA is causing degraded Streams replication for ATLAS; it might also affect LHCb, since this is a single DB instance.
  • Daniele: yesterday the shifter observed some impact of the upgrade not only on the offline but also on the online database. Daniele will send details to Maria.

Monitoring / dashboard report:

Release update:

AOB:

Wednesday

LHC First Beam day! - http://lhc-first-beam.web.cern.ch/lhc-first-beam/Welcome.html

Attendance: local();remote().

elog review:

Experiments round table:

Sites round table:

Core services (CERN) report:

DB services (CERN) report:

Monitoring / dashboard report:

Release update:

AOB:

Thursday

"Jeune Genevois" - CERN closed

Attendance: local();remote().

elog review:

Experiments round table:

Sites round table:

Core services (CERN) report:

DB services (CERN) report:

Monitoring / dashboard report:

Release update:

AOB:

Friday

Attendance: local();remote().

elog review:

Experiments round table:

Sites round table:

Core services (CERN) report:

DB services (CERN) report:

Monitoring / dashboard report:

Release update:

AOB:

Topic revision: r3 - 2008-09-09 - SimoneCampana
This site is powered by the TWiki collaboration platform. Copyright © 2008-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.