January 2012 Reports


31st January 2012 (Tuesday)

Experiment activities: MC11 Monte Carlo productions

New GGUS (or RT) tickets

  • T0
  • T1
    • CNAF: the address of the LFC service disappeared from the DNS

30th January 2012 (Monday)

Experiment activities: MC11 Monte Carlo productions

New GGUS (or RT) tickets

  • T0
  • T1
    • CNAF: problem with pilot submission (GGUS:78713). Solved immediately.
    • GRIDKA: Failure to log on to LHCb Tier-1 VO-box (GGUS:78720). Solved, network problem.
    • RAL: SAM tests for WMS submission failing (GGUS:78760). Solved, misconfiguration on the batch system.


27th January 2012 (Friday)

New GGUS (or RT) tickets

Experiment activities

  • MC11: Monte Carlo productions

  • T0
    • Castor databases were upgraded yesterday. All went fine.
    • This morning about 30% of jobs had completed and were waiting to upload their job logs. This might be related to this morning's general network problem, reported on the IT Service Status Board. Later in the day the situation returned to normal.

  • T1
    • PIC: issue solved after replacement of the faulty drivers. No significant job failure rate during the last day.

26th January 2012 (Thursday)

New GGUS (or RT) tickets

Experiment activities

  • MC11: Monte Carlo productions

  • T0
    • Castor databases upgrade this morning. Finished?

  • T1
    • PIC: the problem with uploading output data (mentioned yesterday) has been tracked down to failed connections to the PIC LFC catalogue. On some worker nodes, faulty drivers were causing a timeout in the connection. They have been replaced.
    • IN2P3: a high percentage of MC jobs failed this morning; they were identified as stalled. Under investigation (no ticket opened so far).

25th January 2012 (Wednesday)

New GGUS (or RT) tickets

Experiment activities

  • MC11: Monte Carlo productions

  • T0
    • After the Oracle upgrade there were some issues with conditions replication to the Tier-1s, fixed around 21h.

  • T1
    • PIC: since Monday a high percentage of jobs have failed when trying to upload their output data, probably due to failed authentication with the LFC. Under investigation in collaboration with the site.
    • IN2P3: some failed transfers to IN2P3 this morning from 4am to 8am. Currently working fine.

24th January 2012 (Tuesday)

New GGUS (or RT) tickets

Experiment activities

  • MC11: Monte Carlo productions

  • T0
    • Oracle upgrade ongoing (from 10h to 14h); the main services affected are the LFC and the Bookkeeping. Users have been notified in advance.
  • T1
    • PIC: a GGUS ticket was opened yesterday for aborted pilots; promptly fixed by the site.

23rd January 2012 (Monday)

New GGUS (or RT) tickets

Experiment activities

  • MC11: Monte Carlo productions

  • T0

  • T1
    • PIC: unscheduled downtime yesterday (Sunday). After all services were restarted (around 8pm), some problems were still observed with LFC streams. Site availability turned red again today at about 12h.
    • IN2P3: migration of the data from the old space tokens to the new ones is ongoing (GGUS ticket ): T0D1 and T1D0 space tokens migrated, T1D1 ongoing.
    • SARA: CVMFS problem on some nodes solved on Friday 20th (GGUS:78391)

19th January 2012 (Thursday)

New GGUS (or RT) tickets

Experiment activities

  • MC11: Monte Carlo productions

  • T0

  • T1
    • IN2P3: problem with SRM
    • IN2P3: migration of the data from the old space token to the new one is working. We need to cross-check that it is OK before we give the recipe to the other LHCb T1s using dCache.
    • SARA: CVMFS problem on some nodes (GGUS:78391)

16th January 2012 (Monday)

New GGUS (or RT) tickets

Experiment activities

  • MC11: Monte Carlo productions

  • T0
    • SAM jobs failing when accessing CEs (GGUS:78185), fixed by restarting the Gatekeeper
  • T1
  • T2
    • Most pilots aborted at SINP.ru (GGUS:78275) and USC.es (GGUS:78277)

13th January 2012 (Friday)

New GGUS (or RT) tickets

Experiment activities

  • MC11: Monte Carlo productions

  • T0
    • SAM jobs failing when accessing CEs (GGUS:78185)
  • T1

11th January 2012 (Wednesday)

New GGUS (or RT) tickets

Experiment activities

  • MC11: Monte Carlo productions

  • T0
    • Problem with access to Castor after today's downtime; swiftly fixed by the Castor team (GGUS:78103)

  • T1

10th January 2012 (Tuesday)

New GGUS (or RT) tickets

Experiment activities

  • MC11: Monte Carlo productions

  • T0
    • For Wednesday's downtime, no special treatment of the queues by PES is needed

  • T1

9th January 2012 (Monday)

New GGUS (or RT) tickets

Experiment activities

  • MC11: Monte Carlo productions

  • T0
    • Preparing for downtime tomorrow. Will jobs in the system be killed or suspended?
  • T1

5th January 2012 (Thursday)

New GGUS (or RT) tickets

Experiment activities

  • MC11: Monte Carlo productions

  • T0

  • T1

9th December 2011 (Friday)

Experiment activities: New GGUS (or RT) tickets

Experiment activities

  • MC11: start-up of the large Monte Carlo campaign

  • T2

-- JoelClosier - 06-Feb-2012

Topic revision: r5 - 2013-03-04 - JoelClosier
 