December 2012 Reports

20th December 2012 (Thursday)

  • Reprocessing running smoothly; all files submitted, should be finished by the weekend
  • Simulation workflows on the online farm have been successfully validated; ramp-up of production activities is starting now

  • T0:
    • VOMS : (GGUS:89497) voms-admin command timing out, being investigated
    • LHCb pilots failed on the grid during the network glitch because they tried to access an AFS-hosted web service
    • Redundant pilots (GGUS:87448): fixed on 3 CEs; pilot submission to those CEs has restarted, with no issues so far
  • T1:
    • RAL: picking up jobs again after the network problem yesterday
    • GRIDKA: timeouts of transfers to GRIDKA disk storage under investigation (GGUS:88425)
    • GRIDKA: new CEs are publishing 999999999 in the BDII for max CPU time (GGUS:89857)

19th December 2012 (Wednesday)

  • Reprocessing running smoothly. All reprocessing for 2012 submitted. LHCb will start to test the ONLINE FARM to be used during Christmas.

  • T0:
    • VOMS : (GGUS:89497) voms-admin command timing out, being investigated
  • T1:

18th December 2012 (Tuesday)

  • Reprocessing running smoothly. All reprocessing for 2012 submitted. LHCb will start to test the ONLINE FARM to be used during Christmas.

  • T0:
    • VOMS : (GGUS:89497) ticket still unanswered as of today
  • T1:
    • RAL : Outage for a short period.

17th December 2012 (Monday)

  • Reprocessing running smoothly. All reprocessing for 2012 submitted. LHCb will start to test the ONLINE FARM to be used during Christmas.

  • T0:
  • T1:
    • CERN : some pilots failing (still the same issue, which is already tracked in a GGUS ticket)

14th December 2012 (Friday)

  • Reprocessing running smoothly. All reprocessing for 2012 submitted.
  • T0:
  • T1:
    • PIC: Jobs failing to access data due to TURL resolving errors. (GGUS:89664) Reason: SRM instabilities. Huge queue of Get Requests from different experiments. Max queue length increased and SRM restarted. Problem solved quickly.

13th December 2012 (Thursday)

  • Reprocessing running smoothly.
  • T0:
  • T1:
    • IN2P3-T2: Lots of stalled jobs. Pilot output "[Job has been terminated (got SIGXCPU); reason=152]" indicates that CPU limits were hit, although the jobs had been running for only 5 hours
    • CERN: Pilots still failing at CERN. (GGUS:88796) submitted in November, still not resolved. Affects around 10% of our jobs. Error: Invalid CRL: The available CRL has expired.
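
The "reason=152" in the IN2P3-T2 pilot output above is consistent with the common shell convention of reporting a signal-terminated process as 128 plus the signal number (SIGXCPU is 24 on Linux x86). A minimal sketch of that decoding, assuming the batch system follows this convention; the function name `decode_shell_status` is hypothetical and not part of any LHCb or DIRAC tool:

```python
import signal


def decode_shell_status(status):
    """Decode a shell-style exit status (assumed convention: 128 + signal number).

    Returns (signal_name, signal_number) if the status points to a signal,
    or None if the status is 128 or below (not a signal-style status).
    """
    if status <= 128:
        return None
    signum = status - 128
    try:
        # Signal names are platform dependent; 24 is SIGXCPU on Linux x86.
        name = signal.Signals(signum).name
    except ValueError:
        name = "unknown"
    return name, signum


if __name__ == "__main__":
    # The value taken from the pilot output quoted in the report.
    print(decode_shell_status(152))
```

A status that decodes to SIGXCPU points at a CPU-time limit enforced on the worker node or batch queue, not at a wall-clock limit, which is worth checking when jobs are killed after only a few hours.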

12th December 2012 (Wednesday)

  • Reprocessing running smoothly.
  • T0:
  • T1:
    • NL-T1: Jobs failing to access files at SARA due to incorrect TURL resolution. Solved quickly. (GGUS:89511)
    • CERN: Pilots still failing at CERN. (GGUS:88796) submitted in November, still not resolved. Error: Invalid CRL: The available CRL has expired (affects only some WNs). Also, VOMS was not responding yesterday (GGUS:89497). Today it seems to work fine, though there has been no response on the ticket.

11th December 2012 (Tuesday)

  • Reprocessing running smoothly.
  • T0:
  • T1:
    • NL-T1: SARA downtime finished
    • PIC: files with no other replicas were lost from the archive during the migration. Not a big issue: the files were old and supposed to be deleted anyway.
    • RAL has another T2 attached now: LCG.Krakow.pl

10th December 2012 (Monday)

  • Started reprocessing activities, which means there should be significant staging at the T1s.
  • T0:
  • T1:
    • NL-T1: We expect the downtime of SARA to be finished today, but we would like a notification in case of delay.
    • PIC: 930 ARCHIVAL files were deleted by mistake, due to a bug after the Enstore update.

07th December 2012 (Friday)

  • Prompt reconstruction at CERN + attached T2s. Monte Carlo at T1s and T2s
  • The problem with the agents submitting pilots to the sites seems to be related to network issues between our voboxes.
  • T0:
  • T1:
    • GRIDKA: OK for the SE downtime of 15-17 January. Please enter it in the GOCDB and remind us a few days before.
    • NL-T1: The SE downtime of 10th December is OK. Do you plan to stop just the tape backend or also disk?

06th December 2012 (Thursday)

  • Normal operation activities. Waiting for the new databases to start the last reprocessing step.
  • Still some problems with the agents responsible for submitting pilots to the sites. Investigation is ongoing.
  • T0:
  • T1:
    • NL-T1: Bunch of failed FTS transfers just before lunch.

05th December 2012 (Wednesday)

  • Normal operation activities. Waiting for the new databases to start the last reprocessing step.
  • T0:
  • T1:
    • NTR

04th December 2012 (Tuesday)

  • Prompt reconstruction: CERN + 5 Tier2 sites
  • MC productions at T2s and T1s (until reprocessing restarts)
  • Had some problems because a partition on a vobox got full. Hot-fixed. We plan to reshuffle the distribution of databases among the voboxes a bit.
  • T0:
  • T1:
    • RAL: Some problems in the early morning with FTS transfers from CERN. It seemed to be a corruption in the FTS database. It was fixed quickly.
    • IN2P3: Lots of FTS transfer failures during the night (also between IN2P3/IN2P3 and IN2P3/IN2P3-T2). Problem disappeared in the morning.

03rd December 2012 (Monday)

  • Reprocessing up to the last stop has finished. New databases for the last step (from 30th of November) will be ready around Thursday this week.
  • Prompt reconstruction: CERN + 5 Tier2 sites
  • MC productions at T2s and T1s (until reprocessing restarts)
  • T0:
    • An upgrade of the LFC to EMI is planned.
  • T1:
    • RAL: Some problems accessing data. A disk server is down and needs an fsck before it can be put back in production (not before tomorrow). [Tiju announces that it was put back in production this morning]
    • CNAF: Installed the new disk storage (many thanks to CNAF people)
    • GRIDKA: transfer failures to/from several sites. No clue from the site. [Pavel adds that experts are working on the problem: they increased the debug level and do see some transfers failing (but not all of them)]
Topic revision: r8 - 2013-01-09 - StefanRoiser
 