Difference: ProductionOperationsWLCGJune12Reports ( vs. 1)

Revision 1, 2012-09-12 - JoelClosier

June 2012 Reports

29th June 2012 (Friday)

  • User analysis at T1s ongoing
  • MC production at all sites
  • Some catch up of processing tail at SARA.
  • New GGUS (or RT) tickets
  • T0:
    • Moving DIRAC accounting services to new machines. Not yet back in production - grid operations going on as usual, though some background LHCb operations are paused as a result.
    • CERN power failure : LHCb VOboxes not affected. However, there were some problems accessing files in CASTOR, and quite a few jobs failed as a result. Also, 340 TB missing from LHCbDisk (GGUS:83713).

  • T1:
    • FZK-LCG2: Failed user jobs (GGUS:83608). As requested, dCache client upgraded to v2.47.5-0 (applies to all dCache sites). Wait and see.
    • NL-T1 : SARA problem with file access understood and fixed. (GGUS:83676). Jobs going through now.
    • IN2P3 / GridKa file corruption : PIC and SARA contacts also checking.

28th June 2012 (Thursday)

  • User analysis at T1s ongoing
  • MC production at all sites
  • New GGUS (or RT) tickets
  • T0:
    • Moving DIRAC accounting services to new machines. Not yet back in production - grid submission as usual.
  • T1:
    • FZK-LCG2: Failed user jobs (GGUS:83608). As requested, dCache client upgraded to v2.47.5-0 (applies to all dCache sites). Wait and see.
    • NL-T1 : SARA problems ongoing. New ticket opened (GGUS:83676) - old ticket referred to 5 different problems and the latest failures seem different from all of those.
    • IN2P3 / GridKa file corruption : Requested PIC contact to check at PIC.

27th June 2012 (Wednesday)

  • User analysis at T1s ongoing
  • MC production at all sites
  • New GGUS (or RT) tickets
  • T0:
    • Moving DIRAC accounting services to new machines. Should be completed this evening.
  • T1:
    • FZK-LCG2: Failed user jobs (GGUS:83425) - any update? More failed user jobs (GGUS:83608). Slow pool?
    • NL-T1 : SARA problems ongoing. Files there cannot currently be accessed, even though the SRM reports that they exist and they are accessible locally. (GGUS:83584)
    • Corrupted files (IN2P3 & FZK) : Not clear why the problem with jobs was seen only at IN2P3 and not at GridKa; possibly just chance. LHCb will clean up the corrupted files found there. The source of the corruption is not clear, though the two (three?) bursts of writing times when it occurred may be a clue. The check possibly needs to be extended to other sites (at least those using dCache).

26th June 2012 (Tuesday)

  • User analysis at T1s ongoing
  • MC production at all sites
  • New GGUS (or RT) tickets
  • T0:
    • Moving DIRAC accounting services to new machines. Will take ~24 hours.
  • T1:
    • FZK-LCG2: Looking forward to new dCache instance soon for LHCb. Also requested more storage in current configuration. Failed user jobs (GGUS:83425) - any update?
    • NL-T1 : SARA SRM problems, which have now gone on for >3 months. (GGUS:83584)

25th June 2012 (Monday)

  • User analysis at T1s ongoing
  • MC production at all sites
  • New GGUS (or RT) tickets
  • T0:
  • T1:
    • FZK-LCG2: Looking forward to new dCache instance soon for LHCb
    • IN2P3 : CVMFS problem (GGUS:83528)
    • NL-T1 : SARA SRM problems, which have now gone on for >3 months.

22nd June 2012 (Friday)

21st June 2012 (Thursday)

20th June 2012 (Wednesday)

19th June 2012 (Tuesday)

18th June 2012 (Monday)

15th June 2012 (Friday)

14th June 2012 (Thursday)

13th June 2012 (Wednesday)

  • User analysis, prompt reconstruction and stripping at T1s ongoing
  • MC production at T2s
  • New GGUS (or RT) tickets
  • T0:
    • CERN:
  • T1:
    • RAL : Scheduled downtime
    • CNAF: Pilots failed; fixed without GGUS ticket

12th June 2012 (Tuesday)

11th June 2012 (Monday)

7th June 2012 (Thursday)

  • User analysis, prompt reconstruction and stripping at T1s ongoing
  • MC production at T2s
  • New GGUS (or RT) tickets
  • T0:
    • CERN:
  • T1:
    • Set all CONDDB access to Inactive in preparation for the retirement of the 3D streaming
    • IN2P3 : request to suspend jobs during the scheduled intervention on Monday for the GE upgrade

6th June 2012 (Wednesday)

  • User analysis, prompt reconstruction and stripping at T1s ongoing
  • MC production at T2s
  • New GGUS (or RT) tickets
  • T0:
    • CERN: SRM BUSY when transferring files from PIT to CASTOR (GGUS:82874)
  • T1:
    • IN2P3: (GGUS:82751) Fixed
    • Set all the read-only instances of LFC to Inactive in preparation for the retirement of the 3D streaming

5th June 2012 (Tuesday)

4th June 2012 (Monday)

  • User analysis, prompt reconstruction and stripping at T1s ongoing
  • MC production at T2s
  • New GGUS (or RT) tickets
  • T1:
    • IN2P3: 26k files unavailable due to a hardware problem (GGUS:82751); the same disk server is failing again
    • SARA : downtime between 9h and 12h UTC

1st June 2012 (Friday)

-- JoelClosier - 12-Sep-2012

 