Difference: ProductionOperationsWLCGOct11Reports (1 vs. 2)

Revision 2, 2012-01-19 - JoelClosier

Line: 1 to 1
 

October 2011 Reports


Revision 1, 2011-11-08 - JoelClosier

Line: 1 to 1
Added:

October 2011 Reports


28th October 2011 (Friday)

Experiment activities:

New GGUS (or RT) tickets

  • Experiment activities
    • Prompt reconstruction and stripping at CERN and Tier-1 sites.
    • First round of reprocessing at T1 and T2 sites almost over (a tail of jobs remains, especially at GridKa).
    • Next round of reprocessing to start next week.

  • T0
    • Going through stripping backlog slowly.

  • T1 sites:
    • Possible problem with SE - following up offline with site admin (Xavier) to understand what is happening.

  • T2 sites:
    • Aborted pilots at a few sites (Weizmann, Lancaster)

27th October 2011 (Thursday)

Experiment activities:

New GGUS (or RT) tickets

  • Experiment activities
    • Reconstruction and stripping at CERN
    • Reprocessing at T1 sites and T2 sites

  • T0
    • CERN : Spike of failed jobs this morning between 06:00 and 10:00 UTC. These jobs were accessing d0t1 storage, and the problem seems to have gone away now.

  • T1 sites:
    • First stage of reprocessing almost over.

  • T2 sites:
    • Minor problems at JINR and RHUL, being followed up via GGUS tickets.

26th October 2011 (Wednesday)

Experiment activities:

New GGUS (or RT) tickets

  • Experiment activities
    • Reconstruction and stripping at CERN
    • Reprocessing at T1 sites and T2 sites

25th October 2011 (Tuesday)

Experiment activities:

New GGUS (or RT) tickets:

  • Experiment activities
    • Reconstruction and stripping at CERN
    • Reprocessing at T1 sites and T2 sites
    • Starting to actively synchronise the files on LHCb Tier-1 SEs with what the catalogues (LFC, DIRAC) expect.

  • T1 sites:
    • IN2P3 : "Scheduled" downtime to update Chimera server, but batch queues were not drained. There were also problems with LHCb jobs at IN2P3 even before the downtime officially started.

  • T2 sites:
    • GRIF in downtime until the end of this week. An update on how soon it will be back would be appreciated, as the site is used for LHCb reprocessing.
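
The check of Tier-1 SE contents against the catalogues (LFC, DIRAC) mentioned under the experiment activities above amounts to comparing two file listings. A minimal sketch in Python, assuming both listings are already available as plain lists of logical file names; the function name and the sample paths are hypothetical and are not part of DIRAC or LFC:

```python
def compare_listings(catalog_lfns, storage_lfns):
    """Compare a catalogue listing with a storage listing.

    Returns a tuple of two sorted lists:
      - files registered in the catalogue but absent from storage
      - files present on storage but not registered in the catalogue
    """
    catalog = set(catalog_lfns)
    storage = set(storage_lfns)
    missing_on_storage = sorted(catalog - storage)
    not_in_catalog = sorted(storage - catalog)
    return missing_on_storage, not_in_catalog


# Hypothetical sample data, for illustration only.
catalog = ["/lhcb/data/a.dst", "/lhcb/data/b.dst"]
storage = ["/lhcb/data/b.dst", "/lhcb/data/c.dst"]

missing, unregistered = compare_listings(catalog, storage)
print("Missing on storage:", missing)
print("Not in catalogue:", unregistered)
```

Using sets keeps the comparison linear in the number of files, which matters when a listing covers a whole storage element.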

24th October 2011 (Monday)

Experiment activities:

New GGUS (or RT) tickets:

  • Experiment activities
    • Reconstruction and stripping at CERN
    • Reprocessing at T1 sites and T2 sites

  • T0
    • CERN : Running a lot more jobs now, but not fully clear if the fairshare system has been fixed

  • T1 sites:
    • IN2P3 : (GGUS:75610) : SRM problem on 23 Oct. Fixed after an alert from LHCb.
    • RAL : SE in unscheduled downtime.

  • T2 sites:
    • Manchester : Pilots aborted (GGUS:75614)

21st October 2011 (Friday)

Experiment activities:

  • Experiment activities
    • Reconstruction and stripping at CERN
    • Reprocessing at T1 sites and T2 sites

  • T0
    • LFC issue: (GGUS:75533)
    • CERN : After discussion, we are now running more than 3500 jobs. Thanks to the LSF team for the new tuning.

  • T2 sites:

20th October 2011 (Thursday)

Experiment activities:

  • Experiment activities
    • Reconstruction and stripping at CERN
    • Reprocessing at T1 sites and T2 sites

  • T2 sites:

19th October 2011 (Wednesday)

Experiment activities:

  • Experiment activities
    • Reconstruction and stripping at CERN
    • Reprocessing at T1 sites and T2 sites

  • T0

  • T1 sites:
    • CERN : (GGUS:75374) Over the last week we have never received, on average, our LSF share at CERN. Why?
    • CERN : (GGUS:75467) File reported as not staged by SRM.
    • PIC : (GGUS:75462) SRM / FTS error

  • T2 sites:

18th October 2011 (Tuesday)

Experiment activities:

  • Experiment activities
    • Reconstruction and stripping at CERN
    • Reprocessing at T1 sites and T2 sites

  • T0

  • T1 sites:
    • CERN : (GGUS:75374) Over the last week we have never received, on average, our LSF share at CERN. Why?
    • NIKHEF : downtime
    • Congratulations to the sites that replied within a few hours to the request for reallocation of space.

  • T2 sites:

17th October 2011 (Monday)

Experiment activities:

  • Experiment activities
    • Reconstruction and stripping at CERN
    • Reprocessing at T1 sites and T2 sites

  • T1 sites:
    • CERN : (GGUS:75374) Over the last week we have never received, on average, our LSF share at CERN. Why?
    • PIC : (GGUS:75344) : problem with SRM
    • RAL : problem with one disk server.
    • IN2P3: (GGUS:75382) reallocation of free space from LHCb_MC_DST and LHCb_MC_M-DST
    • PIC: (GGUS:75383) reallocation of free space from LHCb_MC_DST and LHCb_MC_M-DST
    • SARA: (GGUS:75384) reallocation of free space from LHCb_MC_DST and LHCb_MC_M-DST
    • GridKa: (GGUS:74915) reallocation of free space from LHCb_MC_DST and LHCb_MC_M-DST

  • T2 sites:
    • IN2P3 : reconfiguration of CE is done.

14th October 2011 (Friday)

Experiment activities:

  • Experiment activities
    • Reconstruction and stripping at CERN
    • Reprocessing at T1 sites and T2 sites

  • T0
    • Nagios: No probes being sent to prod-lfc-lhcb.ro (GGUS:74775)

  • T2 sites:

13th October 2011 (Thursday)

Experiment activities:

  • Experiment activities
    • Reconstruction and stripping at CERN
    • Reprocessing at T1 sites and T2 sites

  • T0
    • Nagios: No probes being sent to prod-lfc-lhcb.ro (GGUS:74775)

  • T1 sites:
    • CNAF : (GGUS:75162) : problem with CREAM CE
    • GridKa : (GGUS:75261) SRM problem yesterday evening.
    • GridKa : (GGUS:74915) space token migration (51 TB have just been added)
    • NIKHEF : (GGUS:75279) problem with WMS publication in the BDII

  • T2 sites:

12th October 2011 (Wednesday)

Experiment activities:

  • Experiment activities
    • Reconstruction and stripping at CERN
    • Reprocessing at T1 sites and T2 sites
    • Again, no notification about a downtime received for more than 24 hours (GGUS:75243)

  • T0
    • Nagios: No probes being sent to prod-lfc-lhcb.ro (GGUS:74775)

  • T1 sites:

  • T2 sites:
    • IN2P3-t2: (GGUS:75131) done, but no jobs are running. We discussed this issue with IN2P3, and Pierre will write a summary of the discussion.

11th October 2011 (Tuesday)

Experiment activities:

  • Experiment activities
    • Reconstruction and stripping at CERN
    • Reprocessing at T1 sites and T2 sites

  • T0
    • Nagios: No probes being sent to prod-lfc-lhcb.ro (GGUS:74775)

10th October 2011 (Monday)

Experiment activities:

  • Experiment activities
    • Reconstruction and stripping at CERN
    • Reprocessing at T1 sites and T2 sites

  • T0
    • Nagios: No probes being sent to prod-lfc-lhcb.ro (GGUS:74775)

7 October 2011 (Friday)

Experiment activities:

  • Experiment activities
    • Reconstruction and stripping at CERN
    • Reprocessing at T1 sites and T2 sites

  • T0
    • Castor: Problems with access to disk pools via xrootd protocol (GGUS:74751). Closed; the problem was with the content of the file.
    • Nagios: No probes being sent to prod-lfc-lhcb.ro (GGUS:74775)
    • CERN: Pilots aborted at ce130.cern.ch (GGUS:75068). Reopened.

  • T1 sites:
    • GridKa: Problems observed for jobs with many input files when accessing storage via protocol; mostly user jobs are affected.
    • RAL: (GGUS:75004) Pilots aborted at lcgce08.gridpp.rl.ac.uk. Solved.

6 October 2011 (Thursday)

Experiment activities:

  • Experiment activities
    • Reconstruction and stripping at CERN
    • Reprocessing at T1 sites and T2 sites

  • T0
    • Castor: Problems with access to disk pools via xrootd protocol (GGUS:74751)
    • Nagios: No probes being sent to prod-lfc-lhcb.ro (GGUS:74775)

  • T1 sites:
    • GridKa: Problems observed for jobs with many input files when accessing storage via protocol; mostly user jobs are affected.
    • NIKHEF: (GGUS:74976) Jobs failed, pilots aborted. Solved
    • RAL: (GGUS:75004) Pilots aborted at lcgce08.gridpp.rl.ac.uk

5 October 2011 (Wednesday)

Experiment activities:

  • Experiment activities
    • Reconstruction and stripping at CERN only
    • Reprocessing at T1 sites and a few T2 sites

  • T0
    • Castor: Problems with access to disk pools via xrootd protocol (GGUS:74751)
    • Nagios: No probes being sent to prod-lfc-lhcb.ro (GGUS:74775)

  • T1 sites:
    • GridKa: Problems observed for jobs with many input files when accessing storage via protocol; mostly user jobs are affected.
    • SARA: (GGUS:74875) Staging requests for SARA-RAW are not responding. Solved.
    • IN2P3: (GGUS:74961) Cannot get pilot status at CREAM CEs. Solved.

4 October 2011 (Tuesday)

Experiment activities:

  • Experiment activities
    • Reconstruction and stripping at CERN only
    • Reprocessing at all T1 sites and a few T2 sites

  • T0
    • Castor: Problems with access to disk pools via xrootd protocol (GGUS:74751)
    • Nagios: No probes being sent to prod-lfc-lhcb.ro (GGUS:74775)

  • T1 sites:
    • GridKa: Problems observed for jobs with many input files when accessing storage via protocol; mostly user jobs are affected.
    • SARA: (GGUS:74875) Staging requests for SARA-RAW are not responding.

3 October 2011 (Monday)

Experiment activities:

  • Experiment activities
    • Reconstruction and stripping at CERN only
    • Reprocessing at all T1 sites and a few T2 sites

  • T0
    • Castor: Problems with access to disk pools via xrootd protocol (GGUS:74751), waiting for user reply
    • Nagios: No probes being sent to prod-lfc-lhcb.ro (GGUS:74775)

  • T1 sites:
    • IN2P3: Increased number of stalled jobs (GGUS:74733); the upgrade of the CREAM CE has fixed the problem.
    • GridKa: Problems observed for jobs with many input files when accessing storage via protocol; mostly user jobs are affected.
    • GridKa: Missing results from Nagios tests for the CE; fixed yesterday around 8 PM.
    • SARA: (GGUS:74875) Staging requests for SARA-RAW are not responding.
  • T2 sites:

-- JoelClosier - 08-Nov-2011

 