Difference: ProductionOperationsWLCGJune13Reports (1 vs. 2)

Revision 2, 2013-09-02 - JoelClosier

Line: 1 to 1
 
META TOPICPARENT name="ProductionOperationsWLCG2013Reports"

June 2013 Reports

To the main
Added:
>
>

27th June 2013 (Thursday)

  • Incremental stripping campaign in progress and MC productions ongoing
  • T0:
  • T1:
    • GridKa: Ticket (GGUS:95135) opened for slow staging rate. It is the only site lagging behind with the restripping, staging at 5 TB/day (roughly 58 MB/s) since last weekend. Before that, we had a good staging rate of ~250 MB/s for a couple of days.
    • SARA: No more jobs failing with "segmentation violation". (GGUS:95056)
    • RAL: CVMFS problem (Stratum 1 is down at the moment); site contacts are aware and working on a solution. No GGUS ticket submitted yet. MC jobs are affected.
    • CNAF: Ticket (GGUS:95059) closed.
  • T2:
    • IN2P3-CPPM: Recurring problem of aborted pilots (GGUS:94890). The blparser gets stuck and is restarted almost every day.

24th June 2013 (Monday)

  • Incremental stripping campaign in progress and MC productions ongoing
  • T0:
  • T1:
    • GridKa: The server for the major staging pool is out of operation. Staging continues at a reasonable rate, but 30% of jobs fail to access data from the pool (Input Data Resolution).
    • SARA: Jobs fail with "segmentation violation" (GGUS:95056). All jobs failed at worker node v33-39.gina.sara.nl.
    • CERN: SetupProject.sh timeouts around midnight; it now seems to have recovered.
    • CNAF: Pilots failing (GGUS:95059); resolved quickly. Caused by a service that sometimes gets stuck on some WNs.

20th June 2013 (Thursday)

  • Incremental stripping campaign in progress and MC productions ongoing
  • T0:
  • T1:
    • GridKa: Pilots going through GridKa brokers fail due to sandbox being lost (GGUS:94462)
    • GridKa: Jobs at GridKa failing due to local directory problems (GGUS:94471). This affects many jobs at GridKa, and merging jobs with large numbers of input files have a very low success rate.
    • RAL: Problem starting jobs. Seems to have gone away for now (starting this morning). Internal tickets opened.
  • Other: Web Server overloading significantly alleviated with 3rd web frontend (GGUS:94824). Affects jobs at many sites, some more than others.

17th June 2013 (Monday)

  • Incremental stripping campaign in progress and MC productions ongoing
  • T0:
  • T1:
    • GridKa: IDLE pilots preventing new jobs from starting (GGUS:94898). Ticket opened yesterday and escalated to ALARM this morning. Solved now, possibly due to a black hole node.
    • GridKa: Pilots going through GridKa brokers fail due to sandbox being lost (GGUS:94462)
    • GridKa: Jobs at GridKa failing due to local directory problems (GGUS:94471). This affects many jobs at GridKa, and merging jobs with large numbers of input files have a very low success rate.
  • Other: Web Server overloading continues (GGUS:94824), possibly now due to the AFS server underneath. Under investigation; affects jobs at many sites, some more than others.

13th June 2013 (Thursday)

  • Incremental stripping campaign in progress and MC productions ongoing
  • T0:
  • T1:
    • IN2P3: Alarm ticket (GGUS:94810) "SE return wrong tURL". Problem was fixed very quickly.
  • Other: Web Server overloaded (GGUS:94824), under investigation.

10th June 2013 (Monday)

  • Incremental stripping campaign in progress and MC productions ongoing
  • T0:
  • T1:
    • GridKa: Problem with staging during the weekend; solved.

6th June 2013 (Thursday)

  • Incremental stripping campaign in progress and MC productions ongoing
  • T0:
  • T1:

3rd June 2013 (Monday)

  • Incremental stripping campaign in progress and MC productions ongoing
  • T0:
  • T1/T2: many sites have queues with "999,999,999" MaxCPU time

Revision 1, 2013-03-04 - JoelClosier

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="ProductionOperationsWLCG2013Reports"

June 2013 Reports

To the main
 