January 2014 Reports
30 Jan 2014 (Thursday)
- Mostly simulation and user jobs. Smooth running over most of the grid.
- T0: NTR
- T1: Brief problem at SARA on 28th when two rogue worker nodes caused a lot of jobs to fail (GGUS:100576, GGUS:100577). Fixed quickly.
- T2: Failed pilots at ARAGRID-CIENCIAS (Spain - GGUS:100625).
27 Jan 2014 (Monday)
- Mostly simulation and user jobs. Smooth running over most of the grid.
- T0: NTR
- T1: Brief scheduled downtime of IN2P3 for "node reconfiguration"
- T2: Downtime of CBPF (Brazil) due to a power cut. Admins are still trying to bring up services there.
23 Jan 2014 (Thursday)
- Mainly MC, few user jobs.
- T0: Yesterday, all jobs failed at CERN ONLINE: "Failed to upload output data".
- T1: NTR
20 Jan 2014 (Monday)
- Mainly MC jobs (less than 3% with errors), few users.
- T0: Ticket opened on Saturday (https://ggus.eu/ws/ticket_info.php?ticket=100368) concerning MC errors: ":[Errno 28] No space left on device". Problem has been identified and a fix is ongoing.
- T1: NTR
- T2:
- CBPF (T2-D) SEs banned. FTS3 was stuck due to this site's SE issues.
16 Jan 2014 (Thursday)
- Only MC and user jobs
- T0: NTR
- T1: NTR
- T2:
- Problems with transfers to CBPF (FTS3) partially understood (gridftp not returning performance markers -> disabled), but transfers longer than ~3650 seconds are still failing, although the timeout is 7200 s.
13 Jan 2014 (Monday)
- Heavy ion reprocessing completed except for 2 files (events too complex, timing out)
- MC and user jobs only: no tape recalls required, only disk access for user jobs and upload from MC jobs and MC-merging jobs
- T0:
- T1:
- SRM instability problems at IN2P3. Downtime over, but SE still under scrutiny
- T2:
- Problems with CBPF storage: number of concurrent transfers limited to 4
9 Jan 2014 (Thursday)
- Main activities are Monte Carlo and User jobs
- T0:
- Move to new SRM (SHA2 enabled) was not successful; switched back to the previous version.
- T1:
6 Jan 2014 (Monday)
- reprocessing of ProtonIon collisions almost finished (GRIDKA & CERN)
- At other sites main activities are simulation & user jobs
- T0:
- T1:
- GRIDKA: problems with staging of files, issue resolved after vendor intervention (GGUS:99972)
--
JoelClosier - 31 Mar 2014
Topic revision: r1 - 2014-03-31 - JoelClosier