January 2013 Reports
31st January 2013 (Thursday)
30th January 2013 (Wednesday)
29th January 2013 (Tuesday)
- Activity continues as last week. Nothing to report (NTR).
28th January 2013 (Monday)
- Activity continues as last week. Very intense activity: reprocessing, prompt-processing, MC and user jobs. We are hitting our limits in running jobs.
- T0:
- T1:
25th January 2013 (Friday)
- Activity continues as yesterday. Very intense activity: reprocessing, prompt-processing, MC and user jobs. We are hitting our limits in running jobs.
- T0:
- LFC upgrade to EMI2: an issue related to the Oracle version is under investigation.
- Transferring job output from the pit to EOS: still under investigation.
- INC:226288
- After some tuning the throughput increased. Whether it is sufficient remains to be evaluated.
- T1:
24th January 2013 (Thursday)
- Very intense activity: reprocessing, prompt-processing, MC and user jobs. We are hitting our limits in running jobs.
- T0:
- LFC upgrade to EMI2 ongoing
- Problem transferring job output from the pit to EOS. We are able to ping the EOS system, but transfers fail.
- INC:226288
- Started tests for data replication from Castor to EOS. The throughput is not as good as expected.
- T1:
- RAL: some jobs are failing because they cannot set up the environment (the setup script times out). The problem is not confined to particular WNs. Not a major issue. We are already in touch with the contact person, and they are investigating.
23rd January 2013 (Wednesday)
- 2011 data reprocessing started yesterday and was extended this morning. The first jobs have already gone through all the steps.
- Share of CPU resources changed:
- 80% of reprocessing performed at T1s and the rest at CERN
- Processing of current pA data taking shared equally between GRIDKA and CERN
- T0:
- LFC upgrade to EMI2 tomorrow
- GGUS:90752
- Jobs are failing because the script used to set up the environment is failing. It seems to be a timeout problem connected with CVMFS. We tried to log in to the affected WNs but did not manage to.
22nd January 2013 (Tuesday)
- Mostly Simulation jobs on all Tier levels
- 2011 data reprocessing to be started today/tomorrow
- T0: CERN : LFC upgrade on Thursday
21st January 2013 (Monday)
- Mostly Simulation jobs on all Tier levels
- 2011 data reprocessing to be started today/tomorrow
- T0:
- VOMS failed over the weekend and many grid services were affected (no GGUS ticket, because a certificate is needed for ticket submission). Many thanks to IT/PES for the prompt resolution after we called the operator, who sent an SMS to the support team.
- After the change of the GRIDKA SRM endpoint, the monitoring (SLS, SUM) has also been updated accordingly.
- T1:
- GRIDKA: transfers need to be watched. Very little data is currently being transferred, and there are no errors. pA data will be replicated to GRIDKA this week, which will be a good test for FTS.
18th January 2013 (Friday)
- Mostly Simulation jobs on all Tier levels
- This week restart prompt processing at CERN and 2011 data reprocessing at T1 sites + attached T2s
- T1:
- GRIDKA: After a few fixes LHCb started transfers. The error level is lower than before, and we no longer have SRM errors.
17th January 2013 (Thursday)
- Mostly Simulation jobs on all Tier levels
- This week restart prompt processing at CERN and 2011 data reprocessing at T1 sites + attached T2s
- T1:
- GRIDKA: timeouts of transfers to Gridka Disk storage under investigation (GGUS:88425, GGUS:88906). The downtime for the separation of LHCb from the gridka-dcache.fzk.de SE will hopefully solve this problem. The downtime has finished, and we are trying to use the new SRM endpoint.
16th January 2013 (Wednesday)
- Mostly Simulation jobs on all Tier levels
- This week restart prompt processing at CERN and 2011 data reprocessing at T1 sites + attached T2s
- T0:
- CASTOR problem: Possible unavailability detected in c2lhcb/lhcbdisk (Wed Jan 16 08:50:58 2013) 2 nodes seem to be unavailable
- T1:
- GRIDKA: timeouts of transfers to Gridka Disk storage under investigation (GGUS:88425, GGUS:88906). The downtime for the separation of LHCb from the gridka-dcache.fzk.de SE will hopefully solve this problem.
15th January 2013 (Tuesday)
- Mostly Simulation jobs on all Tier levels
- This week restart prompt processing at CERN and 2011 data reprocessing at T1 sites + attached T2s
- T0:
- VOMS not reachable from outside CERN (GGUS:90295): Solved
- T1:
- GRIDKA: timeouts of transfers to Gridka Disk storage under investigation (GGUS:88425, GGUS:88906). The downtime for the separation of LHCb from the gridka-dcache.fzk.de SE will hopefully solve this problem.
14th January 2013 (Monday)
- Mostly Simulation jobs on all Tier levels
- This week restart prompt processing at CERN and 2011 data reprocessing at T1 sites + attached T2s
11th January 2013 (Friday)
- Mostly Simulation jobs on all Tier levels
- Next week restart prompt processing at CERN and 2011 data reprocessing at T1 sites + attached T2s (validation on the week-end)
- T0:
- T1:
- CNAF: 11 RAW files are missing from storage. How this happened is currently being investigated.
- GRIDKA: timeouts of transfers to Gridka Disk storage under investigation (GGUS:88425, GGUS:88906)
10th January 2013 (Thursday)
- Mostly Simulation jobs on all Tier levels
- Next week restart prompt processing at CERN and 2011 data reprocessing at T1 sites + attached T2s
- T0:
- T1:
- CNAF: DT is over for LHCb, all SEs unbanned
- GRIDKA: timeouts of transfers to Gridka Disk storage under investigation (GGUS:88425)
9th January 2013 (Wednesday)
- Mostly Simulation jobs on all Tier levels
- Next week restart prompt processing at CERN and 2011 data reprocessing at T1 sites + attached T2s
8th January 2013 (Tuesday)
- Mostly Simulation jobs on all Tier levels
7th January 2013 (Monday)
- Mostly Simulation jobs on all Tier levels
- T0:
- SAM jobs to all sites were failing because of a certificate problem. Solved as of lunchtime.
- T1: