January 2013 Reports


31st January 2013 (Thursday)

  • Nothing new to report

30th January 2013 (Wednesday)

  • Nothing new to report

29th January 2013 (Tuesday)

  • Activity continues as last week. Nothing to report

28th January 2013 (Monday)

  • Activity is going as last week. Very intense activity: reprocessing, prompt-processing, MC and user jobs. We are hitting our limits in running jobs.
  • T0:
  • T1:

25th January 2013 (Friday)

  • Activity is going as yesterday. Very intense activity: reprocessing, prompt-processing, MC and user jobs. We are hitting our limits in running jobs.
  • T0:
    • LFC upgrade to EMI2 -> some issue related to Oracle version. Investigating
    • Transferring job output from the pit to EOS -> still investigating.
    • INC:226288 - After some tuning the throughput increased. Still to evaluate if it is enough.
  • T1:

24th January 2013 (Thursday)

  • Very intense activity: reprocessing, prompt-processing, MC and user jobs. We are hitting our limits in running jobs.

  • T0:
    • LFC upgrade to EMI2 ongoing
    • Problem in transferring job output from the pit to EOS. We are able to ping the EOS system but the transfers fail.
    • INC:226288 - Started tests for data replication from Castor to EOS. The throughput is not as good as expected.
  • T1:
    • RAL: some jobs failing because they are not able to set up the environment (timeout of the script). The problem is not specific to particular WNs. Not a major issue. Already in touch with the contact person. They are investigating.

23rd January 2013 (Wednesday)

  • 2011 data reprocessing started yesterday and was extended this morning. The first jobs have already gone through all the steps
  • Share of CPU resources changed:
    • 80% of reprocessing performed at T1s and the rest at CERN
    • Processing of current pA data taking shared equally between GRIDKA and CERN

  • T0:
    • LFC upgrade to EMI2 tomorrow
    • GGUS:90752 - Jobs failing because the script used to set up the environment is failing. It seems to be a timeout problem connected with CVMFS. We tried to log in to the affected WNs but did not manage to.

  • T1:

22nd January 2013 (Tuesday)

  • Mostly Simulation jobs on all Tier levels
  • 2011 data reprocessing to be started today/tomorrow

  • T0: CERN : LFC upgrade on Thursday

  • T1:

21st January 2013 (Monday)

  • Mostly Simulation jobs on all Tier levels
  • 2011 data reprocessing to be started today/tomorrow

  • T0:
    • VOMS failed over the weekend and many grid services were affected (no GGUS ticket because a certificate is needed for ticket submission). Many thanks to IT/PES for the prompt resolution after we called the operator, who sent an SMS to the support team.
    • After the change of the GRIDKA SRM endpoint, the monitoring (SLS, SUM) has also been updated accordingly

  • T1:
    • GRIDKA: transfers to be watched, as very little data is currently being transferred, and there are no errors either. pA data will be replicated to GRIDKA this week, which will be a good test for FTS.

18th January 2013 (Friday)

  • Mostly Simulation jobs on all Tier levels
  • This week restart prompt processing at CERN and 2011 data reprocessing at T1 sites + attached T2s

  • T0:

  • T1:
    • GRIDKA: After a few fixes LHCb started transfers. The error level is lower than before, and we no longer have SRM errors

17th January 2013 (Thursday)

  • Mostly Simulation jobs on all Tier levels
  • This week restart prompt processing at CERN and 2011 data reprocessing at T1 sites + attached T2s

  • T0:

  • T1:
    • GRIDKA: timeouts of transfers to GRIDKA disk storage under investigation (GGUS:88425, GGUS:88906). The downtime for the separation of LHCb from the gridka-dcache.fzk.de SE will hopefully solve this problem. The downtime has finished and we are trying to use the new SRM endpoint.

16th January 2013 (Wednesday)

  • Mostly Simulation jobs on all Tier levels
  • This week restart prompt processing at CERN and 2011 data reprocessing at T1 sites + attached T2s

  • T0:
    • CASTOR problem: Possible unavailability detected in c2lhcb/lhcbdisk (Wed Jan 16 08:50:58 2013) 2 nodes seem to be unavailable
  • T1:
    • GRIDKA: timeouts of transfers to GRIDKA disk storage under investigation (GGUS:88425, GGUS:88906). The downtime for the separation of LHCb from the gridka-dcache.fzk.de SE will hopefully solve this problem

15th January 2013 (Tuesday)

  • Mostly Simulation jobs on all Tier levels
  • This week restart prompt processing at CERN and 2011 data reprocessing at T1 sites + attached T2s

  • T0:
    • VOMS not reachable from outside CERN (GGUS:90295): Solved
  • T1:
    • GRIDKA: timeouts of transfers to GRIDKA disk storage under investigation (GGUS:88425, GGUS:88906). The downtime for the separation of LHCb from the gridka-dcache.fzk.de SE will hopefully solve this problem

14th January 2013 (Monday)

  • Mostly Simulation jobs on all Tier levels
  • This week restart prompt processing at CERN and 2011 data reprocessing at T1 sites + attached T2s

  • T0:
    • VOMS not reachable from outside CERN (GGUS:90295)
  • T1:

11th January 2013 (Friday)

  • Mostly Simulation jobs on all Tier levels
  • Next week restart prompt processing at CERN and 2011 data reprocessing at T1 sites + attached T2s (validation on the week-end)

  • T0:
    • VOMS not reachable from outside CERN (GGUS:90295)
  • T1:
    • CNAF: 11 RAW files missing from storage; how this happened is currently being investigated.
    • GRIDKA: timeouts of transfers to Gridka Disk storage under investigation (GGUS:88425, GGUS:88906)

10th January 2013 (Thursday)

  • Mostly Simulation jobs on all Tier levels
  • Next week restart prompt processing at CERN and 2011 data reprocessing at T1 sites + attached T2s

  • T0:
  • T1:
    • CNAF: DT is over for LHCb, all SEs unbanned
    • GRIDKA: timeouts of transfers to Gridka Disk storage under investigation (GGUS:88425)

9th January 2013 (Wednesday)

  • Mostly Simulation jobs on all Tier levels
  • Next week restart prompt processing at CERN and 2011 data reprocessing at T1 sites + attached T2s

  • T0:
  • T1:

8th January 2013 (Tuesday)

  • Mostly Simulation jobs on all Tier levels

  • T0:
  • T1:

7th January 2013 (Monday)

  • Mostly Simulation jobs on all Tier levels

  • T0:
    • SAM jobs to all sites failing because of certificate problem, solved since lunch time
  • T1:


Topic revision: r1 - 2013-03-04 - JoelClosier