November 2011 Reports


25th November 2011 (Friday)

Experiment activities:

New GGUS (or RT) tickets

  • Experiment activities
    • Low activity. Integration of a new framework in the production system
  • T0
    • LFC SLS alarm : some queries are taking longer than expected (perhaps a problem with Oracle RAC). Under investigation with the DB group.
    • Lemon shows 14 concurrent instances while in reality there are 1 or 2. IT is looking at this issue as well.
    • The concurrent query limit was set to 50 in the old setup; it should be set higher today with the new hardware.
  • T2

24th November 2011 (Thursday)

Experiment activities:

New GGUS (or RT) tickets

  • Experiment activities
    • Low activity. Integration of a new framework in the production system
  • T1
    • CERN : alarm about high READ for LFC. After investigation we still do not understand why the plots do not reflect the real activity.
  • T2

23rd November 2011 (Wednesday)

Experiment activities:

New GGUS (or RT) tickets

  • Experiment activities
    • Low activity. Integration of a new framework in the production system
  • T1
    • CERN : alarm about high READ for LFC. Can someone give us an explanation? We have no clue what the issue is.
    • SARA : cvmfs has been running for one day without any issue
    • CNAF : cvmfs has been running for one day with one issue, which has been fixed
  • T2

22nd November 2011 (Tuesday)

Experiment activities:

New GGUS (or RT) tickets

  • Experiment activities
    • Next round of reprocessing starting to tail off. Backlog primarily now at IN2P3 which has most of the last round of data.
    • Still expect to launch next round of MC simulations by end of November.

  • T1
  • T2

21st November 2011 (Monday)

Experiment activities:

New GGUS (or RT) tickets

  • Experiment activities
    • Next round of reprocessing starting to tail off. Backlog primarily now at IN2P3 which has most of the last round of data.
    • Still expect to launch next round of MC simulations by end of November.
    • Waiting for new templates from WLCG to give updated definitions of service criticality for LHCb.

  • T1
    • SARA : SE problem (GGUS:76629) fixed, but is it normal that an alarm ticket took 8 hours to be worked out? This illustrates the discussion at the last T1 coordination meeting on the difference between "response time" (in this case 7 minutes, which is good) and "max downtime" (in this case 13 hours, which is not good at all).
    • dCache : (GGUS:76561) opened for assistance migrating data from one space token to another.

  • T2
    • Shared software area problems at Auver (GGUS:76586)
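
The distinction raised in the SARA item above, between how quickly a ticket is first answered and how long the service stays unavailable, can be sketched as follows. This is only an illustration: the function name `ticket_metrics` and the timestamps are hypothetical, and the placeholder date is chosen solely to reproduce the 7 minute and 13 hour figures quoted in the report, not taken from the actual ticket.

```python
from datetime import datetime, timedelta


def ticket_metrics(opened, first_response, resolved):
    """Return (response_time, downtime) for a single alarm ticket.

    response_time: delay between ticket creation and the first site reply.
    downtime: delay between ticket creation and the fix.
    """
    response_time = first_response - opened
    downtime = resolved - opened
    return response_time, downtime


# Hypothetical timestamps (placeholder date, not from GGUS:76629).
opened = datetime(2000, 1, 1, 0, 0)
first_response = opened + timedelta(minutes=7)
resolved = opened + timedelta(hours=13)

response_time, downtime = ticket_metrics(opened, first_response, resolved)
print("response time:", response_time)
print("downtime:", downtime)
```

A quick first response does not bound the downtime, which is why the two quantities are tracked separately.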

18th November 2011 (Friday)

Experiment activities:

New GGUS (or RT) tickets

  • Experiment activities
    • Next round of reprocessing starting to tail off. Backlog primarily now at IN2P3 which has most of the last round of data.
    • Still expect to launch next round of MC simulations by end of November.
    • Waiting for new templates from WLCG to give updated definitions of service criticality for LHCb.

  • T1
    • CNAF : Job failures probably related to power cut there.
    • SARA : Early signs of job failures to resolve turls on WNs. Waiting to see how the situation develops.
    • dCache : GGUS ticket (76561) opened for assistance migrating data from one space token to another.

  • T2
    • Shared software area problems at WCSS64 (Poland: 76569) and Auver (France: 76586)

17th November 2011 (Thursday)

Experiment activities:

New GGUS (or RT) tickets

  • Experiment activities
    • Next round of reprocessing productions going on.
    • Expect to launch next round of MC simulations by end of November.

  • T0
    • 10 disk servers added to lhcb-tape

16th November 2011 (Wednesday)

Experiment activities:

New GGUS (or RT) tickets

  • Experiment activities
    • Next round of reprocessing productions going on.
    • Expect to launch next round of MC simulations by end of November.

  • T0
    • Hope to have an increase in spindles on d0t1 storage by the end of today to improve processing efficiency.

  • T1
    • IN2P3: (GGUS:76515) Old reprocessing backlog down to ~200 jobs now. Problems with access to conditions DB at IN2P3 - GGUS ticket being submitted.
    • PIC : Many user jobs failing to resolve input data indicating possible srm problem. Failure rate ~25% at present. LHCb PIC contact looking at it. Also ongoing investigations of very high memory consumption by user jobs.

15th November 2011 (Tuesday)

Experiment activities:

New GGUS (or RT) tickets

  • Experiment activities
    • Next round of reprocessing productions have just been launched.

  • T1
    • IN2P3: Slow decrease of the reprocessing backlog (not yet finished).
    • PIC : Many user jobs failing to resolve input data indicating possible srm problem. Failure rate ~25% at present. LHCb PIC contact looking at it

14th November 2011 (Monday)

Experiment activities:

New GGUS (or RT) tickets

  • Experiment activities
    • Finishing the reprocessing of the last-but-one range of data over the weekend
    • Hopefully tomorrow the last reprocessing productions will be launched (new ConDB tag, distributed to the Tier-1s).

  • T0

  • T1
    • IN2P3: Significant improvement after reducing stripping load there. Hope the reprocessing backlog is finished by tomorrow morning.
    • IN2P3 : (GGUS:75158) : migration of files to the correct space token - still ongoing.
    • RAL : (GGUS:76295) : Pilots aborted at RAL across all CEs.
    • PIC : Many user jobs failing to resolve input data indicating possible srm problem - waiting to understand in more detail before following up with site.

11th November 2011 (Friday)

Experiment activities:

New GGUS (or RT) tickets

  • Experiment activities
    • Finishing the reprocessing of the last-but-one range of data over the weekend
    • Beginning of next week the last reprocessing productions will be launched

  • T0

  • T1
    • IN2P3: (GGUS:76248) check possibility of disabling pool to pool replication to increase throughput
    • IN2P3 : (GGUS:75158) : migration of files to the correct space token.
    • GRIDKA : (GGUS:75851) : benchmark problem on some nodes

10th November 2011 (Thursday)

Experiment activities:

New GGUS (or RT) tickets

  • Experiment activities
    • Reprocessing is progressing well
    • Stripping ongoing

  • T0

  • T1
    • IN2P3 : (GGUS:75158) : migration of files to the correct space token.
    • GRIDKA : (GGUS:75851) : benchmark problem on some nodes

9th November 2011 (Wednesday)

Experiment activities:

New GGUS (or RT) tickets

  • Experiment activities
    • Reprocessing is progressing well
    • Stripping ongoing

  • T0

  • T1

8th November 2011 (Tuesday)

Experiment activities:

New GGUS (or RT) tickets

  • Experiment activities
    • Reprocessing is now being ramped up
    • Stripping is continuing at CERN

7th November 2011 (Monday)

Experiment activities:

New GGUS (or RT) tickets

  • Experiment activities
    • Reprocessing is now being ramped up
    • Stripping is continuing at CERN

  • T0

  • T1
    • PIC : (GGUS:76028): FTS transfer problem. Fixed
    • PIC : space problem on the pool behind the LHCb-tape space token.

4th November 2011 (Friday)

Experiment activities:

New GGUS (or RT) tickets

  • Experiment activities
    • Started tests for new reprocessing with 372 files last night
    • Mostly OK, but some failures at IN2P3 due to Conditions DB not being installed on AFS and not all the cluster being on CVMFS
    • Reprocessing is now being ramped up
    • Stripping is continuing at CERN: ~15000 to go at a rate of 3.5K per day

  • T0
    • There still seem to be FTS problems at CERN. Over the last few days there have been no successful transfers to/from CERN. This could be related to an authorisation error we're seeing; see ticket: https://ggus.eu/ws/ticket_info.php?ticket=75936. (Note that I reported this incorrectly yesterday, as I didn't realise the plots I was looking at didn't show failed attempts, only failed transfers!) The problem is fixed, but can we have an incident report, please?
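
As a rough check on the stripping backlog quoted above (~15000 to go at 3.5K per day), the time to clear it can be estimated with a small sketch. The function name `days_to_clear` is hypothetical, and the estimate assumes the rate stays constant, which the report does not guarantee.

```python
import math


def days_to_clear(backlog, rate_per_day):
    """Estimate the days needed to clear a backlog at a constant daily rate."""
    if rate_per_day <= 0:
        raise ValueError("rate_per_day must be positive")
    return backlog / rate_per_day


# Figures taken from the report: ~15000 remaining, ~3500 processed per day.
remaining = days_to_clear(15000, 3500)
print(f"{remaining:.1f} days")                    # 4.3 days
print(f"{math.ceil(remaining)} full days at most")  # 5 full days at most
```

At that rate the CERN stripping would take a little over four days, so about five calendar days if the rate holds.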

3rd November 2011 (Thursday)

Experiment activities:

New GGUS (or RT) tickets

  • Experiment activities
    • Almost all reconstruction and reprocessing jobs now complete
    • We're trying to push the stripping through (large backlog at CERN)
    • New conditions DB tests running today so will hopefully start the next round of reprocessing tonight or tomorrow morning
    • CERN will be kept out of this to try to get the stripping complete

2nd November 2011 (Wednesday)

Experiment activities:

New GGUS (or RT) tickets

  • Experiment activities
    • Prompt reconstruction and stripping at CERN and Tier-1 sites.
    • Last few calibration jobs going through today
    • Last few hundred jobs of 1st reprocessing running now
    • Looking to start 2nd stage reprocessing on Friday (maybe Monday)

  • T1 sites:
    • Update on site inconsistencies: IN2P3 seems to have the only significant problem. Other T1s seem OK. See https://ggus.eu/ws/ticket_info.php?ticket=75158
    • We are exceeding TAPE pledges at some sites and so, as a short-medium term solution we are going to ban ARCHIVE at sites where this is a problem.
    • We are still having occasional issues with respect to staging and job failures at various sites with input data resolution which we're looking into.

1st November 2011 (Tuesday)

(Note: Reported by new GEOC Mark Slater)

Experiment activities:

New GGUS (or RT) tickets

  • Experiment activities
    • Prompt reconstruction and stripping at CERN and Tier-1 sites.
    • Calibration pushed through at CERN over the weekend
    • 1st round of reprocessing at T1 sites and T2 sites almost over (some tail going on - especially at GridKa)
    • Planned reprocessing at the end of this week when calibrations are complete.

-- JoelClosier - 19-Jan-2012

Topic revision: r1 - 2012-01-19 - JoelClosier
 