May 2011 Reports

31st May 2011 (Tuesday)

Experiment activities:

  • A lot of data. Processing and reprocessing are running.

New GGUS (or RT) tickets:

  • T0: 0
  • T1: 1
  • T2: 0
Issues at the sites and services

  • T0
  • T1
    • GRIDKA CREAMCE (GGUS:70835) (On Hold)
    • IN2P3 LFC RO Mirror (Waiting update LFC at CERN)
    • IN2P3 pilots aborted at cccreamceli02 GGUS:71077 (Fixed)
    • SARA

  • T2

30th May 2011 (Monday)

Experiment activities:

  • A lot of data. Processing and reprocessing are running. Certification of new Dirac version.

New GGUS (or RT) tickets:

  • T0: 0
  • T1: 1
  • T2: 0
Issues at the sites and services

  • T0
  • T1
  • T2

27th May 2011 (Friday)

Experiment activities:

  • Waiting for beam, data. Processing and reprocessing are running. Certification of new Dirac version.

New GGUS (or RT) tickets:

  • T0: 0
  • T1: 0
  • T2: 0
Issues at the sites and services

26th May 2011 (Thursday)

Experiment activities:

  • No beam - no data. Processing and reprocessing are running. Certification of new Dirac version. TransferAgent got stuck again, but all files from the backlog have now been transferred.

New GGUS (or RT) tickets:

  • T0: 0
  • T1: 0
  • T2: 0
Issues at the sites and services

  • T0
  • T1
    • GRIDKA CREAMCE (GGUS:70835)
    • IN2P3 LFC RO Mirror is down; as a result, most MC jobs from French sites failed to upload output data files.
  • T2

25th May 2011 (Wednesday)

Experiment activities:

  • Data Taking is active. Processing and reprocessing are running. Certification of new Dirac version. TransferAgent got stuck last night; as a result, no data was transferred from the pit to Castor (under investigation).

New GGUS (or RT) tickets:

  • T0: 0
  • T1: 0
  • T2: 0
Issues at the sites and services

24th May 2011 (Tuesday)

Experiment activities:

  • Data Taking is active. Processing and reprocessing are running.

New GGUS (or RT) tickets:

  • T0: 0
  • T1: 1
  • T2: 0
Issues at the sites and services

23rd May 2011 (Monday)

Experiment activities:

  • Data Taking is active. Processing and reprocessing are running.

New GGUS (or RT) tickets:

  • T0: 0
  • T1: 1
  • T2: 0
Issues at the sites and services

  • T0
  • T1
  • T2

20th May 2011 (Friday)

Experiment activities:

  • Data Taking is active again. Validation of a new reconstruction is OK.

New GGUS (or RT) tickets:

  • T0: 0
  • T1: 4
  • T2: 0
Issues at the sites and services

  • T0
  • T1
    • CERN : CE problems: ce130.cern.ch (GGUS:70748); ce203 and ce207 (GGUS:70730)
    • CNAF : gridmap file problem. Yesterday at 12:35 CEST the mapping of Ricardo suddenly changed at CNAF from pillhcb003 to pillhcb027. I don't know the reason; maybe the hard links in /etc/grid-security/gridmapdir were deleted (for all the experiments, not only LHCb). From that time on, the gram job state files of all running jobs (owned by pillhcb003) could no longer be managed by pillhcb027, and this caused the aborts.
For any new job there should be no problem. What remains now is to understand why this happened... but this is another story.
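
The CNAF report above suspects that hard links in the gridmapdir were deleted. As a rough illustration only (not a tool used by the site), the sketch below groups the entries of a directory by inode, so that entries sharing an inode (hard links, i.e. leased pool accounts under the usual gridmapdir convention) can be told apart from entries that stand alone. The function name `find_leases` and the assumption that a lease is a hard link between a DN-named entry and a pool-account entry are ours, not taken from the report.

```python
import os


def find_leases(gridmapdir):
    """Return (leased, unleased) for the entries of `gridmapdir`.

    leased:   sorted list of sorted name lists that share one inode
              (assumed here to be a DN entry hard-linked to a pool account).
    unleased: sorted list of names whose inode has no other entry.
    """
    by_inode = {}
    for name in os.listdir(gridmapdir):
        info = os.lstat(os.path.join(gridmapdir, name))
        # All entries live in one directory, so the inode number identifies the file.
        by_inode.setdefault(info.st_ino, []).append(name)

    leased = sorted(sorted(names) for names in by_inode.values() if len(names) > 1)
    unleased = sorted(names[0] for names in by_inode.values() if len(names) == 1)
    return leased, unleased
```

If the hard links had been removed as suspected, every pool account would show up in the `unleased` list, and the next job from the same user could be mapped to a different account, which matches the pillhcb003 to pillhcb027 change described above.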

19th May 2011 (Thursday)

Experiment activities:

  • Data Taking is active again. Validation of a new reconstruction is OK. The LFC problem has been fixed by changing the timeout and the time between two retries.
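
The LFC fix above amounts to tuning two parameters: a per-call timeout and the wait between retries. A minimal sketch of that pattern follows; it is not the Dirac or LFC client code, and the names `call_with_retries`, `operation`, `timeout`, `retries` and `delay` are hypothetical.

```python
import time


def call_with_retries(operation, timeout=30.0, retries=3, delay=5.0, sleep=time.sleep):
    """Call operation(timeout) up to `retries` times.

    `timeout` is passed through to the operation, which is expected to
    enforce it itself. `delay` is the number of seconds to wait between
    two attempts. The last error is re-raised if every attempt fails.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return operation(timeout)
        except Exception as exc:  # a real client would catch a narrower error type
            last_error = exc
            if attempt < retries:
                sleep(delay)  # no wait after the final attempt
    raise last_error
```

A longer delay gives an overloaded catalogue time to recover between attempts, while a longer timeout avoids abandoning calls that are slow but would still succeed; which of the two mattered here is not stated in the report.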

New GGUS (or RT) tickets:

  • T0: 1
  • T1: 0
  • T2: 0
Issues at the sites and services

  • T0
  • T1
    • SARA : Space token re-organisation done and in place. (There was a problem during the night because the new space tokens were put into production before we modified our Configuration Service.)
  • T2

18th May 2011 (Wednesday)

Experiment activities:

  • Data Taking is active again. Validation of a new reconstruction ongoing. We have a major issue: 9k jobs are in status "checking" / "inputdata resolution" and this number is increasing. We are trying to identify the problem but have not yet found its cause.

New GGUS (or RT) tickets:

  • T0: 1
  • T1: 0
  • T2: 0
Issues at the sites and services

17th May 2011 (Tuesday)

Experiment activities:

  • Data Taking is active again. Validation of a new reconstruction ongoing.

New GGUS (or RT) tickets:

  • T0: 1
  • T1: 0
  • T2: 0
Issues at the sites and services

  • T0
  • T1
  • T2

16th May 2011 (Monday)

Experiment activities:

  • Data Taking is active again. Validation of a new reconstruction

New GGUS (or RT) tickets:

  • T0: 1
  • T1: 0
  • T2: 0
Issues at the sites and services

  • T0
  • T1
  • T2
    • SAM / SE tests failing for GridKa and SARA. Currently investigated by site admins.

13th May 2011 (Friday)

Experiment activities:

  • Technical stop : no data taking

New GGUS (or RT) tickets:

  • T0: 1
  • T1: 0
  • T2: 0
Issues at the sites and services

12th May 2011 (Thursday)

Experiment activities:

  • Technical stop : no data taking

New GGUS (or RT) tickets:

  • T0: 1
  • T1: 0
  • T2: 0
Issues at the sites and services

  • T0
  • T1
    • RAL : diskserver migration finished
    • CERN : diskserver migration on going
  • T2

11th May 2011 (Wednesday)

Experiment activities:

  • Technical stop : no data taking
  • GGUS ticket against GGUS because we were not able to submit a TEAM ticket (GGUS:70459, fixed)

New GGUS (or RT) tickets:

  • T0: 1
  • T1: 0
  • T2: 0
Issues at the sites and services

  • T0
  • T1
    • NIKHEF : jobs killed because of the memory limit. The limit will be increased to 5 GB.
    • IN2P3 : Pilot jobs aborted during the night but now back to normal.
    • RAL : diskserver migration on going
    • CERN : diskserver migration on going
  • T2

10th May 2011 (Tuesday)

Experiment activities:

  • Technical stop : no data taking
  • MC productions on most T1/T2 sites

New GGUS (or RT) tickets:

  • T0: 1
  • T1: 0
  • T2: 0
Issues at the sites and services

  • T0
    • VOMS intervention
    • Replication stream : Oracle intervention
  • T1
    • NIKHEF : jobs killed because of the memory limit. Could the size limit be increased to 5 GB?
    • SARA : SRM intervention
    • PIC : interventions in network equipment and firmware updates
  • T2

9th May 2011 (Monday)

Experiment activities:

  • Technical stop : no data taking
  • MC productions on most T1/T2 sites

New GGUS (or RT) tickets:

  • T0: 1
  • T1: 0
  • T2: 0
Issues at the sites and services

  • T0
  • T1
    • SARA problem with aborted pilots for reconstruction jobs (GGUS:70170).
  • T2

6th May 2011 (Friday)

Experiment activities:

  • RAW reconstruction of current data almost completed, Stripping/Merging jobs in progress.
  • Data removal / archiving postponed because of backlogs in data management processes.
  • MC productions on most T1/T2 sites

New GGUS (or RT) tickets:

  • T0: 1
  • T1: 0
  • T2: 0
Issues at the sites and services

  • T0
  • T1
    • SARA problem with aborted pilots for reconstruction jobs (GGUS:70170).
    • RAL jobs looping over 3 unavailable files (GGUS:70158).
    • RAL space token renaming started yesterday.
  • T2

5th May 2011 (Thursday)

Experiment activities:

  • RAW data distribution and FULL reconstruction are ongoing at most Tier-1s.
  • Cleaning of old data to be started (~ 1/2 PB)
  • A lot of MC continues to run.

New GGUS (or RT) tickets:

  • T0: 1
  • T1: 0
  • T2: 0
Issues at the sites and services

  • T0
  • T1
    • SARA problem with aborted pilots for reconstruction jobs (GGUS:70170).
    • RAL jobs looping over 3 unavailable files (GGUS:70158).
    • RAL has increased the disk pools that were reported low yesterday.
    • Change of space token names has been done at Gridka
    • IN2P3 downtime tonight because of dCache problem
  • T2

4th May 2011 (Wednesday)

Experiment activities:

  • RAW data distribution and FULL reconstruction are ongoing at most Tier-1s.
  • A lot of MC continues to run.

New GGUS (or RT) tickets:

  • T0: 1
  • T1: 0
  • T2: 0
Issues at the sites and services

  • T0
  • T1
    • SARA problem with aborted pilots for reconstruction jobs (GGUS:70170).
    • RAL jobs looping over 3 unavailable files (GGUS:70158).
  • T2

3rd May 2011 (Tuesday)

Experiment activities:

  • RAW data distribution and FULL reconstruction are ongoing at most Tier-1s.
  • A lot of MC continues to run.

New GGUS (or RT) tickets:

  • T0: 1
  • T1: 0
  • T2: 0
Issues at the sites and services

  • T0
  • T1
    • SARA problem with aborted pilots for reconstruction jobs (GGUS:70170).
    • RAL jobs looping over 3 unavailable files (GGUS:70158).
    • PIC : SAM SRM tests working again as of tonight. There was a problem due to changes in space token names.
    • CNAF : The same change of space token names happened. SAM SRM tests changed today.
  • T2

2nd May 2011 (Monday)

Experiment activities:

  • RAW data distribution and FULL reconstruction are ongoing at most Tier-1s.
  • A lot of MC continues to run.

New GGUS (or RT) tickets:

  • T0: 1
  • T1: 0
  • T2: 0
Issues at the sites and services

  • T0
  • T1
    • SARA problem with aborted pilots for reco jobs (GGUS:70170). Problem fixed this morning.
    • RAL jobs looping over 3 unavailable files (GGUS:70158).
  • T2

-- RobertoSantinel - 02-Dec-2010

Topic revision: r2 - 2011-06-07 - unknown