29th October (Thursday)

  • Data Processing:
    • Data processing of pp data at T0/1/2 sites. Some T2 attached to T1 in order to speed up the processing.
    • Monte Carlo mostly at T2, user analysis at T0/1/2D sites
    • Processing pAr data

  • T0
    • Problem with CREAM CEs starting late yesterday. (GGUS:117263)
    • Problem with FTS. (GGUS:117206) - solved yesterday, but GGUS ticket not updated.

  • T1
    • Low level of upload failures at RAL - being followed up with the site.
    • SARA SRM now down - GGUS:116939
    • RRCKI problems with tape system - GGUS:117267. Seems to be recurrent at the site.

  • AOB
    • Tickets, especially at CERN, are not being explained / closed.

26th October (Monday)

  • Data Processing:
    • Data processing of pp data at T0/1/2 sites. Some T2 attached to T1 in order to speed up the processing.
    • Monte Carlo mostly at T2, user analysis at T0/1/2D sites
    • Stable number of running jobs for processing of data at T0/1.
    • Data from pHe fully processed.

  • T0
    • Transfer to EOS stable now.

  • T1
    • Low level of upload failures at RAL - being followed up with the site. Also one CE down at RAL (GGUS:117171) due to hypervisor problems.
    • SARA SRM seems to be under load, possibly related to GGUS:116939. Wait and see.

  • AOB
    • Various DB interventions at CERN were announced to specific individuals only. It would be useful if they were sent to a mailing list (lhcb-geoc) or announced properly as a WLCG service.

22nd October (Thursday)

  • Data Processing:
    • Data processing of pp data at T0/1/2 sites. Some T2 attached to T1 in order to speed up the processing.
    • Monte Carlo mostly at T2, user analysis at T0/1/2D sites
    • Stable number of running jobs for processing of data at T0/1.
    • Data from pHe fully processed.

  • T0
    • Transfer to EOS is now OK. No more failures observed since Monday.
    • Problems related to the transfer of a large number of small files from the pit have been solved by putting in place a merging procedure before transfer. It has been working since yesterday and no obvious problem has been observed so far.
    • A ticket regarding connections to fts407 hanging was submitted just after lunch (GGUS:117128).

  • T1
    • A few raw files were lost at RAL because a disk server went down during the weekend. Files are being re-replicated.

19th October (Monday)

  • Data Processing:
    • Data processing of pp data at T0/1/ sites, Monte Carlo mostly at T2, user analysis at T0/1/2D sites
    • Stable number of running jobs for processing of data at T0/1
    • Data from pHe fully processed.

  • T0
    • Ticket about stalled FTS409: solved after reporting some logs (GGUS:116965)
    • Tickets about aborted SRM at CERN (GGUS:116897 and GGUS:116877) should be closed. Waiting for some more data transfers from the pit before confirming the problem is solved.
    • Alarm ticket GGUS:116973 was opened on Sunday because of a problem with EOS

  • T1
    • SARA storage is in scheduled downtime

  • T2
    • FTS transfers failing at UKI-SOUTHGRID-RALPP (GGUS:116995). Investigation ongoing.

15th October (Thursday)

  • Data Processing:
    • Data processing of pp data at T0/1/ sites, Monte Carlo mostly at T2, user analysis at T0/1/2D sites
    • Data processing ramp up at T0/1

  • T0
    • Ticket about aborted SRM at CERN: can be closed for us, but left open presumably for further investigations (GGUS:116897)
    • Ticket about aborted SRM at CERN: LHCb EOS down - closed (GGUS:116877)

  • T1
    • Ticket about aborted transfers to SARA (GGUS:116939)

12th October (Monday)

  • Data Processing:
    • Data processing of pp data at T0/1/ sites, Monte Carlo mostly at T2, user analysis at T0/1/2D sites
    • Data processing stopped for Full Stream and Turbo Calibration due to conditions distribution problem - ready to restart

  • DTs:
    • OUTAGE DTs for CERN and RAL tomorrow, overlapping: can this be avoided?

8th October (Thursday)

  • Data Processing:
    • Data processing of pp data at T0/1/ sites, Monte Carlo mostly at T2, user analysis at T0/1/2D sites
    • Data processing stopped for Full Stream and Turbo Calibration due to conditions distribution problem

  • T0
    • Ticket about aborted pilots at CERN after BLAH-related timeouts (GGUS:116795)

  • T1
    • Recovered from RAL storage DB downtime on Tuesday (thank you!)
    • SARA ticket about aborted pilots due to PBS authorization (GGUS:116797)

5th October (Monday)

  • Data Processing:
    • Data processing of pp data at T0/1/ sites, Monte Carlo mostly at T2, user analysis at T0/1/2D sites

A central LHCb cvmfs problem has been tracked down and fixed. It affected some jobs at all sites, as new software and conditions versions in October could not be pushed out.

  • T0
    • Failed EOS transfers (GGUS:116608) appear to be due to wider cvmfs problems (CRLs?)

  • T1
    • RAL DB downtime tomorrow.

1st October (Thursday)

  • Data Processing:
    • Data processing of pp data at T0/1 sites, Monte Carlo mostly at T2, user analysis at T0/1/2D sites

  • T0
    • Failed transfers to CERN-RAW re-surfaced on Sunday (GGUS:116321)
    • Pilots were observed to get stuck in the CREAM CEs of LSF without being submitted to batch (GGUS:116473). Back to normal, ticket closed.

  • T1
    • CNAF: scheduled downtime for storage Mon/Tue
    • GRIDKA: scheduled site outage Tue - Thu (draining by LHCb Mon evening)
    • PIC: scheduled batch & storage downtime Wed (draining started by site 24 h before DT)

-- JoelClosier - 2016-01-07
