Minutes 28 June 07

  • Phone:
    • FNAL: Eric
    • CNAF: Barbara
    • NDGF: Olli
    • PIC: Luis
    • GRIDKA: Andreas and Silke
    • LHCb: Marco
    • TRIUMF: Denice
    • MICHIGAN:
    • ATLAS: Sasha

  • CERN:
    • 3D: Dirk, Eva, Miguel
    • ATLAS: Gancho
    • FTS: Gavin

  • Apologies: Alexander, Gordon (FTS feedback sent by email), Carlos (storage intervention for July being planned – expected transparent)

  • Sites status:
    • Tier0 - CERN:
      • Recovery exercise completed successfully.
      • Thursday: PIC synchronization failed because the export was not consistent. It was necessary to repeat the export and the import.
      • Thursday evening, around 19:00, the apply process aborted at IN2P3 with "no data found" (problem being investigated by Oracle support). Import required.
        • The data import finished during the weekend.
        • Split capture process in two: original and one new capture process for PIC and IN2P3 resynchronization. Original capture was almost 1 day behind. New capture was enabled on Monday morning: 3 days of backlog to be synchronized.
      • Saturday morning: apply process got stuck at CNAF. Apply process was blocked by another process (also under investigation by Oracle support). The workaround is to bounce the instance. CNAF bounced the instance on Monday afternoon. Capture was 3 days behind.
      • Tuesday evening: network problems at Triumf, database unreachable.
        • Split capture process again: original capture, capture process for PIC and IN2P3 and capture process for Triumf (Wednesday morning).
      • Intervention at GridKA: 27th and 28th, all services affected.
        • Split capture process: original capture, capture process for PIC and IN2P3, capture process for Triumf, and capture process for GridKA.
      • During the splits, the original capture must be recreated and restarted, which adds 2-3 hours of backlog for each intervention.
      • Only 3 capture processes are running at the same time, but CPU consumption is already around 99%, so it will not be feasible to start the 4th capture (with 2 captures, CPU consumption is around 85% - 90%).
      • PIC seems to have problems keeping up with the rate, so its synchronization is very slow (still 4 days behind). This also impacts IN2P3 synchronization because both share the same capture.
    • PIC: Luis will take a look at the apply process. The current storage (8 disk arrays) is not performing well.
    • CNAF: no planned interventions. Scalability tests on the ATLAS replica are scheduled for next week, so high load is expected. Richard is going to run the jobs.
    • GridKA: no additional planned interventions
    • Triumf: the network came back last night.

  • FTS:
    • Presentation by Gavin available on the agenda
    • There are no plans to deploy FTS on the Tier2 sites.
    • Tier1 sites are responsible for file transfers for Tier1 sites and from Tier1 sites to Tier2 sites.

  • Experiments status:
    • ATLAS:
      • Richard has stopped the jobs today.
      • Next week they will move real data from ATLAS online.
      • Gancho will estimate the amount of data to be shipped.
    • LHCb:
      • Nothing new.
      • Marco will restart jobs tomorrow, after the GridKA intervention.

  • CMS: 4 squid servers, Gigabit Ethernet, rate: 400 MB/s


Topic revision: r1 - 2007-07-11 - EvaDafonte
 