
IN2P3 Progress Log

Thursday 28 July 2005

10:00am

pool_hpsswrite4 is now dedicated to Tier2 transfers. Transfers are impossible with FTS, so they are done with "srmcp". Started 3 jobs of 100 files of 1 GB each.
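
As a rough size check on the srmcp jobs above, here is a minimal sketch of the total volume and the time it would take at a given sustained rate. The rate is an assumption: 25MB/s is borrowed from the CERN-IN2P3 PhEDEx figure logged on 22 July and was not measured for the Tier2 transfers. 1 GB is taken as 1000 MB, and the function names are illustrative only.

```python
def total_volume_gb(jobs, files_per_job, file_size_gb):
    """Total data volume in GB for a batch of identical transfer jobs."""
    return jobs * files_per_job * file_size_gb


def transfer_hours(volume_gb, rate_mb_per_s):
    """Hours needed to move volume_gb at a sustained rate (1 GB = 1000 MB)."""
    seconds = volume_gb * 1000 / rate_mb_per_s
    return seconds / 3600


# 3 jobs of 100 files of 1 GB each, as logged above.
volume = total_volume_gb(jobs=3, files_per_job=100, file_size_gb=1)

# Assumed rate, not a measurement for this pool.
hours = transfer_hours(volume, rate_mb_per_s=25)

print(f"{volume} GB in about {hours:.1f} hours at 25 MB/s")
```

Under these assumptions the 300 GB batch would need a little over three hours; a different sustained rate scales the estimate inversely.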

Wednesday 27 July 2005

pool_hpsswrite1 was OFFLINE a few times during the day. This probably happened when the machine was overloaded.

Tuesday 26 July 2005

7:00pm

Reduced "mover max active" from 10 to 8 because pools 1 and 2 seem to be overloaded.

2:45pm

pool_hpsswrite1 OFFLINE. Restarted.

2:00pm

SRM server hung: timeout on all admin commands. Restarted. Still OFFLINE for dCache, so restarted all services.

12:30pm

pool_hpsswrite1 OFFLINE. Restarted.

9:30am

pool_hpsswrite1 OFFLINE. Restarted.

Monday 25 July 2005

6:00pm

pool_hpsswrite1 OFFLINE. Restarted.

1:50pm

pool_hpsswrite2 OFFLINE. Restarted.

11:00am

One disk crashed at 4:00am on ccxfer03. Replaced.

Sunday 24 July 2005

2:00pm

Some transfers were stuck. Killed the client processes and reduced "mover max active" to 10 on all dCache pools.

9:15am

CMS stopped PhEDEx transfers, so I started some more srmcp transfers: now 15 concurrent files.

Saturday 23 July 2005

3:30pm

pool_hpsswrite2 OFFLINE again. Restarted.

12:40pm

pool_hpsswrite2 on ccxfer02 lost its connection with dCache. Restarted.

8:30am

Running srmcp transfers from castorgridsc: 8 concurrent files on 10 streams.

Friday 22 July 2005

6:50pm

FTS transfers still impossible. Channel is closed for the weekend.

11:30am

CMS successfully transferred all night at 25MB/s from castorgrid to tapes (via PhEDEx): 5 concurrent big files (2GB) on 10 streams.
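
To put the PhEDEx figures above in perspective, here is a minimal sketch of the per-file and per-stream rates they imply. It assumes that "10 streams" means 10 streams per file and that the 25MB/s aggregate is split evenly across files and streams; neither is stated in this log, and the function name is illustrative only.

```python
def stream_stats(concurrent_files, streams_per_file, total_rate_mb_s):
    """Split an aggregate rate evenly over files and streams (an assumption)."""
    total_streams = concurrent_files * streams_per_file
    return {
        "total_streams": total_streams,
        "rate_per_file_mb_s": total_rate_mb_s / concurrent_files,
        "rate_per_stream_mb_s": total_rate_mb_s / total_streams,
    }


# 5 concurrent files, 10 streams (assumed per file), 25 MB/s aggregate.
stats = stream_stats(concurrent_files=5, streams_per_file=10, total_rate_mb_s=25)

for name, value in stats.items():
    print(f"{name}: {value}")
```

Under these assumptions each file moves at about 5 MB/s over 50 streams in total, so a single stream carries roughly 0.5 MB/s.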

Channel re-opened for tests but FTS transfers still fail.

Thursday 21 July 2005

2:30pm

Channel CERN-IN2P3 now closed until this problem is fixed (and possibly understood...)

1:00pm

After restarting all dCache services and the SRM database, all FTS transfers still fail with the "getRequestStatus timed out" error. "srmcp" transfers work fine. Investigating and waiting for dCache support help...

12:00am

SRM was very slow, which led to "getRequestStatus timed out" errors in the FTS logs, so SRM was restarted.

Wednesday 20 July 2005

4:00pm

Tried:
  • Max number of active movers on each dCache pool = 25.
  • Number of files in the FTS Channel = 25.
=> All transfers went to a single pool and the machine seemed overloaded. Throughput fell to 20MB/s. Restored the previous values.

10:00am

Tried to increase the throughput (current is ~40MB/s):
  • Increased the max number of active movers on each dCache pool from 8 to 20.
  • Increased the number of files in the FTS Channel from 10 to 18 (glite-transfer-channel-set -f 18 CERN-IN2P3)
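
The channel change above could be scripted along these lines. This is a minimal sketch: only the "-f" option and the CERN-IN2P3 channel name come from this log, the helper name is hypothetical, and the command is built and printed rather than executed, since it assumes the glite-transfer-channel-set client is installed on the host.

```python
import shlex


def channel_set_files_cmd(channel, n_files):
    """Build the argv list that sets the number of files on an FTS channel."""
    if n_files < 1:
        raise ValueError("n_files must be at least 1")
    return ["glite-transfer-channel-set", "-f", str(n_files), channel]


# Same values as the log entry above: 18 files on the CERN-IN2P3 channel.
cmd = channel_set_files_cmd("CERN-IN2P3", 18)

# Print instead of running; the client may not be available here.
print(shlex.join(cmd))
```

On a machine with the FTS client, the list could be passed to subprocess.run(cmd, check=True). Keeping it as a list avoids shell quoting issues.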

Tuesday 19 July 2005

4:00pm

Disk replaced on ccxfer01.

Monday 18 July 2005

11:30am

Stopped FTS transfers to do some internal tests with "srmcp".

Saturday 16 July 2005

7:00pm

One disk died on ccxfer01. The spare is now in use, with no impact on the transfers.

-- Main.lschwarz - 18 Jul 2005

Topic revision: r18 - 2005-07-28 - unknown
 