Week of 060814

Open Actions from last week: Harry to organise conference phone

Chair: Harry

Gmod: Judit Novak

Smod: Thorsten


Log: Nothing

New Actions: Understand why there was no Gridview data overnight while Lemon shows the export continuing. The site distribution also seems strange: Budapest, Nicpb and Citcms.




Discussion: The Gridview team reported that the Gridview archiver died and could not restart because it could not delete the old PID file. They also use the GOCDB sites database. Nicpb is in Estonia and Citcms is at Caltech; both are CMS sites. CMS report they are starting the transfer back to CERN of 30 TB of MC events and ask us to keep the *-CERN FTS channel going.
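For reference, the stale PID file failure mode can be sketched as below. This is only an illustration under assumptions: the minutes do not say how the Gridview archiver manages its PID file, and the function names and return values here are hypothetical. The sketch removes a PID file only when no process with the recorded PID exists, and lets the OSError propagate if the file cannot be deleted, so that a permissions problem shows up at startup.

```python
import errno
import os


def pid_is_running(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks for existence, nothing is sent
    except OSError as exc:
        # EPERM: the process exists but belongs to another user
        return exc.errno == errno.EPERM
    return True


def clear_stale_pid_file(path):
    """Remove the PID file at `path` if its process is gone.

    Returns "absent", "running" or "removed" (hypothetical labels).
    Raises OSError if the stale file cannot be deleted.
    """
    try:
        with open(path) as handle:
            text = handle.read().strip()
    except FileNotFoundError:
        return "absent"
    if text.isdigit() and int(text) > 0 and pid_is_running(int(text)):
        return "running"
    os.remove(path)  # unreadable content or dead process: treat as stale
    return "removed"
```

A start-up script could call clear_stale_pid_file() before launching the daemon and refuse to start only when the result is "running".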




Discussion: Many no-contact alarms. Database services affected.



Actions: Create new FTS channels for the top CMS T2 sites (list coming from M.Ernst). Tune gridftp for transfers to CERN from an Italian (Bari?) CMS T2 site and compare with an untuned site.

Discussion: The no-contact alarms were due to a primary DNS failure (although the machine was still pingable). There is now a fix for VOMRS, which hits the Oracle 10gR2 cursor bug. Will try to look for corruption before restarting the service. M.Ernst reported problems with T2 MC FTS file transfers to CERN. Individual failures clog up the *-CERN channel and they also see frequent poor gridftp performance on individual transfers. See actions.


Log: It was confirmed that LFC suffered the same data corruption as VOMRS from the same Oracle bug. Nilo reported that setting the cache size to zero was not a guaranteed workaround but that using fully qualified names was.

Actions: James tuned the Legnaro link using Bari as a control and improved peak performance from 4 MB/s to between 15 and 50 MB/s. He will organise new TCP/IP defaults for the next Yaim release. However, up to 30% of transfers still run at only KB/s rates. He found that he could transfer one of these slow files to a CERN DPM at normal speed, and that the CASTOR c2sc4 disk server being used had many open sockets and 4-5 MB/s of background network traffic. To be followed up by Olof's team. Sophie is looking for incorrect entries in the CERN LFCs so we can inform the LHC experiments later today.

Discussion: M.Ernst had sent Gavin a list of 7 sites to have separate FTS channels to CERN. These had already been deployed and started, though there is a large backlog in the *-CERN channel. CMS will monitor whether this setup meets their current need to transfer MC data back to CERN. Judit (gmod) reported that the VOMRS services had restarted. Miguel announced a test instance on lxplus/batch of a new CASTOR client.

