DC06 Activities

DC06 Aims

Challenge (using the LCG production services):
  • Distribution of RAW data from CERN to the Tier-1s
  • Reconstruction/stripping at the Tier-1s, including CERN
  • DST distribution to CERN and the other Tier-1s

Problems in the last 24 hours

  1. Jobs didn't run at FZK

Outstanding Problems

  1. T0 storage system: files cannot be uploaded to the T0
    1. What has changed in the CERN storage system since last Sunday? Transfer efficiency degraded gradually until the current situation, in which no transfer to castorgrid succeeds. Why is there no way to upload files to CERN through GridFTP? Why do all transfers through castorgrid time out?
    2. All transfers to the T0 are hanging: is the scheduler (or stager) overloaded?
  2. lcg-utils: requested improvements and developments.
    1. A lcg-utils command is needed to upload files to the grid without registering them (either an lcg-cp that accepts a SURL as destination, or an lcg-cr that does not register).
    2. lcg-gt should accept a list of protocols to check, rather than having to be invoked once per protocol.
    3. lcg-cr should provide an additional option to specify the host field when registering a file in the catalog.
    4. lcg-cr should delete the physical replica when registration in the file catalog fails, to keep the file catalog consistent. It should also delete the temporary entry in the FC when the transfer fails or the replica already exists.
    5. lcg-gt should return TURLs that are compatible with ROOT applications. Using the gfal_plugin would avoid the problem, but the current incompatibility should still be fixed.
    6. The dCache client deployed in gLite 3.0.0 has a bug in libgsiTunnel that prevents the use of gsidcap (the only protocol available at some sites). Fixing this requires an LCG AA release with the current library.
    7. The getbestfile method in GFAL has flawed logic when matching on domain names: where the SE and the WNs sit in different domains (e.g. GRIDKA), the names do not match. Going through the .BrokerInfo file and then through the IS would solve this. Reported; to be discussed further with the developers.
    8. General request to make lcg-utils much lighter to deploy, with an officially maintained tarball distribution that could be shipped with LHCb jobs. Under discussion with the developers.
  3. Endpoint naming convention:
    1. The major issue is that the VO namespace should be sacrosanct. Jeff's proposal (site-related-path/VO-path), presented at the last ops meeting, is reasonable but must be enforced.
    2. SRM host names should be generic enough that they do not change over time unless forced by special events (e.g. they should not be the machine's cryptic name, nor contain the local storage technology). Prefer srm. to dcache05srm. or castorsrm. SURLs are registered in the FC, so every change of endpoint name requires a lengthy catalog update.
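
The consistency behaviour requested for lcg-cr (item 4 of the lcg-utils list) amounts to a compensating rollback: undo each completed step when a later one fails. The sketch below is a hypothetical illustration only; Storage, Catalog, copy_and_register and the exception classes are in-memory stand-ins invented here, not lcg-utils or LFC APIs.

```python
# Hypothetical sketch: Storage, Catalog and copy_and_register are illustrative
# stand-ins, not lcg-utils or LFC APIs.


class TransferError(Exception):
    """Upload failed, or a replica already exists at the destination SURL."""


class RegistrationError(Exception):
    """The file catalog rejected the entry."""


class Storage:
    """In-memory stand-in for a storage element: SURL -> file contents."""

    def __init__(self):
        self.files = {}

    def put(self, surl, data):
        if surl in self.files:
            raise TransferError(surl + " already exists")
        self.files[surl] = data

    def delete(self, surl):
        self.files.pop(surl, None)


class Catalog:
    """In-memory stand-in for a file catalog: LFN -> list of replica SURLs."""

    def __init__(self, fail_on_register=False):
        self.entries = {}
        self.fail_on_register = fail_on_register  # simulate a catalog failure

    def create_temporary(self, lfn):
        if lfn in self.entries:
            raise RegistrationError(lfn + " already registered")
        self.entries[lfn] = []

    def add_replica(self, lfn, surl):
        if self.fail_on_register:
            raise RegistrationError("catalog unavailable")
        self.entries[lfn].append(surl)

    def remove(self, lfn):
        self.entries.pop(lfn, None)


def copy_and_register(storage, catalog, lfn, surl, data):
    """Upload, then register; undo partial work so the SE and the catalog agree."""
    catalog.create_temporary(lfn)  # reserve the LFN before transferring
    try:
        storage.put(surl, data)
    except TransferError:
        catalog.remove(lfn)  # transfer failed or replica exists: drop the temporary entry
        raise
    try:
        catalog.add_replica(lfn, surl)
    except RegistrationError:
        storage.delete(surl)  # do not leave an orphan, unregistered replica on the SE
        catalog.remove(lfn)
        raise
```

The ordering (reserve the LFN, transfer, then register) is one possible choice; the point is that every failure path removes exactly what this call created and nothing that existed before it.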
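
To see why matching on domain names breaks at a site like GRIDKA (item 7 of the lcg-utils list), here is a simplified, hypothetical model of that logic next to the close-SE alternative. It is not GFAL's actual code: all host names are invented, and close_ses stands in for the close-SE information a job could obtain from .BrokerInfo or the IS (the real file format is not reproduced here).

```python
# Hypothetical sketch: host names are made up and close_ses stands in for the
# close-SE information a job can obtain from .BrokerInfo or the IS.


def domain_of(host):
    """Drop the first DNS label: 'wn01.farm.site-a.example' -> 'farm.site-a.example'."""
    return host.split(".", 1)[1] if "." in host else ""


def pick_by_domain(wn_host, replica_hosts):
    """Domain matching: choose the first replica whose SE shares the WN's domain."""
    wn_domain = domain_of(wn_host)
    for host in replica_hosts:
        if domain_of(host) == wn_domain:
            return host
    return None  # no match, even though a local SE exists in another domain


def pick_by_close_se(close_ses, replica_hosts):
    """Close-SE matching: choose the first replica on an SE declared close to the WN."""
    for host in replica_hosts:
        if host in close_ses:
            return host
    return None


wn = "wn042.farm.site-a.example"
local_se = "srm.storage.site-a.example"
replicas = ["srm.site-b.example", local_se]
print(pick_by_domain(wn, replicas))            # None
print(pick_by_close_se({local_se}, replicas))  # srm.storage.site-a.example
```

Because the WN and the local SE sit in different subdomains, the domain comparison finds nothing, whereas an explicit close-SE list does not depend on DNS layout at all.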
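
The cost of renaming an endpoint can be made concrete with a toy catalog. In the hypothetical sketch below, SURLs are composed as srm://<host>/<site-related-path>/<VO-path> following Jeff's proposal, and every replica on a renamed host has to be rewritten. The host names, paths and helper names (make_surl, rename_endpoint) are assumptions for illustration; a real catalog update would go through the catalog's own tools.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical sketch: hosts, paths and helper names are illustrative only.


def make_surl(srm_host, site_path, vo_path):
    """Build srm://<host>/<site-related-path>/<VO-path>, leaving the VO path untouched."""
    return "srm://{}/{}/{}".format(srm_host, site_path.strip("/"), vo_path.lstrip("/"))


def rename_endpoint(catalog, old_host, new_host):
    """Rewrite every SURL in catalog (LFN -> list of SURLs) that points at old_host.

    Keeps any port, path and query; returns the number of replica entries updated.
    """
    updated = 0
    for surls in catalog.values():
        for i, surl in enumerate(surls):
            parts = urlsplit(surl)
            if parts.hostname != old_host:
                continue
            netloc = new_host if parts.port is None else "{}:{}".format(new_host, parts.port)
            surls[i] = urlunsplit(parts._replace(netloc=netloc))
            updated += 1
    return updated


catalog = {
    lfn: [make_surl("dcache05srm.site-a.example", "/pnfs/site-a.example/data", lfn)]
    for lfn in ("/lhcb/production/DC06/f1.dst", "/lhcb/production/DC06/f2.dst")
}
print(rename_endpoint(catalog, "dcache05srm.site-a.example", "srm.site-a.example"))  # 2
```

Every registered replica on the old host costs one catalog update, which is why a stable, generic alias such as srm. is preferable: when the underlying machine or storage technology changes, the alias moves and the catalog does not need to be touched.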

Ongoing Tests

  1. Check of different physics input files to the reconstruction code (Angelo)
  2. Replication of fake LFNs (Andrew)
  3. New DIRAC release (Andrei)

Resource Issues

  1. NIKHEF situation: the centre has been downgraded to a T2 role because it is currently impossible to access files stored in the WAN-connected storage at SARA from the WNs via dCache. We are waiting for a patched version of the dCache client to be released for testing; this version would not require inbound connectivity on the WN, because it would not need to call the client back. Until further notice, NIKHEF sits out the DC06 activity.
  2. IN2P3 uses gsidcap for its tape storage. This also prevents LHCb from using Lyon as expected, since the production version of ROOT cannot use gsidcap until the next AA release. The IN2P3 disk endpoint will be used instead.
  3. All transfers to and from all T1s (except those from RAL) fail with the message "No site found for host", which means the service.xml file must be updated manually to account for this host. All site admins have been aware of this request for a week.


Previous Problems

