
DC06 Activities

DC06 Aims

Challenge (using the LCG production services):
  • Distribution of RAW data from CERN to Tier-1s
  • Reconstruction/stripping at Tier-1s, including CERN
  • DST distribution to CERN & other Tier-1s

Problems in the last 24 hours

  1. Issues with slow transfer rates into CERN
  2. Local SE problems at CNAF: unable to access data from the SE. Jobs are running again, but the ticket has not been closed yet.
  3. At Lyon, the workaround of reading data from the disk SE with the dcap protocol (to avoid having to use gsidcap) failed.
  4. Jobs at RAL running slowly.

Outstanding Problems

  1. T0 storage system problem: uploading files to T0 fails.
    1. What has changed in the CERN storage system since last Sunday? Transfer efficiency has degraded gradually, to the point where none of the transfers to castorgrid succeeds. Why is there no way to upload files to CERN through gridftp? Why do all transfers through castorgrid time out?
    2. All transfers are hanging: is the scheduler (or stager) overloaded?
  2. lcg-utils: requested improvements and development.
    1. A lcg-utils command is needed to upload files to the grid without registering them (an lcg-cp that accepts a SURL as destination, or an lcg-cr that does not register).
    2. lcg-gt should accept a list of protocols, rather than having to be invoked once for every protocol to be checked.
    3. lcg-cr should provide an additional option for specifying the host field when registering a file in the catalog.
    4. For consistency of the file catalog, lcg-cr should delete the physical replica when registration in the file catalog fails. It should also delete the temporary entry in the FC when the transfer fails or the replica already exists.
    5. lcg-gt should return TURLs that are compatible with ROOT applications. Using the gfal_plugin would avoid the problem, but the current incompatibility should still be fixed.
    6. The dCache client deployed in gLite 3.0.0 has a bug in libgsiTunnel that prevents the use of gsidcap (the only protocol available at some sites). This implies a new LCG AA release with the current library.
    7. The getbestfile method in GFAL has flawed logic when checking domain names: where the SE and the WNs sit in different domains (e.g. GRIDKA), the names do not match. The solution would be to go through the .BrokerInfo file and then through the IS. Reported; to be discussed further with the developers.
    8. General request to make the deployment of lcg-utils much lighter, with an officially maintained tarball distribution that could be shipped with LHCb jobs. Under discussion with the developers.
  3. Endpoint naming convention:
    1. The major issue here is that the VO namespace should be sacrosanct. Jeff's proposal (site-related-path/VO-path), presented at the last ops meeting, is reasonable but should be enforced.
    2. The SRM host name should be generic enough that it does not change over time unless forced by special events (e.g. it should not be a cryptic machine name, nor contain the name of the local storage technology). Prefer srm. to dcache05srm. or castorsrm. SURLs are registered in the FC, so any change of endpoint name necessitates lengthy updates.
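Why a renamed SRM endpoint is costly can be illustrated with a minimal sketch. It assumes, purely for illustration, that the catalog replicas are available as a flat list of SURL strings; the real FC bulk-update interface is not shown, and the host names and the functions `rewrite_surl_host` and `migrate_catalog` are hypothetical. Every replica registered under the old host has to be rewritten, one entry at a time.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_surl_host(surl: str, old_host: str, new_host: str) -> str:
    """Return the SURL with old_host replaced by new_host (hypothetical helper).

    SURLs on other hosts are returned unchanged. urlsplit lower-cases the
    host name, so old_host is expected in lower case.
    """
    parts = urlsplit(surl)
    if parts.hostname != old_host:
        return surl
    # Keep the port, if any, when swapping the host name.
    netloc = new_host if parts.port is None else f"{new_host}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


def migrate_catalog(surls: list[str], old_host: str, new_host: str) -> tuple[list[str], int]:
    """Rewrite every replica SURL; return the new list and the number of entries changed."""
    updated = [rewrite_surl_host(s, old_host, new_host) for s in surls]
    changed = sum(1 for before, after in zip(surls, updated) if before != after)
    return updated, changed
```

With a generic name such as srm. the rewrite step never has to run, because the SURLs stay valid when the storage technology or machine behind the endpoint changes.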

Ongoing Tests

  1. Increase number of jobs at GridKa, CNAF, CERN & PIC (Ricardo)
  2. Replication of fake LFN (Andrew)
  3. Check on slow-running jobs at RAL (Raja)
  4. Brunel performance & benchmarking (Nick via Marco)
  5. Disk-SE problems & running jobs at Lyon (Andrei)
  6. CASTOR/network into CASTOR Grid (Roberto)
  7. High load on CERN RB (Roberto)
  8. Problems with authorisation even though the proxy was still valid (Roberto)

Resource Issues

  1. NIKHEF situation: the centre has been downgraded to a T2 role because it is currently impossible to access files stored in the WAN-connected storage at SARA from the WNs via dCache. A patched version of the dCache client is awaited and is due to be released for testing. This version would not require inbound connectivity on the WNs, because it would not need to call the client back. Until further notice, NIKHEF sits out DC06 activity.
  2. IN2P3 uses gsidcap for its tape storage. This also prevents LHCb from using Lyon as expected, since the production version of ROOT cannot use gsidcap until the next AA release. The IN2P3 disk endpoint will be used instead.
  3. All transfers to and from all T1s (except RAL and CERN) fail with the message "No site found for host", which means the service.xml file must be updated manually to take this host into account. IN2P3 and GRIDKA have also changed their own service.xml. Waiting for CNAF and SARA.
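The "No site found for host" failure can be pictured as a plain table lookup. This is a minimal sketch, assuming service.xml has already been parsed into a host-to-site dictionary; the actual file format and the transfer tool's internals are not described here, and `NoSiteFoundError`, `site_for_host`, `route_transfer` and the host names are hypothetical. Any source or destination host missing from the table makes the whole transfer fail, which is why the file has to be updated by hand for each host.

```python
class NoSiteFoundError(LookupError):
    """Raised when an SE host has no entry in the (hypothetical) host-to-site table."""


def site_for_host(host: str, host_to_site: dict[str, str]) -> str:
    """Resolve an SE host name to its site; keys of host_to_site are assumed lower case."""
    site = host_to_site.get(host.lower())
    if site is None:
        raise NoSiteFoundError(f"No site found for host {host}")
    return site


def route_transfer(src_host: str, dst_host: str, host_to_site: dict[str, str]) -> tuple[str, str]:
    """Both ends of a transfer must resolve; a single missing host aborts the transfer."""
    return site_for_host(src_host, host_to_site), site_for_host(dst_host, host_to_site)
```

Adding the missing host to the table (the manual service.xml update) is the only fix in this model, which matches the wait on each remaining site.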


Previous Problems

Topic revision: r9 - 2006-07-26 - NickBrook
This site is powered by the TWiki collaboration platform. Copyright © 2008-2022 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.