Infrastructure Status

  • Mostly stable for the past two weeks at Nebraska, Caltech, and UCSD.
  • dCache sites (Wisconsin and Purdue) had a poor week in the heartbeat tests. It appears the dCache Xrootd door is sometimes slow (taking >5 minutes) on specific files; this seems to be triggered by the heartbeat tests, although

Action Item Status

  • Developing policy for what files are accessible via Xrootd: Waiting on FKW?
    • all of what is in central space
  • FNAL validation:
    • Catalin has been improving local monitoring; FNAL has been doing fine in Nagios tests.
    • BB: Ran three CRAB jobs against FNAL. The first failed, which uncovered some misconfigured servers. The second and third had a high error rate; the issues could not be replicated by running CMSSW from an interactive node.
    • Needs deep investigation; I don't consider it validated yet. JobRobot jobs will help us out.
  • Monalisa progress: (waiting on MT input; he mentioned he might be late to the meeting)
  • Progress on JobRobot: We "rediscovered" an old bug in handling Andrea's certificate (due to an extra delegation not normally present in a CRAB job). Andrea was able to fix this by updating to a more recent CMSSW release. There has been one test run by Andrea: 100% successful.
  • Xrootd 3.0.3 testing: Validated the new features. Discovered, reported, and resolved a few bugs during our load tests; the fixes are incorporated in this release.
    • Xrootd.org is now providing RPMs for xrootd in this release. We will switch to these for 3.0.3 rather than maintain our own.
  • Nagios tests:
    • Random file tests are stalled. Brian has not made time to do this over the last 2 weeks.
    • Email alerts are still not sent to anyone except Brian. However, the current "churn" of the test results is still too high to hand to admins. Working to reduce the unnecessary alarms.
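
One common way to reduce alarm churn of this kind is to alert only after several consecutive failed checks (Nagios offers a similar idea through its `max_check_attempts` setting). The sketch below is purely illustrative and is not how the tests described in these notes are configured; the names `AlarmDamper`, `count_alerts`, and the threshold value of 3 are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AlarmDamper:
    """Raise an alarm only after `threshold` consecutive failed checks."""
    threshold: int = 3              # hypothetical value, not taken from the notes
    consecutive_failures: int = 0

    def record(self, check_passed: bool) -> bool:
        """Record one check result; return True if an alert should be sent."""
        if check_passed:
            # Any success clears the failure streak.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        # Alert exactly once, at the moment the threshold is reached.
        return self.consecutive_failures == self.threshold


def count_alerts(results, threshold=3):
    """Count how many alerts a sequence of pass/fail results would produce."""
    damper = AlarmDamper(threshold=threshold)
    return sum(damper.record(ok) for ok in results)
```

With a threshold of 1 every new failure streak produces an alert; raising the threshold suppresses isolated failures at the cost of slower notification for real outages.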

AOB

  • Discussed a security concern with the dCache folks. It seems the Xrootd protocol lacks the safety features needed to allow writes over the WAN. We weren't planning on using writes for this project, but we should explicitly tell sites to disable them. No site currently allows writes.
  • Reworked how dCache sites are integrated. Wrote new documentation and released new sample configuration files. Appears more robust; does much better on the JobRobot. Suffers/gains from having dCache involved: if dCache stops responding on a file (such as the Nagios test file), then it is not accessible via xrootd. See the note about Wisconsin and Purdue under Infrastructure Status above.


Topic revision: r2 - 2011-03-16 - FrankWuerthwein
 