Infrastructure Status

  • No new sites this week.
  • UCSD, Caltech, Nebraska, Wisconsin, and Purdue have all tests functional.
  • Stability issues at Purdue and Wisconsin, related to libdcap.
  • Made progress on availability reports (not automated). For example, UCSD's availability over the last 7 days is 99.957%.
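The availability figure above is a simple ratio. As a minimal sketch, assuming availability is the fraction of periodic test results that passed (the page does not say how the report is computed), the Python below shows the arithmetic and what 99.957% over 7 days implies: 0.043% of 604,800 s is about 260 s, roughly 4.3 minutes of downtime. The function names and sample counts are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def availability_pct(passed: int, total: int) -> float:
    """Percentage of test samples that passed (assumed definition of availability)."""
    if total <= 0:
        raise ValueError("need at least one test sample")
    return 100.0 * passed / total


def implied_downtime_s(availability: float, days: float) -> float:
    """Seconds of downtime implied by an availability percentage over `days` days."""
    return (100.0 - availability) / 100.0 * days * SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical: one test every 15 minutes for 7 days, with a single failure.
    total = 7 * 24 * 4  # 672 samples
    print(f"{availability_pct(total - 1, total):.3f}%")
    # UCSD's reported 99.957% over 7 days corresponds to about 4.3 minutes down.
    print(f"{implied_downtime_s(99.957, 7) / 60:.1f} min")
```

Automating the report would then mostly be a matter of collecting the pass/fail samples per site.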

Action Item Status

  1. UCSD UAF cluster.
    • MT: 4 UAF machines (uaf-[3-6]) up and running.
      • There is a file-descriptor leak (the ulimit is 2048, which is low by xrootd standards). It seems to affect all (Hadoop?) sites.
      • In the process of setting up a secondary master on
      • I think UCSD is ready to switch to production mode.
  2. Improved service monitoring (missing tests, alerts).
    • MT: tried setting up Nagios alert emails twice and failed. Heh... it seems to be just the CERN spam filter.
  3. Clarify plans for JobRobot with Andrea Sciaba.
    • Some emails back and forth. Mostly stalled.
  4. Fix dcap deadlock issues. Done. Submitted back to Will be following up with code reviews. If the patch holds up, will do a release of libdcap.
    • BB: Found an additional issue; also submitted. dcap is a problematic library, no surprise there. The improved libdcap is in Koji.
  5. CMSSW TTreeCache management for 4_2_0. Done; appeared in 4_2_0_pre3.
    • BB: the new ROOT combined with CMS files has nasty memory issues. I have been spending time trying to salvage the situation.
  6. Upgrade release to 3.0.2; test cmsd throttling from Andy.
    • BB: 3.0.3 pre-release is available in Koji. Not tested yet.
  7. Update project webpages: remove references to the demonstrator, add information about the architecture we're working on deploying.
    • BB: Done. Next week, we ought to review the user documentation.
  8. Continue MonALISA monitoring investigation.
    • MT: Installed ML repo on
      • The aggregator/service was already running before, under group name xrootd_cms (if you want to look with an ML client).
      • I also installed the "host monitoring" sensor from the ML team on uaf-[3-6]. It is an independent Java process, taking 40-100 MB.
      • Expect to add the first plots to the web front-end soon; it is empty/bare now (but I changed the logo ;-)).
      • We should point the other xrootd servers to send their monitoring data to UCSD.
  9. Converting a physicist's analysis to use Xrootd.
    • Verify that Ken Bloom and Aaron Dominguez can use Xrootd on their laptops.
    • MT: No news from my side here.
    • BB: No news from me. Carry over to next week.
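Item 1 notes a file-descriptor leak against a ulimit of 2048. A minimal sketch of how one might watch for that on a Linux host, assuming the daemon's PID is known and /proc is readable (root is usually needed for another user's process): it counts the entries in /proc/<pid>/fd and compares them with the soft "Max open files" limit from /proc/<pid>/limits. The helper names and the 80% warning threshold are hypothetical, not part of xrootd or Nagios.

```python
import os
import sys
from typing import Optional, Union


def open_fd_count(pid: Union[int, str] = "self") -> int:
    """Number of open file descriptors of a process, read from /proc (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_soft_limit(pid: Union[int, str] = "self") -> Optional[int]:
    """Soft 'Max open files' limit from /proc/<pid>/limits (what `ulimit -n` shows).

    Returns None if the limit is 'unlimited'.
    """
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max open files"):
                soft = line.split()[3]  # columns: Max open files <soft> <hard> files
                return None if soft == "unlimited" else int(soft)
    raise RuntimeError(f"no 'Max open files' entry for pid {pid}")


def near_limit(count: int, limit: Optional[int], threshold: float = 0.8) -> bool:
    """True when `count` has reached `threshold` (default 80%) of `limit`."""
    return limit is not None and count >= threshold * limit


if __name__ == "__main__":
    # Hypothetical usage: pass the daemon's PID; defaults to this process.
    pid = sys.argv[1] if len(sys.argv) > 1 and sys.argv[1].isdigit() else "self"
    count, limit = open_fd_count(pid), fd_soft_limit(pid)
    status = "WARNING" if near_limit(count, limit) else "OK"
    print(f"{status}: pid {pid} has {count} open fds (soft limit {limit})")
```

Sampled periodically, a steadily rising count for the daemon would confirm a leak rather than just a low limit. A check like this could also be wrapped as a Nagios plugin, which signals WARNING with exit code 1.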

Items for next week

  1. JobRobot progress.
  2. Progress with local physicists.
  3. Set up a local redirector at UCSD.
  4. Test out the Xrootd 3.0.3 throttling.
  5. Get dCache sites upgraded to new version of libdcap.
  6. Improve the ML webpages.
  7. Make public per-site Nagios pages.
Topic revision: r3 - 2011-02-23 - BrianBockelman