Meeting Notes

Infrastructure

  • Fairly quiet two weeks on the ranch: no major outages, but not a large volume of overflow jobs either.
  • MIT has updated to the latest release and has installed a load balancer.
    • Still not getting DNs from MIT.
  • Vanderbilt and Purdue have both had downtimes. Vandy crashes and stays down until someone intervenes. Purdue errors tend to be transient (dCache issues?).
  • Quite a bit of fallback from Purdue two weeks ago.
  • FNAL good and quiet. Served a lot of data.
  • We seem to actually be getting standalone users.
    • Publicity campaign starts at OSG AHM?
    • USCMS collaboration meeting? Next one is in May. Should definitely do it then.
    • CMS week? End of month. No one seems to be going.
    • Can we differentiate standalone users from overflow users? Simple answer is no.

Software

  • MT: Updated the backend of the monitoring infrastructure. It can now forward packets between installs. Refactored so it's easy to install at other sites.
  • MT: Regexp-based filters on the monitoring pages:
    • BB: Hidden behind firewall yet?
  • MT: Caching proxy. Got all the background details from Andy. Implementation estimate is 2 weeks full-time, 1 month part-time.
    • FKW: How does caching proxy know to cache (or not)? BB: We should have a callout module, and implement the "simple obvious" policy (stat the file) at first.
    • BB: What about re-rolling vector reads? MT: On the list.
    • Want to work on standard candle to understand CMS usage patterns.
  • BB: Statistics work will land soon.
    • FKW: Can we add a hook so that the batch ID is reported to the monitoring system? Yes; it just needs to be added to the client (full rollout may take time).
  • DB: Use cases for transparent software access: opportunistic OSG, commercial site on OSG, and cloud on OSG.
    • Integrated with the local glideinWMS VO Frontend. Run CMSSW within GLOW VO at MWT2-UC.
    • BB: We can run this at Nebraska at a semi-large scale, we've done it before with Purdue. DB: OK, will send code.
  • YZ: Working on finishing up file loss report.
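The caching-proxy discussion (FKW/BB) proposes a callout module whose first policy is the "simple obvious" one: stat the file. A minimal sketch of that shape follows, assuming a callout that takes a requested path and returns whether to cache it. The names `CachePolicy`, `make_stat_policy`, and `should_cache` are hypothetical and are not an xrootd API, and where the stat is actually sent (origin or local namespace) is left to the injected `stat_fn`.

```python
import os
import stat
from typing import Callable, Optional

# Hypothetical callout interface (not an xrootd API): given a requested path,
# return True if the proxy should cache the file, False to just pass it through.
CachePolicy = Callable[[str], bool]


def make_stat_policy(stat_fn: Callable[[str], os.stat_result] = os.stat) -> CachePolicy:
    """'Simple obvious' policy: cache only if stat() succeeds on a regular file.

    Assumption: stat_fn decides where the stat goes (origin vs. local namespace).
    """
    def policy(path: str) -> bool:
        try:
            st = stat_fn(path)
        except OSError:
            return False  # missing or unreadable: don't cache, just proxy
        return stat.S_ISREG(st.st_mode)
    return policy


def should_cache(path: str, policy: Optional[CachePolicy] = None) -> bool:
    """Proxy-side hook: use the configured callout, else the default stat policy."""
    return (policy or make_stat_policy())(path)
```

Keeping the policy behind a callable means a smarter rule (for example, one informed by the standard-candle usage study) could later replace the stat policy without touching the proxy itself.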

AOB

  • IS: Mostly consumed by WLCG TEG and glidein training.
  • Delayed project status update from Nebraska until next week. Between now and then, we'll populate a twiki page and start this work over email.

Nebraska Project Status

DELAYED TO NEXT WEEK:

1c. [Brian] Deploy system at 5 sites and make it manageable for long-term operation

  • How does admin/manager know if it is broken?
    • manager should get report in morning RSV emails. PARTIAL. Probe exists, not documented/deployed. Delayed for the OSG 3.0 release.
    • alarms when something is broken. PARTIAL. Filling in holes (like the overflow failure report).
    • site admin documentation. PARTIAL
    • SAM-like test (James Letts is working on a JobRobot-like test that reads a file that is supposed to be there and one that isn't). NOT DONE
    • monitor IO perf for WAN and LAN. DONE
      • dashboard will examine job report to determine if job overflowed, so this data should be available. PARTIAL
      • Frank also wants this in gratia. DONE (?)
  • operations plan
    • decide on a set of metrics for overall system that we expect it to deliver on a regular basis. DONE in effect, but not written.

5. CMSSW I/O

5a. [Brian] metrics

  • Waiting on xrootd stats in FJR (5e). NOT DONE.

5e. [Brian] improve measurements in CMSSW

  • xrootd statistics in FJR. PARTIAL. Coding done, not in a release.
  • exit codes; use a different exit code if there is a fallback and fallback failed. NEEDS VERIFY.
  • recording fallback. DONE
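The exit-code item asks for a distinct code when a fallback was attempted and also failed, so that this case can be told apart from a plain open failure. A minimal sketch of that decision logic follows. The enum names and numeric values are hypothetical placeholders, not CMSSW's actual exit codes.

```python
from enum import IntEnum


class OpenExitCode(IntEnum):
    """Hypothetical exit codes for illustration only; not CMSSW's real values."""
    SUCCESS = 0
    PRIMARY_OPEN_FAILED = 1   # primary open failed and no fallback was tried
    FALLBACK_OPEN_FAILED = 2  # primary open failed AND the fallback also failed


def choose_exit_code(primary_ok: bool, fallback_attempted: bool, fallback_ok: bool) -> OpenExitCode:
    """Pick the job exit code from the outcome of the primary open and any fallback."""
    if primary_ok:
        return OpenExitCode.SUCCESS
    if not fallback_attempted:
        return OpenExitCode.PRIMARY_OPEN_FAILED
    if fallback_ok:
        # The job ran via fallback; that fact is recorded separately (e.g. in the FJR).
        return OpenExitCode.SUCCESS
    return OpenExitCode.FALLBACK_OPEN_FAILED
```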

8. Operations Tasks

8a. [Brian] Operate Services

8b. [Brian] attend to trouble tickets

    • respond to savannah tickets from sites/anaOps/. ONGOING.
Topic revision: r1 - 2012-02-07 - BrianBockelman