Present: Dario Barberis, Barry Blumenfeld, Alessandro Di Girolamo, Dave Dykstra, Luis Linares, Andrea Valassi

At the last WLCG operations coordination meeting Dave presented some new information that had not yet been brought up at a task force meeting:

  • We plan to ask all sites to allow incoming MRTG queries from 128.142.0.0/16 and 188.185.0.0/17, which are the "WLCG" (a.k.a. LHCOPN) subsets of addresses in CERN and in the future Hungary data center. That will give us the most flexibility in moving the monitoring servers around. The ATLAS and CMS groups have been asked and raised no objections in general. If a few sites refuse, it's not a big deal; they will just have to update their configuration when the monitoring servers change. This allows more freedom to use virtual machines, because we don't need to worry so much about getting the longest life out of the machines. It's also one less reason to use a virtual IP address passed back and forth between the machines, which CERN networking really doesn't want to support (they limit it to a single switch, and there aren't any VM clusters that don't take up a whole switch).
  • Dave now thinks we can use linux-ha to determine which machine of the pair is the master, but without using a virtual IP address. We can use a round-robin name across the pair of machines but have the backup forward incoming connections to the primary. Only the primary will initiate monitoring.
  • The main CMS-specific source of information for ATP (analogous to AGIS for ATLAS) is called SiteDB. SiteDB, however, is much too limited to hold squid configuration information. Since the last meeting, Alastair has reported to us that GOCDB is planning an upgrade in about 4 months, after which it will likely be able to hold such information.
  • When Dave reported that the squid monitoring task force aims to monitor CVMFS stratum 1 squids, including failover detection based on awstats, Ian Collier said that he objected to allowing monitoring on the RAL CVMFS stratum 1 that he administers. Dave talked to him this week. Ian said he didn't really mind MRTG monitoring (the RAL network people didn't like it, but that's not really relevant). He still had some objections to the frontier-squid distribution (which would be the easiest to use, especially to get awstats monitoring) not being fully like a normal Red Hat RPM. It used to be a lot worse, but he still objects to it not using logrotate to rotate its log files, because that prevents him from configuring them to be compressed. Log compression could be added. Both his stratum 1 and the CERN stratum 1 administered by Steve Traylen use cacti instead of MRTG for performance monitoring, and webalizer instead of awstats for determining the sources of incoming requests.
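
As an illustration of what allowing incoming MRTG queries from the two proposed ranges amounts to, here is a minimal Python sketch of the membership test that a site's firewall or access rules would effectively perform. The constant and function names are hypothetical, and real sites would express this in their own firewall or squid access-control configuration rather than in Python.

```python
import ipaddress

# The two "WLCG" (LHCOPN) source ranges proposed in these minutes
# (hypothetical constant name, for illustration only).
MONITORING_NETWORKS = [
    ipaddress.ip_network("128.142.0.0/16"),  # CERN
    ipaddress.ip_network("188.185.0.0/17"),  # future Hungary data center
]


def is_allowed_monitoring_source(addr: str) -> bool:
    """Return True if addr lies inside one of the allowed monitoring ranges.

    Hypothetical helper: an IPv6 address never matches these IPv4 networks.
    """
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in MONITORING_NETWORKS)
```

For example, 188.185.127.255 is accepted because it falls inside the /17, but 188.185.128.1 is rejected because it lies just outside it. Allowing whole ranges rather than individual hosts is what lets the monitoring servers move around without every site changing its rules.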

Today CMS received 4 VMs on critical power to replace frontier.cern.ch: 2 for the WLCG squid monitoring function and one for the remaining frontier monitoring functions.

Alessandro & Andrea were pretty sure that the public squid service names (that is, the public DNS name for a single squid and public round-robin names for multiple squids) and port number (usually 3128) will need to be in GOCDB/OIM.

AGIS currently stores only the squid service names: the public ones at least, and presumably also the worker node view if the names are different (there may be no such cases in ATLAS). The SAM test in ATLAS checks each round-robin squid service name only once, for both the primary and any backup. CMS, by contrast, checks every squid in the list and expects every squid at a site to be listed, in addition to a round-robin service name if one is present. The MRTG monitoring auto-configuration on frontier.cern.ch for ATLAS reads from AGIS, but it has hard-coded exceptions that list the individual squid names for the sites that have more than one squid.

Mainly we discussed where and how to maintain the source of configuration information about squids. CMS now asks all sites to put the "worker node view" of squid information into a local configuration XML file in a well-known location, including all individual squids listed in a round-robin. Administrators are then required to check the file in to a shared CVS server, and this is verified by a SAM test. This inspired Alessandro to suggest that we ask all sites to maintain a configuration file with all the relevant information for their site, and that these files somehow be pulled from or pushed by the squid machines and collected in a central place. Dave is pretty sure squid could be configured with access control lists to allow incoming requests for a specific URL to be forwarded to a site-internal web server (but not to a file local to the squid server, as Alessandro originally hoped). This is a fairly complicated solution, however: it would require additional firewall openings and would suffer from unreliable site web servers. Dave was also concerned about setting up something new and non-standard just for the squid service.

We'll probably have another meeting in January, date to be determined.

Topic attachments:
  • andreanotes.txt (8.0 K, 2012-12-14 21:35, DaveDykstra): Andrea's notes for the meeting
Topic revision: r1 - 2012-12-14 - DaveDykstra