Contact people at FZK are Jeff Templon and Ronald Starink.

  • Ronald 22 Jan 2009: Nevertheless, as site administrator, I would like to be notified of problems instead of checking various sources (web sites, portals) for them. We have good experience with Nagios in this respect. Not only do we use it to monitor our farm, but also to periodically retrieve the results of critical SAM tests. If there is a failure, we are notified instantly. Note that we use the work of the EGEE-2 Monitoring Working Group for this, as developed at CERN by James Casey and others. Perhaps a similar approach can be taken for the experiment's view: a probe runs on the site's Nagios server, once per hour, and retrieves the results by some kind of remote database query. It shouldn't be difficult because a working example exists! In case of a failure, Nagios could even show a link to the URL that provides details (e.g. the gridmap page).
    Answer: I have discussed this with Julia (in cc). From the technical point of view it should not be difficult to implement a Nagios probe which retrieves results from our database and sends an alarm in case of error. The problem is that we should first set rules which clearly define failures from the point of view of the VOs, and this has not been done yet. In our tool we have tried to get from the VOs the rules that define the status, which determines the color of the maps, but these are still tentative definitions. Most probably these rules will have to be tuned before they become in some way official for the VO. For this reason we consider that sending the site an automatic alarm based on these rules could be confusing.
I certainly agree that there should be rules or criteria for failures before notifying sites of problems. We are already working with the HEP VOs to improve their SAM tests. My point was that as site admins, we'd like to get an overview of the status from one source, and ideally problems should be "pushed" to us, instead of us "pulling" potential problems from various sources. But I'm already happy with your tool because it combines the results from the various dashboards!
ok
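
As an illustration of the probe idea discussed above, here is a minimal sketch in Python. It is not the EGEE-2 Monitoring Working Group probe and not an existing interface of the tool: the query URL, the JSON record layout (`test`, `status`) and the rule "anything other than `ok` is a failure" are all assumptions made for the example, and the real failure rules would first have to be agreed with the VOs. Only the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) follow the standard Nagios plugin convention.

```python
import json
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(results, details_url):
    """Map test records to (exit_code, message).

    `results` is assumed to be a list of dicts such as
    {"test": "job-submit", "status": "ok"} (hypothetical format).
    The failure rule used here is a placeholder, not an agreed VO rule.
    """
    if not results:
        return UNKNOWN, "UNKNOWN: no test results returned"
    failed = sorted(r["test"] for r in results if r["status"] != "ok")
    if failed:
        # Include the link to the details page, as suggested in the thread.
        return CRITICAL, "CRITICAL: failed tests: %s; details: %s" % (
            ", ".join(failed), details_url)
    return OK, "OK: all %d tests passed" % len(results)


def fetch_results(query_url, timeout=30):
    """Retrieve the results as JSON from a (hypothetical) remote query URL."""
    with urllib.request.urlopen(query_url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_probe(query_url, details_url):
    """Fetch and evaluate; any retrieval error is reported as UNKNOWN."""
    try:
        results = fetch_results(query_url)
    except Exception as exc:
        return UNKNOWN, "UNKNOWN: could not retrieve results: %s" % exc
    return evaluate(results, details_url)
```

A wrapper script scheduled once per hour by Nagios would call `run_probe`, print the message on one line, and exit with the returned code, so that a failure is "pushed" to the site as a normal Nagios notification.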

  • Ronald 22 Jan 2009: I can confirm that the right-click solution works, although in my opinion it is not an intuitive action. I was tempted to click with the left mouse button, expecting to see details. Unless the left-click is reserved for future enhancements, could you please consider this option?
    Answer: actually, if you click with the left button on a map, a sub map should appear. This is also explained with some screenshots in the User Guide: https://twiki.cern.ch/twiki/bin/view/Sandbox/UserGuide. I hope it helps (though it cannot be as clear as a live demo).

  • Ronald: Concerning the pop-up: could it be changed to behave as a tooltip, meaning that it is shown only if the mouse has not been moved for more than ~0.5 seconds? That would make the user interface quieter.
    Answer: I will take this suggestion into account. I'll talk about it with Max Boehm, who is the developer of gridmap. TALK WITH MAX

  • About target values to associate with the number of running jobs: I asked NIKHEF about the pledged values for job slots, and Jeff pointed out that there are no pledged values for that! His mail: We don't have "pledged values" for job slots. We have fair share here. This means that averaged over a period of time, the VO is guaranteed to get a certain share of the entire farm. This is measured not by job slots, but by the total amount of processor time used by all jobs. The calculation of the fair share is based on:
    a) kSI2K pledged to the VO (this is a real number pledged to the VO and not directly related to job slots)
    b) total kSI2K in the farm
    and comes out to a percent value. You could indeed use this percent and multiply by the total number of processors; however, there are situations in which a VO could not get this many job slots, and it would be perfectly acceptable behavior and in line with what we promised to the VO.
    Hence I would not like to provide any value for "pledged job slots" since we have never pledged a certain number. I would not like to encourage the VOs to believe in any number that does not really exist.
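
The arithmetic described in the mail can be sketched as follows. This is only an illustration: the function names and the numbers in the usage note are invented for the example and are not real NIKHEF figures. As the mail stresses, the slot count obtained this way is a nominal figure derived from the share, not a pledged value.

```python
def fair_share_fraction(pledged_ksi2k, total_ksi2k):
    """Fraction of the farm guaranteed to a VO, averaged over time.

    a) pledged_ksi2k: kSI2K pledged to the VO
    b) total_ksi2k:   total kSI2K in the farm
    """
    if total_ksi2k <= 0:
        raise ValueError("total kSI2K must be positive")
    return pledged_ksi2k / total_ksi2k


def nominal_job_slots(pledged_ksi2k, total_ksi2k, total_processors):
    """Share multiplied by the number of processors.

    This is NOT a pledge: at any given moment the VO may hold fewer
    slots and still receive its fair share of processor time.
    """
    return fair_share_fraction(pledged_ksi2k, total_ksi2k) * total_processors
```

For example, with made-up values of 500 kSI2K pledged, 2000 kSI2K in the farm and 1000 processors, `fair_share_fraction(500, 2000)` gives 0.25 (25%) and `nominal_job_slots(500, 2000, 1000)` gives 250.0. That number could serve as a rough reference value, but the farm only guarantees the 25% of processor time over a period, not 250 simultaneous slots.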

-- ElisaLanciotti - 28 Jan 2009

Topic revision: r2 - 2009-01-28 - ElisaLanciotti
 