As presented at the July WLCG consolidation meeting, please find below the questionnaire on the topics relevant to the evolution of the SAM probes and their framework. Additional details can be found in the agenda and minutes at https://indico.cern.ch/conferenceDisplay.py?confId=263095

For a detailed explanation of each question please read the attached document (https://twiki.cern.ch/twiki/pub/LCG/SAMUsage/SAM_probes_questions.docx).

Long-term

| Question | ALICE | ATLAS | CMS | LHCb | Comments |
| Is SAM monitoring with OPS credentials essential? | N | N | N | N | |
| Is it essential to have all metrics in NAGIOS? | N | N | N | N | |
| Do we need support for VO independent metrics? | N | N | N | N | |
| Do we need to run metrics with multiple FQANs? | N | Y | Y | maybe in the future | |
| Is the current metric results expiration policy OK? | Y | N | Y | Y | |

Short-term

| Question | ALICE | ATLAS | CMS | LHCb | Comments |
| What is your preferred system to replace WMS in job submission probes? | direct submission | Condor-G | Condor-G | direct submission | |

In the second iteration, let's concentrate on the items where there was no consensus, and on other areas where we could improve the current system:

Question ALICE ATLAS CMS LHCb Operations representative
Expiration policy
The current test results expiration is 3 months. Is that reasonable? If not, please note down your request Y Y Y for outputs, ∞ for exit status Y  
The current availability expiration is 12 months. Is that reasonable? If not, please note down your request Y Y Y  
Publication of test results
At the moment, there are two ways of publishing results: via nagios box or directly to the message brokers.
Do you plan to use the nagios publishing? for now Y, for synchronous tests like e.g. storage ones. The rest to be discussed with sites Y yes, for "simple" tests it's useful (e.g. is CVMFS installed, ...)  
Do you plan to use the direct submission to message brokers? Y Y, already in use injecting in the MSG which is consumed by Nagios box itself N Y (currently finishing its development)  
Do you prefer to use a different technology for execution of the probes? If so, which one? possibly, what choices are there? possibly, which are the options? N the use case is to run simple scripts (shell/python) and return log/return code  
Nagios
At the moment, there are 3 nagios boxes per experiment: production, preproduction, development. Which ones do you use? prod, preprod all all dev, pre-prod, prod (one environment to test new probes is enough, thinking aloud: e.g. could be also a different profile in prod?)  
At the moment, the experiment contacts have root access to the development boxes. Is that required, or would access as the 'nagios' user be enough? root access needed (dev not used yet) root root root access preferred  
At the moment, the experiment contacts have root access to the production/preproduction boxes. Is that required, or would access as the 'nagios' user be enough? root access desirable on preprod nagios on prod ok - root on preprod nagios standard nagios should be enough, for emergency fixes root access preferable  
The working process is that the experiment contact develops a probe in the development box, then creates an rpm, and then requests the monitoring team to install that rpm in the production box. Are you happy with this scenario? If not, what do you suggest? Y Y Y RPM seems heavy for a software package which is deployed on a single box, can we e.g. have repositories with tags?  
The probes are developed within the probe framework provided by IT-SDC. Are you happy with this framework? If not, what things need to be improved? Y Y Y OK, but probes can also be developed as standalone scripts  
Instances
At the moment, there are 3 infrastructures: production, preproduction and validation. Validation is used internally by IT-SDC. Do you use preproduction? Y Y rarely Y (grid wide tests, but could be simplified as mentioned above)  
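
As an illustration of the LHCb use case above (simple shell/python scripts that return a log and a return code), a minimal sketch of a standalone probe might look like the following. It assumes the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN); the function names `check_directory` and `run_probe`, and the directory check itself, are hypothetical and not part of the SAM probe framework.

```python
import os

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_directory(path):
    """Hypothetical check: return (exit_code, message) for a directory.

    The directory to test (e.g. a software area) is an assumption made
    for illustration only.
    """
    try:
        if not os.path.isdir(path):
            return CRITICAL, f"CRITICAL - {path} is not a directory"
        if not os.listdir(path):
            return WARNING, f"WARNING - {path} exists but is empty"
        return OK, f"OK - {path} is present and populated"
    except OSError as exc:
        # Anything we cannot inspect (permissions, I/O error) is UNKNOWN.
        return UNKNOWN, f"UNKNOWN - could not inspect {path}: {exc}"


def run_probe(path):
    """Print the one-line log message and return the exit code."""
    code, message = check_directory(path)
    print(message)
    return code
```

A wrapper script could end with `sys.exit(run_probe(path))` so that the scheduler sees the return code, with the printed line serving as the log.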

Topic attachments
SAM_probes_questions.docx (r1, 145.7 K, 2013-07-31 08:48, MarianBabik)
Topic revision: r13 - 2013-09-12 - AlessandraForti
 