In preparation for the next meeting, where we would like to discuss SAM operations, development and deployment aspects, and the collaboration between SUM developers and developers of the VO tests, David prepared a document.

Here we collect feedback on this document from SUM developers and developers of the VO tests.

The main questions we might need to address during the discussion between the SAM team, SUM developers and the people in charge of VO-specific Nagios tests are:

  • Machines: how many sets of machines we need, the usage/purpose of every set, who is in charge of every set, who can do what and how, and how we deploy them
  • Test suites for SAM tests: what the SUM team can provide in order to partially automate validation
  • The validation process itself: how we agree on when the two weeks suggested by the SAM team start (people can be busy with other things and cannot always start validation immediately)
  • How upgrades to new releases are agreed with the experiments

Comments from Pablo:

First point in 'Probes development'

We are moving towards having five sets of machines (production, pre-production, nightly build, SAM development and experiment development). If we have to install the machines by hand, it will be a lot of work. So the suggestion is to use Quattor templates, and they have to be kept up to date.

Staged rollout

The deployments on pre-production should also be announced to the experiments. The start of the validation process should be agreed with the sam-hep-vo-contacts.

It is not clear whether the five days of 'availability and topology comparison' are part of the validation process or not. It might be good to divide the validation process into two parts: a first part in which the SAM team does its own testing and, once that has been successful, a second part in which SUM and VO representatives carry out the rest of the validation.

It would be nice to set a maximum delay between the SNOW request and the installation on the nodes (something like 'less than 8 working hours').

Questions

o) The staged rollout does not include anything about validation by the experiments. Should it be there?

o) Do all of the SAM components have to be deployed at the same time? If MRS, POEM and APT were deployed one after the other instead of all at once, the comparison would be much easier.

o) In the scenario described in the document, pre-production is not that important. Is it really needed?

Comments from Jarka

Jarka agreed to attend the SAM stand-up meetings, at most once a week, when issues that could have an impact on SUM are discussed.

Jarka has scripts which she uses to compare topologies, availabilities etc. She will provide them to the SAM team, who can integrate them into their tests.

Topology comparison

Jarka strongly disagrees with the statement that a 10% difference in the topology is not considered a showstopper, given that site availability has a strong impact on site funding.
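To make the 10% figure concrete, here is a minimal sketch of how a topology difference could be measured. It is not Jarka's script and not the SAM definition: it assumes a topology can be represented as a set of (site, service endpoint) pairs and that the 'difference' is the share of entries present in only one of the two topologies. The function name and the site/endpoint values are hypothetical.

```python
def topology_difference(reference, candidate):
    """Return the fraction of entries that appear in only one topology.

    Assumption: each topology is an iterable of (site, endpoint) pairs.
    The fraction is taken relative to the union of both topologies.
    """
    reference = set(reference)
    candidate = set(candidate)
    union = reference | candidate
    if not union:
        # Two empty topologies are considered identical.
        return 0.0
    # Symmetric difference: entries missing from either side.
    return len(reference ^ candidate) / len(union)


# Hypothetical example data, for illustration only.
production = {("SITE_A", "ce1"), ("SITE_B", "se1"), ("SITE_C", "ce2"), ("SITE_D", "ce3")}
preproduction = {("SITE_A", "ce1"), ("SITE_B", "se1"), ("SITE_C", "ce2"), ("SITE_E", "ce9")}

difference = topology_difference(production, preproduction)
print(f"Topology difference: {difference:.0%}")
```

With a measure of this kind, whatever threshold is agreed on can be checked automatically, and the list of differing entries can be sent to the experiments for review.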

Availability comparison

The profiles with top priority for the consistency check should be ALICE_CRITICAL, ATLAS_CRITICAL, CMS_CRITICAL_FULL and LHCb_CRITICAL. However, the remaining profiles should not be forgotten: they should also be checked on a regular basis, at least once per month for every profile. Like any other check in the past, this could help to uncover bugs in the SAM framework (not to mention that experiments are not very happy if a bug is addressed on a timescale of months).
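A per-profile consistency check could look roughly like the sketch below. It assumes that, for each profile, the availabilities from two instances (for example production and pre-production) are available as a mapping from site name to a value between 0 and 1. The function name, the tolerance and the site values are hypothetical; only the profile names come from the text above.

```python
# Profiles named above as having top priority for the consistency check.
PRIORITY_PROFILES = ["ALICE_CRITICAL", "ATLAS_CRITICAL", "CMS_CRITICAL_FULL", "LHCb_CRITICAL"]


def availability_mismatches(production, preproduction, tolerance=0.01):
    """Return the sites whose availability differs between two instances.

    Assumption: both arguments map site name -> availability in [0, 1].
    A site missing on one side is reported with None for that side.
    The default tolerance is an arbitrary placeholder, not an agreed value.
    """
    mismatches = {}
    for site in sorted(set(production) | set(preproduction)):
        old = production.get(site)
        new = preproduction.get(site)
        if old is None or new is None or abs(old - new) > tolerance:
            mismatches[site] = (old, new)
    return mismatches


# Hypothetical availabilities for one profile, for illustration only.
prod = {"SITE_A": 0.98, "SITE_B": 0.75}
preprod = {"SITE_A": 0.98, "SITE_B": 0.60, "SITE_C": 1.0}

for site, (old, new) in availability_mismatches(prod, preprod).items():
    print(f"{PRIORITY_PROFILES[1]} {site}: production={old} pre-production={new}")
```

Running the same check over the remaining profiles on a monthly schedule would cover the regular check suggested above.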

Deployment

Two days in advance should be fine, but it depends on the duration of the SAM downtime for the experiment. The deployment should not only be announced; it should be agreed in advance with the experiments, with each of them separately, in the same way as the EOS/CASTOR interventions are handled. Jarka would strongly suggest agreeing with the experiments on the date, time and duration of the intervention before the WLCG Daily Ops announcement is made.

-- JuliaAndreeva - 19-Oct-2012

Topic revision: r3 - 2012-10-22 - JuliaAndreeva
 