ALICE
  • CVMFS Stratum-0: add with urgency 4 and impact 9
  • CVMFS Stratum-1: add with urgency 3 and impact 5
    • With other Stratum-1 servers outside CERN, why is the CERN one considered critical at all?
      • A: fail-over to a remote Stratum-1 may not be as transparent as expected (e.g. RAL recently failed to take over from CERN); ATLAS and CMS have similar values for Stratum-1.
  • OpenStack, Puppet: add with urgency 3 and impact 10
    • As batch
    • Aren’t these internals of CC operations?
    • We would like to understand this a bit more. If Puppet fails, machines can exceptionally be configured manually, so it is not a blocker. For the OpenStack cloud, as soon as you run load-balanced clustered services, the failure of a VM should not be a problem either.
      • A: more and more services (for ALICE e.g. the validation cluster and the CAF) will depend on the OpenStack infrastructure (Cinder/Glance/CEPH/Keystone/Nova/...): if any of that breaks, a lot could go down with it...
  • Git: add with urgency 5 and impact 9
    • What is stored in Git that is critical?
      • A: the daily analysis tag and the weekly core revisions. The analysis is agile: if an important improvement cannot be committed, a lot of analysis may stop, which can only be tolerated for a while. If the service stops at a 'bad time', viz. at the weekly core release, this may also stop ongoing reconstruction/MC.
  • JIRA: add with urgency 5 and impact 9
    • Why?
      • A: the production is driven by tickets in JIRA. If that service is down, new productions cannot be started.
  • CERN Oracle Tier-0: urgency 4 and impact 6
  • Terminal servers: add with urgency 3 and impact 2
    • Still used, but several workarounds exist if they don’t work
    • What are these used for? How are experiment operations affected if there are no terminal servers? We see them as similar to lxplus: if lxplus is not critical, why should the TS be?
      • A: the urgency and impact are low, but the servers are used, just like lxplus.
  • NICE AD servers: urgency 3 and impact 2
    • Credentials cached
  • DNS: urgency 3 and impact 2
    • Local caches, impact only for new devices
    • Could you give us more details on this one?
      • A: the DNS is of course a critical service! The urgency and impact are mitigated by local caches.
Topic revision: r4 - 2020-08-18 - TWikiAdminUser