WLCG Critical Services

Introduction

For each LHC experiment, this page lists the services that are:

  • not operated by the experiment's own personnel, and
  • deemed critical for the successful operation of its grid workflows and for related activities.

Most of those services are hosted and operated by CERN-IT, while several Tier-1 sites and other partners also provide some.

For every relevant service, each experiment has indicated the effects of the service being unavailable. The impact indicates the effect on operations or people if the service were unavailable for a few days. The urgency indicates how quickly that impact would be reached. The criticality is defined as the product of urgency and impact; where the urgency and impact cells are empty, the criticality is shown as 0. On the right-hand side there are columns for the maximum criticality of a service across the experiments, the sum of the criticalities across the experiments and the weighted maximum criticality. The latter ranks services with identical maximum criticalities according to their respective sums of criticalities. Each numeric column can be sorted in ascending (descending) order by clicking once (twice) on its header.
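The derived columns can be reproduced with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, and the weighted-maximum formula (10 times the maximum plus the sum) is an assumption inferred from the tabulated values, not something the page states. It is checked here against the Px-CC network row.

```python
from typing import Dict, Optional, Tuple

# (urgency, impact) for one experiment; None means the cells are empty.
Rating = Optional[Tuple[int, int]]


def criticality(rating: Rating) -> int:
    """Criticality is urgency times impact; empty cells count as 0."""
    if rating is None:
        return 0
    urgency, impact = rating
    return urgency * impact


def summarize(ratings: Dict[str, Rating]) -> Dict[str, int]:
    """Compute the max, sum and weighted-max columns for one service."""
    crits = [criticality(r) for r in ratings.values()]
    max_crit = max(crits)
    sum_crit = sum(crits)
    # ASSUMPTION: inferred from the table, not stated on the page.
    # Scaling the maximum by 10 lets the sum break ties between
    # services that share the same maximum criticality.
    weighted_max = 10 * max_crit + sum_crit
    return {"max": max_crit, "sum": sum_crit, "wtd": weighted_max}


# Px-CC network row: urgency and impact per experiment.
px_cc = {"ALICE": (7, 10), "ATLAS": (7, 10), "CMS": (4, 10), "LHCb": (10, 10)}
print(summarize(px_cc))  # {'max': 100, 'sum': 280, 'wtd': 1280}
```

Under this assumed formula, sorting on the weighted maximum orders services first by their maximum criticality and then by their sum.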

Impact on operations and/or people

Level Definition
10 ops/VO severely affected
7 ops/VO notably affected
4 ops/VO moderately affected

Urgency levels

Level Definition
10 full impact reached within 6 hours
7 full impact reached within 1 day
4 full impact reached within 2 days
1 full impact reached after 2 days

Criticality levels

As a visual aid, three criticality ranges have been defined, each with a distinct color.
For a given experiment and for the maximum across the experiments, the ranges are as follows:

Range Criticality
top 70-100
high 40-69
moderate 0-39

For the sum of the criticalities across the experiments:

Range Sum of criticalities
top 210-400
high 120-209
moderate 0-119

The colors for the weighted maximum values correspond to those of the maximum values across the experiments.
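The range boundaries listed above can be expressed as a small lookup. This is a minimal sketch with a hypothetical function name; it only encodes the thresholds given on this page.

```python
def criticality_range(value: int, is_sum: bool = False) -> str:
    """Return 'top', 'high' or 'moderate' for a criticality value.

    Set is_sum=True for the sum-of-criticalities column; the default
    thresholds apply to a single experiment and to the maximum column.
    """
    # Lower bounds of the 'top' and 'high' ranges as listed on this page.
    top, high = (210, 120) if is_sum else (70, 40)
    if value >= top:
        return "top"
    if value >= high:
        return "high"
    return "moderate"


print(criticality_range(49))                # high
print(criticality_range(208, is_sum=True))  # high
print(criticality_range(210, is_sum=True))  # top
```

For the weighted maximum column, the color follows the maximum column, so the function would be applied to the service's maximum criticality rather than to the weighted value itself.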

Purpose of the tables

These tables are meant to clarify which services require which level of attention in their implementation and operation, so that the effects of service unavailability on the experiments can be minimized to the extent feasible. For example, a highly critical service should, if possible, be implemented and monitored more robustly than a less critical one. High-availability (HA) deployment methods, load balancing and/or hot-standby setups should be considered in such cases.

These tables make no promises about the level of support that can be expected for a given service. Unless a specific arrangement has been made for a particular service, support is best-effort, though in practice it is usually compatible with the actual criticality of the service. If it is not, the implementation and operation of the service can be reviewed.

CERN-IT services

| Service                         | ALICE        | ATLAS        | CMS          | LHCb         | max  | sum  | wtd  |
|                                 | urg imp crit | urg imp crit | urg imp crit | urg imp crit | crit | crit | max  |
| Px-CC network                   |   7  10   70 |   7  10   70 |   4  10   40 |  10  10  100 |  100 |  280 | 1280 |
| LHC-OPN / LHC-ONE / GPN         |   7  10   70 |   7  10   70 |   7  10   70 |   7  10   70 |   70 |  280 |  980 |
| Oracle online                   |  10  10  100 |  10  10  100 |  10  10  100 |  10  10  100 |  100 |  400 | 1400 |
| Oracle offline (inc. streaming) |   4   7   28 |  10  10  100 |   7  10   70 |  10  10  100 |  100 |  298 | 1298 |
| DB-on-Demand                    |            0 |   7  10   70 |   4  10   40 |  10  10  100 |  100 |  210 | 1210 |
| CTA                             |   4   7   28 |   7   7   49 |   4   7   28 |   4   7   28 |   49 |  133 |  623 |
| EOS                             |   7  10   70 |   7   7   49 |   7  10   70 |   7   7   49 |   70 |  238 |  938 |
| FTS                             |            0 |  10  10  100 |   4   7   28 |   4  10   40 |  100 |  168 | 1168 |
| Global xrootd redirector        |            0 |            0 |   7   7   49 |            0 |   49 |   49 |  539 |
| Ceph                            |            0 |  10  10  100 |   4   7   28 |  10  10  100 |  100 |  228 | 1228 |
| CVMFS Stratum-0                 |   7  10   70 |   7  10   70 |   4   7   28 |   4  10   40 |   70 |  208 |  908 |
| CVMFS Stratum-1                 |   4   7   28 |   7   4   28 |   4   7   28 |   7  10   70 |   70 |  154 |  854 |
| Frontier and Squid              |            0 |   7   7   49 |   7  10   70 |            0 |   70 |  119 |  819 |
| Batch service                   |   7   7   49 |   7   7   49 |   4   7   28 |   4   7   28 |   49 |  154 |  644 |
| Dedicated batch                 |            0 |   7   7   49 |  10   7   70 |            0 |   70 |  119 |  819 |
| CE                              |   7   7   49 |   7   7   49 |   4   4   16 |   4   7   28 |   49 |  142 |  632 |
| VOMS                            |   4  10   40 |   7  10   70 |   4  10   40 |   7  10   70 |   70 |  220 |  920 |
| MyProxy                         |   4  10   40 |   4   4   16 |   4  10   40 |            0 |   40 |   96 |  496 |
| CRIC                            |   1   4    4 |   7   7   49 |   4   4   16 |   1   4    4 |   49 |   73 |  563 |
| WAU / WSSA                      |   1   4    4 |   1   4    4 |            0 |   1   4    4 |    4 |   12 |   52 |
| BDII                            |            0 |            0 |            0 |   1   4    4 |    4 |    4 |   44 |
| Monit                           |   1   4    4 |   7   7   49 |   7   7   49 |   4   4   16 |   49 |  118 |  608 |
| SiteMon                         |   1   4    4 |   4   4   16 |   7   7   49 |   4   4   16 |   49 |   85 |  575 |
| AI cloud services               |   4   7   28 |  10  10  100 |   7   7   49 |  10  10  100 |  100 |  277 | 1277 |
| Kubernetes                      |            0 |  10  10  100 |   7   7   49 |            0 |  100 |  149 | 1149 |
| Lxplus                          |   4   7   28 |   7   7   49 |   7   7   49 |  10   7   70 |   70 |  196 |  896 |
| AFS                             |            0 |   7   7   49 |   7  10   70 |            0 |   70 |  119 |  819 |
| GitLab                          |   7   7   49 |   7   4   28 |   7   7   49 |   7   7   49 |   49 |  175 |  665 |
| JIRA                            |   4   4   16 |   7   4   28 |   4   4   16 |   4   7   28 |   28 |   88 |  368 |
| Twiki                           |   1   4    4 |   7   4   28 |   7   7   49 |   4   4   16 |   49 |   97 |  587 |
| Indico                          |   1   4    4 |   7   7   49 |   4   7   28 |   7   7   49 |   49 |  130 |  620 |
| Video conf                      |            0 |   7   7   49 |   7   7   49 |   7   7   49 |   49 |  147 |  637 |
| Windows terminal service        |   1   4    4 |   1   4    4 |            0 |            0 |    4 |    8 |   48 |

Services at other sites

| Service           | ALICE        | ATLAS        | CMS          | LHCb         | max  | sum  | wtd  |
|                   | urg imp crit | urg imp crit | urg imp crit | urg imp crit | crit | crit | max  |
| GOCDB             |   1   4    4 |   4   4   16 |   4   4   16 |   7   7   49 |   49 |   85 |  575 |
| MyOSG             |            0 |   4   4   16 |   4   4   16 |            0 |   16 |   32 |  192 |
| GGUS              |   1   4    4 |   4   4   16 |   7   7   49 |   7   4   28 |   49 |   97 |  587 |
| FTS               |            0 |  10  10  100 |   4   7   28 |   4  10   40 |  100 |  168 | 1168 |
| Stratum-1         |   4   7   28 |   7   4   28 |   4   7   28 |   7  10   70 |   70 |  154 |  854 |
| Accounting Portal |   1   4    4 |   1   4    4 |            0 |   1   4    4 |    4 |   12 |   52 |

Previous versions

Topic attachments
Attachment History Size Date Who
ALICE_crit.png r3 r2 r1 19.9 K 2015-03-12 15:27 AndreaSciaba
ATLAS_crit.png r2 r1 21.9 K 2015-03-12 15:28 AndreaSciaba
CMS_crit.png r2 r1 20.9 K 2015-03-12 15:28 AndreaSciaba
LHCb_crit.png r2 r1 22.1 K 2015-03-12 15:28 AndreaSciaba
Topic revision: r36 - 2020-10-05 - MaartenLitmaath
 