WLCG Critical Services
The urgency and impact values on this page have been discussed and agreed among the experiments, WLCG operations and the IT/Tier-0 service managers.
Definitions
- Impact: the amount of "damage" that a service unavailability causes to operations or people if no action is taken
- Urgency: the delay between the start of the service unavailability and the time at which the full impact is reached
- Functional service: a high-level service corresponding to a particular function of the computing system, as defined in the WLCG MoU Annex 3
- Specific service: a service contributing to one or more functional services
CERN functional services
| Operations-related services |
| High bandwidth connectivity from detector area to computer centre |
| Recording and permanent storage in a MSS of raw and reconstructed data |
| Disk storage of reconstructed data |
| Distribution of raw and reconstructed data to Tier-1 sites in time with data acquisition |
| Prompt reconstruction, calibration and alignment |
| Storage and distribution of conditions data |
| Data analysis facility |
| Databases |
| VO management services |
| Tools and support services |
| Tools and services for application development (CVS, SVN, etc.) |
| Desktop services (email, web, Twiki, Indico, Vidyo, etc.) |
Tier-1 functional services
| Operations-related services |
| Raw and reconstructed data import from Tier-0 |
| Simulated and processed data import from other WLCG centres |
| MSS archival storage of raw, reconstructed, processed and simulated data |
| Disk storage for data and temporary files |
| Provision of data access to other WLCG centres |
| Data analysis and reprocessing |
| Other experiment services |
| Network and data transfer services to Tier-0 and Tier-1 sites (high bandwidth) and to Tier-2 sites |
| Databases |
Tier-2 functional services
| Operations-related services |
| Disk storage for data and temporary files |
| Provision of data access to other WLCG centres |
| Data analysis |
| Simulation and data processing |
| Other experiment services |
| Network and data transfer services |
Impact
Impact on operations
| Level | Definition |
| 10 | Most operations services stop |
| 9 | Some operations services stop |
| 8 | One operations service stops |
| 7 | Most operations services disrupted |
| 6 | Some operations services disrupted |
| 5 | One operations service disrupted |
| 4 | Some support services stop |
| 3 | One support service stops |
| 2 | Some support services disrupted |
| 1 | One support service disrupted |
Impact on people
| Level | Definition |
| 10 | Whole VO affected |
| 8 | Users affected > 50% |
| 5 | 10% < users affected < 50% |
| 3 | Users affected < 10% |
| 1 | A single user affected |
The overall impact is the maximum of the impact on operations and the impact on people.
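The rule for combining the two impact scales can be sketched in a few lines of Python. This is only an illustration: the dictionaries copy the levels from the two tables above, while the function name `overall_impact` and the choice to reject levels that are not listed in the tables are assumptions made for this example, not part of the agreed procedure.

```python
# Impact levels and definitions, copied from the two tables above.
OPERATIONS_IMPACT = {
    10: "Most operations services stop",
    9: "Some operations services stop",
    8: "One operations service stops",
    7: "Most operations services disrupted",
    6: "Some operations services disrupted",
    5: "One operations service disrupted",
    4: "Some support services stop",
    3: "One support service stops",
    2: "Some support services disrupted",
    1: "One support service disrupted",
}
PEOPLE_IMPACT = {
    10: "Whole VO affected",
    8: "Users affected > 50%",
    5: "10% < users affected < 50%",
    3: "Users affected < 10%",
    1: "A single user affected",
}


def overall_impact(operations_level, people_level):
    """Return the overall impact: the larger of the two impact levels.

    Hypothetical helper; rejecting unlisted levels is an assumption.
    """
    if operations_level not in OPERATIONS_IMPACT:
        raise ValueError(f"unknown operations impact level: {operations_level}")
    if people_level not in PEOPLE_IMPACT:
        raise ValueError(f"unknown people impact level: {people_level}")
    return max(operations_level, people_level)


# Hypothetical example: some operations services disrupted (6)
# while more than 50% of users are affected (8).
print(overall_impact(6, 8))  # prints 8
```

In this made-up case the impact on people dominates, so the overall impact is 8 even though the impact on operations is only 6.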
Urgency
Criticality of Tier-0/CERN services
| Service | ALICE Urgency | ALICE Impact | ATLAS Urgency | ATLAS Impact | CMS Urgency | CMS Impact | LHCb Urgency | LHCb Impact |
| Px→Computer centre network | 6 | 10 | 4 | 10 | 3 | 10 | 10 | 10 |
| WLCG network (LHCOPN, GPN) | 8 | 10 | 7 | 9 | 7 | 9 | 7 | 10 |
| CERN Oracle online | 10 | 10 | 9 | 10 | 10 | 10 | - | - |
| CERN Oracle Tier-0 (inc. streaming) | 4 | 6 | 8 | 9 | 6 | 10 | 10 | 10 |
| DB-on-demand |  |  |  |  |  |  | 10 | 10 |
| Frontier and squid | - | - | 6 | 8 | 6 | 10 | - | - |
| CASTOR tape | 4 | 10 | 7 | 8 | 2 | 8 | 2 | 8 |
| CASTOR disk | 5 | 10 | - | - | - | - | - | - |
| EOS | 5 | 10 | 6 | 8 | 6 | 8 | 6 | 8 |
| Batch service | 3 | 10 | 6 | 9 | 5 | 9 | 5 | 6 |
| CE | 3 | 10 | 6 | 8 | 3 | 3 | 5 | 6 |
| FTS | - | - | 7 | 8 | 4 | 6 | 5 | 9 |
| VOM(R)S | 3 | 10 | 4 | 10 | 4 | 10 | 7 | 10 |
| BDII | - | - | 2 | 5 | 3 | 5 | 1 | 1 |
| MyProxy | 3 | 10 | 3 | 3 | 4 | 9 | 4 | 10 |
| CVMFS Stratum0 | 4 | 9 | 4 | 9 | 4 | 6 | 4 | 4 |
| CVMFS Stratum1 | 3 | 5 | 3 | 5 | 4 | 6 | 1 | 5 |
| Dashboard | 1 | 3 | 5 | 8 | 3 | 5 | 1 | 1 |
| SAM | 3 | 3 | 3 | 3 | 5 | 3 | 4 | 2 |
| AI cloud services | 3 | 10 | 9 | 10 | 8 | 8 | 10 | 10 |
| LXPLUS | 3 | 5 | 5 | 5 | 8 | 6 | 10 | 6 |
| AFS | - | - | 6 | 8 | 6 | 9 | 4 | 4 |
| CAF | 6 | 10 | 8 | 9 | 8 | 8 | 1 | 1 |
| SVN | - | - | 4 | 8 | 4 | 4 | 6 | 6 |
| GIT | 5 | 9 |  |  | 6 | 6 |  |  |
| JIRA/TRAC | 5 | 9 | 4 | 8 | 3 | 5 | 3 | 6 |
| Global xrootd redirector | - | - |  |  | 6 | 8 |  |  |
| Twiki | 3 | 3 | 7 | 9 | 6 | 6 | 6 | 6 |
| Mail and web services | 6 | 10 | 8 | 10 | 5 | 10 | 8 | 10 |
| Hypernews | - | - | - | - | 4 | 5 | - | - |
| Indico | 3 | 3 | 3 | 8 | 3 | 5 | 8 | 9 |
| Vidyo |  |  |  |  |  |  | 8 | 9 |
| SSO | 7 | 10 | - | - | - | - | - | - |
| Terminal servers | 3 | 2 | - | - | - | - | - | - |
| NICE AD servers | 3 | 2 | - | - | - | - | - | - |
Notes:
- The Stratum0 entry includes the release nodes
- The CAF, for CMS, consists of LSF queues
Previous versions
- Criticalities during Run1 (link)
- Criticalities during Run2 (link)