WLCG Critical Services

The criticality ratings on this page have been discussed and agreed among the experiments, WLCG operations and the IT/Tier-0 service managers.

Definitions

Impact
the amount of "damage" that the unavailability of a service causes to operations or people if no action is taken
Urgency
the delay between the start of the service unavailability and the time the full impact is reached
Functional service
a high level service corresponding to a particular function of the computing system, as defined in the WLCG MoU Annex 3
Specific service
a service contributing to one or more functional services

CERN functional services

Operations-related services
High bandwidth connectivity from detector area to computer centre
Recording and permanent storage in an MSS of raw and reconstructed data
Disk storage of reconstructed data
Distribution of raw and reconstructed data to Tier-1 sites in time with data acquisition
Prompt reconstruction, calibration and alignment
Storage and distribution of conditions data
Data analysis facility
Databases
VO management services
Tools and support services
Tools and services for application development (CVS, SVN, etc.)
Desktop services (email, web, Twiki, Indico, Vidyo, etc.)

Tier-1 functional services

Operations-related services
Raw and reconstructed data import from Tier-0
Simulated and processed data import from other WLCG centres
MSS archival storage of raw, reconstructed, processed and simulated data
Disk storage for data and temporary files
Provision of data access to other WLCG centres
Data analysis and reprocessing
Other experiment services
Network and data transfer services to Tier-0 and Tier-1 sites (high bandwidth) and to Tier-2 sites
Databases

Tier-2 functional services

Operations-related services
Disk storage for data and temporary files
Provision of data access to other WLCG centres
Data analysis
Simulation and data processing
Other experiment services
Network and data transfer services

Impact

Impact on operations
Level Definition
10 Most operations services stop
9 Some operations services stop
8 One operations service stops
7 Most operations services disrupted
6 Some operations services disrupted
5 One operations service disrupted
4 Some support services stop
3 One support service stops
2 Some support services disrupted
1 One support service disrupted

Impact on people

Level Definition
10 Whole VO affected
8 Users affected > 50%
5 10% < users affected < 50%
3 Users affected < 10%
1 A single user affected

The overall impact is taken as the maximum of the impact on operations and the impact on people.
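The rule for combining the two impact scales can be sketched in a few lines of Python. This is only an illustration: the function name `overall_impact` and the validation of the inputs are assumptions made here, not part of the agreed procedure.

```python
# Levels that appear in the "Impact on operations" and "Impact on people" tables.
OPERATIONS_LEVELS = set(range(1, 11))   # 1..10
PEOPLE_LEVELS = {1, 3, 5, 8, 10}


def overall_impact(impact_on_operations: int, impact_on_people: int) -> int:
    """Return the overall impact as the maximum of the two impact levels."""
    if impact_on_operations not in OPERATIONS_LEVELS:
        raise ValueError(f"unknown operations impact level: {impact_on_operations}")
    if impact_on_people not in PEOPLE_LEVELS:
        raise ValueError(f"unknown people impact level: {impact_on_people}")
    return max(impact_on_operations, impact_on_people)


# One operations service disrupted (5) while more than 50% of users are affected (8).
print(overall_impact(5, 8))
```

In this example the impact on people dominates, so the overall impact is 8.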

Urgency

Level Time (hours)
10 0
9 0.5
8 1
7 2
6 4
5 6
4 12
3 24
2 48
1 72
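
As a rough sketch, the urgency table can be used as a lookup from the time needed to reach full impact to an urgency level. The table lists only ten time values and does not say how to treat delays that fall between them. The function below assumes, purely for illustration, that such delays are rounded towards the more urgent level. The names `URGENCY_HOURS` and `urgency_level` are hypothetical.

```python
# Urgency level -> time (hours) until the full impact is reached, from the table above.
URGENCY_HOURS = {10: 0, 9: 0.5, 8: 1, 7: 2, 6: 4, 5: 6, 4: 12, 3: 24, 2: 48, 1: 72}


def urgency_level(hours_to_full_impact: float) -> int:
    """Map a delay in hours to an urgency level.

    Assumption: a delay between two table entries takes the level of the
    shorter listed time, i.e. the more urgent of the two candidates.
    """
    if hours_to_full_impact < 0:
        raise ValueError("delay must not be negative")
    # All levels whose listed time has already been reached; the least urgent
    # of them corresponds to the largest listed time not exceeding the delay.
    reached = [level for level, hours in URGENCY_HOURS.items()
               if hours <= hours_to_full_impact]
    return min(reached)


print(urgency_level(3))    # between 2 h (level 7) and 4 h (level 6)
print(urgency_level(100))  # longer than the largest listed time
```

Under the stated rounding assumption, a delay of 3 hours maps to level 7, and any delay of 72 hours or more maps to level 1.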

Criticality of Tier-0/CERN services

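| Service | ALICE Urgency | ALICE Impact | ATLAS Urgency | ATLAS Impact | CMS Urgency | CMS Impact | LHCb Urgency | LHCb Impact |
|---|---|---|---|---|---|---|---|---|
| Px→Computer centre network | 6 | 10 | 4 | 10 | 3 | 10 | 10 | 10 |
| WLCG network (LHCOPN, GPN) | 8 | 10 | 7 | 9 | 7 | 9 | 7 | 10 |
| CERN Oracle online | 10 | 10 | 9 | 10 | 10 | 10 | - | - |
| CERN Oracle Tier-0 (inc. streaming) | 4 | 6 | 8 | 9 | 6 | 10 | 10 | 10 |
| DB-on-demand |  |  |  |  |  |  | 10 | 10 |
| Frontier and squid | - | - | 6 | 8 | 6 | 10 | - | - |
| CASTOR tape | 4 | 10 | 7 | 8 | 2 | 8 | 2 | 8 |
| CASTOR disk | 5 | 10 | - | - | - | - | - | - |
| EOS | 5 | 10 | 6 | 8 | 6 | 8 | 6 | 8 |
| Batch service | 3 | 10 | 6 | 9 | 5 | 9 | 5 | 6 |
| CE | 3 | 10 | 6 | 8 | 3 | 3 | 5 | 6 |
| FTS | - | - | 7 | 8 | 4 | 6 | 5 | 9 |
| VOM(R)S | 3 | 10 | 4 | 10 | 4 | 10 | 7 | 10 |
| BDII | - | - | 2 | 5 | 3 | 5 | 1 | 1 |
| MyProxy | 3 | 10 | 3 | 3 | 4 | 9 | 4 | 10 |
| CVMFS Stratum0 | 4 | 9 | 4 | 9 | 4 | 6 | 4 | 4 |
| CVMFS Stratum1 | 3 | 5 | 3 | 5 | 4 | 6 | 1 | 5 |
| Dashboard | 1 | 3 | 5 | 8 | 3 | 5 | 1 | 1 |
| SAM | 3 | 3 | 3 | 3 | 5 | 3 | 4 | 2 |
| AI cloud services | 3 | 10 | 9 | 10 | 8 | 8 | 10 | 10 |
| LXPLUS | 3 | 5 | 5 | 5 | 8 | 6 | 10 | 6 |
| AFS | - | - | 6 | 8 | 6 | 9 | 4 | 4 |
| CAF | 6 | 10 | 8 | 9 | 8 | 8 | 1 | 1 |
| SVN | - | - | 4 | 8 | 4 | 4 | 6 | 6 |
| GIT | 5 | 9 |  |  | 6 | 6 |  |  |
| JIRA/TRAC | 5 | 9 | 4 | 8 | 3 | 5 | 3 | 6 |
| Global xrootd redirector | - | - |  |  | 6 | 8 |  |  |
| Twiki | 3 | 3 | 7 | 9 | 6 | 6 | 6 | 6 |
| Mail and web services | 6 | 10 | 8 | 10 | 5 | 10 | 8 | 10 |
| Hypernews | - | - | - | - | 4 | 5 | - | - |
| Indico | 3 | 3 | 3 | 8 | 3 | 5 | 8 | 9 |
| Vidyo |  |  |  |  |  |  | 8 | 9 |
| SSO | 7 | 10 | - | - | - | - | - | - |
| Terminal servers | 3 | 2 | - | - | - | - | - | - |
| NICE AD servers | 3 | 2 | - | - | - | - | - | - |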
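
For readers who want to work with these ratings programmatically, the following is a minimal sketch of one possible representation. It copies only four rows from the table. The names `Rating`, `CRITICALITY` and `services_at_or_above` are hypothetical, and both "-" and empty cells are stored as `None`, which is a simplifying assumption that drops whatever distinction the table intends between them.

```python
from typing import Dict, List, NamedTuple, Optional


class Rating(NamedTuple):
    urgency: int
    impact: int


# A few rows copied from the table above; None means no rating is given.
CRITICALITY: Dict[str, Dict[str, Optional[Rating]]] = {
    "WLCG network (LHCOPN, GPN)": {
        "ALICE": Rating(8, 10), "ATLAS": Rating(7, 9),
        "CMS": Rating(7, 9), "LHCb": Rating(7, 10),
    },
    "CERN Oracle online": {
        "ALICE": Rating(10, 10), "ATLAS": Rating(9, 10),
        "CMS": Rating(10, 10), "LHCb": None,
    },
    "BDII": {
        "ALICE": None, "ATLAS": Rating(2, 5),
        "CMS": Rating(3, 5), "LHCb": Rating(1, 1),
    },
    "Vidyo": {
        "ALICE": None, "ATLAS": None, "CMS": None, "LHCb": Rating(8, 9),
    },
}


def services_at_or_above(experiment: str, min_urgency: int, min_impact: int) -> List[str]:
    """List services whose urgency and impact both reach the given thresholds."""
    selected = []
    for service, ratings in CRITICALITY.items():
        rating = ratings.get(experiment)
        if rating is None:
            continue  # no rating given for this experiment
        if rating.urgency >= min_urgency and rating.impact >= min_impact:
            selected.append(service)
    return sorted(selected)


print(services_at_or_above("LHCb", 7, 9))
```

With the four sample rows, the LHCb query selects Vidyo and the WLCG network. The thresholds are arbitrary example values, not an agreed definition of what counts as critical.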
 
Figure: ALICE criticalities (ALICE crit.png)
Figure: ATLAS criticalities (ATLAS crit.png)
Figure: CMS criticalities (CMS crit.png)
Figure: LHCb criticalities (LHCb crit.png)

Notes:

  • The Stratum0 entry includes the release nodes
  • The CAF, for CMS, consists of LSF queues

Previous versions

  • Criticalities during Run1 (link)
  • Criticalities during Run2 (link)
Topic revision: r17 - 2019-08-16 - MaartenLitmaath
 