Grouping | Products Covered | Description | Flows |
---|---|---|---|
WMS | RB CE GRPK | Workload Management System | WmsFlows |
DMS | SE FTS LFC | Data Management System | |
IS | BDII | Information System | |
AAS | PX VOMS | Authentication and Authorisation Services | |
MS | MONB GRVW SFT | Monitoring System | |
Grouping | Product | Notes | Implementation | Database | Web Server | LDAP | GridFTP |
---|---|---|---|---|---|---|---|
WMS | RB | RbNotes | RbWlcg | MySQL | | GRIS:2135 | Yes:2811 |
WMS | CE | CeNotes | CeWlcg | MySQL (empty) | | GRIS | Yes:2811 |
WMS | GRPK | | | | | | |
DMS | SE | | | | | | Yes:2811 |
DMS | FTS | FtsNotes | FtsWlcg | Oracle | Tomcat5 | GRIS:2135 | |
DMS | LFC | LfcNotes | Memorandum LfcWlcg | Oracle | | | |
IS | BDII | BdiiNotes | BdiiWlcg | | | GIIS EGEE.BDII | |
AAS | PX | PxNotes | PxWlcg | | | Yes:2135 | |
AAS | VOMS | VomsNotes | VomsWlcg | Oracle | Apache Tomcat | | |
MS | RGMA | RgmaNotes | | MySQL | | | |
MS | SFT | | | MySQL Oracle | | | |
MS | GRVW | | | | Apache | | |
Service | Memory (GB) | File Storage (GB) | Oracle (GB) | Criticality |
---|---|---|---|---|
RBP | 2 | 9000 | | C |
PXP | 2 | 40 | | C |
BDIIP | 2 | 80 | | C |
BDIIL | 2 | 80 | | H |
BDIIE | 2 | 80 | | C |
CEP | 2 | 80 | | C |
RGMAP | 4 | 160 | | M |
MONBP | 4 | | | M |
GRVWP | | | | M |
SFTP | 4 | 160 | | M |
GRPKP | | | | M |
VOMSP | 4 | 80 | | C |
LFCP-ALICE | 4 | 1000 | | H |
LFCP-ATLAS | 4 | 1000 | | H |
LFCP-CMS | 4 | 1000 | | H |
LFCP-LHCB | 4 | 1000 | | C |
FTSP | 4 | 1000 | | C |
CTRGP | | | | C |
GRVWP | | | | M |
Product | HA Approach | Impact of Downtime | File State Data | Database State Data |
---|---|---|---|---|
RB | Filesystem Takeover | |||
PX | Application Replication | Long-running jobs cannot renew their proxies and fail. Users cannot create new proxies. FTS transfers are suspended. | | |
BDII | Multiple independent instances with DNS round robin | No automatic failover to external BDIIs if the CERN site is down. Some sites have their own BDIIs. State (4 MB) is kept in memory and on disk. | | |
CE | Filesystem Takeover | New jobs cannot be submitted to run at the site. Status of jobs running at the site will not be reported. | ||
RGMA | ||||
MONB | | Permanently lose monitoring data | | |
GRVW | ||||
SFT | | Site status cannot be monitored. New or fixed sites cannot join. Broken sites will not be detected. | | |
GRPK | | Job output cannot be viewed by users | | |
VOMS | Master/Slave with IP address takeover | VOMS permissions are allocated with a lifetime of 24 hours. 90 minutes before expiration, a renew operation is tried. Therefore, after 90 minutes of downtime, 5% of jobs will fail every hour. | ||
LFC | DNS Round Robin | File Catalog | ||
FTS | DNS Round Robin | Loss of a single machine does not affect the service. No site-initiated file transfers are performed if the entire service is down. | | |
CTRG | DNS Round Robin | Loss of a single machine does not affect the service. No access to Castor if the entire service is down. | None | None |
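
The VOMS entry in the table above quotes a failure rate of about 5% per hour once the outage exceeds 90 minutes. The sketch below is one way to reproduce that figure. It assumes, hypothetically, that every proxy had renewed successfully before the outage began, so that remaining lifetimes are spread evenly between 90 minutes and 24 hours, and that a job fails as soon as its proxy expires. The function name and parameters are illustrative and are not part of any VOMS tooling.

```python
def expired_fraction(downtime_min, lifetime_min=24 * 60, renew_before_min=90):
    """Estimate the fraction of proxies expired after `downtime_min` minutes.

    Assumption (not stated in the table): at the start of the outage the
    remaining lifetimes are uniformly spread between `renew_before_min`
    and `lifetime_min`, because anything closer to expiry was already renewed.
    """
    window = lifetime_min - renew_before_min   # spread of remaining lifetimes
    overdue = downtime_min - renew_before_min  # time since the first expiry
    return min(1.0, max(0.0, overdue / window))


# Share of proxies lost during the first hour after the 90-minute grace period.
hourly_rate = expired_fraction(150) - expired_fraction(90)
print(f"{hourly_rate:.1%}")  # prints 4.4%
```

Under these assumptions the rate is 60 / 1350, or roughly 4.4% per hour, which is consistent with the rounded 5% quoted in the table.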
Product | Capacity | Accessibility | Aliases | Ports In |
---|---|---|---|---|
RB | M | Incoming | ||
PX | L | Incoming | ||
BDII | L | Outgoing | LB | 2170 |
CE | L | Incoming | ||
RGMA | M | Incoming | ||
MONB | M | Incoming | ||
ARCH | M | Incoming | ||
GRVW | M | Incoming | ||
SFT | M | Incoming | ||
GRPK | M | Outgoing | ||
VOMS | M | Outgoing | ||
LFC | H | Incoming | ||
FTS | H | Incoming | ||
CTRG | H | Outgoing |
Service account | uid | gid | Service group name | CRA group name | primary or secondary group |
---|---|---|---|---|---|
edguser | 17680 | 2747 | edguser | g01 | Primary |
edguser | 17680 | 2761 | infosys | g15 | Secondary |
edginfo | 17695 | 2748 | edginfo | g02 | Primary |
edginfo | 17695 | 2761 | infosys | g15 | Secondary |
rgma | 17696 | 2749 | rgma | g03 | Primary |
rgma | 17696 | 2761 | infosys | g15 | Secondary |
dpmmgr | 17697 | 2750 | dpmmgr | g04 | Primary |
lfcmgr | 17700 | 2751 | lfcmgr | g05 | Primary |
ceuser | 17719 | 2752 | ceuser | g06 | Primary |
condor | 17728 | 2753 | condor | g07 | Primary |
wmsuser | 17856 | 2754 | wmsgroup | g08 | Primary |
hacluser | 11774 | 2755 | haclient | g09 | Primary |
gridview | 15257 | 2756 | gridview | g10 | Primary |
glite | 21086 | 2757 | glite | g11 | Primary |
The lines in /etc/passwd:
edguser:x:17680:2747::/home/edguser:/bin/bash
edginfo:x:17695:2748::/home/edginfo:/bin/bash
rgma:x:17696:2749:RGMA user:/opt/edg/etc/rgma:/bin/bash
dpmmgr:x:17697:2750:DPM manager:/home/dpmmgr:/bin/bash
lfcmgr:x:17700:2751:LFC manager:/home/lfcmgr:/bin/bash
ceuser:x:17719:2752::/home/ceuser:/bin/bash
condor:x:17728:2753::/home/condor:/bin/bash
wmsuser:x:17856:2754::/home/wmsuser:/bin/bash
hacluser:x:11774:2755::/home/hacluser:/bin/bash
gridview:x:15257:2756::/home/gridview:/bin/bash
glite:x:21086:2757::/home/glite:/bin/bash

And the lines in /etc/group:
edguser:x:2747:
edginfo:x:2748:
rgma:x:2749:
dpmmgr:x:2750:
lfcmgr:x:2751:
ceuser:x:2752:
condor:x:2753:
wmsgroup:x:2754:
haclient:x:2755:
gridview:x:2756:
glite:x:2757:
infosys:x:2761:rgma,edginfo,edguser
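
Entries like these are easy to get wrong by hand, so a small consistency check can help. The sketch below parses passwd-style and group-style text and reports accounts whose primary gid has no matching group entry. The function names are hypothetical and the sample data is a two-account excerpt of the entries above; on a real host the text would be read from /etc/passwd and /etc/group.

```python
def parse_passwd(text):
    """Return {name: (uid, gid)} from /etc/passwd-style lines."""
    accounts = {}
    for line in text.strip().splitlines():
        fields = line.split(":")
        # A passwd entry has exactly seven colon-separated fields.
        if len(fields) != 7:
            raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
        name, _password, uid, gid = fields[:4]
        accounts[name] = (int(uid), int(gid))
    return accounts


def parse_group(text):
    """Return {gid: (name, [members])} from /etc/group-style lines."""
    groups = {}
    for line in text.strip().splitlines():
        name, _password, gid, members = line.split(":")
        groups[int(gid)] = (name, [m for m in members.split(",") if m])
    return groups


def missing_primary_groups(accounts, groups):
    """List accounts whose primary gid does not appear in the group data."""
    return sorted(name for name, (_uid, gid) in accounts.items()
                  if gid not in groups)


PASSWD = """\
edguser:x:17680:2747::/home/edguser:/bin/bash
rgma:x:17696:2749:RGMA user:/opt/edg/etc/rgma:/bin/bash
"""
GROUP = """\
edguser:x:2747:
rgma:x:2749:
infosys:x:2761:rgma,edginfo,edguser
"""

accounts = parse_passwd(PASSWD)
groups = parse_group(GROUP)
print(missing_primary_groups(accounts, groups))  # prints []
```

The seven-field check in `parse_passwd` also catches entries where the comment (GECOS) field has been dropped, since such a line would otherwise shift the home directory and shell by one position.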
Service account | uid | gid | Owner |
---|---|---|---|
samops | 23550 | 1028 | Judit Novak |
samdteam | 23551 | 1028 | Judit Novak |
samatlas | 23552 | 1028 | Piotr Nyczyk |
samcms | 23554 | 1028 | Andrea Sciaba |
samalice | 23763 | 1028 | Patricia Mendez |
dirac | 25133 | 1470 | Joel Closier |
jabber | 25134 | 1470 | Joel Closier |
tomcat | none | 1028 | Production Grid-Service |
mysql | none | 1028 | Production Grid-Service |
atlsrv | 28475 | 1028 (local 1307) | Production Grid-Service |
Service | Masters | Passive | Clones | Spares | FCports | Comment |
---|---|---|---|---|---|---|
RB-ALICE | 1 | 0 | 0 | 0 | 1 | Spare shared with RBP-PROD |
RB-ATLAS | 1 | 0 | 0 | 0 | 1 | Spare shared with RBP-PROD |
RB-CMS | 1 | 0 | 0 | 0 | 1 | Spare shared with RBP-PROD |
RB-LHCB | 1 | 0 | 0 | 0 | 1 | Spare shared with RBP-PROD |
RB-PROD | 1 | 0 | 0 | 1 | 2 | |
PX | 2 | 0 | 2 | 2 | | Replicated |
BDIIL | 1 | 1 | 0 | 0 | | LCG BDII |
BDIIP | 1 | 1 | 0 | 0 | | PROD BDII (CERN Site) |
BDIIE | 1 | 1 | 0 | 0 | | Experiment BDII |
CE | 1 | 1 | 0 | 0 | 2 | |
VOMS | 2 | 1 | 0 | 0 | 0 | |
FTS | 7 | 0 | 0 | 2 | | Spare shared between VOs |
LFC-LHCB | 2 | 0 | 0 | 0 | | Spare shared between VOs |
LFC-ALICE | 1 | 0 | 0 | 0 | | Spare shared between VOs |
LFC-ATLAS | 1 | 0 | 0 | 0 | | Spare shared between VOs |
LFC-CMS | 1 | 0 | 0 | 0 | | Spare shared between VOs |
LFC-SHARED | 1 | 0 | 0 | 0 | | Shared server for other VOs |
LFC-PROD | 1 | 0 | 0 | 0 | | Backup LFC server for all |
GRVW | 1 | 1 | 0 | 0 | | Grid View |
Machine | Service | CDB Cluster | Purpose | Area | Config | Comment |
---|---|---|---|---|---|---|
bdii001 | BDIIL | gridbdii | LCG BDII Master | UPS | Basic Midrange Server | In prod. To be logically moved to LCG |
bdii002 | BDIIL | gridbdii | LCG BDII Backup | UPS | Basic Midrange Server | In prod. To be logically moved to LCG |
bdii101 | BDIIL | gridbdii | LCG BDII Master | | Basic Midrange Server | Switch1. Add to load balancing then stop bdii001. Priority 1 |
bdii102 | BDIIL | gridbdii | LCG BDII Backup | | Basic Midrange Server | Switch2. Add to load balancing then stop bdii002. Priority 1 |
bdii103 | BDIIP | gridbdii | Site BDII Master | | Basic Midrange Server | Switch2. Priority 1 |
bdii104 | BDIIP | gridbdii | Site BDII Backup | | Basic Midrange Server | Switch1. Priority 1 |
bdii105 | BDIIE | gridbdii | Experiment BDII Master | | Basic Midrange Server | Switch1. Priority 1 |
bdii106 | BDIIE | gridbdii | Experiment BDII Master | | Basic Midrange Server | Switch2. Priority 1 |
ce101 | CEP | gridce | Production CE Master | NFC | Basic Midrange Server | Switch2. Leave unused ce001 in UPS area for now. Priority 2 |
ce102 | CEP | gridce | Production CE Backup | NFC | Basic Midrange Server | Switch1. Priority 2 |
fts101 | FTSP | gridfts | production FTS Transfer Agent Master | | Large Memory Midrange Server | Switch1. Priority 4 |
fts102 | FTSP | gridfts | production FTS Transfer Agent Hot Spare | | Large Memory Midrange Server | Switch2. Priority 4 |
fts103 | FTSP | gridfts | production FTS Web Server Master | | Large Memory Midrange Server | Switch1. lb name prod-ftsws. Priority 4 |
fts104 | FTSP | gridfts | production FTS Web Server Master | | Large Memory Midrange Server | Switch2. lb name prod-ftsws. Priority 4 |
fts105 | FTSP | gridfts | production FTS Alice agent | | Basic Midrange Server | Switch1. alias prod-ftsvo-alice. Priority 4 |
fts106 | FTSP | gridfts | production FTS Atlas agent | | Basic Midrange Server | Switch2. alias prod-ftsvo-atlas. Priority 4 |
fts107 | FTSP | gridfts | production FTS CMS agent | | Basic Midrange Server | Switch1. alias prod-ftsvo-cms. Priority 4 |
fts108 | FTSP | gridfts | production FTS LHCB agent | | Basic Midrange Server | Switch2. alias prod-ftsvo-lhcb. Priority 4 |
fts109 | FTSP | gridfts | production experiment agent Hot Spare | | Basic Midrange Server | Switch1. Priority 4 |
grvw001 | GRVWP | gridgrvw | production GRIDVIEW Web server | | Basic Midrange Server | Switch1. Priority 5 |
grvw002 | GRVWP | gridgrvw | production GRIDVIEW data mining server | | Basic Midrange Server | Switch2. Priority 5 |
lfc101 | LFC-LHCB | gridlfc | production LHCb LFC | | Basic Midrange Server | Switch1. alias prod-lfc-lhcb. Priority 7 |
lfc102 | LFC-LHCB | gridlfc | production LHCb LFC Backup | | Basic Midrange Server | Switch2. Priority 7 |
lfc103 | LFC-ALICE | gridlfc | production Alice LFC | | Basic Midrange Server | Switch2. alias prod-lfc-alice. Priority 7 |
lfc104 | LFC-ATLAS | gridlfc | production Atlas LFC | | Basic Midrange Server | Switch2. alias prod-lfc-atlas. Priority 7 |
lfc105 | LFC-CMS | gridlfc | production CMS LFC | | Basic Midrange Server | Switch2. alias prod-lfc-cms. Priority 7 |
lfc106 | LFCP | gridlfc | production shared LFC | | Basic Midrange Server | Switch1. alias prod-lfc-shared. Priority 7 |
lfc107 | LFCP | gridlfc | production LFC backup | | Basic Midrange Server | Switch1. Priority 7 |
rb101 | RB-ALICE | gridrb | RB for Alice | NFC | Extra disk Midrange Server | Switch1. Priority 8 |
rb102 | RB-ATLAS | gridrb | RB for Atlas | NFC | Extra disk Midrange Server | Switch1. Priority 8 |
rb103 | RB-CMS | gridrb | RB for CMS | NFC | Extra disk Midrange Server | Switch1. Priority 8 |
rb104 | RB-LHCB | gridrb | RB for LHCB | NFC | Extra disk Midrange Server | Switch1. Priority 8 |
rb105 | RB-PROD | gridrb | RB for other VOs | NFC | Extra disk Midrange Server | Switch1. Priority 8 |
rb106 | RB-PROD | gridrb | RB spare | NFC | Extra disk Midrange Server | Switch2. Priority 8 |
px101 | PXP | gridpx | Production MyProxy Master | NFC | Basic Midrange Server | Switch2. Priority 3 |
px102 | PXP | gridpx | Production MyProxy Slave | NFC | Basic Midrange Server | Switch1. Priority 3 |
px103 | PXP | gridpx | Production MyProxy Master for FTS | | Basic Midrange Server | Switch2. Priority 3 |
px104 | PXP | gridpx | Production MyProxy Slave for FTS | | Basic Midrange Server | Switch1. Priority 3 |
voms101 | VOMSP | gridvoms | Production VOMS Master | NFC | Large Memory Midrange Server | Switch1. Priority 6 |
voms102 | VOMSP | gridvoms | Production VOMS Slave | NFC | Large Memory Midrange Server | Switch2. Priority 6 |
voms103 | VOMSP | gridvoms | Production VOMS ldap publisher | NFC | Basic Midrange Server | Switch2. Priority 6 |
Attribute | Class U | Class L | Class M | Class H | Class C |
---|---|---|---|---|---|
Backup | |||||
Configuration | |||||
Facilities | |||||
Hardware | |||||
High Availability | |||||
Monitoring | |||||
Controlled physical access | | | Badge | Badge | Badge |
Power into Data Centre | | | | Redundant | Redundant |
Power connection on UPS (if HA, only 1 machine required on UPS) | | | | Yes | Yes |
Machine in rack | | | Yes | Yes | Yes |
Redundant power supply in PC | | | | Yes | Yes |
Internal system disks mirrored | | | Yes | Yes | Yes |
Console remotely accessible | | Yes | Yes | Yes | Yes |
Minimum RAID Levels for data | | 5 | 5 | 5 | 5 |
Redundant Controllers / Paths | | | | Yes | Yes |
Off-site copies of backup data | |||||
Yearly backup/restore test | |||||
Redundant network cards | |||||
Status command for each component | | | Yes | Yes | Yes |
Automatic Event reported to console if component down | | | Yes | Yes | Yes |
Automatic configuration from database/xml | | | | Yes | Yes |
Standby Levels | | | Cold | Warm | Hot |
Procedures for failover | | | Administrator | Operator | Automatic |
Networking | |||||
Physical | |||||
Storage |