SAM and WLCG statistics for C5
SAM and WLCG statistics are included every week in the IT-GT C5 report. This twiki explains in detail the meaning of the information presented in these weekly statistics.
The statistics refer to the EGI and WLCG infrastructures and they therefore contain information about sites running OSG, EMI, gLite and ARC middleware.
SAM statistics
These statistics are taken from the MyWLCG Portal.
1. Number of EGI and OSG sites
The number of EGI and OSG sites is the number of sites that have been monitored by SAM during the last week under OPS credentials. SAM is available at the SAM portal.
2. Number of Services Monitored
The number of Services Monitored is the number of EGI and OSG services that have been monitored by SAM during the last week under OPS credentials.
3. Number of Metrics Available
The number of Metrics Available is the number of different metrics executed by SAM in the last 7 days.
4. Status of Certified and Production EGI and OSG Sites
The Sites per Status numbers are based on the calculation of site availability, which is described at Gridview Service Availability Computation. Site availability is based on the results of the SAM tests defined as critical by each VO for every service, and also on the scheduled and unscheduled downtimes reported at the GOCDB (the official repository for storing and presenting EGI topology and resources information) and at the OIM (the official repository for storing and presenting OSG topology and resources information).
Sites consist of the following services: CE service, SE service, SRM service and site BDII service. Each service consists of one or more service instances. The site statuses are:
- OK: If all services are OK. A service is OK if at least one of the service instances available in the site is OK. A service instance is OK if all critical SAM tests have passed.
- Degraded: If at least one service is degraded. A service is degraded if one or more (but not all) service instances are OK.
- Down: If at least one service is down. A service is down if all service instances are down. A service instance is down if at least one critical SAM test has failed.
- Maintenance: If at least one service is scheduled for maintenance. A service is in maintenance if all service instances are scheduled for maintenance.
- Not available: If at least one service is not available. A service is not available if all service instances are not available. A service instance is not available when at least one critical SAM test result is not available.
Total number of Sites = OK + Degraded + Down + Maintenance + Not available
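The rules above can be sketched in Python as follows. This is only an illustration, not the SAM or Gridview implementation. The function names and the sample sites are hypothetical. Because the rules as written overlap, the sketch assumes that a service counts as OK only when all of its instances are OK, that any other mix of non-OK instances counts as Down, and that the site checks are applied in the order Down, Maintenance, Not available, Degraded.

```python
from collections import Counter

OK = "OK"
DEGRADED = "Degraded"
DOWN = "Down"
MAINTENANCE = "Maintenance"
NOT_AVAILABLE = "Not available"


def service_status(instance_statuses):
    """Combine the statuses of a service's instances into one service status."""
    if not instance_statuses:
        return NOT_AVAILABLE  # assumption: no instances means nothing to report
    ok_count = instance_statuses.count(OK)
    if ok_count == len(instance_statuses):
        return OK
    if ok_count > 0:
        return DEGRADED  # one or more, but not all, instances are OK
    for status in (MAINTENANCE, NOT_AVAILABLE):
        if all(s == status for s in instance_statuses):
            return status
    return DOWN  # assumption: any other mix of non-OK instances counts as Down


def site_status(service_statuses):
    """Combine service statuses into a site status (precedence is an assumption)."""
    for status in (DOWN, MAINTENANCE, NOT_AVAILABLE, DEGRADED):
        if status in service_statuses:
            return status
    return OK


# Hypothetical sites, each mapping a service type to its instance statuses.
sites = {
    "SITE-A": {"CE": [OK, OK], "SRM": [OK]},
    "SITE-B": {"CE": [OK, DOWN], "SRM": [OK]},
    "SITE-C": {"CE": [DOWN, DOWN], "SRM": [OK]},
}
per_status = Counter(
    site_status([service_status(inst) for inst in services.values()])
    for services in sites.values()
)
# The total is the sum over all statuses, as in the formula above.
total_sites = sum(per_status.values())
```

Since every site receives exactly one status, the per-status counts always add up to the total number of sites.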
5. EGI and OSG Sites per Software version
The EGI software version comes from the 'org.sam.WN-SoftVer' 'CE' test executed on the WNs. Sites that do not support the SAM 'CE' service, or that have not sent results for this test during the last week, are not counted. If, during the last week, the test landed on different WNs with different version numbers, the highest version number is used for the site.
Note that:
- The number of sites with a given software version is based on the software found installed on worker nodes at gLite sites. This calculation doesn't take into account any other service running at the site, only the WNs.
- Only sites running CREAM or lcg-CE are taken into account (i.e. ARC CEs are not taken into account). Moreover, if a site was down and couldn't execute the 'CE-sft-softver' CE test, it wasn't counted either. This explains why the total number doesn't match the number of EGI sites presented in section 1.
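The "highest version wins" rule for EGI sites can be sketched as follows. This is a minimal illustration, assuming that version strings are dotted numbers such as 3.2.10. The site names, the sample values and the function names are hypothetical, and real test output may need different parsing.

```python
import re


def version_key(version):
    """Turn a version string into a tuple of integers for comparison."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


def site_version(reported_versions):
    """Return the highest version reported by a site's WNs, or None if none."""
    if not reported_versions:
        return None  # no results during the week, so the site is not counted
    return max(reported_versions, key=version_key)


# Hypothetical test results collected during one week.
week_results = {
    "SITE-A": ["3.2.9", "3.2.10", "3.1.0"],
    "SITE-B": ["3.2.9"],
    "SITE-C": [],
}
versions = {site: site_version(v) for site, v in week_results.items()}
counted = {site: v for site, v in versions.items() if v is not None}
```

Comparing integer tuples rather than raw strings matters here: a plain string comparison would rank 3.2.9 above 3.2.10.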
The OSG software version comes from the 'org.osg.general.osg-version' CE test.
WLCG statistics
Number of Official WLCG Sites
This is the number of official WLCG sites as reported by REBUS, i.e. those that have signed MoUs.
Executed jobs
Job statistics are taken from the EGI Accounting Portal for the VO group 'lhc'. This number refers to the WLCG VOs.
CE Deployment
The following statistics are presented:
- Number of unique CREAM CE hosts deployed
- Number of unique LCG-CE hosts deployed
- Number of unique GLOBUS hosts deployed
- Number of sites supporting CREAM CEs
- Number of sites supporting LCG-CEs
- Number of sites with only a CREAM CE
- Number of sites with only an LCG CE
This information is extracted from the Grid Information System (lcg-bdii). It can be checked by counting the sites that have clusters, or the sites that have CEs, with GlueCEImplementationName=CREAM, GlueCEImplementationName=LCG-CE or GlueCEImplementationName=GLOBUS. The results are filtered to only show official WLCG sites as reported by REBUS.
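The counts listed in this section could be derived from GlueCE entries along the following lines. This is a sketch over already-fetched records, not the script behind the report. The `site` key is a hypothetical field standing in for the site each CE belongs to, the sample hosts and the `official_sites` set are invented, and the host is assumed to be the part of GlueCEUniqueID before the colon. "Only" is interpreted here as CREAM versus LCG-CE, which is also an assumption.

```python
from collections import defaultdict

# Hypothetical GlueCE records; "site" is an illustrative field, not a GLUE attribute.
ces = [
    {"GlueCEUniqueID": "ce1.a.example:8443/cream-pbs-q1",
     "GlueCEImplementationName": "CREAM", "site": "SITE-A"},
    {"GlueCEUniqueID": "ce1.a.example:8443/cream-pbs-q2",
     "GlueCEImplementationName": "CREAM", "site": "SITE-A"},
    {"GlueCEUniqueID": "ce2.a.example:2119/jobmanager-pbs-q1",
     "GlueCEImplementationName": "LCG-CE", "site": "SITE-A"},
    {"GlueCEUniqueID": "ce1.b.example:8443/cream-lsf-q1",
     "GlueCEImplementationName": "CREAM", "site": "SITE-B"},
    {"GlueCEUniqueID": "ce1.c.example:2119/jobmanager-pbs-q1",
     "GlueCEImplementationName": "LCG-CE", "site": "SITE-C"},
]
# Hypothetical stand-in for the list of official WLCG sites reported by REBUS.
official_sites = {"SITE-A", "SITE-B"}

hosts = defaultdict(set)  # implementation -> unique CE hosts
sites = defaultdict(set)  # implementation -> sites with at least one such CE
for ce in ces:
    if ce["site"] not in official_sites:
        continue  # keep only official WLCG sites
    impl = ce["GlueCEImplementationName"]
    hosts[impl].add(ce["GlueCEUniqueID"].split(":")[0])
    sites[impl].add(ce["site"])

cream_hosts = len(hosts["CREAM"])
lcg_hosts = len(hosts["LCG-CE"])
cream_only_sites = sites["CREAM"] - sites["LCG-CE"]
lcg_only_sites = sites["LCG-CE"] - sites["CREAM"]
```

Using sets means that several queues published by the same CE host are counted as a single host.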
LRMS Systems
Extracted from the Grid information system using the GlueCE object and the attribute GlueCEInfoLRMSType. The results are filtered to only show official WLCG sites as reported by REBUS.
Installed Capacity by OS
Extracted from the Grid information system using the GlueSubCluster object and the attributes GlueSubClusterLogicalCPUs, GlueHostProcessorOtherDescription, GlueHostOperatingSystemName and GlueHostOperatingSystemNameRelease. The results are filtered to only show official WLCG sites as reported by REBUS.
Sites should follow the instructions on how to publish the OS in the information system. More operating systems or CE types can be added if necessary.
In the table presented per operating system, all WLCG sites are taken into account. The number of subclusters (sets of homogeneous resources), the number of logical CPUs, and the total SI2000 and HEPSpec06 are calculated.
SI2000 means SPECint 2000, which is a benchmark specification for a CPU's integer processing power. It is maintained by the Standard Performance Evaluation Corporation (SPEC). The site computing power is calculated by summing up the computing power of each node.
On the other hand, the SPECint 2000 benchmark doesn't reflect the real system performance for HEP applications, so a new benchmark called HEPSPEC06 has been selected. The report still shows SPECint 2000 numbers for historical reasons, to simplify comparison with reports from previous years.
Note that SI2000 or HEPSpec06 sometimes show wrong values (or zero). This is due to the fact that sites do not publish correct information in the Information System.
In most cases this happens on "non standard" Linux distributions (like Debian or Ubuntu), where our grid middleware isn't supported.
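As an illustration of how the per-OS table could be assembled, the sketch below groups subcluster records by operating system. It is not the report's actual script. It assumes that the HEP-SPEC06 value is published per core in GlueHostProcessorOtherDescription in the form `Benchmark=<value>-HEP-SPEC06`, that the total is that value multiplied by the number of logical CPUs, and that a missing or malformed value counts as zero, which is one way wrong or zero totals can arise. The sample records and the function name are invented.

```python
import re
from collections import defaultdict

# Assumed publication format for the per-core benchmark value.
BENCHMARK_RE = re.compile(r"Benchmark=(\d+(?:\.\d+)?)-HEP-SPEC06")


def hepspec06_per_core(other_description):
    """Extract the per-core HEP-SPEC06 value, or 0.0 if it is not published."""
    match = BENCHMARK_RE.search(other_description or "")
    return float(match.group(1)) if match else 0.0


# Hypothetical GlueSubCluster records.
subclusters = [
    {"GlueHostOperatingSystemName": "ScientificSL",
     "GlueSubClusterLogicalCPUs": 100,
     "GlueHostProcessorOtherDescription": "Cores=4,Benchmark=8.0-HEP-SPEC06"},
    {"GlueHostOperatingSystemName": "ScientificSL",
     "GlueSubClusterLogicalCPUs": 50,
     "GlueHostProcessorOtherDescription": "Cores=8,Benchmark=10.0-HEP-SPEC06"},
    {"GlueHostOperatingSystemName": "Debian",
     "GlueSubClusterLogicalCPUs": 20,
     "GlueHostProcessorOtherDescription": ""},
]

table = defaultdict(lambda: {"subclusters": 0, "logical_cpus": 0, "hepspec06": 0.0})
for sc in subclusters:
    row = table[sc["GlueHostOperatingSystemName"]]
    cpus = sc["GlueSubClusterLogicalCPUs"]
    row["subclusters"] += 1
    row["logical_cpus"] += cpus
    row["hepspec06"] += cpus * hepspec06_per_core(sc["GlueHostProcessorOtherDescription"])
```

In this invented data the Debian subcluster publishes no benchmark, so its CPUs are counted but its HEPSpec06 total stays at zero.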
Installed Capacity by CE Type
Extracted from the Grid information system using the GlueCE object and the attribute GlueCEImplementationName. The results are filtered to only show official WLCG sites as reported by REBUS. The capacities shown are those of the cluster which can be accessed via the interface.
In the table presented per CE type, only sites running gLite CEs are taken into account. Moreover, the number of logical CPUs actually represents the number of CPUs that can be accessed per CE type. The per-CE-type numbers should not be summed up to obtain the total installed capacity; it is better to take that number from the previous table. This is due to inconsistent information that may be published by a site running both lcg-CE and CREAM CEs that access the same set of physical resources.
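A small worked example shows why the per-CE-type numbers should not be added together. All field names and values below are hypothetical; the point is only that a cluster reachable through two CE types is counted once under each type.

```python
# Two CEs of different types front the same 100-CPU cluster (hypothetical data).
ces = [
    {"type": "LCG-CE", "cluster": "cluster-1", "logical_cpus": 100},
    {"type": "CREAM", "cluster": "cluster-1", "logical_cpus": 100},
    {"type": "CREAM", "cluster": "cluster-2", "logical_cpus": 40},
]

# Capacity reachable per CE type: each cluster is counted once per type.
per_type = {}
for ce_type in {ce["type"] for ce in ces}:
    clusters = {ce["cluster"]: ce["logical_cpus"] for ce in ces if ce["type"] == ce_type}
    per_type[ce_type] = sum(clusters.values())

# Adding the per-type figures counts cluster-1 twice.
naive_total = sum(per_type.values())

# Counting each cluster once gives the actual installed capacity.
installed = sum({ce["cluster"]: ce["logical_cpus"] for ce in ces}.values())
```

Here the per-type figures are 100 and 140 logical CPUs, which add up to 240 although only 140 are installed.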
Installed Capacity by LRMS Type
Extracted from the Grid information system using the GlueCE object and the attribute GlueCEInfoLRMSType. The results are filtered to only show official WLCG sites as reported by REBUS. The capacities shown are those of the cluster which is managed by the LRMS.