SAM and WLCG statistics for C5

SAM and WLCG statistics are included every week in the IT-GT C5 report. This TWiki page explains in detail the meaning of the information presented in these weekly statistics.

The statistics refer to the EGI and WLCG infrastructures and they therefore contain information about sites running OSG, EMI, gLite and ARC middleware.

SAM statistics

These statistics are taken from the MyWLCG Portal.

1. Number of EGI and OSG sites

This is the number of EGI and OSG sites that SAM has monitored during the last week under OPS credentials. SAM is available at the SAM portal.

2. Number of Services Monitored

This is the number of EGI and OSG services that SAM has monitored during the last week under OPS credentials.

3. Number of Metrics Available

The number of Metrics Available is the number of different metrics executed by SAM in the last 7 days.

4. Status of Certified and Production EGI and OSG Sites

The number of sites per status is based on the calculation of the site availability, which is described at Gridview Service Availability Computation. Site availability is based on the results of the SAM tests defined as critical by each VO for every service, and on the scheduled and unscheduled downtimes reported in the GOCDB (the official repository for storing and presenting EGI topology and resource information) and in the OIM (the official repository for storing and presenting OSG topology and resource information).

Sites consist of the following services: CE service, SE service, SRM service and site BDII service. Each service consists of one or more service instances. The possible site statuses are:

  • OK: If all services are OK. A service is OK if at least one of the service instances available in the site is OK. A service instance is OK if all critical SAM tests have passed.
  • Degraded: If at least one service is degraded. A service is degraded if one or more (but not all) service instances are OK.
  • Down: If at least one service is down. A service is down if all service instances are down. A service instance is down if at least one critical SAM test has failed.
  • Maintenance: If at least one service is scheduled for maintenance. A service is in maintenance if all service instances are scheduled for maintenance.
  • Not available: If at least one service is not available. A service is not available if all service instances are not available. A service instance is not available when at least one critical SAM test result is not available.

Total number of Sites = OK + Degraded + Down + Maintenance + Not available
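The site-level rules above can be sketched in Python as follows. This is a minimal illustration, not the Gridview implementation: the text does not say which status wins when a site has, for example, one degraded service and one service that is down, so the precedence order used here is an assumption, and the site names and service statuses are hypothetical.

```python
from collections import Counter

# Assumed precedence when several non-OK conditions hold at the same site.
# The description above does not define one, so this order is illustrative.
PRECEDENCE = ["Maintenance", "Down", "Not available", "Degraded"]

def site_status(service_statuses, precedence=PRECEDENCE):
    """Return the site status given the status of each of its services."""
    if all(s == "OK" for s in service_statuses):
        return "OK"
    for status in precedence:
        if status in service_statuses:
            return status
    raise ValueError("unknown service status in %r" % (service_statuses,))

# Hypothetical sites; each list holds the CE, SE, SRM and site BDII status.
sites = {
    "SITE-A": ["OK", "OK", "OK", "OK"],
    "SITE-B": ["OK", "Degraded", "OK", "OK"],
    "SITE-C": ["OK", "Down", "OK", "OK"],
    "SITE-D": ["Maintenance", "Maintenance", "OK", "OK"],
}

# Sites per status, and the total as the sum over all statuses.
counts = Counter(site_status(s) for s in sites.values())
total = sum(counts.values())
```

Because every site receives exactly one status, the total always equals the sum of the per-status counts, whichever precedence is chosen.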

5. EGI and OSG Sites per Software version

The EGI software version comes from the 'org.sam.WN-SoftVer' 'CE' test executed on the WNs. Sites that do not support the SAM 'CE' service, or that have not sent results for this test during the last week, are not counted. If, during the last week, the test landed on different WNs with different version numbers, the highest version number is used for the site.

Note that:

  • The number of sites with a given software version is based on the software found installed on worker nodes at gLite sites. This calculation does not take into account any other service running at the site, only the WNs.
  • Only sites running CREAM or lcg-CE are taken into account (i.e. ARC CEs are not). Moreover, a site that was down and could not execute the 'CE-sft-softver' CE test is not counted either. This explains why the total number does not match the number of EGI sites presented in section 1.
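The "highest version number" rule described above can be sketched as follows. The sketch assumes that versions are plain dotted numbers such as '3.2.10'; the real test output may use a different format. The site names and version values are hypothetical.

```python
def version_key(version):
    """Turn a dotted version string such as '3.2.10' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))

def site_version(reported_versions):
    """Return the highest version reported by a site's WNs, or None if no results."""
    if not reported_versions:
        return None  # no results during the week: the site is not counted
    return max(reported_versions, key=version_key)

# Hypothetical results collected from different WNs during one week.
results = {
    "SITE-A": ["3.2.9", "3.2.10", "3.1.0"],
    "SITE-B": [],
}

per_site = {site: site_version(versions) for site, versions in results.items()}
# per_site["SITE-A"] is "3.2.10"; SITE-B has no result and would be skipped.
```

Comparing the components as integers matters here: a plain string comparison would rank '3.2.9' above '3.2.10'.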

The OSG software version comes from the 'org.osg.general.osg-version' CE test.

WLCG statistics

Number of Official WLCG Sites

This is the number of official WLCG sites as reported by REBUS, i.e. those that have signed MoUs.

Executed jobs

Job statistics are taken from the EGI Accounting Portal for the VO group 'lhc'. This number relates to the WLCG VOs.

CE Deployment

The following statistics are presented:

  • Number of unique CREAM CE hosts deployed
  • Number of unique LCG-CE hosts deployed
  • Number of unique GLOBUS hosts deployed
  • Number of sites supporting CREAM CEs
  • Number of sites supporting LCG-CEs
  • Number of sites with only a CREAM CE
  • Number of sites with only an LCG-CE

This information is extracted from the Grid Information System (lcg-bdii). It can be checked by counting the sites whose clusters or CEs publish GlueCEImplementationName=CREAM, GlueCEImplementationName=LCG-CE or GlueCEImplementationName=GLOBUS. The results are filtered to only show official WLCG sites as reported by REBUS.
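The counting behind these statistics can be sketched as follows, assuming the information system output has already been reduced to (site, CE host, GlueCEImplementationName) records and filtered against REBUS. The records are made up, and "only a CREAM CE" is interpreted here as "CREAM but no LCG-CE", which is an assumption.

```python
from collections import defaultdict

# Hypothetical (site, CE host, GlueCEImplementationName) records.
records = [
    ("SITE-A", "ce1.site-a.example", "CREAM"),
    ("SITE-A", "ce1.site-a.example", "CREAM"),   # same host, second queue
    ("SITE-A", "ce2.site-a.example", "LCG-CE"),
    ("SITE-B", "ce1.site-b.example", "CREAM"),
    ("SITE-C", "ce1.site-c.example", "LCG-CE"),
]

def ce_deployment(records):
    """Compute the CE deployment counters from (site, host, implementation) records."""
    hosts = defaultdict(set)   # implementation -> unique CE hosts
    sites = defaultdict(set)   # implementation -> sites supporting it
    for site, host, impl in records:
        hosts[impl].add(host)
        sites[impl].add(site)
    cream, lcg = sites["CREAM"], sites["LCG-CE"]
    return {
        "CREAM unique hosts": len(hosts["CREAM"]),
        "LCG-CE unique hosts": len(hosts["LCG-CE"]),
        "GLOBUS unique hosts": len(hosts["GLOBUS"]),
        "sites supporting CREAM": len(cream),
        "sites supporting LCG-CE": len(lcg),
        "sites with only CREAM": len(cream - lcg),
        "sites with only LCG-CE": len(lcg - cream),
    }

stats = ce_deployment(records)
```

Using sets means that a host publishing several queues is counted once, which matches the "unique hosts" wording of the list above.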

LRMS Systems

Extracted from the Grid information system using the GlueCE object and the attribute GlueCEInfoLRMSType. The results are filtered to only show official WLCG sites as reported by REBUS.

Installed Capacity by OS

Extracted from the Grid information system using the GlueSubCluster object and the attributes GlueSubClusterLogicalCPUs, GlueHostProcessorOtherDescription, GlueHostOperatingSystemName and GlueHostOperatingSystemNameRelease. The results are filtered to only show official WLCG sites as reported by REBUS.

Sites should follow the instructions on how to publish the OS in the information system. More operating systems or CE types can be added if necessary.

In the table presented per operating system, all WLCG sites are taken into account. The number of subclusters (sets of homogeneous resources), logical CPUs, total SI2000 and HEPSpec06 are calculated.

SI2000 stands for SPECint 2000, a benchmark specification for a CPU's integer processing power. It is maintained by the Standard Performance Evaluation Corporation (SPEC). The site computing power is calculated by summing the computing power of each node.

However, the SPECint 2000 benchmark does not reflect real system performance for HEP applications, so a new benchmark called HEPSPEC06 has been selected. The report still shows SPECint 2000 numbers for historical reasons, to simplify comparison with reports from previous years.

Note that SI2000 or HEPSpec06 sometimes show wrong values (or zero). This is because some sites do not publish correct information in the Information System. In most cases this happens on "non-standard" Linux distributions (such as Debian or Ubuntu), on which the grid middleware is not supported.
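The per-OS aggregation described in this section can be sketched as follows. The sketch assumes that each subcluster record carries a per-logical-CPU SI2000 value and that the subcluster total is that value multiplied by the number of logical CPUs; the actual report may derive the totals differently. All records are invented, and HEPSpec06 would be aggregated in the same way.

```python
from collections import defaultdict

# Hypothetical subcluster records:
# (OS name, OS release, logical CPUs, SI2000 per logical CPU).
subclusters = [
    ("ScientificSL", "5.5", 400, 2000),
    ("ScientificSL", "5.5", 100, 2500),
    ("CentOS", "5.4", 200, 1800),
    ("Debian", "6.0", 50, 0),   # benchmark not published, so it counts as zero
]

def capacity_by_os(subclusters):
    """Group subclusters by (OS name, release) and sum their capacity."""
    table = defaultdict(lambda: {"subclusters": 0, "logical_cpus": 0, "si2000": 0})
    for os_name, os_release, cpus, si2k_per_cpu in subclusters:
        row = table[(os_name, os_release)]
        row["subclusters"] += 1
        row["logical_cpus"] += cpus
        row["si2000"] += cpus * si2k_per_cpu
    return dict(table)

table = capacity_by_os(subclusters)
```

The Debian row shows the effect mentioned in the note above: the CPUs are counted, but a missing benchmark value leaves the SI2000 total at zero.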

Installed Capacity by CE Type

Extracted from the Grid information system using the GlueCE object and the attribute GlueCEImplementationName. The results are filtered to only show official WLCG sites as reported by REBUS. The capacities shown are those of the cluster which can be accessed via the interface.

In the table presented per CE type, only sites running gLite CEs are taken into account. Moreover, the number of logical CPUs represents the CPUs that can be accessed per CE type. The numbers for the two CE types should not be summed to obtain the total installed capacity; it is better to take that total from the previous table. This is because inconsistent information may be published by a site running both lcg-CE and CREAM CEs that access the same set of physical resources.
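A small sketch of why the per-CE-type numbers should not be added together, assuming a hypothetical site whose single cluster is reachable through both a CREAM CE and an lcg-CE (the names and figures are invented):

```python
# Hypothetical site: one 1000-CPU cluster reachable through both CE types.
clusters = {"cluster-1": 1000}                        # cluster -> logical CPUs
ce_access = [("CREAM", "cluster-1"), ("LCG-CE", "cluster-1")]

# Clusters reachable per CE type (a set, so each cluster counts once per type).
accessible = {}
for ce_type, cluster in ce_access:
    accessible.setdefault(ce_type, set()).add(cluster)

per_ce_type = {
    ce_type: sum(clusters[c] for c in reachable)
    for ce_type, reachable in accessible.items()
}

naive_total = sum(per_ce_type.values())      # counts the shared cluster twice
installed_total = sum(clusters.values())     # counts each cluster once
```

Here each CE type reports 1000 logical CPUs, so the naive sum is 2000 while the installed capacity is 1000.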

Installed Capacity by LRMS Type

Extracted from the Grid information system using the GlueCE object and the attribute GlueCEInfoLRMSType. The results are filtered to only show official WLCG sites as reported by REBUS. The capacities shown are those of the cluster which is managed by the LRMS.

Topic revision: r15 - 2013-05-31 - AlbertoAimar
 