gLite CLUSTER

Introduction

glite-CLUSTER is a node type that publishes information about the clusters and subclusters in a site, which can be referenced by any number of compute elements.

If you want to understand in detail why this node type is needed and what the advantages of using it are, please see the Technical Note by Stephen Burke, Flavia Donno and Maarten Litmaath.

Deployment Scenarios

glite-CLUSTER can be deployed on the same host as the lcg-CE or on a different one. See the sections below for details on each deployment scenario.

glite-CLUSTER and lcg-CE

non-cluster mode

The lcg-CE can be configured as usual, without worrying about the glite-CLUSTER node. This can be useful for small sites with a very simple setup that don't want to deal with cluster/subcluster configuration. In this case the lcg-CE publishes a single cluster/subcluster.

cluster mode

The lcg-CE can work in cluster mode with the glite-CLUSTER node type by defining LCGCE_CLUSTER_MODE=yes. The lcg-CE can be on the same host as the glite-CLUSTER node or on a different one.

For the same host, please run:

yum install lcg-CE
yum install glite-CLUSTER
yum install glite-LRMS_utils, where LRMS is TORQUE, LSF, SGE or CONDOR
yaim -c -s site-info.def -n lcg-CE glite-CLUSTER glite-LRMS_utils

For different hosts, please run:

yum install lcg-CE
yum install glite-LRMS_utils, where LRMS is TORQUE, LSF, SGE or CONDOR
yaim -c -s site-info.def -n lcg-CE glite-LRMS_utils
yum install glite-CLUSTER
yaim -c -s site-info.def -n glite-CLUSTER

In cluster mode there are new lcg-CE YAIM configuration variables which must be set. Check the lcg-CE configuration variables twiki for more details.

In order to configure the glite-CLUSTER, please check the glite-CLUSTER configuration variables twiki.

Note on software tags and WN configuration

If a glite-CLUSTER node is used with the lcg-CE on a separate machine, VO managers who want to set their application tags can do so per subcluster, using the --sc option of the lcg-tags or lcg-ManageVOTag commands.
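As a sketch (the CE hostname, VO and tag name are placeholders, and option spellings may differ between lcg-tags versions; only the --sc option itself is taken from the note above), a VO software manager might publish a tag for a single subcluster like this:

```shell
# Hypothetical example: add a VO software tag only for the
# subcluster named "gergosubcluster". All names are placeholders.
lcg-tags --ce lcgce.example.org --vo dteam \
         --sc gergosubcluster \
         --add --tags VO-dteam-MyApp-1.0
```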

This also requires that a user can discover the relevant subcluster name on a given WN. The glite-wn-info command does this using the configuration file ${GLITE_LOCATION}/etc/glite-wn-info.conf, where the subcluster ID is set. YAIM configures glite-wn-info.conf automatically if the WN_LIST file is properly set up, as explained in the WN_list section of the YAIM configuration guide.
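On a WN, a user can then look up the subcluster that node belongs to; a minimal sketch (assuming glite-wn-info simply reports the attributes from its configuration file):

```shell
# Inspect the settings YAIM wrote for this WN, then query them
# via the command. The exact invocation may vary by version;
# the configuration file path is the one given above.
cat ${GLITE_LOCATION}/etc/glite-wn-info.conf
glite-wn-info
```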

Known issues

For glite-CLUSTER 3.1.4, when installing glite-CLUSTER and the lcg-CE on the same machine: if your new or reconfigured subclusters are named differently than before, the old directory in /opt/glite/var/info/ should be deleted; otherwise details of the old subcluster keep being published.
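For example (the subcluster name is a placeholder; the path is the one from the note above):

```shell
# Remove the stale per-subcluster directory so details of the old
# subcluster stop being published.
rm -rf /opt/glite/var/info/old-subcluster-name
```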

The cluster unique ID (i.e. the one set with CE_HOST_<host-name>_CLUSTER_UniqueID in cluster mode) must not contain upper case letters: it may contain only lower case alphanumeric characters and the three characters '.', '_' and '-'.
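The constraint can be checked with a simple pattern match before running YAIM, e.g.:

```shell
# Validate a candidate cluster UniqueID: only lower-case
# alphanumerics plus '.', '_' and '-' are allowed.
id="my-yaim"
if echo "$id" | grep -Eq '^[a-z0-9._-]+$'; then
    echo "valid"
else
    echo "invalid"
fi
```

For instance, "my-yaim" passes, while "My-Yaim" would be rejected because of the upper case letters.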

When setting up the lcg-CE with a glite-CLUSTER node on a separate machine, the VO application tag directories at lcgce:$EDG_LOCATION/var/info/ should be shared with cluster:$EDG_LOCATION/var/info/.
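One common way to share that directory is an NFS export (a sketch only: the hostnames are placeholders and the choice of NFS is an assumption, any shared filesystem will do):

```shell
# On the lcg-CE host: export the VO application tag directory to
# the glite-CLUSTER node (hostname is a placeholder).
echo "${EDG_LOCATION}/var/info cluster.example.org(rw,sync,no_root_squash)" >> /etc/exports
exportfs -ra

# On the glite-CLUSTER host: mount it at the same location.
mount -t nfs lcgce.example.org:${EDG_LOCATION}/var/info ${EDG_LOCATION}/var/info
```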

glite-CLUSTER and CREAM

There are instructions in the 3.2 glite-CLUSTER release notes for modifying an existing, already configured CREAM CE to make use of a glite-CLUSTER node at the site. Currently YAIM cannot set up the CREAM CE to do this automatically.

Note that it is not possible to co-locate a CREAM CE and glite-CLUSTER on the same node; they have to be installed on separate hosts.

glite-CLUSTER check

You can check whether glite-CLUSTER is properly configured by querying the information system. If you query the resource BDII of the glite-CLUSTER node, you should see something like the output below. This is basically the same output as with the existing configuration, but the details should obviously reflect what you configured in YAIM.

In particular, check that the references to GlueCEUniqueIDs in the GlueCluster object(s) correspond to the right queues. Also, check by querying the CE information that the GlueCE objects have the right reverse reference (GlueForeignKey) to the Cluster.

Note the following scenarios when querying the resource BDII:

  • if a box hosts only a glite-CLUSTER, its resource bdii should publish GlueCluster + GlueSubCluster (but not GlueCE).
  • if a box hosts only a CE configured in cluster mode, its resource bdii should publish GlueCE (but not GlueCluster + GlueSubCluster).

Note that if you query the site BDII, the results should be the same: some number of GlueCE objects, each linked to a single GlueCluster (many-to-one), and each GlueCluster linked to one GlueSubCluster (one-to-one).
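The forward and reverse references can be checked directly with filtered queries (the hostnames are placeholders; the base DN and port match the full query further down):

```shell
# On the glite-CLUSTER node: list each cluster and the GlueCE /
# GlueSite objects it references via GlueForeignKey.
ldapsearch -x -LLL -h cluster.example.org -p 2170 \
  -b "mds-vo-name=resource,o=grid" \
  "(objectClass=GlueCluster)" GlueClusterUniqueID GlueForeignKey

# On the CE host: GlueCE objects should carry the reverse reference
# GlueForeignKey: GlueClusterUniqueID=<your cluster ID>.
ldapsearch -x -LLL -h lcgce.example.org -p 2170 \
  -b "mds-vo-name=resource,o=grid" \
  "(objectClass=GlueCE)" GlueCEUniqueID GlueForeignKey
```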

ldapsearch -x -h localhost -p 2170 -b "mds-vo-name=resource,o=grid"

# extended LDIF
#
# LDAPv3
# base <mds-vo-name=resource,o=grid> with scope sub
# filter: (objectclass=*)
# requesting: ALL
#

# resource, grid
dn: Mds-Vo-name=resource,o=grid
objectClass: GlueTop
objectClass: Mds
Mds-Vo-name: resource

# vtb-generic-21.cern.ch_org.glite.RTEPublisher_2855976528, resource, grid
dn: GlueServiceUniqueID=vtb-generic-21.cern.ch_org.glite.RTEPublisher_28559765
 28,Mds-Vo-name=resource,o=grid
objectClass: GlueTop
objectClass: GlueService
objectClass: GlueKey
objectClass: GlueSchemaVersion
GlueServiceUniqueID: vtb-generic-21.cern.ch_org.glite.RTEPublisher_2855976528
GlueServiceName: BUDAPEST-RTEPublisher
GlueServiceType: org.glite.RTEPublisher
GlueServiceVersion: 1.0.0
GlueServiceEndpoint: gsiftp://vtb-generic-21.cern.ch:2811/opt/glite/var/info
GlueServiceStatus: OK
GlueServiceStatusInfo: globus-gridftp-server (pid 10588) is running...
GlueServiceSemantics: http://grid-deployment.web.cern.ch/grid-deployment/eis/d
 ocs/ExpSwInstall/sw-install.html
GlueServiceStartTime: 2010-11-22T12:21:55+01:00
GlueServiceOwner: dteam
GlueServiceAccessControlBaseRule: VOMS:/dteam/Role=lcgadmin
GlueForeignKey: GlueSiteUniqueID=BUDAPEST
GlueSchemaVersionMajor: 1
GlueSchemaVersionMinor: 3

# GlueSubClusterUniqueID:gergosubcluster, vtb-generic-21.cern.ch_org.glite.RT
 EPublisher_2855976528, resource, grid
dn: GlueServiceDataKey=GlueSubClusterUniqueID:gergosubcluster,GlueServiceUniqu
 eID=vtb-generic-21.cern.ch_org.glite.RTEPublisher_2855976528,Mds-Vo-name=reso
 urce,o=grid
objectClass: GlueTop
objectClass: GlueServiceData
objectClass: GlueKey
objectClass: GlueSchemaVersion
GlueServiceDataKey: GlueSubClusterUniqueID:gergosubcluster
GlueChunkKey: GlueServiceUniqueID=vtb-generic-21.cern.ch_org.glite.RTEPublishe
 r_2855976528
GlueSchemaVersionMajor: 1
GlueSchemaVersionMinor: 3

# gergo.clus-ter, resource, grid
dn: GlueClusterUniqueID=gergo.clus-ter,Mds-Vo-name=resource,o=grid
objectClass: GlueClusterTop
objectClass: GlueCluster
objectClass: GlueInformationService
objectClass: GlueKey
objectClass: GlueSchemaVersion
GlueClusterName: GergoCluster human readable
GlueClusterService: vtb-generic-12.cern.ch:2119/lcgpbs-dteam-jobmanager-dteam
GlueClusterUniqueID: gergo.clus-ter
GlueForeignKey: GlueSiteUniqueID=Budapest
GlueForeignKey: GlueCEUniqueID=vtb-generic-12.cern.ch:2119/lcgpbs-jobmanager-d
 team
GlueInformationServiceURL: ldap://vtb-generic-21.cern.ch:2170/mds-vo-name=reso
 urce,o=grid
GlueSchemaVersionMajor: 1
GlueSchemaVersionMinor: 3

# glite-info-service_version, vtb-generic-21.cern.ch_org.glite.RTEPublisher_2
 855976528, resource, grid
dn: GlueServiceDataKey=glite-info-service_version,GlueServiceUniqueID=vtb-gene
 ric-21.cern.ch_org.glite.RTEPublisher_2855976528,Mds-Vo-name=resource,o=grid
objectClass: GlueTop
objectClass: GlueServiceData
objectClass: GlueKey
objectClass: GlueSchemaVersion
GlueServiceDataKey: glite-info-service_version
GlueServiceDataValue: 1.5
GlueChunkKey: GlueServiceUniqueID=vtb-generic-21.cern.ch_org.glite.RTEPublishe
 r_2855976528
GlueSchemaVersionMajor: 1
GlueSchemaVersionMinor: 3

# glite-info-service_hostname, vtb-generic-21.cern.ch_org.glite.RTEPublisher_
 2855976528, resource, grid
dn: GlueServiceDataKey=glite-info-service_hostname,GlueServiceUniqueID=vtb-gen
 eric-21.cern.ch_org.glite.RTEPublisher_2855976528,Mds-Vo-name=resource,o=grid
objectClass: GlueTop
objectClass: GlueServiceData
objectClass: GlueKey
objectClass: GlueSchemaVersion
GlueServiceDataKey: glite-info-service_hostname
GlueServiceDataValue: vtb-generic-21.cern.ch
GlueChunkKey: GlueServiceUniqueID=vtb-generic-21.cern.ch_org.glite.RTEPublishe
 r_2855976528
GlueSchemaVersionMajor: 1
GlueSchemaVersionMinor: 3

# gergosubcluster, gergo.clus-ter, resource, grid
dn: GlueSubClusterUniqueID=gergosubcluster,GlueClusterUniqueID=gergo.clus-ter,
 Mds-Vo-name=resource,o=grid
objectClass: GlueClusterTop
objectClass: GlueSubCluster
objectClass: GlueHostApplicationSoftware
objectClass: GlueHostArchitecture
objectClass: GlueHostBenchmark
objectClass: GlueHostMainMemory
objectClass: GlueHostNetworkAdapter
objectClass: GlueHostOperatingSystem
objectClass: GlueHostProcessor
objectClass: GlueInformationService
objectClass: GlueKey
objectClass: GlueSchemaVersion
GlueChunkKey: GlueClusterUniqueID=gergo.clus-ter
GlueHostApplicationSoftwareRunTimeEnvironment: GPU
GlueHostApplicationSoftwareRunTimeEnvironment: GPU-TEST-2
GlueHostArchitectureSMPSize: 12
GlueHostArchitecturePlatformType: intel
GlueHostBenchmarkSF00: 100
GlueHostBenchmarkSI00: 100
GlueHostMainMemoryRAMSize: 100
GlueHostMainMemoryVirtualSize: 100
GlueHostNetworkAdapterInboundIP: TRUE
GlueHostNetworkAdapterOutboundIP: TRUE
GlueHostOperatingSystemName: linux
GlueHostOperatingSystemRelease: gekko
GlueHostOperatingSystemVersion: 3.4
GlueHostProcessorClockSpeed: 100
GlueHostProcessorModel: 200
GlueHostProcessorVendor: 300
GlueHostProcessorOtherDescription: mydescription
GlueSubClusterName: GergoSubcluster human readable
GlueSubClusterUniqueID: gergosubcluster
GlueSubClusterPhysicalCPUs: 100
GlueSubClusterLogicalCPUs: 200
GlueSubClusterTmpDir: /tmp
GlueSubClusterWNTmpDir: /tmp
GlueInformationServiceURL: ldap://vtb-generic-21.cern.ch:2170/mds-vo-name=reso
 urce,o=grid
GlueSchemaVersionMajor: 1
GlueSchemaVersionMinor: 3

# glite-version, vtb-generic-21.cern.ch_org.glite.RTEPublisher_2855976528, re
 source, grid
dn: GlueServiceDataKey=glite-version,GlueServiceUniqueID=vtb-generic-21.cern.c
 h_org.glite.RTEPublisher_2855976528,Mds-Vo-name=resource,o=grid
objectClass: GlueTop
objectClass: GlueServiceData
objectClass: GlueKey
GlueServiceDataKey: glite-version
GlueServiceDataValue: 3.1.0
GlueChunkKey: GlueServiceUniqueID=vtb-generic-21.cern.ch_org.glite.RTEPublishe
 r_2855976528

# search result
search: 2
result: 0 Success

# numResponses: 9
# numEntries: 8

Use Cases

The following use cases represent the most common scenarios. The definitions below will help you understand the diagrams.

  • RTE Publisher: Run Time Environment Service Publisher. It publishes information about the glite-CLUSTER service in the information system.
  • GlueCluster: A GlueCluster in the Glue Schema gives a representation of a set of physical resources (hosts or Worker Nodes or computers) behind a CE.
  • GlueSubcluster: A GlueSubCluster refers to homogeneous set of hosts as regards the selected attributes. This entity provides details of the machines that offer execution environments to jobs.
  • gridftp server: needed by the WNs to copy the list of installed software to the glite-CLUSTER node.
  • /opt/glite/var/info/SubCluster1/VO1: location on the glite-CLUSTER node where the information about the software installed on the WNs is copied.
  • Head Node: It can be a lcg-CE or CREAM CE.
  • GlueCE: A GlueCE entry in the Glue Schema represents a Computing Element which is an abstraction for an entity managing computing resources exposed to the Grid.
  • Head Node Service Publisher: It publishes information about the lcg-CE or CREAM CE service in the information system.
  • lcg-info-dynamic-software: plugin that publishes information about the software installed in the WNs in the GlueSubCluster.
  • glite-info-service: plugin that actually publishes the service information on the resource BDII.
  • glite-info-dynamic-_lrms_: plugin that actually publishes information relevant to the batch system queues in the GlueCE.
  • LRMS: Local Resource Management System, that is the batch system.
  • glite-wn-info: command used by the WNs to find out under which GlueSubcluster they are represented.
  • lcg-tags/lcg-ManageVOtags --subcluster: command used by the WNs to copy information about the software they have installed to the glite-CLUSTER node.

One Cluster/SubCluster, One Head Node, One GlueCE, One LRMS queue

1-clu-1-head-1-ce-1-queue.jpg

Configuration variables

In the previous scenario, the following configuration variables are needed (the values are only an example):

  • glite-CLUSTER
    # The Cluster variables should contain the name of the cluster variable in upper case
    CLUSTER_HOST="vtb-generic-74.cern.ch"
    CLUSTERS="yaim"
    CLUSTER_YAIM_CLUSTER_UniqueID=my-yaim
    CLUSTER_YAIM_CLUSTER_Name="this is the yaim cluster"
    CLUSTER_YAIM_SITE_UniqueID=yaim                                
    CLUSTER_YAIM_CE_TYPE="jobmanager"
    CLUSTER_YAIM_INFO_PORT=2170
    CLUSTER_YAIM_INFO_TYPE=resource
    
    # The CE host variables should contain the name of the CE hostname in lower case and replace '.' and '-' with '_'
    CLUSTER_YAIM_CE_HOSTS="vtb-generic-64.cern.ch"
    CE_HOST_vtb_generic_64_cern_ch_CE_TYPE="jobmanager"
    CE_HOST_vtb_generic_64_cern_ch_QUEUES="dteam"
    CE_HOST_vtb_generic_64_cern_ch_CE_InfoJobManager="lcgpbs"
    
    # The Subcluster variables should contain the name of the subcluster variable in upper case
    SUBCLUSTER_SLC4_SUBCLUSTER_UniqueID=slc4
    SUBCLUSTER_SLC4_HOST_ApplicationSoftwareRunTimeEnvironment="LCG-2|LCG-2_1_0|LCG-2_1_1|LCG-2_2_0"   # CE_RUNTIMEENV
    SUBCLUSTER_SLC4_HOST_ArchitectureSMPSize=2                                                         # CE_SMPSIZE
    SUBCLUSTER_SLC4_HOST_ArchitecturePlatformType=i686                                                 # CE_OS_ARCH
    SUBCLUSTER_SLC4_HOST_BenchmarkSF00=0                                                               # CE_SF00
    SUBCLUSTER_SLC4_HOST_BenchmarkSI00=381                                                             # CE_SI00
    SUBCLUSTER_SLC4_HOST_MainMemoryRAMSize=513                                                         # CE_MINPHYSMEM
    SUBCLUSTER_SLC4_HOST_MainMemoryVirtualSize=1025                                                    # CE_MINVIRTMEM
    SUBCLUSTER_SLC4_HOST_NetworkAdapterInboundIP=FALSE                                                 # CE_INBOUNDIP
    SUBCLUSTER_SLC4_HOST_NetworkAdapterOutboundIP=TRUE                                                 # CE_OUTBOUNDIP
    SUBCLUSTER_SLC4_HOST_OperatingSystemName="Scientific Linux"                                        # CE_OS
    SUBCLUSTER_SLC4_HOST_OperatingSystemRelease=3.0.6                                                  # CE_OS_RELEASE
    SUBCLUSTER_SLC4_HOST_OperatingSystemVersion="SL"                                                   # CE_OS_VERSION
    SUBCLUSTER_SLC4_HOST_ProcessorClockSpeed=1001                                                      # CE_CPU_SPEED
    SUBCLUSTER_SLC4_HOST_ProcessorModel=PIII                                                           # CE_CPU_MODEL
    SUBCLUSTER_SLC4_HOST_ProcessorVendor=intel                                                         # CE_CPU_VENDOR
    SUBCLUSTER_SLC4_SUBCLUSTER_Name="my subcluster YAIM"
    SUBCLUSTER_SLC4_SUBCLUSTER_PhysicalCPUs=1                                                          # CE_PHYSCPU
    SUBCLUSTER_SLC4_SUBCLUSTER_LogicalCPUs=1                                                           # CE_LOGCPU
    SUBCLUSTER_SLC4_SUBCLUSTER_TmpDir=/tmp
    SUBCLUSTER_SLC4_SUBCLUSTER_WNTmpDir=/tmp
    
  • lcg-CE
    CE_HOST=vtb-generic-64.cern.ch
    CE_HOST_vtb_generic_64_cern_ch_CLUSTER_UniqueID=my-yaim
    CE_HOST_vtb_generic_64_cern_ch_CE_InfoApplicationDir=/sw_dir
    CE_HOST_vtb_generic_64_cern_ch_CE_TYPE=jobmanager
    
    # Distributed in site-info.def
    CE_HOST_vtb_generic_64_cern_ch_CE_InfoJobManager=lcgpbs
    CE_HOST_vtb_generic_64_cern_ch_QUEUE_DTEAM_VOVIEW_DTEAM_CE_StateWaitingJobs=666666
    CE_HOST_vtb_generic_64_cern_ch_QUEUES="dteam"
    CE_HOST_vtb_generic_64_cern_ch_QUEUE_DTEAM_CE_AccessControlBaseRule="dteam"
    
  • If you use glite-TORQUE_server
    
    # The following "old variables" still need to be defined for the TORQUE server.
    QUEUES="dteam"
    DTEAM_GROUP_ENABLE="dteam"
    CE_SMPSIZE=2
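
The hostname-to-variable-name convention used in the comments above (lower case, with '.' and '-' replaced by '_') can be sketched in shell:

```shell
# Derive the YAIM variable prefix for a CE hostname, per the
# convention noted in the comments above.
host="vtb-generic-64.cern.ch"
prefix=$(echo "$host" | tr 'A-Z.-' 'a-z__')
echo "CE_HOST_${prefix}_QUEUES"
# → CE_HOST_vtb_generic_64_cern_ch_QUEUES
```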
    

FAQ

  1. What are the implications/advantages of using the CLUSTER node type at our site? Any disadvantages? The advantages are described in the Technical Note. The risk is that if you publish the wrong thing it may affect job submission and/or installed capacity publication, but that's also true with the current system. It is possible to migrate gradually, i.e. you can have a mixture of CEs which are connected to the cluster node and others which keep the existing setup.
  2. How do we build the CLUSTER node type? Please check the gLite web pages to learn how to install glite-CLUSTER. Then check the YAIM configuration variables twiki to learn how to configure glite-CLUSTER.
  3. What do I need to do to my current nodes (SEs, CEs, BDII, ... ) to make them interact with the new CLUSTER node type? The cluster node has a resource BDII like any other node, which allows the published information to be collected by the site BDII. It doesn't interact with the SEs, but it has a rather intimate connection with the CEs, because the GlueCE objects link to the GlueCluster objects and vice versa.
  4. If I deploy the glite-CLUSTER on a node with no CREAM CE installed do I need to setup the batch system specific support on that node? Yes, the information providers used by the glite-CLUSTER require the batch system software in order to query the local resource management system.
  5. Is there anything else I need to do to make any other site aware of and/or interact with the CLUSTER node type? Cluster publication doesn't change anything about the way the glue schema works, it's just about configuration, so if it's configured correctly nothing external to the site will notice.

Topic revision: r8 - 2013-08-29 - PaulAndreetto
 