YAIM cluster configuration: phase 1
Introduction
This wiki page describes the steps needed to test the new yaim cluster module, which contains the configuration of the Glue cluster and Glue subcluster entities. In phase 1, the idea is to configure one cluster, one subcluster and one lcg CE on the same host.
The relevant yaim modules that are needed to test the new cluster configuration are:
- yaim lcg ce: this module has been modified to include new variables and to remove the code that configures the Glue cluster and Glue subcluster entities. Some functions have been transferred from the lcg CE to the CLUSTER node type (i.e. config_gip_software_plugin, config_info_service_rtepublish).
- yaim cluster: this is a new module that contains the configuration of the Glue cluster and Glue subcluster entities.
- yaim torque server: it hasn't been changed. While lcg ce has been changed to use the new variables, torque server still uses the old ones, so new and old variables have to coexist at this point.
- yaim torque utils: it hasn't been changed but it's not affected by the new cluster configuration.
The goal is to install and configure an lcg CE with the new yaim cluster module and perform the usual lcg CE tests.
Installation instructions
Clean installation
In order to test the new cluster configuration, you can install the following metapackages:
lcg-CE
glite-TORQUE_server
glite-TORQUE_utils
Optionally, you can also install a glite-BDII, if you want to run a site BDII.
Then you should upgrade the glite-yaim-lcg-ce rpm by running:
rpm -U /afs/cern.ch/project/gd/www/yaim/testing/cluster-testing/glite-yaim-lcg-ce-5.0.0-1.noarch.rpm
or
rpm -U http://grid-deployment.web.cern.ch/grid-deployment/yaim/testing/cluster-testing/glite-yaim-lcg-ce-5.0.0-1.noarch.rpm
And install the new cluster configuration yaim module by running:
rpm -U /afs/cern.ch/project/gd/www/yaim/testing/cluster-testing/glite-yaim-cluster-1.0.0-2.noarch.rpm
or
rpm -U http://grid-deployment.web.cern.ch/grid-deployment/yaim/testing/cluster-testing/glite-yaim-cluster-1.0.0-2.noarch.rpm
Now follow the configuration instructions.
Upgrade
In order to test the new cluster configuration, you can install the following metapackages:
lcg-CE
glite-TORQUE_server
glite-TORQUE_utils
Optionally, you can also install a glite-BDII, if you want to run a site BDII.
Run yaim to configure your services:
./yaim -c -s site-info.def -n lcg-CE (-n BDII_site) -n TORQUE_server -n TORQUE_utils
Then upgrade the glite-yaim-lcg-ce rpm by running:
rpm -U /afs/cern.ch/project/gd/www/yaim/testing/cluster-testing/glite-yaim-lcg-ce-5.0.0-1.noarch.rpm
And install the new cluster configuration yaim module by running:
rpm -i /afs/cern.ch/project/gd/www/yaim/testing/cluster-testing/glite-yaim-cluster-1.0.0-2.noarch.rpm
Now follow the configuration instructions.
Configuration instructions
Since there is a set of new variables, you need to change your usual site-info.def:
lcg CE
Mandatory variables for the lcg CE: You'll find them under /opt/glite/yaim/examples/services/lcg-ce:
The new variable names follow this syntax:
- In general, the characters '.' and '-' in variables based on hostnames, queues or VOViews should be transformed into '_'
- <host-name>: identifier that corresponds to the CE hostname in lower case. Example: ctb-generic-1.cern.ch -> ctb_generic_1_cern_ch
- <queue-name>: identifier that corresponds to the queue in upper case. Example: dteam -> DTEAM
- <voview-name>: identifier that corresponds to the VOView id in upper case. '/' and '=' should also be transformed into '_'. Example: /dteam/Role=admin -> DTEAM_ROLE_ADMIN
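The naming rules can be sketched as small shell helpers that reproduce the examples given in this list (ctb-generic-1.cern.ch -> ctb_generic_1_cern_ch, dteam -> DTEAM, /dteam/Role=admin -> DTEAM_ROLE_ADMIN). This is only an illustration, not code shipped with YAIM: the function names host_id, queue_id and voview_id are hypothetical, and the leading '/' of a VOView id is assumed to be dropped, as the DTEAM_ROLE_ADMIN example suggests.

```shell
# Hypothetical helpers (not part of YAIM) that derive the identifiers
# used in the new variable names.

# Hostname: lower case, '.' and '-' become '_'.
host_id() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/[.-]/_/g'
}

# Queue name: upper case, '.' and '-' become '_'.
queue_id() {
    printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]' | sed 's/[.-]/_/g'
}

# VOView id: drop the leading '/', turn '/', '=', '.' and '-' into '_',
# then convert the result to upper case.
voview_id() {
    printf '%s\n' "$1" | sed -e 's#^/##' -e 's#[/=.-]#_#g' | tr '[:lower:]' '[:upper:]'
}

host_id ctb-generic-1.cern.ch     # prints ctb_generic_1_cern_ch
queue_id dteam                    # prints DTEAM
voview_id /dteam/Role=admin       # prints DTEAM_ROLE_ADMIN
```

The resulting strings are the pieces that get pasted into variable names such as CE_HOST_<host-name>_QUEUE_<queue-name>_... in site-info.def.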
| Variable Name | Description | Value type | Version |
| CE_HOST_<host-name>_CLUSTER_UniqueID | UniqueID of the cluster the CE belongs to | string | glite-yaim-lcg-ce 4.0.5-1 |
| CE_InfoApplicationDir | Prefix of the experiment software directory in a site. This variable has been renamed in the new infosys configuration. The old variable name was: VO_SW_DIR. This parameter can be defined per CE, queue, site or VOView. See /opt/glite/yaim/examples/services/lcg-ce for examples. | string | glite-yaim-lcg-ce 4.0.5-1 |
The following variables will be distributed in site-info.def in the future, since they affect other yaim modules. At the moment we are in a transition phase while migrating to the new variable names.
| Variable Name | Description | Value type | Version |
| CE_HOST_<host-name>_CE_TYPE | CE type: 'jobmanager' for lcg CE and 'cream' for cream CE | string | glite-yaim-lcg-ce 4.0.5-1 |
| CE_HOST_<host-name>_QUEUES | Space-separated list of the queue names configured in the CE. This variable has been renamed in the new infosys configuration. The old variable name was: QUEUES | string | glite-yaim-lcg-ce 4.0.5-1 |
| CE_HOST_<host-name>_QUEUE_<queue-name>_CE_AccessControlBaseRule | Space-separated list of FQANs and/or VO names which are allowed to access the queues configured in the CE. This variable has been renamed in the new infosys configuration. The old variable name was: <queue-name>_GROUP_ENABLE | string | glite-yaim-lcg-ce 4.0.5-1 |
| CE_HOST_<host-name>_CE_InfoJobManager | The name of the job manager used by the gatekeeper. This variable has been renamed in the new infosys configuration. The old variable name was: JOB_MANAGER. Please define one of: lcgpbs, lcglsf, lcgsge or lcgcondor | string | glite-yaim-lcg-ce 4.0.5-1 |
| JOB_MANAGER | The old variable is still needed since config_jobmanager in yaim core hasn't been modified to use the new variable. To be done. | string | OLD variable |
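As an illustration, the lcg CE part of site-info.def for a single CE could look like the sketch below. The hostname ctb-generic-1.cern.ch and the queue dteam are reused from the naming examples on this page; the cluster UniqueID, the software directory and the choice of lcgpbs are placeholders, not recommendations.

```shell
# Sketch of the lcg CE part of site-info.def (all values are placeholders).
# CE host: ctb-generic-1.cern.ch -> ctb_generic_1_cern_ch ; queue: dteam -> DTEAM

# Mandatory lcg CE variables
CE_HOST_ctb_generic_1_cern_ch_CLUSTER_UniqueID="my-cluster-id"   # hypothetical id
CE_InfoApplicationDir="/opt/exp_soft"                            # hypothetical path

# Variables that will be distributed in site-info.def in the future
CE_HOST_ctb_generic_1_cern_ch_CE_TYPE="jobmanager"
CE_HOST_ctb_generic_1_cern_ch_QUEUES="dteam"
CE_HOST_ctb_generic_1_cern_ch_QUEUE_DTEAM_CE_AccessControlBaseRule="dteam"
CE_HOST_ctb_generic_1_cern_ch_CE_InfoJobManager="lcgpbs"

# Old variable, still needed by config_jobmanager in yaim core
JOB_MANAGER="lcgpbs"
```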
Default variables for the lcg CE: You'll find them under:
- /opt/glite/yaim/defaults/lcg-ce.pre:
It contains a list of CE_* variables with some default values. These are the Glue schema properties belonging to the Computing Element and the VOView entities. By default, these variables are specified per CE, but they can also be specified per queue, if all the VOViews of a queue should share a specific value, or per VOView, if a certain VOView should have a specific value. For example, if I define in site-info.def:
# In the CE vtb-generic-17.cern.ch, in the queue dteam, in the VOView dteam,
# the value of StateWaitingJobs should be 666666
CE_HOST_vtb_generic_17_cern_ch_QUEUE_DTEAM_VOVIEW_DTEAM_CE_StateWaitingJobs=666666
Or I can also define:
# In the CE vtb-generic-17.cern.ch, in the queue dteam, in all the supported VOViews,
# the value of StateWaitingJobs should be 666666
CE_HOST_vtb_generic_17_cern_ch_QUEUE_DTEAM_CE_StateWaitingJobs=666666
If none of the above is defined, the default value for the whole CE, defined in /opt/glite/yaim/defaults/lcg-ce.pre, is taken.
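The lookup order described here (per VOView first, then per queue, then the CE-wide default) can be sketched as follows. This is not YAIM's actual implementation: the function name resolve_ce_var is hypothetical, the CE-wide default is assumed to be stored in a variable named CE_<attribute>, and 444444 is only a stand-in for whatever value lcg-ce.pre provides.

```shell
# Hypothetical helper: print the most specific value defined for an attribute.
# $1: host identifier, $2: queue identifier, $3: VOView identifier, $4: attribute
resolve_ce_var() {
    for _name in \
        "CE_HOST_${1}_QUEUE_${2}_VOVIEW_${3}_CE_${4}" \
        "CE_HOST_${1}_QUEUE_${2}_CE_${4}" \
        "CE_${4}"
    do
        # Indirect lookup of the variable whose name is stored in _name
        eval "_value=\${${_name}:-}"
        if [ -n "$_value" ]; then
            printf '%s\n' "$_value"
            return 0
        fi
    done
    return 1
}

# Stand-in for the CE-wide default from lcg-ce.pre
CE_StateWaitingJobs=444444
# Per-queue override, as in the second example above
CE_HOST_vtb_generic_17_cern_ch_QUEUE_DTEAM_CE_StateWaitingJobs=666666

resolve_ce_var vtb_generic_17_cern_ch DTEAM DTEAM StateWaitingJobs   # prints 666666
resolve_ce_var vtb_generic_17_cern_ch OPS OPS StateWaitingJobs       # prints 444444
```

The queue OPS in the second call is a hypothetical queue without any override, so the CE-wide value is returned.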
The variables that can be redefined per CE-queue are:
CE_VAR="
ImplementationName
ImplementationVersion
InfoGatekeeperPort
InfoLRMSType
InfoLRMSVersion
InfoJobManager
InfoApplicationDir
InfoDataDir
InfoDefaultSE
InfoTotalCPUs
StateEstimatedResponseTime
StateRunningJobs
StateStatus
StateTotalJobs
StateWaitingJobs
StateWorstResponseTime
StateFreeJobSlots
StateFreeCPUs
PolicyMaxCPUTime
PolicyMaxObtainableCPUTime
PolicyMaxRunningJobs
PolicyMaxWaitingJobs
PolicyMaxTotalJobs
PolicyMaxWallClockTime
PolicyMaxObtainableWallClockTime
PolicyPriority
PolicyAssignedJobSlots
PolicyMaxSlotsPerJob
PolicyPreemption"
The variables that can additionally be redefined per CE-queue-VOView are:
VOVIEW_VAR="
StateRunningJobs
StateWaitingJobs
StateTotalJobs
StateFreeJobSlots
StateEstimatedResponseTime
StateWorstResponseTime
InfoDefaultSE
InfoApplicationDir
InfoDataDir
"
If the Glue schema supports other variables than the ones defined here, you can add new ones by redefining CE_VAR and/or VOVIEW_VAR in site-info.def. YAIM uses the list of variables contained in CE_VAR and VOVIEW_VAR to create the ldif file.
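A minimal sketch of such a redefinition is shown below. The attribute MyNewAttribute is purely hypothetical, and the shortened CE_VAR at the top only stands in for the full default list so that the example is self-contained. The sketch assumes that the defaults from lcg-ce.pre are already set when site-info.def is read; if that is not the case, the complete list has to be written out instead of being appended to.

```shell
# Stand-in for the default list from /opt/glite/yaim/defaults/lcg-ce.pre
# (shortened here; the real list is the one shown above).
CE_VAR="
InfoJobManager
StateWaitingJobs"

# In site-info.def: redefine the list with one extra, hypothetical attribute
CE_VAR="${CE_VAR}
MyNewAttribute"

# Give the new attribute a value, assuming the CE_<attribute> naming
# used by the other CE_* variables
CE_MyNewAttribute="some-value"
```

VOVIEW_VAR can be extended in the same way.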
- /opt/glite/yaim/defaults/lcg-ce.post:
It defines some auxiliary variables:
| Variable Name | Description | Value type | Default Value | Version |
| CE_ImplementationVersion | The version of the implementation. This should probably be in .pre instead of .post | version | 3.1 | glite-yaim-lcg-ce 4.0.5-1 |
| CE_InfoLRMSType | Type of the underlying Resource Management System | string | ${CE_BATCH_SYS} | glite-yaim-lcg-ce 4.0.5-1 |
| STATIC_CREATE | Path to the script that creates the ldif file | path | ${INSTALL_ROOT}/glite/sbin/glite-info-static-create | glite-yaim-lcg-ce 4.0.5-1 |
| TEMPLATE_DIR | Path to the ldif templates directory | path | ${INSTALL_ROOT}/glite/etc | glite-yaim-lcg-ce 4.0.5-1 |
| CONF_DIR | Path to the temporary configuration directory | path | ${INSTALL_ROOT}/glite/var/tmp/gip | glite-yaim-lcg-ce 4.0.5-1 |
| LDIF_DIR | Path to the ldif directory | path | ${INSTALL_ROOT}/glite/etc/gip/ldif | glite-yaim-lcg-ce 4.0.5-1 |
| GlueCE_ldif | Path to the GlueCE ldif file | path | ${LDIF_DIR}/static-file-CE.ldif | glite-yaim-lcg-ce 4.0.5-1 |
| GlueCESEBind_ldif | Path to the GlueCESEBind ldif file | path | ${LDIF_DIR}/static-file-CESEBind.ldif | glite-yaim-lcg-ce 4.0.5-1 |
Cluster
Mandatory variables for the cluster: You'll find them under /opt/glite/yaim/examples/services/glite-cluster:
The new variable names follow this syntax:
- In general, the characters '.' and '-' in variables based on hostnames, cluster names or subcluster names should be transformed into '_'
- <host-name>: identifier that corresponds to the CE hostname in lower case. Example: ctb-generic-1.cern.ch -> ctb_generic_1_cern_ch
- <cluster-name>: identifier that corresponds to the cluster name in upper case. Example: my_cluster -> MY_CLUSTER
- <subcluster-name>: identifier that corresponds to the subcluster name in upper case. Example: my_subcluster -> MY_SUBCLUSTER
| Variable Name | Description | Value type | Version |
| CLUSTERS | Space-separated list of your cluster names, e.g. "cluster1 [cluster2 [...]]" | string list | glite-yaim-cluster 1.0.0-1 |
| CLUSTER_<cluster-name>_CLUSTER_UniqueID | Cluster UniqueID | string | glite-yaim-cluster 1.0.0-1 |
| CLUSTER_<cluster-name>_CLUSTER_Name | Cluster human-readable name | string | glite-yaim-cluster 1.0.0-1 |
| CLUSTER_<cluster-name>_SITE_UniqueID | Name of the site the cluster belongs to. It should be consistent with your variable SITE_NAME. NOTE: This may be changed to SITE_UniqueID when the GlueSite is configured with the new infosys variables | string | glite-yaim-cluster 1.0.0-1 |
| CLUSTER_<cluster-name>_CE_HOSTS | Space-separated list of CE hostnames configured in the cluster | hostname list | glite-yaim-cluster 1.0.0-1 |
| CLUSTER_<cluster-name>_SUBCLUSTERS | Space-separated list of your subcluster names, e.g. "subcluster1 [subcluster2 [...]]" | string list | glite-yaim-cluster 1.0.0-1 |
| SUBCLUSTER_<subcluster-name>_SUBCLUSTER_UniqueID | Subcluster UniqueID | string | glite-yaim-cluster 1.0.0-1 |
| SUBCLUSTER_<subcluster-name>_HOST_ApplicationSoftwareRunTimeEnvironment | List of software tags (sw1, sw2, ...) separated by the pipe character. Old variable name: CE_RUNTIMEENV | string list | glite-yaim-cluster 1.0.0-1 |
| SUBCLUSTER_<subcluster-name>_HOST_ArchitectureSMPSize | Old variable name: CE_SMPSIZE | number | glite-yaim-cluster 1.0.0-1 |
| SUBCLUSTER_<subcluster-name>_HOST_ArchitecturePlatformType | Old variable name: CE_OS_ARCH | string | glite-yaim-cluster 1.0.0-1 |
| SUBCLUSTER_<subcluster-name>_HOST_BenchmarkSF00 | Old variable name: CE_SF00 | number | glite-yaim-cluster 1.0.0-1 |
| SUBCLUSTER_<subcluster-name>_HOST_BenchmarkSI00 | Old variable name: CE_SI00 | number | glite-yaim-cluster 1.0.0-1 |
| SUBCLUSTER_<subcluster-name>_HOST_MainMemoryRAMSize | Old variable name: CE_MINPHYSMEM | number | glite-yaim-cluster 1.0.0-1 |
| SUBCLUSTER_<subcluster-name>_HOST_MainMemoryVirtualSize | Old variable name: CE_MINVIRTMEM | number | glite-yaim-cluster 1.0.0-1 |
| SUBCLUSTER_<subcluster-name>_HOST_NetworkAdapterInboundIP | Old variable name: CE_INBOUNDIP | boolean | glite-yaim-cluster 1.0.0-1 |
| SUBCLUSTER_<subcluster-name>_HOST_NetworkAdapterOutboundIP | Old variable name: CE_OUTBOUNDIP | boolean | glite-yaim-cluster 1.0.0-1 |
| SUBCLUSTER_<subcluster-name>_HOST_OperatingSystemName | Old variable name: CE_OS | OS name | glite-yaim-cluster 1.0.0-1 |
| SUBCLUSTER_<subcluster-name>_HOST_OperatingSystemRelease | Old variable name: CE_OS_RELEASE | OS release | glite-yaim-cluster 1.0.0-1 |
| SUBCLUSTER_<subcluster-name>_HOST_OperatingSystemVersion | Old variable name: CE_OS_VERSION | OS version | glite-yaim-cluster 1.0.0-1 |
| SUBCLUSTER_<subcluster-name>_HOST_ProcessorClockSpeed | Old variable name: CE_CPU_SPEED | number | glite-yaim-cluster 1.0.0-1 |
| SUBCLUSTER_<subcluster-name>_HOST_ProcessorModel | Old variable name: CE_CPU_MODEL | string | glite-yaim-cluster 1.0.0-1 |
| SUBCLUSTER_<subcluster-name>_HOST_ProcessorVendor | Old variable name: CE_CPU_VENDOR | string | glite-yaim-cluster 1.0.0-1 |
| SUBCLUSTER_<subcluster-name>_SUBCLUSTER_Name | Subcluster human-readable name | string | glite-yaim-cluster 1.0.0-1 |
| SUBCLUSTER_<subcluster-name>_SUBCLUSTER_PhysicalCPUs | Old variable name: CE_PHYSCPU | number | glite-yaim-cluster 1.0.0-1 |
| SUBCLUSTER_<subcluster-name>_SUBCLUSTER_LogicalCPUs | Old variable name: CE_LOGCPU | number | glite-yaim-cluster 1.0.0-1 |
| SUBCLUSTER_<subcluster-name>_SUBCLUSTER_TmpDir | tmp directory | path | glite-yaim-cluster 1.0.0-1 |
| SUBCLUSTER_<subcluster-name>_SUBCLUSTER_WNTmpDir | WN tmp directory | path | glite-yaim-cluster 1.0.0-1 |
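For the phase 1 setup (one cluster, one subcluster, one CE), the cluster part of site-info.def could look like the sketch below. The names my_cluster and my_subcluster come from the naming examples above; every value is a placeholder, and only some of the SUBCLUSTER_* variables are shown, since the remaining ones follow the same pattern.

```shell
# Sketch of the cluster part of site-info.def (all values are placeholders).
# Cluster: my_cluster -> MY_CLUSTER ; subcluster: my_subcluster -> MY_SUBCLUSTER

CLUSTERS="my_cluster"

CLUSTER_MY_CLUSTER_CLUSTER_UniqueID="my-cluster-id"
CLUSTER_MY_CLUSTER_CLUSTER_Name="My cluster"
CLUSTER_MY_CLUSTER_SITE_UniqueID="MY-SITE"          # keep consistent with SITE_NAME
CLUSTER_MY_CLUSTER_CE_HOSTS="ctb-generic-1.cern.ch"
CLUSTER_MY_CLUSTER_SUBCLUSTERS="my_subcluster"

SUBCLUSTER_MY_SUBCLUSTER_SUBCLUSTER_UniqueID="my-subcluster-id"
SUBCLUSTER_MY_SUBCLUSTER_SUBCLUSTER_Name="My subcluster"
SUBCLUSTER_MY_SUBCLUSTER_SUBCLUSTER_PhysicalCPUs=4
SUBCLUSTER_MY_SUBCLUSTER_SUBCLUSTER_LogicalCPUs=8
SUBCLUSTER_MY_SUBCLUSTER_SUBCLUSTER_TmpDir="/tmp"
SUBCLUSTER_MY_SUBCLUSTER_SUBCLUSTER_WNTmpDir="/tmp"
SUBCLUSTER_MY_SUBCLUSTER_HOST_ArchitectureSMPSize=2
SUBCLUSTER_MY_SUBCLUSTER_HOST_NetworkAdapterInboundIP="FALSE"
SUBCLUSTER_MY_SUBCLUSTER_HOST_NetworkAdapterOutboundIP="TRUE"
```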
Default variables for the cluster: You'll find them under:
- /opt/glite/yaim/defaults/glite-cluster.pre:
It contains the list of variables that can be configured per Subcluster. They belong to the Host and Subcluster entities in the Glue schema:
HOST_VAR="
ApplicationSoftwareRunTimeEnvironment
ArchitectureSMPSize
ArchitecturePlatformType
BenchmarkSF00
BenchmarkSI00
MainMemoryRAMSize
MainMemoryVirtualSize
NetworkAdapterInboundIP
NetworkAdapterOutboundIP
OperatingSystemName
OperatingSystemRelease
OperatingSystemVersion
ProcessorClockSpeed
ProcessorModel
ProcessorVendor"
SUBCLUSTER_VAR="
Name
UniqueID
PhysicalCPUs
LogicalCPUs
TmpDir
WNTmpDir"
If the Glue schema supports other variables than the ones defined here, you can add new ones by redefining HOST_VAR and/or SUBCLUSTER_VAR in site-info.def. YAIM uses the list of variables contained in HOST_VAR and SUBCLUSTER_VAR to create the ldif file.
- /opt/glite/yaim/defaults/glite-cluster.post:
It defines some auxiliary variables:
| Variable Name | Description | Value type | Default Value | Version |
| STATIC_CREATE | Path to the script that creates the ldif file | path | ${INSTALL_ROOT}/glite/sbin/glite-info-static-create | glite-yaim-cluster 1.0.0-1 |
| TEMPLATE_DIR | Path to the ldif templates directory | path | ${INSTALL_ROOT}/glite/etc | glite-yaim-cluster 1.0.0-1 |
| CONF_DIR | Path to the temporary configuration directory | path | ${INSTALL_ROOT}/glite/var/tmp/gip | glite-yaim-cluster 1.0.0-1 |
| LDIF_DIR | Path to the ldif directory | path | ${INSTALL_ROOT}/glite/etc/gip/ldif | glite-yaim-cluster 1.0.0-1 |
| GlueCluster_OUTFILE | Path to the temporary file that will be used to create the ldif file | path | ${CONF_DIR}/glite-info-static-cluster.conf | glite-yaim-cluster 1.0.0-1 |
| GlueCluster_ldif | Path to the Glue Cluster ldif file | path | ${LDIF_DIR}/static-file-Cluster.ldif | glite-yaim-cluster 1.0.0-1 |
Torque server
Since I haven't modified the code of this yaim module, the following variables are still needed, even though there are new variables replacing them:
- QUEUES
- <queue-name>_GROUP_ENABLE
- CE_SMPSIZE
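One way to keep the old and new variables consistent while both are required is to define the old ones once and derive the new ones from them, as in the sketch below. This is only a suggestion, not something the yaim modules require; the host, queue and subcluster names are the placeholder names used in the examples on this page.

```shell
# Old variables, still used by the torque server module
QUEUES="dteam"
DTEAM_GROUP_ENABLE="dteam"
CE_SMPSIZE=2

# New variables, derived from the old ones so that both always agree
CE_HOST_ctb_generic_1_cern_ch_QUEUES="${QUEUES}"
CE_HOST_ctb_generic_1_cern_ch_QUEUE_DTEAM_CE_AccessControlBaseRule="${DTEAM_GROUP_ENABLE}"
SUBCLUSTER_MY_SUBCLUSTER_HOST_ArchitectureSMPSize="${CE_SMPSIZE}"
```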
Once you have defined all the needed variables, configure the lcg CE by running:
./yaim -c -s site-info.def -n lcg-CE -n glite-CLUSTER (-n BDII_site) -n TORQUE_server -n TORQUE_utils
What to test
- Define only one cluster, one subcluster and one CE.
- It's important to test both an upgrade and a clean installation.
- Test basic job submission and usual lcg CE related tests that would be executed to certify a new release of the lcg CE
- Define new Glue Schema variables for the CE, VOView, Host and Subcluster entities (it is not certain that YAIM already defines all the existing ones). Are they included in the ldif file?
- Define CE and VOView entity variables also per queue and per queue-voview to test that this feature actually works. Are they really taken into account? Check the ldif file.
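For the ldif checks in this list, a small helper like the one below can print the values published for a given attribute. The function name ldif_attr_values is hypothetical and the sample file is a stand-in created only for this example; on a configured CE the function would be pointed at the generated file, for example ${LDIF_DIR}/static-file-CE.ldif.

```shell
# Hypothetical helper: print the values of one attribute found in an ldif file.
# $1: path to the ldif file, $2: attribute name (matched case-insensitively)
ldif_attr_values() {
    grep -i "^$2:" "$1" | sed 's/^[^:]*:[[:space:]]*//'
}

# Example against a small stand-in file (not real yaim output)
sample_ldif=$(mktemp)
printf '%s\n' \
    'dn: GlueCEUniqueID=example' \
    'GlueCEStateWaitingJobs: 666666' > "$sample_ldif"

ldif_attr_values "$sample_ldif" GlueCEStateWaitingJobs   # prints 666666
rm -f "$sample_ldif"
```

If a value set per queue or per VOView in site-info.def does not appear in the output, the override was not taken into account.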
Feedback
- Is it easy to use the new variables?
- Comments on the complexity of the new way to configure the information system.
- Reports on bugs and other issues.
-- MariaALANDESPRADILLO - 01 Sep 2008