Deploying Sun Grid Engine in an LCG Computing Element

Disclaimer

This software is considered beta; use it at your own risk. It may not be fully optimized or correct and should therefore be considered experimental. There is no guarantee that it is compatible with the way your site is configured.

About

Author: Gonçalo Borges, goncalo@lipNOSPAMPLEASE.pt

Version: 0.0.0-2

Abstract: SGE Yaim integration Manual for lcg-CE and glite-WN

RPMS Description:

gliteWN-yaimtosge-0.0.0-2.i386.rpm: Modifications to the standard gLite yaim tool for glite-WN integration using SGE as the scheduler. It installs:
{{{
/etc/profile.d/sge.sh (and its csh equivalent): sets the proper environment
/opt/glite/yaim/scripts/configure_sgeclient.pm: SGE installation directories
/opt/glite/yaim/scripts/nodesge-info.def: SGE node function definitions
/opt/glite/yaim/functions/config_sge_client: configures the SGE exec host
}}}

lcgCE-yaimtosge-0.0.0-2.i386.rpm: Modifications to the standard gLite yaim tool for lcg-CE integration using SGE as the scheduler. It installs:
{{{
/etc/profile.d/sge.sh (and its csh equivalent): sets the proper environment
/opt/glite/yaim/scripts/configure_sgeserver.pm: SGE installation directories
/opt/glite/yaim/scripts/nodesge-info.def: SGE node function definitions
/opt/glite/yaim/functions/config_sge_server: configures the SGE qmaster
/opt/globus/lib/perl/Globus/GRAM/JobManager/lcgsge.pm: the SGE jobmanager
/opt/lcg/libexec/lcg-info-dynamic-sge: the Perl script that publishes SGE CE information to the GRIS/GIIS
}}}

sge-V60u7_1-3.i386.rpm: The binaries and libraries needed to run SGE commands.

sge-utils-V60u7_1-3.i386.rpm: Installation scripts and SGE utilities.

sge-daemons-V60u7_1-3.i386.rpm: The SGE daemons.

sge-ckpt-V60u7_1-3.i386.rpm: Checkpointing support.

sge-parallel-V60u7_1-3.i386.rpm: Support for parallel environments such as Open MPI and MPICH.

sge-docs-V60u7_1-3.i386.rpm: Documentation, manuals and examples.

sge-qmon-V60u7_1-3.i386.rpm: The SGE graphical interface (qmon).

RPMS Download:

http://www.lip.pt/grid/gliteWN-yaimtosge-0.0.0-2.i386.rpm

http://www.lip.pt/grid/lcgCE-yaimtosge-0.0.0-2.i386.rpm

http://www.lip.pt/grid/sge-V60u7_1-3.i386.rpm

http://www.lip.pt/grid/sge-utils-V60u7_1-3.i386.rpm

http://www.lip.pt/grid/sge-daemons-V60u7_1-3.i386.rpm

http://www.lip.pt/grid/sge-ckpt-V60u7_1-3.i386.rpm

http://www.lip.pt/grid/sge-parallel-V60u7_1-3.i386.rpm

http://www.lip.pt/grid/sge-docs-V60u7_1-3.i386.rpm

http://www.lip.pt/grid/sge-qmon-V60u7_1-3.i386.rpm

Prerequisites:

The SGE RPM packages delivered with this manual were built on SLC4, with the libdb-4.2.so library packaged in addition so that they also work on SLC3. Please report problems to goncalo@lipNOSPAMPLEASE.pt.

We assume that the standard lcg-CE and glite-WN software is already installed (but not configured) on the appropriate machines, following the instructions in:

http://grid-deployment.web.cern.ch/grid-deployment/documentation/LCG2-Manual-Install/

https://twiki.cern.ch/twiki/bin/view/EGEE/CertTestBedWorld

Check that your apt repositories are properly set to:
{{{
[root@ce03 root]# cat /etc/apt/sources.list.d/lcg-ca.list
rpm http://linuxsoft.cern.ch/ LCG-CAs/current production

[root@ce03 root]# cat /etc/apt/sources.list.d/lcg.list
rpm http://glitesoft.cern.ch/EGEE/gLite/APT/R3.0/ rhel30 externals Release3.0 updates

[root@ce03 root]# cat /etc/apt/sources.list.d/cern.list
rpm http://linuxsoft.cern.ch cern/slc30X/i386/apt os updates extras
rpm-src http://linuxsoft.cern.ch cern/slc30X/i386/apt os updates extras
}}}
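If you prefer to check these files non-interactively, the following is a minimal sketch. The function name `check_apt_source` is hypothetical (it is not part of yaim or apt); it only tests, with a fixed-string grep, that the expected repository line appears somewhere in the given list file.

```shell
# Hypothetical helper (not part of yaim or apt): succeed if FILE contains
# the repository line EXPECTED as a fixed string.
check_apt_source() {
    local file=$1 expected=$2
    if [ ! -r "$file" ]; then
        echo "MISSING: $file" >&2
        return 1
    fi
    if grep -qF -- "$expected" "$file"; then
        echo "OK: $file"
        return 0
    fi
    echo "MISMATCH: $file does not contain: $expected" >&2
    return 1
}

# Example usage on the CE (commented out so the file can be sourced safely):
# check_apt_source /etc/apt/sources.list.d/lcg-ca.list \
#     "rpm http://linuxsoft.cern.ch/ LCG-CAs/current production"
# check_apt_source /etc/apt/sources.list.d/lcg.list \
#     "rpm http://glitesoft.cern.ch/EGEE/gLite/APT/R3.0/ rhel30 externals Release3.0 updates"
```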

Follow the LCG manual up to, but not including, its Middleware Configuration section; from that point on, follow this manual instead.

Please ensure that passwordless ssh works from a WN pool account to a CE pool account. This is not specific to this SGE deployment; it is a general requirement of the grid infrastructure.
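One way to verify this is to attempt a non-interactive login from a WN, while logged in as a pool account, to the CE. The sketch below assumes OpenSSH: `BatchMode=yes` makes ssh fail instead of prompting for a password, so a zero exit status means the passwordless path works. With OpenSSH's default host-key settings an unknown CE host key should also make the check fail, which is arguably what you want for unattended jobs. The function name is hypothetical.

```shell
# Hypothetical helper: run on a WN as a pool account (e.g. dteam001).
# For each host given, try to run "true" over ssh without any prompt.
# -n keeps ssh from reading the loop's stdin; BatchMode=yes disables
# password/passphrase prompts so a missing key setup fails immediately.
check_passwordless_ssh() {
    local host rc=0
    for host in "$@"; do
        if ssh -n -o BatchMode=yes "$host" true; then
            echo "OK: $host"
        else
            echo "FAILED: $host" >&2
            rc=1
        fi
    done
    return $rc
}

# Example (ce03.lip.pt is the example CE used elsewhere in this page):
# check_passwordless_ssh ce03.lip.pt
```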

CE gatekeeper Installation

Sun Grid Engine needs a qmaster machine, which this manual assumes is the CE gatekeeper. The SGE RPMs install all files under /usr/local/sge/V60u7_1 and link that directory to /usr/local/sge/pro. Later on, $SGE_ROOT is defined as /usr/local/sge/pro, so that older SGE versions can be kept installed and used when needed. Please install the following SGE packages (they require the "openmotif (>= 2.2.3-5)" package, available in the SLC repositories, if it is not already installed):
{{{
sge-V60u7_1-3.i386.rpm
sge-utils-V60u7_1-3.i386.rpm
sge-daemons-V60u7_1-3.i386.rpm
sge-qmon-V60u7_1-3.i386.rpm
sge-ckpt-V60u7_1-3.i386.rpm
sge-parallel-V60u7_1-3.i386.rpm
sge-docs-V60u7_1-3.i386.rpm
}}}

{{{
[root@ ~]# rpm -ivh sge-V60u7_1-3.i386.rpm sge-utils-V60u7_1-3.i386.rpm sge-daemons-V60u7_1-3.i386.rpm sge-qmon-V60u7_1-3.i386.rpm sge-ckpt-V60u7_1-3.i386.rpm sge-parallel-V60u7_1-3.i386.rpm sge-docs-V60u7_1-3.i386.rpm
Preparing... ########################################### [100%]
   1:sge ########################################### [ 14%]
   2:sge-utils ########################################### [ 29%]
   3:sge-daemons ########################################### [ 43%]
   4:sge-qmon ########################################### [ 57%]
   5:sge-ckpt ########################################### [ 71%]
   6:sge-parallel ########################################### [ 86%]
   7:sge-docs ########################################### [100%]
}}}
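The versioned-directory scheme described above (/usr/local/sge/V60u7_1 linked to /usr/local/sge/pro) can be inspected and switched with two small shell functions. This is a sketch assuming /usr/local/sge/pro is a plain symbolic link created by the RPMs; both function names are hypothetical, and switching is best done with the SGE daemons stopped.

```shell
# Hypothetical helper: print the versioned directory that "pro" points to,
# e.g. V60u7_1 for /usr/local/sge/pro -> /usr/local/sge/V60u7_1.
sge_active_version() {
    local base=${1:-/usr/local/sge}
    if [ ! -L "$base/pro" ]; then
        echo "no symlink at $base/pro" >&2
        return 1
    fi
    basename "$(readlink "$base/pro")"
}

# Hypothetical helper: re-point "pro" at another installed version.
# ln -sfn replaces the link itself instead of creating a new link
# inside the directory it currently points to.
sge_switch_version() {
    local base=$1 version=$2
    if [ ! -d "$base/$version" ]; then
        echo "no such version: $base/$version" >&2
        return 1
    fi
    ln -sfn "$version" "$base/pro"
}

# Example:
# sge_active_version /usr/local/sge
```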

* Install lcgCE-yaimtosge-0.0.0-2.i386.rpm, which contains the modifications to the standard yaim tool that allow SGE scheduler configuration. This RPM requires the "perl-XML-Simple >= 2.14-2.2" package, which you can download from http://rpmfind.net/linux/rpm2html/search.php?query=perl-XML-Simple, and glite-yaim >= 3.0.0-34.

(!) Please upgrade yaim to the latest release.

{{{
[root@ ~]# rpm -ivh lcgCE-yaimtosge-0.0.0-2.i386.rpm
Preparing... ########################################### [100%]
   1:lcgCE-yaimtosge ########################################### [100%]
}}}
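To confirm the two prerequisites mentioned above (perl-XML-Simple >= 2.14-2.2 and glite-yaim >= 3.0.0-34), you can compare the installed versions with a small helper. Note that this is a simplified comparison written for this sketch, not rpm's own version-comparison algorithm: it only compares the numeric parts of each dot- or dash-separated field. The function name `ver_ge` is hypothetical.

```shell
# Simplified VERSION-RELEASE comparison (NOT rpm's rpmvercmp): split both
# strings on "." and "-", drop non-digits from each field, and compare the
# fields numerically from left to right. Missing fields count as 0.
# Returns 0 (true) if $1 >= $2, 1 otherwise.
ver_ge() {
    local a=$1 b=$2 x y
    while [ -n "$a" ] || [ -n "$b" ]; do
        x=${a%%[.-]*}                                       # first field
        y=${b%%[.-]*}
        case $a in *[.-]*) a=${a#*[.-]} ;; *) a= ;; esac    # drop it
        case $b in *[.-]*) b=${b#*[.-]} ;; *) b= ;; esac
        x=$(printf '%s' "$x" | tr -cd '0-9'); x=${x:-0}
        y=$(printf '%s' "$y" | tr -cd '0-9'); y=${y:-0}
        [ "$x" -gt "$y" ] && return 0
        [ "$x" -lt "$y" ] && return 1
    done
    return 0
}

# Example usage (rpm -q --qf prints the installed VERSION-RELEASE):
# yaim=$(rpm -q --qf '%{VERSION}-%{RELEASE}' glite-yaim)
# ver_ge "$yaim" 3.0.0-34 || echo "glite-yaim $yaim is older than 3.0.0-34"
# xmls=$(rpm -q --qf '%{VERSION}-%{RELEASE}' perl-XML-Simple)
# ver_ge "$xmls" 2.14-2.2 || echo "perl-XML-Simple $xmls is older than 2.14-2.2"
```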

* Add the following values to your site-info.def file:
{{{
SGE_QMASTER=$CE_HOST
DEFAULT_DOMAIN=$MY_DOMAIN
ADMIN_MAIL=
}}}

* Check that the "WN_LIST", "USERS_CONF", "VOS" and "QUEUES" variables are also properly defined in your site-info.def file. Their contents are used to build the SGE exec node list, the SGE user sets and the SGE local queues. For the time being, the VO users in the USERS_CONF file must be listed in the same VO order as the QUEUES definition; otherwise a VO's SGE userset will not be attached to the correct VO queue. This will be fixed in a future release.

* Configure the CE running SGE using the "CE_sge" node definition:

{{{ [root@ ~]# /opt/glite/yaim/scripts/configure_node <path_to_your_site-info.def_file> CE_sge BDII_site }}}

* The CE configuration must always be run before the WN configuration; otherwise the SGE daemons on the WNs will not start, since there is no qmaster host associated with them.

* The SGE command-line tools become available after a new login, which sources the /etc/profile.d/ scripts.

* To start the SGE GUI with the "qmon" command, you need to install "xorg-x11-xauth >= 6.8.2-1". Unfortunately, this package is not available in the SLC3 repository, so you have to download it from the SLC4 one: http://linuxsoft.cern.ch/cern/slc4X/i386/SL/RPMS/xorg-x11-xauth-6.8.2-1.EL.13.37.i386.rpm
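Before running configure_node, it may help to confirm that site-info.def actually defines every variable listed above. The sketch below sources the file in a subshell, so nothing leaks into your shell, and prints the variables that are not defined at all (an empty assignment such as ADMIN_MAIL= counts as defined). The function name is hypothetical; since site-info.def is a shell fragment, sourcing it executes whatever it contains, so only use this on your own file.

```shell
# Hypothetical helper: check_site_info FILE VAR...
# Prints the name of each VAR that FILE leaves undefined; returns 0 if none.
check_site_info() {
    local file=$1
    shift
    # "." searches $PATH for names without a slash, so force a path.
    case $file in */*) ;; *) file=./$file ;; esac
    (
        . "$file" || exit 2
        rc=0
        for var in "$@"; do
            # ${NAME+x} expands to "x" only when NAME is set (even if empty).
            eval "isset=\${$var+x}"
            if [ -z "$isset" ]; then
                echo "$var"
                rc=1
            fi
        done
        exit $rc
    )
}

# Example:
# check_site_info ./site-info.def SGE_QMASTER DEFAULT_DOMAIN ADMIN_MAIL \
#     WN_LIST USERS_CONF VOS QUEUES
```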

If you have configured your CE with wrong values for the "WN_LIST", "USERS_CONF", "VOS" and "QUEUES" variables, the easiest fix is to delete the /usr/local/sge/pro/default directory and run the CE configuration again.
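The USERS_CONF ordering constraint described above is easy to get wrong, so checking it before (re)running the CE configuration can save a cycle. The following is a sketch with two assumptions: users.conf uses the usual yaim layout UID:LOGIN:GID:GROUP:VO:FLAG: (VO in the fifth field), and, as a stand-in for the queue-to-VO mapping, your VOS variable lists the VOs in the same order as QUEUES. Both function names are hypothetical.

```shell
# Hypothetical helper: print the VOs of a users.conf file in the order in
# which each VO first appears (5th colon-separated field; comment and
# blank lines are skipped).
vo_order() {
    awk -F: '$1 !~ /^[ \t]*#/ && $5 != "" && !seen[$5]++ {
                 printf "%s%s", sep, $5; sep = " "
             }
             END { print "" }' "$1"
}

# Hypothetical helper: compare that order with an expected space-separated
# VO list (ASSUMPTION: VOS is ordered like QUEUES at your site).
check_vo_order() {
    local found expected
    found=$(vo_order "$1")
    expected=$(echo $2)        # unquoted on purpose: collapses whitespace
    if [ "$found" = "$expected" ]; then
        echo "OK: $found"
        return 0
    fi
    echo "users.conf order: $found" >&2
    echo "expected order:   $expected" >&2
    return 1
}

# Example:
# ( . ./site-info.def && check_vo_order "$USERS_CONF" "$VOS" )
```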

WN Installation

Please install the following SGE packages:
{{{
sge-V60u7_1-3.i386.rpm
sge-utils-V60u7_1-3.i386.rpm
sge-daemons-V60u7_1-3.i386.rpm
sge-parallel-V60u7_1-3.i386.rpm
sge-docs-V60u7_1-3.i386.rpm
}}}

{{{
[root@ ~]# rpm -ivh sge-V60u7_1-3.i386.rpm sge-utils-V60u7_1-3.i386.rpm sge-daemons-V60u7_1-3.i386.rpm sge-parallel-V60u7_1-3.i386.rpm sge-docs-V60u7_1-3.i386.rpm
Preparing... ########################################### [100%]
   1:sge ########################################### [ 20%]
   2:sge-utils ########################################### [ 40%]
   3:sge-daemons ########################################### [ 60%]
   4:sge-parallel ########################################### [ 80%]
   5:sge-docs ########################################### [100%]
}}}

* Install gliteWN-yaimtosge-0.0.0-2.i386.rpm, which contains the modifications to the standard yaim tool that allow SGE client configuration.

{{{
[root@ ~]# rpm -ivh gliteWN-yaimtosge-0.0.0-2.i386.rpm
Preparing... ########################################### [100%]
   1:gliteWN-yaimtosge ########################################### [100%]
}}}

* Use the same site-info.def file as for the gatekeeper. It should already define the "SGE_QMASTER", "DEFAULT_DOMAIN" and "ADMIN_MAIL" variables.

* Configure the WN using the "WN_sge" node definition:

{{{ [root@ ~]# /opt/glite/yaim/scripts/configure_node <path_to_your_site-info.def_file> WN_sge }}}
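Once the WN is configured, one quick check is that its SGE installation points at the right qmaster. SGE normally records the qmaster host name in $SGE_ROOT/<cell>/common/act_qmaster; the sketch below assumes the WN_sge configuration writes that file under the "default" cell (the same cell directory, /usr/local/sge/pro/default, mentioned for the CE) and compares it with SGE_QMASTER. The function name is hypothetical. On the CE, `qhost` should then list the new WN once its execd has registered.

```shell
# Hypothetical helper: check_act_qmaster SGE_ROOT EXPECTED_QMASTER [CELL]
# Compares the host recorded in <SGE_ROOT>/<CELL>/common/act_qmaster with
# the expected qmaster (e.g. the SGE_QMASTER value from site-info.def).
check_act_qmaster() {
    local root=$1 expected=$2 cell=${3:-default}
    local f="$root/$cell/common/act_qmaster" actual
    if [ ! -r "$f" ]; then
        echo "no act_qmaster file at $f" >&2
        return 1
    fi
    actual=$(head -n 1 "$f")
    # Short and fully qualified names may both occur; compare host parts.
    if [ "${actual%%.*}" = "${expected%%.*}" ]; then
        echo "OK: qmaster is $actual"
        return 0
    fi
    echo "act_qmaster says '$actual', expected '$expected'" >&2
    return 1
}

# Example on a WN:
# check_act_qmaster /usr/local/sge/pro ce03.lip.pt
```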

Testing:

Test the information system using the following commands:
{{{
ldapsearch -x -h <CE_host> -p 2135 -b "mds-vo-name=local,o=grid"
ldapsearch -x -h <site_BDII_host> -p 2170 -b "mds-vo-name=<site_name>,o=grid"
}}}
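To see at a glance which queues the CE publishes, the LDIF printed by these queries can be filtered for the GlueCEUniqueID attribute of the GLUE 1.x schema. The awk-based filter below is a sketch; it also joins LDIF continuation lines (lines starting with a single space), which ldapsearch uses to wrap long values. The function name and the `<CE_host>` placeholder are hypothetical.

```shell
# Hypothetical helper: read ldapsearch LDIF on stdin and print the sorted,
# de-duplicated GlueCEUniqueID values (one published queue per line).
list_ce_ids() {
    awk '
        function emit(l) {
            if (l ~ /^GlueCEUniqueID: /) print substr(l, 17)
        }
        /^ / { line = line substr($0, 2); next }    # LDIF continuation line
        {
            if (line != "") emit(line)
            line = $0
        }
        END { if (line != "") emit(line) }
    ' | sort -u
}

# Example:
# ldapsearch -x -h <CE_host> -p 2135 -b "mds-vo-name=local,o=grid" | list_ce_ids
```

For the example CE used in this page, entries of the form ce03.lip.pt:2119/jobmanager-lcgsge-dteamgrid would be expected.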

* Check that the queries return the proper queue names and available resources.

* Submit a simple script with qsub from a given pool account on your CE. This checks that the SGE command-line tools (such as qsub and qstat) work. If the job finishes successfully, its stdout and stderr files will not be available on the CE since, in normal grid operation, they are transferred directly from the WN to the RB using GSIFTP. Look for the stdout/stderr files on the WN instead.

* Run globus-job-run against the fork jobmanager from a UI (create your proxy first):

{{{
[goncalo@ui01]$ globus-job-run ce03.lip.pt:2119/jobmanager-fork /bin/uname -a
Linux ce03.lip.pt 2.6.9-34.EL.cern #1 Sun Mar 12 12:19:53 CET 2006 i686 athlon i386 GNU/Linux
}}}

* Run globus-job-run against the lcgsge jobmanager from a UI; this time the job runs on a WN:

{{{
[goncalo@ui01]$ globus-job-run ce03.lip.pt:2119/jobmanager-lcgsge /bin/uname -a
Linux sgewn01.lip.pt 2.6.9-34.EL.cern #1 Sun Mar 12 12:19:53 CET 2006 i686 i686 i386 GNU/Linux
}}}

* Submit a job through the RB from a UI:

{{{
[goncalo@ui01]$ edg-job-submit -r ce03.lip.pt:2119/jobmanager-lcgsge-dteamgrid well.jdl
Selected Virtual Organisation name (from proxy certificate extension): dteam
Connecting to host rb02.lip.pt, port 7772
Logging to host rb02.lip.pt, port 9002

*******************************************************************************************
                               JOB SUBMIT OUTCOME
 The job has been successfully submitted to the Network Server.
 Use edg-job-status command to check job current status. Your job identifier (edg_jobId) is:

 - https://rb02.lip.pt:9000/Ab0W2EpWMPkpJKjAMpRCsQ

*******************************************************************************************

[goncalo@ui01 ce02]$ edg-job-status https://rb02.lip.pt:9000/Ab0W2EpWMPkpJKjAMpRCsQ

***********************************************************
BOOKKEEPING INFORMATION:

Status info for the Job : https://rb02.lip.pt:9000/Ab0W2EpWMPkpJKjAMpRCsQ
Current Status:     Done (Success)
Exit code:          0
Status Reason:      Job terminated successfully
Destination:        ce03.lip.pt:2119/jobmanager-lcgsge-dteamgrid
reached on:         Fri Feb 2 18:42:38 2007
***********************************************************

[goncalo@ui01 ce02]$ cat /tmp/jobOutput/goncalo_Ab0W2EpWMPkpJKjAMpRCsQ/well.out
One Perl out of the sea!
This is Linux sgewn01.lip.pt 2.6.9-34.EL.cern #1 Sun Mar 12 12:19:53 CET 2006 i686 i686 i386 GNU/Linux
Fri Feb 2 18:31:14 WET 2007
}}}
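For the local qsub test mentioned above (submitting from a pool account on the CE), a minimal job script is enough. The sketch writes one; the function name, file name and job name are hypothetical, while the `#$` lines are standard SGE embedded qsub options (-N job name, -j y to merge stderr into stdout, -S the shell that runs the job). With -j y, SGE writes the combined output to a file named <job name>.o<job id>, by default in the user's home directory on the execution host, so unless home directories are shared you will find it on the WN.

```shell
# Hypothetical helper: write a minimal SGE test job to the given path.
# The quoted 'EOF' keeps $(hostname) and $(date) unexpanded until the job runs.
write_test_job() {
    cat > "$1" <<'EOF'
#!/bin/sh
#$ -N sgetest
#$ -j y
#$ -S /bin/sh
echo "Running on $(hostname) at $(date)"
EOF
}

# Example, as a pool account on the CE (dteamgrid is the queue name used
# in the edg-job-submit example above):
# write_test_job ~/sge-test.sh
# qsub -q dteamgrid ~/sge-test.sh
# qstat -u "$USER"
```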


Topic revision: r8 - 2007-05-21 - JavierLopezCacheiro