Deploying Sun Grid Engine in a LCG Computing Element

Disclaimer

This software is considered beta -- you use it at your own risk. It may not be fully optimized or correct and should therefore be considered experimental. There is no guarantee that it is compatible with the way your site is configured.

About

Author: Gonçalo Borges, goncalo@lipNOSPAMPLEASE.pt (reviewed by CESGA)

Version: 0.0.0-2

Abstract: SGE Yaim integration Manual for lcg-CE and glite-WN

RPMS Description:

gliteWN-yaimtosge-0.0.0-3.i386.rpm: Modifications to the standard gLite yaim tool for glite-WN integration using SGE as the scheduling system. It will install:

/etc/profile.d/sge.sh (.csh): Sets the proper environment;
/opt/glite/yaim/scripts/configure_sgeclient.pm: SGE installation directories;
/opt/glite/yaim/scripts/nodesge-info.def: SGE node function definitions;
/opt/glite/yaim/functions/config_sge_client: Configures the SGE exec host;

lcgCE-yaimtosge-0.0.0-3.i386.rpm: Modifications to the standard gLite yaim tool for lcg-CE integration using SGE as the scheduling system. It will install:

/etc/profile.d/sge.sh (.csh): Sets the proper environment;
/opt/glite/yaim/scripts/configure_sgeserver.pm: SGE installation directories;
/opt/glite/yaim/scripts/nodesge-info.def: SGE node function definitions;
/opt/glite/yaim/functions/config_sge_server: Configures the SGE qmaster;
/opt/globus/lib/perl/Globus/GRAM/JobManager/lcgsge.pm: The SGE jobmanager;
/opt/lcg/libexec/lcg-info-dynamic-sge: The SGE CE GRIS/GIIS Perl script.

sge-V60u7_1-3.i386.rpm: Contains the binaries and libraries needed to run SGE commands;

sge-utils-V60u7_1-3.i386.rpm: Installation scripts and SGE utilities;

sge-daemons-V60u7_1-3.i386.rpm: The SGE daemons;

sge-ckpt-V60u7_1-3.i386.rpm: For checkpointing purposes;

sge-parallel-V60u7_1-3.i386.rpm: For running parallel environments such as Open MPI, MPICH, etc.;

sge-docs-V60u7_1-3.i386.rpm: Documentation, manuals and examples;

sge-qmon-V60u7_1-3.i386.rpm: The SGE graphical interface (qmon).

RPMS Download:

http://www.lip.pt/grid/gliteWN-yaimtosge-0.0.0-3.i386.rpm

http://www.lip.pt/grid/lcgCE-yaimtosge-0.0.0-3.i386.rpm

http://www.lip.pt/grid/sge-V60u7_1-3.i386.rpm

http://www.lip.pt/grid/sge-utils-V60u7_1-3.i386.rpm

http://www.lip.pt/grid/sge-daemons-V60u7_1-3.i386.rpm

http://www.lip.pt/grid/sge-ckpt-V60u7_1-3.i386.rpm

http://www.lip.pt/grid/sge-parallel-V60u7_1-3.i386.rpm

http://www.lip.pt/grid/sge-docs-V60u7_1-3.i386.rpm

http://www.lip.pt/grid/sge-qmon-V60u7_1-3.i386.rpm

Pre-Requisites:

The SGE rpm packages delivered with this manual were built under SLC4, with the libdb-4.2.so library additionally packaged so that they also work on SLC3. Please report problems to goncalo@lipNOSPAMPLEASE.pt.

We will assume that the standard “lcg-CE” and “glite-WN” software is already installed (but not configured) on the proper machines. The installation should have been performed following the instructions in:

http://grid-deployment.web.cern.ch/grid-deployment/documentation/LCG2-Manual-Install/

https://twiki.cern.ch/twiki/bin/view/EGEE/CertTestBedWorld

Check that your apt repositories are properly set to:

[root@ce03 root]# cat /etc/apt/sources.list.d/lcg-ca.list
rpm http://linuxsoft.cern.ch/ LCG-CAs/current production

[root@ce03 root]# cat /etc/apt/sources.list.d/lcg.list
rpm http://glitesoft.cern.ch/EGEE/gLite/APT/R3.0/ rhel30 externals Release3.0 updates

[root@ce03 root]# cat /etc/apt/sources.list.d/cern.list
rpm http://linuxsoft.cern.ch  cern/slc30X/i386/apt  os updates extras
rpm-src http://linuxsoft.cern.ch  cern/slc30X/i386/apt  os updates extras

You should stop following the LCG manual and start following this one just before you reach the Middleware Configuration section.

Please ensure that passwordless ssh works from a WN pool account to a CE pool account. This requirement is not specific to this deployment; it is needed by all grid infrastructures.
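As a quick sanity check, the following sketch reports whether a command succeeds without any interactive prompting; with ssh, `-o BatchMode=yes` makes it fail instead of asking for a password. The pool account name and CE hostname in the commented example are illustrative.

```shell
# Report whether a command exits 0 without interactive prompting.
check_passwordless() {
    if "$@" </dev/null >/dev/null 2>&1; then
        echo "passwordless ssh OK"
    else
        echo "passwordless ssh FAILED"
    fi
}

# On a WN, as root (hypothetical pool account and CE hostname):
#   check_passwordless su - dteam001 -s /bin/bash -c \
#       'ssh -o BatchMode=yes ce03.lip.pt /bin/true'
check_passwordless true   # stands in for the ssh probe here
```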

CE gatekeeper Installation

Sun Grid Engine needs a qmaster machine, which in this manual we assume will be installed on the CE gatekeeper. The SGE rpms deploy all files under /usr/local/sge/V60u7_1 and link that directory to /usr/local/sge/pro. Later on, $SGE_ROOT will be defined as /usr/local/sge/pro so that old SGE versions can be kept and used when needed. Please install the following SGE packages (they require the “openmotif (>= 2.2.3-5)” package, available in the SLC repositories, if it is not already present):

sge-V60u7_1-3.i386.rpm
sge-utils-V60u7_1-3.i386.rpm
sge-daemons-V60u7_1-3.i386.rpm
sge-qmon-V60u7_1-3.i386.rpm
sge-ckpt-V60u7_1-3.i386.rpm
sge-parallel-V60u7_1-3.i386.rpm
sge-docs-V60u7_1-3.i386.rpm

[root@<your_ce> ~]# rpm -ivh sge-V60u7_1-3.i386.rpm sge-utils-V60u7_1-3.i386.rpm sge-daemons-V60u7_1-3.i386.rpm sge-qmon-V60u7_1-3.i386.rpm sge-ckpt-V60u7_1-3.i386.rpm sge-parallel-V60u7_1-3.i386.rpm sge-docs-V60u7_1-3.i386.rpm
Preparing...                ########################################### [100%]
1:sge                    ########################################### [ 14%]
2:sge-utils              ########################################### [ 29%]
3:sge-daemons            ########################################### [ 43%]
4:sge-qmon               ########################################### [ 57%]
5:sge-ckpt               ########################################### [ 71%]
6:sge-parallel           ########################################### [ 86%]
7:sge-docs               ########################################### [100%]

(!) Please upgrade your yaim version to the latest release.

[root@<your_ce> ~]# rpm -ivh lcgCE-yaimtosge-0.0.0-3.i386.rpm
Preparing...                ########################################### [100%]
1:lcgCE-yaimtosge        ########################################### [100%]

  • Add the following values to your site-info.def file:

SGE_QMASTER=$CE_HOST
DEFAULT_DOMAIN=$MY_DOMAIN
ADMIN_MAIL=<your_admin_email>

  • Check that the “WN_LIST”, “USERS_CONF”, “VOS” and “QUEUES” variables are also properly defined in your site-info.def file. Their contents are used to build the SGE exec node list, the SGE user sets and the SGE local queues. For the time being, the VO users in the USERS_CONF file must be listed in the same order as the queues in the QUEUES definition; otherwise a VO's SGE user set will not correspond to the correct VO queue. This will be fixed in a future release.
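For illustration only, a hypothetical excerpt in which the users.conf entries follow the same VO order as the QUEUES definition. The queue names echo the jobmanager-lcgsge-dteamgrid destination used later in this manual; the account names, UIDs and GIDs are invented, and the field layout shown is the usual yaim UID:LOGIN:GID:GROUP:VO:FLAG one -- check it against your own file.

```
# site-info.def (excerpt)
VOS="atlas dteam"
QUEUES="atlasgrid dteamgrid"

# users.conf (excerpt) -- atlas accounts first, then dteam,
# matching the VO order of QUEUES above
18001:atlas001:18000:atlasgrid:atlas::
18002:atlas002:18000:atlasgrid:atlas::
19001:dteam001:19000:dteamgrid:dteam::
```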

  • Configure the CE running SGE using the “CE_sge” node definition.

[root@<your_ce> ~]#/opt/glite/yaim/scripts/configure_node <path_to_your_site-info.def_file> CE_sge BDII_site

  • The CE configuration must always be run before the WN configurations; otherwise the SGE daemons on the WNs will not start, since there is no qmaster host associated with them.

  • The SGE command-line tools will be available after a new login (which sources the /etc/profile.d/ scripts).

If you have configured your CE with wrong values for the “WN_LIST”, “USERS_CONF”, “VOS” and “QUEUES” variables, an easy way to recover is to delete the /usr/local/sge/pro/default directory and run the CE configuration again.
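A dry-run sketch of that recovery: the function below only prints the commands (pipe its output to sh, or remove the echos, to really delete the cell and reconfigure). The site-info.def path is hypothetical; the cell directory is the one created by the CE configuration.

```shell
# Print the commands that wipe the misconfigured SGE cell and rerun yaim.
sge_reconfigure_cmds() {
    local cell_dir=/usr/local/sge/pro/default      # cell created by the CE configuration
    local site_info=${1:-/root/site-info.def}      # hypothetical site-info.def path
    echo "rm -rf $cell_dir"
    echo "/opt/glite/yaim/scripts/configure_node $site_info CE_sge BDII_site"
}
sge_reconfigure_cmds
```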

WN Installation

Please install the following sge packages:

sge-V60u7_1-3.i386.rpm 
sge-utils-V60u7_1-3.i386.rpm 
sge-daemons-V60u7_1-3.i386.rpm 
sge-parallel-V60u7_1-3.i386.rpm 
sge-docs-V60u7_1-3.i386.rpm

[root@<your_wn> ~]# rpm -ivh sge-V60u7_1-3.i386.rpm sge-utils-V60u7_1-3.i386.rpm sge-daemons-V60u7_1-3.i386.rpm sge-parallel-V60u7_1-3.i386.rpm sge-docs-V60u7_1-3.i386.rpm
Preparing...                ########################################### [100%]
1:sge                    ########################################### [ 20%]
2:sge-utils              ########################################### [ 40%]
3:sge-daemons            ########################################### [ 60%]
4:sge-parallel           ########################################### [ 80%]
5:sge-docs               ########################################### [100%]

  • Install gliteWN-yaimtosge-0.0.0-3.i386.rpm, which includes the modifications to the standard yaim tool that allow the SGE client configuration.

[root@<your_wn> ~]# rpm -ivh gliteWN-yaimtosge-0.0.0-3.i386.rpm
Preparing...                ########################################### [100%]
1:gliteWN-yaimtosge      ########################################### [100%]

  • Use the same site-info.def file as in the gatekeeper case. This file should already include definitions for the “SGE_QMASTER”, “DEFAULT_DOMAIN” and “ADMIN_MAIL” variables.

  • Configure the WN using the “WN_sge” node definition.

[root@<your_wn> ~]# /opt/glite/yaim/scripts/configure_node  <path_to_your_site-info.def_file> WN_sge

Testing:

Test the information system using the following commands:

ldapsearch -x -h <your_ce> -p 2135 -b "mds-vo-name=local,o=grid"
ldapsearch -x -h <your_ce> -p 2170 -b "mds-vo-name=<site_name>,o=grid"

  • Check that they return the proper queue names and available resources.
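The ldapsearch output can be very long; a small filter helps to eyeball just the CE/queue identifiers and free-CPU counts. The attribute names below come from the GLUE 1.x schema, and the hostname in the usage comment is illustrative.

```shell
# Keep only the CE identifiers and free-CPU counts from an LDIF dump.
glue_summary() {
    grep -E '^(GlueCEUniqueID|GlueCEStateFreeCPUs):'
}

# Usage (illustrative hostname):
#   ldapsearch -x -h ce03.lip.pt -p 2135 -b "mds-vo-name=local,o=grid" | glue_summary
```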

  • Try to submit a simple script from a given pool account on your CE. This test checks whether the SGE command-line tools (such as qsub or qstat) are working. If the job finishes successfully, the stdout and stderr files will not be available on your CE since, in a normal grid job, they would be transferred directly from the WN to the RB using GSIFTP. Check the stderr/stdout files on the WN instead.
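A minimal job script for that test might look like this (the file name and path are illustrative). Run it locally first, then submit it with qsub from a pool account and watch it with qstat:

```shell
# Create a trivial job script that reports where it ran.
cat > /tmp/sge-test.sh <<'EOF'
#!/bin/bash
echo "Hello from $(hostname)"
date
EOF
chmod +x /tmp/sge-test.sh
/tmp/sge-test.sh    # local smoke test; on the CE submit with:  qsub /tmp/sge-test.sh
                    # then check its state with:  qstat
```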

  • Try to do a globus-job-run using fork from a UI (you have to initialise your proxy first):

[goncalo@ui01]$ globus-job-run ce03.lip.pt:2119/jobmanager-fork /bin/uname -a
Linux ce03.lip.pt 2.6.9-34.EL.cern #1 Sun Mar 12 12:19:53 CET 2006 i686 athlon i386 GNU/Linux

  • Try to do a globus-job-run using lcgsge from a UI:

[goncalo@ui01]$ globus-job-run ce03.lip.pt:2119/jobmanager-lcgsge /bin/uname -a
Linux sgewn01.lip.pt 2.6.9-34.EL.cern #1 Sun Mar 12 12:19:53 CET 2006 i686 i686 i386 GNU/Linux

  • Try to submit a job through the RB from a UI:

[goncalo@ui01]$ edg-job-submit -r ce03.lip.pt:2119/jobmanager-lcgsge-dteamgrid well.jdl
Selected Virtual Organisation name (from proxy certificate extension): dteam
Connecting to host rb02.lip.pt, port 7772
Logging to host rb02.lip.pt, port 9002
*********************************************************************************************
                               JOB SUBMIT OUTCOME
 The job has been successfully submitted to the Network Server.
 Use edg-job-status command to check job current status. Your job identifier (edg_jobId) is:
 - https://rb02.lip.pt:9000/Ab0W2EpWMPkpJKjAMpRCsQ
*********************************************************************************************

[goncalo@ui01 ce02]$ edg-job-status https://rb02.lip.pt:9000/Ab0W2EpWMPkpJKjAMpRCsQ
*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job : https://rb02.lip.pt:9000/Ab0W2EpWMPkpJKjAMpRCsQ
Current Status:     Done (Success)
Exit code:          0
Status Reason:      Job terminated successfully
Destination:        ce03.lip.pt:2119/jobmanager-lcgsge-dteamgrid
reached on:         Fri Feb  2 18:42:38 2007
*************************************************************

[goncalo@ui01 ce02]$ cat /tmp/jobOutput/goncalo_Ab0W2EpWMPkpJKjAMpRCsQ/well.out
One Perl out of the sea!
This is Linux sgewn01.lip.pt 2.6.9-34.EL.cern #1 Sun Mar 12 12:19:53 CET 2006 i686 i686 i386 GNU/Linux
Fri Feb  2 18:31:14 WET 2007
Topic revision: r4 - 2007-10-17 - GoncaloBorges
 