Results of the tests on the IT SA3 CertTestBed for patch certification
| Date | Component | Patch | Bug | Results |
| 16 May 2007 | Lsf Local Submit Attributes | 1044 | | Certified |
We updated the following RPMs on the LSF gLite CE wmstest-ce06.cr.cnaf.infn.it:
glite-blah-local-submit-attributes-lsf-3.0.10-2, glite-ce-blahp-1.5.21-1, lcg-info-dynamic-scheduler-lsf-1.0.1-1 and lcg-info-dynamic-lsf-2.0.34-1.
The updated version of blah is necessary so that blah forwards the requirements to the LSF batch system via the script 'lsf_local_submit_attributes.sh'. Some WMS configuration changes are also needed.
The following has to be added to the WMS configuration file /opt/glite/etc/glite_wms.conf, under the 'WorkloadManager' section:
CeForwardParameters = {
"GlueHostMainMemoryVirtualSize",
"GlueHostMainMemoryRAMSize"
};
The list must include every parameter that has to be forwarded.
The lsf_local_submit_attributes.sh script forwards the following attributes:
GlueHostMainMemoryRAMSize,
GlueHostMainMemoryVirtualSize and
GlueHostOperatingSystemName
The tests were therefore performed using these attributes.
To forward more attributes, both lsf_local_submit_attributes.sh and the WMS configuration file have to be modified.
- We submit a job from the UI to the WMS 'lxb2032.cern.ch' with the following requirements in the .jdl:
- requirements = (( other.GlueHostMainMemoryRAMSize > 300 ) && ( other.GlueHostMainMemoryVirtualSize == 996 ));
- We check the 'blahjob*' file that is created and will be executed on the WN; it contains the right information for the forwarded attributes:
- #BSUB -R "select[mem>=300&&swap>=996]"
- The job is executed successfully on the WN.
- We submit another job from the UI to the WMS 'lxb2032.cern.ch' with the following requirements:
- requirements = (( other.GlueHostMainMemoryRAMSize > 300 ) && ( other.GlueHostMainMemoryVirtualSize == 996 ) && ( other.GlueCEPolicyMaxCPUTime > 2880 ));
- We check the 'blahjob*' file that is created and will be executed on the WN; it contains the right information for the forwarded attributes:
- #BSUB -q default
- #BSUB -R "select[mem>=300&&swap>=996]"
- The job is executed successfully on the WN.
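The check on the 'blahjob*' file can also be scripted. The following is a minimal Python sketch, not the logic of lsf_local_submit_attributes.sh itself: the mapping from Glue attributes to LSF resources (RAM size to mem, virtual size to swap, always rendered with >=) is inferred only from the two runs above, and the names GLUE_TO_LSF, expected_bsub_line and blahjob_has_line are hypothetical.

```python
# Mapping inferred from the two test runs above. The real mapping lives in
# lsf_local_submit_attributes.sh and may differ (assumption).
GLUE_TO_LSF = {
    "GlueHostMainMemoryRAMSize": "mem",
    "GlueHostMainMemoryVirtualSize": "swap",
}


def expected_bsub_line(requirements):
    """Build the '#BSUB -R' line expected for a list of (attribute, value) pairs.

    Attributes without a known LSF counterpart are skipped, as happened
    with GlueCEPolicyMaxCPUTime in the second test.
    """
    terms = [
        f"{GLUE_TO_LSF[name]}>={value}"
        for name, value in requirements
        if name in GLUE_TO_LSF
    ]
    return '#BSUB -R "select[' + "&&".join(terms) + ']"'


def blahjob_has_line(script_text, line):
    """Return True if the submit script contains exactly the given line."""
    return any(candidate.strip() == line for candidate in script_text.splitlines())
```

For the second job above, expected_bsub_line would be called with the three (attribute, value) pairs from the .jdl, and blahjob_has_line with the content of the generated 'blahjob*' file.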
| Date | Component | Patch | Bug | Results |
| 08 Feb 2007 | BLAH - LSF | 991 | 20989 | OK |
The bug is the same one that we checked in the previous tests, so we repeat the same
procedure. We update the single blah rpm, glite-ce-blahp_R_1_5_19, on the gLite CE
(wmstest-ce06.cr.cnaf.infn.it) and also on the LSF server (wmstest-ce02.cr.cnaf.infn.it).
No changes are needed to the blah configuration.
We then submit a job as usual from the UI and, when it is done, check
the accounting log file on the CE to see whether the correct information is reported:
"timestamp=2007-02-08 10:27:11" "userDN=/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle/Email=alessio.gianelle@pd.infn.it" "userFQAN=/dteam/Role=NULL/Capability=NULL" "ceID=wmstest-ce06.cr.cnaf.infn.it:2119/blah-lsf-infinite" "jobID=https://lxb2032.cern.ch:9000/91xW8HJaV4zYPFa1jvv9WQ" "lrmsID=2771" "localUser=1667"
The lrmsID is "2771", so the hostname is not specified, as expected.
As far as the LSF CE is concerned, bug 20989 should be considered fixed by patch 991.
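The lrmsID check can be automated along these lines. This is a minimal Python sketch under assumptions: the record layout (space-separated "key=value" fields in double quotes) is taken from the log line shown above, and the helper names parse_record and lrms_id_is_bare are hypothetical, not part of blah.

```python
import re

# One quoted field: "key=value". The key stops at the first '=',
# the value runs up to the closing quote and may itself contain '='.
FIELD_RE = re.compile(r'"([^"=]+)=([^"]*)"')


def parse_record(line):
    """Turn one accounting log line into a dict of its fields."""
    return dict(FIELD_RE.findall(line))


def lrms_id_is_bare(record):
    """Return True if lrmsID is present and consists of digits only."""
    return record.get("lrmsID", "").isdigit()
```

Applied to the record above, parse_record yields lrmsID "2771" and lrms_id_is_bare returns True; any lrmsID carrying extra text, such as a hostname, would make it return False.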
We stress the CE by submitting about 100 simple jobs and observe that the records
written to the accounting log file are sometimes incorrect; for example:
"timestamp=2007-02-08 10:30:16" "userDN=/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle/Email=alessio.gianelle@pd.infn.it" HÄ·
This is bug 21816, which needs to be fixed ASAP.
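When many jobs are submitted, broken records like the one above are easier to find with a script than by eye. The following Python sketch assumes that a correct record carries the seven fields seen in the complete log lines of this page; the names EXPECTED_FIELDS, missing_fields and find_broken_records are hypothetical.

```python
import re

# Field names taken from the complete records shown on this page (assumption:
# every correct record carries all of them).
EXPECTED_FIELDS = (
    "timestamp", "userDN", "userFQAN", "ceID", "jobID", "lrmsID", "localUser",
)
FIELD_RE = re.compile(r'"([^"=]+)=([^"]*)"')


def missing_fields(line):
    """List the expected fields that do not appear in one log line."""
    found = {key for key, _ in FIELD_RE.findall(line)}
    return [name for name in EXPECTED_FIELDS if name not in found]


def find_broken_records(lines):
    """Return (line number, missing fields) for every incomplete record."""
    broken = []
    for number, line in enumerate(lines, start=1):
        missing = missing_fields(line)
        if missing:
            broken.append((number, missing))
    return broken
```

Passing the lines of the accounting log file to find_broken_records gives the line numbers of the truncated records, which can then be attached to the bug report.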
| Date | Component | Patch | Bug | Results |
| 24 Jan 2007 | BLAH - LSF | 982 | 20989 | OK |
We update the single blah rpm, glite-ce-blahp_R_1_5_18, on the gLite CE
(wmstest-ce06.cr.cnaf.infn.it) and also on the LSF server (wmstest-ce02.cr.cnaf.infn.it)
to update the BLParserLSF (this is not necessary, but suggested).
No changes are needed to the blah configuration, but we observe that
the old configuration (i.e. the file /opt/glite/etc/blah.config) is overwritten.
This does not seem to be the correct behaviour, so we submit a new bug, 23263, for release 3.1, and Di
submits bug 23465 for release 3.0. To have correct accounting we also need to set the setuid bit (s) on
/opt/glite/bin/BDlogger (usually this is done by the yaim config scripts).
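Whether the setuid bit is actually set can be verified before running the test. The following is a minimal Python sketch using the standard os and stat modules; the function names mode_has_setuid and file_has_setuid are hypothetical, and the path is the one given above.

```python
import os
import stat


def mode_has_setuid(mode):
    """Return True if the setuid bit is set in a file mode."""
    return bool(mode & stat.S_ISUID)


def file_has_setuid(path):
    """Return True if the file at 'path' has the setuid bit set."""
    return mode_has_setuid(os.stat(path).st_mode)


if __name__ == "__main__":
    # Path taken from the text above; adjust if the installation differs.
    path = "/opt/glite/bin/BDlogger"
    if os.path.exists(path):
        print(path, "setuid:", file_has_setuid(path))
    else:
        print(path, "not found")
```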
We then submit a job as usual from the UI and, when it finishes, check
the accounting log file on the CE to see whether the correct information is reported:
"timestamp=2007-01-24 15:25:20" "userDN=/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Alessio Gianelle/Email=alessio.gianelle@pd.infn.it" "userFQAN=/dteam/Role=NULL/Capability=NULL" "ceID=wmstest-ce06.cr.cnaf.infn.it:2119/blah-lsf-infinite" "jobID=https://lxb2032.cern.ch:9000/BMo54WPOooT2m4738iuSqA" "lrmsID=2281" "localUser=1667"
The lrmsID is "2281", so the hostname is not specified, as expected.
We also check that the job is correctly accounted:
gianelle@lxde02:$ /opt/glite/bin/glite-dgas-hlr-query -H wmstest-ce07.cr.cnaf.infn.it:56568: -Q resourceAggregate -j https://lxb2032.cern.ch:9000/BMo54WPOooT2m4738iuSqA
| MIN(date) | MAX(date) | COUNT(dgJobId) | SUM(cpuTime)/60 | SUM(wallTime)/60 | SUM(pmem)/1024 | SUM(vmem)/1024 | SUM(AMOUNT)/1000 |
| 2007-01-24 16:25:17 | 2007-01-24 16:25:17 | 1 | 0.02 | 0.17 | 2.65 | 7.30 | 0.00 |
As far as the LSF CE is concerned, bug 20989 should be considered fixed by patch 982.
-- Main.gianelle - 13 Feb 2007