Test and Release process for Castor SRM2

This page outlines the testing process currently performed to certify a new Castor SRM2 release for production deployment. For the time being, it also tracks work in progress and/or plans to achieve the desired test process.

Definitions

  • Major version: a software release where any digit of the version number can change with respect to the previous release. A major version upgrade may require new Castor libraries and/or schema changes and requires an intrusive intervention.
  • Minor version: a software release where only the last digit is changed with respect to the previous release. A minor version upgrade requires neither new Castor libraries nor schema changes, and it may or may not be performed in a transparent (rolling) manner.
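
The distinction above can be sketched as a small shell helper. This is only an illustration: it assumes version strings whose fields are separated by "." or "-" (the 2.10-1 style values below are hypothetical), and release_kind is a name introduced here, not an existing tool.

```shell
# release_kind OLD NEW -> prints "minor", "major" or "same".
# Assumption: fields are separated by "." or "-" (e.g. 2.10-1).
release_kind() {
    old=$1 new=$2
    if [ "$old" = "$new" ]; then
        echo same
    elif [ "${old%[.-]*}" = "${new%[.-]*}" ]; then
        echo minor    # everything before the last field is identical
    else
        echo major    # an earlier field differs
    fi
}
```

For instance, release_kind 2.10-1 2.10-2 prints "minor", while release_kind 2.10-1 2.11-0 prints "major".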

Functional test steps

To run any of the following tests, you need to have a valid grid certificate and a set of shell environment variables. For example, from lxplus it is advised to run:

source /afs/cern.ch/project/gd/LCG-share/current/etc/profile.d/grid-env.sh
grid-proxy-init
before initiating any test session. You also need a directory in the Castor namespace where the pool account you are mapped to (usually dteam001) has write access. By default, the S2 test suite uses /castor/cern.ch/grid/dteam/S2-test-results.
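
Since the S2 families can run for a long time, it may help to check the remaining proxy lifetime before starting. The helper below is a minimal sketch: proxy_has_time and its one-hour default are assumptions made for illustration, and the commented usage relies on grid-proxy-info -timeleft printing the remaining lifetime in seconds.

```shell
# proxy_has_time SECONDS_LEFT [MIN_SECONDS]
# Succeeds if at least MIN_SECONDS of proxy lifetime remain.
proxy_has_time() {
    left=$1
    min=${2:-3600}    # hypothetical default: one hour
    [ "$left" -ge "$min" ]
}

# Intended use in a session (needs the grid tools, so left commented):
#   . /afs/cern.ch/project/gd/LCG-share/current/etc/profile.d/grid-env.sh
#   proxy_has_time "$(grid-proxy-info -timeleft)" 7200 || grid-proxy-init
```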

For any new version, the following functionality tests are performed against certification endpoints, lxsrmdev0N.cern.ch for N = 1, 2, 3, 4 (see SrmDev for the actual deployment):

  1. The SAM-based test
  2. The S2 test suite
  3. Extra Castor SRM tests
  4. GFAL prestaging
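
The endpoint names follow a simple pattern, so a loop over them can be sketched as below. cert_endpoints is a helper name introduced here; whether srm2_testlcgutils.sh accepts a fully qualified host name and returns a non-zero status on failure is an assumption, which is why the loop is shown only as a comment.

```shell
# Print the certification endpoints lxsrmdev01..04, one per line.
cert_endpoints() {
    for n in 1 2 3 4; do
        echo "lxsrmdev0${n}.cern.ch"
    done
}

# Example loop (needs the test script and a valid proxy, so left commented):
#   for ep in $(cert_endpoints); do
#       ./srm2_testlcgutils.sh "$ep" > "sam-$ep.log" 2>&1 || echo "FAILED: $ep"
#   done
```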

SAM-based test

A SAM-like test using lcg_utils is provided in svn.

Typical usage:

> srm2_testlcgutils.sh
Usage: ./srm2_testlcgutils.sh endpoint-name [spacetoken] [castor path]

> ./srm2_testlcgutils.sh srm-pps srm2_d0t1
#
# Executing "lcg-cp --verbose --nobdii -D srmv2 --vo dteam --dst srm2_d0t1 file:///etc/group srm://srm-pps:8443/srm/managerv2?SFN=/castor/cern.ch/grid/dteam/castordev/test-srm-pps_8443-srm2_d0t1-ed6b7013-5329-4f5b-aaba-0e1341f30663"                
#                                                                                                                           

Using grid catalog type: UNKNOWN
Using grid catalog : (null)     
VO name: dteam                  
Checksum type: None             
Destination SE type: SRMv2      
Destination SRM Request Token: 9145153
Source URL: file:/etc/group           
File size: 2443                       
Source URL for copy: file:/etc/group  
Destination URL: gsiftp://lxfsre5303.cern.ch:20886/7e8e0dad-e0fc-3105-e040-8a89c180035b
# streams: 1                                                                           
         2443 bytes      3.42 KB/sec avg      3.42 KB/sec inst                         
Transfer took 1070 ms                                                                  

#
# Executing "lcg-ls --verbose --nobdii -D srmv2 --vo dteam -l srm://srm-pps:8443/srm/managerv2?SFN=/castor/cern.ch/grid/dteam/castordev/test-srm-pps_8443-srm2_d0t1-ed6b7013-5329-4f5b-aaba-0e1341f30663"                                               
#                                                                                                                           

SE type: SRMv2
-rw-r-----   1     2     2    2443               ONLINE /castor/cern.ch/grid/dteam/castordev/test-srm-pps_8443-srm2_d0t1-ed6b7013-5329-4f5b-aaba-0e1341f30663                                                                                           
        * Checksum:  ()                                                                                                     
        * Space tokens: 48f34339-0000-1000-926f-8fd2f86a7650                                                                

#
# Executing "lcg-cp --verbose --nobdii -D srmv2 --vo dteam srm://srm-pps:8443/srm/managerv2?SFN=/castor/cern.ch/grid/dteam/castordev/test-srm-pps_8443-srm2_d0t1-ed6b7013-5329-4f5b-aaba-0e1341f30663 file:///tmp/test-group"                           
#                                                                                                                           

Using grid catalog type: UNKNOWN
Using grid catalog : (null)     
VO name: dteam                  
Checksum type: None             
Trying SURL srm://srm-pps:8443/srm/managerv2?SFN=/castor/cern.ch/grid/dteam/castordev/test-srm-pps_8443-srm2_d0t1-ed6b7013-5329-4f5b-aaba-0e1341f30663 ...                                                                                              
Source SE type: SRMv2                                                                                                       
Source SRM Request Token: 9145156                                                                                           
Source URL: srm://srm-pps:8443/srm/managerv2?SFN=/castor/cern.ch/grid/dteam/castordev/test-srm-pps_8443-srm2_d0t1-ed6b7013-5329-4f5b-aaba-0e1341f30663                                                                                                  
File size: 2443                                                                                                             
Source URL for copy: gsiftp://lxfsrl6306.cern.ch:20024/7e8ce0cc-ae74-d981-e040-8a89c180035d                                 
Destination URL: file:/tmp/test-group                                                                                       
# streams: 1                                                                                                                
            0 bytes      0.00 KB/sec avg      0.00 KB/sec inst                                                              
Transfer took 1010 ms                                                                                                       

#
# Executing "lcg-gt --verbose --nobdii -D srmv2 --st srm2_d0t1 srm://srm-pps:8443/srm/managerv2?SFN=/castor/cern.ch/grid/dteam/castordev/test-srm-pps_8443-srm2_d0t1-ed6b7013-5329-4f5b-aaba-0e1341f30663 gsiftp"                                       
#                                                                                                                           

gsiftp://lxfsre5303.cern.ch:20622/7e8e0da9-f72d-715b-e040-8a89c1800363
9145159

#
# Executing "lcg-gt --verbose --nobdii -D srmv2 --st srm2_d0t1 srm://srm-pps:8443/srm/managerv2?SFN=/castor/cern.ch/grid/dteam/castordev/test-srm-pps_8443-srm2_d0t1-ed6b7013-5329-4f5b-aaba-0e1341f30663 rfio"
#

rfio://castorpublic.cern.ch:9002//castor/cern.ch/grid/dteam/castordev/test-srm-pps_8443-srm2_d0t1-ed6b7013-5329-4f5b-aaba-0e1341f30663?svcClass=default&castorVersion=2
9145162

#
# Executing "lcg-gt --verbose --nobdii -D srmv2 --st srm2_d0t1 srm://srm-pps:8443/srm/managerv2?SFN=/castor/cern.ch/grid/dteam/castordev/test-srm-pps_8443-srm2_d0t1-ed6b7013-5329-4f5b-aaba-0e1341f30663 xroot"
#

root://castorpublic.cern.ch//castor/cern.ch/grid/dteam/castordev/test-srm-pps_8443-srm2_d0t1-ed6b7013-5329-4f5b-aaba-0e1341f30663?svcClass=default
9145168

#
# Executing "lcg-del --verbose --nobdii -D srmv2 --nolfc --vo dteam srm://srm-pps:8443/srm/managerv2?SFN=/castor/cern.ch/grid/dteam/castordev/test-srm-pps_8443-srm2_d0t1-ed6b7013-5329-4f5b-aaba-0e1341f30663"
#

VO name: dteam
SE type: SRMv2
srm://srm-pps:8443/srm/managerv2?SFN=/castor/cern.ch/grid/dteam/castordev/test-srm-pps_8443-srm2_d0t1-ed6b7013-5329-4f5b-aaba-0e1341f30663 - DELETED

#
# Done!
#
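
When the output is captured to a file, a quick summary can be extracted from it. The sketch below relies only on markers visible in the transcript above ("# Executing", "- DELETED", "# Done!"); sam_log_summary is a name introduced here and the check is deliberately coarse, so it does not replace reading the log.

```shell
# Summarise a captured srm2_testlcgutils.sh log read from stdin.
# Succeeds only if the final delete and the "# Done!" marker were seen.
sam_log_summary() {
    awk '
        /^# Executing "/           { steps++ }
        / - DELETED[[:space:]]*$/  { deleted++ }
        /^# Done!/                 { finished = 1 }
        END {
            printf "steps=%d deleted=%d done=%d\n", steps, deleted, finished
            exit !(finished && deleted > 0)
        }'
}
```

Usage would be along the lines of sam_log_summary < sam-srm-pps.log, where the log file name is hypothetical.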

S2 test suite

The S2 test suite has been developed by Flavia and can be executed on a 32-bit-enabled box. Details on S2 are on the SRMDev twiki. A basic run of S2 is:

cd ~itglp/testsuite/srm/S2
source env.sh
cd basic
make test

The certification process includes running both the basic and the usecase test families. Note that they can take a substantial time to complete!
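
Running both families unattended can be sketched as follows. The directory name "usecase" (next to "basic") and the helper run_s2_families are assumptions; S2_TEST_CMD is a hypothetical override that exists only so the loop can be tried without the real suite.

```shell
# run_s2_families BASE FAMILY...
# Runs "make test" (or $S2_TEST_CMD) in BASE/FAMILY, logging to BASE/FAMILY.log.
# Prints one "family: exit=N" line each; fails if any family failed.
run_s2_families() {
    base=$1; shift
    rc=0
    for family in "$@"; do
        ( cd "$base/$family" && ${S2_TEST_CMD:-make test} ) \
            > "$base/$family.log" 2>&1
        status=$?
        echo "$family: exit=$status"
        [ "$status" -eq 0 ] || rc=1
    done
    return "$rc"
}
```

As in the manual steps, env.sh should be sourced first, e.g. cd ~itglp/testsuite/srm/S2 && source env.sh && run_s2_families . basic usecase. A non-zero exit is not necessarily a problem, since some tests are expected to fail.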

-- To be done -- Recompile the S2 framework

When interpreting the outcome, take into account that a number of SRM requests are not supported by Castor SRM, so the corresponding basic tests fail. A number of use-case tests also exercise special boundary conditions that are known to break. This is acceptable as long as the impact on users is known to be negligible.
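
One way to separate expected failures from new ones is to keep a list of known failures and compare each run against it. The sketch below assumes two plain-text files with one test name per line; the file layout, the helper name check_s2_failures and the placeholder test names are all hypothetical, as S2 does not necessarily produce such a list directly.

```shell
# check_s2_failures KNOWN_FILE FAILED_FILE
# Prints failed tests missing from the known-failures list;
# succeeds only if there are none.
check_s2_failures() {
    unexpected=$(grep -vxFf "$1" "$2")
    if [ -n "$unexpected" ]; then
        printf '%s\n' "$unexpected"
        return 1
    fi
}
```

With a known list containing test-A and test-B, a run where test-A and test-C failed would report only test-C.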

Extra Castor SRM tests

These tests should be part of S2 at some stage. For the time being they are run by using the Castor SRM srm2_test* command-line clients.

  • srmPrepareToGet|BoL of 2 files
  • srmPrepareToGet|BoL of a directory
  • srmPrepareToGet|BoL passing a space token
  • GetStatusPartial{Ex,Ne} also for bringOnline
  • srmPurgeFromSpace using a 'predefined' space token (so as not to rely on srmReserveSpace)
  • srmPrepareToPut|Get cycles with a different protocol than gsiftp (rfio, xroot)
  • srmPrepareToPut|Get (lcg-getturls) with a list of protocols, checking the order is respected
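
For the protocol checks, the scheme of the returned TURL can be compared with the requested protocol. The sketch below is based on the lcg-gt output shown earlier, where an xroot request returns a root:// TURL; turl_matches_protocol is a helper introduced here, not part of the srm2_test* clients.

```shell
# turl_matches_protocol TURL PROTOCOL
# Succeeds if the TURL scheme corresponds to the requested protocol.
turl_matches_protocol() {
    scheme=${1%%://*}             # text before the first "://"
    case $2 in
        xroot) expected=root ;;   # xroot requests come back as root://
        *)     expected=$2 ;;
    esac
    [ "$scheme" = "$expected" ]
}
```

To check that the order of a protocol list is respected, pass the first protocol of the list that the endpoint is expected to support as the second argument.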

Testing GFAL prestage

-- work in progress --

See https://twiki.cern.ch/twiki/bin/view/Sandbox/PreStagingTestsReferenceImplementationArch

Testing srmcp

-- work in progress -- For example:

export SRM_PATH=/afs/cern.ch/project/gd/LCG-share/3.2.8-0/d-cache/srm
$SRM_PATH/bin/srmcp -debug  -srm_protocol_version 2 -space_token <spacetoken> SURL1 SURL2

Testing VOMS Roles

-- work in progress --

itglp@lxcastordev02:user/i/itglp> source /afs/cern.ch/project/gd/LCG-share/current_3.2/etc/profile.d/grid-env.sh
itglp@lxcastordev02:user/i/itglp> voms-proxy-init -voms dteam:/dteam/cern/Role=lcgadmin
Enter GRID pass phrase:
Your identity: /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=lopresti/CN=626027/CN=Giuseppe Lo Presti
Creating temporary proxy .................................................................................... Done
Contacting  lcg-voms.cern.ch:15004 [/DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch] "dteam" Done
Creating proxy ...................................................................................... Done
Your proxy is valid until Sat Jan 15 03:13:42 2011

itglp@lxcastordev02:user/i/itglp> voms-proxy-info -all
subject   : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=lopresti/CN=626027/CN=Giuseppe Lo Presti/CN=proxy
issuer    : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=lopresti/CN=626027/CN=Giuseppe Lo Presti
identity  : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=lopresti/CN=626027/CN=Giuseppe Lo Presti
type      : proxy
strength  : 1024 bits
path      : /tmp/x509up_u22103
timeleft  : 11:58:22
=== VO dteam extension information ===
VO        : dteam
subject   : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=lopresti/CN=626027/CN=Giuseppe Lo Presti
issuer    : /DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch
attribute : /dteam/cern/Role=lcgadmin/Capability=NULL
attribute : /dteam/cern/Role=NULL/Capability=NULL
attribute : /dteam/Role=NULL/Capability=NULL
timeleft  : 11:58:22
uri       : lcg-voms.cern.ch:15004
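
The presence of the requested role can be verified from the voms-proxy-info output. A minimal sketch, assuming the "attribute : FQAN/Capability=..." layout shown above; has_voms_role is a name introduced here.

```shell
# has_voms_role FQAN   (reads `voms-proxy-info -all` output on stdin)
# Succeeds if an attribute line starts with the given FQAN.
has_voms_role() {
    awk -v fqan="$1" '
        $1 == "attribute" && index($3, fqan "/") == 1 { found = 1 }
        END { exit !found }'
}

# Intended use (needs a VOMS proxy, so left commented):
#   voms-proxy-info -all | has_voms_role /dteam/cern/Role=lcgadmin
```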

Stress tests

Stress tests are run only for major versions, against the castordev/srmcert5 CDB cluster (to become the srm-itdc.cern.ch endpoint); the database schema is srm_itdc@srm-itdc-db.

  1. The load/stress test family of the S2 test suite
  2. The FTS load tests

The S2 stress test family

This test was performed in collaboration with Flavia. The scripts are being adapted so that they can use lxtest machines as clients, without any dependency on Flavia's grid certificate.

Description of the test

The test runs from multiple clients and aims at loading the endpoint with a large number of concurrent requests. Typical request counts over a day are:

[root@lxbrb2910 castor]# grep 'New Req' srmfed.log  | awk '{print $10}' | sort | uniq -c
     60 Type="srm__srmGetSpaceTokens"
   2790 Type="srm__srmLs"
    733 Type="srm__srmMkdir"
  14011 Type="srm__srmPrepareToGet"
   9226 Type="srm__srmPrepareToPut"
   8210 Type="srm__srmPutDone"
   4846 Type="srm__srmRm"
    182 Type="srm__srmRmdir"
1792332 Type="srm__srmStatusOfGetRequest"
1427412 Type="srm__srmStatusOfPutRequest"

The hourly rate is between 100K and 220K requests. Moreover, the same set of SURLs is reused by many clients in order to exercise race conditions: any given SURL is rewritten, re-read and aborted many times concurrently.
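
As a sanity check, the daily counts can be summed and averaged per hour. The helper below (daily_rate, a name introduced here) reads the output of the sort | uniq -c pipeline shown above.

```shell
# Sum the counts from `sort | uniq -c` output read on stdin and
# print the daily total with the (truncated) average per hour.
daily_rate() {
    awk '{ total += $1 }
         END { printf "total=%d per_hour=%d\n", total, total / 24 }'
}

# Intended use on the frontend log (left commented):
#   grep 'New Req' srmfed.log | awk '{print $10}' | sort | uniq -c | daily_rate
```

Applied to the counts listed above, this gives a total of 3259802 requests, i.e. about 135825 requests per hour on average, which is consistent with the quoted range.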

  • To be added to the stress test: multiple srmBringOnline requests on top of the prepareToGet|Put to unveil potential race conditions and/or deadlocks across all asynchronous stager requests.

While the stress test is running, the standard S2 basic and use-case suites are run on top of it to assess whether the system continues to behave correctly under load.

How to assess the outcome of the test

By its nature, a stress test does not provide a red/green flag. Typical things to look for include:

  • Core dumps due to race conditions
  • Memory or socket leaks: check the lemon page for the box
  • Oracle errors: check both the frontend and the backend daemons' logs and monitor the Oracle EM for bad execution plans, deadlocks, etc.
  • High rate of INTERNAL_ERRORs
  • Abnormally high processing times
  • etc...
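
Some of these checks can be approximated from the logs. The sketch below counts requests and INTERNAL_ERROR occurrences; it assumes that the string INTERNAL_ERROR appears literally in the daemon log and that 'New Req' marks each incoming request, as in the srmfed.log example above. internal_error_rate is a name introduced here, and no threshold is implied.

```shell
# Read a daemon log on stdin and report the share of INTERNAL_ERROR lines
# relative to the number of 'New Req' lines.
internal_error_rate() {
    awk '
        /New Req/        { reqs++ }
        /INTERNAL_ERROR/ { errs++ }
        END {
            pct = (reqs > 0) ? 100 * errs / reqs : 0
            printf "requests=%d internal_errors=%d rate=%.2f%%\n", reqs, errs, pct
        }'
}
```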

FTS load tests

To be done. We need to agree on and set up FTS channels between the srm-itdc and srm-pps endpoints. srm-public could be involved depending on availability and other concurrent production activities in its Castor backend (castorpublic).

History

In the FIO wiki.

-- GiuseppeLoPresti - Aug 2009

Topic revision: r14 - 2012-01-11 - GiuseppeLoPresti
 