Castor at CNAF

  • Tue Oct 2 2007

Analysis of run castor-cnaf.test2 from Tue Oct 2 19:11:03 to Wed Oct 3 09:11:08 2007
Num of parallel proc.: 100 each with 100 requests
Increasing polling time, starting with t0=4, Max. timeout for a request: 4088.0 s ( 68.1 min.)
Mean time per request in each process: 363.3 s ( 6.1 min.)
Average over all the run: (total req. done)/(total duration)= 0.198 Hz (5.04 s per request)
Total failures getting the request token: 48
Total failures getting the TURL: 505
Total failures in ptp => 553 over 10000 (5.53 %)
Comments: All 48 failures in getting the request token are due to a version mismatch between the database and the code. The output of the ptp command is reported below:

Request status:
  statusCode="SRM_INTERNAL_ERROR"(14) 
  explanation="Version mismatch between the database and the code : "2_1_2_4" versus "2_1_3_8""
About the 505 failures in getting the TURL:
In 233 cases out of 505 the failure is due to an SRM internal error (stage_put). The output of the 'statusptp' command is reported below:
Sending StatusPtP request to: httpg://srm-v2.cr.cnaf.infn.it:8443/
============================================================
Request status:
statusCode="SRM_FAILURE"(1)
explanation="No subrequests succeeded"
============================================================
 SRM Response:
remainingTotalRequestTime=0
arrayOfFileStatuses (size=1)
[0] SURL="srm://srm-v2.cr.cnaf.infn.it:8443/castor/cnaf.infn.it/grid/lcg/lhcb/elisa/response_tests/99/84"
[0] status: statusCode="SRM_FAILURE"(1)
                   explanation="stage_put error: Unknown internal error"

In 271 cases out of 505 the TURL was not obtained because the timeout elapsed. From the SRM point of view no error occurred: the request was still in status 'SRM_REQUEST_QUEUED' after the last poll.
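The run-level figures of this entry can be rechecked from the timestamps and counts reported above. The following is a minimal Python sketch, not the script that produced the summary: all names are hypothetical, and it assumes that all 10000 requests (100 processes with 100 requests each) count toward the rate.

```python
from datetime import datetime

# Values copied from the castor-cnaf.test2 summary in this entry.
START = datetime(2007, 10, 2, 19, 11, 3)
END = datetime(2007, 10, 3, 9, 11, 8)
N_PROCESSES = 100
REQUESTS_PER_PROCESS = 100
TOKEN_FAILURES = 48
TURL_FAILURES = 505
TURL_INTERNAL_ERROR = 233   # "stage_put error: Unknown internal error"
TURL_TIMEOUT = 271          # still SRM_REQUEST_QUEUED after the last poll


def run_summary():
    """Recompute duration, request rate and failure fraction for the run."""
    total_requests = N_PROCESSES * REQUESTS_PER_PROCESS
    duration_s = (END - START).total_seconds()
    failures = TOKEN_FAILURES + TURL_FAILURES
    return {
        "total_requests": total_requests,
        "duration_s": duration_s,
        "rate_hz": total_requests / duration_s,
        "seconds_per_request": duration_s / total_requests,
        "failures": failures,
        "failure_pct": 100.0 * failures / total_requests,
        # TURL failures covered by the two causes described in this entry.
        "turl_failures_explained": TURL_INTERNAL_ERROR + TURL_TIMEOUT,
    }


if __name__ == "__main__":
    for key, value in run_summary().items():
        print(f"{key}: {value}")
```

With these inputs the run lasts 50405 s, the ratio (total requests)/(total duration) is about 0.198 requests per second (about 5.04 s per request), and the failure fraction is 5.53 %. The two causes described in this entry cover 504 of the 505 TURL failures.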

  • Fri 28 Sep 2007

Still problems with the GPFS file system.

  • Thu 27 Sep 2007

Problems with the GPFS file system at CNAF. Test results are not reliable.

  • Wed 26 Sep 2007

Analysis of run castor-cnaf.test1
Num of parallel proc.: 100 each with 10 requests
Increasing polling time, starting with t0=2
Mean time per request: 228.068 s ( 3.801 min.)
Average over all the run: (total req. done)/(total duration)= 0.218 Hz (4.587 s per request)
Timeout for a request is: 2044.000 s ( 34.067 min.)
Total failures getting the TURL: 53
Total failures in ptp => 53 over 1000 (5.3 %)
Comments: All failures are due to the timeout. The same run should be repeated with a longer timeout.
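The page says only that the polling time increases from t0; it does not give the schedule. As a minimal sketch, one schedule that reproduces both reported timeouts (2044 s for t0=2 in this run, 4088 s for t0=4 in run castor-cnaf.test2) is a wait that doubles before each poll, starting at 2*t0, for 9 polls. The doubling rule, the first wait and the number of polls are assumptions inferred from those two numbers, and the function names are hypothetical.

```python
def polling_intervals(t0, n_polls=9):
    """Return the wait (in seconds) before each poll, doubling every time.

    Assumption: the first wait is 2*t0 and there are 9 polls. Neither value
    is stated on the page; they are chosen to match the reported timeouts.
    """
    return [t0 * 2 ** k for k in range(1, n_polls + 1)]


def max_timeout(t0, n_polls=9):
    """Total time spent waiting if the request never leaves the queue."""
    return sum(polling_intervals(t0, n_polls))


if __name__ == "__main__":
    for t0 in (2, 4):
        print(f"t0={t0}: waits={polling_intervals(t0)} total={max_timeout(t0)} s")
```

Under this assumed schedule the total wait is t0*1022 s, so doubling t0 doubles the maximum timeout, as between the two runs on this page.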

-- ElisaLanciotti - 26 Sep 2007

Topic revision: r4 - 2007-10-03 - ElisaLanciotti
 