Castor at CERN

  • Mon Oct 8 2007
Analysis of run castor-cern.test5 from Mon Oct 8 10:11 to Mon Oct 8 14:23 2007
Num of parallel proc.: 200 each with 100 requests
Increasing polling time, starting with t0=8, timeout for a request: 8176.0 s (136.3 min.)
Mean time per request in each process: 121.6 s ( 2.0 min.)
Average over all the run: (total req. done)/(total duration)= 1.326 Hz (mean interval= 0.75 s per request)
Total failures getting the request token: 9
Total failures getting the TURL: 100
Total failures in ptp => 109 over 20000 (0.545 %)
Comments: All the failed requests come from the same process and share a common origin: the initial 'mkdir' operation, which creates the directory that the ptp requests are directed to, failed. As a result, all 100 requests of that process failed with 'INVALID PATH'. The mkdir failure is due to a gSoap error:
Sending Mkdir request to: httpg://srm-v2.cern.ch:8443/
============================================================
Request status:
gSoap code: 12

soap_print_fault:
SOAP FAULT: SOAP-ENV:Client
"CGSI-gSOAP: Could not open connection !"
Detail: TCP connect failed in tcp_connect()
mkdir:soap_print_fault_location:
Warning: during this run, 100 further errors of type "CGSI-gSOAP: Could not open connection" occurred while polling the system with 'statusptp' commands. These errors did not cause the requests to fail, because the client script polls the SRM server 10 times before declaring a request failed: if the error occurs on one poll, the TURL can still be obtained on a later one. These errors are therefore not included in the statistics.
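The retry behaviour described above can be sketched as follows. This is a minimal illustration, not the actual client script: the function names, the `poll_once` callable and the doubling rule are assumptions. The doubling rule was chosen because, with 10 polls, it reproduces the timeouts quoted in the reports (4088.0 s for t0=4 and 8176.0 s for t0=8).

```python
import time


def backoff_intervals(t0, max_polls=10):
    # Assumed schedule: the wait doubles before every retry, giving
    # 2*t0, 4*t0, ... with one wait between consecutive polls.
    return [t0 * 2 ** k for k in range(1, max_polls)]


def poll_for_turl(poll_once, t0, max_polls=10, sleep=time.sleep):
    """Call poll_once() up to max_polls times; return the TURL or None.

    poll_once is a hypothetical callable: it returns a TURL string when
    the request is ready, None while it is pending, and may raise
    ConnectionError on a transient failure such as
    "CGSI-gSOAP: Could not open connection".
    """
    waits = backoff_intervals(t0, max_polls)
    for attempt in range(max_polls):
        try:
            turl = poll_once()
            if turl is not None:
                return turl
        except ConnectionError:
            pass  # transient error: try again on the next poll
        if attempt < len(waits):
            sleep(waits[attempt])
    return None  # all polls used up: the request is declared failed


if __name__ == "__main__":
    print(sum(backoff_intervals(4)), sum(backoff_intervals(8)))
```

Passing `sleep` as a parameter keeps the loop testable without real waiting; a single connection error costs one poll but does not fail the request.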

  • Fri Oct 5 2007

Analysis of run castor-cern.test4 from Fri Oct 5 15:52 to Fri Oct 5 19:53 2007
Num of parallel proc.: 100 each with 100 requests
Increasing polling time, starting with t0=8, timeout for a request: 8176.0 s (136.3 min.)
Mean time per request in each process: 107.7 s ( 1.8 min.)
Average over all the run: (total req. done)/(total duration)= 0.690 Hz (mean interval= 1.45 s per request)
Total failures getting the request token: 0
Total failures getting the TURL: 2
Total failures in ptp => 2 over 10000 (0.02 %)
Comments: Only 2 errors occurred: one due to "stage_put error: Unknown internal error" and the other due to "stage_prepareToPut: Internal Error" (see below for the complete error messages).
This confirms the result of run castor-cern.test3. The upgrade has significantly reduced the error rate.

  • Thu Oct 4 2007

Analysis of run castor-cern.test3 from Thu Oct 4 15:07 to Thu Oct 4 17:52 2007
Num of parallel proc.: 100 each with 100 requests
Increasing polling time, starting with t0=4, timeout for a request: 4088.0 s ( 68.1 min.)
Mean time per request in each process: 81.5 s ( 1.4 min.)
Average over all the run: (total req. done)/(total duration)= 1.016 Hz (mean interval= 0.98 s per request)
Total failures getting the request token: 0
Total failures getting the TURL: 6
Total failures in ptp => 6 over 10000 (0.06 %)
Comments: First run after the upgrade. Of the 6 failed requests, 5 are pure timeouts (no error occurred) and the 6th is due to "stage_put error: Unknown internal error".

After the upgrade the failure rate has decreased (from about 3-4% to less than 1%) and the time per request in each process has dropped from about 3 min to about 1 min. The test should be repeated to make sure this is not just a fluctuation.

Notification of upgrade: On Thursday, October 4 2007, the CASTORLHCB MSS at CERN will be upgraded to the latest Castor software version. The intervention will start at 09:00 CEST. (Run castor-cern.test2 finished at 01:01, so it was not affected.)

  • Wed Oct 3 2007

Analysis of run castor-cern.test2 from Wed Oct 3 15:23 to Thu Oct 4 01:01 2007
Num of parallel proc.: 100 each with 100 requests
Increasing polling time, starting with t0=4, timeout for a request: 4088.0 s ( 68.1 min.)
Mean time per request in each process: 187.6 s ( 3.1 min.)
Average over all the run: (total req. done)/(total duration)= 0.288 Hz (mean interval= 3.47 s per request)
Total failures getting the request token: 0
Total failures getting the TURL: 324
Total failures in ptp => 324 over 10000 (3.24 %)
Comments: Only 1 failed TURL request is due to the timeout (not an error).
Of the others, 276 failures out of 324 are due to the "stage_put error: Unknown internal error", already observed in the previous run (see below for the complete error message).
The remaining 47 failures are due to:

 Sending StatusPtP request to: httpg://srm-v2.cern.ch:8443/
 ============================================================
 Request status:
   statusCode="SRM_FAILURE"(1)
  explanation="No subrequests succeeded"
 ============================================================
SRM Response:
 remainingTotalRequestTime=0
 arrayOfFileStatuses (size=1)
    [0] SURL="srm://srm-v2.cern.ch:8443/castor/cern.ch/grid/lhcb/elisa/response_tests/mkdir/5/60"
      [0] status: statusCode="SRM_FAILURE"(1)
                   explanation="stage_prepareToPut: Internal Error"
 ============================================================
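Counting failures by cause, as in the breakdown above, could be automated along these lines. This is only a sketch: it assumes the client output has been collected into one text with the layout shown in the excerpt, and the names `FILE_STATUS`, `count_failures` and `SAMPLE` are illustrative rather than part of any existing tool.

```python
import re
from collections import Counter

# Matches the per-file status entry, e.g.
#   [0] status: statusCode="SRM_FAILURE"(1)
#                explanation="stage_prepareToPut: Internal Error"
# The request-level status has no "[n] status:" prefix, so it is skipped.
FILE_STATUS = re.compile(
    r'\[\d+\]\s+status:\s+statusCode="(?P<code>[A-Z_]+)"\(\d+\)\s*'
    r'explanation="(?P<why>[^"]*)"'
)


def count_failures(log_text):
    """Return a Counter of explanation strings for failed file statuses."""
    counts = Counter()
    for match in FILE_STATUS.finditer(log_text):
        if match.group("code") == "SRM_FAILURE":
            counts[match.group("why")] += 1
    return counts


SAMPLE = '''
 Request status:
   statusCode="SRM_FAILURE"(1)
  explanation="No subrequests succeeded"
 arrayOfFileStatuses (size=1)
    [0] SURL="srm://srm-v2.cern.ch:8443/castor/cern.ch/grid/lhcb/elisa/response_tests/mkdir/5/60"
      [0] status: statusCode="SRM_FAILURE"(1)
                   explanation="stage_prepareToPut: Internal Error"
'''

if __name__ == "__main__":
    for why, n in count_failures(SAMPLE).most_common():
        print(n, why)
```

Keying on the file-level explanation rather than the request-level one separates the "stage_put" and "stage_prepareToPut" cases, which share the same request-level message.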

Analysis of run castor-cern.test1 from Wed Oct 3 11:12 to Wed Oct 3 13:54 2007
Num of parallel proc.: 10 each with 100 requests
Increasing polling time, starting with t0=4, Max. timeout for a request: 4088.0 s ( 68.1 min.)
Mean time per request in each process: 26.7 s ( 0.4 min.)
Average over all the run: (total req. done)/(total duration)= 0.103 Hz (mean interval= 9.71 s per request)
Total failures getting the request token: 0
Total failures getting the TURL: 3
Total failures in ptp => 3 over 1000 (0.3 %)
Comments: The 3 failures to get a TURL are due to this error (output of the 'statusptp' command):

 Request status:
   statusCode="SRM_FAILURE"(1)
   explanation="No subrequests succeeded"
 ============================================================
 SRM Response:
  remainingTotalRequestTime=0
  arrayOfFileStatuses (size=1)
     [0] SURL="srm://srm-v2.cern.ch:8443/castor/cern.ch/grid/lhcb/elisa/response_tests/mkdir/8/91"
       [0] status: statusCode="SRM_FAILURE"(1)
                    explanation="stage_put error: Unknown internal error"
The same error has been observed for Castor at CNAF.
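As a cross-check, the per-run summary figures can be recomputed from the raw counts and the start and end times quoted in the reports. The sketch below is not the original analysis script; `RUNS` and `summarize` are illustrative names, and the year 2007 is taken from the page dates. Dividing requests by duration gives a rate in requests per second (Hz), and its reciprocal is the mean interval per request in seconds. Because the timestamps are given only to the minute, the recomputed values can differ from the reported ones in the last digit.

```python
from datetime import datetime

# (run, start, end, total requests, failed requests), copied from the reports.
RUNS = [
    ("castor-cern.test1", "2007-10-03 11:12", "2007-10-03 13:54", 1000, 3),
    ("castor-cern.test2", "2007-10-03 15:23", "2007-10-04 01:01", 10000, 324),
    ("castor-cern.test3", "2007-10-04 15:07", "2007-10-04 17:52", 10000, 6),
    ("castor-cern.test4", "2007-10-05 15:52", "2007-10-05 19:53", 10000, 2),
    ("castor-cern.test5", "2007-10-08 10:11", "2007-10-08 14:23", 20000, 109),
]

FMT = "%Y-%m-%d %H:%M"


def summarize(name, start, end, total, failed):
    """Return failure percentage, request rate and mean interval for a run."""
    duration = (
        datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    ).total_seconds()
    return {
        "run": name,
        "failure_pct": 100.0 * failed / total,
        "req_per_s": total / duration,   # requests per second (Hz)
        "s_per_req": duration / total,   # mean interval per request
    }


if __name__ == "__main__":
    for run in RUNS:
        s = summarize(*run)
        print("%-18s %6.3f %%  %6.3f Hz  %6.2f s/req" % (
            s["run"], s["failure_pct"], s["req_per_s"], s["s_per_req"]))
```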

-- ElisaLanciotti - 03 Oct 2007

Topic revision: r8 - 2007-10-08 - ElisaLanciotti
 