DPM scalability tests

All the test plans and results of the DPM scalability test will be reported in this wiki page.

Test plans

DPM_scalability_test_plan.pdf: first series of tests

Test results

Test Series 1: <N> clients concurrently write 1 MB files with rfcp (100 clients per client host) using GSI & VOMS proxy authentication (CSEC_MECH=GSI)

The test starts with 100 clients on a single host using GSI proxy authentication. The client machine load is ~15% and the I/O rate is ~12 MB/s. We see connection timeouts at the
per-mille level. After increasing the number of fast threads and enabling the TCP_NODELAY
option, the timeouts could be suppressed to 0%.
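
For reference, TCP_NODELAY is a per-socket option that disables Nagle's algorithm, so that small messages are sent immediately instead of being coalesced. How the DPM daemons set it is not shown on this page; the following Python sketch only illustrates the option on a generic TCP socket.

```python
import socket

def make_nodelay_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY=1 sends small writes immediately instead of coalescing them.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock

sock = make_nodelay_socket()
# Read the option back to confirm that it is active on this socket.
nodelay_enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
sock.close()
```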

Scaling the number of clients up from 100 to 900, the timeouts come
back at a level of ~15-20%. I did not determine the exact point at which the
timeouts return, because I had hoped there would be no timeouts
at all. Up to 300 clients the timeout fraction was still 0%; then a further 600 clients
were started in parallel.
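
The ramp-up described above could be driven by a small harness such as the following Python sketch. It is not the script used for these tests: the function names are hypothetical, the timeout here is a wall-clock limit on the client process rather than the connection timeout reported by rfcp, and a harmless stand-in command replaces rfcp so that the sketch runs anywhere.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor

def run_one(cmd, timeout_s):
    """Run one client command; return (ok, timed_out, elapsed_seconds)."""
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL, timeout=timeout_s)
        return proc.returncode == 0, False, time.monotonic() - start
    except subprocess.TimeoutExpired:
        return False, True, time.monotonic() - start

def run_clients(make_cmd, n_clients, timeout_s=600.0):
    """Start n_clients commands in parallel and collect their results."""
    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        futures = [pool.submit(run_one, make_cmd(i), timeout_s)
                   for i in range(n_clients)]
        return [f.result() for f in futures]

def timeout_fraction(results):
    """Fraction of clients that hit the timeout."""
    return sum(1 for _, timed_out, _ in results if timed_out) / len(results)

# Stand-in command; on a real client host make_cmd would return something
# like ["rfcp", local_file, dpm_path] (both paths are hypothetical here).
results = run_clients(lambda i: [sys.executable, "-c", "pass"], n_clients=4)
```

Calling run_clients with increasing n_clients and recording timeout_fraction at each step would locate the point at which the timeouts return.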

The files/s rate evidently depends on the number of
clients.

We do not manage to load the head node to more than 80%. I am not sure where
this limit comes from.

Next actions:
- Profile the daemons with oprofile to see whether something
can easily be improved.

- Should we continue the test with the timeouts, or should we reduce the number of clients to the point where we see
0% timeouts?

Insert Rate on DPM


c1.jpg

CPU Utilization on DPM head node running dpns/dpm/mysql (8 core 2GHz/16GB)


headnode.gif

Load Average on DPM head node



headnode-load.gif

Transfer time


transfertime.jpg

The left axis shows the time during the test, the right axis the transfer time in seconds. For 900 clients the average transfer time for a 1 MB file is around 110 s, with a long tail towards 400 s.

Test Series 2: <N> clients concurrently write 1 MB files with rfcp (100 clients per client host) using trusted host mechanism (CSEC_MECH=ID)

csecid.jpg

The insert rate slowly decreases over time. The dip is due to the log rotation of the DPM logfile. The failure rate amounts to 0.5%.
The logfile output should be modified in the future: it grows by 200 lines/s during this test.
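
To put the logging rate in perspective, the arithmetic can be written out as follows. Only the 200 lines/s figure comes from the test; the extrapolation assumes the rate stays constant.

```python
LINES_PER_SECOND = 200  # logging rate quoted above

# Extrapolate, assuming the rate stays constant over the whole period.
lines_per_hour = LINES_PER_SECOND * 3600
lines_per_day = lines_per_hour * 24

print(f"{lines_per_hour:,} lines per hour")
print(f"{lines_per_day:,} lines per day")
```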

transfertime-id.jpg

Looking at the transfer times, one recognizes a clear band structure, which must be due to polling/timeout or retry mechanisms.
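
A toy model shows why polling would produce such bands: if a client only notices completion at its next poll, every observed time is rounded up to a multiple of the poll interval. The interval and the completion times in this Python sketch are made up for illustration and are not taken from the DPM client.

```python
import math

def observed_time(true_time_s, poll_interval_s):
    """Time at which a client polling every poll_interval_s notices completion."""
    return math.ceil(true_time_s / poll_interval_s) * poll_interval_s

POLL_INTERVAL_S = 5  # hypothetical value, not taken from the DPM client
true_times = [0.7, 3.2, 4.9, 6.1, 9.8, 12.4]  # made-up completion times

observed = [observed_time(t, POLL_INTERVAL_S) for t in true_times]
bands = sorted(set(observed))  # distinct observed values form the "bands"
```

Six different completion times collapse onto three distinct values, which is the kind of banding seen in the plot. A retry after a fixed timeout would produce a similar pattern.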

Head Node Performance Analysis

The average I/O on the head node is 16 Mb/s. Switching off the log output of dpns did not change the server performance.

MySQL Stats

+--------------------------------+--------------+
| Variable_name | Value |
+--------------------------------+--------------+
| Aborted_clients | 3619 |
| Aborted_connects | 33 |
| Binlog_cache_disk_use | 0 |
| Binlog_cache_use | 0 |
| Bytes_received | 110211126634 |
| Bytes_sent | 545623410243 |
| Com_begin | 42990059 |
| Com_commit | 42986967 |
| Com_delete | 3859757 |
| Com_insert | 18230507 |
| Com_rollback | 9520 |
| Com_select | 410967646 |
| Com_update | 39126621 |
| Connections | 2601 |
| Handler_commit | 42994343 |
| Handler_read_first | 19603070 |
| Handler_read_key | 954421051 |
| Handler_read_next | 34103433 |
| Handler_read_rnd | 3649140 |
| Handler_read_rnd_next | 526429133 |
| Handler_rollback | 11869 |
| Handler_update | 6 |
| Handler_write | 18232230 |
| Key_blocks_unused | 6698 |
| Key_blocks_used | 4 |
| Key_read_requests | 18 |
| Key_reads | 4 |
| Key_write_requests | 4 |
| Key_writes | 4 |
| Max_used_connections | 241 |
| Not_flushed_delayed_rows | 0 |
| Open_tables | 99 |
| Opened_tables | 45292186 |
| Qcache_free_blocks | 1 |
| Qcache_free_memory | 0 |
| Qcache_hits | 3087276 |
| Qcache_inserts | 2296799 |
| Qcache_lowmem_prunes | 0 |
| Qcache_not_cached | 7988471 |
| Qcache_queries_in_cache | 0 |
| Qcache_total_blocks | 0 |
| Questions | 561674124 |
| Select_scan | 12306066 |
| Sort_range | 2138 |
| Sort_rows | 3649140 |
| Sort_scan | 3648658 |
| Table_locks_immediate | 472470634 |
| Threads_connected | 202 |
| Threads_created | 2600 |
| Threads_running | 100 |
+--------------------------------+--------------+
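
A few ratios derived from these counters make the table easier to read. The Python sketch below uses values copied from the table. The query cache hit rate uses a commonly quoted approximation that assumes cache hits are not counted in Com_select. The counters are cumulative since server start, so they may include activity from outside this test.

```python
# Counters copied from the status table above.
status = {
    "Com_select": 410967646,
    "Com_commit": 42986967,
    "Qcache_hits": 3087276,
    "Open_tables": 99,
    "Opened_tables": 45292186,
}

# Approximate query cache hit rate: hits / (hits + executed SELECTs).
qcache_hit_rate = status["Qcache_hits"] / (
    status["Qcache_hits"] + status["Com_select"])

# Average number of SELECT statements per committed transaction.
selects_per_commit = status["Com_select"] / status["Com_commit"]

# Table opens per committed transaction; a value near 1 alongside only 99
# open tables may point to a table cache that is too small.
table_opens_per_commit = status["Opened_tables"] / status["Com_commit"]

print(f"query cache hit rate:   {qcache_hit_rate:.2%}")
print(f"selects per commit:     {selects_per_commit:.2f}")
print(f"table opens per commit: {table_opens_per_commit:.2f}")
```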

Profiling dpm daemon (CSEC_MECH=ID)

samples % symbol name
2732398 10.9529 _Cthread_findglobalkey
927441 3.7177 _Cthread_obtain_mtx_debug
914914 3.6675 send2dpnsx
800055 3.2070 anonymous symbol from section .plt
769408 3.0842 Csec_client_negociate_protocol
764252 3.0635 Cthread_Getspecific
758169 3.0391 Cgetnetaddress
660972 2.6495 Csec_trace
613202 2.4580 _Cthread_release_mtx
562963 2.2567 Csec_get_shlib
513143 2.0569 getconfent_r
498123 1.9967 Cglobals_get
478327 1.9174 C__serrno
473584 1.8984 Cthread_Lock_Mtx
332569 1.3331 Cgetnameinfo
322363 1.2922 _Csec_recv_token
309003 1.2386 _netsignal
306545 1.2288 Csec_init_globals
300708 1.2054 dpmlogit
286785 1.1496 Csec_server_negociate_protocol
269453 1.0801 dpm_srv_proc_put
263317 1.0555 netread_timeout
251793 1.0093 Cthread_Getspecific_init
251090 1.0065 doit
247245 0.9911 rfio_connect
226963 0.9098 _Cpool_starter
213220 0.8547 _Cthread_self
212157 0.8504 isremote_sa
190502 0.7636 Cpool_assign_ext
185665 0.7442 _add_to_bigbuf
182180 0.7303 print_trace
179616 0.7200 Cpool_next_index_timeout_ext
176471 0.7074 Csec_clear_errmsg
174520 0.6996 dpns_apiinit
173024 0.6936 logit
171985 0.6894 Csec_apiinit
169868 0.6809 _Csec_send_token
164355 0.6588 Cgetaddrinfo
163672 0.6561 Cthread_Mutex_Unlock
158391 0.6349 netwrite_timeout
158367 0.6348 sendrep
150615 0.6037 rfio_smstat64
138149 0.5538 msthread
133030 0.5333 Csec_get_service_name_caller
132976 0.5330 Csec_client_establishContext
131559 0.5274 isTrustedHost2
130235 0.5221 scan_interfaces
127120 0.5096 Csec_clearContext
126320 0.5064 getconfent
122176 0.4897 is_loopback
122023 0.4891 dpm_oneputdone
121946 0.4888 procreq
117076 0.4693 netconnect_timeout
116588 0.4673 Csec_client_initContext
116357 0.4664 isremote_scan_cb
113909 0.4566 Csec_client_lookup_protocols
112488 0.4509 Cthread_Lock_Mtx_ext
105072 0.4212 dpm_srv_put
104836 0.4202 _Csec_print_token
104698 0.4197 Csnprintf
104666 0.4196 Csec_setup_protocols_to_offer
102543 0.4110 s_recv
102167 0.4095 Cthread_Wait_Condition_ext
98744 0.3958 u64tostr
94963 0.3807 _Cthread_addcid
94933 0.3805 Csec_errmsg
94059 0.3770 Csec_get_peer_service_name
91772 0.3679 _unmarshall_STRINGN
90700 0.3636 dpm_srv_getstatus_putreq
88622 0.3552 Csec_delete_connection_context_caller
81765 0.3278 _check_short_resp
78868 0.3161 dpm_srv_putdone
75545 0.3028 dpm_main
71989 0.2886 getreq
71500 0.2866 netmask_to_prefixlen
71117 0.2851 dpm_decode_pfr_entry
70847 0.2840 _net_readable
68291 0.2737 Cthread_Cond_Broadcast_ext
65683 0.2633 rfio_parseln
63827 0.2559 check_ctx
61968 0.2484 Cvsnprintf
61459 0.2464 dpm_exec_query
60341 0.2419 Csec_client_establish_context_caller
59714 0.2394 Cthread_Mutex_Unlock_ext
58164 0.2332 Csec_server_establish_context_ext
57549 0.2307 dpns_addreplicax
56742 0.2275 Csec_get_service_name
54007 0.2165 s_send
53878 0.2160 Cmutex_lock
52896 0.2120 Cthread_Self0
52831 0.2118 Cglobals_getTid
52322 0.2097 Csec_client_set_service_name
51651 0.2070 proclreq
50475 0.2023 Csec_server_getAuthorizationId
50146 0.2010 rfio_chown
49827 0.1997 dpns_setfsize
49613 0.1989 Cgetpwuid
48352 0.1938 strtou64
48128 0.1929 dpns_statx
47435 0.1901 dpns_creatc
47175 0.1891 dpns_statr
47141 0.1890 dpm_update_pfr_entry
46838 0.1878 dpns_selectsrvr
46427 0.1861 send2dpns
45639 0.1829 _net_connectable
44656 0.1790 dpm_decode_xferreq_entry
41874 0.1679 Cgetservbyname
41830 0.1677 marshall_PFR
41743 0.1673 rfioreadopt
41429 0.1661 init_trace
41329 0.1657 Csec_server_lookup_protocols
37420 0.1500 Csec_server_establish_context_ext_caller
36188 0.1451 Csec_server_set_protocols
35470 0.1422 isTrustedHost
35298 0.1415 Csec_get_local_service_name
34989 0.1403 status2str
34756 0.1393 dpm_get_def_lifetime
34642 0.1389 s_close
34563 0.1385 next_xifr
34260 0.1373 get_client_actual_id
33924 0.1360 dpm_end_tr
33163 0.1329 Csec_unload_shlib
32512 0.1303 rfio_stat64
32282 0.1294 dpm_get_maxpintime
32259 0.1293 dpm_updfreespace
32244 0.1293 dpm_get_pending_req_by_token
31916 0.1279 Csec_server_reinitContext
31840 0.1276 _check_for_id
31288 0.1254 dpm_selectfsinpool
30921 0.1239 dpm_logreq
29890 0.1198 dpm_poolmatch2
29668 0.1189 dpm_start_tr
29349 0.1176 Cthread_Kill
29319 0.1175 dpns_setptime
29086 0.1166 Cthread_Wait_Condition
28803 0.1155 Csec_server_getClientId
28571 0.1145 dpns_setrltime
27573 0.1105 dpm_get_pfr_by_fullid
26625 0.1067 dpns_setrstatus
26467 0.1061 Csec_server_set_service_name
26043 0.1044 dpm_list_pfr_entry
25919 0.1039 Csec_delete_creds_caller
25239 0.1012 dpm_selectfs
24334 0.0975 dpm_get_req_by_token
24252 0.0972 CDoubleDnsLookup
22554 0.0904 Csec_server_establishContext
22246 0.0892 rfio_close_v2
22089 0.0885 C__rfio_errno
21429 0.0859 Cthread_Lock_Mtx_init
21366 0.0856 _net_writable
20808 0.0834 dpm_get_pfr_by_surl
20627 0.0827 Csec_mapToLocalUser
19449 0.0780 dpns_readdirx
18855 0.0756 inc_reqctr
18092 0.0725 dpns_client_resetAuthorizationId
17957 0.0720 islocalhost
17638 0.0707 Csec_server_initContext
17092 0.0685 _is_proto_deleg_able
16951 0.0679 dpm_get_max_lifetime
16206 0.0650 end_trace
16017 0.0642 rfio_apiinit
15702 0.0629 Csec_name2id
15662 0.0628 dpm_insert_pending_entry
15599 0.0625 dpm_insert_pfr_entry
15054 0.0603 Csec_isIdAService
14803 0.0593 setnetio
14443 0.0579 Cthread_environment
14336 0.0575 dpns_opendirxg
13078 0.0524 dpm_unique_id
12794 0.0513 i64tostr
12648 0.0507 Cencode_groups
12631 0.0506 dpns_lchown
12481 0.0500 dpm_get_next_req
12440 0.0499 dpns_readdir64
11867 0.0476 Csec_client_setVOMS_data
11256 0.0451 dpns_access
11056 0.0443 Cthread_Detach
10962 0.0439 rfio_errmsg_r
10698 0.0429 dpns_rename
9692 0.0389 Cpool_create_ext
9677 0.0388 _Cthread_init
9618 0.0386 Cthread_Mutex_Unlock_init
9449 0.0379 dpns_accessr
8853 0.0355 _is_proto_compat_with_addr
8829 0.0354 stat
8620 0.0346 dpm_insert_xferreq_entry
8407 0.0337 rfio_HsmIf_SwapHsmDirEntry
8115 0.0325 match_ipv6
8099 0.0325 Cgroupmatch
7943 0.0318 dpns_creatx
7834 0.0314 Cthread_Create_Detached
7615 0.0305 rfio_serror_r
6464 0.0259 Cthread_init
6037 0.0242 dpm_update_pending_entry
5894 0.0236 rfio_stat
5495 0.0220 stat64tostat
5475 0.0219 dpns_delreplica
5088 0.0204 dpm_delete_pending_entry
4981 0.0200 dpns_rmdir
4873 0.0195 rfio_mkdir
4585 0.0184 Csec_server_setSecurityOpts
4334 0.0174 Csec_context_is_client
4332 0.0174 dpm_closedb
3762 0.0151 isadminhost
3761 0.0151 Cthread_Create
3755 0.0151 Csec_setup_trace
3636 0.0146 Csec_getErrorMessageSummary
3550 0.0142 dpns_readdir
3108 0.0125 Csec_server_getDelegatedCredentials
3066 0.0123 dpns_lstat
3037 0.0122 rfio_HsmIf_IsCnsFile
2966 0.0119 CnsCleanup
2811 0.0113 match_ipv6_string
2805 0.0112 rfio_unlink
2612 0.0105 dpm_rm_onereplica
2527 0.0101 rfio_serrno
2402 0.0096 AddCnsFileDescriptor
2259 0.0091 Cthread_Cond_Broadcast
2218 0.0089 Cmutex_unlock
2196 0.0088 Csec_client_setSecurityOpts
2045 0.0082 dpns_statg
1871 0.0075 Cthread_Self
1850 0.0074 _setSecurityOpts
1838 0.0074 rexthread
1798 0.0072 rfio_HsmIf_AddCnsFileDescriptor
1761 0.0071 GetCnsFileDescriptor
1710 0.0069 Csec_get_default_context
1598 0.0064 dpns_chown
1481 0.0059 Csec_server_get_client_vo
1451 0.0058 Csec_acquire_creds_caller
1436 0.0058 dpns_unlink
1259 0.0050 _Cthread_destroy
1209 0.0048 Cgetpwnam
1097 0.0044 dpns_setratime
1046 0.0042 rfio_errmsg
967 0.0039 Cthread_Exit
963 0.0039 Csec_client_setAuthorizationId
929 0.0037 _Cthread_start_pthread
883 0.0035 dpns_setatime
877 0.0035 rfio_CnsFilesfdt_freeentry
868 0.0035 rfio_HsmIf_SetCnsWrittenTo
771 0.0031 setlogbits
763 0.0031 rfio_close
753 0.0030 dpm_list_expired_puts
739 0.0030 _Cpool_self
734 0.0029 rfio_HsmIf_stat64
683 0.0027 Csec_server_get_client_fqans
666 0.0027 rfio_lasthost
651 0.0026 dpns_getreplica
642 0.0026 dpns_setfsizeg
620 0.0025 rfio_HsmIf_GetHsmType
590 0.0024 rfio_HsmIf_open_limbysz
576 0.0023 dpns_delete
563 0.0023 rfio_HsmIf_open
562 0.0023 dpns_client_setVOMS_data
555 0.0022 isremote
543 0.0022 rfio_HsmIf_readdir
535 0.0021 __libc_csu_fini
517 0.0021 rfio_HsmIf_read
500 0.0020 rfio_HsmIf_FirstWrite
455 0.0018 dpns_client_getAuthorizationId
415 0.0017 rfio_newhost
377 0.0015 dpns_client_setAuthorizationId
372 0.0015 sstrerror_r
322 0.0013 rfio_mstat64
279 0.0011 gcthread
268 0.0011 rfio_HsmIf_IsHsmDirEntry
229 9.2e-04 dpm_reallocate_space
228 9.1e-04 initlog
209 8.4e-04 rfio_HsmIf_DelDirEntry
197 7.9e-04 DelCnsFileDescriptor
160 6.4e-04 dpm_enoughfreespace
151 6.1e-04 dpm_list_rr_puts
133 5.3e-04 dpns_getgrpbygids
132 5.3e-04 dpm_findpool
117 4.7e-04 sstrerror
116 4.6e-04 dpns_getpath
96 3.8e-04 Csec_map2name_caller
75 3.0e-04 dpns_errmsg
71 2.8e-04 Csec_getErrorMessage
65 2.6e-04 dpns_getgrpbygid
62 2.5e-04 rfiosetopt
59 2.4e-04 Cdecode_groups
59 2.4e-04 _Cthread_addspec
50 2.0e-04 Cpool_create
42 1.7e-04 sperror
41 1.6e-04 dpns_stat
39 1.6e-04 t_recv
34 1.4e-04 s_ioctl
32 1.3e-04 dpns_getidmap
29 1.2e-04 Cthread_isproto
28 1.1e-04 Cinitdaemon
26 1.0e-04 Cthread_Lock_Mtx_addr
13 5.2e-05 rfio_stglog
12 4.8e-05 Cthread_Setspecific
12 4.8e-05 dpm_abort_tr
10 4.0e-05 s_errmsg
9 3.6e-05 setrtimo
8 3.2e-05 Cgai_strerror
7 2.8e-05 _Cthread_addmtx
6 2.4e-05 Cthread_proto
4 1.6e-05 s_nrecv
2 8.0e-06 dpm_getonereqsummary
2 8.0e-06 dpm_getpoolconf
2 8.0e-06 dpm_list_pending_req
2 8.0e-06 dpns_addreplica
1 4.0e-06 Cgetgrgid
1 4.0e-06 Cgetgrnam
1 4.0e-06 Cpool_realloc
1 4.0e-06 Cthread_Mutex_Destroy
1 4.0e-06 Cthread_Setspecific0
1 4.0e-06 dpm_abort_backend_filereq
1 4.0e-06 dpm_decode_fs_entry
1 4.0e-06 dpm_get_cpr_by_fullid
1 4.0e-06 dpm_list_expired_spaces
1 4.0e-06 dpm_opendb
1 4.0e-06 dpm_srv_getstatus_copyreq
1 4.0e-06 dpm_srv_updatefilestatus
1 4.0e-06 dpm_update_pool_entry
1 4.0e-06 dpns_chmod
1 4.0e-06 dpns_creat
1 4.0e-06 rfio_serror
1 4.0e-06 rfio_statfs64
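
Long flat profiles like this are easier to judge when the symbols are grouped by library prefix (Cthread, Csec, dpns, rfio, ...). The following Python sketch does this for a few lines copied from the top of the listing. The grouping rule (text before the first underscore) is a crude assumption made here and is not part of oprofile.

```python
from collections import defaultdict

def parse_profile(text):
    """Parse 'samples percent symbol' lines into (samples, percent, symbol)."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 2)  # the symbol name may contain spaces
        if len(parts) == 3 and parts[0].isdigit():
            rows.append((int(parts[0]), float(parts[1]), parts[2].strip()))
    return rows

def family(symbol):
    """Group by the prefix before the first underscore, e.g. 'Cthread'."""
    name = symbol.lstrip("_")
    return name.split("_", 1)[0] if "_" in name else "other"

def share_by_family(rows):
    """Sum the percentage column per symbol family."""
    totals = defaultdict(float)
    for _, percent, symbol in rows:
        totals[family(symbol)] += percent
    return dict(totals)

sample = """\
samples % symbol name
2732398 10.9529 _Cthread_findglobalkey
927441 3.7177 _Cthread_obtain_mtx_debug
914914 3.6675 send2dpnsx
800055 3.2070 anonymous symbol from section .plt
769408 3.0842 Csec_client_negociate_protocol
764252 3.0635 Cthread_Getspecific
"""
shares = share_by_family(parse_profile(sample))
for name, percent in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:10s} {percent:8.4f}%")
```

Feeding the full listing through the same functions would show how much of the daemon's time goes to thread bookkeeping and security negotiation compared with the request handling itself.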

Profiling dpns daemon (CSEC_MECH=ID)

samples % symbol name
1307079 4.6737 Cthread_Getspecific
1139856 4.0758 Cgetnetaddress
984380 3.5198 _Cthread_addcid
913443 3.2662 Cthread_Lock_Mtx
867947 3.1035 anonymous symbol from section .plt
820168 2.9327 Cthread_Lock_Mtx_ext
764467 2.7335 _Cthread_findglobalkey
732930 2.6207 Csec_server_negociate_protocol
577956 2.0666 _Cthread_obtain_mtx_debug
470888 1.6838 Cns_decode_fmd_entry
468999 1.6770 u64tostru
465542 1.6646 Cns_exec_query
464092 1.6595 doit
455008 1.6270 Cns_parsepath
441824 1.5798 isremote_sa
423451 1.5141 Csec_get_shlib
396670 1.4184 Csec_errmsg
391133 1.3986 strtou64
376554 1.3464 Cgetnameinfo
368770 1.3186 Cgetaddrinfo
347786 1.2436 _Cthread_release_mtx
344498 1.2318 logit
334157 1.1948 Cpool_assign_ext
328117 1.1732 Cpool_next_index_timeout_ext
328065 1.1731 Cglobals_get
320644 1.1465 Cthread_Lock_Mtx_addr
319330 1.1418 scan_interfaces
316476 1.1316 _Csec_recv_token
307022 1.0978 Csec_trace
303477 1.0851 procreq
297433 1.0635 Cns_get_fmd_by_fileid
296120 1.0588 Csec_client_negociate_protocol
291327 1.0417 getconfent_r
288537 1.0317 Cthread_Kill
275145 0.9838 _Cpool_starter
270716 0.9680 _Cthread_self
265608 0.9497 Cns_chkaclperm
259929 0.9294 sendrep
238530 0.8529 Csec_name2id
232363 0.8309 Cthread_Mutex_Unlock
227608 0.8139 C__Coptind
225018 0.8046 u64tostr
224171 0.8016 netread_timeout
223113 0.7978 Cns_get_fmd_by_fullid
202634 0.7246 getreq
190928 0.6827 Csec_clearContext
185724 0.6641 Cns_chkbackperm
185238 0.6624 isremote_scan_cb
178142 0.6370 C__serrno
172035 0.6151 Cns_main
167635 0.5994 Csec_deactivate_caller
164201 0.5871 Cpool_create_ext
159242 0.5694 u64tostrsi
150885 0.5395 Cthread_Create_Detached
147107 0.5260 _netsignal
138135 0.4939 Csec_server_establish_context_ext
127500 0.4559 isTrustedHost
123670 0.4422 Csec_init_globals
122097 0.4366 Cns_chkentryperm
111018 0.3970 Cthread_Wait_Condition_ext
107364 0.3839 netwrite_timeout
106950 0.3824 sstrerror_r
105668 0.3778 Csec_clear_errmsg
105496 0.3772 C__Coptopt
100255 0.3585 match_ipv6
97499 0.3486 Csec_setup_protocols_to_offer
96187 0.3439 Cdomainname
96108 0.3437 Cthread_Getspecific_init
95909 0.3429 Csec_getErrorMessageSummary
94074 0.3364 isTrustedHost2
91951 0.3288 _unmarshall_STRINGN
91747 0.3281 nslogit
90153 0.3224 _Csec_send_token
89766 0.3210 Cns_srv_addreplica
89323 0.3194 C__Copterr
87451 0.3127 Cns_srv_creat
87331 0.3123 Csec_get_local_service_name
87096 0.3114 initlog
86814 0.3104 _add_id
83336 0.2980 Csec_get_service_name_caller
83097 0.2971 get_client_actual_id
80547 0.2880 Csec_server_establish_context_ext_caller
79501 0.2843 Cns_srv_stat
78907 0.2821 getifnam_scan_cb
76951 0.2752 Csec_client_lookup_protocols
76085 0.2721 Csec_apiinit
75740 0.2708 Csec_server_lookup_protocols
71743 0.2565 Cns_srv_setfsize
69855 0.2498 _Cpool_writen_timeout
68508 0.2450 Cthread_Wait_Condition
67309 0.2407 Csec_acquire_creds_caller
67282 0.2406 Csec_delete_connection_context_caller
67125 0.2400 Cns_logreq
66497 0.2378 Cns_get_rep_by_sfn
64661 0.2312 Csec_server_set_protocols
63459 0.2269 Cns_srv_setrstatus
62526 0.2236 Csec_server_getAuthorizationId
62067 0.2219 Csec_server_establishContext
62049 0.2219 netmask_to_prefixlen
61620 0.2203 Cns_decode_rep_entry
61287 0.2191 Cns_srv_setptime
60777 0.2173 is_loopback
56902 0.2035 Cns_srv_setrltime
56059 0.2005 Csec_server_getClientId
55735 0.1993 _Cpool_writen
55666 0.1990 Csec_client_establishContext
55543 0.1986 _Csec_print_token
55336 0.1979 getifnam_sa
54456 0.1947 Csec_map2id
53551 0.1915 Csec_server_set_service_name
53335 0.1907 Cns_start_tr
53115 0.1899 Csec_delete_creds_caller
52990 0.1895 _setSecurityOpts
52692 0.1884 Csec_isIdAService
50140 0.1793 Cgethostbyaddr
50036 0.1789 _net_readable
49214 0.1760 _net_writable
48891 0.1748 Cns_srv_statr
48839 0.1746 Cthread_Mutex_Unlock_ext
48095 0.1720 Csec_get_peer_service_name
47110 0.1685 _add_to_bigbuf
47018 0.1681 Cns_update_rep_entry
46597 0.1666 Csec_server_reinitContext
46499 0.1663 Cthread_Create
45110 0.1613 Csec_get_service_name
41907 0.1498 Cthread_Detach
40815 0.1459 Cns_srv_accessr
40763 0.1458 Cthread_Cond_Broadcast_ext
37456 0.1339 Cns_end_tr
35965 0.1286 check_ctx
35510 0.1270 Cgetservbyname
35067 0.1254 Csec_map2name_caller
33980 0.1215 _try_activate_func
33741 0.1206 _net_connectable
30133 0.1077 CDoubleDnsLookup
29994 0.1072 Csec_init_context_caller
29924 0.1070 isadminhost
29060 0.1039 Cgethostbyname
28812 0.1030 Csec_unload_shlib
28506 0.1019 Cgroupmatch
28501 0.1019 _Cpool_readn
28324 0.1013 _check_short_resp
28117 0.1005 s_send
27435 0.0981 getconfent
25775 0.0922 Cns_update_fmd_entry
24295 0.0869 Cthread_Mutex_Destroy
24154 0.0864 Cns_insert_rep_entry
23571 0.0843 Csec_server_set_service_type
22784 0.0815 Csec_server_initContext
22627 0.0809 _check_for_id
22169 0.0793 Cthread_Cond_Broadcast
19565 0.0700 Cmutex_lock
18900 0.0676 Csec_client_establish_context_caller
17801 0.0637 Csec_client_get_service_name
16442 0.0588 strtoi64
15494 0.0554 Csec_client_set_service_name
15339 0.0548 Csec_setup_trace
15057 0.0538 Cencode_groups
14657 0.0524 _is_proto_compat_with_addr
14524 0.0519 i64tostr
14351 0.0513 Cthread_Exit
13914 0.0498 next_xifr
11824 0.0423 Cthread_Join
11792 0.0422 _Cthread_destroy
10968 0.0392 Cgai_strerror
10198 0.0365 Cthread_unprotect
8620 0.0308 Cns_unique_id
8587 0.0307 Csec_client_initContext
8529 0.0305 _Cthread_init
8463 0.0303 Cthread_Lock_Mtx_init
7976 0.0285 Csec_context_is_client
7871 0.0281 Cthread_environment
7261 0.0260 Cthread_Self
7228 0.0258 Cthread_init
6865 0.0245 Cns_insert_fmd_entry
5901 0.0211 Cns_srv_delreplica
5595 0.0200 Cns_acl_inherit
5368 0.0192 t_recv
4915 0.0176 s_close
4436 0.0159 _fini
3568 0.0128 Cthread_Mutex_Unlock_init
2993 0.0107 Cns_vo_from_dn
2824 0.0101 sperror
2737 0.0098 setlogbits
2328 0.0083 Cthread_isproto
2219 0.0079 s_recv
2117 0.0076 Cpool_create
2048 0.0073 Cmutex_unlock
2035 0.0073 getidmap
1964 0.0070 strutou64
1704 0.0061 procdirreq
1541 0.0055 main
1488 0.0053 sstrerror
797 0.0028 Cns_get_usrinfo_by_name
771 0.0028 Csec_activate_caller
640 0.0023 Cns_get_grpinfo_by_name
633 0.0023 __do_global_ctors_aux
533 0.0019 __libc_csu_fini
501 0.0018 Cns_delete_rep_entry
445 0.0016 Cns_srv_getidmap
410 0.0015 __libc_csu_init
383 0.0014 getonegid
78 2.8e-04 Cgetpwnam
64 2.3e-04 Cgetpwuid
58 2.1e-04 Cns_srv_getgrpbygids
50 1.8e-04 _Cpool_self
42 1.5e-04 Cns_get_grpinfo_by_gid
42 1.5e-04 Cns_srv_unlink
42 1.5e-04 Cupv_check
30 1.1e-04 unlinkonefile
25 8.9e-05 Cns_srv_getgrpbygid
12 4.3e-05 Cns_list_rep_entry
12 4.3e-05 Cns_srv_mkdir
11 3.9e-05 Cns_mysql_error
9 3.2e-05 Csec_initialize_protocols_from_list
8 2.9e-05 Cns_get_umd_by_fileid
4 1.4e-05 Cinitdaemon
4 1.4e-05 Cns_delete_fmd_entry
3 1.1e-05 Csec_mapToLocalUser
3 1.1e-05 _is_proto_deleg_able
2 7.2e-06 Cns_abort_tr
2 7.2e-06 Cns_srv_setatime
2 7.2e-06 s_errmsg
1 3.6e-06 Cns_closedb
1 3.6e-06 Cns_opendb
1 3.6e-06 Cns_srv_delreplicas
1 3.6e-06 Cns_srv_getreplicax
1 3.6e-06 Cns_srv_readdir
1 3.6e-06 Cns_srv_rmdir
1 3.6e-06 Cns_srv_setcomment
1 3.6e-06 Cthread_Setspecific
1 3.6e-06 _Cthread_addspec
1 3.6e-06 _Cthread_cid_once
1 3.6e-06 netconnect_timeout
1 3.6e-06 setrtimo

Thread Usage DPM/DPNS

Fast Threads: 2-52 - Slow Threads 53-102
dpmthreads.jpg
DPNS threads 1-100

dpnsthreads.jpg

Performance Degradation over Time


performancedecay.jpg

Test Series 3 - Listing of Directories

900 clients each list one randomly chosen directory out of 9000 directories with ~9000 entries each, using CSEC_MECH=ID (no GSI).
Average listing rate: 1.4 directories/s (i.e. ~15000 entries/s returned). 19% of requests report "send2nsd: NS002 - send error : Operation timed out", although
the return code of rfdir is always 0. It is not clear to me whether the listing succeeds anyway or whether only the return code is wrong.
dirlist.jpg
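
Since the return code of rfdir cannot be relied on here, a test harness could also scan the command output for the error text. The Python sketch below is one possible check. The marker strings are taken from the message quoted above, while the function names are hypothetical. It does not settle whether the listing actually succeeded; it only flags such requests as suspect.

```python
import subprocess

# Substrings of the error message observed in this test.
ERROR_MARKERS = ("send error", "Operation timed out")

def listing_failed(returncode, output):
    """Treat a listing as failed on a non-zero exit code or a known error text."""
    return returncode != 0 or any(marker in output for marker in ERROR_MARKERS)

def run_listing(cmd):
    """Run a listing command and return True if it looks successful."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return not listing_failed(proc.returncode, proc.stdout + proc.stderr)

# The message quoted above, combined with the exit code 0 that rfdir returned.
message = "send2nsd: NS002 - send error : Operation timed out"
failed = listing_failed(0, message)
```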

Test Series 4 - Randomly reading files (size 1 MB) with 900 clients - CSEC_MECH=ID - namespace size = 7.8 million files


readrate.jpg

The read performance is ~20 files/s. At a certain point we see a 'client out of memory' error in the DPM logfile.

readtime.jpg


Test Series 5 - Storing files with KRB5 authentication mechanism

Quick tests were made with KRB5 to see whether authentication would be faster.

*******************************************************************
Authentication via KRB5, DPM host lxb8971.cern.ch
[labadie@lxb5409 LCG-DM]$ time rfio/rfcp /tmp/bigfile.01 /dpm/cern.ch/home/dteam
4000000 bytes in 1 seconds through local (in) and eth0 (out) (3906 KB/sec)
0.007u 0.024s 0:02.04 0.9% 0+0k 0+0io 0pf+0w
[labadie@lxb5409 LCG-DM]$ dd if=/dev/zero of=/tmp/bigfile.02 bs=1000 count=0 seek=1000 >& /dev/null
[labadie@lxb5409 LCG-DM]$ time rfio/rfcp /tmp/bigfile.01 /dpm/cern.ch/home/dteam
4000000 bytes in 1 seconds through local (in) and eth0 (out) (3906 KB/sec)
0.010u 0.018s 0:01.95 1.0% 0+0k 0+0io 0pf+0w
[labadie@lxb5409 LCG-DM]$ time rfio/rfcp /tmp/bigfile.02 /dpm/cern.ch/home/dteam
1000000 bytes in 0 seconds through local (in) and eth0 (out)
0.010u 0.015s 0:01.80 1.1% 0+0k 0+0io 0pf+0w
[labadie@lxb5409 LCG-DM]$ dd if=/dev/zero of=/tmp/bigfile.03 bs=1000 count=0 seek=1000 >& /dev/null
[labadie@lxb5409 LCG-DM]$ time rfio/rfcp /tmp/bigfile.03 /dpm/cern.ch/home/dteam
1000000 bytes in 1 seconds through local (in) and eth0 (out) (976 KB/sec)
0.009u 0.020s 0:01.64 1.2% 0+0k 0+0io 0pf+0w

[labadie@lxb5409 LCG-DM]$ time ns/dpns-ls /dpm/
cern.ch
0.008u 0.003s 0:00.06 0.0% 0+0k 0+0io 0pf+0w
[labadie@lxb5409 LCG-DM]$ time ns/dpns-ls /dpm/cern.ch
home
0.005u 0.001s 0:00.04 0.0% 0+0k 0+0io 0pf+0w
[labadie@lxb5409 LCG-DM]$ time ns/dpns-ls /dpm/cern.ch/home
alice
apeters
atlas
biomed
cms
dteam
lhcb
ops
0.006u 0.005s 0:00.04 0.0% 0+0k 0+0io 0pf+0w

*********************************************************************
GSI, DPM lxn1177.cern.ch SL4-32 not loaded
[labadie@lxb5409 ~]$ time /opt/lcg/bin/rfcp /tmp/bigfile.13 /dpm/cern.ch/home/lhcb
4000000 bytes in 1 seconds through local (in) and eth0 (out) (3906 KB/sec)
0.198u 0.033s 0:02.67 8.2% 0+0k 0+0io 0pf+0w
[labadie@lxb5409 ~]$ time /opt/lcg/bin/rfcp /tmp/bigfile.12 /dpm/cern.ch/home/lhcb
4000000 bytes in 0 seconds through local (in) and eth0 (out)
0.203u 0.027s 0:02.68 8.2% 0+0k 0+0io 0pf+0w
[labadie@lxb5409 ~]$ dd if=/dev/zero of=/tmp/bigfile.14 bs=1000 count=0 seek=1000 >& /dev/null
[labadie@lxb5409 ~]$ time /opt/lcg/bin/rfcp /tmp/bigfile.14 /dpm/cern.ch/home/lhcb
1000000 bytes in 0 seconds through local (in) and eth0 (out)
0.200u 0.019s 0:02.43 8.6% 0+0k 0+0io 0pf+0w
[labadie@lxb5409 ~]$ dd if=/dev/zero of=/tmp/bigfile.15 bs=1000 count=0 seek=1000 >& /dev/null
[labadie@lxb5409 ~]$ time /opt/lcg/bin/rfcp /tmp/bigfile.15 /dpm/cern.ch/home/lhcb
1000000 bytes in 1 seconds through local (in) and eth0 (out) (976 KB/sec)
0.205u 0.017s 0:02.43 8.6% 0+0k 0+0io 0pf+0w

[labadie@lxb5409 ~]$ time /opt/lcg/bin/dpns-ls /dpm/
cern.ch
0.084u 0.014s 0:00.44 20.4% 0+0k 0+0io 2pf+0w
[labadie@lxb5409 ~]$ time /opt/lcg/bin/dpns-ls /dpm/cern.ch
home
0.089u 0.009s 0:00.40 20.0% 0+0k 0+0io 0pf+0w
[labadie@lxb5409 ~]$ time /opt/lcg/bin/dpns-ls /dpm/cern.ch/home
alice
atlas
biomed
cms
dteam
lhcb
ops
0.085u 0.012s 0:00.41 21.9% 0+0k 0+0io 0pf+0w
***********************************************************************

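As one can see, there is almost no difference in elapsed time between KRB5-based and GSI-based authentication when using rfcp commands, so in our case we do not expect much difference. The only major improvement is the client CPU consumption, which is much lower with KRB5 (roughly 0.03 s versus 0.2 s of user plus system time per rfcp in the runs above).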
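
The comparison can be made quantitative by parsing the tcsh time lines from the rfcp runs above. The Python sketch below averages CPU time (user plus system) and elapsed time for the two mechanisms. The helper names are chosen here for illustration. The two series ran against different DPM hosts, so the result is indicative only.

```python
import re

# tcsh "time" lines copied from the rfcp runs in the transcripts above.
KRB5 = [
    "0.007u 0.024s 0:02.04 0.9% 0+0k 0+0io 0pf+0w",
    "0.010u 0.018s 0:01.95 1.0% 0+0k 0+0io 0pf+0w",
    "0.010u 0.015s 0:01.80 1.1% 0+0k 0+0io 0pf+0w",
    "0.009u 0.020s 0:01.64 1.2% 0+0k 0+0io 0pf+0w",
]
GSI = [
    "0.198u 0.033s 0:02.67 8.2% 0+0k 0+0io 0pf+0w",
    "0.203u 0.027s 0:02.68 8.2% 0+0k 0+0io 0pf+0w",
    "0.200u 0.019s 0:02.43 8.6% 0+0k 0+0io 0pf+0w",
    "0.205u 0.017s 0:02.43 8.6% 0+0k 0+0io 0pf+0w",
]

# user seconds, system seconds, then elapsed time as minutes:seconds
TIME_RE = re.compile(r"([\d.]+)u ([\d.]+)s (\d+):([\d.]+)")

def parse(line):
    """Return (cpu_seconds, elapsed_seconds) for one tcsh time line."""
    user, system, minutes, seconds = TIME_RE.match(line).groups()
    return float(user) + float(system), int(minutes) * 60 + float(seconds)

def mean_cpu_and_elapsed(lines):
    """Average CPU and elapsed seconds over a list of time lines."""
    parsed = [parse(line) for line in lines]
    n = len(parsed)
    return sum(c for c, _ in parsed) / n, sum(e for _, e in parsed) / n

krb5_cpu, krb5_elapsed = mean_cpu_and_elapsed(KRB5)
gsi_cpu, gsi_elapsed = mean_cpu_and_elapsed(GSI)
print(f"KRB5: cpu {krb5_cpu:.3f} s, elapsed {krb5_elapsed:.2f} s")
print(f"GSI:  cpu {gsi_cpu:.3f} s, elapsed {gsi_elapsed:.2f} s")
```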

-- LanaAbadie - 13 Feb 2008

Topic attachments
Attachment History Size Date Who Comment
DPM_scalability_test_plan.pdf r2 r1 280.5 K 2008-02-15 - 12:18 LanaAbadie first test plan
c1.jpg r1 62.7 K 2008-02-18 - 20:06 AndreasPeters
csecid.jpg r1 34.3 K 2008-02-20 - 11:11 AndreasPeters
dirlist.jpg r1 35.8 K 2008-02-25 - 11:35 AndreasPeters
dpmthreads.jpg r1 15.7 K 2008-02-21 - 11:01 AndreasPeters
dpnsthreads.jpg r1 17.9 K 2008-02-21 - 11:01 AndreasPeters
headnode-load.gif r1 9.9 K 2008-02-18 - 20:08 AndreasPeters
headnode.gif r1 17.6 K 2008-02-18 - 20:07 AndreasPeters
performancedecay.jpg r1 33.3 K 2008-02-21 - 11:14 AndreasPeters
readrate.jpg r1 24.0 K 2008-02-25 - 13:46 AndreasPeters
readtime.jpg r1 31.5 K 2008-02-25 - 13:46 AndreasPeters
transfertime-id.jpg r2 r1 46.9 K 2008-02-20 - 11:17 AndreasPeters
transfertime.jpg r1 43.4 K 2008-02-19 - 11:28 AndreasPeters
Topic revision: r10 - 2008-02-28 - LanaAbadie