LFC performance tests with the Python API

 

LFC sessions

  • Establishing an LFC session takes a long time, particularly with very distant LFC hosts.
  • The cause could be a large number of messages exchanged during authentication.
  • These messages can be traced with the CSEC_TRACE and CSEC_TRACEFILE environment variables:
    • For now, CSEC_TRACEFILE must point to a log file on a local disk, not on AFS.
    • Listing an empty folder makes it possible to count the messages needed to establish an LFC session. The count is 14 messages:
      • 8 messages '_Csec_send_token: Sending packet'
      • 6 messages '_Csec_recv_token: Receiving packet'
    • One would expect the time needed to transfer these 14 messages to be no more than 8 RTT (Round Trip Time).
    • However, a measurement between Switzerland and Taiwan gives:
      • RTT :                                   0.3 s
      • Elapsed time to list an empty folder :  4.0 s
    • Who can explain why the elapsed time is 1.6 s higher than 8 RTT (8 x 0.3 s = 2.4 s)?
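
The message count and the 8 RTT comparison above can be reproduced with a short script. The following is a minimal sketch, assuming the trace file is plain text with one message per line containing the two marker strings quoted above. The function names and the synthetic trace are illustrative and are not part of any LFC tool.

```python
SEND_MARKER = "_Csec_send_token: Sending packet"
RECV_MARKER = "_Csec_recv_token: Receiving packet"


def count_csec_messages(trace_lines):
    """Count sent and received Csec packets in an iterable of trace lines."""
    sent = received = 0
    for line in trace_lines:
        if SEND_MARKER in line:
            sent += 1
        elif RECV_MARKER in line:
            received += 1
    return sent, received


def unexplained_time(elapsed_s, rtt_s, round_trips=8):
    """Elapsed time not accounted for by the given number of round trips."""
    return elapsed_s - round_trips * rtt_s


# Synthetic trace reproducing the counts reported above (8 sent, 6 received).
sample_trace = [SEND_MARKER] * 8 + [RECV_MARKER] * 6
sent, received = count_csec_messages(sample_trace)
print(sent, received, sent + received)

# Figures quoted above: 4.0 s elapsed, 0.3 s RTT.
print(round(unexplained_time(4.0, 0.3), 1))
```

To use it on a real trace, pass the opened log file (the path given in CSEC_TRACEFILE) to `count_csec_messages` instead of the synthetic list.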
 

Performance matrix CE host / LFC host

  • The following scripts are saved under http://jra1mw.cvs.cern.ch:8180/cgi-bin/jra1mw.cgi/LCG-DM/test/python/lfc/ :
    • simple functional test scripts using the Python API,
    • an LFC performance test script using the simple Python scripts,
    • a job submission script,
    • scripts to parse the log files.
  • These scripts make it possible to obtain quite quickly a matrix of LFC performance between CE hosts and LFC hosts around the world.
  • Apart from session initialization, test durations are roughly proportional to RTT (Round Trip Time), so the best performance is achieved when the CE host and the LFC host are close.
  • Testing with a CE host in the same country as the LFC host shows that 'prod-lfc-shared-central.cern.ch' is much slower than all other LFC hosts (often by a factor of 2). This could be caused by heavy load.
  • For listing an empty folder and for the lfc-mkdir, lfc-rm, lcg-cr and lcg-del commands, the standard deviation is sometimes higher than the average. The cause is that the response time is generally short and stable, except for a few cases where it is much longer. In these cases, the average elapsed times given are not meaningful.
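
To illustrate why an average stops being meaningful when the standard deviation exceeds it, here is a minimal sketch using Python's standard `statistics` module. The sample values are invented for illustration only and are not measurements from these tests. Reporting the median alongside the average is a suggestion, not something the test scripts are known to do.

```python
import statistics

# Hypothetical elapsed times in seconds: mostly fast and stable, one very slow run.
samples = [0.5, 0.5, 0.6, 0.5, 0.6, 0.5, 0.5, 0.6, 0.5, 12.0]

mean = statistics.mean(samples)
stdev = statistics.stdev(samples)    # sample standard deviation
median = statistics.median(samples)  # robust against the single outlier

print("mean   = %.2f s" % mean)
print("stdev  = %.2f s" % stdev)
print("median = %.2f s" % median)
```

In this made-up sample a single slow run pulls the mean well above the typical response time and pushes the standard deviation above the mean, while the median stays at the typical value.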
 

Optimization of the 'LFCReplicaCatalog.py' script of ATLAS

  • Performance test scripts using the simple Python scripts are saved under http://jra1mw.cvs.cern.ch:8180/cgi-bin/jra1mw.cgi/LCG-DM/test/python/lfc/
    • The following scripts use different methods to list LFC files and their replicas:
      • lfc-readdirxr.py                   calls the 'lfc_readdirxr' method on the folder
      • lfc-readdirg-lr.py                 calls the 'lfc_listreplica' method for each file
      • lfc-readdirg-recurse-gr.py         calls the 'lfc_getreplica' method for each file
      • lfc-readdirg-recurse-readdirxr.py  creates a dictionary of files per folder, and then calls the 'lfc_opendirg' and 'lfc_readdirxr' methods
      • lfc-readdirg-recurse-grs.py        calls the new 'lfc_getreplicas' method once, with the list of all GUIDs as parameter

  • On  LFC_HOST=lxb1540.cern.ch,  for the  /grid/atlas/dq2/ddmf1  folder, the average elapsed times measured on 26 January 2007 are:
    • For the small  ddmf1.001099.Argon.global.ESD.v000211  subfolder:
      • 0.6 s  with  lfc-readdirxr.py
      • 4.2 s  with  lfc-readdirg-lr.py
      • 1.4 s  with  lfc-readdirg-recurse-gr.py
      • 1.1 s  with  lfc-readdirg-recurse-readdirxr.py
      • 1.0 s  with  lfc-readdirg-recurse-grs.py

    • For the whole  /grid/atlas/dq2/ddmf1  folder, recursively:
      •  58 s  with  lfc-readdirxr-recurse.py
      • 319 s  with  lfc-readdirg-recurse-gr.py
      • 106 s  with  lfc-readdirg-recurse-readdirxr.py
      •  65 s  with  lfc-readdirg-recurse-grs.py

  • When the input data is a list of GUIDs, the fastest method is clearly the new 'lfc_getreplicas' method, with the following remark:
    • When an LFC file has no replica, this method returns exactly 1 record for this file, with srm=''.
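
The remark above matters when post-processing the result: a record with an empty SRM string has to be read as "file known, no replica" rather than as a replica. A minimal sketch of that grouping step follows. It does not call the LFC API. The records are modelled as plain `(guid, srm)` tuples, and the function name and sample values are hypothetical.

```python
def group_replicas_by_guid(records):
    """Map each GUID to its list of replica SRM strings.

    A record with an empty srm marks a file without replica, so the GUID
    is kept in the result with an empty list.
    """
    replicas = {}
    for guid, srm in records:
        entry = replicas.setdefault(guid, [])
        if srm:  # skip the srm='' placeholder record
            entry.append(srm)
    return replicas


# Hypothetical records: guid-a has two replicas, guid-b has none.
records = [
    ("guid-a", "srm://se1.example.org/path/a"),
    ("guid-a", "srm://se2.example.org/path/a"),
    ("guid-b", ""),
]
result = group_replicas_by_guid(records)
print(result)
```

Keeping files without replica in the dictionary, with an empty list, preserves the distinction between "no replica" and "GUID not returned at all".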

Maintained  by Etienne URBAH

Topic revision: r15 - 2007-11-02 - EtienneUrbah
 