WLCG MW Readiness WG 11th meeting Minutes - June 17th 2015

WG twiki

Agenda

Summary

  • A DPM 1.8.9 with DPM-DSI 1.9.5-3 deletion test with gridftp in Rucio is being set up in Edinburgh for ATLAS.
  • Triumf and NDGF, testing multiple dCache versions, discovered in production an issue related to a DB table memory leak in versions <= 2.10.28 and 2.12.8. The fix was released and tested by the sites. An SRM bringonline issue was also found in production; the ATLAS test workflow is now being extended by the experiment expert to also test this functionality at Triumf.
  • Fine-tuning configuration at CNAF for StoRM testing for ATLAS.
  • DPM 1.8.9 with DPM-DSI 1.9.5-3 tests at GRIF for CMS showed a checksumming issue via PhEDEx tests. A fix by the DPM dev. team is in the process of being released.
  • New pakiti-client version 3.0.1 is imminent in EPEL Stable. The updated documentation is available to all Volunteer Sites, together with a new configuration file that is needed because of the deployment of new PKG DB servers. This new pakiti-client version makes it possible to specify a tag (--tag option).
  • MW Readiness nodes should start publishing their packages with the tag MWR. Andrea Manzi will contact the sites for this upgrade.
  • Check the MW Readiness App https://wlcg-mw-readiness.cern.ch/, which now offers the management of Baseline MW versions.
  • EL7 support and the move to Java 8 are now urgent for ARGUS. The CERN testbed will be available soon for testing under heavy load and other scenarios.
  • PIC made progress with the dCache v.2.12.11 testing. Other sites are asked to participate in test injections for the loadtest.
  • The next MW Readiness WG vidyo meeting will take place on Wednesday September 16th at 4pm CEST.
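
The affected-version statement in the dCache item above can be turned into a quick check for a site's installed release. The following is only a minimal sketch: it assumes that "<= 2.10.28 and 2.12.8" means every release up to 2.10.28 in the 2.10 series and every release up to 2.12.8 in the 2.12 series. The names parse_version, LAST_AFFECTED and is_affected are hypothetical and not part of any dCache tool.

```python
def parse_version(text):
    """Turn a dotted version string such as '2.10.28' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Assumption: last affected release per series, as read from the summary
# ("versions <= 2.10.28 and 2.12.8"). Other series are not covered by the
# minutes and are therefore not flagged here.
LAST_AFFECTED = {
    (2, 10): (2, 10, 28),
    (2, 12): (2, 12, 8),
}


def is_affected(version_text):
    """Return True if the release falls in a range listed as affected."""
    version = parse_version(version_text)
    last_bad = LAST_AFFECTED.get(version[:2])
    return last_bad is not None and version <= last_bad


if __name__ == "__main__":
    for release in ("2.10.28", "2.10.29", "2.12.8", "2.12.11"):
        print(release, "affected" if is_affected(release) else "not flagged")
```

Comparing integer tuples rather than strings avoids ordering mistakes such as "2.10.9" sorting after "2.10.28".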

Attendance

  • Local: Alberto Aimar (CERN IT/SDC mgnt), Maria Dimou (chair & notes), Ben Jones (T0), Maarten Litmaath (ALICE & notes), Andrea Manzi (MW Officer), Andrea Sciaba (CMS), Vincent Brillault (security).
  • Remote: C. Acosta (PIC), Ricardo Cruz (PIC), Raul Lopes (Brunel Univ.), Jeremy Coles (GridPP), Pepe Flix (PIC), Antonio Maria Perez-Calero Yzquierdo (PIC), Samuel Cadellin Skipsey (Glasgow), Vincenzo Spinoso (EGI Ops Officer).
  • Apologies: David Cameron (ATLAS), Lionel Cons (MW Readiness software tools), Alessandra Doria (Napoli), Sven Gabriel (EGI Security Officer).

Minutes of previous meeting

The minutes of the last (10th) meeting HERE were approved.

MW Officer report

Andrea M.'s slides contained all recent information on our Software tools and led to this discussion:

  • bringonline testing:
    • other sites may not be able to set up a separate tape library for such tests
    • besides ATLAS the functionality is also relevant for LHCb and (eventually) CMS

  • PIC setup for CMS:
    • Xrootd monitoring plugins are also being tested
    • their monitoring info needs to be reported for a different site name
    • to be defined in the Dashboard DB, as already done for a few similar cases

Report from the ARGUS meeting

  • Argus meeting held on June 5
  • also summarized in the GDB introduction of June 10

  • main points for MW Readiness:
    • EL7 support being worked on
      • first builds expected in a few weeks
      • basic testing should follow
      • stress testing to some extent would be desirable before the release
    • Java 8 support would come in the autumn
      • when extra effort from DataCloud has become available
      • some dependencies may need to be updated
        • jetty, bouncycastle
      • some code changes may be needed
    • newer versions of such external products may bring fixes for issues that have been hampering us
    • the recurrent issue at CERN is finally getting tackled!
      • Andrea C now has a CERN account for easier access to service hosts in bad states
      • since last Thursday we happen to have one bad host taken out for investigation
        • its argus-pepd developed a high load for no apparent reason
        • various traces and logs have been sent for inspection
        • at the time of writing, the cause was not yet determined
    • a separate test instance of the service is mostly ready
      • an NFS share for the gridmapdir needs to be obtained to mimic the production setup
    • the initial testing will be done from lxplus
      • it may already be largely sufficient for hammering the test setup

Discussion:

  • the Argus tests from lxplus will be against a standalone service
  • the gridmapdir may be kept on its "local" disk (the host is a VM)
  • we will try to get the test service into a bad state that subsequently can be debugged

Sites' feedback

  • Napoli
    • CREAM CE tests in Napoli are running smoothly.
  • CNAF
    • Set-up for ATLAS StoRM tests done. Storage being configured. Details in JIRA:MWREADY-61.
  • Triumf
    • Good progress with ATLAS dCache tests.
  • PIC status report
    • SRM with pre-production dCache storage:
      • SE srm-pps.pic.es, 10 TB of disk available, currently dCache 2.12.11
      • SRM, GridFTP, NFS4.1, gsidcap, xrootd protocols enabled
      • xrootd 3.3.6
    • CMS specific part:
      • voboxcms-pps.pic.es new vobox installed with Dev PhEDEx agents configured to point to srm-pps.pic.es
      • Loadtests PIC to/from GRIF_LLR established and running. Currently failing for reasons not related to PIC or dCache.
      • Loadtest to CERN is next: the injection from PIC is set up and awaiting approval from the CERN admin (request). Please also create an injection CERN->PIC.
      • HC tests: test dataset replicated to the validation storage, TFC modified accordingly, HC test jobs submitted and running at PIC
    • Results
      • tests of initial interaction with storage as a CMS user working fine (fts-transfer-submit, lcg-ls, xrootd, etc)
      • xrootd monitoring plugins not working with dCache 2.12.11. The issue has been reported and is being worked on. Action 20150617-01
      • PhEDEx and HC already set up and working fine (see HC_test_T1_ES_PIC.png)
    • Next steps:
      • upgrade to validate dCache 2.13
      • xrootd 4 tests?
      • WLCG monitoring for xrootd activity on srm-pps: what is the procedure to implement this? (Creating a new site, pic_mwr or similar, was discussed.)
      • CMS SAM tests: Put, Get, TFC
    • A final note: we are going to validate dCache releases; however, the upgrade procedure may differ from that for upgrades between golden releases in production storage. In principle we validate and document any problem found with each release (2.12, then 2.13), not jumps from one golden release to another (the next jump being 2.10 to 2.13).

Discussion:

  • currently there is no way in SAM to test a non-production SE without impact on A/R results
  • switching from Xrootd 3 to 4 probably would be good:
    • sites should anyway move to Xrootd 4 this year (e.g. for IPv6)
    • new dCache versions would no longer be tested against Xrootd 3
    • to be discussed further in CMS

Actions

Action items Done from past meetings can be found HERE.

  • 20150617-02: Andrea S. to discuss with CMS mgnt whether to stay with dCache testing with xrootd3 or move to xrootd4. JIRA:MWREADY-66 New
  • 20150617-01: Antonio Y. (PIC) to follow progress on the xrootd monitoring plugin issue found via the dCache testing at PIC for CMS. JIRA:MWREADY-65 New
  • 20150506-04: CNAF to participate in the StoRM Readiness verification. Details in JIRA:MWREADY-61 Done
  • 20150506-03: NDGF, Triumf, CNAF, PIC to install the pakiti client. Updated instructions here.
  • 20150506-02: Joel and Stefan to state if and how they wish to participate in the MW Readiness verification effort. Status?
  • 20150506-01: Maarten to check with ALICE which xrootd versions are in use and whether they wish to participate in the MW Readiness verification effort. Status?
  • 20150318-05: Pepe to proceed with the MW Readiness set-up at PIC Done
  • 20150318-02: Ben to set-up the ARGUS testbed at the T0 Re-opened
  • 20150318-01: Manuel to remind the EOS and FTS managers of the Pakiti client installation instructions here. Status?
  • 20141119-03: Andrea M. to contact the GRIF site to proceed with WN testing via the CMS workflow POSTPONED
  • 20140702-06: Andrea M. & Lionel to discuss the visualization of testing results. On-going

AOB & Next meeting

  • End of July (Wed 29th?) or early September (Wed 9th?) were suggested. The WG settled on Wednesday September 16th at 4pm CEST.

-- MariaDimou - 2015-06-15

Topic attachments
  • HC_test_T1_ES_PIC.png (r1, 32.0 K, 2015-06-16 16:32, AntonioPerezCalero): HC jobs reading from dcache validation storage at PIC
Topic revision: r30 - 2018-02-28 - MaartenLitmaath
 