Activity Reports 2016 - Alexey Anisenkov

November

  • Site movers evolution and updates:
    • general clean-up and optimization of the stage-in/stage-out workflow
    • tuning of the retry policy for site movers (random sleep time, number of retry attempts, skipping failed protocols, etc.)
    • workaround for upload transfer validation (xrdcp site mover) at sites that do not support on-the-fly remote checksum calculation
    • fixed a direct access issue affecting some ANALY sites (DBRelease files were not being requested from CVMFS)
    • lsm sitemover updates (workaround) required to run the new movers at US sites that have retired the lcg-util tools
    • changes in the stage-in workflow (ignore the NO_REPLICA error and continue stage-in attempts with the other available protocols)
  • Final migration of Pilot sitemovers into production (RU, IT, TW, CA, UK, NL, FR, DE clouds)
  • gfal-copy sitemover implementation
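The retry-policy and stage-in items above can be sketched as follows. This is a minimal illustration, not the Pilot's actual code: the names stage_in, NoReplicaError and TransferError are hypothetical, and the sketch only assumes what the report states, namely a random sleep between attempts, a bounded number of retries, skipping a protocol once it keeps failing, and moving straight on to the next protocol on a NO_REPLICA error.

```python
import random
import time
from typing import Callable, Iterable, Optional


class NoReplicaError(Exception):
    """Hypothetical stand-in for the NO_REPLICA error (not retryable per protocol)."""


class TransferError(Exception):
    """Hypothetical stand-in for a transient transfer failure (retryable)."""


def stage_in(
    protocols: Iterable[str],
    transfer: Callable[[str], str],
    max_attempts: int = 3,
    min_sleep: float = 1.0,
    max_sleep: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Try each protocol in order; retry transient failures after a random pause.

    Returns the result of the first successful transfer, or None if every
    protocol fails.
    """
    for protocol in protocols:
        for attempt in range(1, max_attempts + 1):
            try:
                return transfer(protocol)
            except NoReplicaError:
                # Do not retry this protocol: continue with the next one.
                break
            except TransferError:
                if attempt < max_attempts:
                    # A random pause spreads retries from many jobs over time.
                    sleep(random.uniform(min_sleep, max_sleep))
        # Falling out of the inner loop means this protocol is skipped.
    return None
```

Injecting the sleep function keeps the policy testable without real delays; the default values are illustrative only.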

October

  • Review of the PanDA-Pilot JobSpec communication. Proposal to enhance the schedconfig and JobSpec parameters passed by PanDA to the Pilot (explicit declaration, nuclei destinations, alternative stage-out)
  • General Pilot updates to track whether the new sitemovers workflow is active (special info tag in the PilotID identifier); special mode to force the pilot to run with the new movers activated
  • Kibana-based analysis providing plots of various sitemover metrics (usage statistics, copytool errors, activated sitemovers, etc.)
  • Final Pilot sitemovers migration into production (ES cloud, a few major sites in DE, IT, UK)
  • AGIS-side upgrades required for the site movers migration (WebUI to overview old and new copytool settings, links to PanDA monitor pages, "job monitor" pages with integrated Kibana plots to track how the new movers perform compared to the old implementation)
  • AGIS-side upgrades: review and clean-up of protocols and copytools, migration to the new Resource-based protocol implementation in AGIS (required for sitemovers)
  • Implementation of PandaQueue-associated storages in AGIS and the Pilot movers (a per-activity list of local storages used by the pilot to look up the default source and destination DDMEndpoint)
  • Site movers upgrades and evolution:
    • Job log processing integrated into the new sitemover architecture, both for normal stage-out to the primary SE and for special log file transfers
    • lcgcp sitemover clean-up actions (forced removal of the remote file in case of a failed transfer)
    • Stage-in logic update for proper handling of DBRelease files (check data availability at CVMFS first)
    • Workflow update to properly handle extra output (split) files produced by the payload
    • Adjusted site mover timeout values and error reports
    • Stage-in and stage-out workflow optimization
    • Fixed checksum value calculation (CRC mismatch issue at some sites)
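The report does not say what caused the checksum mismatches. As a hedged illustration only, assuming the Adler-32 checksums commonly used in ATLAS data management, a streaming file checksum that masks the value to 32 bits and zero-pads it to 8 hex digits avoids two frequent sources of spurious mismatches (a signed value and a dropped leading zero). The function name adler32_file is hypothetical; zlib.adler32 is the standard-library call.

```python
import zlib


def adler32_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the Adler-32 checksum of a file as 8 lowercase hex digits."""
    value = 1  # Adler-32 starting value
    with open(path, "rb") as handle:
        # Read in chunks so large files are never loaded fully into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            value = zlib.adler32(chunk, value)
    # Mask to an unsigned 32-bit value and zero-pad so strings compare reliably.
    return "%08x" % (value & 0xFFFFFFFF)
```

Because the running value is passed back into zlib.adler32, the result does not depend on the chunk size.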

September (including all Pilot activities during 2016)

  • Pilot code review and reverse engineering to identify the site movers workflow and active use cases
  • Development and core implementation of a new standalone architecture for the site movers component
  • Refactoring and implementation of specific site movers (xrdcp, lsm, dccp, lcgcp)
  • Consolidation of the storage declaration in the schedconfig configuration used by the Pilot
  • Integration of SE protocol specifics, ObjectStores declaration and other schedconfig extensions from AGIS into the Pilot
  • Clean-up and refactoring of the basic sitemover-related workflow (RunJob, JobLog, etc.)
  • Various sitemovers functional extensions and upgrades:
    • monitoring: implementation of mover traces sent to Rucio to track data transfers, errors and sitemover misconfiguration; Kibana-based analysis of copytool usage
    • implementation of special stage-out of log files to ObjectStores
    • activity-based declaration of copytools and protocols
    • direct access reading workflow
  • AGIS: various upgrades, model evolution and WebUI/API extensions to configure the related PandaQueue specifics and operate sitemovers from AGIS
  • General description of sitemovers (documentation)
  • Pilot sitemovers tests and migration:
    • adjusted Pilot logic to enable flexible testing of the new site movers (parallel implementation, special switch to activate the new architecture, special RCM release for HC)
    • active HC-side testing, new templates and functional tests
    • active APF-side testing, special auto pilot factories deployment and configuration
    • soft-testing of the new sitemovers against real jobs in production
  • Final Pilot sitemovers migration into production
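The activity-based declaration of copytools can be illustrated with a small lookup. This is a sketch under stated assumptions: the configuration layout, the activity names ("read", "write", "log") and the function copytools_for are hypothetical and are not taken from AGIS or the Pilot. It shows only the general idea of an ordered, per-activity copytool list that falls back to a default list when an activity has no explicit entry.

```python
from typing import Dict, List, Mapping, Sequence

# Hypothetical per-queue configuration: activity name -> copytools in priority order.
EXAMPLE_COPYTOOLS: Dict[str, List[str]] = {
    "default": ["xrdcp", "lcgcp"],
    "write": ["gfal-copy", "xrdcp"],
    "log": [],
}


def copytools_for(
    config: Mapping[str, Sequence[str]],
    activity: str,
    default_key: str = "default",
) -> List[str]:
    """Return the ordered copytools for an activity, falling back to the default list."""
    tools = config.get(activity)
    if not tools:
        # Missing or empty activity entry: use the queue-wide default.
        tools = config.get(default_key, [])
    # Remove duplicates while keeping the declared priority order.
    return list(dict.fromkeys(tools))
```

For example, copytools_for(EXAMPLE_COPYTOOLS, "read") returns the default list because "read" is not declared. Treating an empty list as "not configured" is a design choice; a real configuration might instead use it to disable an activity.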
Topic revision: r4 - 2016-12-08 - AlexeyAnisyonkov