Jenkins automation for Tier0 replays

T0 uses Jenkins to automate replays for configuration testing. Administrative details:

  • Access to the Jenkins dashboard and the OpenShift portal is granted by adding oneself to the cms-tier0-jenkins-administration e-group.
  • The Jenkins dashboard is hosted as a CERN Web Service on OpenShift. Therefore, the Jenkins dashboard site https://cms-tier0-jenkins.web.cern.ch/ needs to be re-validated at CERN Web Services every year. As of now (April 2019), it is owned by Vytas.
  • In some cases (e.g. to re-establish the SSH connection to the replay voboxes after a cmst1 account password change, or to redeploy the Jenkins instance), it may be necessary to use the OpenShift portal.

The automation logic is stored in the Jenkinsfile, located in the root directory of the dmwm T0 repository. Normally nothing needs to be changed there, apart from bug fixes, improvements, etc. Additionally, there are 3 scripts used for the replay step checks:

The rest of this section summarizes general information and instructions for the DMWM-T0-PR-test-job Jenkins job, which is the job that actually manages the replay automation.

Some preparation is needed to work with DMWM-T0-PR-test-job:

  • Pull the most recent version of the T0 repository master branch (from https://github.com/dmwm/T0/).
  • Create a PR against the master branch with the desired ReplayOfflineConfiguration.py configuration. (!!!) For T0 people testing patches or new T0 releases, it is sometimes also necessary to adjust the 00_software and 00_deploy_replay scripts.
  • The PR branch versions of these files are used when deploying the new replay.
  • After the PR is created, comment on it with the phrase "test this please" to notify the cmsdmwmbot GitHub user, which handles all communication between dmwm-jenkins and GitHub and reports the outcome of the executed tests. For example: https://github.com/dmwm/T0/pull/4514
  • To trigger the job from a PR, the GitHub user name must be added as a repository collaborator or to Build Triggers/GitHub Pull Request Builder/White list.
  • The "test this please" phrase in a PR comment is what triggers the T0 replay job (configured at https://cmssdt.cern.ch/dmwm-jenkins/job/DMWM-T0-PR-test-job/configure).
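The trigger comment can also be posted through the GitHub REST API instead of the web interface; GitHub handles pull request conversation comments through its issues comments endpoint. Below is a minimal Python sketch, assuming a personal access token that is allowed to comment on dmwm/T0. The helper name build_trigger_comment_request, the token and the PR number are hypothetical placeholders. The sketch only builds the request and does not send it.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_trigger_comment_request(owner: str, repo: str, pr_number: int,
                                  token: str,
                                  phrase: str = "test this please"
                                  ) -> urllib.request.Request:
    """Build (but do not send) a POST that adds `phrase` as a PR comment.

    PR conversation comments go through the issues comments endpoint:
    POST /repos/{owner}/{repo}/issues/{number}/comments
    """
    url = f"{GITHUB_API}/repos/{owner}/{repo}/issues/{pr_number}/comments"
    payload = json.dumps({"body": phrase}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"token {token}",  # hypothetical personal access token
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder PR number and token: replace them with your own before sending.
    req = build_trigger_comment_request("dmwm", "T0", 1234, token="<TOKEN>")
    print(req.get_method(), req.full_url)
    # Uncomment to actually post the comment (needs network and a valid token):
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status)
```

Only the bot reacts to the comment. Posting it through the API has the same effect as typing it in the PR page, provided the token's user is a collaborator or whitelisted.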

When a new pull request is opened and its author isn't whitelisted, the builder asks "Can one of the admins verify this patch?". An admin can then reply with:

  • "ok to test" to accept this pull request for testing;
  • "test this please" for a one-time test run;
  • "add to whitelist" to add the author to the whitelist.

If the build fails for other reasons, comment "retest this please" to start a new build.

Each new replay creates a status issue at https://its.cern.ch/jira/projects/CMSTZDEV/issues/. When each step ends, a comment about its status is posted to the issue, e.g. https://its.cern.ch/jira/browse/CMSTZDEV-479.
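The comment phrases above can be summarized as a small state model. The sketch below is a hypothetical, simplified Python model of the behaviour described on this page. The class PRBuilderModel and its methods are illustrative names, not the GitHub Pull Request Builder plugin's code. It assumes that only admins can use "ok to test" and "add to whitelist", and that both admins and whitelisted users can use "test this please" and "retest this please".

```python
from dataclasses import dataclass, field
from typing import Set

ASK = "Can one of the admins verify this patch?"
TEST_PHRASES = {"test this please", "retest this please"}


@dataclass
class PRBuilderModel:
    """Hypothetical, simplified model of the PR comment phrases described above.

    Not the GitHub Pull Request Builder plugin's implementation.
    """
    admins: Set[str] = field(default_factory=set)
    whitelist: Set[str] = field(default_factory=set)
    accepted_prs: Set[int] = field(default_factory=set)

    def on_update(self, pr: int, author: str) -> str:
        """PR opened or new commits pushed: build only if the PR is trusted."""
        if author in self.admins or author in self.whitelist or pr in self.accepted_prs:
            return "build"
        return ASK

    def on_comment(self, pr: int, author: str, commenter: str, text: str) -> str:
        """React to a PR comment; returns "build", "whitelisted" or "ignored"."""
        phrase = text.strip().lower()
        is_admin = commenter in self.admins
        if phrase in TEST_PHRASES and (is_admin or commenter in self.whitelist):
            return "build"  # one-time run ("test") or rebuild after a failure ("retest")
        if is_admin and phrase == "ok to test":
            self.accepted_prs.add(pr)  # later pushes to this PR are tested too
            return "build"
        if is_admin and phrase == "add to whitelist":
            self.whitelist.add(author)  # future PRs by this author are trusted
            return "whitelisted"
        return "ignored"
```

In this model, "test this please" from an admin builds once but leaves the PR untrusted. "ok to test" marks that single PR as trusted. "add to whitelist" trusts every later PR from the same author.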

The DMWM-T0-PR-test-job Jenkins job follows the procedure below.

  • The main steps of this Jenkins job are listed below (the job is in principle a bash script, so the step names are simply printed in the Jenkins job log):
    • "CleanupBefore": the script looks for an available replay VM. Once one is found, the VM is cleaned up and prepared for a new replay. A new JIRA issue is also created (currently in the CMSTZDEV Jira project, not the main CMSTZ project; feel free to change this when needed) with the following title pattern: "Tier0_REPLAY on . ". For now, I am adding the link to the GH PR to the issue description. We may also want to add a link to the Grafana monitoring of the replay and any other important, relevant information.
    • "UpdateConfigurations": copies the ReplayOffline config and all deployment files from the PR to their destination on the replay VM.
    • "DeployTheAgent": the T0 WMAgent is deployed, patched, etc.
    • "StartTheAgent": the T0 WMAgent is started.
    • "ReplayChecks": each of these checks also posts a comment to the replay issue on Jira:
      • "ExpressProgress": checks for Express workflows that are NOT yet complete. The step exits successfully when no incomplete Express workflows are left.
      • "RepackProgress": checks for Repack workflows that are NOT yet complete. The step exits successfully when no incomplete Repack workflows are left.
      • "FilesetProgress": checks for filesets that have not been cleaned up. The step exits successfully when no filesets are left.
      • "PauseProgress": checks for paused jobs. The step exits with a failure when the replay has paused jobs, and the list of paused jobs is posted as a comment to the issue. In this case, the paused jobs need to be checked manually. The replay is not stopped right after the first paused job is found, since one replay may contain several different failures and we want to notice all of them.
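The ReplayChecks logic boils down to "poll until nothing is left, then report". Below is a minimal Python sketch of that pattern under stated assumptions. Each progress check is modelled as a hypothetical callable that returns the number of remaining items (incomplete workflows or leftover filesets). report() stands in for posting a comment to the Jira issue. The function names, polling interval and poll limit are illustrative. The real Jenkinsfile may order the checks differently or run PauseProgress in parallel with the others.

```python
import time
from typing import Callable, Dict, List

Report = Callable[[str], None]  # e.g. posts a comment to the replay Jira issue


def wait_until_zero(name: str, remaining: Callable[[], int], report: Report,
                    interval_s: float = 600.0, max_polls: int = 1000) -> bool:
    """Poll remaining() until it returns 0; report the outcome; True on success."""
    if max_polls < 1:
        raise ValueError("max_polls must be at least 1")
    for attempt in range(max_polls):
        left = remaining()
        if left == 0:
            report(f"{name}: OK, nothing left")
            return True
        if attempt < max_polls - 1:
            time.sleep(interval_s)  # wait before polling again
    report(f"{name}: still {left} item(s) left after {max_polls} polls")
    return False


def pause_check(paused_jobs: Callable[[], List[str]], report: Report) -> bool:
    """Fail if any job is paused, listing all of them in a single report."""
    jobs = paused_jobs()
    if jobs:
        report("PauseProgress: FAILED, paused jobs: " + ", ".join(jobs))
        return False
    report("PauseProgress: OK, no paused jobs")
    return True


def replay_checks(checks: Dict[str, Callable[[], int]],
                  paused_jobs: Callable[[], List[str]],
                  report: Report, interval_s: float = 600.0) -> bool:
    """Run the progress checks in order, then the pause check.

    A failing check does not stop the remaining ones, so every problem
    in the replay gets reported.
    """
    ok = True
    for name, remaining in checks.items():
        ok = wait_until_zero(name, remaining, report, interval_s) and ok
    return pause_check(paused_jobs, report) and ok
```

For a dry run, pass interval_s=0, stub callables such as lambda: 0 for the progress checks, lambda: [] for paused_jobs, and report=print.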

More information about Jenkins can currently be found at https://docs.google.com/document/d/1YxwfmDn5xGteX7CGXViv51Jxsemd9lRCJu4EMVdA41Q.

Most replays are tested at https://cms-tier0-jenkins.web.cern.ch/job/replay_test. Changes that prove stable are then added to https://cmssdt.cern.ch/dmwm-jenkins/job/DMWM-T0-PR-test-job.

-- VytautasJankauskas - 2019-04-02

Topic revision: r3 - 2019-12-09 - unknown