SL6 Migration Task Force of the WLCG Operations Coordination Team

Introduction

The aim of this task force is to help sites migrate their Worker Nodes to SL6.

People

  • T0: Helge Meinhard, Steve Traylen
  • T1: Ian Collier (RAL), Di Qing (TRIUMF), Burt Holzman (FNAL), Andreas Petzold (KIT), Christofer Hollowell (BNL)
  • T2: Alessandra Forti (Manchester), Shawn McKee (AGLT2), Raul Lopez (Brunel), Alessandra Doria (Napoli)
  • Atlas: Simone Campana (ops), Alessandro De Salvo (ops), Rod Walker (ops), Ikuo Ueda (ops), Emil Obreshkov (sw librarian)
  • CMS: Christoph Wissing (DESY), Giulio Eulisse (sw librarian), Oliver Gutsche (computing operations), Brian Bockelman (grid and xrootd expert)
  • LHCb: Stefan Roiser (ops), Ben Couturier (sw librarian), Joel Closier (grid expert)
  • Alice: Maarten Litmaath (ops), Latchezar Betev (offline coord)
  • EGI: Tiziana Ferrari, Peter Solagna
  • SL6 tarball: Matt Doidge (Lancaster)
  • IT/ES: Andrea Valassi

All members of the task force can also be contacted at wlcg-ops-coord-tf-sl6-migration@cernNOSPAMPLEASE.ch
The list of people who were added at a later stage can be seen on the egroup page.

Deployment Status

Sites

Experiments

| Question | Alice | Atlas | CMS | LHCb |
| Can the experiment run on SL6? | yes but tested only on small scale | yes, but still some compilation problems with analysis rels | yes, in compatibility mode working on native exec | yes, in compatibility mode working on native exec. Most of the sw stack is already on SL6 native |
| Can the experiment run on mixed clusters? | yes | no | yes in compatibility mode but would prefer different queues so there are no problems when SL6 native executables are introduced | yes, but would prefer separate queues to avoid having to blacklist both OSs in case things went wrong |
| Big bang or gradual upgrade? | gradual allows to find any problem without risks | gradual but can cope with big bang if sites need | gradual | doesn't matter as can run on mixed clusters |
| Does the experiment have already SL6 sites? | a few small sites | SARA T1 and US atlas site, Brunel (only prod) | half USCMS and few test sites in EU (Brunel, Desy, CNAF) | some sites |
| Are there time constraints? | no | prefer sites NOT to upgrade before 1st June 2013 | no indicatively would prefer summer though | no |
| Do the experiments have an upgrade procedure in place? | yes, but informal | yes, two linked from TF page | see below | |
| Communication with sites | site contacts list, EGI ops | central ops and cloud support | local experts | |
| Test procedure | yes | several sites with test queues | yes, if sites strongly require | yes |

Procedures and how to contact experiments

General

  • Please install SL6 HEP_OSlibs dependency rpm.
  • A WLCG repository was created to host the rpms needed by the experiments at sites. Please enable it. The yum files needed to enable it are in the repository itself. It has SL5 and SL6 versions and already contains HEP_OSlibs for both.
  • Experiments don't want mixed CEs; they require a separate queue for different architectures.
    • The quickest way for all the experiments is to reuse a queue that has already been set up rather than to create new queues (if you can avoid it).
  • HS06: there is a difference of up to 20%-25% between the SL5 and SL6 values of HS06 on newer hardware. Sites should rerun the benchmark and change the results in the BDII. The results should also be posted on the hepix site https://w3.hepix.org/benchmarks/doku.php?id=bench:results_sl6_x86_64_gcc_445
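
As a rough illustration of the HS06 point above, the following sketch computes the relative change between an SL5 and an SL6 benchmark result and flags whether the published value is worth updating. The function names, the 1% tolerance and the example scores are assumptions made for this example only; they are not task force recommendations or measured values.

```python
def hs06_relative_change(sl5_score: float, sl6_score: float) -> float:
    """Return the change from the SL5 to the SL6 HS06 score, in percent."""
    if sl5_score <= 0:
        raise ValueError("the SL5 score must be positive")
    return (sl6_score - sl5_score) * 100.0 / sl5_score


def needs_republishing(sl5_score: float, sl6_score: float,
                       tolerance_percent: float = 1.0) -> bool:
    """Return True if the two scores differ by more than the tolerance.

    The default tolerance is an arbitrary assumption for illustration.
    """
    change = hs06_relative_change(sl5_score, sl6_score)
    return abs(change) > tolerance_percent


if __name__ == "__main__":
    # Hypothetical per-core scores, not real measurements.
    sl5, sl6 = 10.0, 12.0
    print(f"relative change: {hs06_relative_change(sl5, sl6):.1f}%")
    print("update BDII value:", needs_republishing(sl5, sl6))
```

The sketch only does the arithmetic; rerunning the benchmark and changing the value in the BDII remain manual steps for the site.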

Alice

  • Site admins can contact the site's local ALICE support
  • ALICE support should contact the alice-lcg-task-force list to discuss plans and questions

Atlas

https://twiki.cern.ch/twiki/bin/view/AtlasComputing/SLC6Readiness

CMS

  • Site admins can contact the site's local CMS support.
  • Local CMS contacts should know the CMS requirements and what needs to be done.
  • Local CMS support should open a Savannah ticket for CMS in case of problems.

LHCb

  • To access SL6 resources, LHCb requires the SL6 CE/queue to be set to production in the BDII. If it is not in the BDII, LHCb doesn't see it.
    • If the queue is new or you insist on testing it in advance, you'll have to contact LHCb to get it into the SUM tests and DIRAC respectively.
  • In case of problems with LHCb software, sites should open a GGUS ticket for LHCb
    • Select LHCb in "Concerned VO"
    • Ask in the ticket text for it to be assigned to VOsupport
  • Alternatively contact Vladimir.Romanovskiy@cernNOSPAMPLEASE.ch

Other Documentation

Task Force reports and presentations

T0

Meetings

Issues to solve and Tasks

  1. Understand the status of each experiment's sites, e.g. Alice already has some sites on SL6, Atlas has 1, CMS? LHCb?
  2. Put together the documentation necessary for sites: some twiki pages already exist; are there others? Are they all visible to the external world or do they require special access?
    • 13/02/28 Atlas procedure pages are already linked from the TF twiki, other experiments need to indicate their pages if they are needed.
    • 13/04/23 Created a section with general and experiments procedures and experiments contacts
  3. Test HEP_OSlibs on external sites not using SLC6
    • 13/03/19 TF sites setting up SL6 test queues will do that.
    • 13/04/25 Tested by two UK sites and two Turkish ones
  4. Setting up a WLCG repository at CERN
  5. Do we need test queues at sites? How can the upgrade be done?
    • 13/03/19 The majority of experiments prefer gradual but can cope with big bang if sites prefer.
    • 13/03/19 Atlas has several sites with test queues, CMS has a testing mechanism in place
    • 13/04/22 LHCb needs the queues to be published in the BDII to avoid too much manual intervention. Better if the queues are existing ones.
  6. What communication channels should we use to help sites move? Experiments? Tickets?
    • 13/04/18 TF communications will be handled using wlcg-tier*-contacts
    • 13/04/23 As above, created a section on how to contact experiments in case of problems.
  7. Do we need to follow every site?
    • 13/03/19 We will track every site in one or more twiki tables
    • 13/04/18 Sites deployment table created with all T1 and T2 sites.
  8. Do we need to coordinate T1s? For example, Atlas doesn't want all T1s going at the same time; what are the other experiments thinking?
    • 5 T1s are present in the group; the others will be contacted to ask for their plans when the experiments' situation is better defined (see points 12. and 13. below for progress on contacting sites)
  9. OSG sites? At the moment the representation is mostly EU centric + Canada.
    • 13/03/21 BNL, FNAL, AGLT2 and USCMS T2s are represented now.
    • 13/04/18 USCMS T2s migration almost completed
    • 13/04/18 USATLAS is fine and is organising the upgrade of their facilities.
  10. Do we need a target date? If yes, it is important that it is either far away from the EMI-3 migration or coincides with it. EMI-3 migration is by 30 April 2014.
    • 13/03/07 At the WLCG Ops Coord meeting it was decided to have the 31st October 2013 as target date to move the bulk of resources, with an expected tail in the autumn.
  11. Issue with lxplus migration timeline raised by CMS.
    • 13/02/27 Solved between CMS and T0.
  12. Contact Tier1 sites to ask for plans
    • 13/04/18 Wrote an email to wlcg-tier1-contacts; will follow up with T1s not replying
    • 13/04/22 Contacted the few T1s that hadn't replied. All replied.
  13. Contact Tier2 sites to ask for plans
    • 13/04/18 Investigating which is the best channel
    • 13/04/22 Contacted T2s using wlcg-tier2-contacts
  14. Problems found during migration
  15. Checking sites using EMI-3
  16. Following up on HS06 differences

Stats progress

  1. 13/05/22
    • Total number of Tier1s Done: 4/15
    • Total number of Tier2s Done: 20/129
  2. 13/06/18
    • Total number of Tier1s Done: 6/15
    • Total number of Tier2s Done: 20/129
  3. 13/06/20
    • Total number of Tier1s Done: 6/15
    • Total number of Tier2s Done: 27/129
  4. 13/07/22
    • Total number of Tier1s Done: 8/15
    • Total number of Tier2s Done: 35/129
  5. 13/08/28
    • Total number of Tier1s Done: 8/16
    • Total number of Tier2s Done: 43/129
  6. 13/10/13
    • Total number of Tier1s Done: 10/16
    • Total number of Tier2s Done: 82/130
  7. 13/10/29
    • Total number of Tier1s Done: 13/16
    • Total number of Tier2s Done: 101/130
  8. 13/11/01
    • Total number of Tier1s Done: 14/16
    • Total number of Tier2s Done: 111/131
  9. 13/11/07
    • Total number of Tier1s Done: 14/16
    • Total number of Tier2s Done: 112/131
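
The counts above can be turned into completion percentages with a few lines of Python. This is a minimal sketch: the figures are copied from the list above, while the tuple layout and the function names are choices made only for this example.

```python
# (date, T1 done, T1 total, T2 done, T2 total), copied from the list above.
PROGRESS = [
    ("13/05/22", 4, 15, 20, 129),
    ("13/06/18", 6, 15, 20, 129),
    ("13/06/20", 6, 15, 27, 129),
    ("13/07/22", 8, 15, 35, 129),
    ("13/08/28", 8, 16, 43, 129),
    ("13/10/13", 10, 16, 82, 130),
    ("13/10/29", 13, 16, 101, 130),
    ("13/11/01", 14, 16, 111, 131),
    ("13/11/07", 14, 16, 112, 131),
]


def percent_done(done: int, total: int) -> float:
    """Return the share of migrated sites as a percentage."""
    return 100.0 * done / total


def summarise(progress):
    """Return one formatted line per snapshot with T1 and T2 percentages."""
    lines = []
    for date, t1_done, t1_total, t2_done, t2_total in progress:
        t1 = percent_done(t1_done, t1_total)
        t2 = percent_done(t2_done, t2_total)
        lines.append(f"{date}  T1 {t1:5.1f}%  T2 {t2:5.1f}%")
    return lines


if __name__ == "__main__":
    for line in summarise(PROGRESS):
        print(line)
```

Because the totals change over time (15 to 16 Tier1s, 129 to 131 Tier2s), each percentage is computed against the total recorded on the same date.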
Topic revision: r49 - 2013-11-07 - AlessandraForti