Planned Significant LHC VomsCern Interventions

Migration from SL4 based voms to SL5 Tuesday May 10th 2011.

Status

The intervention is confirmed and will go ahead. It is being announced in the following places:
  • CERN C5 report
  • GOCDB
  • WLCG operations meeting at 15:00
  • Email to the VO managers, advising them to contact their members if they wish to do so.

Items to do Now

  • Declare this as happening at https://indico.cern.ch/conferenceDisplay.py?confId=136139
  • Check the physics database team is okay with the date and ask for snapshots now. Mail sent; they have agreed.
  • Check the netops folk are okay with the date for the alias switch. RQF:0009045 submitted.
  • Liaise with OSG to check voms-admin replication and their services, e.g. GUMS.

Timescale

A complete migration of the LHC VOMS service on voms.cern.ch and lcg-voms.cern.ch will take place on Tuesday May 10th 2011, in line with a planned technical stop of the LHC.

From a user perspective, the migration will start at around 09:00.

A downtime will be declared until 15:00, but the service is expected to return well before then, during the morning. In particular, voms-proxy-init commands should be working again during the morning.

It will affect the following VOs.

  • alice
  • cms
  • geant4
  • na48
  • test
  • unosat
  • vo.sixt.cern.ch
  • atlas
  • envirogrids.vo.eu-egee.org
  • lhcb
  • ops
  • vo.gear.cern.ch

After the intervention, VOs will continue to use VOMRS as their primary interface. VOMRS will be removed at a later date.

Comments

The upgrade of the VOMS software from SL4 to SL5 includes a database schema update. Migrating the VOs one at a time was suggested, and would have benefited everyone. Although this was investigated thoroughly, no workable strategy for doing so could be found. All VOs will therefore be migrated in one step, with extensive testing on a parallel service in advance.

Downtime and Failover to BNL or Fermi

ATLAS and, to a lesser extent, CMS have alternate VOMS servers within OSG. While the CERN VOMS server is in intervention, care will be taken to ensure that it is really down and not accepting TCP connections, so that clients move on to the alternates rather than waiting on CERN. Concerning replication to these external VOMS servers, I will contact the administrators of those services now.
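Whether the CERN server is really refusing connections can be checked from any client. A minimal sketch, assuming bash (for its /dev/tcp redirection) and the coreutils timeout command are installed; the function name port_is_closed and the 3-second timeout are illustrative choices, not part of any existing tool.

```shell
#!/bin/sh
# port_is_closed HOST PORT
# Returns 0 if nothing accepts a TCP connection on HOST:PORT, 1 otherwise.
# bash opens the connection through its /dev/tcp pseudo-device; `timeout`
# makes a host that silently drops packets count as closed after 3 seconds.
port_is_closed() {
    # Inside `bash -c`, $0 is the host and $1 is the port.
    if timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null; then
        return 1    # connection accepted: the service is still listening
    fi
    return 0        # refused, unreachable or timed out
}
```

For example, for the alice port used in the vomses files on this page and the voms-admin port:

$ for p in 15000 8443; do port_is_closed voms.cern.ch "$p" || echo "voms.cern.ch:$p still accepts connections"; done

Each VO has its own port, so the list would need extending with the ports from the other vomses files.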

Expected Changes in Behavior.

I consider these changes to be purely cosmetic:
  • Calling voms-proxy-init against a VO of which you are not a member results in a rather verbose error message. The message is clear, but quite ugly.
  • The voms-admin root URL, e.g. https://voms.cern.ch:8443/voms/alice, does not change, although some of the internal structure beneath it does. All the programmatic URLs remain unchanged.

Testing the SL5 Service Now.

An SL5 VOMS service is already running and is close to production.

On Tuesday Apr 27th a complete snapshot was taken of the current VOMS and VOMRS databases.

The SL5 VOMS service is now pointing at this snapshot.

By configuring clients to use the alternate endpoints, it is possible to test all parts of the SL5-based service to some extent.

During this phase, modifications made within the SL5 service will of course not propagate to the production databases.

Testing VOMS

To test VOMS, create private copies of the voms-client configuration files, e.g. for the VO alice:

$ mkdir -p ~/testvoms
$ cp "$GLITE_LOCATION"/etc/vomses/alice* ~/testvoms/

For each of the files now in ~/testvoms, edit the second field so that it points at voms5.cern.ch or lcg5.cern.ch in place of voms.cern.ch or lcg-voms.cern.ch respectively.
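Editing the files by hand works; for several VOs the edit can also be scripted. A minimal sketch, assuming GNU sed (for -i) and the quoted field layout of the example alice files on this page. Only the second field is rewritten, so the server DN, which still names the production host certificate, is left alone. The function name point_at_sl5 is hypothetical.

```shell
#!/bin/sh
# point_at_sl5 [DIR]
# Rewrite the host (second quoted field) of every vomses file in DIR
# (default ~/testvoms) to the SL5 test endpoints:
#   "lcg-voms.cern.ch" -> "lcg5.cern.ch"
#   "voms.cern.ch"     -> "voms5.cern.ch"
# The patterns are anchored on the first field, so the DN field is untouched.
point_at_sl5() {
    dir=${1:-"$HOME/testvoms"}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        sed -i \
            -e 's/^\("[^"]*"[[:space:]]*\)"lcg-voms\.cern\.ch"/\1"lcg5.cern.ch"/' \
            -e 's/^\("[^"]*"[[:space:]]*\)"voms\.cern\.ch"/\1"voms5.cern.ch"/' \
            "$f"
    done
}
```

Running point_at_sl5 with no argument edits the copies in ~/testvoms in place; the production files under $GLITE_LOCATION/etc/vomses are not touched.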

The files should then look like this:

$ cat ~/testvoms/alice-voms.cern.ch 
"alice" "voms5.cern.ch" "15000" "/DC=ch/DC=cern/OU=computers/CN=voms.cern.ch" "alice" "24"

$ cat ~/testvoms/alice-lcg-voms.cern.ch 
"alice" "lcg5.cern.ch" "15000" "/DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch" "alice" "24"

Finally, generate your proxy using this custom configuration, adding any extra flags that you normally use. Destroy your existing proxy first, to be sure that you end up with a new SL5 proxy.

$ voms-proxy-destroy
$ voms-proxy-info
Couldn't find a valid proxy.
$ voms-proxy-init --vomses ~/testvoms --voms alice
Enter GRID pass phrase:
Your identity: /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=straylen/CN=613539/CN=Steve Traylen
Creating temporary proxy ............................................................................. Done
Contacting  lcg5.cern.ch:15000 [/DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch] "alice" Done
Creating proxy ..................................................................................... Done
Your proxy is valid until Thu Apr 28 23:17:05 2011

$ voms-proxy-info -all
subject   : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=straylen/CN=613539/CN=Steve Traylen/CN=proxy
issuer    : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=straylen/CN=613539/CN=Steve Traylen
identity  : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=straylen/CN=613539/CN=Steve Traylen
type      : proxy
strength  : 1024 bits
path      : /tmp/x509up_u16568
timeleft  : 11:56:21
=== VO vo.aleph.cern.ch extension information ===
VO        : vo.aleph.cern.ch
subject   : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=straylen/CN=613539/CN=Steve Traylen
issuer    : /DC=ch/DC=cern/OU=computers/CN=voms.cern.ch
attribute : /vo.aleph.cern.ch/Role=NULL/Capability=NULL
timeleft  : 11:56:21
uri       : voms.cern.ch:15013
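The "Contacting" line printed by voms-proxy-init shows which server actually answered, which is the quickest way to confirm the test really used the SL5 endpoints. A small sketch that extracts those host names and succeeds only if all of them are SL5 test hosts; it assumes the message format shown in the session above, and the function names are hypothetical.

```shell
#!/bin/sh
# contacted_hosts: read voms-proxy-init output on stdin and print the host
# of each "Contacting" line, e.g.
#   Contacting  lcg5.cern.ch:15000 [/DC=ch/...] "alice" Done  ->  lcg5.cern.ch
contacted_hosts() {
    awk '$1 == "Contacting" { split($2, hp, ":"); print hp[1] }'
}

# contacted_sl5: succeed only if at least one server was contacted and
# every one of them is an SL5 test host (voms5.cern.ch or lcg5.cern.ch).
contacted_sl5() {
    hosts=$(contacted_hosts)
    [ -n "$hosts" ] || return 1
    for h in $hosts; do
        case $h in
            voms5.cern.ch|lcg5.cern.ch) ;;
            *) return 1 ;;
        esac
    done
    return 0
}
```

For example, after saving the voms-proxy-init output to a file (init.log is just an example name):

$ contacted_sl5 < init.log && echo "SL5 server used"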

Testing VOMS-Admin

The voms-admin service can be contacted on the hostname voms5.cern.ch. When connecting, your web browser will present a certificate warning, because the host you are contacting, voms5.cern.ch, identifies itself as voms.cern.ch. This will of course not be the case after the migration.

Contact the SL5 instance of voms-admin on the following URLs:

Testing VOMRS

Again, connecting to lcg5 will give a certificate warning in your browser, because lcg5 presents the host certificate of lcg-voms.cern.ch. This will of course be corrected with the full migration to the SL5 service.

Note that on Thursday April 28th the backend vomrs daemon is not running, so synchronisation will not occur. This will be corrected very shortly.

  • Note that VOMRS has been configured not to send email, since any emails it sent would cause too much confusion for users in the wider world.
  • There is no vomrs service for aleph, delphi, l3 or opal since they do not use vomrs at all.

Impact on Other VOs, i.e. the LEP VOs

The LEP VOs, vo.aleph.cern.ch, vo.l3.cern.ch, vo.delphi.cern.ch and vo.opal.cern.ch, are already running on the SL5 service. Following the migration they will see a single change from their perspective: contacting https://voms-admin.cern.ch:8443/voms/vo.aleph.cern.ch with a web browser will result in a certificate/hostname mismatch warning. Reconnecting on http://voms.cern.ch:8443.... will correct this.

The LEP VOs will almost certainly experience some disruption to their service during the migration of the LHC VOs to the SL5 service; this disruption will end at the same time as for the LHC VOs.

The LEP VOs will continue to use voms-admin only and they will never use vomrs.

Detailed Steps for VOMS Service Manager

All times here are CEST, i.e. CERN time.
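Some tools expect downtimes in UTC; that GOCDB does is an assumption worth checking against the tool itself. In May, CEST is UTC+2, so the declared 09:00 to 15:00 window shifts by two hours. A small sketch, assuming GNU date; the +0200 offset is written explicitly rather than relying on a time-zone database, and the function name cest_to_utc is hypothetical.

```shell
#!/bin/sh
# cest_to_utc HH:MM
# Convert a CERN-time (CEST, UTC+2) clock time on the intervention day,
# 10 May 2011, to UTC. Requires GNU date for -d.
cest_to_utc() {
    date -u -d "2011-05-10 $1 +0200" '+%Y-%m-%d %H:%M UTC'
}
```

With this, cest_to_utc 09:00 prints 2011-05-10 07:00 UTC and cest_to_utc 15:00 prints 2011-05-10 13:00 UTC.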

09:00 Service Stop

Connections to the VOMS and VOMRS databases will be closed completely, so the whole VOMS service will be unavailable.

09:15 Database Backup

The physics database group will be asked to make a complete backup of the existing databases in case a rollback is required. The owner accounts will also be unlocked at this time.

The table below lists the production accounts to be backed up, together with the snapshot names that will be used during the migration phase.

Production Account    Snapshot Name
lcg_voms_alice        lcg_voms_validation_1
lcg_voms_atlas        lcg_voms_validation_2
lcg_voms_cms          lcg_voms_validation_3
lcg_voms_enviro       lcg_voms_validation_4
lcg_voms_geant4       lcg_voms_validation_5
lcg_voms_lhcb         lcg_voms_validation_6
lcg_voms_na48         lcg_voms_validation_7
lcg_voms_ops          lcg_voms_validation_8
lcg_voms_test         lcg_voms_validation_9
lcg_voms_unosat       lcg_voms_validation_10
lcg_voms_gear         lcg_voms_validation_11
lcg_voms_sixt         lcg_voms_validation_12
lcg_vomrs_alice       lcg_vomrs_validation_1
lcg_vomrs_atlas       lcg_vomrs_validation_2
lcg_vomrs_cms         lcg_vomrs_validation_3
lcg_vomrs_enviro      lcg_vomrs_validation_4
lcg_vomrs_geant4      lcg_vomrs_validation_5
lcg_vomrs_lhcb        lcg_vomrs_validation_6
lcg_vomrs_na48        lcg_vomrs_validation_7
lcg_vomrs_ops         lcg_vomrs_validation_8
lcg_vomrs_test        lcg_vomrs_validation_9
lcg_vomrs_unosat      lcg_vomrs_validation_10
lcg_vomrs_gear        lcg_vomrs_validation_11
lcg_vomrs_sixt        lcg_vomrs_validation_12
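The naming in the table is regular: each VO suffix gets one number, and the same number is used for both its voms and its vomrs snapshot. A sketch that regenerates the 24 pairs, which may be handy for cross-checking the table or for pasting into the request to the physics database team; the suffix list and its order are copied from the table, and the function name snapshot_map is hypothetical.

```shell
#!/bin/sh
# snapshot_map: print "production_account snapshot_name" pairs in the same
# order as the table (all voms accounts first, then all vomrs accounts).
snapshot_map() {
    for kind in voms vomrs; do
        n=0
        # VO suffixes in table order; the position gives the snapshot number.
        for vo in alice atlas cms enviro geant4 lhcb na48 ops test unosat gear sixt; do
            n=$((n + 1))
            printf '%s %s\n' "lcg_${kind}_${vo}" "lcg_${kind}_validation_${n}"
        done
    done
}
```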

10:00 Switch SL5 VOMS service to production databases.

The SL5 VomsCern service will be switched from the copies of the production databases to the production databases themselves.

10:00 Switch Aliases

CERN network operations will be asked to move the following aliases:

Alias       Pre-migration   Post-migration
lcg-voms    prod-voms       lcg5
voms        vomslb          voms5
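Once network operations report the switch as done, the new alias targets can be checked from any machine. A minimal sketch, assuming the aliases are published as DNS CNAME records and that the bind host utility is installed, which reports CNAMEs as "lcg-voms.cern.ch is an alias for lcg5.cern.ch."; if the aliases are implemented differently (for example as plain A records), comparing addresses would be needed instead. The helper names are hypothetical.

```shell
#!/bin/sh
# alias_target: read `host` output on stdin and print the first CNAME
# target, with the trailing dot removed. Prints nothing if there is none.
alias_target() {
    awk '/ is an alias for / { t = $NF; sub(/\.$/, "", t); print t; exit }'
}

# check_alias ALIAS EXPECTED (queries DNS, so network access is required)
check_alias() {
    got=$(host "$1" | alias_target)
    if [ "$got" = "$2" ]; then
        echo "OK   $1 -> $got"
    else
        echo "FAIL $1 -> ${got:-<no CNAME>} (expected $2)"
        return 1
    fi
}
```

For the two aliases in the table:

$ check_alias lcg-voms.cern.ch lcg5.cern.ch
$ check_alias voms.cern.ch voms5.cern.ch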

10:00 Certificate Change

The voms.cern.ch certificate will be copied from the SL4 service to the SL5 service where appropriate. As described in the section on the LEP VOs above, this will break the voms-admin alias for those VOs.

10:30 Schema Update

The schemas will be updated to work with the SL5 service.

11:00 Service Partially Reopened.

By 11:00 the voms and voms-admin services will be open for business. The vomrs web service will also be open, but changes made in vomrs will not yet be propagated to voms. It is probably best to avoid making changes until later, to avoid confusion.

11:30 Vomrs into send mode.

Switch vomrs from nosend mode to send mode on the new SL5 service. Start vomrs processing. This will also process any backlog from the previous step.

13:30 SLS Configuration

The SLS view of VOMS will of course be disrupted during the intervention. Some manual changes will be required to point SLS at the new SL5 service since SLS verifies the service by talking directly to backend hostnames.

15:00 End

This is the latest time by which I expect everything to be completed.

16:00 Deconfigure SL4 Service

The SL4 service nodes will not be removed for at least a month, but they must be configured so that they do not resume vomrs activity upon a reboot or any similar event.

Following Days

  • Verify that the lemon configuration for voms is monitoring something sensible.
  • Reinstall the SL5 voms service partially from scratch to check it still works following the migration.
  • Update the pages on VomsCern which detail the machine layout of the service.
  • The operator procedures refer to explicit hostnames and need correcting.

Following Weeks

  • Destroy the SL4 service, i.e. retire the machines.

Manual Steps

The SL5 installs require some manual post-install steps. These will be fixed or worked into the documentation afterwards:
  • Add rm -f /etc/nologin to /etc/rc.local - GGUS:69714
  • Remove new-format CRLs properly and install old-format ones - BUG:78349
  • Reboot one more time.
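The first of these steps can be made safe to repeat on every reinstall. A minimal sketch, assuming a POSIX shell; the function name is hypothetical. It simply appends the line, so if rc.local on a node ends with an explicit exit, the line would need to go before it instead.

```shell
#!/bin/sh
# add_nologin_cleanup [FILE]
# Append "rm -f /etc/nologin" to FILE (default /etc/rc.local) unless the
# exact line is already present, so re-running the step is harmless.
add_nologin_cleanup() {
    rc=${1:-/etc/rc.local}
    line='rm -f /etc/nologin'
    if ! grep -qxF -- "$line" "$rc" 2>/dev/null; then
        # Start on a new line if the file does not end with a newline.
        if [ -s "$rc" ] && [ -n "$(tail -c 1 "$rc")" ]; then
            printf '\n' >> "$rc"
        fi
        printf '%s\n' "$line" >> "$rc"
    fi
}
```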

-- SteveTraylen - 19-Apr-2011

Topic revision: r11 - 2013-12-18 - AlbertoRodriguezPeon
 