Planned Significant LHC VomsCern Interventions
Migration from the SL4-based VOMS to SL5, Tuesday May 10th 2011.
Status
The current status is confirmed and this will go ahead. It is being announced in the following places:
- CERN C5 report
- GOCDB
- The 3.00pm WLCG operations meeting.
- Email to VO managers advising that they contact their members if they wish to do so.
Items to do Now
- Declare this as happening at https://indico.cern.ch/conferenceDisplay.py?confId=136139
- Check that the physics database team is okay with the date and ask for snapshots now - mail sent and they have agreed.
- Check that network operations are okay with the date to do the alias switch - RQF:0009045 submitted.
- Liaise with OSG to check voms-admin replication and their services, e.g. GUMS.
Timescale
A complete migration of the LHC VOMS service on voms.cern.ch and lcg-voms.cern.ch will happen on
Tuesday May 10th 2011, in line with a planned technical stop of the LHC.
This migration will start, from a user perspective, around 09:00.
A downtime will be declared until 15:00, but the service is expected to return well within the morning; in particular,
voms-proxy-init commands should be operational again during the morning.
It will affect the following VOs:
- alice
- cms
- geant4
- na48
- test
- unosat
- vo.sixt.cern.ch
- atlas
- envirogrids.vo.eu-egee.org
- lhcb
- ops
- vo.gear.cern.ch
After the intervention VOs will continue to use VOMRS as their primary interface. The removal of vomrs will
happen at some later date.
Comments
The upgrade of the VOMS software from SL4 to SL5 includes a database schema update. It was suggested, and would have
been beneficial to all, to migrate VOs one at a time. Although this was investigated thoroughly, no workable strategy
could be found. As such, all VOs will be migrated in one step, with extensive testing made
on a parallel service in advance.
Downtime and Failover to BNL or Fermi
ATLAS and, to a lesser extent, CMS have alternate
VOMS servers within OSG. Care will be taken to ensure that while the CERN VOMS server
is under intervention it is genuinely down and not accepting TCP connections.
Concerning the replication itself to these remote VOMS servers, I will contact the admins of these services now.
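As a sketch, "really down" can be verified from any client host, assuming a netcat with the -z scan flag; 15000 is the alice vomses port used in the examples below, and other VOs use their own ports:
$ nc -z -w 5 voms.cern.ch 15000 && echo "still accepting TCP" || echo "port closed as intended"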
Expected Changes in Behavior.
I consider these all to be cosmetic:
- Calling voms-proxy-init against a VO of which you are not a member results in a rather verbose error message. The message is clear but quite ugly.
- The voms-admin root URL, e.g. https://voms.cern.ch:8443/voms/alice, does not change; some internal structure beyond this URL does change, but all programmatic URLs remain the same.
Testing the SL5 Service Now.
An SL5 VOMS service is already running and in near production.
On Wednesday Apr 27th a complete snapshot was taken of the current voms and vomrs databases.
The SL5 VOMS service is now pointing at this snapshot.
By configuring clients to use alternate endpoints it is possible to test all of the SL5-based
service to some extent.
During this phase, of course, modifications made within this SL5 service will not propagate to the production databases.
In order to test VOMS, private copies of the voms-client configuration files should be created, e.g. for the VO alice:
$ mkdir ~/testvoms
$ cp $GLITE_LOCATION/etc/vomses/alice* ~/testvoms
For each of the files now in ~/testvoms, edit the second argument within the file so as to contact voms5.cern.ch or lcg5.cern.ch in place of voms.cern.ch or lcg-voms.cern.ch respectively.
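As a sketch, both substitutions can be made with one sed invocation (assuming GNU sed, as shipped with SL5); matching the quoted hostname field leaves the certificate DN in the fourth argument untouched:
$ sed -i -e 's/"voms\.cern\.ch"/"voms5.cern.ch"/' \
      -e 's/"lcg-voms\.cern\.ch"/"lcg5.cern.ch"/' ~/testvoms/*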
i.e. the files should look like this:
$ cat ~/testvoms/alice-voms.cern.ch
"alice" "voms5.cern.ch" "15000" "/DC=ch/DC=cern/OU=computers/CN=voms.cern.ch" "alice" "24"
$ cat ~/testvoms/alice-lcg-voms.cern.ch
"alice" "lcg5.cern.ch" "15000" "/DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch" "alice" "24"
Finally generate your proxy using this custom configuration, adding any extra flags that you normally use.
Destroy your existing proxy first to be certain you have a new SL5 proxy.
$ voms-proxy-destroy
$ voms-proxy-info
Couldn't find a valid proxy.
$ voms-proxy-init --vomses ~/testvoms --voms alice
Enter GRID pass phrase:
Your identity: /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=straylen/CN=613539/CN=Steve Traylen
Creating temporary proxy ............................................................................. Done
Contacting lcg5.cern.ch:15000 [/DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch] "alice" Done
Creating proxy ..................................................................................... Done
Your proxy is valid until Thu Apr 28 23:17:05 2011
$ voms-proxy-info -all
subject : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=straylen/CN=613539/CN=Steve Traylen/CN=proxy
issuer : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=straylen/CN=613539/CN=Steve Traylen
identity : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=straylen/CN=613539/CN=Steve Traylen
type : proxy
strength : 1024 bits
path : /tmp/x509up_u16568
timeleft : 11:56:21
=== VO vo.aleph.cern.ch extension information ===
VO : vo.aleph.cern.ch
subject : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=straylen/CN=613539/CN=Steve Traylen
issuer : /DC=ch/DC=cern/OU=computers/CN=voms.cern.ch
attribute : /vo.aleph.cern.ch/Role=NULL/Capability=NULL
timeleft : 11:56:21
uri : voms.cern.ch:15013
Testing VOMS-Admin
Voms-admin can be contacted on the hostname voms5.cern.ch. When connecting, your web browser
will present a certificate warning that the host you are contacting, voms5.cern.ch, is identifying itself as
voms.cern.ch. This will of course not be the case after migration.
Contact the SL5 instance of voms-admin via the usual root URL with voms5.cern.ch substituted for the production hostname, e.g. https://voms5.cern.ch:8443/voms/alice.
Testing VOMRS
Again, connecting to lcg5 will give a certificate warning in your browser because the host presents the certificate of lcg-voms.cern.ch.
This of course will be corrected with the full migration to the SL5 service.
Note that on Thursday April 28th the backend vomrs daemon is not running, so synchronisation will not occur. This will be corrected very shortly.
- Note that VOMRS has been configured not to send email, since any emails sent by it would cause too much confusion to users in the wider world.
- There is no vomrs service for aleph, delphi, l3 or opal since they do not use vomrs at all.
Impact on other VOs, i.e. the LEP ones.
The LEP VOs, vo.aleph.cern.ch, vo.l3.cern.ch, vo.delphi.cern.ch and vo.opal.cern.ch, are already running the SL5
service. Following the migration they will experience a single change from their perspective: contacting
https://voms-admin.cern.ch:8443/voms/vo.aleph.cern.ch
with a web browser will result in a certificate<->hostname
mismatch warning. Reconnecting on
https://voms.cern.ch:8443/voms/vo.aleph.cern.ch
will correct this.
The LEP VOs will almost certainly experience some disruption to their service during the migration of the LHC VOs to the SL5
service; this disruption will finish at the same time as for the LHC VOs.
The LEP VOs will continue to use voms-admin only; they will never use vomrs.
Detailed Steps for VOMS Service Manager
All times here are CEST, i.e. CERN local time.
09:00 Service Stop
Connections to the voms and vomrs databases will be closed completely. As such the complete VOMS service will be unavailable.
09:15 Database Backup
The physics database group will be asked to make a complete backup of the existing databases in case a rollback
is required. The owner accounts will be unlocked at this time also.
The table below details the production accounts to be backed up, together with the snapshot names that will be used
during the migration phase.
| Production Account | Snapshot Name |
| lcg_voms_alice | lcg_voms_validation_1 |
| lcg_voms_atlas | lcg_voms_validation_2 |
| lcg_voms_cms | lcg_voms_validation_3 |
| lcg_voms_enviro | lcg_voms_validation_4 |
| lcg_voms_geant4 | lcg_voms_validation_5 |
| lcg_voms_lhcb | lcg_voms_validation_6 |
| lcg_voms_na48 | lcg_voms_validation_7 |
| lcg_voms_ops | lcg_voms_validation_8 |
| lcg_voms_test | lcg_voms_validation_9 |
| lcg_voms_unosat | lcg_voms_validation_10 |
| lcg_voms_gear | lcg_voms_validation_11 |
| lcg_voms_sixt | lcg_voms_validation_12 |
| lcg_vomrs_alice | lcg_vomrs_validation_1 |
| lcg_vomrs_atlas | lcg_vomrs_validation_2 |
| lcg_vomrs_cms | lcg_vomrs_validation_3 |
| lcg_vomrs_enviro | lcg_vomrs_validation_4 |
| lcg_vomrs_geant4 | lcg_vomrs_validation_5 |
| lcg_vomrs_lhcb | lcg_vomrs_validation_6 |
| lcg_vomrs_na48 | lcg_vomrs_validation_7 |
| lcg_vomrs_ops | lcg_vomrs_validation_8 |
| lcg_vomrs_test | lcg_vomrs_validation_9 |
| lcg_vomrs_unosat | lcg_vomrs_validation_10 |
| lcg_vomrs_gear | lcg_vomrs_validation_11 |
| lcg_vomrs_sixt | lcg_vomrs_validation_12 |
10:00 Switch SL5 VOMS service to production databases.
The SL5 VomsCern service will be switched from looking at the snapshot copies of the production databases to looking at the production databases themselves.
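As an illustration of the kind of change this implies, using alice as an example; the configuration file path below is hypothetical, but the account names come from the snapshot table above:
$ # hypothetical configuration path, for illustration only
$ sed -i 's/lcg_voms_validation_1/lcg_voms_alice/' /opt/glite/etc/voms/alice/voms.conf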
10:00 Switch Aliases
CERN network operations will be requested to move the voms.cern.ch and lcg-voms.cern.ch aliases from the SL4 hosts to the SL5 hosts.
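Once done, the switch can be verified from any host with standard DNS tools; both names should then resolve to the SL5 machines:
$ host voms.cern.ch
$ host lcg-voms.cern.ch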
10:00 Certificate Change
The voms.cern.ch certificate will be copied from the SL4 service to the SL5 service where
appropriate. As described above, this will break the voms-admin alias for the LEP VOs.
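One way to confirm the copied certificate is being presented, using standard openssl tooling; the expected subject DN is the one that appears in the vomses files above:
$ echo | openssl s_client -connect voms.cern.ch:8443 2>/dev/null | openssl x509 -noout -subject
subject= /DC=ch/DC=cern/OU=computers/CN=voms.cern.ch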
10:30 Schema Update
The schemas will be updated to work with the SL5 service.
11:00 Service Partially Reopened.
By 11:00 the voms and voms-admin services will be open for business. The vomrs web service
will be open also, however no changes made in vomrs will be propagated to voms yet. It is probably
best to avoid making changes until later, just to avoid confusion.
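A reasonable end-to-end check at this point is a plain proxy request against the production endpoints, i.e. without the test copies of the vomses files:
$ voms-proxy-destroy
$ voms-proxy-init --voms alice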
11:30 Vomrs into send mode.
Switch vomrs from nosend mode to send mode on the new SL5 service and start vomrs processing. This will also process any backlog from the previous step.
13:30 SLS Configuration
The SLS view of VOMS will of course be disrupted during the intervention. Some manual changes will be
required to point SLS at the new SL5 service, since SLS verifies the service by talking directly to the backend hostnames.
15:00 End
This is the latest time by which I expect everything to be completed.
16:00 Deconfigure SL4 service.
The SL4 service nodes will not be removed for a month at least, but they must be configured not
to resume vomrs activity upon reboot.
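A minimal sketch of that deconfiguration; the init-script names here are assumptions:
$ # assumed init-script names, for illustration only
$ chkconfig vomrs off
$ chkconfig voms off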
Following Days
- Verify that the lemon configuration for voms is monitoring something sensible.
- Reinstall the SL5 voms service partially from scratch to check it still works following the migration.
- Update the pages on VomsCern which detail the machine layout of the service.
- The operator procedures refer to explicit hostnames and need correcting.
Following Weeks
- Destroy the SL4 service, i.e. retire the machines.
Manual Steps
The SL5 install requires some post-install manual steps. These will be fixed or
worked into the documentation afterwards:
- Add rm -f /etc/nologin to /etc/rc.local (see the one-liner below) - GGUS:69714
- Remove new-format CRLs properly and install old-format ones - BUG:78349
- Reboot one more time.
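For the first item, the rc.local addition is a one-liner:
$ echo 'rm -f /etc/nologin' >> /etc/rc.local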
--
SteveTraylen - 19-Apr-2011