
This page is deprecated. Please have a look at http://sftweb.cern.ch/spi/

Replicating AFS Volumes

(The following is also available as PDF file)

General info

Contact: Rainer Toebicke, Bernard Antoine

Why create replicas

Any access to a file causes some information to be exchanged between the client machine and the AFS server in question. Part of this information is kept on the server as a call-back, used to notify the client in case the file (or directory) changes. The number of call-backs is limited (about 40k), but since a call-back is established for each file, each directory and each client, this pool can be exhausted very quickly, for example when a lot of jobs start on the batch farms and access the LCG AA s/w.

A volume (not a directory) can be "replicated", in which case the volume is made "readonly" and the number of these call-backs is reduced to one per volume and client. The first replica of a volume is kept on the same server as the original volume; only the layout structure of the directories and files ("vnodes", the AFS equivalent of the unix inodes) is duplicated. Creating a replica this way usually takes a short time (we have seen 5-200 sec, the bulk being less than 10 sec). Any further replica is stored on another server, and all the data is copied over to the new server (which will obviously take significantly more time).
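
The effect on the call-back pool can be illustrated with a little arithmetic. The following is only a rough sketch: the pool size is the approximate 40k figure quoted above, while the number of jobs and the number of files and directories touched per job are made-up values chosen for illustration.

```python
# Approximate size of the call-back pool on the server (figure quoted above).
CALLBACK_POOL = 40_000


def callbacks_unreplicated(clients, entries_per_client):
    """One call-back per file or directory accessed, for each client."""
    return clients * entries_per_client


def callbacks_replicated(clients, volumes=1):
    """One call-back per volume and client once the volume is replicated."""
    return clients * volumes


# Hypothetical load: these two numbers are assumptions, not measurements.
jobs = 500      # batch jobs starting at about the same time
entries = 200   # files + directories each job touches in the volume

rw = callbacks_unreplicated(jobs, entries)
ro = callbacks_replicated(jobs)

print("r/w volume:", rw, "call-backs, pool exhausted:", rw > CALLBACK_POOL)
print("replica   :", ro, "call-backs, pool exhausted:", ro > CALLBACK_POOL)
```

With these assumed numbers the unreplicated volume would need more call-backs than the pool holds, while the replicated volume needs only one per job.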

The copying (cloning) of a file is triggered only when a file or directory on a replicated volume is actually modified. The read-only replica still shows the old file; the changes are visible only through the r/w path. The changes then need to be synchronized to all replicas.

AFS volumes and mount points

AFS deals with individual volumes which are mounted somewhere in a directory tree. In principle, any volume can be mounted at any given place in the tree (provided the directory does not already exist). The directory under which the volume is mounted is called a "mount point"; for the rest of this document, we will call the directory in which the volume is mounted the "parent directory".

Example:

"/afs/cern.ch/sw/lcg/app/releases/SEAL" is a mount point for the volume p.sw.lcg.seal inside the parent directory "/afs/cern.ch/sw/lcg/app/releases/".

Accessing directories and volumes

There are always two ways to access any given directory in the AFS tree:

  • a) through "/afs/cern.ch/sw/lcg/app/releases/SEAL" (the usual way)
  • b) through "/afs/.cern.ch/sw/lcg/app/releases/SEAL" (note the "." in front of the "cern.ch"!)

Version b) is called the "r/w path". Any directory accessed this way is accessed in r/w mode (the usual AFS ACL entries of course still control access). If you do not have any replicas in your tree, this is exactly the same as accessing the directory/volume through a). Accessing a directory through a) follows the r/o path as far as possible; once the traversal finds a non-replicated (r/w) volume, it switches automatically to the r/w path and stays on it from there on. Consequently, every volume encountered on the way to a volume/mount point which is intended to be readonly/replicated needs to be readonly/replicated as well.
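
The traversal rule described above can be modelled in a few lines. This is a simplified sketch of the rule as stated in the text, not of the real AFS client: the function name `resolve` and the volume names "top" and "parent" are invented for the example; only p.sw.lcg.seal comes from this page.

```python
def resolve(volumes, start_on_rw_path=False):
    """Model how each volume on a path is accessed.

    volumes: list of (volume_name, has_replica) pairs, ordered from the
             top of the tree down to the target volume.
    Returns a list of (volume_name, "ro" or "rw").
    """
    on_rw_path = start_on_rw_path
    result = []
    for name, has_replica in volumes:
        # The first non-replicated volume switches the traversal to the
        # r/w path, and it stays there for everything below.
        if not has_replica:
            on_rw_path = True
        result.append((name, "rw" if on_rw_path else "ro"))
    return result


# All volumes on the way are replicated: the target is seen read-only.
all_replicated = [("top", True), ("parent", True), ("p.sw.lcg.seal", True)]
print(resolve(all_replicated))

# The parent volume has no replica: the target is seen in r/w mode,
# even though it has a replica itself.
broken_chain = [("top", True), ("parent", False), ("p.sw.lcg.seal", True)]
print(resolve(broken_chain))

# Entering through /afs/.cern.ch/... starts on the r/w path right away.
print(resolve(all_replicated, start_on_rw_path=True))
```

The second case is the reason why all volumes above a replicated volume have to be replicated too.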

Creating replicas

To create a replica of a volume, simply issue the command: afs_admin create_replica -p ' ' <volume> (the -p ' ' is needed to distinguish it from an older version of the same command with different semantics). The system will trigger the replication and issue the command to synchronize the replica. A typical output looks like:

apiwgs01:lcg > afs_admin create_replica -p ' ' p.sw.lcg.seal  
afs_admin: Warning: no R/O volume yet at afs53/a (R/W site)
         Ignoring explicitly specified pool/server/partition
   Adding a new site ... done 
Added replication site afs53 /vicepa for volume p.sw.lcg.seal 
p.sw.lcg.seal
      RWrite: 537293618     Backup: 537293620
      number of sites -> 2
        server afs53.cern.ch partition /vicepa RW Site 
        server afs53.cern.ch partition /vicepa RO Site  -- Not released
This is a complete release of volume 537293618 
Cloning RW volume 4290704928 to permanent RO... done 
Getting status of RW volume 537293619... done 
Ending cloning transaction on RW volume 537293619... done 
Starting transaction on RO clone volume 537293619... done 
Setting volume flags for volume 537293619... done 
Ending transaction on volume 537293619... done 
Replacing VLDB entry for p.sw.lcg.seal... done 
Starting transaction on cloned volume 537293619... done 
updating VLDB ... done 
Released volume p.sw.lcg.seal successfully 

If you then run "fs lq" on the mount point, you will see that the volume name now has a ".readonly" suffix; this indicates that the volume is a replica:

pfeiffer > fs lq /afs/cern.ch/sw/lcg/app/releases/SEAL/
Volume Name                   Quota      Used %Used   Partition 
p.sw.lcg.seal.readonly      5500000   5476036  100%       26%    WARNING 
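
If this check has to be scripted, the ".readonly" suffix can be read from the "fs lq" output. The following is a minimal sketch that only parses text shaped like the sample above (one header line followed by one volume line); the helper names are invented for the example, and real output may differ in details.

```python
SAMPLE = """\
Volume Name                   Quota      Used %Used   Partition
p.sw.lcg.seal.readonly      5500000   5476036  100%       26%    WARNING
"""


def volume_from_fs_lq(output):
    """Return the volume name from the first line after the header."""
    lines = [line for line in output.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("expected a header line and a volume line")
    return lines[1].split()[0]


def is_replica(output):
    """True if the listed volume name carries the ".readonly" suffix."""
    return volume_from_fs_lq(output).endswith(".readonly")


print(volume_from_fs_lq(SAMPLE))
print(is_replica(SAMPLE))
```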

Modifying replicated volumes

You can always add new volumes to the system using the r/w path (option b). They appear immediately in r/w mode and you can use them directly (in r/w mode until you replicate these volumes):

afs_admin create -q 1000000 /afs/.cern.ch/sw/external/gccxml/0.6.0_patch2 p.sw.lcg.gccxml060

(note that /afs/cern.ch/sw/external/gccxml/0.6.0_patch2 would not work as this is a readonly path).

This new volume is then available in read-write mode also with the /afs/cern.ch/... path until you create a replica using afs_admin create_replica p.sw.lcg.gccxml060

You can add/remove/change the content of a volume using the r/w path (/afs/.cern.ch/...). After you're done with your changes, simply do:

afs_admin vos_release <volume>

(here <volume> is the volume in r/w mode, i.e. without the ".readonly" extension)
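
The two steps (edit through the r/w path, then release) can be prepared in a small helper script. This is a sketch only: it builds the r/w path and the release command as strings and does not execute anything; the function names are invented, and the cell name "cern.ch" is taken from the paths on this page.

```python
import shlex


def rw_path(path, cell="cern.ch"):
    """Turn /afs/<cell>/... into /afs/.<cell>/... (the r/w path)."""
    prefix = f"/afs/{cell}/"
    if path.startswith(prefix):
        return f"/afs/.{cell}/" + path[len(prefix):]
    return path  # already an r/w path, or not below /afs/<cell>/


def release_command(volume):
    """Build the release command for the r/w volume name."""
    suffix = ".readonly"
    if volume.endswith(suffix):
        volume = volume[: -len(suffix)]  # the r/w name has no suffix
    return ["afs_admin", "vos_release", volume]


# 1. Make the changes below this path ...
print(rw_path("/afs/cern.ch/sw/lcg/app/releases/SEAL"))
# 2. ... then run this command to synchronize the replicas.
print(shlex.join(release_command("p.sw.lcg.seal.readonly")))
```

Keeping the command as a list makes it easy to pass to subprocess.run later, once the printed command has been checked by hand.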

Short info

Creating replicas on the same server only duplicates the inodes. Stat commands (which make up most requests) are then answered faster. However, if something is changed in a volume and the replica is not updated, the replica remains a physical copy of the former state.

History: replicas were created for externals and AA projects on March 24, 2005.

Readonly and r/w volumes:
  • /afs/cern.ch : replica volume read only
  • /afs/.cern.ch : r/w volumes

Creation of replicas: once!
afs_admin create_replica <volume>
ex: afs_admin create_replica p.sw.lcg.root

Create a replica if the volume gets many hits (check the info at http://pclella.cern.ch/ and choose afs53).

Update replicas:
Run "vos_release" after adding, removing or changing files in a volume:
afs_admin vos_release <volume>|<dir>
ex: afs_admin vos_release p.sw.lcg.root

Deletion of replicas:
afs_admin delete_replica <volume>
ex: afs_admin delete_replica p.sw.lcg.root

Once the last replica is deleted, the volume is available in r/w mode through the /afs/cern.ch/... tree.

Check status:

Not all volumes have a replica; check the list with:
  • afs_admin l_p swlcg
  • fs lq . 
    will show "readonly" or not
ex: p.sw.lcg.*(.readonly)

Rule of naming:

p.sw.lcg.nnnnnnvvvpp where :
  • 6 char for package (if name is longer, remove all vowels)
  • 3 char for version (remove ".", '-' and/or "_" if existing, change "patch" to "p")
  • 2 char for platform (with a coding rule to be defined)

For new versions of packages (or new packages of course) we will create a new volume without the platform part and "fill" it with all the current platforms.

If a new platform is added for a package/version that already exists, a new volume for the new platform is created.

If the version number after the removal of the separators is still longer than three chars, a mapping will be established for this package and documented in this place.
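
The naming rule can be written down as a small function. This is a sketch under stated assumptions: lower-casing the package name is inferred from the existing volume names (e.g. p.sw.lcg.seal) and is not part of the rule as written, the platform code is passed through unchanged because its coding rule is not yet defined, and any name or version that is still too long after applying the rule raises an error, since such cases need a documented mapping rather than a guess. The package "valgrind" is used purely as an illustration.

```python
import re

VOWELS = set("aeiou")


def package_part(name):
    """Up to 6 chars for the package; longer names lose all vowels."""
    name = name.lower()  # assumption: volume names are lower case
    if len(name) > 6:
        name = "".join(c for c in name if c not in VOWELS)
    return name


def version_part(version):
    """Change "patch" to "p" and remove the ".", "-" and "_" separators."""
    version = version.replace("patch", "p")
    return re.sub(r"[.\-_]", "", version)


def volume_name(package, version="", platform=""):
    """Build p.sw.lcg.<package><version><platform>, or fail loudly."""
    pkg = package_part(package)
    ver = version_part(version)
    if len(pkg) > 6:
        raise ValueError(f"package part {pkg!r} is longer than 6 chars")
    if len(ver) > 3:
        raise ValueError(f"version part {ver!r} needs a documented mapping")
    if len(platform) > 2:
        raise ValueError(f"platform part {platform!r} is longer than 2 chars")
    return "p.sw.lcg." + pkg + ver + platform


print(volume_name("gccxml", "0.6.0"))
print(package_part("valgrind"))
print(version_part("0.6.0_patch2"))
```

Note that the version "0.6.0_patch2" does not fit into three characters after the separators are removed, so it falls under the mapping clause above.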

Topic revision: r31 - 2012-09-12 - AntonKarneyeu
 