Recovering Master Node



What to do if the MASTER (lcg-sft-publish.cern.ch or lxb2001) crashes?

The current SLAVE machine should be reconfigured to become the new MASTER, and a new SLAVE should be installed on the crashed machine.

  1. Stop DB replication on the current SLAVE machine (as root):
    $ mysql -p<root_passwd>
    mysql> stop slave;
        
  2. Stop info directory synchronisation on the current SLAVE machine by removing the /etc/cron.d/lcg-sft-syncInfo file.
  3. Configure the MASTER on the current SLAVE machine using the master-specific instructions in sft-server-installation.txt and the configuration files in conf/master.tar.
  4. Remove the lcg-sft-publish alias from the crashed MASTER machine (lxb2001) by updating its network data at: http://network.cern.ch/sc/fcgi/sc.fcgi?Action=SelectForUpdate
  5. Assign the lcg-sft-publish alias to the current SLAVE machine (lxb2089) by updating its network data. From this point on, the current SLAVE machine is the new MASTER.
  6. Install a new SLAVE SFT server on the old MASTER machine using the instructions in InstallServices.
    Note: Make sure that DB replication and info directory synchronisation are set up properly during the configuration.
  7. Finish the configuration of the SLAVE SFT server using the config files in conf/slave.tar.
  8. Check that the SLAVE is working properly using the instructions in the wiki entry CheckingSlave.
  9. Move the lcg-sft alias from the new MASTER to the newly installed SLAVE.
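
Steps 1 and 2 above could be scripted roughly as follows. This is only a sketch under stated assumptions: the DRY_RUN switch, the run helper and the function names are illustrative and not part of the SFT installation, and the mysql client is assumed to be on the PATH of the root user. By default the script only prints what it would do; set DRY_RUN=0 to execute the commands for real.

```shell
#!/bin/sh
# Sketch of steps 1 and 2, to be run as root on the current SLAVE machine.
# DRY_RUN and CRON_FILE are illustrative variables, not part of the SFT setup.
DRY_RUN="${DRY_RUN:-1}"
CRON_FILE="${CRON_FILE:-/etc/cron.d/lcg-sft-syncInfo}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Step 1: stop DB replication.
# -p without a value makes the mysql client prompt for the root password.
stop_replication() {
    run mysql -u root -p -e "STOP SLAVE;"
}

# Step 2: stop info directory synchronisation by removing the cron entry.
stop_info_sync() {
    run rm -f "$CRON_FILE"
}

stop_replication
stop_info_sync
```

Running it once in the default dry-run mode before a real failover gives a chance to check the cron file path and the mysql invocation without touching the machine.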



-- Main.jnovak - 17 Aug 2005
Topic revision: r2 - 2005-08-18 - JuliaAnnGraySecondary