Elastic Data Analysis Clusters in the Cloud:
Using Scalr for Turnkey Cluster Deployment of Scalable Data Analysis Clusters

V. Hendrix (1), D. Benjamin (2)

(1) Lawrence Berkeley National Laboratory, Berkeley, CA, USA
(2) Duke University, Durham, NC, USA

Introduction

Scientists want to spend their time producing science results as quickly as they can. They do not want to spend it administering the data analysis clusters they use, nor do they want to wait a long time for compute resources when a conference deadline is looming. Yet this is exactly what happens to scientists at small institutions and universities. Our current work is an effort to give these scientists an easy and user-transparent way to scale out jobs in the cloud, so that they can spend their time analyzing data rather than doing system administration.

We have previously investigated several tools for contextualizing cluster nodes and managing clusters, and have chosen the Scalr cloud management tool. Scalr is a commercial software-as-a-service (SaaS) product that allows you to launch, auto-scale and configure clusters in the cloud. The commercial product is actively developed by Scalr, Inc. and is also released under an open source license. Scalr has a robust feature set and an active community. It provides customizable scaling algorithms, monitors load statistics for single nodes or entire clusters, and has both a command-line and a web interface.

This work explores whether the open source release of Scalr fits the needs of the scientist, by enabling us to define an elastic data analysis cluster (E-DAC) in all its complexity without burdening the scientist with its details. The end result is a turnkey deployment of an E-DAC in the cloud using open source Scalr.

Installing ScalrEDAC for ATLAS

This installation of Scalr for ATLAS is based on the installation notes from the Scalr Wiki site. Please understand that this is a work-in-progress. There are still things we need to tweak such as the Load Statistics visualization. We welcome any feedback you may have.

The current installation only works for configuring ATLAS T3 elastic data analysis clusters (E-DACs) on EC2. We are working on adding support for other cloud providers and will update this page as they are added.

This project lives in the CERN Subversion repository (https://svnweb.cern.ch/cern/wsvn/scalredac). All CERN users should have read access. If you have any problems, please contact me.

There are two major steps for installing and using Scalr.

  1. Create your Amazon Machine Image(s) (AMI).
  2. Install Scalr.

ATLAS Amazon Machine Image (AMI)

Create an AMI using boxgrinder:

We use boxgrinder appliances to create AMIs. You need your own Amazon account credentials. You can create either an instance-store AMI or an EBS-backed AMI.

  1. Find a boxgrinder meta-appliance: http://boxgrinder.org/download/boxgrinder-build-meta-appliance/
  2. Start the boxgrinder virtual machine and log in
  3. Check out the boxgrinder appliance definitions:
        $ svn export https://svn.cern.ch/reps/scalredac/tags/scalredac-0.0.1/bxgdr-appl/appliances /root/appliances --username=<cernuserid>
        $ svn export https://svn.cern.ch/reps/scalredac/tags/scalredac-0.0.1/bxgdr-appl/.boxgrinder /root/.boxgrinder --username=<cernuserid>
    
  4. Add your EC2 credentials
    1. Edit /root/.boxgrinder/config, adding your EC2 credentials.
    2. Copy your EC2 public and private key files to the boxgrinder meta-appliance. Make sure that the correct path is reflected in /root/.boxgrinder/config.
      If you are creating an instance-store AMI, add the following credentials to the "s3" plugin section:
           access_key:                  # (required)
           secret_access_key:           # (required)
           bucket:                      # (required)
           account_number:              # (required)
           cert_file: /path/to/cert.pem # required only for ami type
           key_file: /path/to/pk.pem    # required only for ami type

      If you are creating an EBS-backed AMI, you need to be using an EC2 meta-appliance. Add the following required config parameters to the "ebs" plugin section:
           access_key:                  # (required)
           secret_access_key:           # (required)
           bucket:                      # (required)
           account_number:              # (required)
      
  5. Now create the appliance. Choose the appropriate delivery option (-d) for either an instance-store or an EBS-backed AMI:
        $ cd /root/appliances
        $ boxgrinder-build centos5-atlas-t3.appl -p ec2 [ -d ami | -d ebs ]
    
  6. Once the build is finished, make a note of the AMI ID reported at the end of the build log; you are then ready to set up Scalr.
    ...
    I, [2012-07-09T14:39:19.449167 #12554]  INFO -- : Doing bundle/snapshot
    I, [2012-07-09T14:39:19.453485 #12554]  INFO -- : Bundling AMI...
    I, [2012-07-09T14:41:05.775048 #12554]  INFO -- : Bundling AMI finished.
    I, [2012-07-09T14:41:05.776484 #12554]  INFO -- : Uploading centos5-atlas-t3 AMI to bucket '<S3-BUCKET>'...
    I, [2012-07-09T14:56:30.511743 #12554]  INFO -- : Image for centos5-atlas-t3 successfully registered under id: <AMI-ID> (region: us-east-1).
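
The AMI ID can also be pulled out of a saved build log rather than read by eye. The following is a minimal sketch, assuming the build output was captured to a file and that the registration line uses the same wording as the sample log above. The function name extract_ami_id and the log file name are illustrative and are not part of boxgrinder.

```shell
#!/bin/sh
# Print the AMI ID found in a saved boxgrinder-build log.
# Assumes the "successfully registered under id: <AMI-ID> (region: ...)"
# wording shown in the sample log; prints nothing if no such line exists.
extract_ami_id() {
    sed -n 's/.*successfully registered under id: \([^ ]*\) (region: .*/\1/p' "$1" |
        tail -n 1
}
```

For example, run the build as `boxgrinder-build centos5-atlas-t3.appl -p ec2 -d ami 2>&1 | tee build.log` and then call `extract_ami_id build.log`.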
    

Scalr Installation

The following operating systems are supported: CentOS 6.2 and SLC 5.8. You may install Scalr on either a dedicated physical machine or a virtual machine. For a physical machine, skip to "Get the Scalr installation script" below. For a virtual machine, follow the boxgrinder instructions above and change step 5 to:
    $ cd /root/appliances
    $ boxgrinder-build centos6-scalr.appl -p ec2 [ -d ami | -d ebs ]

Get the Scalr installation script

Log in to your Scalr machine and export the installation script:
$ svn export https://svn.cern.ch/reps/scalredac/tags/scalredac-0.0.1/install-scalr.sh --username=<cernuserid>

Run Scalr Installation

The Scalr installation script has been tested only on fresh OS installations. If you are installing Scalr on a machine that has other dedicated functions, run it at your own risk. We suggest that you review the script before you run it.
$ sh install-scalr.sh
The script will prompt you for the following information:
  1. CernVM Proxy URL - This is the URL of the CernVM proxy. It will default to your local proxy, if available, which may not be appropriate.
  2. Amazon Machine Image (AMI) ID - This is the AMI ID of the image you uploaded.
  3. AMI Region - This is the region of the AMI you uploaded (e.g., us-east-1).
  4. MySQL password
  5. Information for creating a self-signed server certificate for use with SSL.
    1. Password:
      • You will be prompted for a password for the scalr-server.key.
      • You will need to enter it four times: twice to set it, once to use it for a certificate request and once to create a passwordless key for the Apache server.
    2. Distinguished Name (DN): You will also need to enter information for your server certificate's DN. Enter the information you think is appropriate, making very sure that you get the hostname correct.
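
The four password prompts match a common openssl sequence for a self-signed certificate. The sketch below illustrates that sequence; it is not taken from install-scalr.sh, whose actual commands are not shown on this page. Only the name scalr-server.key comes from the text above. The function name, the other file names, the 2048-bit key size and the 365-day validity are assumptions. The password is passed as an argument here to keep the sketch non-interactive, which is not advisable on a shared machine.

```shell
#!/bin/sh
# Sketch: create a self-signed server certificate in four openssl steps.
# Arguments: output directory, server hostname (used as the DN's CN), password.
# Run interactively (without -passout/-passin), steps 1-3 would prompt for the
# password four times in total: twice, once, and once.
make_self_signed_cert() {
    dir=$1; host=$2; pass=$3
    # 1. Password-protected private key (interactive: enter and verify).
    openssl genrsa -des3 -passout "pass:$pass" \
        -out "$dir/scalr-server.key" 2048 || return 1
    # 2. Certificate request; the CN must be the server's hostname.
    openssl req -new -key "$dir/scalr-server.key" -passin "pass:$pass" \
        -subj "/CN=$host" -out "$dir/scalr-server.csr" || return 1
    # 3. Passwordless copy of the key for the web server.
    openssl rsa -in "$dir/scalr-server.key" -passin "pass:$pass" \
        -out "$dir/scalr-server.nopass.key" || return 1
    # 4. Sign the request with the server's own key.
    openssl x509 -req -days 365 -in "$dir/scalr-server.csr" \
        -signkey "$dir/scalr-server.nopass.key" \
        -out "$dir/scalr-server.crt" || return 1
}
```

The hostname given as the CN is the part to double-check, since a mismatch between it and the server's real hostname is what causes certificate warnings later.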

Configure primary AWS account

Scalr needs to have AWS access.
  1. Retrieve your EC2 credentials from your account at the Amazon Web Services site.
  2. Copy your EC2 access certificate file to /var/www/scalr/app/etc/
  3. Copy your EC2 private key file to /var/www/scalr/app/etc/

These files will be named cert-XXXXXXXXXXXX.pem and pk-XXXXXXXXXXXX.pem, where XXXXXXXXXXXX is your Access Key name, which you specify on the "Settings->Core settings" page.
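
A quick check that both files are in place under the expected names can save a round of debugging later. This is a minimal sketch; the function name check_aws_key_files is hypothetical, and the directory and key name are passed in rather than assumed.

```shell
#!/bin/sh
# Verify that cert-<name>.pem and pk-<name>.pem exist and are readable.
# Arguments: directory, Access Key name (the XXXXXXXXXXXX part).
# Returns 0 if both files are present, 1 otherwise.
check_aws_key_files() {
    dir=$1; name=$2; status=0
    for f in "cert-$name.pem" "pk-$name.pem"; do
        if [ ! -r "$dir/$f" ]; then
            echo "missing or unreadable: $dir/$f" >&2
            status=1
        fi
    done
    return $status
}
```

For example: `check_aws_key_files /var/www/scalr/app/etc XXXXXXXXXXXX`, with XXXXXXXXXXXX replaced by your own key name.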

Test the Scalr Installation

To check whether your server meets (most) Scalr system requirements, run the testenvironment.php script in the app/www folder. Assuming you have completed the steps above, this can be accessed as http://your.scalr.domain/testenvironment.php from your web browser. This will find most major issues, although some (missing rewrite rules, missing cron tasks) go undetected.

Configure Scalr

Congratulations! You may now proceed to configuration.
  • Log in as admin / admin
  • Change Admin Password
    • Select admin>profile
    • Change the password
  • Edit Core Settings
    • Select Settings>Core settings
      • In AWS settings>S3cfg template, set your EC2 access_key and secret_key
      • In Application settings
        • Confirm that the 'Event handler URL' has your Scalr server's correct domain information and uses https://
        • Confirm that the 'Statistics URL' has your Scalr server's correct domain information and uses https://

Create a Scalr Account

  1. Go to Accounts>Manage
  2. Click the green plus sign and enter the appropriate account information

Troubleshooting

boxgrinder-build errors

[Errno 14] HTTP Error 404: Not Found Trying other mirror.

If you get the following error, try re-running the boxgrinder-build command:
I, [2012-07-09T14:17:46.951446 #11384]  INFO -- : Executing post operations after build...
F, [2012-07-09T14:20:14.928373 #11384] FATAL -- : Guestfs::Error: sh: http://download.fedoraproject.org/pub/epel/5/x86_64/repodata/6cc48decc40015a40b9ef60aa98abecc1f1e5438-filelists.sqlite.bz2: [Errno 14] HTTP Error 404: Not Found
Trying other mirror.
Error: failure: repodata/6cc48decc40015a40b9ef60aa98abecc1f1e5438-filelists.sqlite.bz2 from epel: [Errno 256] No more mirrors to try.
/usr/lib/ruby/gems/1.8/gems/boxgrinder-build-0.10.2/lib/boxgrinder-build/helpers/guestfs-helper.rb:295:in `sh'
/usr/lib/ruby/gems/1.8/gems/boxgrinder-build-0.10.2/lib/boxgrinder-build/helpers/guestfs-helper.rb:295:in `sh'
/usr/lib/ruby/gems/1.8/gems/boxgrinder-build-0.10.2/lib/boxgrinder-build/plugins/os/rpm-based/rpm-based-os-plugin.rb:118:in `execute_post'
/usr/lib/ruby/gems/1.8/gems/boxgrinder-build-0.10.2/lib/boxgrinder-build/plugins/os/rpm-based/rpm-based-os-plugin.rb:117:in `each'
/usr/lib/ruby/gems/1.8/gems/boxgrinder-build-0.10.2/lib/boxgrinder-build/plugins/os/rpm-based/rpm-based-os-plugin.rb:117:in `execute_post'
/usr/lib/ruby/gems/1.8/gems/boxgrinder-build-0.10.2/lib/boxgrinder-build/plugins/os/rpm-based/rpm-based-os-plugin.rb:97:in `build_with_appliance_creator'
/usr/lib/ruby/gems/1.8/gems/boxgrinder-build-0.10.2/lib/boxgrinder-build/helpers/image-helper.rb:130:in `customize'
/usr/lib/ruby/gems/1.8/gems/boxgrinder-build-0.10.2/lib/boxgrinder-build/helpers/guestfs-helper.rb:176:in `customize'
/usr/lib/ruby/gems/1.8/gems/boxgrinder-build-0.10.2/lib/boxgrinder-build/helpers/guestfs-helper.rb:123:in `initialize_guestfs'
/usr/lib/ruby/gems/1.8/gems/boxgrinder-build-0.10.2/lib/boxgrinder-build/helpers/guestfs-helper.rb:169:in `prepare_guestfs'
/usr/lib/ruby/gems/1.8/gems/boxgrinder-build-0.10.2/lib/boxgrinder-build/helpers/guestfs-helper.rb:123:in `initialize_guestfs'
/usr/lib/ruby/gems/1.8/gems/boxgrinder-build-0.10.2/lib/boxgrinder-build/helpers/guestfs-helper.rb:77:in `log_callback'
/usr/lib/ruby/gems/1.8/gems/boxgrinder-build-0.10.2/lib/boxgrinder-build/helpers/guestfs-helper.rb:123:in `initialize_guestfs'
/usr/lib/ruby/gems/1.8/gems/boxgrinder-build-0.10.2/lib/boxgrinder-build/helpers/guestfs-helper.rb:173:in `customize'
/usr/lib/ruby/gems/1.8/gems/boxgrinder-build-0.10.2/lib/boxgrinder-build/helpers/image-helper.rb:129:in `customize'
/usr/lib/ruby/gems/1.8/gems/boxgrinder-build-0.10.2/lib/boxgrinder-build/plugins/os/rpm-based/rpm-based-os-plugin.rb:76:in `build_with_appliance_creator'
/usr/lib/ruby/gems/1.8/gems/boxgrinder-build-0.10.2/lib/boxgrinder-build/plugins/os/rhel/rhel-plugin.rb:33:in `build_rhel'
/usr/lib/ruby/gems/1.8/gems/boxgrinder-build-0.10.2/lib/boxgrinder-build/plugins/os/centos/centos-plugin.rb:44:in `execute'
/usr/lib/ruby/gems/1.8/gems/boxgrinder-build-0.10.2/lib/boxgrinder-build/plugins/base-plugin.rb:172:in `run'
/usr/lib/ruby/gems/1.8/gems/boxgrinder-build-0.10.2/lib/boxgrinder-build/appliance.rb:184:in `execute_plugin'
/usr/lib/ruby/gems/1.8/gems/boxgrinder-build-0.10.2/lib/boxgrinder-build/appliance.rb:207:in `execute_without_userchange'
/usr/lib/ruby/gems/1.8/gems/boxgrinder-build-0.10.2/lib/boxgrinder-build/appliance.rb:129:in `execute_plugin_chain'
/usr/lib/ruby/gems/1.8/gems/boxgrinder-build-0.10.2/lib/boxgrinder-build/appliance.rb:125:in `each'
/usr/lib/ruby/gems/1.8/gems/boxgrinder-build-0.10.2/lib/boxgrinder-build/appliance.rb:125:in `execute_plugin_chain'
/usr/lib/ruby/gems/1.8/gems/boxgrinder-build-0.10.2/lib/boxgrinder-build/appliance.rb:164:in `create'
/usr/lib/ruby/gems/1.8/gems/boxgrinder-build-0.10.2/bin/boxgrinder-build:203
/usr/bin/boxgrinder-build:19:in `load'
/usr/bin/boxgrinder-build:19
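
Since the fix for this error is simply to run the command again, the re-run can be automated. The helper below is a generic sketch; the name retry, the RETRY_DELAY variable and the 30-second default pause are choices made here, not part of boxgrinder.

```shell
#!/bin/sh
# Run a command up to N times, stopping at the first success.
# Usage: retry N command [args...]
# Pauses RETRY_DELAY seconds (default 30) between attempts.
retry() {
    attempts=$1; shift
    n=1
    while :; do
        "$@" && return 0
        [ "$n" -ge "$attempts" ] && return 1
        echo "attempt $n failed; retrying..." >&2
        n=$((n + 1))
        sleep "${RETRY_DELAY:-30}"
    done
}
```

For example: `retry 3 boxgrinder-build centos5-atlas-t3.appl -p ec2 -d ami`. If the error persists across several attempts, the mirror problem is probably not transient and retrying will not help.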

Error with Amazon account number

If you get the following error, try specifying the account number with dashes; that is how I was able to get this to work:

I, [2012-07-09T16:15:13.331212 #8270]  INFO -- : Validating appliance definition from centos5-atlas-t3.appl file...
I, [2012-07-09T16:15:13.349843 #8270]  INFO -- : Appliance definition is valid.
I, [2012-07-09T16:15:13.352360 #8270]  INFO -- : Validating appliance definition from ./centos5-base.appl file...
I, [2012-07-09T16:15:13.361254 #8270]  INFO -- : Appliance definition is valid.
F, [2012-07-09T16:15:13.498129 #8270] FATAL -- : BoxGrinder::PluginValidationError: Please specify a valid 'account_number' key in BoxGrinder configuration file: '/root/.boxgrinder/config' or use CLI '--delivery-config account_number:DATA' argument. See http://boxgrinder.org/tutorials/boxgrinder-build-plugins/#EBS_Delivery_Plugin for more info
/usr/lib/ruby/gems/1.8/gems/boxgrinder-build-0.10.2/lib/boxgrinder-build/plugins/base-plugin.rb:152:in `validate_plugin_config'
/usr/lib/ruby/gems/1.8/gems/boxgrinder-build-0.10.2/lib/boxgrinder-build/plugins/base-plugin.rb:151:in `each'
/usr/lib/ruby/gems/1.8/gems/boxgrinder-build-0.10.2/lib/boxgrinder-build/plugins/base-plugin.rb:151:in `validate_plugin_config'
/usr/lib/ruby/gems/1.8/gems/boxgrinder-build-0.10.2/lib/boxgrinder-build/plugins/delivery/ebs/ebs-plugin.rb:64:in `validate'
/usr/lib/ruby/gems/1.8/gems/boxgrinder-build-0.10.2/lib/boxgrinder-build/appliance.rb:103:in `send'
/usr/lib/ruby/gems/1.8/gems/boxgrinder-build-0.10.2/lib/boxgrinder-build/appliance.rb:103:in `initialize_plugin'
/usr/lib/ruby/gems/1.8/gems/boxgrinder-build-0.10.2/lib/boxgrinder-build/appliance.rb:102:in `each'
/usr/lib/ruby/gems/1.8/gems/boxgrinder-build-0.10.2/lib/boxgrinder-build/appliance.rb:102:in `initialize_plugin'
/usr/lib/ruby/gems/1.8/gems/boxgrinder-build-0.10.2/lib/boxgrinder-build/appliance.rb:80:in `initialize_plugins'
/usr/lib/ruby/gems/1.8/gems/boxgrinder-build-0.10.2/lib/boxgrinder-build/appliance.rb:160:in `create'
/usr/lib/ruby/gems/1.8/gems/boxgrinder-build-0.10.2/bin/boxgrinder-build:203
/usr/bin/boxgrinder-build:19:in `load'
/usr/bin/boxgrinder-build:19
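
To avoid typing the dashes by hand, the account number can be normalised before it goes into the configuration. The sketch below assumes a 12-digit account number grouped as NNNN-NNNN-NNNN; the function name format_account_number is hypothetical.

```shell
#!/bin/sh
# Format a 12-digit AWS account number as NNNN-NNNN-NNNN.
# Input that already contains dashes or spaces is accepted; anything that
# does not reduce to exactly 12 digits is rejected with exit status 1.
format_account_number() {
    digits=$(printf '%s' "$1" | tr -d ' -')
    case $digits in
        ''|*[!0-9]*) echo "not a 12-digit account number: $1" >&2; return 1 ;;
    esac
    [ "${#digits}" -eq 12 ] || {
        echo "not a 12-digit account number: $1" >&2
        return 1
    }
    printf '%s\n' "$digits" | sed 's/^\(....\)\(....\)\(....\)$/\1-\2-\3/'
}
```

The result can be written to the account_number key in /root/.boxgrinder/config, or passed on the command line with the `--delivery-config account_number:DATA` argument named in the error message.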

Server Farms

Condor Cluster

Configure a Condor Cluster

  1. Log in as the Scalr user you created previously
  2. Add an EC2 environment (see How to add AWS to Scalr instructions on the Scalr Wiki)
  3. Create a Server Farm
    1. Select Server Farms->Build New
    2. Farm Tab
      • name: Condor
      • description: A simple Condor cluster.
    3. Roles Tab: We will be adding three roles: CondorHead, CondorWorker and CondorInteractive
      • Click the green plus sign to add the roles
      • Select Base images
      • Add the roles in the following order.
        1. CondorHead: Select the role after you add it. You should now see a list of buttons stacked vertically
          • Scaling options: Maximum Instances=1
          • Placement and type: Select your preferred EC2 instance type.
        2. CondorWorker: Select the role after you add it. You should now see a list of buttons stacked vertically
          • Scaling options: Minimum Instances=N, Maximum Instances=N (select the number of worker nodes you prefer)
          • Placement and type: Select your preferred EC2 instance type.
        3. CondorInteractive: Select the role after you add it. You should now see a list of buttons stacked vertically
          • Scaling options: Maximum Instances=1
          • Placement and type: Select your preferred EC2 instance type.
        4. Click Save
    4. Start the Condor Server Farm
      • Go to Server Farms->View all
      • Find the Server Farm you just created
      • Select Options->Launch and wait for your server farm to come up

Run Condor Jobs

Condor jobs can be run from the CondorInteractive node as the atlasadmin user.

You first need to download the SSH private key from Scalr.

  • Go to Server Farms->Condor->Roles
  • Select Options->Download SSH private key

Now you can log in to the interactive node and run jobs as atlasadmin:

$ ssh -i /path/to/private.pem ec2-user@ip_address_of_interactive_node
$ sudo bash 
$ su - atlasadmin
$ condor_status

Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

ip-NN-NN-NN-NN.e LINUX      X86_64 Unclaimed Benchmar 0.540   615  0+00:00:04
                     Total Owner Claimed Unclaimed Matched Preempting Backfill

        X86_64/LINUX     1     0       0         1       0          0        0

               Total     1     0       0         1       0          0        0
$ condor_submit job.sub 
Submitting job(s)........
8 job(s) submitted to cluster 1.
$ condor_q

-- Submitter: ip-NN-NN-NN-NN.ec2.internal : <NN.NN.NN.NN:9764> :ip-NN-NN-NN-NN.ec2.internal
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
   1.0   atlasadmin      7/10 15:13   0+00:00:00 R  0   0.0  sleep 30s         
   1.1   atlasadmin      7/10 15:13   0+00:00:00 I  0   0.0  sleep 30s         
   1.2   atlasadmin      7/10 15:13   0+00:00:00 I  0   0.0  sleep 30s         
   1.3   atlasadmin      7/10 15:13   0+00:00:00 I  0   0.0  sleep 30s         
   1.4   atlasadmin      7/10 15:13   0+00:00:00 I  0   0.0  sleep 30s         
   1.5   atlasadmin      7/10 15:13   0+00:00:00 I  0   0.0  sleep 30s         
   1.6   atlasadmin      7/10 15:13   0+00:00:00 I  0   0.0  sleep 30s         
   1.7   atlasadmin      7/10 15:13   0+00:00:00 I  0   0.0  sleep 30s         

8 jobs; 0 completed, 0 removed, 8 idle, 0 running, 0 held, 0 suspended
$ 
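
The contents of job.sub are not shown in the session above. A minimal submit description file that queues eight `sleep 30s` jobs, consistent with the condor_q listing, might look like the one written by the sketch below. The vanilla universe, the path /bin/sleep and the output, error and log file names are assumptions, as is the helper name write_job_sub.

```shell
#!/bin/sh
# Write a minimal HTCondor submit description file to the given path.
# Assumed contents: the original job.sub is not shown on this page.
# The quoted 'EOF' keeps $(Cluster) and $(Process) literal so that Condor,
# not the shell, expands them.
write_job_sub() {
    cat > "$1" <<'EOF'
universe   = vanilla
executable = /bin/sleep
arguments  = 30s
output     = sleep.$(Cluster).$(Process).out
error      = sleep.$(Cluster).$(Process).err
log        = sleep.$(Cluster).log
queue 8
EOF
}
```

Calling `write_job_sub job.sub` as atlasadmin on the interactive node produces a file that can be passed to condor_submit as in the session above.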


Progress & Future Plans

Slides

Scalr Roles

We plan to add the following Scalr roles:
  • PanDA
  • PROOF

Cloud Providers

We are investigating the use of these additional cloud platforms:
  • CloudStack
  • OpenStack


Major updates:
-- ValHendrix - 26-Jun-2012

Topic revision: r7 - 2012-07-12 - ValHendrix
 