HammerCloud
The quick user guide for HammerCloud. All you need to use it!
1. Introduction
HammerCloud is a Distributed Analysis testing system. It can test your site(s) and report the results of those tests.
It is used to perform basic site validation, help commission new sites, evaluate SW changes and compare
site performance.
You can find it here: HammerCloud, in two flavours: ATLAS and CMS (under development).
If you have any comments or questions about the project, you can post them on its Savannah project. Find it here.
2. Snooping HammerCloud
Take 5 minutes to explore the site.
3. Tests
Tests are the human-readable part of HammerCloud. A test instance defines all the parameters needed to submit
jobs to the GRID. The basic parameters are:
- id: unique among tests.
- startime: when the test starts.
- endtime: when the test ends.
- state: state of the test. List of possible states below.
- host: which machine submits the jobs.
- sites: where the jobs are sent.
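As a rough illustration, the basic parameters above could be modelled as a small record. This is only a sketch: the class name `HCTest`, the field names, and the example values are assumptions made for illustration and are not taken from the HammerCloud code.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class HCTest:
    """Hypothetical record mirroring the basic test parameters."""
    test_id: int        # unique among tests
    start: datetime     # when the test starts
    end: datetime       # when the test ends
    state: str          # state of the test (placeholder values only)
    host: str           # machine submitting the jobs
    sites: List[str] = field(default_factory=list)  # where the jobs are sent

    def is_within_window(self, now: datetime) -> bool:
        # True while `now` lies between the start and end of the test.
        return self.start <= now <= self.end


example = HCTest(
    test_id=1,
    start=datetime(2010, 6, 1),
    end=datetime(2010, 6, 2),
    state="example-state",   # placeholder, not a real HammerCloud state
    host="host-a",           # placeholder host name
    sites=["SITE_A"],        # placeholder site name
)
```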
HammerCloud provides two test types, stress and functional. Each has a specific purpose,
but the main objective of HammerCloud stays the same.
Both test types provide metrics and results derived from the submitted jobs. Over time,
HammerCloud has become a tool for testing, analyzing, evaluating and comparing sites with a few clicks.
If you want to know more about the two test types, keep reading this chapter. Otherwise, jump to
the next chapter.
3.1. Stress Test
A stress test is a Distributed Analysis On-Demand test: a large-scale test that uses real analysis jobs to exercise one or many sites simultaneously.
It can be seen as a brute-force test.
A list of possible usages:
- Help commission new sites
- Evaluate changes to site infrastructure
- Evaluate SW changes
- Compare site performances
3.2. Functional Test
Functional tests perform basic site validation by submitting a few jobs at frequent intervals over long periods of time. This can be seen as a "ping" test.
A list of possible usages:
- Perform basic site validation
4. The first test
4.1. Getting an account
To send tests you need a HammerCloud account. Please get in touch through the Savannah project page, stating the VO to which you want to send tests.
4.2. Creating your first test
Go to the HammerCloud page for your VO ( http://voatlas49.cern.ch/###_VO_### ) and log in to the Administration page (obvious, isn't it?).
To create a test, select the desired start and end times and the test_template you need. You can choose any stress template, but only deactivated functional templates.
If you need more details about the test templates, please read
here first; if your question is not answered there, please ask a HammerCloud operator.
Save it, and your test will be created.
4.3. Modifying your first test
All parameters from the template have been copied into your first test. Note that you might not have enough permissions to follow every link (you can modify neither templates nor files).
You can add the sites or clouds you want to your test and configure the
submission algorithm. Use the following settings for every site or cloud you add:
- Resubmit enabled: checked.
- Resubmit force: unchecked.
- Num datasets per bulk: 1.
- Min queue depth: 5.
- Max running jobs: 5.
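The recommended settings above can be written down as a small mapping applied to every site or cloud added to the test. This is a sketch only: the key names and the helper `settings_for` are hypothetical and do not correspond to actual HammerCloud form fields.

```python
# Hypothetical key names; the values are the ones recommended above.
RECOMMENDED_SETTINGS = {
    "resubmit_enabled": True,
    "resubmit_force": False,
    "num_datasets_per_bulk": 1,
    "min_queue_depth": 5,
    "max_running_jobs": 5,
}


def settings_for(sites_or_clouds):
    # Give every site or cloud its own copy of the recommended settings.
    return {name: dict(RECOMMENDED_SETTINGS) for name in sites_or_clouds}


per_site = settings_for(["SITE_A", "SITE_B"])  # placeholder site names
```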
You can add hosts, dspatterns and users as well; these are explained in later chapters.
In any case, add your user to the users list. That's good practice.
Two more steps, and your test will start running.
Save your modifications and tick your test in the tests list. In the actions drop-down list, choose
'Send selected tests for approval' and click 'Go'. That's all! A HammerCloud operator
will check it and approve, modify or reject it.
Go to the main page and select your test. You may also want to read the Metrics chapter.
5. Test Templates
A Test Template contains all the information the HammerCloud logic needs to work. Some of the parameters are completely orthogonal and have no dependencies on each other, but others are coupled, so that A will not work without B. That's why test templates are configured beforehand. Let's see what's in a test template.
5.1. Parameters
Some of the parameters are common to every VO. They are the skeleton of the template and are loosely coupled. Others are not: the VO-dependent parameters are highly coupled and have to be explained separately for each VO.
5.1.1. Common
- Type information
  - category: either stress or functional.
  - description: probably the most important field. This is what users see when selecting a test template. The more accurate, the better.
  - period: deprecated.
  - lifetime: only meaningful for active functional templates. How long the test will run (in days).
  - active: only meaningful for functional templates. If it is active, the HammerCloud robot starts a functional test from this template whenever no other test is running with this template. If it is not active, users can use it like a normal stress template: the lifetime parameter is ignored and the user can set a start and endtime.
- Files
  - test script: the main HammerCloud script. Mainly used for development. Nothing to worry about.
  - gangabin: the Ganga version used to submit the jobs.
  - extraargs: probably empty. Some jobs need extra parameters; this is where they are added.
- Hosts
- Clouds
- Sites
- Dspatterns
- Users
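The interplay between category, active and lifetime can be sketched as a small decision function for the HammerCloud robot. The `Template` class, the function name and the way running tests are tracked are all assumptions made for illustration; the real robot is not shown in this guide.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Template:
    """Hypothetical subset of the common template parameters."""
    template_id: int
    category: str        # "stress" or "functional"
    active: bool = False
    lifetime: int = 0    # days, only meaningful for active functional templates


def robot_schedule(template, running_template_ids, now):
    """Return (start, end) if the robot should start a test, else None."""
    # Only active functional templates are started automatically.
    if template.category != "functional" or not template.active:
        return None
    # Do nothing while another test is already running with this template.
    if template.template_id in running_template_ids:
        return None
    # The lifetime (in days) fixes how long the new test runs.
    return now, now + timedelta(days=template.lifetime)
```

For inactive functional templates and for stress templates the function returns None, which corresponds to the case where a user picks the start and end times by hand.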
5.1.2. ATLAS
[coming soon]
5.1.3. CMS
- Files
  - Jobtemplate: file used by the GangaCMS plugin to create crab.cfg. Contains parameters such as total_num_events and events_per_job.
  - Userarea [to be renamed]: user analysis code.
  - Option file [to be renamed]: CMSSW environment setup script.
- Inputtype: CMSSW version. At the moment only 3_7_0 is available.
5.2. Rules
There are not many rules for test templates.
- You cannot create or modify templates. Only HammerCloud operators can, so ask them if you still have questions after reading this.
- Functional templates NEVER have sites or clouds associated, regardless of their active status.
- You cannot create tests with ACTIVE functional templates.
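The rule about which templates a user may pick when creating a test can be expressed as a one-line check. This is a minimal sketch; the function name and its arguments are hypothetical.

```python
def user_can_use_template(category: str, active: bool) -> bool:
    """Return True if a user may create a test from such a template."""
    if category == "stress":
        # Any stress template can be chosen.
        return True
    if category == "functional":
        # Only deactivated functional templates can be chosen.
        return not active
    raise ValueError(f"unknown template category: {category!r}")
```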
6. Submission algorithm
6.1. ATLAS
[coming soon]
6.2. CMS
The submission algorithm works the same way regardless of the test template (functional or stress) used to fill the test.
Some parameters need to be taken into account:
- Resubmit enabled
- Resubmit force
- Num. datasets per bulk
- Min. queue depth
- Max. running jobs
6.2.1. Generation
When the test starts, the GangaCMS plugin connects to DBS discovery and looks up all datasets matching each site and pattern.
It then takes D datasets per site (with D = Num. datasets per bulk) and creates a crab.cfg file for each dataset and site pair.
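The generation step can be sketched as follows. The DBS lookup is replaced here by a plain dictionary, patterns are treated as substrings, and the first D matches are taken; these choices, the function name and the example dataset names are assumptions for illustration, not the GangaCMS implementation.

```python
def plan_crab_configs(datasets_by_site, patterns, num_datasets_per_bulk):
    """Return the (site, dataset) pairs that would each get a crab.cfg.

    datasets_by_site stands in for the DBS discovery lookup: it maps a
    site name to the dataset names available at that site.
    """
    plan = []
    for site, datasets in datasets_by_site.items():
        # Keep the datasets whose name contains at least one pattern.
        matching = [d for d in datasets if any(p in d for p in patterns)]
        # Take D datasets per site (here simply the first D matches).
        for dataset in matching[:num_datasets_per_bulk]:
            plan.append((site, dataset))
    return plan


catalog = {
    "SITE_A": ["/a/reco/x", "/a/raw/y", "/a/reco/z"],  # placeholder names
    "SITE_B": ["/b/raw/q"],
}
```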
6.2.2. First Submission
At this step, every crab configuration file created during the previous step is submitted. As simple as that.
6.2.3. Loop control
This step is active from the end of the first submission until the endtime (or lifetime) is reached. By this time, jobs may be in the submitted, running, failed, etc. states.
The loop always tries to submit more jobs to the GRID if Resubmit enabled is True, but submission to a site is held back in any of the following cases:
- There are more submitted jobs at that site than Min. queue depth.
- There are more running jobs at that site than Max. running jobs.
- In the last 3 hours there were more than 20 completed (c) + failed (f) jobs, and f/(c+f) > 0.7.
All these limits can be bypassed by setting Resubmit force to True.
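The per-site decision taken by the loop can be sketched as a single function. The names `SiteStatus` and `may_submit` are hypothetical, and the sketch assumes that Resubmit force only has an effect while Resubmit enabled is True, which the text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class SiteStatus:
    """Hypothetical per-site job counters seen by the loop."""
    submitted: int       # jobs currently in the submitted state
    running: int         # jobs currently running
    completed_3h: int    # jobs completed in the last 3 hours (c)
    failed_3h: int       # jobs failed in the last 3 hours (f)


def may_submit(status, resubmit_enabled, resubmit_force,
               min_queue_depth, max_running_jobs):
    """Return True if the loop may submit more jobs to this site."""
    if not resubmit_enabled:
        return False
    if resubmit_force:
        # Force bypasses all the limits below.
        return True
    if status.submitted > min_queue_depth:
        return False
    if status.running > max_running_jobs:
        return False
    finished = status.completed_3h + status.failed_3h
    # Hold back when more than 20 jobs finished and over 70% of them failed.
    if finished > 20 and status.failed_3h / finished > 0.7:
        return False
    return True
```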
7. HammerCloud models
7.1. Host
It's the machine from which the jobs are sent, and the one that processes all the results before uploading them to the web site.
Users can select a list of hosts, but the test will run on ONLY one machine: the one with the lowest load at test generation time.
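Choosing the host comes down to taking the minimum over the candidate hosts' loads. In this sketch the loads are passed in as a plain dictionary; how HammerCloud actually measures load is not described here, and the function and host names are hypothetical.

```python
def pick_host(load_by_host):
    """Return the host with the lowest load among the selected hosts."""
    if not load_by_host:
        raise ValueError("at least one host must be selected")
    # min() over the keys, ordered by each host's load value.
    return min(load_by_host, key=load_by_host.get)


chosen = pick_host({"host-a": 2.5, "host-b": 0.7, "host-c": 1.1})
```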
7.2. Cloud
Depending on the experiment, the notion of a cloud is stronger or weaker. In any case, it's a way to group sites together and give them the same parameters.
7.3. Site
If you don't want to use clouds, you can get the same result by manually adding the sites that belong to a cloud. The main difference is the time it will take you.
7.4. Dspattern
Substrings used to find datasets.
7.5. User
Every user has certain restrictions for security reasons. You cannot modify or delete a test unless you are its owner, so list your user on every test you create.
7.6. File
Any kind of user code, script, template or input data is represented as a file.
8. Metrics
8.1. Common
8.2. ATLAS
[coming soon]
8.3. CMS
[coming soon]
9. Contact
Find us here.
Thanks for reading!