Site Availability Monitor (SAM) tests for ATLAS

To Do list

RequirementPlansForSAM: interaction between ATLAS and the SAM/GridView team (LCG)

ARDADashboardSAM: interaction between ATLAS and the ARDA Dashboard team

SAM tests status

Site Availability with SAM

How to check the SAM tests

This page describes the procedure to check the SAM tests on the CMS sites.

Historical availability

Last month SAM availability

  • T1s:

Current status of tests

with Nagios

WLCG Site availability

  • The definition is here. To make a long story short:
    • the daily service availability is the fraction of time when all critical tests were ok
    • the daily site availability is the fraction of time when all services were ok
      • if a site has multiple CEs or SEs, it is enough that one of them is available
  • The availability is calculated on the basis of the CMS critical tests
  • The WLCG availability can be plotted from Gridview
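As an illustration of the two definitions above, here is a minimal Python sketch. It assumes the day is split into equal-length time slots (e.g. one per SAM test cycle), so that "fraction of time" becomes "fraction of OK slots", and that each critical test has one OK/not-OK result per slot. The data layout and the function names (`service_ok_slots`, `site_ok_slots`, `availability`) are hypothetical and are not the Gridview implementation; scheduled downtime and "unknown" results are ignored.

```python
from typing import Dict, List

# Hypothetical layout: one entry per equal-length time slot of the day;
# True means the test result was OK in that slot.
Timeline = List[bool]


def service_ok_slots(critical_tests: Dict[str, Timeline]) -> Timeline:
    """A service instance is OK in a slot only if ALL its critical tests are OK."""
    timelines = list(critical_tests.values())
    return [all(slot) for slot in zip(*timelines)]


def site_ok_slots(services: Dict[str, List[Dict[str, Timeline]]]) -> Timeline:
    """services maps a service type (e.g. "CE", "SE") to its instances.

    A service type is OK in a slot if at least ONE instance is OK
    (e.g. one of several CEs); the site is OK only if ALL types are OK.
    """
    per_type = []
    for instances in services.values():
        instance_slots = [service_ok_slots(tests) for tests in instances]
        per_type.append([any(slot) for slot in zip(*instance_slots)])
    return [all(slot) for slot in zip(*per_type)]


def availability(slots: Timeline) -> float:
    """Fraction of the day's slots in which the status was OK."""
    return sum(slots) / len(slots)


if __name__ == "__main__":
    # Six 4-hour slots; test names are illustrative only.
    ce1 = {"JobSubmission": [True, True, False, True, True, True],
           "Basic": [True, True, True, True, False, True]}
    ce2 = {"JobSubmission": [False, True, True, True, True, False],
           "Basic": [True, True, True, True, True, True]}
    se1 = {"SRM": [True, True, True, True, True, False]}
    print(availability(service_ok_slots(ce1)))  # 4 of 6 slots OK
    print(availability(site_ok_slots({"CE": [ce1, ce2], "SE": [se1]})))  # 5 of 6
```

In the example each CE alone is available only 4/6 of the day, but because it is enough that one CE is OK, the CE service type is OK in every slot and the site availability is limited only by the SE.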

SAM, Critical Tests, FCR, Site exclusion

  • EGEE production RBs pick the list of sites from a BDII which filters sites according to FCR (Freedom of Choice for Resources) input. If desired, FCR can "black-list" or "white-list" individual sites based on the SAM tests results.
    At present, FCR does NOT black-list any site, even if it is failing ATLAS critical tests.

  • The FCR and SAM administrators for ATLAS are Simone.Campana-at-cern.ch and Alessandro.Di.Girolamo-at-cern.ch
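The FCR filtering described above can be modelled roughly as follows. This is a minimal sketch under stated assumptions, not the FCR code: it assumes a white-listed site is always published, a black-listed site is never published, and any other site is dropped only if it fails at least one critical test and enforcement is switched on. The function `fcr_filter` and its parameters are hypothetical; `enforce_critical=False` mimics the current ATLAS situation in which no site is black-listed for failing critical tests.

```python
from typing import AbstractSet, Dict, Iterable, Set


def fcr_filter(
    sites: Iterable[str],
    failing_critical: Dict[str, Set[str]],
    whitelist: AbstractSet[str] = frozenset(),
    blacklist: AbstractSet[str] = frozenset(),
    enforce_critical: bool = True,
) -> Set[str]:
    """Return the sites a filtered BDII would publish to the RBs (illustrative model).

    failing_critical maps a site name to the set of critical SAM tests it
    currently fails (hypothetical input format).
    """
    published = set()
    for site in sites:
        if site in blacklist:
            continue  # manually excluded regardless of test results
        if site in whitelist:
            published.add(site)  # manually kept regardless of test results
            continue
        if enforce_critical and failing_critical.get(site):
            continue  # fails at least one critical SAM test
        published.add(site)
    return published


if __name__ == "__main__":
    failing = {"SITE-B": {"JobSubmission"}}  # illustrative site and test names
    sites = ["SITE-A", "SITE-B", "SITE-C"]
    print(sorted(fcr_filter(sites, failing)))
    print(sorted(fcr_filter(sites, failing, enforce_critical=False)))
```

Checking the black-list before the white-list is a design choice of this sketch: if a site ends up in both lists, the explicit exclusion wins.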

What to do if tests fail

Hints for site admins on how to find and fix the causes of CMS SAM test failures

SAM Installation

These instructions for installing a SAM UI are provided for reference only. Nobody is supposed to submit SAM tests on their own.

The official SAM code is in CVS.

SAM Configuration

Here are some notes on the SAM client configuration.

List of CMS Specific tests in SAM

All test scripts are in the CVS repository, in package CMSSAM. Access instructions are here.

Tests we use

  • Test names are as they appear in the SAM DB (e.g. for XML access).

| Name | Description | Provided by | Status |
| Job Submission | Verify that it is possible to run a job with an lcgadmin proxy via RB | A. Sciaba' | Running |
| Production | Verify that it is possible to run a job with a production proxy via RB | A. Sciaba' | Running |
| Basic | Minimal "site is alive" test: verify local site configuration, SW installation and TFC | S. Belforte | Running |
| SW Installation | Verify SW installation and that CMSSW can be installed remotely | C. Wissing | Running |
| Monte Carlo | Verify that the site is OK for MC production; check stage-out and clean-up | J. Hernandez | Running |
| Squid | Verify that Squid is working and can fetch from ORCOFF | E. Wicklund | Running |
| Frontier | Verify that CMSSW jobs can access non-event data via Frontier | E. Wicklund | Running |
| SRM | Verify the site SRM from the SAM UI, without relying on other sites | N. Magini | Running |
| SRMv2 | Verify the site SRMv2 from the SAM UI, without relying on other sites | N. Magini | Running |
| Analysis | Verify that the site is validated for user/organised data analysis | S. Belforte | Running |
| /store/user | Verify that the /store/user area is usable | S. Belforte + N. Magini | Running |
| GridInformation | Verify that the site publishes correct info | Computing Commissioning | Do we want this? |

Tests we may want (Communication Space)

Please insert comments, feedback and, above all, requests or suggestions for additional tests you think are needed.

  • Production stage-out tests should be run with the cmsprd role, so that the directory can be written
  • Test that scramv1 project works in the configuration script

FAQ

  • 1. What are SAM tests?
    • A suite of short (a few minutes each) test scripts run every 4 hours at all sites. They may include both tests run via grid jobs and tests run from the User Interface. The SAM machinery is developed by EGEE grid operations and includes DB for
  • 2. Who defines which tests we run?
    • The Computing Commissioning coordinators; this is eventually to be passed on to Facility Operations.
  • 3. Who runs SAM tests?
    • Generic SAM tests on all EGEE sites are run by EGEE operations (SA1) at CERN. CERN also hosts the needed hardware and services (dedicated UIs and RBs, database, web server, etc.).
    • The LCG Experiment Support team at CERN (aka the EIS group in CERN IT) runs the CMS-specific tests. EIS support for CMS is currently funded by CERN IT and INFN.
  • 4. Where is the SAM test output?
    • Look at the links at the top of this page.
  • 5. What is a SAM critical test?
    • CMS can define (via its VO manager and the Freedom of Choice for Resources tool) any SAM test, or list of tests, as critical. Sites that fail a critical test will be removed from the grid (from the BDII) until the problems are fixed. This should definitely get the site manager's attention.

To Do List

| What | Whom | Status |
| Clean up TWiki | all | |
| Make the test fall back to the latest CMSSW release available on site (and issue a warning) | Stefano | to do |
| Connect to TC for the list of needed releases | Christoph Wissing | to do |
| Get the site list from SiteDB | Andrea | to do |
| SRM tests in SAM | Andrea/Nicolo' | ongoing (done for SRMv1) |
| Add TFC check to the config test using PhEDEx Utils | Stefano | to start |
| Add fallback (remote) stage-out to the Production test? | Guillelmo/Jose H. | to start |
| Framework to run tests on demand | Andrea | to start |
| Automated VOMS proxy renewal | Andrea | to start |
| Scheduled downtime in dashboard | Stefano/Andrea/Julia | to start |
| XML calls to get a site's scheduled downtime | David Collados | started |
| Get OSG scheduled downtime into the SAM DB | Piotr | when OSG is ready (~ November) |
| bug OSG above that | Stefano | to do |
| SW install: sanity check in swinstall, add check for space | Christoph Wissing | to do |
| SW install: add consistency check of scram list and tags | Christoph Wissing | to do |

Done things

-- AleDiGGi - 13 May 2008

Topic revision: r4 - 2011-06-22 - AndresAeschlimann
 
Copyright © 2008-2020 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.