---+ Site Availability Monitor ([[https://wiki.egi.eu/wiki/SAM][SAM]]) tests for ATLAS

%TOC%

[[SAMForATLAS#CMSsandbox.ToDo][CMSsandbox.ToDo]] list

RequirementPlansForSAM: interaction between ATLAS and the SAM/GridView team (LCG)

ARDADashboardSAM: interaction between ATLAS and the ARDA Dashboard team

#SamStatus
---++ SAM tests status
   * [[https://lcg-sam.cern.ch:8443/sam/sam.py][SAM Visualization]] main page

---++ Site Availability with SAM
   * [[http://gridview.cern.ch/GRIDVIEW/same_index.php][GridView ServiceAvailability]] main page

---++++ How to check the SAM tests
This [[SAMATLASChecklist][page]] describes the procedure to check the SAM tests on the CMS sites.

---++++ Historical availability

---++++ Last month SAM availability
   * T1's:

---++++ Current status of tests
   * [[http://tinyurl.com/6xcbf4][T0-T1-T2]] ( [[http://tinyurl.com/5824xt][CE only]] | [[http://tinyurl.com/62mvxj][SRM v1+v2 only]] )

---++++ with Nagios
   * NagiosProbeForSAM

---+++ WLCG Site availability
   * The definition is [[https://twiki.cern.ch/twiki/pub/LCG/GridView/Gridview_Service_Availability_Computation.pdf][here]]. To make a long story short:
      * the daily service availability is the fraction of time when all critical tests were ok
      * the daily site availability is the fraction of time when all services were ok
      * if a site has multiple CEs or SEs, it is enough that one of them is available
   * The availability is calculated on the basis of the CMS [[https://hypernews.cern.ch/HyperNews/CMS/get/sc4/485/1/1/1/1/1/1/1/2.html][critical]] tests
   * The WLCG availability can be plotted from [[http://gridview.cern.ch][Gridview]]

---++ SAM, Critical Tests, FCR, Site exclusion
   * EGEE production RB's pick the list of sites from a BDII which filters sites according to FCR (Freedom of Choice for Resources) input. If desired, FCR can "black-list" or "white-list" individual sites relying on the SAM tests. At present FCR does NOT black-list any site, even if it is failing ATLAS critical tests.
   * [[https://cic.gridops.org/index.php?section=vo&page=freedomofchoice][FCR page]] on [[http://cic.gridops.org/index.php?section=home&page=homepage][CIC Portal]] (standalone [[https://lcg-fcr.cern.ch:8443/fcr/fcr.cgi][FCR page]])
   * [[ATLASCriticalTests#List_of_critical_CE_tests][List]] of SAM tests critical for ATLAS and OPS
   * FCR and SAM administrators for ATLAS are Simone.Campana-at-cern.ch and Alessandro.Di.Girolamo-at-cern.ch

---++ What to do if tests fail
Hints for site admins on how to find and fix the reasons for SAM CMS test failures
   * Failing the [[JobSubmission][Job Submission test]]
   * Failing the [[FailingSquid][Squid test]]
   * Failing the [[FailingMC][MC = Production Stage Out test]]
   * Failing the [[FailingAnalysis][Analysis test]]
   * Failing the [[SAMLocalSRMv2#TroubleShooting][SRMv2 tests]]

---++ SAM Installation
[[SamUiInstall][These]] are instructions to install a SAM UI; they exist only for reference. Nobody is supposed to send SAM tests on their own. The official SAM code is in [[http://jra1mw.cvs.cern.ch:8180/cgi-bin/jra1mw.cgi/same/][CVS]].

---++ SAM Configuration
[[SAMClientConfig][Here]] are some notes about the SAM client configuration.

---++ List of CMS Specific tests in SAM
All test scripts are in the CVS repository, in package [[http://cmssw.cvs.cern.ch/cgi-bin/cmssw.cgi/COMP/CMSSAM/?cvsroot=CMSSW][CMSSAM]]. Access instructions are [[http://cmsdoc.cern.ch/cmsoo/projects/cvs_server.html][here]].

---+++ Tests we use
   * [[%ATTACHURL%/cms-sam-tests.txt][test names]] as they appear in SAM DB (e.g. for XML access)

| Name | Description | Provided by | Status |
| [[SAMJs][Job Submission]] | Verify that it is possible to run a job with an lcgadmin proxy via RB | A. Sciaba' | Running |
| [[SAMProd][Production]] | Verify that it is possible to run a job with a production proxy via RB | A. Sciaba' | Running |
| [[SAMBasicTest][Basic]] | minimal "site is alive" test. Verify local site configuration, SW installation and TFC | S. Belforte | Running |
| [[SAMCMSSWVersionTest][SW Installation]] | verification of SW installation and that CMSSW can be installed remotely | C. Wissing | Running |
| [[SAMMonteCarlo][Monte Carlo]] | verify site is OK for MC Production. Check stage out and clean up | J. Hernandez | Running |
| [[SAMSquid][Squid]] | verify Squid is working and can fetch from ORCOFF | E. Wicklund | Running |
| [[SAMFrontier][Frontier]] | verify CMSSW jobs access non-event data via Frontier | E. Wicklund | Running |
| [[SAMLocalSRM][SRM]] | verify site SRM from the SAM UI, w/o relying on other sites | N. Magini | Running |
| [[SAMLocalSRMv2][SRMv2]] | verify site SRMv2 from the SAM UI, w/o relying on other sites | N. Magini | Running |
| [[SAMAnalysisTest][Analysis]] | site is validated for user/organised Data Analysis | S. Belforte | Running |
| [[SAMUserArea][/store/user]] | verify that /store/user area is usable | S. Belforte + N. Magini | Running |
| [[SAMGridInformation][GridInformation]] | verify site publishes correct info | Computing Commissioning | Do we want this? |

---+++ Tests we may want (Communication Space)
Please insert here comments, feedback and, above all, requests and suggestions for additional tests you think are needed
   * Production Stage Out tests should be run with cmsprd role, so that the directory can be written
   * test that scramv1 project works in the configuration script

---++ FAQ
   * 1. What are SAM tests?
      * a suite of short (few min) test scripts run every 4 hours at all sites; it may include both tests run via grid jobs and tests done from the User Interface. SAM machinery is developed by EGEE grid operations and includes DB for
   * 2. Who defines what tests we run?
      * Computing Commissioning coordinators, to be eventually passed on to Facility Operations
   * 3. Who runs SAM tests?
      * generic SAM tests on all EGEE sites are run by EGEE operations (SA1) at CERN. CERN also hosts the needed hardware and services (dedicated UI's and RB's, DataBase, Web Server etc.)
      * LCG Experiment Support team at CERN (aka EIS group in CERN IT) runs CMS specific tests. EIS support for CMS is currently funded by CERN IT and INFN
   * 4. Where is the SAM tests output?
      * look at the links at the top of this page
   * 5. What is a SAM critical test?
      * CMS can define (via its VO manager and the Freedom of Choice for Resources tool) any (list of) SAM tests to be critical. Sites that fail them will be removed from the grid (from BDII) until problems are fixed. This should definitely get the site manager's attention.

---+++ To Do List
#CMSsandbox.ToDo
| what | whom | status |
| Cleanup twiki | all | |
| make test fall back to latest CMSSW release available on site (and Warning) | Stefano | to do |
| connect to TC for list of needed releases | Christoph Wissing | to do |
| get site list from SiteDB | Andrea | to do |
| SRM tests in SAM | Andrea/Nicolo' | ongoing (done for SRMv1) |
| Add TFC check to config test using PhEDEx Utils | Stefano | to start |
| add fallback (remote) stageout to Production test ? | Guillelmo/Jose H. | to start |
| *framework to run tests on demand* | Andrea | to start |
| automated VOMS proxy renewal | Andrea | to start |
| scheduled downtime in dashboard | Stefano/Andrea/Julia | to start |
| XML calls to get site's scheduled downtime | David Collados | started |
| get OSG scheduled downtime in SAM DB | Piotr | when OSG is ready ~ November |
| bug OSG above that | Stefano | to do |
| SW install sanity check in swinstall, add check for space | Christoph Wissing | to do |
| SW install add consistency of scram list and tags | Christoph Wissing | to do |

---+++ Done things

-- Main.AleDiGGi - 13 May 2008

   * [[%ATTACHURL%/cms-sam-tests.txt][cms-sam-tests.txt]]: list of CMS specific SAM tests
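
As an illustration of the availability definition summarised under "WLCG Site availability", here is a minimal Python sketch. It is not the Gridview implementation: it assumes that each service instance is described by a list of equally spaced time slots (for example hourly samples over one day), where a slot is =True= when all critical tests were ok. The function names and the data layout are hypothetical.

```python
from typing import Dict, List

# One boolean per time slot: True when all critical tests were ok in that slot.
# The slot granularity (e.g. one hour) is an assumption of this sketch.
Timeline = List[bool]


def service_availability(timeline: Timeline) -> float:
    """Fraction of time slots in which all critical tests were ok."""
    if not timeline:
        return 0.0
    return sum(timeline) / len(timeline)


def site_availability(services: Dict[str, List[Timeline]]) -> float:
    """Fraction of time slots in which every service type was ok.

    `services` maps a service type (e.g. "CE", "SE") to the timelines of
    its instances. A service type counts as ok in a slot if at least one
    of its instances was ok, so one working CE out of several is enough.
    """
    timelines = [t for instances in services.values() for t in instances]
    if not timelines:
        return 0.0
    # Only compare slots that every instance actually reports.
    n_slots = min(len(t) for t in timelines)
    if n_slots == 0:
        return 0.0
    ok_slots = 0
    for slot in range(n_slots):
        if all(any(t[slot] for t in instances) for instances in services.values()):
            ok_slots += 1
    return ok_slots / n_slots


if __name__ == "__main__":
    example = {
        "CE": [[True, False, True, True], [False, True, True, False]],
        "SE": [[True, True, False, True]],
    }
    print(site_availability(example))
```

In this example the two CEs cover for each other in every slot, so the site is only counted as unavailable in the slot where the single SE failed.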
Topic revision: r4 - 2011-06-22 - AndresAeschlimann