---
<!-- "more magic" don't remove-->
<!-- Page created from template PilotServiceHomeTemplate
Having created a new home page correct the following information. The Start date will be updated automatically.
Change the indico id and who ever was the chair.
-->
---+!! glexec/Argus Pilot Service: Home Page
---
   * Start Date: Tue 24 Nov 2009
   * End Date (tentative): 31 Mar 2010
   * Description: Pilot Service of glexec/Argus @ FZK, SWITCH, CESNET, SRCE, INFN
   * Coordinator: Antonio Retico
   * Contact e-mail: =egee-pilot-argus@cern.ch=
   * Status : In Progress
   * [[LCG.PPIslandKickOff][Related meetings]]
---
%TOC%

---+ Description

---++ Use cases
   * Experiment framework using glexec for production pilot jobs.
   * Test of the grid-wide banning feature by OSCT
   * Gathering of requirements and analysis for monitoring tools

---++ Objective and metrics
Objective:
   1. The glexec - Argus chain is demonstrated to interact correctly with the LHC Experiments' frameworks for pilot jobs
   1. Maintenance and operations of the Argus service are declared supportable by the sites
   1. OSCT is able to ban a user on the whole pilot infrastructure without specific intervention of the site administrators
   1. Collection of exhaustive requirements for the implementation of monitoring tools

---+ Planning
<!--BEGINPLANNING-->

---++ Initial plan
| _Task_ | _Owner_ | _Start Date_ | _Due Date_ | _Status_ |
|Set-up repositories and documentation| SA1, SA3, CNAF | 23-Nov-09|24-Nov-09| Done |
|Preliminary installation (ARGUS, WN, CE)| SWITCH| 25-Nov-09|27-Nov-09| Done |
|Core installations (ARGUS, WN, CE)| FZK, SRCE, CESNET, CNAF| 30-Nov-09|10-Dec-09| In progress |

---++ Constraints and milestones
   * kick-off with sites: 25-Nov (11 AM CET)
   * 1st site technically available for Experiments to test (SWITCH): 1-Dec
   * kick-off with experiments: 1-Dec (11 AM CET)
   * All sites technically available for Experiments to test: 15-Jan
   * Indicative start of Alice developments to integrate glexec: 18-Jan
   * Indicative start of CMS developments to integrate glexec: 15-Feb
   * END of activity (proposed): 31-Mar
<!--ENDPLANNING-->

---+ Technical documentation

---++ Installation Documentation
Yum repo:

*Argus service*
   * _Repository URL_ : http://grid-it.cnaf.infn.it/apt/glite/pps/pilot/ARGUS/argus/sl5/x86_64/
   * _INFO_ : the repository includes PATCH:3076
   * Certification report of the Argus patch: [[https://twiki.cern.ch/twiki/bin/view/Main/ArgusCertification][ArgusCertification]]

*Worker Node*
   * _Repository URL_ : http://grid-it.cnaf.infn.it/apt/glite/pps/pilot/ARGUS/glite-WN/sl5/x86_64/
   * _INFO_ : the repository is synchronised to the production WN repository (_link_) and contains in addition
      * PATCH:3394 (gLExec SL5)
      * PATCH:3093 (LCMAPS with PEP-c client)

*Computing Element*
   * _Repository URL_ : Production repository (to be confirmed)
   * _INFO_ :

---++ Configuration instructions
YAIM modules are available for both the Argus service and GLEXEC:
   * https://twiki.cern.ch/twiki/bin/view/LCG/Site-info_configuration_variables#ARGUS
   * https://twiki.cern.ch/twiki/bin/view/LCG/Site-info_configuration_variables#GLEXEC_wn
For finer tuning:
   * Argus: https://twiki.cern.ch/twiki/bin/view/EGEE/AuthorizationFramework
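
As an illustration only, the pilot yum repositories listed under "Installation Documentation" could be enabled with a repository file generated along the following lines. This is a minimal sketch, not an official procedure: the repository ids (=pilot-argus=, =pilot-glite-wn=), the =name= strings, the =gpgcheck=0= setting and the helper function =pilot_repo_stanza= are assumptions made for this example; only the base URLs come from this page.

```shell
#!/bin/sh
# Sketch: print yum repository stanzas for the pilot repositories.
# ASSUMPTIONS (not from the pilot page): repository ids, name strings,
# gpgcheck=0, and the helper name pilot_repo_stanza.

# Common prefix of the two repository URLs given on this page
PILOT_BASE="http://grid-it.cnaf.infn.it/apt/glite/pps/pilot/ARGUS"

# pilot_repo_stanza ID SUBPATH
# Writes one yum .repo stanza to stdout for $PILOT_BASE/SUBPATH/sl5/x86_64/
pilot_repo_stanza() {
    repo_id="$1"
    subpath="$2"
    cat <<EOF
[$repo_id]
name=glexec/Argus pilot - $repo_id
baseurl=$PILOT_BASE/$subpath/sl5/x86_64/
enabled=1
gpgcheck=0
EOF
}

# Print both stanzas; on a real node the output would be redirected
# to a file under /etc/yum.repos.d/ (file name of your choice).
pilot_repo_stanza pilot-argus argus
pilot_repo_stanza pilot-glite-wn glite-WN
```

The first stanza is meant for the Argus service node and the second for the Worker Nodes. Sites should check the GPG settings against their own policy before using anything like this.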

---++ Post configuration tests
To check that Argus is correctly deployed, some basic tests can be run after installation and configuration: use =pap-admin= to store, list, update and remove policies, then use =pepcli= to test authorization requests and responses. Both =pap-admin= and =pepcli= are documented in the Argus main twiki.

To test the glexec-Argus interaction, run something like the following from a whitelisted account on the Worker Node:
<verbatim>
# Proxy to be mapped (replace the <target_proxy> placeholder with a real file path)
export X509_USER_PROXY=<target_proxy>
# Use the same proxy as client certificate unless GLEXEC_CLIENT_CERT is already set
export GLEXEC_CLIENT_CERT=${GLEXEC_CLIENT_CERT:-$X509_USER_PROXY}
# Run whoami through glexec
$GLITE_LOCATION/sbin/glexec /usr/bin/whoami
</verbatim>
Then verify that the returned user is the mapped one.

---+++ Configuration requirements for sites supporting Atlas
   * if a myproxy server is used to pass the credentials, myproxy-logon has to be installed on the WN (it should be the default in production by now)
   * if a plain proxy is retrieved and VOMS attributes need to be added on the WN, the vomses file has to be reachable from the WN.
   * both the roles =atlas:/atlas/Role=production= and =atlas:/atlas/usatlas/Role=pilot= need to be enabled to submit to the queue

---++ General documentation (user guides)
   * Argus user guide: https://twiki.cern.ch/twiki/bin/view/EGEE/AuthorizationFramework

---++ Test documentation
   * Extract from Argus load and aging test (from [[http://indico.cern.ch/materialDisplay.py?sessionId=1&materialId=2&confId=45480][GDB 14-Nov-2009]])
%TWISTY{mode="div" showlink="Show" hidelink="Hide" remember="off" firststart="hide" showimgleft="%ICONURLPATH{toggleopen-small}%" hideimgleft="%ICONURLPATH{toggleclose-small}%"}%
---
<font color="CadetBlue" face="monospace">
<u>Summary of the load and aging tests done before the certification </u>
   * Load tests:
      * Service Host: 1x 2.33GHz CPU, 1 GB RAM
      * client recreated - simulates what glexec would do
         * ~60 req/sec, ~160ms (limited by spawning processes)
      * client reused - simulates what CREAM/WMS would do
         * ~240 req/sec, ~120ms
      * client reused, repeat request - simulates pilot jobs
         * ~1000 req/sec, ~37.6ms
   * Aging tests:
      * Test operation over several days with several million requests
      * Memory usage: stable
---
</font>
%ENDTWISTY%
---
   * Certification report of the Argus patch: [[https://twiki.cern.ch/twiki/bin/view/Main/ArgusCertification][ArgusCertification]]

---+ Pilot Layout

---++ SWITCH
*Argus*
One virtual machine with SL5 64 bit, installed from the following repository http://grid-deployment.web.cern.ch/grid-deployment/glite/cert/3.2/patches/3076/sl5/$basearch/ (an alternative is now available at http://grid-it.cnaf.infn.it/apt/glite/pps/pilot/ARGUS/argus/sl5/x86_64/)

*CE*
One lcg-CE (diana.switch.ch) with two WNs (SL5 64 bit): all as VMware virtual machines. VOs enabled: dteam, dech, ops, atlas (no atlas software installed).
The WNs were installed pointing to the following repository
*CE endpoint*
http://grid-it.cnaf.infn.it/apt/glite/pps/pilot/ARGUS/glite-WN/sl5/x86_64/

---++ FZK-LCG2
*Argus*
One virtual machine with SL5 64 bit

*CE*
All CEs at GridKa are usable, but please refer to cream-3-fzk.gridka.de. The queue must be "pps". All software installations are available on the PPS WNs. Enabled VOs: alice, atlas, cms, dteam, lhcb and ops.

*WNs*
The PPS cluster has been extended to 300 cores.

For alice a separate *VOBox* is available: test-mw-fzk.gridka.de

---++ CNAF
*Argus*
One virtual machine on SL5/64bit

*CE*
One CREAM CE with two virtual WNs (SL5 64 bit). VOs enabled: dteam, infngrid, ops.

*CE endpoint*
=devce.cnaf.infn.it:8443/cream-pbs-cert=

---+ Results

---++ Feedback from the experiments

---++ Comments and issues from operations

---+++ SWITCH
The instructions to manually install the Argus compatible WNs are wrong. It is recommended that yaim be used instead.

---+++ FZK / KIT
Reduction of the policy refresh time from 4 hours to 15 mins requested: Angela opened the bug <br />https://savannah.cern.ch/bugs/index.php?62281

---++ List of issues
<!--BEGINOPENISSUES-->
|Issue|Reported by|Bug(s)|Status|Open/Closed|
|(Affects glexec on the WMS-->GLEXEC-->CREAM chain): a wrongly configured GLITE_LOCATION sometimes makes it impossible to discover the glexec executable |CERN| BUG:62810 | fixed with patch 3760 (with provider) |open|
|The default policy refresh time of 4 hours seems too long |KIT|BUG:62281| To be discussed with JSPG |open|
| PEPd should require client-cert authentication support for connecting pep clients|CNAF-T1|BUG:60041|fixed with patch 3536, in certification|open|
<!--ENDOPENISSUES-->

---++ Recommendation for Deployment in production

---++ Final assessment

---+ Tasks and actions:
Actions for SA1 are tracked via the [[http://www.cern.ch/pps/index.php?dir=./ActivityManagement/SA1DeploymentTaskTracking/&][Sa1 Deployment task tracker]]
%INCLUDE{http://egee-pre-production-service.web.cern.ch/egee-pre-production-service/MPS/show-tree.php?task=12720}%
<!--
This is an example action item, just add new action items here. Please delete the example one.
Note the example gets expanded in the template. When you duplicate it, please delete the uid="xxxx", closed="DD-MMM-YYYY", and closer="Main.SteveTraylen"; they will be added automatically with an increment.
A valid action item should have a "created", "creator", "due", "state" and "who". Obviously the state should be "open" not "closed".
-->
Tasks for other participants are tracked here

---++ Open
<!--STARTOPENACTIONS-->
%ACTION{ closed="2010-02-19" closer="" created="2010-02-16" creator="Main.AntonioRetico" due="2010-02-19" notify="" state="closed" uid="000089" who="Main.GiuseppeMIsurelli" }% Provide a report describing the issues being faced by CNAF for the installation of glexec on the WNs. <p />INFN-T1 is experiencing a stability problem with GPFS interacting with the WN-on-demand system adopted locally in the resource centre. <br />Since they decided to provide virtual WNs for the pilot, the issue consequently affects the deployment of the glexec WN component at the site.
<br /> %ENDACTION%
%ACTION{ created="2010-02-16" creator="Main.AntonioRetico" due="2010-02-19" state="open" uid="000090" who="Main.GianniPucciani" }% Provide functional specification of glexec tests being implemented at SRCE %ENDACTION%
%ACTION{ closed="2010-03-02" closer="" created="2010-02-02" creator="Main.AntonioRetico" due="2010-02-03" notify="" state="closed" uid="000083" who="Main.ChadLaJoie" }% Provide instructions on how to preserve local policies during the upgrade of the Argus server to a newer version, both in an e-mail to the sites and in the PATCH:3536 <p />This was done on the 2nd of February <br />This is done now at https://savannah.cern.ch/patch/?3536 %ENDACTION%
<!--ENDOPENACTIONS-->

---++ Closed
%ACTION{ closed="2010-02-05" closer="" created="2010-01-02" creator="Main.AntonioRetico" due="2010-02-03" notify="" state="closed" uid="000085" who="Main.AngelaPoschlad" }% Open a bug to request the reduction of the policy refresh time from 4 hours to 15 mins <p />3-2-10: Angela opened the bug <br />https://savannah.cern.ch/bugs/index.php?62281 %ENDACTION%
%ACTION{ closed="2009-12-18" closer="Main.AntonioRetico" created="2009-12-01" creator="Main.AntonioRetico" due="2009-12-18" state="closed" uid="000074" who="Main.AntonioRetico" }% Provide the timeline for an installation of a reasonable scale (>100 WNs) to be available to Atlas in order to test glexec in production <p />Update 18-Dec (Andrea Ceccanti) : <br />Converging on CNAF offering the first large-scale installation. They are currently working on the installation at the T1 and they hope to have finished before Christmas or alternatively by the 6th of January in order to be ready by the 15th.
If the preliminary tests now under way succeed, they are OK to use Argus 1.1 when it is in status "Ready for Certification" <p /> %ENDACTION%
%ACTION{ closed="2009-12-18" closer="Main.AntonioRetico" created="2009-11-26" creator="Main.AntonioRetico" due="2009-12-01" state="closed" uid="000068" who="Main.SWITCH, Main.NIKHEF, Main.SA3" }% Finalise the YAIM configuration for Argus-compatible GLEXEC_WN %ENDACTION%
%ACTION{ closed="2009-12-08" closer="" created="2009-11-25" creator="Main.AntonioRetico" due="2009-12-04" notify="" state="closed" uid="000065" who="Main.GianniPucciani" }% Enumerate available deployment scenarios and see whether new developments have to be requested (or re-negotiations are needed with the sites) <p />Update 26-Nov. <br />After discussion with JRA1 and SA3 it was proposed to extend the support of the clients on SL4. A new patch has been requested from the developers <br />Antonio <p />Update 1-Dec <br />During the last meeting Gianni was put in charge of opening the bug with the change request<p />Update 18-Dec (Gianni) : <br />All new developments are now tracked by bugs<p /> %ENDACTION%
%ACTION{ closed="2009-12-03" closer="" created="2009-11-25" creator="Main.AntonioRetico" due="2009-12-01" notify="" state="closed" uid="000066" who="Main.GianniPucciani" }% Provide reference for basic testing for site administrators in the twiki <p />Update 1-Dec : <br />info now available in #Post_configuration_tests <p /> %ENDACTION%
%ACTION{ closed="2009-11-26" closer="" created="2009-11-25" creator="Main.AntonioRetico" due="2009-12-01" notify="" state="closed" uid="000067" who="Main.AngelaPoschlad" }% Reply to proposed timelines for FZK <p />Angela confirmed that starting on the 30th is fine for her %ENDACTION%

---+ History
16-Feb-2010 : Check point (LCG.PPIslandFollowUp2010x02x16):
   * Installation at KIT/FZK scaled-up to 300 cores

2-Feb-2010 : Check point (LCG.PPIslandFollowUp2010x02x02):
   * All sites will soon be requested to upgrade to the new version of Argus PATCH:3536. CNAF-T1 will be the first, the others will follow
   * All sites requested to apply the workaround in BUG:62206 in order for the Argus servers to start being published in the information system.
   * Integration work in progress for Alice
   * Integration work confirmed to start in mid February for CMS

18-Dec-2009 : Check point (LCG.PPIslandFollowUp2009x12x18):
   * Testing of Argus version 1.1 in progress at CNAF
   * Installation in progress at all sites. Platform expected available by the 15th of Dec

1-Dec-2009 : First installation at SWITCH available for testing

1-Dec-2009 : kick-off with the experiments (LCG.PPIslandKickOff2009x12x01)

25-Nov-2009 : kick-off with sites (LCG.PPIslandKickOff2009x11x25)

24-Nov-2009 : Pilot Home page created

Topic revision: r32 - 2010-03-02
Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.