PpsPilotWMS32 (2009-09-02)
---+!! WMS3.2 Pilot Home Page
---
   * Start Date: Mon 03 Aug 2009
   * End Date: 20 Aug 2009
   * Description: WMS 3.2 at CERN
   * Coordinators: Maarten Litmaath, Antonio Retico
   * Contact e-mail: =wms-operations@cern.ch= (WMS Operations at CERN)
   * Status: Closed
   * [[PPIslandKickOff][Related meetings]]
---
%TOC%

---+ Description

A WMS is installed at CERN, starting with the version currently in PPS and certification.

wms219.cern.ch runs these WMS patches:

   * https://savannah.cern.ch/patch/index.php?2597
   * https://savannah.cern.ch/patch/index.php?2896
   * https://savannah.cern.ch/patch/index.php?3044
   * https://savannah.cern.ch/patch/index.php?3156 <-- in certification

It also has this LB patch, except for its YAIM component:

   * https://savannah.cern.ch/patch/index.php?2848

For some reason the node still has glite-yaim-lb-4.0.2-1 instead of glite-yaim-lb-4.1.0-1.

---++ Use cases

The WMS will be left operating with standard load from the four LHC experiments.

---++ Objective and metrics

---+ Technical documentation

---++ Installation Documentation

Patches installed from PPS plus the patch repository in certification.

---++ Configuration Instructions

Standard YAIM configuration.

---+ Pilot Layout

wms219.cern.ch is the only machine running WMS 3.2 at CERN. The node supports the four LHC experiment VOs plus ops and dteam.

---+ Tasks and actions

Actions for SA1 are tracked via the TASK:XXXX available from the [[http://www.cern.ch/pps/index.php?dir=./ActivityManagement/SA1DeploymentTaskTracking/&][PPS task tracker]]

Tasks for other participants are tracked here.

---+ Results

---++ Feedback from the experiments

<verbatim>
---------- Forwarded message ----------
Date: Wed, 5 Aug 2009 02:09:39 +0200 (CEST)
From: Maarten.Litmaath@cern.ch
To: Andrea Sciaba <Andrea.Sciaba@cern.ch>
Cc: Alessandro Di Girolamo <Alessandro.Di.Girolamo@cern.ch>,
    Simone Campana <Simone.Campana@cern.ch>,
    Roberto Santinelli <Roberto.Santinelli@cern.ch>,
    Patricia Mendez Lorenzo <Patricia.Mendez@cern.ch>,
    Nicolo Magini <Nicolo.Magini@cern.ch>,
    Daniel.Colin.Vanderster@cern.ch,
    "wms-operations (WMS Operations at CERN)" <wms-operations@cern.ch>,
    Johannes Elmsheuser <johannes.elmsheuser@physik.uni-muenchen.de>,
    Antonio Retico <Antonio.Retico@cern.ch>
Subject: WMS 3.2 pilot node wms219 looks good

Hi all,
CMS and LHCb have confirmed that wms219.cern.ch works fine for them and
I did not receive complaints from ATLAS or ALICE either, so I think we
can consider the current set of rpms and adjustments to the default
configuration satisfactory.  We now can proceed with the formal release
procedure.  I will supply details to the certification and release teams.
Thanks,
Maarten
</verbatim>

---++ Comments and issues from operations

Maarten: [to get in sync with PATCH:2848]
   * wms219 reconfigured with glite-yaim-lb-4.1.0-1
   * bunch of test jobs submitted: all looks normal.
   * 2k more jobs submitted: looking for unexpected increases in disk usage. No new processes in "top".
   * after a day with 8561 Condor-G jobs, including 5k (sic) "ops" jobs spread all over the grid, there is no sign of real trouble. The only remarkable fact seems to be a new memory consumption record for the Workload Manager: <pre>
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9387 glite     25   0 3184m 3.0g 5892 S  0.0 19.3 477:11.92 glite-wms-workl
</pre> For the gLite 3.1 code I have seen up to 2.7 GB (higher values not excluded).
   * I think we can go ahead with the release after the formal certification of patch #3156 and with the release notes I detailed earlier.

Cheers, Maarten

---++ Recommendation for Deployment in production

When we go ahead with the release to production, the following should be part of the release notes:

----------------------------------------------------------------------
- In /opt/glite/etc/glite_wms.conf the "--ftpconn" values typically need to be increased from 30 to e.g. 300, to avoid the limiter refusing jobs too frequently.

  Bug: https://savannah.cern.ch/bugs/?53297

- In /opt/globus/etc/gridftp.conf "connections_max" typically needs to be increased e.g. from 50 to 500, to avoid !GridFTP connections being refused too quickly. site-info.def should be adjusted accordingly:

  GRIDFTP_CONNECTIONS_MAX=500

- The WMProxy logging is fairly useless at its default level, so the admin may want to increase it from 5 to 6.
  Bug: https://savannah.cern.ch/bugs/?53294

Until the issues have been addressed by future YAIM versions, the WMS admin can create a file /opt/glite/yaim/functions/post/config_glite_wms with the following function to let YAIM adjust the parameters in its post-configuration step:

<verbatim>
config_glite_wms_post() {
    perl -i -pe '
        BEGIN { $flag = 0; }
        s/(--ftpconn) \d+/$1 300/;
        /^\s*WorkloadManagerProxy/ && ($flag = 1);
        $flag && s/(LogLevel *=) *\d+/$1 6/ && ($flag = 0);
    ' /opt/glite/etc/glite_wms.conf
    /opt/glite/etc/init.d/glite-wms-wmproxy restart
}
</verbatim>

- The Workload Manager is observed to take even more memory than seen with the WMS 3.1 code and therefore may need to be restarted regularly.

  Bug: https://savannah.cern.ch/bugs/?54144

  Example cron job:

<verbatim>
# cat /etc/cron.d/restart-wm
16 2 * * * root (date; /opt/glite/etc/init.d/glite-wms-wm restart) >> /var/log/restart-wm.log 2>&1
</verbatim>
----------------------------------------------------------------------

---++ List of issues found

|*Issue*|*Reported by*|*Bug(s)*|*Status*|*Open/Closed*|
|WMS 3.2 job wrapper template fails when 3.1 version works|operations|*BUG:53078*|fix certified in PATCH:3156|closed|
|WMS 3.2 generates unusable !BrokerInfo file|operations|*BUG:53448*|fix certified in PATCH:3156|closed|
|[ yaim-wms ] glite_wms.conf hardcoded parameters|operations|BUG:53297|issue for release notes described at https://savannah.cern.ch/bugs/?48479#comment8|open|
|glite-brokerinfo does not evaluate attribute references|developer|BUG:53686|integration candidate|open|
|Some information is missing in the !BrokerInfo file|developer|*BUG:53706*|fix certified in PATCH:3156|closed|
|WMS 3.2 Workload Manager memory leak?|operations|BUG:54144|none|open|

There are currently no open critical issues.

---+ History

12-Jul-2009 : first installation at CERN
22-Jul-2009 : EMT received the list of critical bugs to be fixed before release to production
29-Jul-2009 : PATCH:3156 with the fixes released to integration and
installed on wms319
03-Aug-2009 : Pilot Home page created
05-Aug-2009 : CMS and LHCb confirmed that wms219 is running fine. No bad news from ALICE or ATLAS
07-Aug-2009 : further tests after LB re-configuration showed significantly increased memory consumption of the WMS
28-Aug-2009 : WMS 3.2 in production with gLite 3.1 Update 53