WorkflowTeamMeeting20131008
https://indico.cern.ch/conferenceDisplay.py?confId=254674

---++ Attending
   * Jen - from home
   * SeangChan, Luis, Dave M at Fermilab
   * Andrew

---++ Personnel
   * Edgar is back from vacation
   * Oct 1 --> Oct 8: Xavier
   * Oct 8 --> Oct 15: Sara
   * Jen is dealing with a sick dog this week and will likely be working from home all week. I will be available online.

---++ Infrastructure
   * WMAgent issues:
      * We spent considerable time looking into stability issues of the agents. Luis and SeangChan made patches and updated the agents, and that seemed to help stabilize things somewhat.
      * New:
         * Replication stops: issue solved; 216 and 201 need to be patched.
         * Central couch problem: rotation planned for this Friday. A patch is available to filter out successful jobs so that they are not migrated to central couch; will drain more and more agents.
         * Over the weekend, we found that local couch was also not deleting documents properly; this has been patched.
         * Documentation on restarting couch replication needs to be updated.
      * Pending:
         * Display the last time data was updated from each agent in wmstats
         * Don't make JobUpdater/TaskArchiver crash on a couch connection error
         * CondorPlugin unit tests
         * Couch calls take too long
         * Upgrade of 235
         * Testing of the parentage problem (Edgar and Andrew)
   * Workflow issues:
      * We had a significant number of WFs that were 'stuck' in MC processing.
      * Jen & Luis spent time debugging these workflows.
      * We looked first at the workflows that were stuck but over 95% done.
      * One of the main reasons they were not progressing was that the agent lost the site information for cleanup & merge jobs. This was traced to the instability of couch throughout the month. Luis and SeangChan are looking into ways to prevent this from happening in the future.
   * Issues with the closeout script:
      * The version of dbsTest.py currently in git gives the wrong counts/answers for the ReDigi WFs. Jacob verified that an old version he had kept was giving the correct answers, so this week we used the old version of the script to close out the ReDigi/ReReco WFs and the new version for MC.
         * Seems to be working OK now.
      * Throughout the week, dbsTest.py intermittently had problems talking to DBS: https://cmslogbook.cern.ch/elog/Workflow+processing/10120
         * We have not yet found a solution to this problem. Hopefully Edgar will have a quick fix now that he is back.
         * This problem is persisting.
   * condor_overview fixed and improved

---++ Site Problems
---+++ Waiting Room
<img alt="" src="https://dl.dropboxusercontent.com/u/137533212/CMS/CompOpsMeeting/131007/WR_Table.png" width="800"/>

---+++ Sites for Production
|*Site in MC*|*Slots*|*Status*|*Notes*|*Issues*|
|T2_RU_PNPI|176|skip|to be commissioned|under maintenance until Sep 30; SAM & link errors|
|T2_IN_TIFR|355|drain|they claim they fixed site issues|site was in WR for 22 weeks|
Topic revision: r4 - 2013-10-08 - JenniferAdelmanMcCarthy