(2014-11-13, JenniferAdelmanMcCarthy)
---+!! Workflow Team Meeting - Nov 13 4PM CERN time & US Meeting Tues Nov 11 at 1PM FNAL time

%TOC{depth="3" title="Contents:"}%

---++ [[https://indico.cern.ch/event/352750/][Vidyo Link]]

---++ Attending
   * Tues Meeting: Jen, Ian and Sean
   * Ian
   * FNAL: Jen, Luis, Dave
   * CERN: Julian, Andrew, Alan, Dima

---++ Personnel

EU
| Nov 7 -> Nov 13 | ? |
| Nov 14 -> Nov 20 | Sara |

US

US shifters have decided it works better for their schedule to commit to 10 hrs a week of monitoring. We will give it a try and see how it goes!
   * Julian will be in Colombia 25th Nov - 25th Dec. He plans on being online and working.
   * Luis will be in Columbia from Dec 20 through New Year; he will get us exact dates soon.

---++ News
   * Christmas Production hints are coming
      * MC has started nailing things down so that when we do the full-on reconstruction it will be properly calibrated
      * startup digi-reco in March
      * 3-4 MC generation rounds with reconstruction between now and then
      * Upgrade MC just came in, which is the generation step
      * real start of MC of 1 billion events in the spring. There will be several high-pressure cycles between now and then; we don't know the scale or timeframe, but we should be ready to take it on when it comes.
   * FNAL will have a downtime for a storage upgrade on Thurs; we will put the site in drain
      * should be up at end of day FNAL time
   * Has everyone filled out the doodle poll?
      * http://doodle.com/rqfhrb4mnab9ucqy
      * both Sean and Ian have filled out the poll
   * GlideIn WMS not running new jobs - [[https://cms-logbook.cern.ch/elog/GlideInWMS/892][https://cms-logbook.cern.ch/elog/GlideInWMS/892]]. Farrukh and Krista are working on it. Things are stuck everywhere!
      * Production pool: some jobs are running now, but matching is really slow. We need to keep an eye on it.
      * CERN schedd's
      * condor plots in dashboard are not showing all jobs. There is a lot pending and not a lot running at T2's
   * cmsweb migration to SL6 VM's impact:
      * Slowdown on couchdb replication
      * Slowdown on wmstats (monitoring, debugging), reqmgr (assigning, aborting, cloning, etc)
      * tests on physical machines are going on, so hopefully we can be back to physical machines next

---++ Site support
   * Re-commissioning for production
      * T2_PL_Swierk (new site): Julian sent a test workflow. Is it ready?
         * Tests stuck in acquired - site not created in the resource-control
         * Julian resent the test today; should have results tomorrow
      * T2_RU_INR (200 cores) moving out of the Morgue. Please test it for production
         * will test tomorrow

---++ EU shift notes

---++ US Shift notes
   * Sean had to update his FNAL passwords
   * Ian is still having issues logging into SL6 or CERN. I forwarded the FNAL error messages to Lisa G. Ian will re-bug Julian and Ivan about the CERN machines.
      * he can log in but can't authenticate, so he can't do anything with the stuck WF's on CERN machines.

---++ Agent Issues
   * Error Handler crashes
      * crashes due to slow connection with couch
      * couch replication stopped; all of these should be fixed with the cmsweb upgrade to physical machines
      * disabled compactions so we are running, but we can't survive long this way

---+++ Redeployment plan
   * Production Pool:
| *production SL6* | *mc SL5* |
| cmssrv217 (up) <br/> 218 (up)<br/> 219 (up) | vocms216 (up) <br/> 201 (up) <br/> 235 (up) <br/>|
| *reproc_lowprio SL5* | *step0 SL5* |
| vocms202 (up)<br/>234 (up)<br/>85 (drain - will be abandoned)| vocms237 (up - will be abandoned) |
   * Global Pool
| *backfill SL6* |
| submit1 (up)<br/>submit2 (up) |
   * All agents have the latest version.
   * cmssrv98 and 112 agents shut down
   * vocms216 was rebooted on Thursday. No major impact.

---++ Workflows
   * let's put low priority data on the submit backfill team to further test the global pool

---+++ ReDigi
   * Top priority WF's: Phys14DR, then miniaod's
   * wf's were not given a custodial site, so I manually subscribed them and then they closed out.

---+++ miniaod's

---+++ Rereco

---+++ Store Results
   * Shutdown of Savannah means we can't do store results anymore
   * Julian is working on the scripts to get things working; waiting for the FNAL downtime to end

---+++ MonteCarlo
   * Taskchains giving some pain: [[https://hypernews.cern.ch/HyperNews/CMS/get/dataopsrequests/5884/1/1/2/1/1/1/1/1/2/1/1/1/1/1/1/1/1.html][hypernews]]
   * MinBias also giving some pain: [[https://hypernews.cern.ch/HyperNews/CMS/get/dataopsrequests/5884/1/1/2/1/1/1/1/1/2/1/1/1/1/1.html][hypernews]]

---++ SL6 testing/backfill

---++ RelVal

Andrew
   * why is FNAL in drain?
   * why does the site whitelist that is set in the workflow only apply to the first task?
   * jobcreator problem: https://cms-logbook.cern.ch/elog/Workflow+processing/17619
   * can seangchan explain this cloning function: https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/HTTPFrontEnd/RequestManager/ReqMgrRESTModel.py#L517
   * CouchConflictError error, some workflows have moved to failed, e.g. https://cmsweb.cern.ch/reqmgr/view/details/nancy_RVCMSSW_7_3_0_pre2ZpMM_2250_13TeV_Tauola_141112_184041_8462, see https://cms-logbook.cern.ch/elog/Workflow+processing/17634

-- Main.JenniferAdelmanMcCarthy - 2014-11-11
Topic revision: r7 - 2014-11-13 - JenniferAdelmanMcCarthy