WorkflowTeamMeeting20140204 (2014-02-04, JenniferAdelmanMcCarthy)
https://indico.cern.ch/conferenceDisplay.py?confId=254692

%TOC{depth="3"}%

---+++ Attending
   * Jen, Dave, Seangchan, John, Luis, Julian, Alan, Andrew, Adli, Sunil

---+++ Shift
   * Feb 4-11: Sunil
   * Feb 11-18: Xaviar
   * Anybody who can put in time this week is encouraged to do so!
   * Please e-log when you come on and off shift; I am seeing very little traffic from our shifters.

---+++ Extra Meetings this week
   * DBS Upgrade meeting tomorrow at 5: https://indico.cern.ch/conferenceDisplay.py?confId=298463
   * Thursday we will have a bonus Workflow Team Meeting at 5:30 CERN time. Jen will set up Vidyo.

---+++ Issues
   * We are pushing the agents to their limits right now. It is important that the shifters log onto all the machines and run df to make sure disk usage does not go over 90%; above that threshold CouchDB starts having issues and the agent goes down. Let's try to stay ahead of this!!!
   * Where are we in clearing out MC? I am focusing almost completely on Redigi. Julian, where does MC stand?
      * Fewer than 20K jobs pending; I think we're going to be ready by tomorrow afternoon.
   * Any WFs not at 80% by Thursday will be killed and cloned after the upgrade.
   * Changes in clone script - Luis
   * Changes in scripts for the DBS3 upgrade - Luis & Julian
   * New script for setting thresholds: [[https://cmslogbook.cern.ch/elog/Workflow+processing/12492][12492]]
      * I haven't checked out the new procedure yet; has anybody besides Luis or John done so?
   * Clearing out Redigi WFs - Jen, Dave and Andrew
      * KIT - went through these Monday; most are reading all their data from FNAL via xrootd. Fingers crossed that they will run.
      * IN2P3 - Dave went through these Monday; we are in "kill and clone" mode for them. Luis made some tweaks to the resubmit.py script and is cloning them tonight. We will see in the morning how they are doing.
      * Other sites:
         * A large number of WFs are showing duplicates in their outputs. For the ones whose inputs we have checked, the inputs appear to be OK. The following e-logs document this issue and how we are working through it: 12485, 12460, 12200
         * PIC WFs - no errors, but the WFs are not at 100%; we tried cloning and it didn't fix the issue:
            * pdmvserv_HIG-Fall11R2-01424_T1_ES_PIC_MSS_00019_v0__140120_140109_7919 (12292)
            * pdmvserv_TOP-Summer12DR53X-00187_T1_ES_PIC_MSS_00105_v0__131208_161000_3034 (12449)
         * CNAF WFs - these all have file read errors that recovery/ACDC did not fix. Input blocks have been checked.
         * RAL - errors even with 100% of the datasets in place. Check with Dave, but we may have to clone?
   * Who has time to check whether there are any datasets in DBS2 or DBS3 that are missing from the other database? Yuyi will want this info for the DBS3 upgrade meeting Wednesday morning. Julian??? Sunil???
      * There are some files; Andrew will post the difference and tell us what they are and when the WFs ran. We need this for tomorrow's meeting.

---+++ Site issues - John

---+++ Andrew's questions
   * /store/unmerged/logs problem: https://cmslogbook.cern.ch/elog/Workflow+processing/12616
   * FNAL tape family issue
   * FNAL pinning issue
   * FNAL disk/tape separation

---+++ AOB
   * Next week should be much saner than the last month and a half, so let's use next week's meeting to go through and vet all the scripts we use (Julian, where is the list?). Which ones are actually used? Which ones can go away? For the ones we use, have they been properly updated for the DBS3 transfer?
   * We had to hack the dbsTest.py script over the weekend to get things to close out. We need to formalize the changes and get them committed to git.
   * Next challenge after the migration: unifying the mc, mc_highprio, reproc_lowprio and reproc_high_prio teams:
      * Fewer agents
      * Request-based priority on the global workqueue
      * Make agents move to drain automatically when they get above 80% once we have the teams unified - new GitHub issue - Luis

-- Main.JenniferAdelmanMcCarthy - 04 Feb 2014
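The disk-space check asked of shifters under Issues (run df on every agent machine and stay below 90% full) can be scripted. Below is a minimal sketch, assuming Python 3 is available on the agent nodes; the function names and the mount list are hypothetical and not an existing team script. It computes usage the way df reports Use% (used / (used + available), so root-reserved blocks are excluded) and flags any filesystem at or above the 90% threshold from the notes.

```python
import shutil

# Threshold from the meeting notes: above ~90% full, CouchDB starts having
# issues and the agent goes down.
THRESHOLD_PERCENT = 90.0

# Hypothetical mount list: on a real agent node, list the filesystems that
# hold the agent install and the CouchDB data.
MOUNTS = ["/"]


def usage_percent(path):
    """Percent of the filesystem containing `path` in use, matching df's Use%."""
    usage = shutil.disk_usage(path)
    # df's Use% is used / (used + available); this ignores root-reserved blocks.
    denominator = usage.used + usage.free
    if denominator == 0:
        return 0.0
    return 100.0 * usage.used / denominator


def over_threshold(percent, threshold=THRESHOLD_PERCENT):
    """True if a usage percentage is at or above the alert threshold."""
    return percent >= threshold


def check_mounts(mounts, threshold=THRESHOLD_PERCENT):
    """Return (mount, percent) pairs for every mount at or above the threshold."""
    flagged = []
    for mount in mounts:
        pct = usage_percent(mount)
        if over_threshold(pct, threshold):
            flagged.append((mount, pct))
    return flagged


if __name__ == "__main__":
    for mount, pct in check_mounts(MOUNTS):
        print(f"WARNING: {mount} is {pct:.1f}% full (threshold {THRESHOLD_PERCENT}%)")
```

This only inspects the machine it runs on, so it does not remove the need to log onto (or ssh to) every agent; it just makes the per-machine check quick and consistent with what df shows.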
Topic revision: r5 - 2014-02-04 - JenniferAdelmanMcCarthy
Copyright © 2008-2021 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.