WorkflowTeamMeeting20140327 (2014-03-27, JenniferAdelmanMcCarthy)
%TOC{depth="3"}%

---+++ [[https://indico.cern.ch/event/254699/][Vidyo Link]]

---+++ Attending
   * Sara, Jullian, SeangChan, Andrew
   * Dave, John, Luis, Jen

---+++ Staffing
| March 18 -> March 25 | Adli |
| March 25 -> April 3 | Sara |
   * SeangChan is at CERN this week

---+++ Shift News
   * One MC WF is failing, but it is not really a problem
   * The issues page needs to be reviewed to find out which WFs need to be looked at
   * Help clear out orphaned ACDCs

---+++ News
   * This week: we need to start using GGUS for ticketing!
      * Update the related documentation - Jullian will do this
   * We will need to do a full site drain of FNAL for the final disk/tape separation.
      * Possibly as soon as Monday, April 1, but wait for official word on Monday
      * How long will FNAL be down for the switch?
         * We are changing where data will be registered, so the site needs to be fully drained
         * Reads from dCache will change and will be inaccessible
         * The unmerged area has been mounted on all the WMAgent machines, so we will still have access to the log files
      * Do changes need to be pushed to the PhEDEx endpoint? No: we activated the DISK endpoint a while ago and are actively reading from it, so the WMAgent config will not need to change; only the TFC plugin and the stage-out plugin need to be changed
         * Currently testing the plugin
      * Monday: stop submitting to FNAL
      * Spend the week clearing out all WFs that run at FNAL
      * Monday, April 7 we flip the switch: FNAL is disk/tape separated and we start running again
      * We want our first workflow to be GEN-SIM-RAW with an AOD output, so we can make sure we are doing all the right things

---+++ Agent Issues
   * 201 upgraded
   * How is 202 doing?
      * Taking longer to go through files; hopefully done tomorrow
   * Who is next?
      * 85 will go into drain next week; when it comes back up it will become a low_prio reprocessing machine
      * 234 will be in drain after the disk/tape separation at FNAL
   * Couch and Oracle are both reset during the upgrades

---+++ Workflow Issues
---++++ MonteCarlo
   * One WF is stuck; SeangChan is looking at it.
---++++ Redigi/ReReco
   * We have 11 WFs that show up as "complete" in WMStats but are actually "closed-out" when you click on them. Dave/Andrew, please announce, and let's see if we can get them moved off the list: [[https://cmslogbook.cern.ch/elog/Workflow+processing/13633][13633]]
      * Jen will double-check the list and e-mail SeangChan
   * 2 WFs are still waiting for IN2P3 to get all the data on disk. Have we heard anything lately? We may have to clone them.
   * Problems cloning WFs due to a mismatch between CMSSW and scramArch [[https://cmslogbook.cern.ch/elog/Workflow+processing/13614][13614]] - Dave/Andrew?
   * pdmvserv_FSQ-LowPU2010DR42-00007_T1_US_FNAL_MSS_00002_v0_BS2011_140319_161539_6642 - all step0's ran successfully, but the merge jobs all failed [[https://cmslogbook.cern.ch/elog/Workflow+processing/13612][13612]]
   * pdmvserv_FSQ-LowPU2010DR42-00005_T1_ES_PIC_MSS_00001_v0_BS2011_140311_201627_213 - 100% failure [[https://cmslogbook.cern.ch/elog/Workflow+processing/13588][13588]]
   * A number of WFs are still having file-read problems at FNAL, even after we thought we had the pileup on disk, and have been sitting in acquired for a long time [[https://cmslogbook.cern.ch/elog/Workflow+processing/13623][13623]]

---+++ Site issues for the Workflow team
   * Site Support Database - please have a look
      * Let us know if there are any plots of interest to the workflow team that you would like us to include
      * https://test-wrdb.web.cern.ch/test-wrdb/index.php
      * https://test-wrdb.web.cern.ch/test-wrdb/summary.php

---+++ Andrew's questions/Luis & SeangChan's answers

---+++ Issues from last week's meeting
   * Problems getting logs at FNAL - Dave
      * FNAL people will ask the T1 people if we can have the unmerged space on the agent machines so we can see logs.
      * Find out why srmcp was not working over the weekend
      * Still need to document how to get files
   * High load on agents causing Couch/components to crash:
      * We need a maximum number of jobs submitted by an agent [[https://github.com/dmwm/WMCore/issues/5032][github 5032]] - April-May, low priority, needed for losing teams
      * Lost information when an agent went down [[https://cmslogbook.cern.ch/elog/Workflow+processing/13518][elog 13518]] - closed
   * Summer12DR53X - failing at FNAL due to pileup
   * The IN2P3 disk is full, so we cannot copy any more input data to it; put off the ACDC for now

---+++ AOB

-- Main.JenniferAdelmanMcCarthy - 20 Mar 2014
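The News section above notes that, for the disk/tape split, only the TFC plugin and the stage-out plugin change. A TFC (trivial file catalog) is essentially an ordered list of regex rewrite rules mapping logical file names (LFNs) to site-local physical paths (PFNs), with the first matching rule winning. A minimal sketch in Python of that lookup, where the rule patterns and the =/pnfs/example.gov= paths are purely illustrative and not FNAL's actual configuration:

```python
import re

# Illustrative TFC-style rules as (path-match, result) pairs; first match wins.
# Real CMS sites keep these in a site-local storage.xml; paths here are made up.
RULES = [
    (r"/+store/unmerged/(.*)", r"/pnfs/example.gov/disk/store/unmerged/\1"),
    (r"/+store/(.*)",          r"/pnfs/example.gov/disk/store/\1"),
]

def lfn_to_pfn(lfn, rules=RULES):
    """Rewrite an LFN using the first matching rule; return None if nothing matches."""
    for pattern, result in rules:
        m = re.match(pattern, lfn)
        if m:
            return m.expand(result)
    return None
```

Switching a site from tape-backed to disk-only paths then amounts to changing the =result= side of these rules, which is why a full drain is needed: jobs that resolved the old PFNs must finish before the mapping flips.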