WorkflowTeamMeeting20150205 (2015-02-05, JenniferAdelmanMcCarthy)
---+!! Workflow Team Meeting - Feb 5 4PM CERN time
%TOC{depth="3" title="Contents:"}%

---++ Vidyo Link
   * https://indico.cern.ch/event/372501/

---++ Attending
   * FNAL: Jen, Luis, Jorge, Seangchan, John
   * US: Ajit, Ian
   * EU: Vincenzio, Julian, Xavier, Alan, Andrew, Dima

---++ Personnel
   * Julian will be off Feb 10-13, in Istanbul. Calendar already updated.
   * Jen may be off Feb 13-16, pending weather.

---++ News
   * About EU operators:
      * Two new guys from Seoul Univ. will start training in the next two weeks.
         * Already added to the workflow team e-group.
      * Also news from Belgium:
         * In principle, still using Sara for 2 months; she will be gone before summer.
         * Xavier will have 4 weeks for us.
         * New postdoc in the group that we will have for the next year.
         * Xavier will put us in contact with the people he is arranging. We need to work on having better feedback to the operators.
            * Give shifters more specific tasks.
   * US Operations news:
      * Sean will not be shifting for the time being; he has too many responsibilities with classwork and teaching.
      * FNAL will have a downtime on the 11th.
   * Old requests monitoring (Dima)
      * We need to check periodically for requests that have been in production for too long:
         * http://dmytro.web.cern.ch/dmytro/cmsprodmon/requests_in_production.php
      * Typical issues:
         * Rejected workflows that were not properly communicated back to PPD and are still reflected as "submitted" in McM.
         * Lost/forgotten workflows. Example: pdmvserv_HIG-Summer12DR53X-01991_T1_US_FNAL_MSS_00212_v0__140502_155516_4571
   * PPD: lots of local GEN-SIM to produce minbias sample, but they injected everything.

---++ Site support
   * SAM and HC problems at all sites.
   * The drain list script had some issues and sites have not been updated. The SSB list is OK; we can do it manually.

---++ Agent Issues
   * JobAccountant unstable on Feb 3. SeangChan had to run Alan's script a number of times to get things running again. Not sure why this is happening, but he spent some time looking at it.
   * Condor sent back a corrupted FWJR with missing info on jobtype. We have a script to fix it, but we should try to figure out the source of this.

---++ Redeployment plan
   * Submit2 redeployed on Wed
   * Global Pool
| *production SL6* ||
| submit1 (up)<br/> submit2 (up)<br/> cmssrv217 (up)<br/> 218 (up)<br/> 219 (up) | vocms0308 (down)<br/> vocms0309 (down)<br/> vocms0310 (down) |
   * Production Pool:
      * All Production machines have been retired.
   * CERN machines installed and tested:
      * We are waking them up when one of the FNAL agents reaches 75% disk; most probably submit1 (at 61% now).
      * The idea: drain submit1 and submit2 and use them as backup.
      * Please check that you have access to the machines!
      * Also check access to vocms049 (for scripts running).
      * Condor scripts have been moved to vocms0308.

---++ Workflows
   * Backfill again: two TMNT (Teenage Mason's Nuclear Trolls)
      * 1x10^9 events ~ 770K jobs, 4.5K prio
      * Next time we put in backfill, maybe choose WFs better so we have a shorter lag time. 3-4 hr time.
   * Priorities working fine; however, check this elog: [[https://cms-logbook.cern.ch/elog/Workflow+processing/18635][see elog]]

---+++ ReDigi

---+++ miniaod's

---+++ Rereco

---+++ Store Results
   * 1 WF waiting for resources at FNAL

---+++ MonteCarlo
   * Huge load of RunIIWinter15GS injected yesterday: 61 in acquired, 9 running

---++ RelVal (Andrew)
   * Old issue of WFs not closing out when multiple WFs run off the same input dataset is still happening.

---+++ AOB
   * Getting low-level backfills going on all sites to keep the sites healthy. Good idea; Julian is looking into it.

-- Main.JenniferAdelmanMcCarthy - 2015-02-04
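
A minimal sketch of the periodic old-requests check mentioned under News, not the logic behind the monitoring page. It assumes request names end in =_YYMMDD_HHMMSS_&lt;counter&gt;=, as the example name in the minutes appears to. The helper names (=injection_time=, =stale_requests=) and the 60-day threshold are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumption: request names end in _YYMMDD_HHMMSS_<counter>,
# inferred from the single example name in the minutes.
_STAMP = re.compile(r"_(\d{6})_(\d{6})_\d+$")


def injection_time(request_name):
    """Return the timestamp embedded in the request name, or None if absent."""
    match = _STAMP.search(request_name)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1) + match.group(2), "%y%m%d%H%M%S")
    except ValueError:
        # Digits were present but do not form a valid date/time.
        return None


def stale_requests(request_names, now, max_age=timedelta(days=60)):
    """Return (name, age) pairs older than max_age, oldest first."""
    stale = []
    for name in request_names:
        stamp = injection_time(name)
        if stamp is not None and now - stamp > max_age:
            stale.append((name, now - stamp))
    return sorted(stale, key=lambda pair: pair[1], reverse=True)
```

Fed the list of requests currently in production, this would flag candidates to follow up with PPD. Names without a parsable timestamp are skipped rather than guessed at.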
Topic revision: r6 - 2015-02-05 - JenniferAdelmanMcCarthy