WorkflowTeamMeeting20141023 (2014-10-23, JenniferAdelmanMcCarthy)
---+!! Workflow Team Meeting - Oct 23 4PM CERN time
%TOC{depth="3" title="Contents:"}%

---++ [[https://indico.cern.ch/event/347739/][Vidyo Link]]

---++ Attending
   * FNAL: Jen, Dave, Luis, Ian, SeangChan, Jorge, Juan
   * CERN: Andrew, Julian, Dima (New Dave)

---++ Personnel
EU
| Oct 16 -> Oct 23 | Sara |
| Oct 23 -> Oct 30 | Jasper |

US
| Oct 22 -> Oct 30 | Ian |

   * New US Operator Ian Dyckes

---++ News
   * Dima - first to help make process more streamlined, also SanDiego
   * Power outage!
      * We somehow managed to survive the power outage rather unscathed. Agents needed to be restarted but it doesn't appear that they lost their minds.
      * This does bring up the fact that we should at least think about what to do in case of a longer, more catastrophic outage. We have agents at FNAL that can run the data, but we are then running blind. Something to think about.
      * Two key points to work on:
         * Documentation backup: google cache [[http://webcache.googleusercontent.com/search?q=cache:NiqqWe24YtoJ:https://twiki.cern.ch/twiki/bin/view/CMSPublic/CompOpsWorkflowTeamWmAgentRealeases+&cd=1&hl=en&ct=clnk&gl=ch][example]]
         * Alternative communication channels: Gtalk, Skype, AIM, etc.
   * wmLHE+GEN-SIM and DIGI-RECO 53x (Phys14 MC for next run)
      * First wf ran with 100% CMSSW failure: https://cmslogbook.cern.ch/elog/Workflow+processing/17282
      * It was assigned before the release was made. The release should be there now; an ACDC is going in to see if it works now.
   * Possible urgent data coming late in the week? So far all we have is a rumor that something is coming and we have no idea what!
   * Urgent upgrade, 4 new campaigns starting
      * Week number 3 of having this in our news notes
      * Still no news or dates. We think this is a re-run of data we ran in Sept, so the data should be on disk.
      * Dima is tasked with the job of getting more information for us.
   * Monitoring scripts: have to point to the global pool. Can we ignore analysis jobs?
      * Production jobs are showing in Dashboard and being monitored, but the backfill going to the global pool still is not and should be watched via the WMAgent
      * Backfill is now showing up as well in the global pool, so we can start using it.
   * Ian, new US operator, is at FNAL for training this week.
   * vocms174 and vocms227 will be given back to Ivan on October 31st. Make sure to copy your stuff before this date.
      * vocms049 (already available) is the replacement for vocms174.
         * It does not have git. It is an SL6 machine, but not a big one: a small virtual machine just for running scripts.
         * Let's have Julian ask for git, xrd, xrootd so we can fetch our logs. Mounting cvmfs has been added to the puppet for SL6 machines.
   * Julian is working on a new WorkflowPercentage and closeOut script:
      * Include taskchains and deal with FilterEfficiency
      * Better / faster - run as a cronjob with html output.
      * Waiting for requests to test

---++ Site support
   * John is on vacation. Not sure if there is any site news
   * Problems with cpu bound on site status board: sites would disappear. Adli is catching up but thinks it has been taken care of.

---++ Sara's notes

---++ Agent Issues
---+++ Redeployment plan
Production Pool
| *production* | *mc* |
| cmssrv217 (drain) <br/> 218 (drain)<br/> 219 (up/new) | vocms216 (drain/redeployed soon) <br/> 201 (up/new) <br/> 235 (drain) <br/> cmssrv98 (up - will be abandoned) |
| *reproc_lowprio* | *step0* |
| vocms202 (drain)<br/>234 (up/new)<br/>85 (up - will be abandoned)<br/>cmssrv112 (up - will be abandoned) | vocms237 (up/new - will be abandoned) |

Global Pool
| *backfill* |
| submit1 (up/new)<br/>submit2 (up/new) |

   * vocms216 caught a few reproc_lowprio jobs; they will be over soon.
   * Any word on new SL6 machines for CERN?
   * What machine did we finally decide to reshoot to SL6 for retesting?
      * cmssrv95 (old StoreResults)
      * cmssrv112 and 98 are also good candidates once we get our new machines.
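
Note on the WorkflowPercentage item under News: one way a completion percentage could account for FilterEfficiency is to scale the requested events by the efficiency before comparing with the events produced. The sketch below is only an illustration under that assumption. The function name and its parameters are hypothetical and are not taken from Julian's script.

```python
def completion_percentage(events_produced, events_requested, filter_efficiency=1.0):
    """Return the percentage of expected output events produced so far.

    Assumption (not from the minutes): expected output events equal the
    requested events multiplied by the filter efficiency.
    """
    if events_requested <= 0:
        raise ValueError("events_requested must be positive")
    if not 0.0 < filter_efficiency <= 1.0:
        raise ValueError("filter_efficiency must be in (0, 1]")
    expected_events = events_requested * filter_efficiency
    return 100.0 * events_produced / expected_events
```

With a filter efficiency of 0.5, a request for 1000 events that has produced 500 would count as complete, whereas ignoring the efficiency would report it as half done.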
---++ Workflows
---+++ ReDigi
   * pdmvserv_EXO-Phys14DR-00009_00001_v0__141017_161834_6747 - CMSSW failure https://cmslogbook.cern.ch/elog/Workflow+processing/17282

---+++ miniaod's
   * cleared out

---+++ Rereco
   * nothing... literally

---+++ Store Results
   * NTR

---+++ MonteCarlo
   * running smoothly - had to extend a couple of workflows but that is it

---++ SL6 testing/backfill
   * Monitoring scripts are not grabbing the Production (aka SL6) WFs properly and need to be modified - Luis

---++ RelVal Andrew
   * Why is it possible to move aborted to rejected? Andrew says it is possible but it shouldn't be. Aborted should only move to aborted archived.

-- Main.JenniferAdelmanMcCarthy - 22 Oct 2014
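
On the RelVal point about aborted requests: the rule "aborted should only move to aborted archived" can be written as a small allowed-transitions table. This is a minimal sketch of the idea, not the request manager's actual code. The status strings and the `is_allowed` helper are assumptions made for illustration, and only the one rule stated in the minutes is encoded.

```python
# Hypothetical table: maps a status to the set of statuses it may move to.
# Only the rule from the minutes is listed; other statuses are left out.
ALLOWED_TRANSITIONS = {
    "aborted": {"aborted-archived"},
}


def is_allowed(current_status, target_status):
    """Return True if the table permits moving current_status to target_status.

    Statuses missing from the table permit no transitions in this sketch.
    """
    return target_status in ALLOWED_TRANSITIONS.get(current_status, set())
```

A check like this would refuse the aborted to rejected move that Andrew reported as possible today.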