Workflow Team Meeting - June 4, 4PM CERN, 9AM FNAL time

Vidyo Link

Attending

Personnel

  • Julian off June 8-10
  • Jen off June 27-July 5 (tentative)
  • Jen off Aug 10-26 (tentative)
  • SeangChan off July 27-31

News

  • We have beam! No magnets yet, but data taking has started again. Are we ready?
    • Luis, give us an update: how are our magnets? We may have magnets by Sunday.
  • T0 would like to use T1 resources for testing. Can they run multicore?
    • T0 can put in their work with high priority
  • OK to do Condor upgrades on submit2: just shut down the agent, do the upgrade, and bring it back up. No reason to drain it.
  • Requests and Campaigns
    • 74DR Monte Carlo is close to done for the bulk. 200-300M events that were invalidated are expected to be resubmitted in a week or so.
    • Nothing major is expected in the near future. Life may get easier, but we do have a number of incomplete requests still in production.
    • Just finishing tails.
  • There may be a re-run of MiniAOD: high I/O but low CPU time, 1.5 billion events.
  • We are back to needing US operators. With data taking starting again, we need to look over what we are doing and possibly redefine roles.

3 top issues affecting production

  • Clearing rest of xrootd issues at RAL
  • Closeout script crashing
    • It seems to crash on one particular WF at a time. Can we put some debugging into the script, such as printing "now looking at X", so that if it crashes we know which WF to look at?
  • Missing MiniAOD: problem with robust merge. We are getting them on clone. Solve it by setting the merge threshold for MiniAOD higher?
    • Could be a good chance to get our recovery script running again.
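
The "now looking at X" idea for the closeout script could look roughly like the sketch below. This is not the actual closeout script: `run_closeout` and `check_workflow` are hypothetical names, and catching the exception so the loop continues is an extra assumption beyond what was asked for in the meeting.

```python
import logging

logger = logging.getLogger("closeout")


def run_closeout(workflows, check_workflow):
    """Check each workflow in turn and return the names that failed.

    `check_workflow` is a hypothetical callable standing in for whatever
    the real closeout script does for a single workflow.
    """
    failed = []
    for name in workflows:
        # Log the workflow name *before* doing any work, so the last
        # "now looking at" line in the log identifies the culprit.
        logger.info("now looking at %s", name)
        try:
            check_workflow(name)
        except Exception:
            # Assumption: record the traceback and keep going, so one
            # bad workflow does not block the rest of the list.
            logger.exception("closeout check failed for %s", name)
            failed.append(name)
    return failed


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO,
                        format="%(asctime)s %(levelname)s %(message)s")

    def demo_check(name):
        # Hypothetical check that fails for one workflow name.
        if name == "wf_b":
            raise ValueError("simulated crash")

    print(run_closeout(["wf_a", "wf_b", "wf_c"], demo_check))
```

Even without the `try`/`except`, the single `logger.info` line before each check would be enough to tell which WF was being processed when the script died.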

Site support - John

ESTONIA, UERJ, CUNSTDA

Waiting Room

| # | Site | New (In) | Out | Total Week | SAM Exit Code | HC Exit Code | Links | Disk pledge (TB) | Real cores | Ticket ID | No reply |
| 1 | T2_AT_Vienna | | | 4 | org.sam.CONDOR-JobSubmit (_cms_Role_production) | | | 500.0 | 500 | 113641 | |
| 2 | T2_UA_KIPT | | | 3 | org.cms.WN-frontier (_cms_Role_lcgadmin); org.cms.WN-mc (_cms_Role_production); org.cms.WN-squid (_cms_Role_lcgadmin); org.cms.WN-swinst (_cms_Role_lcgadmin); org.cms.WN-xrootd-access (_cms_Role_lcgadmin); org.cms.WN-xrootd-fallback (_cms_Role_lcgadmin); org.cms.glexec.WN-gLExec (_cms_Role_pilot) | | Good T2 links to T1s: 2/12 | 550.0 | 400 | 113781 | |
| | TOTAL | | | | | | | 1050 | 900 | | |

Workflows

  • Back down to a low number of WFs in the queue. Did we really blow through our month of solid work so fast, or did something fall through the cracks?

ReDigi

TaskChains

Rereco

Store Results

MonteCarlo

Agent Issues

Redeployment Plan

RelVal - Andrew

  • Is SeangChan there? Yes.
  • What is the highest-priority GitHub issue? Dealing with MaxRSS/task; it's pushed into testbed today.
  • injecting files into - many WFs finish in one day, but log files are not available for another day. SeangChan needs to look to see how hard of an issue this is. If it's easy, it's in; otherwise it has to wait until we have fewer issues on our plate.

  • Testbed validation: we don't know where output files are going for T2_CH_CERN; it is not providing enough memory and it's not multicore.

L3 discussion - Ajit, Jean-Roch, Matteo

Opportunistic Resources - Stefan

HLT

SDSC

Automatic Assignment And Unified Software

AOB

-- JenniferAdelmanMcCarthy - 2015-06-04
Topic revision: r4 - 2015-06-04 - JenniferAdelmanMcCarthy
 