Workflow Team Meeting - May 14, 4PM CERN time, 9AM FNAL time

Vidyo Link

Attending

  • FNAL: Jen, SeangChan
  • US: Ajit, Ian
  • CERN: Julian
  • EU:

Personnel

  • Luis will be in Colombia around May 1-15, working remotely (very little)
    • John will be in charge of the T0 while Luis is gone
  • Matteo will be at CERN May 4-14
  • Jen will not be available the weekend of May 16-17

News - DIMA

  • We have tons of work: the new 74X DigiReco campaign has started.

3 top issues affecting production

  • Since the memory limits on the workflows were adjusted, things have been running smoothly
  • Workflows were stuck in acquired because they had sites in both the whitelist and the blacklist
    • We have a hacked-in workaround for this right now; we need to remember to tell Alan when these WFs are done so he can rip it back out again
  • One 2023 workflow needs to be cloned with its memory increased

Site support - John

Workflows

ReDigi

  • Newest campaign RunIISpring15DR74
    • Looks like our test WFs are running smoothly. Chatted with Dave on Wed: considering we have no MC going on and a LOT of ReDigi to do, we were wondering about the possibility of running some of the low-priority work on T2s. This would be work we can afford to have failures on and redo if we need to. Discuss.
    • We should identify some low-priority workflows whose input doesn't include PU and is relatively light, so they stand a chance of succeeding, and give it a try.
    • Try low and slow. Madison might be a good place to start, so we can kill Ajit's site and we have him to help us figure out how to do this.

TaskChains

  • is now pretty much MC processing
  • everything is flowing just fine

Rereco

  • Ongoing cosmic processing: where are we? Julian will take a look tomorrow.

Store Results

  • There are 2 in failed; can we move them to rejected? OK, Julian is doing so.

MonteCarlo

  • nothing to report

Agent Issues

  • Only one weird thing: every 2-3 days we see PhEDEx getting behind and closeout getting delayed. It looks like the PhEDEx injector gets stuck and needs to be restarted; it's not giving an error. Next time Julian sees this he will bug Alan and SeangChan.
  • Alan - issue with condor ssh to job never going away, so agents were getting overloaded. SeangChan will look for the elog.
    • May be related to the condor version rather than the ssh version
    • Known issue; Brian said they fixed it and the fix will come out in the next version of Condor

Redeployment Plan

  • Next month, after the production deployment, we will test the new version, which will include a couple of things that Andrew wants. We are currently testing and debugging. Separate the completion of workflows from the cleanup of workflows.

RelVal - Andrew

  • Issue about dataset name checking: did it get discussed what to do about it? Yes, check both sides when we create the request; not deployed this month. There is a new parameter for ignoring the fact that we are writing to the same dataset when we do ACDC/extend workflows.

L3 discussion - Ajit, Jean-Roch, Matteo

Opportunistic Resources - Stefan

HLT

SDSC

Automatic Assignment And Unified Software

AOB

Last Agent status

-- JenniferAdelmanMcCarthy - 2015-04-30

Topic revision: r4 - 2015-05-14 - JenniferAdelmanMcCarthy
 