Workflow Team Meeting - Feb 18 4PM CERN, 9AM FNAL time

Vidyo Link

Attending

  • FNAL: Jen, Jorge, Gaston, Matteo, Dirk, Dave, Eliana
  • US:
  • CERN: Paola, Dima, Alan

Personnel

  • JR in Zurich Feb 18-20
  • Jen to CERN Feb 29-March 4 - tickets are booked!
  • Jorge to Columbia April 15-May 2, Talk on April 27

News - Dima

  • nothing new
    • get ready and prepared for the ICHEP campaign - multicore, fast sim, etc.
    • automatic splitting in WMCore is not taking advantage of the information in the request; we have been having to create bigger jobs and then do the split
    • time per event was a global number no matter how many cores were used, which was a mess. We need to make a change: what we want is time per event per core
    • we will make the change in WMCore; we need to make sure that the ACDC and Recovery workflows inherit all the info from the parent job
  • plans for deployment, and changes for ReqMgr2
    • Dima, Christoph^2, Dave, SeangChan, Jen and JR need to have a meeting, and we need to have the MCM people in it.
    • there is some code that parses web pages, and those pages are not available in ReqMgr2. He is looking for the info those scripts need; right now he is creating the exact same web page in ReqMgr2. Is there a better way to get the info? Are the scripts that use that code still being used? Can we set up the ops scripts against testbed?
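
As a rough illustration of the "time per event per core" item above, here is a minimal sketch of how a per-core number could feed job splitting and be inherited by a recovery workflow. This is not WMCore code: the function names, the list of inherited keys, the linear-scaling model and the `efficiency` parameter are all assumptions made for the example.

```python
# Hypothetical sketch, not taken from WMCore.
# Assumption: time_per_event_per_core is the seconds one core needs for one event,
# and a job with N cores processes events N * efficiency times faster in wall-clock time.

# Assumed set of performance parameters a recovery (ACDC) request should copy from its parent.
INHERITED_KEYS = ("TimePerEvent", "SizePerEvent", "Memory", "Multicore")


def events_per_job(time_per_event_per_core, cores, target_wallclock_s=8 * 3600, efficiency=1.0):
    """Return how many events fit in one job of the target wall-clock length."""
    if time_per_event_per_core <= 0 or cores < 1:
        raise ValueError("time per event must be positive and cores must be >= 1")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    wall_seconds_per_event = time_per_event_per_core / (cores * efficiency)
    return max(1, int(target_wallclock_s // wall_seconds_per_event))


def inherit_from_parent(parent_args, overrides=None):
    """Copy the performance parameters from the parent request, then apply explicit overrides."""
    inherited = {key: parent_args[key] for key in INHERITED_KEYS if key in parent_args}
    inherited.update(overrides or {})
    return inherited


if __name__ == "__main__":
    parent = {"TimePerEvent": 10.0, "Multicore": 4, "RequestName": "parent_request"}
    recovery = inherit_from_parent(parent)
    print(events_per_job(recovery["TimePerEvent"], recovery["Multicore"], target_wallclock_s=3600))
```

With a per-core number the splitter can size jobs directly from the core count, instead of creating bigger jobs first and splitting them afterwards.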

3 top issues affecting production

  • We need to fix pileup: the way we read it saturates the networks everywhere, causing things to break (Dave talking to Dima)
    • The way that CMS is doing pileup consumes a huge amount of bandwidth. We are using 50-60% over pledge, which is causing high failure rates on file reads (xrootd). We are reading 30GB/sec out of storage. It's hitting KIT even when running at pledge.
    • How pileup is done is a Physics decision, but it is affecting production, and we will have to throttle back the resources that we use because the network is saturated otherwise. CMS has to stop abusing the networks. We only have this problem when we exceed the pledged resources, but we can't get our work done unless we do.
    • ICHEP has to be premixed. A test is already set up for doing premix; otherwise we need to do the throttling because of the AAA node. The test is coming; we need to have it premixed, or getting the data will put ICHEP at risk.
    • What is the nature of the test? They want to compare 2 samples done different ways, one with premixing and one the old way, so they can compare results. It's in 8.0. Dima will make it clear when it is coming so Dave can watch the network and see how it is doing.

  • switch to Request Manager 2 (ReqMgr2) - we need to go through each and every script and determine if it is still used...
  • fabozzi_Run2015A-ZeroBias-boff-27Jan2016_763p2_160128_144915_7389
    • I think we've beaten this file to death and are going to have to declare the last lumi unprocessable, so I put it in bypass. Do we need to mark this lumi as bad?
    • Probably just a release issue; declare victory. We should hand it to the framework people to look at. Dima will turn it over to David Lang and send it to the compops list.
    • https://ggus.eu/index.php?mode=ticket_info&ticket_id=119428
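
For the pileup throttling discussed in the first item above, a back-of-the-envelope sketch of the arithmetic may help. It assumes the read rate scales linearly with the resources in use, and that the 30GB/sec figure was observed while running roughly 50% over pledge; the notes do not state either explicitly, and the 20GB/sec cap below is purely hypothetical.

```python
# Hypothetical sketch: linear-scaling estimate only, not a measured model.

def read_rate_at_pledge(read_rate_gb_s, usage_vs_pledge):
    """Estimate the read rate (GB/sec) if usage were scaled back to exactly 1.0 x pledge."""
    if read_rate_gb_s <= 0 or usage_vs_pledge <= 0:
        raise ValueError("rate and usage must be positive")
    return read_rate_gb_s / usage_vs_pledge


def usage_for_bandwidth_cap(read_rate_gb_s, usage_vs_pledge, cap_gb_s):
    """Return the usage (as a multiple of pledge) that keeps reads at or below cap_gb_s."""
    if cap_gb_s <= 0:
        raise ValueError("cap must be positive")
    return cap_gb_s / read_rate_at_pledge(read_rate_gb_s, usage_vs_pledge)


if __name__ == "__main__":
    # Assumed inputs: 30 GB/sec observed at 1.5 x pledge, hypothetical cap of 20 GB/sec.
    print(read_rate_at_pledge(30.0, 1.5))
    print(usage_for_bandwidth_cap(30.0, 1.5, 20.0))
```

Under these assumptions the throttle point follows directly from whatever cap the network can sustain; premixing changes the per-job read rate itself, which this sketch does not model.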

Site support - Gaston

  • No changes in Waiting Room / Morgue this week.

Transfers - Jorge

  • files disappearing before data is transferred.
  • nothing new, just the issue above. The idea is to try producing a dataset at KIT, transfer it, and then check whether files are missing while the files are still fresh, to see if we can.
  • set up backfill to KIT - check logs

Workflows

ReDigi

MiniAOD

TaskChains

StepChain

Rereco

Store Results

  • NA

MonteCarlo

Agent Issues

  • had some database/cmsweb issues over the weekend, but they are behaving now.

Agent redeployment

  • no news
  • we have to redeploy 217, 310, 308
  • after these, we are starting a new version of the agents in March, should

RelVal - Andrew

L3 discussion - Ajit, Jean-Roch, Matteo

Opportunistic Resources

  • Dima and Alan - we have one T3 site, Syracuse. Once we have the local link there we should send a test workflow there. How do we do that?
    • once it is set up we will set up 213 to run there. Ajit needs to talk to Farruk to get the mapping done properly in the Front end

Automatic Assignment And Unified Software

AOB

-- JenniferAdelmanMcCarthy - 2016-02-17

Topic revision: r3 - 2016-02-18 - JenniferAdelmanMcCarthy
 