Workflow Team Meeting - Jan 28, 4 PM CERN time, 9 AM FNAL time

Vidyo Link

Attending

  • FNAL: Jen, Gaston, SeangChan, Jorge
  • US: Matteo, JeanRoch
  • CERN: Dima, Rokas (site support)

Personnel

  • Rokas - new site support team member
  • New Julian starting Feb 1
  • JR in Zurich 18-20
  • Jen to CERN Feb 29-March 4
  • Possible training sessions Feb 8-12 - ND student Alison, Matteo, Paola/Kathrine

News - Dima

  • Not much to say
  • How can we use disk space and transfer resources to get things done? We are running at half of our resources.
  • Monster request: 2 weeks of all CPU
  • we have digi-reco coming, but output
  • looks like samples 400M events in requests

Top 3 issues affecting production

  • A couple of "bad" T2s seem to be the cause of most of our task chain and MiniAOD issues: Estonia, UERJ, NCP, Ioannina. Working with Gaston to track down the issues.
    • Especially bad for MiniAODs: when the data is transferred only to an unreliable site for running, we can't get anything to merge. Is there any reason we don't limit the list to "good" sites when workflows require input and run at only 1-2 sites?
  • Whitelist issues
  • Why are acquired jobs not getting into GlideinWMS? See acquired_job_priority_history.png

Site support - Gaston

  • Current Waiting Room:
    • T2_RU_IHEP, T2_RU_SINP, T2_TH_CUNSTDA, T2_RU_INR
  • Current Morgue:
    • T2_PL_Warsaw, T2_TR_METU, T2_RU_PNPI, T2_RU_ITEP, T2_MY_UPM_BIRUNI, T2_RU_RRC_KI
  • Out of the Waiting Room:
    • T2_IN_TIFR
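
The question raised under the top issues (limiting the site list to "good" sites) can be sketched with the lists above. This is only an illustration: the function name filter_whitelist and the idea of treating Waiting Room and Morgue sites as the exclusion set are assumptions made here, not WMAgent or SSB behavior. The site names are copied from these notes.

```python
# Site lists copied from the Site support notes above.
WAITING_ROOM = {"T2_RU_IHEP", "T2_RU_SINP", "T2_TH_CUNSTDA", "T2_RU_INR"}
MORGUE = {
    "T2_PL_Warsaw", "T2_TR_METU", "T2_RU_PNPI",
    "T2_RU_ITEP", "T2_MY_UPM_BIRUNI", "T2_RU_RRC_KI",
}


def filter_whitelist(candidate_sites, excluded=frozenset(WAITING_ROOM | MORGUE)):
    """Hypothetical helper: keep candidate sites, in order, that are not excluded."""
    return [site for site in candidate_sites if site not in excluded]


if __name__ == "__main__":
    # T2_IN_TIFR left the Waiting Room, so it is the only one that survives.
    candidates = ["T2_IN_TIFR", "T2_RU_IHEP", "T2_PL_Warsaw"]
    print(filter_whitelist(candidates))
```

A real filter would presumably take site status from SSB rather than from a hard-coded list, and the "bad" T2s named under the top issues would need their own criterion since they are not in either list.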

Transfers - Jorge

Workflows

  • When we have digi-reco and don't save the intermediate steps, we are reading files from store/unmerged
  • Proposal: all ACDC go to all sites; we will trust WMAgent to communicate with SSB to do the right thing
  • Dima has noticed that we have workflows sitting in acquired for a long time and not starting

ReDigi

TaskChains

  • Some "new" issues have popped up; we will see if they clear on ACDC

StepChain

Rereco

  • The last Christmas production WFs have been passed along!

Store Results

MonteCarlo

Agent Issues

Agent redeployment

production SL6
FNAL                                  CERN
cmsgwms-submit1 (up)                  vocms0308 (up)
cmsgwms-submit2 (ready to redeploy)   vocms0309 (up)
cmssrv217 (up)                        vocms0310 (up)
cmssrv218 (redeployed)                vocms0311 (ready to redeploy)
cmssrv219 (drain redeployed)          vocms0304 (on HLT tests)
                                      vocms0303 (up / highprio)

RelVal - Andrew

L3 discussion - Ajit, Jean-Roch, Matteo

Opportunistic Resources

Automatic Assignment And Unified Software

  • We need documentation! Matteo is working on it and will continue to look at it.
  • Will be worked on while we are training Alison

AOB

-- JenniferAdelmanMcCarthy - 2016-01-27

Topic revision: r4 - 2016-01-28 - DmytroKovalskyi
 