EVERYONE needs to start working on the paper more.


FNAL: Jen, Luis, John; CERN: Edgar, Andrew

Issues last week

  • Issues with 237: where do we currently stand?
  • Issues with PhEDEx over the weekend: Oli is in Taipei, causing delays in subscriptions


  • Sep 3 --> Sep 10 Sunil
  • Sep 10 --> Sep 17 Xavier

Site Issues

  • FNAL in drain for Upgrades
  • FNAL in load shed due to weather
  • Bristol and Belgium have been fixed
  • Russian sites are green; they should be fine
  • IRFU - they have a group of 6 sites with different SEs; only 4 of the sites agreed to support CMS
    • They split their two sites into IRFU GRIF that have separate worker nodes, so we should be able to use them
    • They have a combined pledge, not individual pledges, for processing

Sites for Production

No changes in waiting room from last week


  • 235 in drain; we do not fully understand what is going on with the agents
    • Couch is dying; not sure what is going on
    • Changed polling cycle to 10 min; waiting for Seanchang to look at it
  • 216 - couch issues; limited running/pending jobs; not sure what is causing the couch issues
  • 201 - changed job updater pollers and max numbers, but it is still unstable
  • 237 is fixed - for sure...
  • 113 - JobStatusLite is down; Andrew will restart it



  • We noticed jobs getting killed because the RSS was 3G, but the runlog says the job used 2.5G while the condor logs show 3G
  • The error was not repeatable; not sure if it was a machine issue or if the job landed on a workernode that was not up to date
  • Happened several times on different workflows; not sure if it was the same site or workernode
  • Andrew will try to track down the workernode and see if that is the issue and if it needs to be updated
  • Submit failed error - looked at the condor logs for the merge jobs and did not see anything wrong with them
    • Where is it staging out? Edgar told him to check whether he can see the files manually
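
One way to narrow down the RSS discrepancy noted above is to compare the two reported values per job and group the mismatches by workernode. The sketch below is only an illustration: the record fields, the hostnames, and the 3000 MB limit (standing in for "3G") are assumptions, and the values would have to be extracted from the runlog and condor logs by hand or by a separate parser.

```python
from dataclasses import dataclass


@dataclass
class JobMemory:
    job_id: str           # hypothetical job identifier
    worker_node: str      # hypothetical workernode hostname
    runlog_rss_mb: float  # RSS as reported in the runlog
    condor_rss_mb: float  # RSS as reported in the condor log


def find_rss_mismatches(jobs, limit_mb=3000.0):
    """Return jobs where condor reports RSS at or above the limit
    while the runlog reports RSS below it."""
    return [j for j in jobs if j.condor_rss_mb >= limit_mb > j.runlog_rss_mb]


def count_by_worker_node(jobs):
    """Count jobs per workernode to see whether mismatches cluster on one machine."""
    counts = {}
    for j in jobs:
        counts[j.worker_node] = counts.get(j.worker_node, 0) + 1
    return counts
```

If most mismatches land on one workernode, that points to a machine that is out of date rather than to the workflows themselves.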

IEEE Paper

Draft Outline #1

  • Introduction (why we need to run so many simulations, why we need to do a re-reconstruction of the data) (Edgar/Jen)
  • A brief discussion of the different types of workflows and how they are processed differently (Diego/Jen/Edgar)
  • Monitoring for T1 & T2 sites (Diego/Jen/Edgar)
  • How we ran prior to 2011
    • ProdAgent vs WMAgent (Diego/Alan) (focus on differences and improvements)
    • Reprocessing and Production (Jen/Xavier) (how this was handled with ProdAgent and why we needed to move to another framework)
  • How we ran with WMAgent (after 2011)
    • WMAgent/ReqMgr/WorkQueue (Diego/Edgar/Alan): general comment on how it works
    • PREP/ReqMgr interaction (Vincenzo?)
    • Organization of the workflow team and operations around it (Edgar)
  • Achievements
    • Events reconstructed (L3s)
    • Usage of the grid (Edgar/Jen/L3s)
  • Conclusions / Outlook (Edgar/Jen)

Action Items

  • Write twiki on disk/tape separation for T1_IT_CNAF (Edgar)
  • Recovery workflows - Jen - suspend
    • The first 2 workflows are completely through; we are now waiting for people to look carefully and confirm there are no showstoppers before we do the other 50
    • Guillemo is bothering JeanRoc about whether people have actually looked at the data
  • A new state for completed and already dealt with ACDC.
  • How many workflows running, pending, waiting, stuck
    • Is it documented yet? yes
    • Luis is working on a script to pull these numbers automatically. The script is done, but we are still tweaking it
  • solve the problem of how to use a non-production scram architecture (waiting for Alan to come back)
  • Update the documentation on scripts for github now that we aren't using svn anymore
    • Documentation needs to be updated and everyone needs to start ramping up on github
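
For the action item above on counting workflows that are running, pending, waiting, or stuck, the tally itself can be sketched as follows. This is not Luis's script; the input format (a list of workflow name and state pairs) and the state labels are assumptions made for illustration, and how the states are actually pulled is left out.

```python
from collections import Counter

# State labels taken from the action item; the real system may name them differently.
STATES = ("running", "pending", "waiting", "stuck")


def summarize_states(workflows):
    """Count workflows per state, reporting zero for states with no workflows.

    `workflows` is an iterable of (workflow_name, state) pairs.
    """
    counts = Counter(state for _name, state in workflows)
    return {state: counts.get(state, 0) for state in STATES}
```

Reporting every state, even when its count is zero, keeps the weekly numbers comparable from one meeting to the next.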


  • Diego will continue to work on the paper
  • Problem with the creation of WFs if you change the number of files per job
  • CouchDB will be rotated on Wed, so we will be running without being able to watch WMStats on Wed/Thurs; it should be back Friday
Topic revision: r2 - 2013-09-10 - JenniferAdelmanMcCarthy