https://indico.cern.ch/conferenceDisplay.py?confId=254680

Attending

Fermilab: Dave M, SeangChan, Luis, Jen; CERN: Adli, Julian, Andrew, Xavier

Personnel

  • coming off shift: Sunil
  • coming on shift: Sunil
  • Shift schedule done until Jan 7: EU Shifts
  • Julian's holidays: Dec 27-29 and Jan 7-11
  • List of everyone's holidays? (CERN closure)
    • CERN closed Dec 22 - Jan 6

WMAgent issues:

  • vocms174 heavy load:
    • Monitoring script bug?: https://cmslogbook.cern.ch/elog/Workflow+processing/11157
    • This script reports the alarms to the dashboard; without it the dashboard is blind.
    • It gets the pledge and the number of running jobs for each site, feeds them into the dashboard and computes the alarms.
    • The script takes 3-4 min per site; with 60 sites, a full run takes 3-4 hours.
    • Julian will rewrite it in Python or Perl. Xavier will walk Julian through the current script so it can be revamped.
  • DBS3 missing blocks due to misconfiguration.
    • The first migration failed but the second one succeeded; the issue can be closed out.
  • Collector/Negotiator priorities set: RelVal has the same priority as HighPrio Reproc.
  • Workflows running outside the whitelist: https://cmslogbook.cern.ch/elog/Workflow+processing/11175
    • Luis and SeangChan will look into this after the meeting.
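The 3-4 hour figure is simply the per-site cost times the number of sites (60 x 3-4 min = 180-240 min). A minimal Python sketch of why the rewrite could pay off, assuming the per-site checks are independent and can safely run concurrently. `check_site`, `SAMPLE_SITES` and `ALARM_FRACTION` are hypothetical placeholders, not the real monitoring script, and the alarm rule (running jobs below half the pledge) is an assumption for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data: site name -> (pledged slots, running jobs).
# The real script would query these values from remote services instead.
SAMPLE_SITES = {
    "T1_XX_Example": (1000, 950),
    "T2_YY_Example": (500, 100),
}

# Assumed alarm rule (not the real one): flag a site whose running jobs
# fall below half of its pledge.
ALARM_FRACTION = 0.5


def serial_runtime_hours(n_sites, minutes_per_site):
    """Wall-clock hours when sites are checked one after another."""
    return n_sites * minutes_per_site / 60.0


def parallel_runtime_hours(n_sites, minutes_per_site, workers):
    """Approximate wall-clock hours when up to `workers` sites run at once."""
    rounds = -(-n_sites // workers)  # ceiling division: number of batches
    return rounds * minutes_per_site / 60.0


def check_site(site):
    """Hypothetical per-site check: compare running jobs with the pledge."""
    pledge, running = SAMPLE_SITES[site]
    return site, running < ALARM_FRACTION * pledge


def check_all_sites(sites, workers=10):
    """Run the independent per-site checks concurrently; returns {site: alarm}."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(check_site, sites))


if __name__ == "__main__":
    print(serial_runtime_hours(60, 3), serial_runtime_hours(60, 4))
    print(parallel_runtime_hours(60, 4, 10))
    print(check_all_sites(SAMPLE_SITES))
```

With 10 workers, 60 sites at 4 min each would come down to roughly 6 batches x 4 min = 24 min, provided the services being queried tolerate that many concurrent requests.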

Workflow issues:

  • Stuck completed workflows: PhedexInjector issue.
    • Do we need to run phedexFix?
  • Step0 at CERN: taking too long to close out.
  • The closeout script is taking a LONG time to run, and we are still getting DBS timeout errors.
  • ReReco recovery status:
    • franzoni_Fall53_2011B_MuOnia_Run2011B-v1_Prio2_5312p1_130916_235459_5746 recovery step 2/5
    • franzoni_Fall53_2011B_PhotonHad_Run2011B-v1_Prio2_5312p1_130916_235450_3606 recovery step 1/3
    • linacre_Fall53_2011B_DoubleMu_Run2011B-v1_Prio1_5312p1_131014_173309_8207 100% - closed out
    • linacre_Fall53_2011A_MinimumBias2_Run2011A-v1_Prio2_5312p1_131106_191345_6976 recovery 1/3
    • linacre_Fall53_2011B_MuEG_Run2011B-v1_Prio1_5312p1_131028_194646_8732 100% - backfill; can I just close this out?
    • franzoni_Fall53_2011A_PhotonHad_Run2011A-v1_Prio2_5312p1_130916_235401_5280 recovery 2/3
    • franzoni_Fall53_2011B_MultiJet_Run2011B-v1_Prio2_5312p1_130916_235344_733 recovery 1/3
    • linacre_Fall53_2011A_SingleMu_Run2011A-v1_Prio1_5312p1_131014_173248_9770 recovery 5/12
    • franzoni_Fall53_2011A_HT_Run2011A-v1_Prio1_5312p1_130916_235328_2218 recovery 2/4
    • franzoni_Fall53_2011A_MinimumBias_Run2011A-v1_Prio2_5312p1_130916_235256_6562 recovery 1/5
    • franzoni_Fall53_2011A_MET_Run2011A-v1_Prio2_5312p1_130916_235241_8492 manually closed out - https://cmslogbook.cern.ch/elog/Workflow+processing/11158
    • linacre_Fall53_2011B_Photon_Run2011B-v1_Prio1_5312p1_131014_173330_3061 recovery 3/6
    • franzoni_Fall53_2011B_TauPlusX_Run2011B-v1_Prio2_5312p1_130916_235152_7420 recovery 1/2
    • franzoni_Fall53_2011A_TauPlusX_Run2011A-v1_Prio2_5312p1_130916_235135_3200 recovery 1/3
    • franzoni_Fall53_2011A_SingleElectron_Run2011A-v1_Prio1_5312p1_130916_235126_4483 recovery 6/7
    • linacre_Fall53_2011B_DoubleElectron_Run2011B-v1_Prio1_5312p1_131014_173259_942 recovery 1/6
    • franzoni_Fall53_2011B_MinimumBias_Run2011B-v1_Prio2_5312p1_130916_235110_4471 recovery 1/6
    • franzoni_Fall53_2011A_MuHad_Run2011A-v1_Prio2_5312p1_130916_235102_3244 recovery 3/3
    • linacre_Fall53_2011B_SingleMu_Run2011B-v1_Prio1_5312p1_131014_173340_3758 recovery 4/12
    • franzoni_Fall53_2011A_MuOnia_Run2011A-v1_Prio2_5312p1_130916_235034_9966 recovery 1/5
    • franzoni_Fall53_2011B_MET_Run2011B-v1_Prio2_5312p1_130916_234955_4564 recovery 1/1
    • franzoni_Fall53_2011B_MuHad_Run2011B-v1_Prio2_5312p1_130916_234948_412 recovery 1/3
    • linacre_Fall53_2011B_ElectronHad_Run2011B-v1_Prio2_5312p1_131014_173321_5822 recovery 3/3
    • franzoni_Fall53_2011A_MultiJet_Run2011A-v1_Prio2_5312p1_130916_234935_6377 recovery 2/3
    • franzoni_Fall53_2011B_SingleElectron_Run2011B-v1_Prio1_5312p1_130916_234926_3581 recovery 1/7
    • linacre_Fall53_2011B_SingleElectron_Run2011B-v1_Prio1_5312p1_131028_194633_1883 100% - backfill

AOB:

  • DBS3-Only test:
    • jbadillo_TestDBSOnly_131111_171701_593
  • WmAgentScripts code tidy-up:
    • Cleaning.
    • Documenting: will happen Wed at 4PM CERN time (9AM FNAL time) via Skype; SeangChan, Luis and Julian will attend.
    • What should go inside WMCore - ReqMgr?
    • What should be running as a cronjob?
    • Separate meeting.

-- JenniferAdelmanMcCarthy - 12 Nov 2013

Topic revision: r6 - 2013-11-12 - JenniferAdelmanMcCarthy