https://indico.cern.ch/conferenceDisplay.py?confId=254690

Attending

Alan, Andrew, Adli, Julian
Jen, Luis, Dave, Seangchan

Shift

Jan 21 - Jan 27: Xavier
Jan 28 - Feb 3 (next shift): Adli

Agent Issues

  • Threshold setup for T1
    • Lower thresholds for merge jobs.
    • Higher thresholds for FNAL.
    • New script:
      • It will run as a cron job.
      • It will pull threshold information from the site status board.
      • Too many merge jobs at KIT: Luis set up a limit of
        200 max running merge jobs on the same T1.
        Keep an eye on it, because it was probably a backlog.
  • IN2P3 disk/tape separated site 12281:
    • Should we switch configuration inside the agents?
    • Then, what do we do about the unfinished workflows?
      • See if there is damage.
      • Kill and rerun workflows if they are badly damaged.
      • Recover if workflows were small.
  • Few reprocessing jobs running at FNAL at the start of the week. (solved)
  • PhedexInjector crashing 12242
    • Patch the rest of the agents - Julian will apply the patch.
  • Couch disk space issue on vocms235 12156
  • Workflows stuck in acquired: a few workflows with problems, most probably related to FNAL being at top capacity.
  • Aborted workflows cleanup: when a workflow is moved to rejected, it loses the workload summary.
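A minimal sketch of the merge-threshold cap discussed under "Threshold setup for T1". It is not the actual script: the function names, the 10% merge share, and the plain dictionary standing in for data pulled from the site status board are all assumptions made for illustration. Only the cap of 200 running merge jobs per T1 comes from the notes above.

```python
# Hypothetical sketch, not the real threshold script.
# Only MAX_MERGE_PER_T1 = 200 comes from the minutes; everything else is assumed.

MAX_MERGE_PER_T1 = 200  # max running merge jobs on the same T1 (from the minutes)


def merge_threshold(site, running_slots, merge_percent=10, cap=MAX_MERGE_PER_T1):
    """Return a merge-job threshold for a site.

    merge_percent is an assumed share of the site's slots given to merge jobs.
    The cap is applied to T1 sites only.
    """
    threshold = running_slots * merge_percent // 100
    if site.startswith("T1_"):
        threshold = min(threshold, cap)
    return threshold


def build_thresholds(site_slots):
    """Build per-site thresholds from a {site: slots} mapping.

    site_slots stands in for whatever the site status board returns;
    its real format is not described in the minutes.
    """
    return {
        site: {"Processing": slots, "Merge": merge_threshold(site, slots)}
        for site, slots in site_slots.items()
    }


if __name__ == "__main__":
    # Slot counts below are made-up example values.
    example = {"T1_DE_KIT": 5000, "T1_ES_PIC": 1000}
    for site, limits in build_thresholds(example).items():
        print(site, limits)
```

Run from cron, a script of this shape would recompute the limits on each pass, so a temporary backlog of merge jobs could not exceed the cap.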

Site Issues that affected workflows

  • FNAL stageout error, still issues 12211
  • PIC Storage Element name error 12077
  • Can we use T2_TH_CUNSTDA and T2_IN_TIFR now? SR: in the waiting room for a long time. Wait some more time.
  • Pledges view
    • Pledge [cores] - Prod Pledge [cores] - Real [cores]
    • Reprocessing Thresholds
    • Merge Thresholds (different for MC & Reprocessing?)

Workflows Issues

Rereco/Redigi

  • KIT Summer12DR re-do's
  • Fall11R1 requests were assigned... and failed.
  • A few Redigi still running with work pending.
  • We currently have 57 Redigi WFs in complete, waiting for action:
    • IN2P3 - Disk/tape separation issues: 14 WFs, 12274 (https://cmslogbook.cern.ch/elog/Workflow+processing/12274), are complete with no errors but not at 100%. What do we do? Rerun?
    • PIC - HIG-Fall11R1-015*: 8 WFs, 12286, open block issues; ACDC didn't work.
    • BTV WFs (various sites): 23 WFs, 12200, xrootd and raw file issues.
    • WFs with more than 100% done: 6 - Julian is looking at these now.
    • pdmvserv_HIG-Fall11R2-01424_T1_ES_PIC_MSS_00019_v0__140120_140109_7919, 12292 (https://cmslogbook.cern.ch/elog/Workflow+processing/12292): no errors but only 46% done.
    • alahiff_EXO-Fall13dr-00155_T1_US_FNAL_MSS_00027_v0__140117_104822_1632, 12293: no errors, only 94% done.
    • pdmvserv_BPH-Fall13dr-00003_T1_DE_KIT_MSS_00049_v1_tsg_131219_182944_2299 - ACDC running.
    • pdmvserv_BPH-Fall13dr-00011_T1_DE_KIT_MSS_00051_v1_tsg_131219_183626_9169 - no errors, 90% done, but at KIT; do we just rerun?
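The cases listed above (more than 100% done, complete with no errors but under 100%) can be sorted mechanically before anyone decides whether to rerun. The following is only a sketch: the `triage` function and its category labels are hypothetical, and the completion figures are the ones quoted in these notes, not values fetched from any monitoring service.

```python
# Hypothetical triage helper; category labels are illustrative, not an agreed policy.

def triage(percent_done, has_errors):
    """Put a completed workflow into a follow-up bucket."""
    if percent_done > 100:
        return "over-100-percent"        # more output than expected, needs a look
    if has_errors:
        return "has-errors"              # inspect the errors first
    if percent_done < 100:
        return "incomplete-no-errors"    # candidate for ACDC or rerun
    return "ok"


if __name__ == "__main__":
    # (workflow name, percent done, errors seen) as quoted in the minutes.
    workflows = [
        ("pdmvserv_HIG-Fall11R2-01424_T1_ES_PIC_MSS_00019_v0__140120_140109_7919", 46, False),
        ("alahiff_EXO-Fall13dr-00155_T1_US_FNAL_MSS_00027_v0__140117_104822_1632", 94, False),
    ]
    for name, percent, errors in workflows:
        print(name, "->", triage(percent, errors))
```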


Monte-Carlo

RelVal (Andrew's question)

DBS Migration Plan

Topic revision: r13 - 2014-01-23 - JulianBadillo
 