Week from 2009-08-31 to 2009-09-07

Job Statistics

  • Summary:
    • Almost 260 K jobs run last week
    • Over 15% failed
    • Daily peak of over 2 K jobs
    • 206 K Production jobs ran to the end
    • 13 K User jobs ran to the end
    • 31 K Production Jobs Failed
    • 9 K User Jobs Failed
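
The headline figures can be cross-checked from the four rounded counts above. The following is a minimal sketch, assuming that "run to the end" and "Failed" together account for all finished jobs; the variable and function names are illustrative and not part of any DIRAC tool.

```python
# Rounded counts (in thousands of jobs) copied from the summary above.
# Assumption: "ran to the end" plus "failed" covers every finished job.
done_k = {"production": 206, "user": 13}
failed_k = {"production": 31, "user": 9}


def failure_rate(done, failed):
    """Return the fraction of finished jobs that failed."""
    return failed / (done + failed)


total_done = sum(done_k.values())
total_failed = sum(failed_k.values())
overall = failure_rate(total_done, total_failed)
per_group = {g: failure_rate(done_k[g], failed_k[g]) for g in done_k}

print(f"total: {total_done + total_failed} K jobs, {overall:.1%} failed")
for group, rate in per_group.items():
    print(f"{group}: {rate:.1%} failed")
```

Because the inputs are rounded to the nearest thousand, the resulting percentages are only approximate.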

  • Total number of Jobs by Final Major Status
Total_Number_of_Jobs_by_FinalMajorStatus.png

(https://lhcbweb.pic.es/DIRAC/LHCb-Production/lhcb_prod/systems/accountingPlots/job#ds9:_plotNames17:TotalNumberOfJobss13:_timeSelectors5:86400s9:_typeNames3:Jobs9:_groupings16:FinalMajorStatuse )

  • Daily number of Jobs by Final Major Status
Daily_Number_of_Jobs_by_FinalMajorStatus.png

(https://lhcbweb.pic.es/DIRAC/LHCb-Production/lhcb_prod/systems/accountingPlots/job#ds9:_plotNames12:NumberOfJobss13:_timeSelectors5:86400s9:_typeNames3:Jobs9:_groupings16:FinalMajorStatuse )

  • Done|Completed Jobs by User Group
DoneComplete_Jobs_by_UserGroup.png

(https://lhcbweb.pic.es/DIRAC/LHCb-Production/lhcb_prod/systems/accountingPlots/job#ds9:_plotNames17:TotalNumberOfJobss13:_timeSelectors5:86400s17:_FinalMajorStatuss14:Completed,Dones9:_typeNames3:Jobs9:_groupings9:UserGroupe )

  • Done|Completed Production Jobs by Job Type
DoneComplete_Production_Jobs_by_JobType.png

(https://lhcbweb.pic.es/DIRAC/LHCb-Production/lhcb_prod/systems/accountingPlots/job#ds9:_plotNames17:TotalNumberOfJobss13:_timeSelectors5:86400s10:_UserGroups9:lhcb_prods17:_FinalMajorStatuss14:Completed,Dones9:_typeNames3:Jobs9:_groupings7:JobTypee )

  • Failed Jobs by User Group
Failed_Jobs_by_UserGroup.png

(https://lhcbweb.pic.es/DIRAC/LHCb-Production/lhcb_prod/systems/accountingPlots/job#ds9:_plotNames17:TotalNumberOfJobss13:_timeSelectors5:86400s17:_FinalMajorStatuss6:Faileds9:_typeNames3:Jobs9:_groupings9:UserGroupe )

  • Failed Production Jobs by Minor Status
Failed_Production_Jobs_by_MinorStatus.png

(https://lhcbweb.pic.es/DIRAC/LHCb-Production/lhcb_prod/systems/accountingPlots/job#ds9:_plotNames17:TotalNumberOfJobss13:_timeSelectors5:86400s10:_UserGroups9:lhcb_prods17:_FinalMajorStatuss6:Faileds9:_typeNames3:Jobs9:_groupings16:FinalMinorStatuse )

  • Failed User Jobs by Minor Status
Failed_User_Jobs_by_MinorStatus.png

(https://lhcbweb.pic.es/DIRAC/LHCb-Production/lhcb_prod/systems/accountingPlots/job#ds9:_plotNames17:TotalNumberOfJobss13:_timeSelectors5:86400s10:_UserGroups9:lhcb_users17:_FinalMajorStatuss6:Faileds9:_typeNames3:Jobs9:_groupings16:FinalMinorStatuse )
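
The part of each plot link after `#` appears to be a length-prefixed, bencode-like encoding: `d` ... `e` wraps a dictionary and each `s<length>:<text>` is one string, read alternately as key and value. This reading is inferred from the links above rather than taken from DIRAC documentation, and `decode_fragment` is a hypothetical helper that handles only this string-only case.

```python
def decode_fragment(fragment):
    """Decode a 'd s<len>:<text> ... e' fragment into a dict of strings.

    Assumption: only string entries occur, as in the links on this page.
    """
    if not (fragment.startswith("d") and fragment.endswith("e")):
        raise ValueError("expected a fragment of the form d...e")
    pos, end = 1, len(fragment) - 1
    items = []
    while pos < end:
        if fragment[pos] != "s":
            raise ValueError(f"unsupported type marker at offset {pos}")
        colon = fragment.index(":", pos)
        length = int(fragment[pos + 1:colon])
        start = colon + 1
        items.append(fragment[start:start + length])
        pos = start + length
    if len(items) % 2:
        raise ValueError("odd number of strings; keys and values must pair up")
    # Even positions are keys, odd positions are the matching values.
    return dict(zip(items[0::2], items[1::2]))


# Fragment of the "Total number of Jobs by Final Major Status" link,
# split into its individual strings for readability.
fragment = (
    "d"
    "s9:_plotName" "s17:TotalNumberOfJobs"
    "s13:_timeSelector" "s5:86400"
    "s9:_typeName" "s3:Job"
    "s9:_grouping" "s16:FinalMajorStatus"
    "e"
)
selection = decode_fragment(fragment)
for key, value in selection.items():
    print(key, "=", value)
```

Read this way, the links differ only in the plot name, the grouping, and the optional `_UserGroup`, `_Site` and `_FinalMajorStatus` filters.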

Running at Tier1s

  • Summary:
    • 62 K Production Jobs at Tier1s
      • 21 % CERN share
      • 20 % CNAF share
      • 0 % GRIDKA share
      • 15 % IN2P3 share
      • 28 % NIKHEF share
      • 8 % PIC share
      • 9 % RAL share
    • 13 K User Jobs at Tier1s
      • 34 % CERN share
      • 17 % CNAF share
      • 0 % GRIDKA share
      • 19 % IN2P3 share
      • 15 % NIKHEF share
      • 5 % PIC share
      • 10 % RAL share
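
The percentage shares can be turned back into approximate job counts per site. This is a rough sketch using only the rounded totals and shares listed above; the helper name `approx_jobs_k` is illustrative. Note that the production shares add up to 101 % because each value is rounded individually.

```python
# Totals (thousands of jobs) and shares (percent) copied from the summary above.
production_total_k = 62
production_share_pct = {
    "CERN": 21, "CNAF": 20, "GRIDKA": 0, "IN2P3": 15,
    "NIKHEF": 28, "PIC": 8, "RAL": 9,
}
user_total_k = 13
user_share_pct = {
    "CERN": 34, "CNAF": 17, "GRIDKA": 0, "IN2P3": 19,
    "NIKHEF": 15, "PIC": 5, "RAL": 10,
}


def approx_jobs_k(total_k, share_pct):
    """Approximate jobs per site, in thousands, from a total and percent shares."""
    return {site: round(total_k * pct / 100, 1) for site, pct in share_pct.items()}


production_jobs_k = approx_jobs_k(production_total_k, production_share_pct)
user_jobs_k = approx_jobs_k(user_total_k, user_share_pct)

for site in production_share_pct:
    print(f"{site}: production ~{production_jobs_k[site]} K, user ~{user_jobs_k[site]} K")
```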

  • Done|Completed Production Jobs by Site
DoneComplete_Production_Jobs_at_Tier1_by_Site.png

(https://lhcbweb.pic.es/DIRAC/LHCb-Production/lhcb_prod/systems/accountingPlots/job#ds9:_plotNames17:TotalNumberOfJobss13:_timeSelectors5:86400s10:_UserGroups9:lhcb_prods5:_Sites86:LCG.CERN.ch,LCG.CNAF.it,LCG.GRIDKA.de,LCG.IN2P3.fr,LCG.NIKHEF.nl,LCG.PIC.es,LCG.RAL.uks17:_FinalMajorStatuss14:Done,Completeds9:_typeNames3:Jobs9:_groupings4:Sitee )

  • Done|Completed User Jobs by Site
DoneComplete_User_Jobs_at_Tier1_by_Site.png

(https://lhcbweb.pic.es/DIRAC/LHCb-Production/lhcb_prod/systems/accountingPlots/job#ds9:_plotNames17:TotalNumberOfJobss13:_timeSelectors5:86400s10:_UserGroups9:lhcb_users5:_Sites86:LCG.CERN.ch,LCG.CNAF.it,LCG.GRIDKA.de,LCG.IN2P3.fr,LCG.NIKHEF.nl,LCG.PIC.es,LCG.RAL.uks17:_FinalMajorStatuss14:Done,Completeds9:_typeNames3:Jobs9:_groupings4:Sitee )

Job Failure Analysis

  • Summary:
    • Production Jobs Failed mostly due to:
      • Application Finished With Errors everywhere (29.00 K)
      • Exception During Execution mostly at LCG.GLASGOW.uk (0.78 K of 0.87 K)
      • Watchdog identified this job as stalled everywhere (0.76 K)
      • Pending Requests everywhere (0.35 K)
    • User Jobs Failed mostly due to:
      • Input Data Resolution everywhere (3.85 K)
      • Application Finished With Errors everywhere (3.27 K)
      • Watchdog identified this job as stalled mostly at LCG.PIC.es (0.41 K of 1.01 K)
      • Uploading Job Outputs mostly at LCG.CERN.ch (0.53 K of 0.65 K)
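
For the failure causes attributed to a single site, the fraction falling at that site follows from the two numbers in parentheses. This is a small sketch, assuming each pair means failures at the named site and failures of that type at all sites; the dictionary layout is illustrative only.

```python
# (failures at the named site, failures at all sites), in thousands of jobs,
# copied from the summary above.
concentrated = {
    ("production", "Exception During Execution", "LCG.GLASGOW.uk"): (0.78, 0.87),
    ("user", "Watchdog identified this job as stalled", "LCG.PIC.es"): (0.41, 1.01),
    ("user", "Uploading Job Outputs", "LCG.CERN.ch"): (0.53, 0.65),
}

# Fraction of each failure type that occurred at the named site.
site_fraction = {
    key: at_site / total for key, (at_site, total) in concentrated.items()
}

for (group, status, site), fraction in site_fraction.items():
    print(f"{group}: {status} at {site}: {fraction:.0%} of failures")
```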

  • Failed Production Jobs (Application Finished With Errors) by Site
Failed_Production_Jobs_Application_Finished_With_Errors_by_Site.png
  • Failed Production Jobs (Exception During Execution) by Site
Failed_Production_Jobs_Exception_During_Execution_by_Site.png
  • Failed Production Jobs (Watchdog identified this job as stalled) by Site
Failed_Production_Jobs_Watchdog_identified_this_job_as_stalled_by_Site.png
  • Failed Production Jobs (Pending Requests) by Site
Failed_Production_Jobs_Pending_Requests_by_Site.png
  • Failed User Jobs (Input Data Resolution) by Site
Failed_User_Jobs_Input_Data_Resolution_by_Site.png
  • Failed User Jobs (Application Finished With Errors) by Site
Failed_User_Jobs_Application_Finished_With_Errors_by_Site.png
  • Failed User Jobs (Watchdog identified this job as stalled) by Site
Failed_User_Jobs_Watchdog_identified_this_job_as_stalled_by_Site.png
  • Failed User Jobs (Uploading Job Outputs) by Site
Failed_User_Jobs_Uploading_Job_Outputs_by_Site.png

  • Failed Jobs at CERN by Minor Status
Failed_Jobs_at_CERN_by_MinorStatus.png
  • Failed Jobs at CNAF by Minor Status
Failed_Jobs_at_CNAF_by_MinorStatus.png
  • Failed Jobs at IN2P3 by Minor Status
Failed_Jobs_at_IN2P3_by_MinorStatus.png
  • Failed Jobs at NIKHEF by Minor Status
Failed_Jobs_at_NIKHEF_by_MinorStatus.png
  • Failed Jobs at PIC by Minor Status
Failed_Jobs_at_PIC_by_MinorStatus.png
  • Failed Jobs at RAL by Minor Status
Failed_Jobs_at_RAL_by_MinorStatus.png

Hardware Status

  • Various volhcb09:
    • CPU utilization: Idle > 60%.
    • Network utilization: less than 500 K.
    • Swap Used: less than 500 M.
    • Partition Used: Stable at 50 G.

(https://lemonweb.cern.ch/lemon-web/info.php?time=1&offset=1d&entity=volhcb09&detailed=yes )

volhcb09_1_-86400_CPUUTILPERCUSER_CPUUTILPERCSYSTEM_CPUUTILPERCNICE_CPUUTILPERCIDLE_CPUUTILPERCIOWAIT_CPUUTILPERCIRQ_CPUUTILPERCSOFTIRQSTACKEDC_1.gif.png volhcb09_1_-86400_NUMKBREADAVG_NUMKBWRITEAVGOVERLAYN_1.gif.png
volhcb09_1_-86400_PARTITIONUSEDPERC_STACKEDP_1.gif.png volhcb09_1_-86400_SWAP_SPACE_USED_STACKEDS_1.gif.png

  • DMS volhcb10:
    • CPU utilization: Idle < 20%.
    • Network utilization: Mostly less than 200 K.
    • Swap Used: about 200 K.
    • Partition Used: Stable at 92 G.

(https://lemonweb.cern.ch/lemon-web/info.php?time=1&offset=1d&entity=volhcb10&detailed=yes )

volhcb10_1_-86400_CPUUTILPERCUSER_CPUUTILPERCSYSTEM_CPUUTILPERCNICE_CPUUTILPERCIDLE_CPUUTILPERCIOWAIT_CPUUTILPERCIRQ_CPUUTILPERCSOFTIRQSTACKEDC_1.gif.png volhcb10_1_-86400_NUMKBREADAVG_NUMKBWRITEAVGOVERLAYN_1.gif.png
volhcb10_1_-86400_PARTITIONUSEDPERC_STACKEDP_1.gif.png volhcb10_1_-86400_SWAP_SPACE_USED_STACKEDS_1.gif.png

  • LogSE volhcb06:
    • CPU utilization: Idle < 40%.
    • Network utilization: Peak is 10.6 M.
    • Swap Used: about 200 K.
    • Partition Used: Almost stable.

(https://lemonweb.cern.ch/lemon-web/info.php?time=1&offset=1d&entity=volhcb06&detailed=yes )

volhcb06_1_-86400_CPUUTILPERCUSER_CPUUTILPERCSYSTEM_CPUUTILPERCNICE_CPUUTILPERCIDLE_CPUUTILPERCIOWAIT_CPUUTILPERCIRQ_CPUUTILPERCSOFTIRQSTACKEDC_1.gif.png volhcb06_1_-86400_NUMKBREADAVG_NUMKBWRITEAVGOVERLAYN_1.gif.png
volhcb06_1_-86400_PARTITIONUSEDPERC_STACKEDP_1.gif.png volhcb06_1_-86400_SWAP_SPACE_USED_STACKEDS_1.gif.png

  • WMS volhcb13:
    • CPU utilization: Idle > 60%.
    • Network utilization: less than 1 M.
    • Swap Used: about 200 K.
    • Partition Used: Almost stable.

(https://lemonweb.cern.ch/lemon-web/info.php?time=1&offset=1d&entity=volhcb13&detailed=yes )

volhcb13_1_-86400_CPUUTILPERCUSER_CPUUTILPERCSYSTEM_CPUUTILPERCNICE_CPUUTILPERCIDLE_CPUUTILPERCIOWAIT_CPUUTILPERCIRQ_CPUUTILPERCSOFTIRQSTACKEDC_1.gif.png volhcb13_1_-86400_NUMKBREADAVG_NUMKBWRITEAVGOVERLAYN_1.gif.png
volhcb13_1_-86400_PARTITIONUSEDPERC_STACKEDP_1.gif.png volhcb13_1_-86400_SWAP_SPACE_USED_STACKEDS_1.gif.png

-- JiboHE - 2009-09-07

Topic revision: r1 - 2009-09-07 - He
 