Week from 2009-04-20 to 2009-04-26

Job Statistics

(Follow the URLs, change the dates, save the new plot and add it as an attachment, then update the images)
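The date change can be scripted rather than edited by hand. In the plot URLs on this page, each string in the fragment appears as s<length>:<text>, and an ISO date is always 10 characters long, so swapping one date for another leaves the "s10:" length prefix valid. The sketch below is a minimal illustration under that assumption; the function name shift_plot_dates and the new dates in the example are hypothetical, and it only rewrites the link. It does not download or attach the plot.

```python
import re

DATE = r"\d{4}-\d{2}-\d{2}"


def shift_plot_dates(url: str, start: str, end: str) -> str:
    """Return url with the _startTime and _endTime values replaced."""
    for value in (start, end):
        if not re.fullmatch(DATE, value):
            raise ValueError(f"expected YYYY-MM-DD, got {value!r}")
    # Keys and values are both written as s<length>:<text>, so the key
    # "_startTime" is directly followed by "s10:" and the 10-character date.
    url = re.sub(rf"(_startTimes10:){DATE}", lambda m: m.group(1) + start, url)
    url = re.sub(rf"(_endTimes10:){DATE}", lambda m: m.group(1) + end, url)
    return url


# First plot URL from this page, split across lines for readability.
old = (
    "https://lhcbweb.pic.es/DIRAC/LHCb-Production/lhcb_prod/systems/accountingPlots/job"
    "#ds9:_plotNames17:TotalNumberOfJobss13:_timeSelectors2:-1"
    "s10:_startTimes10:2009-04-19s8:_endTimes10:2009-04-26"
    "s9:_typeNames3:Jobs9:_groupings16:FinalMajorStatuse"
)

# Hypothetical dates for the following week.
new = shift_plot_dates(old, "2009-04-26", "2009-05-03")
print(new)
```

The same call should work for the other plot URLs on this page, since they all carry the same _startTime and _endTime keys.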

  • Total number of Jobs by Final Major Status
Total_Number_of_Jobs_by_FinalMajorStatus.png
(https://lhcbweb.pic.es/DIRAC/LHCb-Production/lhcb_prod/systems/accountingPlots/job#ds9:_plotNames17:TotalNumberOfJobss13:_timeSelectors2:-1s10:_startTimes10:2009-04-19s8:_endTimes10:2009-04-26s9:_typeNames3:Jobs9:_groupings16:FinalMajorStatuse)
  • Daily number of Jobs by Final Major Status
Daily_Number_of_Jobs_by_FinalMajorStatus.png

(https://lhcbweb.pic.es/DIRAC/LHCb-Production/lhcb_prod/systems/accountingPlots/job#ds9:_plotNames12:NumberOfJobss13:_timeSelectors2:-1s10:_startTimes10:2009-04-19s8:_endTimes10:2009-04-26s9:_typeNames3:Jobs9:_groupings16:FinalMajorStatuse)

  • Done|Completed Jobs by User Group
Done+Complete_Jobs_by_UserGroup.png
(https://lhcbweb.pic.es/DIRAC/LHCb-Production/lhcb_prod/systems/accountingPlots/job#ds9:_plotNames17:TotalNumberOfJobss13:_timeSelectors2:-1s10:_startTimes10:2009-04-19s8:_endTimes10:2009-04-26s17:_FinalMajorStatuss14:Completed,Dones9:_typeNames3:Jobs9:_groupings9:UserGroupe)

  • Done|Completed Production Jobs by JobType
Done+Complete_Production_Jobs_by_JobType.png
(https://lhcbweb.pic.es/DIRAC/LHCb-Production/lhcb_prod/systems/accountingPlots/job#ds9:_plotNames17:TotalNumberOfJobss13:_timeSelectors2:-1s10:_startTimes10:2009-04-19s8:_endTimes10:2009-04-26s10:_UserGroups9:lhcb_prods17:_FinalMajorStatuss14:Completed,Dones9:_typeNames3:Jobs9:_groupings7:JobTypee)

  • Failed Jobs by User Group
Failed_Jobs_by_UserGroup.png
(https://lhcbweb.pic.es/DIRAC/LHCb-Production/lhcb_prod/systems/accountingPlots/job#ds9:_plotNames17:TotalNumberOfJobss13:_timeSelectors2:-1s10:_startTimes10:2009-04-19s8:_endTimes10:2009-04-26s17:_FinalMajorStatuss6:Faileds9:_typeNames3:Jobs9:_groupings9:UserGroupe)

  • Failed Production Jobs by Minor Status
Failed_Production_Jobs_by_MinorStatus.png
(https://lhcbweb.pic.es/DIRAC/LHCb-Production/lhcb_prod/systems/accountingPlots/job#ds9:_plotNames17:TotalNumberOfJobss13:_timeSelectors2:-1s10:_startTimes10:2009-04-19s8:_endTimes10:2009-04-26s10:_UserGroups9:lhcb_prods17:_FinalMajorStatuss6:Faileds9:_typeNames3:Jobs9:_groupings16:FinalMinorStatuse)

  • Failed User Jobs by Minor Status
Failed_User_Jobs_by_MinorStatus.png
(https://lhcbweb.pic.es/DIRAC/LHCb-Production/lhcb_prod/systems/accountingPlots/job#ds9:_plotNames17:TotalNumberOfJobss13:_timeSelectors2:-1s10:_startTimes10:2009-04-19s8:_endTimes10:2009-04-26s10:_UserGroups9:lhcb_users17:_FinalMajorStatuss6:Faileds9:_typeNames3:Jobs9:_groupings16:FinalMinorStatuse)

  • Summary:
    • Almost 100 K jobs run last week
    • Over 10% failed
    • Daily peak of over 25 K jobs
    • 60 K Production jobs run to the end
    • 24 K User jobs run to the end
    • > 13 K Failed Jobs
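As a rough consistency check on the summary figures, the arithmetic can be written out. This is only a sketch: the inputs are the rounded values quoted above, "> 13 K" is treated as exactly 13 K, and the three categories are assumed to account for all jobs, which the plots may not confirm.

```python
# Rounded figures taken from the summary above.
done_production = 60_000  # Production jobs run to the end
done_user = 24_000        # User jobs run to the end
failed = 13_000           # lower bound quoted as "> 13 K"

total = done_production + done_user + failed
failed_fraction = failed / total

print(total)                           # 97000
print(round(100 * failed_fraction, 1)) # 13.4
```

Under these assumptions the total is 97 K, in line with "almost 100 K", and the failure rate is about 13%, in line with "over 10%".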

Running at Tier1s

  • Done|Completed Production Jobs by Site
Done+Complete_Production_Jobs_at_Tier1_by_Site.png
(https://lhcbweb.pic.es/DIRAC/LHCb-Production/lhcb_prod/systems/accountingPlots/job#ds9:_plotNames17:TotalNumberOfJobss13:_timeSelectors2:-1s10:_startTimes10:2009-04-19s8:_endTimes10:2009-04-26s10:_UserGroups9:lhcb_prods5:_Sites86:LCG.CERN.ch,LCG.CNAF.it,LCG.GRIDKA.de,LCG.IN2P3.fr,LCG.NIKHEF.nl,LCG.PIC.es,LCG.RAL.uks17:_FinalMajorStatuss14:Done,Completeds9:_typeNames3:Jobs9:_groupings4:Sitee)

  • Done|Completed User Jobs by Site
Done+Complete_User_Jobs_at_Tier1_by_Site.png
(https://lhcbweb.pic.es/DIRAC/LHCb-Production/lhcb_prod/systems/accountingPlots/job#ds9:_plotNames17:TotalNumberOfJobss13:_timeSelectors2:-1s10:_startTimes10:2009-04-19s8:_endTimes10:2009-04-26s10:_UserGroups9:lhcb_users5:_Sites86:LCG.CERN.ch,LCG.CNAF.it,LCG.GRIDKA.de,LCG.IN2P3.fr,LCG.NIKHEF.nl,LCG.PIC.es,LCG.RAL.uks17:_FinalMajorStatuss14:Done,Completeds9:_typeNames3:Jobs9:_groupings4:Sitee)

  • Summary:
    • 24 K Production Jobs at Tier1s
      • Shares?
    • 15 K User Jobs at Tier1s
      • 50 % CERN Share
      • Very small share for NIKHEF

Job Failure Analysis

(Change Error and User Group as Appropriate)

  • Failed Production Jobs (Application Finished With Error) by Site
Failed_Production_Jobs_Application_Finished_With_Errors_by_Site.png
(https://lhcbweb.pic.es/DIRAC/LHCb-Production/lhcb_prod/systems/accountingPlots/job#ds9:_plotNames17:TotalNumberOfJobss13:_timeSelectors2:-1s10:_startTimes10:2009-04-19s8:_endTimes10:2009-04-26s10:_UserGroups9:lhcb_prods17:_FinalMajorStatuss6:Faileds17:_FinalMinorStatuss32:Application%20Finished%20With%20Errorss9:_typeNames3:Jobs9:_groupings4:Sitee)

  • Failed Production Jobs (Input Sandbox Download) by Site
Failed_Production_Jobs_Input_Sandbox_Download_by_Site.png
(https://lhcbweb.pic.es/DIRAC/LHCb-Production/lhcb_prod/systems/accountingPlots/job#ds9:_plotNames17:TotalNumberOfJobss13:_timeSelectors2:-1s10:_startTimes10:2009-04-19s8:_endTimes10:2009-04-26s10:_UserGroups9:lhcb_prods17:_FinalMajorStatuss6:Faileds17:_FinalMinorStatuss22:Input%20Sandbox%20Downloads9:_typeNames3:Jobs9:_groupings4:Sitee)

  • Failed User Jobs (Input Data Resolution) by Site
Failed_Users_Jobs_Input_Data_Resolution_by_Site.png
(https://lhcbweb.pic.es/DIRAC/LHCb-Production/lhcb_prod/systems/accountingPlots/job#ds9:_plotNames17:TotalNumberOfJobss13:_timeSelectors2:-1s10:_startTimes10:2009-04-19s8:_endTimes10:2009-04-26s10:_UserGroups9:lhcb_users17:_FinalMajorStatuss6:Faileds17:_FinalMinorStatuss21:Input%20Data%20Resolutions9:_typeNames3:Jobs9:_groupings4:Sitee)

  • Failed User Jobs (Application Finished With Error) by Site
Failed_Users_Jobs_Application_Finished_With_Errors_by_Site.png
(https://lhcbweb.pic.es/DIRAC/LHCb-Production/lhcb_prod/systems/accountingPlots/job#ds9:_plotNames17:TotalNumberOfJobss13:_timeSelectors2:-1s10:_startTimes10:2009-04-19s8:_endTimes10:2009-04-26s10:_UserGroups9:lhcb_users17:_FinalMajorStatuss6:Faileds17:_FinalMinorStatuss32:Application%20Finished%20With%20Errorss9:_typeNames3:Jobs9:_groupings4:Sitee)

  • Summary:
    • Production Jobs Failed mostly due to:
      • Application Finished with Error everywhere (5K)
      • Input Sandbox Download mostly at GRIDKA (3K)
    • User Jobs Failed mostly due to:
      • Input Data Resolution mostly at GRIDKA (3K)
      • Application Finished with Error (1.5K)
    • 50% of Failed jobs at GRIDKA

Hardware Status

  • WMS volhcb09:
    • Huge IOWait 23, 24, 25 (source identified, LoggingAgent making a huge MySQL query)
    • swap usage close to limit (2GB).
(https://lemonweb.cern.ch/lemon-web/info.php?time=1&offset=0&entity=volhcb09&detailed=yes)
volhcb09_1_0_CPUUTILPERCUSER_CPUUTILPERCSYSTEM_CPUUTILPERCNICE_CPUUTILPERCIDLE_CPUUTILPERCIOWAIT_CPUUTILPERCIRQ_CPUUTILPERCSOFTIRQSTACKEDC_1.gif.png
volhcb09_1_0_NUMKBREADAVG_NUMKBWRITEAVGOVERLAYN_1.gif.png
volhcb09_1_0_PARTITIONUSEDPERC_STACKEDP_1.gif.png
volhcb09_1_0_SWAP_SPACE_USED_STACKEDS_1.gif.png
[root@volhcb09 ~]# df -h /opt/dirac
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda6             100G   87G  7.8G  92% /home

  • DMS volhcb10:
    • swap usage close to limit (2GB).
(https://lemonweb.cern.ch/lemon-web/info.php?time=1&offset=0&entity=volhcb10&detailed=yes)
volhcb10_1_0_CPUUTILPERCUSER_CPUUTILPERCSYSTEM_CPUUTILPERCNICE_CPUUTILPERCIDLE_CPUUTILPERCIOWAIT_CPUUTILPERCIRQ_CPUUTILPERCSOFTIRQSTACKEDC_1.gif.png
volhcb10_1_0_NUMKBREADAVG_NUMKBWRITEAVGOVERLAYN_1.gif.png
volhcb10_1_0_PARTITIONUSEDPERC_STACKEDP_1.gif.png
volhcb10_1_0_SWAP_SPACE_USED_STACKEDS_1.gif.png
[root@volhcb10 ~]# df -h /opt/dirac
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda6             100G   60G   35G  64% /home
  • LogSE volhcb06:
    • Must watch disk usage more closely, probably need additional disk.
    • Daily IOWait peaks, probably due to full disk scan by updatedb cron (to be removed).
(https://lemonweb.cern.ch/lemon-web/info.php?time=1&offset=0&entity=volhcb06&detailed=yes)
volhcb06_1_0_CPUUTILPERCUSER_CPUUTILPERCSYSTEM_CPUUTILPERCNICE_CPUUTILPERCIDLE_CPUUTILPERCIOWAIT_CPUUTILPERCIRQ_CPUUTILPERCSOFTIRQSTACKEDC_1.gif.png
volhcb06_1_0_NUMKBREADAVG_NUMKBWRITEAVGOVERLAYN_1.gif.png
volhcb06_1_0_PARTITIONUSEDPERC_STACKEDP_1.gif.png
volhcb06_1_0_SWAP_SPACE_USED_STACKEDS_1.gif.png
[root@volhcb06 ~]# df -h /opt/dirac/ /storage/
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda11             92G  3.0G   84G   4% /opt
/dev/sdb1             917G  790G  118G  88% /storage
  • Various volhcb01:
    • What are the peaks in the CPU usage? An agent?

(https://lemonweb.cern.ch/lemon-web/info.php?time=1&offset=0&entity=volhcb01&detailed=yes)
volhcb01_1_0_CPUUTILPERCUSER_CPUUTILPERCSYSTEM_CPUUTILPERCNICE_CPUUTILPERCIDLE_CPUUTILPERCIOWAIT_CPUUTILPERCIRQ_CPUUTILPERCSOFTIRQSTACKEDC_1.gif.png
volhcb01_1_0_NUMKBREADAVG_NUMKBWRITEAVGOVERLAYN_1.gif.png
volhcb01_1_0_PARTITIONUSEDPERC_STACKEDP_1.gif.png
volhcb01_1_0_SWAP_SPACE_USED_STACKEDS_1.gif.png
[root@volhcb01 ~]# df -h /opt/dirac /storage
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda9             129G   67G   56G  55% /opt
/dev/sdb1             917G  169G  702G  20% /storage
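Since disk usage has to be watched more closely, the df output shown above could be checked automatically. The sketch below parses `df -h` text and lists partitions at or above a usage threshold. The function names and the 85% threshold are assumptions chosen for illustration, not part of any existing DIRAC or Lemon tooling, and the sample text is the volhcb06 output from this page.

```python
def parse_df(output: str):
    """Parse `df -h` text into (filesystem, use_percent, mount_point) tuples."""
    rows = []
    for line in output.strip().splitlines():
        fields = line.split()
        # The header splits into 7 fields ("Mounted on" is two words),
        # so only 6-column rows with a percentage are treated as data.
        if len(fields) != 6 or not fields[4].endswith("%"):
            continue
        rows.append((fields[0], int(fields[4].rstrip("%")), fields[5]))
    return rows


def over_threshold(output: str, threshold: int = 85):
    """Return the parsed rows whose usage is at or above threshold percent."""
    return [row for row in parse_df(output) if row[1] >= threshold]


# df output for volhcb06, copied from the Hardware Status section.
SAMPLE = """\
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda11             92G  3.0G   84G   4% /opt
/dev/sdb1             917G  790G  118G  88% /storage
"""

for filesystem, used, mount in over_threshold(SAMPLE):
    print(f"{mount} ({filesystem}) is {used}% full")
```

With the assumed 85% threshold this flags only /storage on volhcb06; the same check applied to the volhcb09 output above would flag the 92% partition.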
-- RicardoGraciani - 27 Apr 2009

-- PaulSzczypka - 03 May 2009
