Template for Weekly Report

Week from 08/06/2009 to 15/06/2009

Job Statistics

  • Summary:
    • 171K jobs were run last week
    • 8.3% failed
    • 148.5K Production jobs ran to completion
    • 8.5K User jobs ran to completion
    • 12.3K Production jobs failed
    • 1.7K User jobs failed
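As a quick consistency check on the summary, the four job counts can be added up and the failure rate recomputed. The minimal Python sketch below does this using only the rounded figures quoted above (in thousands of jobs). Because those inputs are rounded to 0.1K, the recomputed rate (about 8.2%) can differ slightly from the quoted 8.3%, which was presumably derived from the exact counts.

```python
# Summary figures from the report, in thousands of jobs (already rounded to 0.1K).
done = {"production": 148.5, "user": 8.5}
failed = {"production": 12.3, "user": 1.7}

total_done = sum(done.values())
total_failed = sum(failed.values())
total = total_done + total_failed

# Failure rate as a percentage of all jobs that reached a final status.
failure_rate = 100.0 * total_failed / total

print(f"total jobs:   {total:.1f}K")
print(f"failed jobs:  {total_failed:.1f}K")
print(f"failure rate: {failure_rate:.1f}%")
```

The done and failed counts add up to the quoted 171K total, so the summary lines are mutually consistent within rounding.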

  • Total number of Jobs by Final Major Status
Total_Number_of_Jobs_by_FinalMajorStatus.png

  • Daily number of Jobs by Final Major Status
Daily_Number_of_Jobs_by_FinalMajorStatus.png

  • Done|Completed Jobs by User Group
Done+Complete_Jobs_by_UserGroup.png

  • Done|Completed Production Jobs by Job Type
Done+Complete_Production_Jobs_by_JobType.png

  • Failed Jobs by User Group
Failed_Jobs_by_UserGroup.png

  • Failed Production Jobs by Minor Status
Failed_Production_Jobs_by_MinorStatus.png
Comment: The BK-LFC mismatch failures in the WMS occurred because this week's FEST data was registered incorrectly in the BK/LFC by the RunDB.

  • Failed User Jobs by Minor Status
Failed_User_Jobs_by_MinorStatus.png

Running at Tier1s

  • Summary:
    • 60.4K lhcb_prod Jobs at Tier1s
      • Shares in descending order: CERN, CNAF, RAL, GRIDKA, IN2P3, PIC, NIKHEF
    • 7.4K User Jobs at Tier1s
      • 33% CERN share
      • 15% IN2P3 share lower than expected
      • 8% NIKHEF share significantly lower than expected
      • 4% CNAF share significantly lower than expected
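The user-job shares above can be turned into approximate job counts by applying each percentage to the 7.4K total. The Python sketch below is illustrative only: it uses the rounded figures from the summary, and it assumes that the roughly 40% not listed is spread over the remaining Tier1s (GRIDKA, PIC and RAL), whose individual shares are not given here.

```python
# User jobs at Tier1s last week, in thousands (rounded figure from the report).
user_jobs_tier1 = 7.4

# Shares quoted in the report, in percent.
shares_percent = {"CERN": 33, "IN2P3": 15, "NIKHEF": 8, "CNAF": 4}

# Approximate number of user jobs (in thousands) implied by each share.
approx_jobs = {site: user_jobs_tier1 * pct / 100 for site, pct in shares_percent.items()}

for site, jobs in approx_jobs.items():
    print(f"{site:7s} ~{jobs:.2f}K jobs")

# Assumption: whatever is not listed went to the other Tier1s
# (GRIDKA, PIC, RAL); the report does not give their individual shares.
unlisted_percent = 100 - sum(shares_percent.values())
print(f"other   {unlisted_percent}%")
```

On these figures NIKHEF and CNAF together ran only around 0.9K user jobs, which puts the "significantly lower than expected" remarks in absolute terms.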

  • Done|Completed Production Jobs by Site
Done+Complete_Production_Jobs_at_Tier1_by_Site.png

  • Done|Completed User Jobs by Site
Done+Complete_User_Jobs_at_Tier1_by_Site.png

Job Failure Analysis

  • Summary:
    • Application Errors were observed in large numbers at all Tier1 sites (except NIKHEF)
    • User Jobs failed primarily because of Application Errors, followed by Input Data Resolution failures
    • Input data resolution caused significant problems at CERN over the last week. This was due to the ongoing clean-up of the data lost by Castor. The clean-up has now been completed and users have been asked to regenerate their input data from the BK. CNAF also had input data resolution problems because one of its Castor disk pools was incorrectly configured.

  • Failed Production Jobs (Application Finished With Error) by Site
Failed_Production_Jobs_Application_Finished_With_Errors_by_Site.png

  • Failed User Jobs (Input Data Resolution) by Site
Failed_Users_Jobs_Input_Data_Resolution_by_Site.png

  • Failed Jobs at CERN by Minor Status
Failed_Jobs_at_CERN_by_MinorStatus.png
  • Failed Jobs at CNAF by Minor Status
Failed_Jobs_at_CNAF_by_MinorStatus.png
  • Failed Jobs at GRIDKA by Minor Status
Failed_Jobs_at_GRIDKA_by_MinorStatus.png
  • Failed Jobs at IN2P3 by Minor Status
Failed_Jobs_at_IN2P3_by_MinorStatus.png
  • Failed Jobs at NIKHEF by Minor Status
Failed_Jobs_at_NIKHEF_by_MinorStatus.png
  • Failed Jobs at PIC by Minor Status
Failed_Jobs_at_PIC_by_MinorStatus.png
  • Failed Jobs at RAL by Minor Status
Failed_Jobs_at_RAL_by_MinorStatus.png

Hardware Status

  • WMS volhcb09:
    • Significant peaks in IOWait (around 50%) observed on the 8th, 11th and 12th
    • Significant peak in Load Average (around 20) observed on the 12th, with smaller peaks at the same times as the IOWait peaks
    • Swap used: close to the 2GB limit six times, on the 8th, 9th, 11th and 12th (time-correlated with the peaks in IOWait and Load Average)

volhcb09_1_-86400_CPUUTILPERCUSER_CPUUTILPERCSYSTEM_CPUUTILPERCNICE_CPUUTILPERCIDLE_CPUUTILPERCIOWAIT_CPUUTILPERCIRQ_CPUUTILPERCSOFTIRQSTACKEDC_1.gif
volhcb09_1_-86400_LOADAVG_STACKEDL_1.gif
volhcb09_1_-86400_SWAP_SPACE_USED_STACKEDS_1.gif
volhcb09_1_-86400_CPUUTILPERCIOWAIT_STACKEDC_1.gif

  • LogSE volhcb06:
    • Partition used: completely full almost all of the time. Significant cleaning needs to be done.
volhcb06_1_-86400_PARTITIONUSEDPERC_STACKEDP_1.gif
Topic revision: r1 - 2009-06-16 - AndrewCSmith