Week from 13 Jul 2009 to 20 Jul 2009

Job Statistics

  • Summary:
    • Almost 351 K jobs ran last week
    • Over 7% failed
    • Daily peak of over 60 K jobs
    • 278 K Production jobs ran to the end
    • 26 K User jobs ran to the end
    • 19 K Production jobs failed
    • About 5 K User jobs failed

  • Total number of Jobs by Final Major Status
Total_Number_of_Jobs_by_FinalMajorStatus.png

  • Daily number of Jobs by Final Major Status
Daily_Number_of_Jobs_by_FinalMajorStatus.png

  • Done|Completed Jobs by User Group
Done+Complete_Jobs_by_UserGroup.png

  • Done|Completed Production Jobs by Job Type
Done+Complete_Production_Jobs_by_JobType.png

  • Failed Jobs by User Group
Failed_Jobs_by_UserGroup.png

  • Failed Production Jobs by Minor Status
Failed_Production_Jobs_by_MinorStatus.png

  • Failed User Jobs by Minor Status
Failed_User_Jobs_by_MinorStatus.png
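
The percentages in the summary at the top of this section can be cross-checked against the rounded counts. The following is a minimal Python sketch and not part of the original report: the dictionary keys and the function name `failure_rate_percent` are hypothetical, and the inputs are the rounded figures from the summary, so the results only approximate the quoted failure rate. The four listed categories do not have to add up to the weekly total, presumably because some jobs end in other final states (an assumption, not stated in the report).

```python
# Rounded counts in thousands of jobs, copied from the summary of this report.
TOTAL_K = 351
COUNTS_K = {
    "production_done": 278,
    "user_done": 26,
    "production_failed": 19,
    "user_failed": 5,
}


def failure_rate_percent(failed_k: float, reference_k: float) -> float:
    """Return failed jobs as a percentage of a reference job count."""
    if reference_k <= 0:
        raise ValueError("reference count must be positive")
    return 100.0 * failed_k / reference_k


failed_k = COUNTS_K["production_failed"] + COUNTS_K["user_failed"]
listed_k = sum(COUNTS_K.values())

# Failure rate relative to all jobs, and relative to the listed categories only.
rate_vs_total = failure_rate_percent(failed_k, TOTAL_K)
rate_vs_listed = failure_rate_percent(failed_k, listed_k)

print(f"failed: {failed_k} K, listed: {listed_k} K of {TOTAL_K} K")
print(f"failure rate vs total:  {rate_vs_total:.1f}%")
print(f"failure rate vs listed: {rate_vs_listed:.1f}%")
```

Because the inputs are rounded to the nearest thousand, small differences from the percentage quoted in the summary are expected.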

Running at Tier1s

  • Summary:
    • 87 K Production Jobs at Tier1s
      • 41% at GRIDKA
      • 13% at CNAF
      • 12% at CERN
      • 11% at IN2P3
      • 10% at RAL
      • 8% at PIC
      • 5% at NIKHEF
    • 20 K User Jobs at Tier1s
      • 19% CERN share

  • Done|Completed Production Jobs by Site
Done+Complete_Production_Jobs_at_Tier1_by_Site.png

  • Done|Completed User Jobs by Site
Done+Complete_User_Jobs_at_Tier1_by_Site.png
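
The per-site shares in the summary of this section can be turned into approximate job counts. The sketch below is illustrative only: the names `SHARES_PERCENT` and `jobs_from_share` are hypothetical, and the inputs are the rounded total and percentages from the summary, so the per-site counts are estimates rather than measured values.

```python
# Production-job shares per Tier1 site (percent), copied from the summary.
PRODUCTION_TOTAL_K = 87
SHARES_PERCENT = {
    "GRIDKA": 41, "CNAF": 13, "CERN": 12, "IN2P3": 11,
    "RAL": 10, "PIC": 8, "NIKHEF": 5,
}


def jobs_from_share(total_k: float, share_percent: float) -> float:
    """Convert a percentage share into an approximate job count (thousands)."""
    return total_k * share_percent / 100.0


approx_jobs_k = {
    site: jobs_from_share(PRODUCTION_TOTAL_K, share)
    for site, share in SHARES_PERCENT.items()
}

# Sanity check: the rounded shares should add up to roughly 100%.
share_sum = sum(SHARES_PERCENT.values())

for site, jobs_k in sorted(approx_jobs_k.items(), key=lambda kv: -kv[1]):
    print(f"{site:7s} ~{jobs_k:.1f} K production jobs")
print(f"sum of shares: {share_sum}%")
```

Checking that the shares sum to about 100% is a quick way to catch a mistyped percentage before the report is published.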

Job Failure Analysis

  • Summary:
    • Production jobs failed mostly due to:
      • Application finished with errors (~7200)
      • Watchdog identified this job as stalled (~6400)
    • User jobs failed mostly due to:
      • Watchdog identified this job as stalled (~1400)
      • Application finished with errors (~1200)

  • Failed Production Jobs (Application Finished With Errors) by Site
Failed_Production_Jobs_Application_Finished_With_Errors_by_Site.png

  • Failed User Jobs (Input Data Resolution) by Site
Failed_Users_Jobs_Input_Data_Resolution_by_Site.png

  • Failed Jobs at GRIDKA by Minor Status
Failed_Jobs_at_GRIDKA_by_MinorStatus.png

  • Failed Jobs at CERN by Minor Status
Failed_Jobs_at_CERN_by_MinorStatus.png

  • Failed Jobs at PIC by Minor Status
Failed_Jobs_at_PIC_by_MinorStatus.png

Hardware Status

  • WMS volhcb09:
    • CPU utilization: Idle > 50%, IO Wait peaks?
    • Network utilization: < 700K
    • Swap Used: < 1.2 GB, under the limit (2 GB).
    • Partition Used: Stable at 101G

volhcb09_1_-86400_CPUUTILPERCUSER_CPUUTILPERCSYSTEM_CPUUTILPERCNICE_CPUUTILPERCIDLE_CPUUTILPERCIOWAIT_CPUUTILPERCIRQ_CPUUTILPERCSOFTIRQSTACKEDC_1.gif.png volhcb09_1_-86400_NUMKBREADAVG_NUMKBWRITEAVGOVERLAYN_1.gif.png
volhcb09_1_-86400_PARTITIONUSEDPERC_STACKEDP_1.gif.png volhcb09_1_-86400_SWAP_SPACE_USED_STACKEDS_1.gif.png

  • DMS volhcb10:
    • CPU utilization: Idle < 17%
    • Network utilization: between 90K and 250K
    • Swap Used: < 700M, under the limit (2 GB).
    • Partition Used: Stable at 84G

volhcb10_1_-86400_CPUUTILPERCUSER_CPUUTILPERCSYSTEM_CPUUTILPERCNICE_CPUUTILPERCIDLE_CPUUTILPERCIOWAIT_CPUUTILPERCIRQ_CPUUTILPERCSOFTIRQSTACKEDC_1.gif.png volhcb10_1_-86400_NUMKBREADAVG_NUMKBWRITEAVGOVERLAYN_1.gif.png
volhcb10_1_-86400_PARTITIONUSEDPERC_STACKEDP_1.gif.png volhcb10_1_-86400_SWAP_SPACE_USED_STACKEDS_1.gif.png

  • LogSE volhcb06:
    • CPU utilization: Idle > 50%
    • Network utilization: < 200K
    • Swap Used: < 230K, under the limit (2 GB).
    • Partition Used: Stable

(https://lemonweb.cern.ch/lemon-web/info.php?time=1&offset=1d&entity=volhcb06&detailed=yes)

volhcb06_1_-86400_CPUUTILPERCUSER_CPUUTILPERCSYSTEM_CPUUTILPERCNICE_CPUUTILPERCIDLE_CPUUTILPERCIOWAIT_CPUUTILPERCIRQ_CPUUTILPERCSOFTIRQSTACKEDC_1.gif.png volhcb06_1_-86400_NUMKBREADAVG_NUMKBWRITEAVGOVERLAYN_1.gif.png
volhcb06_1_-86400_PARTITIONUSEDPERC_STACKEDP_1.gif.png volhcb06_1_-86400_SWAP_SPACE_USED_STACKEDS_1.gif.png

  • Various volhcb01:
    • CPU utilization: Idle > 80%
    • Network utilization: <6K
    • Swap Used: about 135K, under the limit (2 GB).
    • Partition Used: Stable at 740G

(https://lemonweb.cern.ch/lemon-web/info.php?time=1&offset=1d&entity=volhcb01&detailed=yes)

volhcb01_1_-86400_CPUUTILPERCUSER_CPUUTILPERCSYSTEM_CPUUTILPERCNICE_CPUUTILPERCIDLE_CPUUTILPERCIOWAIT_CPUUTILPERCIRQ_CPUUTILPERCSOFTIRQSTACKEDC_1.gif.png volhcb01_1_-86400_NUMKBREADAVG_NUMKBWRITEAVGOVERLAYN_1.gif.png
volhcb01_1_-86400_PARTITIONUSEDPERC_STACKEDP_1.gif.png volhcb01_1_-86400_SWAP_SPACE_USED_STACKEDS_1.gif.png

-- JiboHE - 20 Jul 2009

Topic revision: r2 - 2009-08-10 - FedericoStagni