Questionnaire for job processing monitoring

  • What does your VO currently use to monitor production jobs and analysis jobs?
  • Is a web interface provided for monitoring production jobs and analysis jobs?
  • How are user support and general troubleshooting help organized for production and for analysis?
  • Do you think that the current set of failure reasons and codes returned by the Grid or the application is sufficient to understand the underlying problem?
  • For application-related failures, is it possible to determine the reason for the failure (data access, software distribution, user error...) from the exit code (reason) returned by the application? If not, is any work planned or ongoing to improve the situation?
  • Does the job wrapper currently parse stdout, stderr, or the application log file to create a meaningful error report? If yes, how is this information propagated back to the job submission UI or to a monitoring system?
  • Would you find it useful to define a generic way of communicating application-specific summary information from the job to the job submission UI or to a monitoring system?
  • Do you have SAM tests for identifying site problems related to job processing? Are they useful? If not, why? If yes, would you consider increasing their number or making them more specific?

-- Main.julia - 12 Feb 2007

Topic revision: r1 - 2007-02-12 - JuliaAndreeva