DIRAC monitoring tool

The DIRAC monitoring tool provides information about job status at different sites.

The minimum time granularity is 1 hour.

The user chooses a site and a time period, and a histogram of the number of jobs versus time is displayed. All jobs are plotted in the same histogram, with a different colour for each status. The possible statuses are: done, failed, running, stalled.
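The binning behind such a histogram can be sketched as follows. This is only an illustration under stated assumptions: the job records, the `hourly_histogram` function and the `(timestamp, status)` layout are hypothetical and are not taken from DIRAC.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical job records as (timestamp, status); not a DIRAC data format.
jobs = [
    (datetime(2008, 8, 12, 10, 5), "Done"),
    (datetime(2008, 8, 12, 10, 40), "Failed"),
    (datetime(2008, 8, 12, 11, 15), "Done"),
    (datetime(2008, 8, 12, 11, 59), "Running"),
]


def hourly_histogram(records):
    """Count jobs per (1-hour bin, status)."""
    hist = defaultdict(Counter)
    for timestamp, status in records:
        # Truncate the timestamp to the start of its hour: the 1-hour granularity.
        hour_bin = timestamp.replace(minute=0, second=0, microsecond=0)
        hist[hour_bin][status] += 1
    return hist


hist = hourly_histogram(jobs)
for hour_bin in sorted(hist):
    print(hour_bin.isoformat(), dict(hist[hour_bin]))
```

Each hour bin holds one counter per status, which corresponds to one coloured series per status in the plot.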

Open question: is this the status from the application point of view, or from the Grid point of view?

The jobs are not separated by job type (MC, user analysis, and so on). I asked about this on the dirac-developers mailing list on 12 Aug 2008. Summary of the mail thread:
Answer from Andrei Tsaregorodtsev: "As far as your question is concerned - what is exactly the granularity of different kinds of activities you want to have? In the new system user and production jobs are put together. We can not classify user activities (analysis or custom MC or whatever). However, each production job has associated production ID which details should be available on the Production Summary pages. Have a look and let's then discuss further."

To do: which Production Summary? Ask for more information.

Answer from Philippe Charpentier: "I think this has always been a requirement that we should be able to classify jobs in broad categories. Even in DIRAC2 there are 'Production' (should be 'Simulation') jobs and 'Processing' jobs. This is provided by the 'JobType'. We should consider whether we want to extend this to other activities such as splitting 'Processing' into 'Reconstruction' / 'Stripping' / 'Merging' etc.
In DIRAC3, the JobType of the currently running reconstruction productions is... 'test' which is somewhat misleading...
Can we make a proposal to be discussed on the usage of the JobType, including the valid request from Elisa. I think it would be very important as well for the accounting to be able to group our CPU usage by activity if we want to be able to answer simple questions like 'how much CPU is used for stripping?'"
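The accounting question raised in this answer amounts to summing CPU usage per JobType. A minimal sketch, assuming hypothetical accounting records of the form `(JobType, CPU hours)`; the values and the `cpu_by_activity` function are invented for illustration and do not come from DIRAC:

```python
from collections import defaultdict

# Hypothetical accounting records as (JobType, CPU time in hours).
records = [
    ("Simulation", 120.0),
    ("Stripping", 15.5),
    ("Reconstruction", 40.0),
    ("Stripping", 4.5),
]


def cpu_by_activity(rows):
    """Sum CPU hours per JobType."""
    totals = defaultdict(float)
    for job_type, cpu_hours in rows:
        totals[job_type] += cpu_hours
    return dict(totals)


totals = cpu_by_activity(records)
print(totals["Stripping"])  # CPU hours used for stripping in this sample
```

This only works if the JobType is filled in consistently, which is why a JobType of "test" for reconstruction productions would make the totals misleading.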

My answer: regarding the granularity of the classification, we do not need a very fine one. From the site perspective, a first broad classification would be useful:
1- user analysis jobs
2- jobs of an official production of the experiment.

In this way sites can set priorities. For example, failures of jobs from an official production of the VO will be investigated with higher priority than failures of an individual user's analysis jobs.

Then, within category 2, it would be useful to distinguish:
2.a- MC simulation
2.b- the subsequent digitization and reconstruction steps
so that sites can check that the jobs requiring very high CPU usage are MC simulation jobs. Of course, a finer granularity, if possible, would be even better.

Do you think it is feasible to publish this information through the DIRAC monitoring? As Philippe says, the JobType should provide this information. For the first broad classification between user jobs and LHCb jobs, the owner of the certificate used to submit the job might also be useful.
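The proposed classification could be sketched as below. This is an illustration only: the JobType names are those mentioned in the mail thread, while the grouping into sets, the `lhcb_prod` owner name and the `classify` function are hypothetical assumptions, not existing DIRAC behaviour.

```python
# JobType names taken from the mail thread; the grouping itself is an assumption.
SIMULATION_TYPES = {"Simulation", "Production"}  # "Production" meant simulation in DIRAC2
PROCESSING_TYPES = {"Processing", "Reconstruction", "Stripping", "Merging"}

# Hypothetical certificate owner used for official productions.
PRODUCTION_OWNERS = {"lhcb_prod"}


def classify(job_type, owner):
    """Map a job to the broad categories 1, 2.a and 2.b proposed above."""
    if job_type in SIMULATION_TYPES:
        return "2.a production: MC simulation"
    if job_type in PROCESSING_TYPES:
        return "2.b production: digitization and reconstruction"
    if owner in PRODUCTION_OWNERS:
        # JobType is not informative (for example "test"); fall back on the owner.
        return "2 production: unclassified"
    return "1 user analysis"


print(classify("Simulation", "lhcb_prod"))
print(classify("test", "lhcb_prod"))
print(classify("User", "some_user"))
```

The fallback on the certificate owner covers the case where the JobType alone is not enough to tell a production job from a user job.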

MonALISA

Questions:

  • Can MonALISA distinguish between user analysis jobs and production jobs?
Answer: we only store centrally either site-wide or per-user statistics, but not per site and per user at the same time; this information was not considered relevant for ALICE.

  • For each metric, define the unit and the time granularity. Answer: the time granularity is usually 1 minute.
  • Data transfer: transfer rate?
  • Is MonALISA used at the sites? Any feedback from site admins? Answer: yes, they have a test suite that sets the overall status of a site. It is on the MonALISA web page, under Services/Site Services/ site overview. The tests are run every 15 minutes. If the status is green, the site is OK, even if no job is running.

-- ElisaLanciotti - 12 Aug 2008

Topic revision: r4 - 2020-08-19 - TWikiAdminUser
 