What information can experiments provide?

The type of information available depends on each VO's specific monitoring system. A review of each of them follows.

LHCb: Dirac monitoring

The Dirac monitoring tool provides information about job status in different sites.

The minimum time granularity is 1 hour.

The user can choose a site and a period, and a histogram is then displayed with the number of jobs versus time. All the jobs are plotted in the same histogram, with a different colour depending on the status. The possible statuses are: Done, Failed, Running, Stalled.

Is this the status from the application point of view, or from the Grid?
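
To make the histogram description concrete, here is a minimal sketch (Python with matplotlib) of the kind of plot the Dirac page displays: jobs versus time in 1-hour bins, stacked by status. The job counts below are invented placeholder data, not real Dirac output.

    # Minimal sketch of a Dirac-style plot: number of jobs versus time
    # for one site, stacked by job status. All counts are hypothetical.
    import matplotlib.pyplot as plt

    hours = list(range(6))                      # 1-hour bins (minimum granularity)
    counts = {                                  # hypothetical job counts per status
        "Done":    [40, 55, 60, 52, 48, 50],
        "Failed":  [ 5,  8,  4,  6,  7,  3],
        "Running": [20, 25, 30, 28, 22, 26],
        "Stalled": [ 2,  1,  3,  2,  1,  2],
    }

    bottom = [0] * len(hours)
    for status, values in counts.items():
        plt.bar(hours, values, bottom=bottom, label=status)
        bottom = [b + v for b, v in zip(bottom, values)]

    plt.xlabel("Time (1-hour bins)")
    plt.ylabel("Number of jobs")
    plt.title("Jobs at site X by status (hypothetical data)")
    plt.legend()
    plt.show()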

The jobs are not separated by job type (MC, user analysis, and so on). I asked about this on the dirac-developers mailing list on 12 Aug 2008. Summary of the mail thread:
Answer from Andrei Tsaregorodtsev: "As far as your question is concerned - what is exactly the granularity of different kinds of activities you want to have? In the new system user and production jobs are put together. We can not classify user activities (analysis or custom MC or whatever). However, each production job has associated production ID which details should be available on the Production Summary pages. Have a look and let's then discuss further." Which Production Summary? Ask for more info.

Answer from Philippe Charpentier: "I think this has always been a requirement that we should be able to classify jobs in broad categories. Even in DIRAC2 there are "Production" (should be "Simulation") jobs and "Processing" jobs. This is provided by the "JobType". We should consider whether we want to extend this to other activities such as splitting "Processing" into "Reconstruction" / "Stripping" / "Merging" etc.
In DIRAC3, the JobType of the currently running reconstruction productions is... "test" which is somewhat misleading...
Can we make a proposal to be discussed on the usage of the JobType, including the valid request from Elisa. I think it would be very important as well for the accounting to be able to group our CPU usage by activity if we want to be able to answer simple questions like "how much CPU is used for stripping?"

My answer: about the granularity of the classification, we do not need a very fine one. From the site perspective it would be interesting to have a first broad classification:
1- user analysis jobs
2- jobs of an official production of the experiment.

In this way the sites can set priorities: for example, failures of official production jobs of the VO would be investigated with higher priority than failures of a particular user's analysis jobs.

Then, within category 2, it would be useful to distinguish:
2.a- MC simulation
2.b- the subsequent steps of digitization and reconstruction
so that the sites can check that the jobs requiring very high CPU usage are MC simulation jobs. Of course, if a finer granularity is possible, it would be even better.

Do you think it is feasible to publish this information through the Dirac monitoring? As Philippe says, the JobType should provide it. For the first broad classification between user jobs and official LHCb jobs, perhaps the owner of the certificate used to submit the job could also be useful; a sketch of such a mapping follows.
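
As an illustration, here is a minimal sketch of a mapping from a Dirac JobType string to the two-level classification proposed above. The JobType values are taken from the mail thread or assumed; the actual values used in DIRAC may differ.

    # Hypothetical mapping from a Dirac JobType to the proposed broad
    # classification. JobType values are illustrative, not authoritative.
    BROAD_CATEGORY = {
        "user":           ("user analysis", None),
        "Production":     ("official production", "MC simulation"),   # DIRAC2 name
        "Simulation":     ("official production", "MC simulation"),
        "Processing":     ("official production", "digitization/reconstruction"),
        "Reconstruction": ("official production", "digitization/reconstruction"),
        "Stripping":      ("official production", "digitization/reconstruction"),
        "Merging":        ("official production", "digitization/reconstruction"),
    }

    def classify(job_type):
        """Return (broad category, sub-activity) for a JobType string."""
        return BROAD_CATEGORY.get(job_type, ("unknown", None))

    print(classify("Simulation"))   # ('official production', 'MC simulation')
    print(classify("user"))         # ('user analysis', None)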

Conclusion: Andrei provides data about job processing, specifying several subactivities. See here for the link to the URL where the metrics are published.

ALICE: Monalisa

Here are some questions I asked Costin Grigoras about Monalisa, with his answers:

  • Can Monalisa distinguish between user analysis jobs and production jobs? No: statistics are stored centrally either site-wide or per user, but not per site and per user at the same time; this information was not considered relevant for ALICE.
  • For each metric, what are the unit and the time granularity? The time granularity is usually 1 minute.
  • Is the transfer rate available for data transfers? No, at least not for the time being.
  • Is Monalisa used at the sites? Any feedback from site admins? Yes: there is a test suite that sets the overall status of a site, available on the Monalisa web under Services/Site Services/site overview. The tests run every 15 minutes; if the status is green, the site is OK, even if no job is running (see the polling sketch below).
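
As a small illustration, here is a sketch of polling a site-status page on the same 15-minute cycle as the Monalisa tests. The URL and the plain-text "green" reply are hypothetical placeholders, not the real Monalisa interface.

    # Poll a (hypothetical) site-overview status page every 15 minutes,
    # matching the interval of the Monalisa test suite.
    import time
    import urllib.request

    STATUS_URL = "http://example.org/monalisa/site_overview?site=SITE_X"  # placeholder

    def site_is_green():
        with urllib.request.urlopen(STATUS_URL) as resp:
            return resp.read().decode().strip() == "green"  # assumed reply format

    while True:
        print("site OK" if site_is_green() else "site has problems")
        time.sleep(15 * 60)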

ATLAS: Dashboard

The ATLAS Dashboard offers an API which we can use to extract the data we need, both for job processing and for data transfer. This is very flexible and efficient, as we don't need to ask for the metrics: we just extract them ourselves. A sketch of such an extraction follows.
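
As an illustration, here is a minimal sketch of what extracting job-processing metrics over HTTP could look like. The base URL, endpoint path, and parameter names are hypothetical placeholders; the real Dashboard API defines its own.

    # Fetch job statistics for a site and time window as JSON.
    # All URLs and parameters below are placeholders, not the real API.
    import json
    import urllib.request

    BASE = "http://dashboard.example.cern.ch/api"  # placeholder base URL

    def fetch_job_stats(site, start, end):
        url = f"{BASE}/jobs?site={site}&from={start}&to={end}&format=json"
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)

    print(fetch_job_stats("SITE_X", "2009-06-01", "2009-06-08"))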

CMS: Dashboard

-- ElisaLanciotti - 08 Jun 2009
