WMStats Monitoring for Tier0

WMStats is a status-based monitoring tool for WMAgent. It is a user interface that presents the current status of the Tier0 system. In WMStats you can find:

  • A summary of the production the Tier0 is working on: run processing status.
  • Job creation for the requests in the system, job success rate and failure modes - if any.
  • Tier0 WMAgent status and health.

[Figure: Run-Workflow-Jobs.png]

WMStats summarizes run, request/workflow and job progress. Each run can have several workflows, depending on the streams it has; e.g. Run228566 can have workflows like:

  • Express_Run228566_StreamExpressCosmics
  • Repack_Run228566_StreamA
  • Repack_Run228566_StreamCalibration ...

And each workflow can have a different number of jobs created.
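The workflow names listed above can be split into their parts with a few lines of Python. This is only a minimal sketch: the pattern `<Type>_Run<number>_Stream<name>` is inferred from the example names on this page, not from a WMStats specification, and `WorkflowName` and `parse_workflow_name` are hypothetical helper names.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern, inferred only from the example names on this page:
#   <Type>_Run<number>_Stream<name>
WORKFLOW_RE = re.compile(
    r"^(?P<kind>[A-Za-z0-9]+)_Run(?P<run>\d+)_Stream(?P<stream>\w+)$"
)


class WorkflowName(NamedTuple):
    kind: str    # e.g. "Express" or "Repack"
    run: int     # run number, e.g. 228566
    stream: str  # stream name without the "Stream" prefix


def parse_workflow_name(name: str) -> Optional[WorkflowName]:
    """Split a workflow name into type, run number and stream.

    Returns None when the name does not follow the assumed pattern.
    """
    match = WORKFLOW_RE.match(name.strip())
    if match is None:
        return None
    return WorkflowName(
        kind=match.group("kind"),
        run=int(match.group("run")),
        stream=match.group("stream"),
    )
```

For example, `parse_workflow_name("Repack_Run228566_StreamA")` gives the type `Repack`, the run number `228566` and the stream `A`. Names that follow a different convention simply return `None`.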

These are the WMStats instances the Tier0 is using:

Observations: Percentages DO NOT REFLECT the workflow status. They vary with the job counts and drop to 0% when the workflow is complete. Also, once all the processing is done, the workflow is only declared complete after the merge jobs and the slow logCollect jobs have finished.

FAQ

Are Express and Repack done for a given run?

As you know, we want Express and Repack done as we take data. So far, the best way to tell is the run state. A run that is still being processed is in the "Real Time Processing" state. To see how far along it is, click the "L" button, which takes you to the expanded workflows. There you will see what is still left to finish. Hopefully not Express.

If the run is past the point we need to worry about, its state will be "Real Time Done".

Are we keeping up with data taking?

We use the same run states to define this. A healthy WMStats should show only the first few runs as "Real Time Processing" (the fewer the better). All subsequent runs must be in the "Real Time Done" state, which, again, means we are done with Express and Repack.
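The rule above can be written down as a small check. This is a sketch under stated assumptions, not part of WMStats: it assumes you already have the run states as a list in the order WMStats shows them (first run first), and the limit `max_processing` is a hypothetical threshold, since the page only says "the fewer the better".

```python
from typing import Sequence

PROCESSING = "Real Time Processing"
DONE = "Real Time Done"


def keeping_up(run_states: Sequence[str], max_processing: int = 3) -> bool:
    """Return True if only the first few runs are still being processed.

    run_states: run states in display order (assumption: first run first).
    max_processing: hypothetical limit for "the first few runs".
    """
    # Count the leading block of runs that are still processing.
    n_processing = 0
    for state in run_states:
        if state != PROCESSING:
            break
        n_processing += 1

    # Every run after that leading block must be "Real Time Done".
    rest_done = all(state == DONE for state in run_states[n_processing:])
    return n_processing <= max_processing and rest_done
```

A list such as `[PROCESSING, PROCESSING, DONE, DONE]` passes the check, while a "Real Time Processing" run that appears after a "Real Time Done" run fails it, because that run is lagging behind.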

Regardless of the run status, did we upload the PCL Payload?

Go to the run and click "L"; you will see the Express workflow, like:

Express_Run228548_StreamExpressCosmics

Click "L" on it; you should see this task with all jobs complete and in success status:

ExpressAlcaSkimwrite_StreamExpressCosmics_ALCARECOAlcaHarvestALCARECOStreamPromptCalibProdSiStrip

The name is too long to spot easily, so I would just search in the browser for "alcaharvest".
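If you have the task names as text (for example, copied from the page), the same search can be done in Python. This is a minimal sketch: `find_tasks` is a hypothetical helper, and it only mimics a case-insensitive browser search; it does not talk to WMStats.

```python
from typing import Iterable, List


def find_tasks(task_names: Iterable[str], needle: str = "alcaharvest") -> List[str]:
    """Return the task names that contain `needle`, ignoring case."""
    needle = needle.lower()
    return [name for name in task_names if needle in name.lower()]
```

Passing a different `needle`, such as "dqmharvest", finds the DQM harvesting task in the same way.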

Regardless of the run status, did we upload data to the DQM GUI?

Go to the run and click "L"; you will see the Express workflow, like:

Express_Run210611_StreamExpress

Click "L" on it; you should see this task with all jobs complete and in success status:

ExpressMergewrite_StreamExpressCosmics_DQMEndOfRunDQMHarvestMerged

To be a bit more didactic, it harvests the output from the task:

ExpressMergewrite_StreamExpressCosmics_DQM

What is the PromptReco state?

All you need to do is go to the "workflow" filter box and type "promptreco".

What you will see are all the runs that have PromptReco released, and their status, namely:

  • PromptReco - The processing jobs are running; filter on this state to see the farm usage
  • Reco Harvest - DQM and AlCa harvest jobs are running; processing and merge are done
  • Processing Done - All Primary Datasets are done at all levels of their workflows (merge, harvest, skim, etc.)

Observations: Be aware of the reco triggering delay: expect to see only runs that are more than 6h old. More recent runs will not appear.
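The delay can be expressed as a simple time comparison. This is a sketch only: the 6h value comes from the observation above, but the page does not say which run timestamp the delay is measured from, so `run_time` is an assumption, and `expected_in_promptreco` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

# Reco triggering delay, taken from the observation above.
RECO_DELAY = timedelta(hours=6)


def expected_in_promptreco(
    run_time: datetime, now: datetime, delay: timedelta = RECO_DELAY
) -> bool:
    """Return True if the run is old enough to show up under "promptreco".

    run_time: timestamp of the run (assumption: the page does not say
              whether the delay counts from the run start or the run end).
    now: current time, in the same timezone convention as run_time.
    """
    return now - run_time >= delay
```

A run that is not yet listed under "promptreco" and fails this check is probably just too recent, not stuck.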

Topic revision: r2 - 2014-10-28 - LuisContreras
 