Reasons why Batch Accounting is Under-reporting

Or more specifically, why we can rule out some possible scenarios.

Current logging under-reports by about 2-5%. This is pretty significant.

Busted scenarios should be struck through and moved to the bottom of the list for reference.

Events may be reported late

What needs to be checked is how frequently events are logged to the database. Two things need to be done here:

  • The number of records for a recent hour should be logged repeatedly over a period of time, and extensively: every single event that appears should be logged, so we can cross-check which events are being reported late, if that is what is happening.
  • The most recent event entered into the database should be monitored over a period of time to test for any delay in logging.
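The two checks above could be sketched roughly as follows. This is only an illustration, not the actual monitoring code: the function names, the snapshot format, and the five-minute grace period are assumptions made for the example. In practice the snapshots would come from repeatedly querying the database for one fixed hour.

```python
from datetime import datetime, timedelta


def find_late_events(snapshots, hour_end, grace=timedelta(minutes=5)):
    """Return {event_id: first_seen} for events that first appeared late.

    snapshots: iterable of (poll_time, event_ids) pairs, each one the full
    set of event ids found for the same fixed hour at that poll time
    (hypothetical format). An event counts as late if it is first seen
    after hour_end + grace.
    """
    first_seen = {}
    # Process polls in chronological order so the earliest sighting wins.
    for poll_time, event_ids in sorted(snapshots, key=lambda s: s[0]):
        for event_id in event_ids:
            first_seen.setdefault(event_id, poll_time)
    cutoff = hour_end + grace
    return {eid: seen for eid, seen in first_seen.items() if seen > cutoff}


def latest_event_lag(poll_time, newest_event_time):
    """Delay between the poll time and the newest event in the database."""
    return poll_time - newest_event_time
```

Any event returned by `find_late_events` was written to the database well after its hour had ended, which is the kind of delay this scenario is about.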

Daily reporting might run too early, trimming an hour of logs

This hasn't been tested, but it is a possible scenario if the report tries to fetch logs for events that happen after the daily report has already run. An hour is about 4% of a day (1/24 ≈ 4.2%), after all, which falls inside the observed 2-5% range.
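As a quick sanity check of that figure, the arithmetic can be written out. The helper name `fraction_of_day` is made up for this example.

```python
def fraction_of_day(hours):
    """Fraction of a 24-hour day covered by the given number of hours."""
    return hours / 24.0


lost = fraction_of_day(1)
print(f"{lost:.2%}")  # prints 4.17%
```

A single missing hour would therefore be enough on its own to explain a loss in the 2-5% range.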

Currently the resacc cronjob is set to run at four minutes past midnight every day, which may cause problems if the timezone is misconfigured.

All CERN services, including batchmon, seem to be configured for CEST, so this might not be the problem.

The last second of each day might not be accounted

stopdate = stopdate.replace(hour=23, minute=59, second=59, microsecond=999)
...
where = "WHERE loc.eventTime >= :startdatebind AND loc.eventTime < :enddatebind"

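I'm not sure how cx_Oracle handles dates with sub-second precision, but in any case the Oracle DATE type does not store any time unit smaller than a second. Note also that microsecond=999 sets 999 microseconds (just under one millisecond), not 999 milliseconds.

Whether this is an issue depends on whether cx_Oracle keeps the sub-second part when binding the value. If it does, this should not be an issue. If it doesn't, the bound value becomes 23:59:59 and the strict < comparison excludes events stamped at exactly 23:59:59, so we lose one second of logging data out of every 86,400 (about 0.001% of each day).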

Obviously not the cause of the 2-5% loss, but a potential discrepancy nevertheless.
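The boundary problem can be illustrated without a database. The sketch below only uses Python's datetime module. Whether the driver really drops the sub-second part is not established here, so `truncated_stop` is an assumption standing in for that case. The helper `day_window` is hypothetical and shows one way to sidestep the question: use the next midnight as an exclusive upper bound.

```python
from datetime import datetime, timedelta


def day_window(day):
    """Return (start, end) for a half-open window [start, end) over one day."""
    start = day.replace(hour=0, minute=0, second=0, microsecond=0)
    return start, start + timedelta(days=1)


day = datetime(2015, 9, 18)
start, end = day_window(day)

# The stop date as built in the snippet above: 999 microseconds.
stopdate = day.replace(hour=23, minute=59, second=59, microsecond=999)
# Assumed worst case: the sub-second part is dropped when the value is bound.
truncated_stop = stopdate.replace(microsecond=0)

# An event stamped in the last second of the day (second precision).
last_second_event = datetime(2015, 9, 18, 23, 59, 59)

kept_if_fraction_survives = start <= last_second_event < stopdate    # True
kept_if_truncated = start <= last_second_event < truncated_stop      # False
kept_with_half_open = start <= last_second_event < end               # True
```

With the half-open window the result no longer depends on how sub-second values are bound, because the upper bound is a whole second.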

Some events might not be recorded in the database

Unlikely, but this scenario has not yet been fully ruled out. I tested this before I understood more about batch and batchtzero.

I need to check this a second time, taking both LSF job sources into account, to make sure there is 100% parity between the AFS logs and the records in the database.
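A parity check of this kind could be sketched as a set comparison. The record format, (source, job id) pairs, and the sample ids below are hypothetical. Real input would be parsed from the AFS logs and queried from the database for the same period, covering both batch and batchtzero.

```python
def parity_report(afs_records, db_records):
    """Compare (source, job_id) pairs seen in the AFS logs and the database."""
    afs = set(afs_records)
    db = set(db_records)
    return {
        "missing_from_db": sorted(afs - db),
        "missing_from_afs": sorted(db - afs),
    }


# Made-up sample data: one job is present in the AFS logs only.
afs_records = [("batch", 101), ("batch", 102), ("batchtzero", 7)]
db_records = [("batch", 101), ("batchtzero", 7)]
report = parity_report(afs_records, db_records)
```

Both lists being empty would mean 100% parity for the period checked. Keying on the source as well as the job id keeps jobs from the two sources from masking each other if their ids overlap.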

Potential timezone differences between AFS logging and the database

There are no differences between the timestamps in the AFS logs and those in the database entries, so there's no problem there.

Topic revision: r2 - 2015-09-18 - unknown
 