Data Quality Anomaly Detection

For offline monitoring we agreed to have the following interface for anomaly detection:

compare(base_URL, candidate_URL, histogram_list)

base_URL -- reference file URL (whatever ROOT supports -- AFS, EOS, CASTOR, ...)

candidate_URL -- candidate file URL

histogram_list -- list of paths of histograms in specified files to compare

which returns JSON:

{
 'error_code': error_code,
 'error_description': error_description,
 'anomaly_weights': [0,0,0,...,0] // weights of histograms containing anomaly
}

anomaly_weights is a list of weights from the interval [0, 1]. The closer a weight is to 1, the higher the likelihood of an anomaly in that histogram. The length of the returned anomaly_weights array is equal to the length of histogram_list.

If the corresponding histogram is not found in the base_URL file, its weight is -1.

If the histogram is not found in the candidate_URL file, its weight is -2.
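As an illustration of the return convention above, a caller could pair each histogram path with its weight and separate missing histograms from scored ones. This is only a minimal Python sketch: the function name interpret_weights and the 0.5 threshold are assumptions made for the example, not part of the agreed interface.

```python
import json

MISSING_IN_BASE = -1       # histogram not found in the base_URL file
MISSING_IN_CANDIDATE = -2  # histogram not found in the candidate_URL file


def interpret_weights(response_text, histogram_list, threshold=0.5):
    """Group histogram paths by what their returned weight means.

    `threshold` is an illustrative cut chosen for this sketch only.
    """
    response = json.loads(response_text)
    weights = response["anomaly_weights"]
    # The interface promises one weight per requested histogram.
    if len(weights) != len(histogram_list):
        raise ValueError("anomaly_weights length differs from histogram_list")

    result = {
        "anomalous": [],
        "normal": [],
        "missing_in_base": [],
        "missing_in_candidate": [],
    }
    for path, weight in zip(histogram_list, weights):
        if weight == MISSING_IN_BASE:
            result["missing_in_base"].append(path)
        elif weight == MISSING_IN_CANDIDATE:
            result["missing_in_candidate"].append(path)
        elif weight >= threshold:
            result["anomalous"].append(path)
        else:
            result["normal"].append(path)
    return result
```

Checking for the -1 and -2 sentinels before applying any threshold keeps missing histograms from being counted as "normal".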

This interface will be published as an HTTP method on the Anomaly Detection service. It should be accessible with the urllib2.urlopen method.
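A client call might look roughly like the sketch below. It is written for Python 3, where urllib.request.urlopen is the counterpart of Python 2's urllib2.urlopen. The service URL, the use of a GET query string, and the query parameter names are all assumptions; the page does not specify how the arguments are encoded in the HTTP request.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen  # Python 3 counterpart of urllib2.urlopen


def build_compare_url(service_url, base_url, candidate_url, histogram_list):
    """Encode the compare() arguments as a query string.

    Parameter names mirror the interface above; repeating
    `histogram_list` once per path is an assumed encoding.
    """
    query = urlencode(
        [("base_URL", base_url), ("candidate_URL", candidate_url)]
        + [("histogram_list", path) for path in histogram_list]
    )
    return service_url + "?" + query


def compare(service_url, base_url, candidate_url, histogram_list, timeout=30):
    """Call the (hypothetical) endpoint and return the decoded JSON reply."""
    url = build_compare_url(service_url, base_url, candidate_url, histogram_list)
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Because no internal authentication is planned, the request carries no credentials; access control is left entirely to the network layer.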

Access to this service should be restricted by networking means, i.e. iptables (no internal authentication would be required to access it).

Internally, the Anomaly Detection & Prediction Service should be able to cache its predictions.
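One simple way to cache predictions is to key them on the full set of compare() arguments. The sketch below assumes that a file behind a given URL does not change between calls, which the page does not state; the class name CachingComparer and the injected compute callable are hypothetical.

```python
class CachingComparer:
    """Memoize compare() results in memory, keyed on the call arguments."""

    def __init__(self, compute):
        # `compute` is the real comparison function with the signature
        # compute(base_url, candidate_url, histogram_list).
        self._compute = compute
        self._cache = {}

    def compare(self, base_url, candidate_url, histogram_list):
        # Lists are not hashable, so the histogram paths become a tuple.
        key = (base_url, candidate_url, tuple(histogram_list))
        if key not in self._cache:
            self._cache[key] = self._compute(base_url, candidate_url, histogram_list)
        return self._cache[key]
```

A production service would probably also need eviction and invalidation when a reference or candidate file is rewritten; those are left out here.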

The service should eventually be deployed near the Presenter (somewhere on the online farm).

Will ask Niko for details. Preferred deployment method: via CVMFS.

-- AndreyUstyuzhanin - 2015-02-14
