This TWiki is envisioned as a knowledge repository in which ATLAS members can share Machine Learning tips, success stories and tutorials. The goal is to create a common knowledge base and to break down barriers to collaboration. Groups should not have to rediscover that scaling features is important or that high weights may be problematic. Groups are invited to share generic tips, provide working examples and advertise their solutions, so that others can gain knowledge and inspiration, and validate those solutions by applying them to different problems.


This is currently just a concatenation of various notes and streams of thought I had lying around. It needs serious feedback and improvement, but it's a good start.

  • ML does the same job as a cut-based analysis, except that it can carve out signal regions in a non-rectangular, non-binary way. It allows for smooth contours and multiple signal regions.

  • ML is not just “signal vs background”. When approaching a problem, ask yourself: is my problem a classification or a regression problem? Is it a multi-class problem?

  • Describe your objects by a set of features

  • Make sure features are not too spiky --> you might need to transform their distributions, e.g. take the log, ...

  • For NNs, rescale features to (mu = 0, sigma = 1) to bring all features to the same scale, because features with large values otherwise dominate the cost function during optimization

  • Don't constrain yourself to hand-made features when using DL; the rawer the features, the better (4-vectors, etc.) --> this reduces loss of information, and a deep net can find high-level features by itself! (Whiteson)

  • For high-level inputs, try BDTs or SVMs first

  • Use good libraries (scikit-learn, keras, ...)

  • Use ReLU activations in NNs for sparsity and non-vanishing gradients

  • For variable length inputs, consider LSTMs and GRUs

  • For variable-length inputs, you should probably use padding, or bucket your data into batches of similar length

  • If you can treat your problem as an image in 2D or 3D (spatial or temporal), or even nD, consider conv nets

  • Conv nets can also be good at picking up translationally invariant features out of continuous spatial or temporal dimensions; in practice this means that, for example, the presence of a boat in a picture can matter more than where the boat is, or the presence of a phrase in a sentence more than where the phrase is

  • In physics, conv nets can be applied to series of objects (e.g. the tracks that make up a jet) whenever it is reasonable to think that important features may be contained in adjacent subgroups of objects rather than in a single object. In other words, an important characteristic of a jet might not be contained in the features of any individual track, but in a subgroup of adjacent tracks.

  • When training a NN, keep an eye on both the training loss and the validation loss. If the training loss keeps decreasing while the validation loss doesn't, stop the training to avoid overtraining. Save the net with the best validation loss.

  • For integration, look for tools that can easily be integrated into C++ (such as xgboost) or for converters

  • Look into what your ML algorithm has learned: you may learn more about the physics itself (Schwartzman, Whiteson)
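
The log-transform tip above can be sketched as follows; the `pt` array is an invented toy spectrum with a long right tail, standing in for any spiky, heavy-tailed feature:

```python
import numpy as np

# Invented toy feature with a long right tail (e.g. a pT-like spectrum in GeV).
pt = np.array([20.0, 35.0, 50.0, 400.0, 2000.0])

# log1p (= log(1 + x)) compresses the tail while staying well-defined at zero.
log_pt = np.log1p(pt)
```

The transform preserves the ordering of the values but drastically shrinks the dynamic range, which makes the distribution far less spiky for the learner.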
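
The feature-rescaling tip (mu = 0, sigma = 1) can be sketched with plain NumPy; scikit-learn's `StandardScaler` does the same thing with a fit/transform interface. The random toy features here are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Invented toy dataset: 1000 events, 3 features living on a large scale.
X = rng.normal(loc=50.0, scale=10.0, size=(1000, 3))

# Standardize each feature column to mean 0 and standard deviation 1.
mu = X.mean(axis=0)
sigma = X.std(axis=0)
X_scaled = (X - mu) / sigma
```

Remember to compute `mu` and `sigma` on the training set only and reuse them for validation/test data, otherwise information leaks between the splits.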
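
A minimal sketch of the "try BDTs first on high-level inputs" tip, using scikit-learn's `GradientBoostingClassifier` as the BDT; the dataset is a synthetic stand-in for signal vs background, not a physics sample:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary problem standing in for signal vs background,
# with 10 "high-level" features.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A BDT baseline: quick to train, little tuning needed, strong on tabular inputs.
bdt = GradientBoostingClassifier(random_state=0)
bdt.fit(X_train, y_train)
accuracy = bdt.score(X_test, y_test)
```

A baseline like this tells you quickly whether a more complex (e.g. deep) model is actually buying you anything on your problem.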
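
To make the ReLU tip concrete, here is the activation and its gradient written out by hand (any framework provides these built in):

```python
import numpy as np

def relu(x):
    # max(0, x): exactly zero for negative inputs, which gives sparse activations.
    return np.maximum(0.0, x)

def relu_grad(x):
    # Gradient is 1 wherever x > 0, so it does not vanish for active units
    # (unlike sigmoid/tanh, whose gradients shrink toward zero when saturated).
    return (x > 0).astype(float)
```

The constant unit gradient on the positive side is what keeps gradients from vanishing as they propagate through many layers.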
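
The padding tip for variable-length inputs can be sketched like this; the short sequences are invented, standing in for e.g. per-track features of jets with different track multiplicities:

```python
import numpy as np

# Invented variable-length sequences (e.g. one feature per track, per jet).
sequences = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]

# Zero-pad every sequence to the length of the longest one,
# producing a rectangular array a network can consume.
max_len = max(len(s) for s in sequences)
padded = np.zeros((len(sequences), max_len))
for i, s in enumerate(sequences):
    padded[i, :len(s)] = s
```

In practice you would also pass a mask (or use a framework's masking layer) so the model can ignore the padded entries.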
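
The translation-invariance point about conv nets can be demonstrated in a few lines: a 1D convolution followed by global max pooling responds to the *presence* of a pattern, not its position. The filter and signals below are invented toy values:

```python
import numpy as np

def conv1d_valid(x, kernel):
    # "Valid" 1D cross-correlation: slide the kernel over x, no padding.
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

pattern = np.array([1.0, -1.0, 1.0])                    # filter matched to a local pattern
x_early = np.array([1.0, -1.0, 1.0, 0.0, 0.0, 0.0])     # pattern at the start
x_late = np.array([0.0, 0.0, 0.0, 1.0, -1.0, 1.0])      # same pattern at the end

# Global max pooling over the conv output: the response depends only on
# whether the pattern occurs, not on where it occurs.
resp_early = conv1d_valid(x_early, pattern).max()
resp_late = conv1d_valid(x_late, pattern).max()
```

Both responses are identical, which is exactly the "presence matters more than position" behaviour described above.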
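
Finally, the early-stopping tip (stop when the validation loss stalls, keep the best net) can be sketched as a small loop; the hard-coded loss curve is an invented stand-in for per-epoch validation losses from a real training run, and Keras users get the same behaviour from the `EarlyStopping` callback:

```python
def train_with_early_stopping(val_losses, patience=2):
    # val_losses stands in for the per-epoch validation losses of a real loop.
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch   # here you would also checkpoint the model weights
        elif epoch - best_epoch >= patience:
            break                # no improvement for `patience` epochs: stop
    return best_epoch, best_loss

# Validation loss decreases, then rises again as overtraining sets in.
best_epoch, best_loss = train_with_early_stopping([1.0, 0.8, 0.7, 0.75, 0.9, 1.1])
```

The returned epoch is the one with the best validation loss (epoch 2 here), i.e. the checkpoint you would keep.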


Recommended libraries for Machine Learning


A list of known Python --> C++ converters for algorithms trained using standard Python libraries

Working Examples

This section will contain tutorials on how to apply methods to specific problems. These can be in the form of notebooks, asciinema videos, etc. -- MichelaPaganini - 2016-03-30