Decorrelated Adversarial Neural Network

In this project, decorrelation is achieved by introducing a second network that tries to estimate the mass of an event from the output of the classifier and the $p_T$ of the event. The loss function of the combined net, which we will call the adversarial net from now on, is given by

\[\mathcal{L}=L_{clf}-\lambda L_{reg}\]

where $L_{clf}$ is the loss function of the classifier and $L_{reg}$ is the loss function of the regression network that tries to estimate the mass. The classifier uses cross entropy as its loss function, while the regression uses the mean squared error (MSE). As of now the training schedule is fixed: for the first 50 epochs only the classifier is trained, from epoch 50 to 100 only the regression is trained, and after epoch 100 both nets are trained simultaneously.
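The combined loss and the three-phase schedule can be sketched in a few lines of Python. This is only an illustration: the function names `training_phase` and `adversarial_loss` are hypothetical, and the exact treatment of the boundary epochs (0-based counting, epoch 50 and epoch 100 starting a new phase) is an assumption that the text does not settle.

```python
def training_phase(epoch, clf_only_until=50, reg_only_until=100):
    """Return the networks that are updated in the given epoch.

    Assumes 0-based epochs: 0-49 classifier only, 50-99 regression only,
    100 and later both networks together.
    """
    if epoch < clf_only_until:
        return ("classifier",)
    if epoch < reg_only_until:
        return ("regression",)
    return ("classifier", "regression")


def adversarial_loss(l_clf, l_reg, massless_importance):
    """Combined loss L = L_clf - lambda * L_reg, with lambda = massless_importance."""
    return l_clf - massless_importance * l_reg
```

The minus sign means that the combined loss falls when the mass regression gets worse, so minimizing it pushes the classifier towards an output that carries little information about the mass.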

The decorrelated neural network can be used by setting the parameter 'massless' to 'adversarial'. The following hyperparameters can then be set:

'massless_importance': This is the factor $\lambda$ in the combined loss function

tbd: 'nNodes_reg': regression network architecture
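As an illustration, the settings above could be collected in a Python dictionary. Only the keys 'massless' and 'massless_importance' and the value 'adversarial' come from this page; the dictionary name `config`, the way it is passed to the training code, and the numerical value of $\lambda$ are assumptions.

```python
# Hypothetical configuration fragment; how it is consumed by the
# training code is not specified on this page.
config = {
    "massless": "adversarial",    # switch on the adversarial decorrelation
    "massless_importance": 1.0,   # lambda in L = L_clf - lambda * L_reg (placeholder value)
}
```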

Chi Square for the Histograms

To quantify the sculpting, a chi-square difference between two histograms is used, given by:

\[\frac{1}{MN} \sum_{i=1}^{r}\frac{(Mn_{i}-Nm_{i})^{2}}{n_{i}+m_{i}}\]

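where $i$ runs over the $r$ bins of the two histograms, $n_i$ and $m_i$ are the contents of bin $i$ in the first and second histogram, $N=\sum_i n_i$ and $M=\sum_i m_i$ are the total numbers of entries, and a Poisson distribution is assumed for the bin contents.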

The two histograms to be compared are the background without any cuts applied on the DNN variable and the background where some cuts on the DNN are applied.
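A minimal sketch of the chi-square difference for two binned histograms is given below. It assumes that $N$ and $M$ are the total numbers of entries of the two histograms and that bins that are empty in both histograms are skipped; the function name `chi_square_difference` is hypothetical.

```python
def chi_square_difference(n, m):
    """Chi-square difference between two histograms with identical binning.

    n, m: sequences of bin contents (same length).
    Assumes N = sum(n) and M = sum(m); bins empty in both are skipped.
    """
    if len(n) != len(m):
        raise ValueError("histograms must have the same number of bins")
    N = sum(n)
    M = sum(m)
    if N == 0 or M == 0:
        raise ValueError("both histograms need at least one entry")
    total = 0.0
    for n_i, m_i in zip(n, m):
        if n_i + m_i == 0:
            continue  # bin contributes nothing and would divide by zero
        total += (M * n_i - N * m_i) ** 2 / (n_i + m_i)
    return total / (M * N)
```

Two histograms with the same shape but different normalization, for example `[10, 20, 30]` and `[20, 40, 60]`, give a value of 0, so the measure is sensitive only to a change in shape, which is what sculpting means here.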

The two extreme cases are shown in the following figures (attachments chisquare0.png and chisquarehigh.png):

However, we want to balance the two extreme cases. One example of this can be seen in the following figure (attachment chisquareinbetween.png):

To make it easier to judge the chosen hyperparameters to first order, a heatmap is used. The value of each field is the difference between the scaled significance and the scaled $\chi^2$, both standardized to zero mean and unit standard deviation. A plot of this can be seen in the section "Scanned Hyperparameters".
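The value of each heatmap field can be sketched as follows. The function names are hypothetical, and the use of the population standard deviation over all grid points is an assumption; the page only states that both quantities are scaled to zero mean and unit standard deviation.

```python
from statistics import mean, pstdev


def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # all values equal: nothing to scale
    return [(v - mu) / sigma for v in values]


def heatmap_scores(significances, chi2_values):
    """Standardized significance minus standardized chi-square, per grid point."""
    z_sig = standardize(significances)
    z_chi = standardize(chi2_values)
    return [s - c for s, c in zip(z_sig, z_chi)]
```

A high score marks a grid point with above-average significance and below-average sculpting.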

Scanned Hyperparameters

A grid search was conducted over the parameters listed below. For the two initial training phases the learning rate was fixed:

for the initial classifier training: 1.

for the initial regression training: 1.


'learning_rate' (from epoch 100 to 150): [1.,0.1,0.001,0.0001]

'learning_rate' (from epoch 150 to 200): [1.,0.1,0.001,0.0001]
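The scan over the two learning rates listed above can be enumerated with `itertools.product`. The dictionary keys are hypothetical names for the two 'learning_rate' phases, and the sketch covers only the values listed on this page.

```python
from itertools import product

# Values scanned in both phases, as listed above.
learning_rates = [1.0, 0.1, 0.001, 0.0001]

# One entry per grid point; the key names are illustrative only.
grid = [
    {"learning_rate_100_150": lr_a, "learning_rate_150_200": lr_b}
    for lr_a, lr_b in product(learning_rates, learning_rates)
]
```

This gives 16 grid points, one per field of the heatmap.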

The result of this grid search can be seen here:

Figure (firstgridsearch.png): First grid search. Empty fields mean that the DNN classified everything as the same class. The value of each grid point is given by the difference between the significance and the $\chi^2$, both normalized to zero mean and unit standard deviation.

From this we get the order of magnitude of the importance parameter.

-- BennoKach - 2020-03-23
Topic revision: r5 - 2020-03-31 - BennoKach