Quantum Annealing for the stop paper

General comment: This manuscript applies the quantum annealing for machine learning with zooming (QAML-Z) algorithm to the search for top squarks at the LHC. The authors' main contribution appears to be the addition of a preprocessing step of the input variables with principal component analysis (PCA), which improves the performance of the algorithm. In particular, there is a possible improvement over the classical algorithm used in a CMS search: a boosted decision tree (BDT). While the analysis presented here is sound, I have two major comments: (1) expand the description of, and comparison to, the classical ML algorithm (in terms of performance and time to solution), and (2) clarify the novel contributions of this work relative to the previously published literature on QAML and QAML-Z methods. Below, please find my line-by-line comments and questions.

Physics comments:

ALERT! Q1: 79 How is the BDT trained? Is it optimal? Could a DNN or other classical ML algorithm achieve better performance? Given that one of the main claims of the paper is that there may be an improvement for the quantum algorithm compared to the classical ML algorithm, it would be good to better understand the specifics of the classical ML algorithm used.

The BDT has been trained with the TMVA package, the standard multivariate analysis package of ROOT. The number of trees NT is 400. The maximal depth of the trees MD is 3. The minimal node size MN, i.e. the percentage of signal or background events at which the splitting of a node stops, is 2.5%; this node size is a stopping condition of the training. Finally, as mentioned in the paper, the data is diagonalized.
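For concreteness, a minimal PyROOT sketch of booking a TMVA BDT with these settings is given below. The file, tree, and variable names are placeholders (not those of the paper), and the option string assumes TMVA's standard spelling of these parameters (NTrees, MaxDepth, MinNodeSize), with VarTransform=D standing in for the diagonalization step.

<verbatim>
import ROOT

# Placeholder file/tree/variable names -- not those of the paper.
infile  = ROOT.TFile.Open("delphes_samples.root")
outfile = ROOT.TFile("tmva_bdt.root", "RECREATE")

factory = ROOT.TMVA.Factory("StopVsSM", outfile, "AnalysisType=Classification")
loader  = ROOT.TMVA.DataLoader("dataset")
for var in ("var1", "var2", "var3"):
    loader.AddVariable(var, "F")
loader.AddSignalTree(infile.Get("signal"), 1.0)
loader.AddBackgroundTree(infile.Get("background"), 1.0)
loader.PrepareTrainingAndTestTree(ROOT.TCut(""), "SplitMode=Random")

# The settings quoted above: NT=400 trees, depth MD=3, node size MN=2.5%;
# VarTransform=D asks TMVA to decorrelate ("diagonalize") the inputs.
factory.BookMethod(loader, ROOT.TMVA.Types.kBDT, "BDT",
                   "NTrees=400:MaxDepth=3:MinNodeSize=2.5%:"
                   "BoostType=AdaBoost:VarTransform=D")
factory.TrainAllMethods()
factory.TestAllMethods()
factory.EvaluateAllMethods()
</verbatim>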

The internal parameters of the BDT (NT, MD, MN, diagonalization or not) have been varied, a new BDT trained each time, and its performance assessed via the FOM maximization, all while ensuring that there is no over-training. The chosen parameters correspond to the best performance while keeping a trustworthy BDT training. As for the choice of input variables for the BDT: as briefly explained in lines 80-84 of version 1, a new variable v is added to an already existing set of variables S, a new BDT is trained, and the FOM is maximized versus the output of the BDT. If the maximal FOM reached for the set S+v is higher than for S, v is incorporated as an input variable; if it is compatible with the one of S, it is not. In view of this approach, we are confident that the BDT is optimal.
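A minimal sketch of this greedy forward-selection loop, assuming a hypothetical helper train_and_fom(variables) that trains a BDT on the given variable set and returns the FOM maximized over the cut on the BDT output:

<verbatim>
def forward_select(candidates, train_and_fom, tol=1e-3):
    """Greedy forward selection of input variables via FOM maximization.

    train_and_fom(variables) -- hypothetical helper that trains a BDT on
    the given variable set and returns the maximal FOM over the cut on
    the BDT output.
    """
    selected, best_fom = [], 0.0
    for v in candidates:
        fom = train_and_fom(selected + [v])
        if fom > best_fom + tol:      # S+v clearly beats S: keep v
            selected, best_fom = selected + [v], fom
        # otherwise the FOM is compatible with that of S and v is dropped
    return selected, best_fom
</verbatim>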

After the study of the BDT, we purposefully took the same input variables to train a DNN, varying the internal parameters of the latter, re-training the DNN, and assessing its performance via the FOM maximization. The question to address here is whether, for a given classification problem (here stop versus SM) with the same set of input variables, a DNN architecture can achieve a better result than a BDT. Among all options explored, namely different numbers of nodes and hidden layers and different activation functions, we did not observe an improvement of the performance with a DNN. Should we state this as such in the paper, or report there the performance of the DNN for the different internal parameters?
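Purely as an illustration of such a scan (scikit-learn is used here for brevity and is not necessarily the framework actually employed), a sketch assuming a hypothetical helper fom_on_validation(model) that returns the maximal FOM of a trained model on a held-out sample:

<verbatim>
from itertools import product
from sklearn.neural_network import MLPClassifier

def scan_dnn(X_train, y_train, fom_on_validation):
    """Scan a few DNN architectures/activations on the same inputs as
    the BDT; fom_on_validation(model) is an assumed helper returning
    the maximal FOM of a trained model on a held-out sample."""
    best_model, best_fom = None, 0.0
    for layers, act in product([(32,), (64, 64), (128, 64, 32)],
                               ["relu", "tanh"]):
        model = MLPClassifier(hidden_layer_sizes=layers, activation=act,
                              max_iter=500).fit(X_train, y_train)
        fom = fom_on_validation(model)
        if fom > best_fom:
            best_model, best_fom = model, fom
    return best_model, best_fom
</verbatim>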

DONE Q2: 91 How big of an impact does the choice of f=20% play? Do the optimal solutions vary depending on this choice? Similarly, does assuming no systematic uncertainty on the signal play a role (I assume it’s an even smaller effect than the background systematic uncertainty)?

It should first be stressed that extreme values of f do not correspond to any realistic analysis in HEP: no measurement of the SM background is without systematic uncertainty (f=0), nor do we have f=100%, which corresponds to the case where the prediction of the background is totally out of control and the corresponding search is not worth pursuing. Values of f between 15% and 40%, which correspond to realistic precisions of the background prediction, have been tested. They mainly result in the FOM (1) being maximized at a different value of the BDT output, and (2) having a different maximal value. However, the very choice of the input variables, which is what we want to determine with the FOM maximization, does not change for values of f in this range. Please note that a systematic uncertainty on the signal is not considered in similar metrics, as most discoveries are foremost limited by the statistical uncertainty. Indeed, assuming some systematic uncertainty on the signal has no significant effect on the outcome of the FOM maximization.
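As a sketch of this stability check, assuming the common form FOM = S / sqrt(B + (f*B)^2) (the paper's exact definition should be substituted if it differs), and hypothetical helpers n_sig_above / n_bkg_above giving the signal and background yields above a classifier-output cut:

<verbatim>
import numpy as np

def fom(s, b, f=0.20):
    """Figure of merit with relative background systematic f,
    assuming the common form S / sqrt(B + (f*B)^2)."""
    return s / np.sqrt(b + (f * b) ** 2)

# n_sig_above(c) / n_bkg_above(c) are hypothetical helpers giving the
# signal and background yields above a classifier-output cut c.
cuts = np.linspace(-1.0, 1.0, 201)
for f in (0.15, 0.20, 0.30, 0.40):
    values = [fom(n_sig_above(c), n_bkg_above(c), f) for c in cuts]
    print(f"f={f:.2f}: max FOM {max(values):.3f} "
          f"at cut {cuts[int(np.argmax(values))]:+.2f}")
</verbatim>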

DONE Q3: 95-98 Maybe point to a reference that shows why maximizing FOM is a good idea

This is done in the second version, thank you.

No Q4: 122 Could a quick summary of the weak classifier construction be given? In particular the equation shown in the Methods section of [3] doesn’t seem to give binary values as this paper claims.

No Q5: 210 Could you report how many events are used for training/testing/validation for both signal and background?

ALERT! Q6: 216 Is it possible to run the algorithm on a D-Wave Advantage machine with a Pegasus graph? Or discuss the gains possible by doing that?

We do not yet have access to the Pegasus version of the D-Wave graphs, since access to the latest hardware is more difficult. However, we indeed hope to gain access to this machine, our plan being to run the different options of Table III so as to obtain a systematic comparison of the same settings, input variables, etc. across two different machines. A discussion of the possible gains is in fact provided in lines 366-375.
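For reference, selecting a Pegasus-topology (Advantage) solver with the dwave-system package would look roughly as follows; this is a sketch assuming a configured D-Wave Leap account, with placeholder h and J coefficients:

<verbatim>
from dwave.system import DWaveSampler, EmbeddingComposite

# Feature-based solver selection: ask for a Pegasus-topology (Advantage)
# machine. Requires a configured D-Wave Leap account/token.
sampler = EmbeddingComposite(DWaveSampler(solver={"topology__type": "pegasus"}))

# h (fields) and J (couplings) encode the Ising problem; placeholders here.
sampleset = sampler.sample_ising(h={0: -1.0, 1: 0.5}, J={(0, 1): -0.8},
                                 num_reads=1000)
print(sampleset.first.sample, sampleset.first.energy)
</verbatim>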

No Q7: 237 How are the cutoff C and variable-fixing scheme related? If you use both, does it effectively remove more variables?

No Q8: 237 Could a citation or brief description of the variable-fixing theme be added?

ALERT! Q9: 240 Could a full comparison be made to classical ML and/or classical simulated annealing?

The paper is built to provide as complete a comparison as possible between the quantum annealing and classical ML (here BDT) approaches for classification: same problem (stop versus SM background), same input variables, same pre-selection, and diagonalization of the data applied to both. We are confident that the comparison of the performances of quantum annealing (with different settings and input variables) with the BDT provided in Table III is the most complete we can provide.

The comparison with classical simulated annealing is beyond the scope of this paper, where we make the second attempt at classification in high-energy physics with quantum annealing, and where we want to focus on the quantum annealing itself (i.e. different settings) and show how it compares (given the present graph) with a classical ML approach. However, we plan to publish in a separate paper the results of a classical approach also based on the optimization of an Ising Hamiltonian, for the same classification problem and with the same input variables.
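For orientation, such a classical baseline could reuse the very same Ising encoding with a simulated-annealing sampler, e.g. the dwave-neal package; a minimal sketch with placeholder couplings:

<verbatim>
import neal

# The same Ising encoding (h, J) that is sent to the quantum annealer;
# the coefficients below are placeholders.
h = {0: -1.0, 1: 0.5}
J = {(0, 1): -0.8}

sampler = neal.SimulatedAnnealingSampler()
sampleset = sampler.sample_ising(h, J, num_reads=1000)
print(sampleset.first.energy, sampleset.first.sample)
</verbatim>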

No Q10: 253 Is there a justification that 10 times is enough?

No Q11: 342 It is my understanding that the variable fixing scheme is necessary to put it on a physical quantum computer with limited qubits. Is that correct? So is the point of this statement that once more qubits are available, the performance will improve?

DONE Q12: 399 I was under the impression that the BDT used was the same/similar to the one from the CMS publication [5]. However, clearly, the data used here is based on Delphes simulation. So, presumably the BDT was retrained on this more simplified dataset?

Indeed: in order to make the comparison with the results of quantum annealing as valid as possible, the BDT was re-trained with the Delphes simulation. It has to be noted that the performance of the BDT (for the same signal) is compatible between this new simulation and the full simulation of the CMS detector.

Text comments

DONE 200 Usually in ML, the conventions of "training" (used to fit the model), "validation" (used to select the "best" model), and "testing" (held out for final performance checks) datasets are used.

We acknowledge the usual convention. However, since the "QA", "Train" and "Test" samples are unequivocally defined and better correspond to the needs of this work (i.e. three samples, with one specifically sent to the quantum annealing algorithm), we prefer to keep this notation.
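A minimal sketch of such a three-way split (the fractions below are placeholders, not the values used in the paper):

<verbatim>
import numpy as np

def split_qa_train_test(events, fractions=(0.4, 0.3), seed=0):
    """Split events into the 'QA', 'Train' and 'Test' samples used in
    this work; the fractions are placeholders, not the paper's values."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(events))
    n_qa = int(fractions[0] * len(events))
    n_tr = int(fractions[1] * len(events))
    return (events[idx[:n_qa]],              # 'QA': sent to the annealer
            events[idx[n_qa:n_qa + n_tr]],   # 'Train'
            events[idx[n_qa + n_tr:]])       # 'Test': final performance
</verbatim>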

