Higgs decays to ZJPsi, JPsiJPsi and YpsilonYpsilon (HIG-20-008)

CADI: https://cms.cern.ch/iCMS/analysisadmin/cadilines?line=HIG-20-008&tp=an&id=2332&ancode=HIG-20-008


Color Code

Color Meaning
BLACK Question or comment from ARC/conveners
RED Authors know but not yet added/implemented
GREEN Answer from Authors
ORANGE Authors working on this item
BLUE Authors answered but might need iteration with ARC

Comments for data-card

NP naming: in order for the NPs to completely follow the HComb convention, it would be nice if you could add _hzz to the analysis-specific nuisance names.


Lumi NP: you mention that for the lumi you are using a weighted average of the three years as NP. However, this is not taking into account the partial correlations among different years. Also, the value quoted by the Lumi POG for the full Run II uncertainty on the luminosity should be 1.8%, somewhat different than the 2.4% you are quoting. Please consider using the correct value of 1.8%, this should be fine as you are considering a single signal process inclusive on the three years. OTOH note that when using this 1.8% you would not be considering the partial correlations among years, but this should not have any substantial impact on the analysis results.

I changed to the recommended luminosity uncertainty of 1.8%.
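
As a rough illustration of why the weighted average and the recommended value differ, the sketch below combines per-year luminosity uncertainties in the two limiting cases: fully correlated across years (the luminosity-weighted average) and fully uncorrelated (a quadrature sum). The per-year luminosities and uncertainties are assumed illustrative inputs, not values taken from this page. A combination with partial correlations would be expected to fall between the two limits.

```python
import math

# Assumed illustrative inputs (not from this page): integrated luminosity
# in fb^-1 and relative uncertainty in percent for each data-taking year.
YEARS = {
    "2016": (35.9, 2.5),
    "2017": (41.5, 2.3),
    "2018": (59.7, 2.5),
}

def combined_lumi_uncertainty(years, correlated):
    """Relative uncertainty (percent) on the summed luminosity."""
    total = sum(lumi for lumi, _ in years.values())
    # Absolute uncertainty of each year, in units of fb^-1 * percent.
    absolute = [lumi * unc for lumi, unc in years.values()]
    if correlated:
        # Fully correlated: absolute uncertainties add linearly,
        # which is the luminosity-weighted average.
        return sum(absolute) / total
    # Fully uncorrelated: absolute uncertainties add in quadrature.
    return math.sqrt(sum(a * a for a in absolute)) / total

fully_correlated = combined_lumi_uncertainty(YEARS, correlated=True)
uncorrelated = combined_lumi_uncertainty(YEARS, correlated=False)
print(f"fully correlated: {fully_correlated:.2f}%")
print(f"uncorrelated:     {uncorrelated:.2f}%")
```

With these assumed inputs the fully correlated case gives about 2.4% and the uncorrelated case about 1.4%, so the 1.8% figure sits between them.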

Blinded limits: since you are using HybridNew to extract blinded limits, you should generate an Asimov toy first (with GenerateOnly --saveToys) and use -D <file_with_toys.root>:toys/toy_asimov to have a blind limit; otherwise you are getting the post-fit expected limit, which already uses data.
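
The two steps requested above can be sketched as a pair of command lines. The helper below only assembles the argument lists and executes nothing. The datacard and toy file names are hypothetical placeholders, the toy file must be replaced by whatever file the first step actually writes, and any further HybridNew options the analysis needs are deliberately left out.

```python
import shlex

def blinded_hybridnew_commands(datacard, toy_file):
    """Return the two command lines as argument lists (nothing is executed)."""
    # Step 1: generate and save an Asimov dataset (-t -1).
    generate = ["combine", "-M", "GenerateOnly", datacard,
                "-t", "-1", "--saveToys"]
    # Step 2: run HybridNew on the saved Asimov toy instead of the data.
    limit = ["combine", "-M", "HybridNew", datacard,
             "-D", f"{toy_file}:toys/toy_asimov"]
    return generate, limit

# Hypothetical file names, for illustration only.
generate_cmd, limit_cmd = blinded_hybridnew_commands(
    "datacard.txt", "asimov_toys.root")
print(shlex.join(generate_cmd))
print(shlex.join(limit_cmd))
```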


CMS_H_mean_e NP: In the ZJpsi->2e2mu and ZJpsi->4mu cards (so the separate ones) there are param lines in the datacards that are not associated with any params that actually exist in the input workspace. For example, there is a parameter CMS_H_mean_e in the workspace, and a param line CMS_H_mean param 124.690 0.047. Please fix this and use consistent naming between datacard and workspace.


Background: Both in the AN and in the presentations given at HZZ/HIG PAG meetings you mention that this analysis uses "data-driven background". OTOH you also quote three MC samples used to model the background. Our understanding is that the background is not really data driven, but rather parametrized with an analytical expression using these MC samples. Can you please clarify this?

The background is dominated by associated production which is sampled from the side-bands. In addition, possible peaking background is estimated from dedicated MC samples and is found to be negligible (far less than 1 event). These backgrounds are not included in the fit. Hence, we characterize the procedure as purely data driven.

Also, you have a NP associated to the background (CMS_bkg_frac param) only in the JJ4mu card. I think you should have NPs associated to the background in all your cards.

The parameters of the background shape functions are free to float. Therefore, they are not accounted for as NP – the inclusion of the background fraction which is floated in the JJ card was a mistake (double counting).

text2workspace: if you are using any particular model, please upload it to the repository. It would be nice, just for completeness, if you could also share the text2workspace commands you use to generate the workspaces starting from the datacards, especially if they contain any particular option for the POI ranges and/or re-definitions.


Data-card approval message

Thanks for uploading the datacards for review so early. After some iteration with the authors we're now happy with the cards so they are approved. If major changes to the structure are made between now and the pre-approval presentation, please do let us know so that we can double check. [ https://hypernews.cern.ch/HyperNews/CMS/get/HIG-20-008/3.html]

MUON POG approval message from Jonatan

Thank you for filling the muon documentation required for HIG-20-008[ https://twiki.cern.ch/twiki/bin/view/CMS/TWikiHIG-MUO][ https://twiki.cern.ch/twiki/bin/view/CMS/HIG20008muons]. I have reviewed it and I haven't found any outstanding issue, therefore you have the HIG-MUO green light.

JME approval message from Alexis

Thank you for completing the JME questionnaire. Given that you are using neither jets nor MET, we only have to wish you good luck with the rest of the analysis review! [ https://hypernews.cern.ch/HyperNews/CMS/get/HIG-20-008/5.html]

Comments From Giacomo

Abstract: I think the abstract is the only part needing a somewhat larger revision. The way it is written is a bit too convoluted imho. It would be better to rewrite it making the sentences more straightforward. For example the first sentence could read as “This paper presents the first measurement of the H->Z J/Psi decay. H->Z J/Psi candidates are studied in the four leptons final state exploiting the full LHC Run 2 dataset of 137fb-1. In addition, H and Z boson decays into pairs […] Different polarisation scenarios of the Z boson are studied.” etc.

⇒ We rewrote the abstract in the next version.

L35 (second paragraph). Is it possible to include a table with the available predictions for the SM? Or add it in a new column in table 3?

⇒ It would be nice to summarize all measured and predicted BFs in a table; this is done for the measured values.

The majority of the quarkonium channels do not have SM predictions:

1. There are no predictions for the Higgs to Y(nS)Y(mS) channels (except Y(1S)Y(1S)) or for the Z boson to Y(nS)Y(mS) channels;

2. The feed-down quarkonium channels (H→ J\psi \psi(2S), H→ \psi(2S) \psi(2S)) have no SM calculations.

In the case of the H→ Y(1S)Y(1S) channel, the SM theoretical predictions are not consistent:

An earlier calculation using a phenomenological approach gives a BF of the order of 10^-5 [31], while the later Ref. [35] assumes a dominant contribution from the indirect Higgs coupling and gives a BF of the order of 10^-9.

L35: I’m not sure it is a good idea to cite HL-LHC here, since this is a purely LHC paper. But this can be discussed with ARC and in CWR.

⇒ We removed HL-LHC sentence

L48: I should probably go through ref [29], but it is not clear to me how is it possible to have a BR>1 in any channel. Surely the maximum value should be 1-(whatever has been observed for other channel), isn’t it?

⇒ Ref. [29], page 4, last paragraph: 𝜎(pp→ H) × BF(H->ZJ\psi) = 100 pb.

L51: Maybe I miscounted, but isn’t this the “second” class you are speaking of? Otherwise, mention explicitly which is the second when you introduce it.

⇒ L:22 A related class of such processes…….. ⇒ A second related class of such processes…...

L207: You select 4mu events as well, so this should be “event with at least 2 muons plus two leptons”

⇒ done

L210: Does the 5 GeV cut apply to both the Z and the J/psi? If yes, “each dilepton resonance…”

⇒ done

L214-216: Invert the 2 sentences: put “A total of…” first, and “The range of…” second, so the range is already defined when you mention it. L214: you should specify which threshold you are referring to.

⇒ “The range of the four-muon invariant mass is chosen to exclude the region close to the threshold. A total of 164 (124) single candidate events are found in the 4μ(2e2μ) invariant mass between 112 and 142 GeV” is changed to “A total of 164 (124) single candidate events are found in the 4$\Pgm$ ($2\Pe2\Pgm$) invariant mass between 112 and 142~\GeV. The lower range of the four-muon invariant mass is chosen to exclude the region close to the threshold.”

L230: “before unblinding”

⇒ done

L232: I think at the end of the line it should be Y (without (1S))

⇒ done

L259: either here or in the systematics section you should mention you did the bias studies and the effect of other shapes was found to be negligible

⇒ L259: Added sentence ⇒ “A possible bias in the choice of the background parametrization is probed with alternative functional forms and found to be negligible.”

L293: Please note there is a new prescription for the treatment of the luminosity uncertainty https://hypernews.cern.ch/HyperNews/CMS/get/physics-announcements/6191.html

⇒ Lumi uncertainty is changed to 1.6 % from 1.8 %

(Data-cards are updated. The expected BF will be updated in the next version.)

It is not necessary for preapproval I think, but it is nevertheless better if you start including it.

Section 8: It would be nice to quote how much you are improving wrt previous results (for those channels where we had a previous result).

⇒ We will quote improvements in the next version.

Table 3: Can you add 2 columns for the results under the alternative polarisation hypotheses? And maybe a third one with SM expectations?

⇒ we will add a column for the polarization hypothesis in table 3 in the next version.

Finally it would be nice to have a summary plot of the results, maybe something similar in style to this one [ https://cms-results.web.cern.ch/cms-results/public-results/publications/HIG-19-001/CMS-HIG-19-001_Figure_014.png] but with BF instead of cross-sections

⇒ Some of our channels do not have SM predictions. The measured upper limits in our channels are 3 (5) orders of magnitude larger than the SM predictions in the ZJ\psi (quarkonium) channels. The SM predictions for the ZJ\psi (quarkonium) channels are 10^-6 (10^-9--10^-12). We prefer showing these result summaries in the table.

Comments From Maria, Nick and Jan

(*) General: It would be great if you can add a comparison of the expected results of this analysis and HIG-18-025, and point out any differences and updates since then.

⇒ This concerns the channels H->JJ, H->YY, Z->JJ, Z->YY and we added a table in the next version of the AN (Table 19 and text L447 to L452).

L292: You give as source for the muon efficiencies the following link https://twiki.cern.ch/twiki/bin/viewauth/CMS/MuonReferenceEffs2017 - this is for 2017 only, so please update the information

⇒ For the 2016, 2017, and 2018 period the full list is given here: https://twiki.cern.ch/twiki/bin/view/CMS/MuonReferenceEffsRun2 which is included in the next version of the AN.

L329-331: What does it mean that "the event uncertainty is [...] less than unity"?

⇒ The resulting relative uncertainty is found to be less than one percent. ‘Unity’ is a typographical error.

L323-338: It does not become clear how you evaluate the momentum scale uncertainties given that the signal shape is obtained from a fit. Please clarify.

⇒ The change in the resonance mass was finally measured from the J/psi and Z di-lepton signals, in data and Monte Carlo. We find that the relative shift between the reconstructed MC signals and the signals in data, when fit with the same parameterization, is about 0.2% (PDG values were generated). We conservatively assume that combining the signals leads to a 0.4% relative shift of the Higgs mass. Repeating the UL extraction with this shifted Higgs mass moves the UL by less than 1% in the 4mu channel (3% in the Z->ee J/psi->mumu channel). We conservatively adopted 1% in the 4mu and 3% in the 2e2mu channel.

Himal started out using the Rochester (electron) correction method and described this in the note. The scale factors are found to be one. Since we actually use the conservative estimates based on the control signals described above, we updated the text in the AN (L323 to L 333).
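
One possible reading of the 0.4% figure, shown here only as an arithmetic sketch, is that the two 0.2% dilepton shifts are added linearly rather than in quadrature. The nominal Higgs mass of 125 GeV is an assumed round number used to express the shift in GeV.

```python
import math

JPSI_SHIFT = 0.002      # relative data/MC shift quoted for the dilepton peaks
Z_SHIFT = 0.002
HIGGS_MASS_GEV = 125.0  # assumed nominal mass, for illustration only

linear = JPSI_SHIFT + Z_SHIFT                 # conservative: shifts add up fully
quadrature = math.hypot(JPSI_SHIFT, Z_SHIFT)  # if the shifts were independent

print(f"linear sum:     {linear:.2%} -> {linear * HIGGS_MASS_GEV:.2f} GeV")
print(f"quadrature sum: {quadrature:.2%} -> {quadrature * HIGGS_MASS_GEV:.2f} GeV")
```

Under this reading the linear sum is the larger of the two, which is consistent with calling the 0.4% choice conservative.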

Figure 15: Do you have an explanation why the background fraction parameter has such asymmetric uncertainty and impact?

⇒ The impact plot is obtained with toy samples. The toy Monte Carlo demonstrates that this functional form, the combination of an exponential and a uniform function, leads to an asymmetry in the fraction.

Figure 20: For the YY channel in the bottom right plot, the signal is far out in the tail, so the background from the fitted function must be negligible. Have you tested fit functions that have a larger tail than exponential + uniform? (I understand from section 7 that you tested different functions, but it is not clear if this led to any uncertainty or whether all tests were satisfactory)

⇒ Other parameterizations were exponential plus uniform and exponential plus exponential. In both cases we found that the extra contribution fits to zero fraction.

Comments From Pre Approval

Many thanks for the clear pre-approval talk yesterday at the Higgs PAG meeting. We have collected the few to-do items to follow up on. Since you have completed the list of pre-approval checks, once we are satisfied with the responses to these few items, we can move onto unblinding.

We think there should be further study of background functions that could provide alternative fits to assess the level to which the choice of background function could impact the result. During the pre-approval it was mentioned that other functions were tried but the fits were unstable. We think in some of the fits, something like a power-law function (a*x^{b}, with x as the mass and a, b are the free parameters) would work. At the very least it would be helpful to compare the nominal fit to at least one other and show the resulting background functions under the signal peak are similar (within the uncertainty on the fit).

We implemented the power law function with floating power (b) in the YY final state as discussed above. We obtain 95% CL upper limits with this new background pdf. We find at most a 2% change in the BF for the Higgs boson (6% for the Z boson). In addition, the numbers of expected events for 3 sigma and 5 sigma significance are within one event of the previous estimate.

We find that the fit quality has deteriorated (the chisquare becomes worse above 75 GeV) and the change in the BF is within or close to the estimated systematic uncertainty due to the background parameterization.

For the fit on slide 12, it looks like there would be enough data to split into categories based on the resolution of the events. Could you try to see what would be the expected gain (if any) of splitting the events (just for this particular case where there are more events) into two regions based on the mass resolution (or say leading lepton eta which would make a good proxy for the resolution)?

The sample is about evenly split when requiring that the leading muon is either to be found in the barrel region (|eta| < 1.1 ) or in the forward region (|eta| >= 1.1 ). Maintaining the full background sample, the reduction of the upper limit for the barrel region sample was about 5%, while for the forward region sample the increase in the BF was about 12%. Hence, we do not expect to gain by splitting the event sample.

For slide 18 (just for the AN) can we have a version of the bottom plot with the 3 resonances split into different lines (with different colors) to see more clearly the different peaking contributions?

We now display the simulated samples in different colors: H→ Y(1S)Y(1S) in black, H→ Y(2S)Y(2S) in red and H→ Y(3S)Y(3S) in green.

Followup Comments From Nick

Many thanks for your responses. I have a couple of follow up questions:

1. For the test using a power-law, it's good to see that the expected limit is not changing very much; however, the significance from observing a fixed number of events (which is in the end what we would report) seems to change. What I was hoping to see was a direct comparison of the background estimate under the peak using the two functions, and there are 2 ways to do it:

a) plot the background functions that are fitted overlaid on top of each other all the way out to 140 GeV (so that they can be seen), with the uncertainty bands from the fit included (e.g using the RooFit method VisualizeError taking each fit result : https://root.cern/doc/v608/rf610__visualerror_8C_source.html )

b) calculate the integral of each fitted background function in a suitable range under the signal peak (say 120->130 GeV) and their uncertainty.

In either of the above, what we want to see is that the difference between the background estimate under the peak is much smaller than the uncertainty.
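
Option b) above can be sketched with closed-form integrals for the two shapes. This is a minimal illustration with hypothetical fit results: the slope, the power, the event count and the fit range are placeholders, and the propagation of the fit uncertainty to the integral is omitted.

```python
import math

def expo_integral(slope, lo, hi):
    """Integral of exp(-slope * m) over [lo, hi], slope > 0."""
    return (math.exp(-slope * lo) - math.exp(-slope * hi)) / slope

def powerlaw_integral(power, lo, hi):
    """Integral of m**power over [lo, hi] for lo > 0."""
    if power == -1.0:
        return math.log(hi / lo)
    return (hi ** (power + 1) - lo ** (power + 1)) / (power + 1)

def yield_in_window(integral, param, n_events, fit_range, window):
    """Events expected in window for a shape normalised to n_events in fit_range."""
    return n_events * integral(param, *window) / integral(param, *fit_range)

# Hypothetical fit results: slope, power, event count and ranges are placeholders.
FIT_RANGE = (40.0, 140.0)   # GeV
WINDOW = (120.0, 130.0)     # GeV, the range suggested in the comment
N_EVENTS = 100.0

expo_yield = yield_in_window(expo_integral, 0.2, N_EVENTS, FIT_RANGE, WINDOW)
power_yield = yield_in_window(powerlaw_integral, -6.0, N_EVENTS, FIT_RANGE, WINDOW)
print(f"exponential: {expo_yield:.2e} events")
print(f"power law:   {power_yield:.2e} events")
```

With placeholder parameters like these, both shapes predict well below one event in the window, with the power law giving the larger tail.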

The quality of the fit (chisquare) with the power law disqualifies this function. Hence, it was inappropriate for Himal to use it for further extraction of results. The distribution of events in the four-muon invariant mass decreases rapidly with an endpoint around 85-90 GeV, which makes the use of any function to extrapolate the behavior into the Higgs signal region questionable. The discussion about this happened during the review of the previous paper. Given that the lower sideband, say extending 20 GeV, is depopulated and the depopulated upper sideband in principle extends to infinity, the interpolation of the background into the Higgs signal region results in zero background events. The 95% CL upper limit signal yield for this zero-counting experiment is about 3. This is reproduced with the exponential function, which was used as a vehicle to apply the Higgs combiner tools.

In case there is indeed one event in the signal region after unblinding, there will be a completely different treatment of the background. In coordination with the statistics committee we formulated the publication strategy last time: “The proposal, agreed with the SC, in case of a single event under the Higgs peak, is therefore to publish the event without a significance; rather provide observables (4mu, 2mu invariant masses); possibly an event display.” We do not report any significance in the present paper draft.
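
The figure of about 3 events quoted above for a zero-count experiment follows from the Poisson probability of observing no events, exp(-s) = 0.05, which gives s = -ln(0.05). The sketch below solves the same condition numerically for a general observed count, assuming zero background. It is the textbook counting limit, not the procedure run through the combiner tools.

```python
import math

def poisson_cdf(n_obs, mean):
    """P(N <= n_obs) for a Poisson distribution with the given mean."""
    term = math.exp(-mean)
    total = term
    for k in range(1, n_obs + 1):
        term *= mean / k
        total += term
    return total

def upper_limit(n_obs, cl=0.95):
    """Upper limit on the signal mean for zero background, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        # The CDF falls as the signal grows; the limit is where it hits 1 - cl.
        if poisson_cdf(n_obs, mid) > 1.0 - cl:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(f"n_obs = 0: s < {upper_limit(0):.2f}")
print(f"n_obs = 1: s < {upper_limit(1):.2f}")
```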

2. For the splitting of the events in H -> ZJ -> 4mu, thanks for checking the individual sensitivities, but can you also just go ahead and produce the expected limit from the combination of the two categories (to be sure what the combined result would be compared to the single category scenario)

From the simultaneous evaluation of the expected 95% CL BF upper limit with the two sets categorized by |eta|, Himal obtains: BF = 3.34 e-3. The single category result is: BF = 3.3 e-3. This is done with 1000 toys – the closeness is probably a coincidence, as the sample by sample fluctuations are more like 5% – but they are close.

Followup Comments From Nick

However, for the limit, I would still like to clarify my conceptual concern. Right now you have upper limits consistent with 3 events (which is what one expects if you see no events), and since the background expectation is ~0 those 3 events can be converted into a branching ratio. However, you seem to have concluded that you expect 0 background events and then confirm it with the exponential - I think it would be even more convincing that you expect ~0 by trying with more than one function (doesn’t have to be a power-law, that was a suggestion based on what was discussed in the pre-approval), and show that either function would yield a prediction of 0 (or at least something close such that in either case, you’re back to ~0). The point being that with >1 function predicting ~0, the arbitrary choice of that function becomes irrelevant - I think the plot I asked for would directly show that (unless I’m missing why an exponential is really the physics driven function)

Sorry, I forgot to add that the discussion on the effect of different background parameterisations was so far only concerning the YY channel, but equally (probably more importantly) a similar concern about the background choice is there for the J/psi J/psi and Z J/psi channels.

Please find the background functions that are fitted overlaid on top of each other with the uncertainty bands from the fit included for the Z J/psi and J/psi J/psi final states in the attached slides. We used the RooFit method VisualizeError. Please find the calculated integral of each fitted background function in the range 120->130 GeV together with their uncertainty listed in the table on the last slide.

I separately evaluated H->YY using the exponential and the power law function. The integrals yield 0.00 +/- 0.00 and 0.03 +/- 0.03 events, respectively, with the caveat mentioned earlier, that the power law function has a much lower fit quality.

[ https://hypernews.cern.ch/HyperNews/CMS/get/AUX/2021/08/02/20:52:38-74320-Pre-ApprovalFollowUpComments.pdf ]

Followup Comments From Nick

Many thanks for these - the plots are exactly what I was looking to see. I think this answers my concern so we can give the go ahead to unblind.

Please get the un-blinded results [ https://hypernews.cern.ch/HyperNews/CMS/get/AUX/2021/08/09/20:06:59-73314-UnblingingOfHIG-20-008-Step3.pdf]

Pre-Approval Message From Maria, Nick and Jan

Thanks for providing the unblinded results (though we were expecting first to see the distributions before the observed limits, but don’t worry). Everything looks ok to us so please go ahead and update the AN and paper draft with the unblinded plots/results and after that I think we can pre-approve.

Comments From ARC member Keith

general: how did you determine the selection cuts?

The selection cuts are determined using sideband data to estimate background and signal simulations to estimate the efficiency times acceptance. The cuts are optimized to maintain high signal efficiency times acceptance while keeping background to the level necessary to obtain the best expected upper limit.

5.1 I still can't quite tell what the final muon selection is. Is it one of the standard Muon POG approved IDs? Be clear as this will justify using standard POG derived scale factors and uncertainties.

The official soft muon ID is used in the analysis. Standard scale factors are available, and the treatment of the muons is confirmed by the muon POG.

5.1 You don't mention muon isolation. But you must use some isolation criteria, right? Again, if it is a POG derived version, make that clear.

The muon isolation is described in Sec. 5.4. I moved this text to Sec 5.1 in the next version of the AN.

5.2 Also here, is it a standard POG ID and isolation?

The same standard muon ID definition is used in all channels. We use a loose track isolation. ID and isolation criteria are approved by Muon POG.

5.2 What is the advantage of track-based isolation over particle flow-based isolation?

Our criteria including isolation are based on the muon track reconstruction. The isolation cut is designed to maintain a 99% efficiency as estimated from simulation.

5.4 muon pt>3 GeV seems very low to me for a Z->mu,mu selection. The single mu trigger requirements must already be quite a bit tighter than this, no?

The single muon trigger requires at least one of the muons from the Z boson to have pT higher than 27 GeV. From Fig. 4(c) and (e), almost all the muons from the Z boson have pT greater than 10 GeV. On the other hand, the J/psi decay muons, Fig. 4(d) and (f), have very low pT. Hence, we choose 3 GeV.

5.4 You mention the electron isolation here, but it would be easier to follow if you put it up in Sec. 5.2.

The muon and electron isolation descriptions are moved now to Sec. 5.1 and 5.2, respectively.

Section 7 In equation (4) does "Uniform" just mean a constant?


In 7.1.2 and 7.1.3 it would be nice if you could give the actual expected background numbers (like you do in 7.1.1).

I have now provided the expected number of events in each case. In case of gg-->ZZ*-->4 mu and qq->ZZ*->4mu, we expect 0.04 and 0.4 events, respectively.

Fig. 14. What range in the 4 lepton mass is used for these fits? The plots would be a lot easier to evaluate if you also showed the sideband used for the fit. Also the fit range should be specified somewhere in the event selection section (maybe I just missed it).

These plots are designed to visualize the possible bias due to the choice of the background function near the signal (see discussion in Twiki). The fit is applied to the full range. The description of the fit region is added to the AN.

Fig. 14. How are the uncertainties plotted? The quoted numerical change in the background predictions looks smaller than the plotted bands.

The details of the method are discussed here: https://twiki.cern.ch/twiki/bin/view/Sandbox/HimalAcharyaSandbox#Followup_Comments_From_Nick_AN1

The background yield is obtained from the integration of the center curve. The uncertainties are the differences between the integration of the curves that bound the bands and the center curve.
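
The band-to-yield conversion described above can be sketched as follows. The three curves are hypothetical stand-ins for the fitted center curve and the edges of its uncertainty band, the integration window is an assumed example, and a simple trapezoidal rule replaces whatever integration the fit framework provides.

```python
import math

def trapezoid(func, lo, hi, steps=1000):
    """Trapezoidal integral of func over [lo, hi]."""
    width = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, steps):
        total += func(lo + i * width)
    return total * width

def yield_with_band(center, lower, upper, window):
    """Central yield and (down, up) uncertainties from the band edges."""
    nominal = trapezoid(center, *window)
    down = nominal - trapezoid(lower, *window)
    up = trapezoid(upper, *window) - nominal
    return nominal, down, up

# Hypothetical curves in events per GeV; a real band would come from the fit result.
center = lambda m: 5.0 * math.exp(-0.05 * m)
lower = lambda m: 4.0 * math.exp(-0.05 * m)
upper = lambda m: 6.5 * math.exp(-0.05 * m)

nominal, down, up = yield_with_band(center, lower, upper, (120.0, 130.0))
print(f"yield = {nominal:.4f} -{down:.4f} +{up:.4f}")
```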

Sec. 7.3 line 280 seems to say "Fit 2" is used for the quarkonium channel. I think "Fit 2" means an exponential plus a constant. But back in Eq. 5 I thought it was just an exponential.

Exponential + Uniform is used for Jpsi Jpsi channels. Exponential only function is used for YY channels. I updated the text to state this clearly.

Sec. 10 -I'm confused about what is being fit. Is there one big simultaneous fit to the 4mu and 2mu+2e mass shapes? Or is each fit simultaneously? And within 4mu, is it one fit separately in the Higgs mass range and another in the Z range, or a combined fit to all simultaneously?

The 4mu and 2e2mu four-lepton invariant mass spectra in the ZJ/psi channel are fitted simultaneously using the Higgs combiner tool (RooFit). The model allows for a common slope for the background as found from the individual fits. The signal shapes are parameterized and fit individually. In the 4mu quarkonium final states the Higgs and Z boson upper limits are obtained independently.

-For whatever number of fits is being done, make clear what the floating parameters are and what the fixed parameters are.

Sec 10 (L425) I added: “The signal parameters are fixed and the background parameters are floated in the maximum likelihood fits. The uncertainty associated with each signal parameter is incorporated as a nuisance parameter in the upper limit calculation.”

-For the background-only fit, are you still blinding the signal mass window, or is it now fitting the whole range?

For this exercise it was agreed to use the Asimov dataset created from unblinded data (generated as a background-only dataset in the combiner with -t -1 --expectSignal=0), feeding the combiner without inspecting the sample.

-Lines 438-444. You use all of the Upsilon resonances, which seems good. But when Y(2S) or Y(3S) feeds down to Y(1S)->mu,mu, the 4mu mass must be off from the Higgs mass. How do you handle the feeddown in the fit? With separate line shapes for each?

For the Z->YY channels we obtained the combined signal shape from the simulation of all the possible transitions, adding them with the ratios obtained from PDG, while assuming that the YY coupling to Z is the same for all Y(nS) pair states. The contribution from the indirect feed-down channels Y-> Y(1S) X is small compared to the direct decay channels Y(3S)->mumu, Y(2S)->mumu and results in a modest increase in the width. The change in the upper limit is found to be small (at the level of the systematics due to the uncertainty in the width). The details are given in Sec. A 28 (page 68). Since there is no background contribution above 75 GeV, the Higgs boson is not affected by this.

-For each fit, please also give the fitted signal yields. This will allow us to compare the final upper limits with the results of the fit. I'm guessing you are statistically limited (and the systematics aren't so important), but it's hard to tell from how this is presented currently.

I have added observed signal yields in the table. In addition, I attached the slide “FinalUnblindedUpperLimitCalcHIG-20-008.pdf”, which shows details of the calculation.

-Show the data mu+mu- distributions and the e+e- distributions. Are these shapes as expected? Do the fits agree with the MC? Are there any signs of peaking backgrounds beyond those discussed already?

Figure 3 shows the di-lepton invariant mass distributions. In each case, the signal is modeled from simulation and fits the data together with a smooth background function.

-Table 16. In the first row with numbers there are some values in parentheses. What are those?

They are the next significant digit. We removed this.

-Table 18. Why is the Z->J/psi,J/psi result worse than expected from scaling the 2016 result. Is there some background that is worse in the 2017-2018 data?

The change is less than 1 sigma, consistent with an upward fluctuation.

-Table 17 gives the expected significances for various scenarios. Did you do a test where you artificially inject such a signal into MC backgrounds and check that your fitting procedure would recover the signal as expected?

Yes. Fit diagnostics with 0 and 1 injected signals were run, and we verified that we recover the same signal as we injected. This is one of the exercises for the data-card approval.
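
The idea of such a closure test can be sketched with a toy counting experiment. This is a simplified stand-in and not the fit diagnostics used for the data-card approval: the background yield, the signal yield per unit of signal strength and the number of toys are hypothetical, and the "fit" is reduced to subtracting the known background from the observed count.

```python
import math
import random

def poisson(rng, mean):
    """Draw a Poisson variate with Knuth's multiplication method."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def mean_recovered_signal(injected, background, n_toys, seed=1):
    """Average of (observed - background) over toy counting experiments."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_toys):
        observed = poisson(rng, injected + background)
        total += observed - background
    return total / n_toys

# Hypothetical yields: 5 background events, 3 signal events at signal strength 1.
for strength in (0, 1):
    injected = 3.0 * strength
    recovered = mean_recovered_signal(injected, 5.0, 20000)
    print(f"injected {injected:.1f} -> recovered {recovered:.2f}")
```

Averaged over many toys, the recovered signal should agree with the injected one within the statistical precision of the toy sample.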

Comments From ARC member Jonatan

- Abstract. Why is it important to remark that you are "using online event filters"?

removed this statement

- Abstract. I don't fully understand the word "decay" in the sentence "Longitudinal polarization is expected for the Z boson and assumed for the decay mesons".

Shortened the statement: Different polarization scenarios for Z and quarkonium resonances are considered.

- Line 12. decay into --> Higgs decay into


- Line 12. I think CMS recommends {\rm \bar{c}c}.


- Lines 17, 19, 20, 23. I would remove the spaces in the different decays.


- Line 20. You don't need to specify again that the vector meson is V, as it has been done in line 17.


- Line 23. represent --> represents


- The Figure 1 caption is incomplete.

The caption reads now: Sample Feynman diagrams depicting direct (left) and indirect quark coupling contributions to the $\PH \to \cPZ \rm{Q}$ decay, where $\rm{Q}$ represents a quarkonium resonance. The diagrams represent Higgs boson decays into quarkonium pairs when replacing the bottom section with the upper half in each.

- Line 53. How about representing the corresponding Feynman diagrams? It would be more clear.

We also added text to the caption addressing this. We would be prepared to add more diagrams; at this point we assumed that we can save the space, as the diagrams can be simply derived.

- Line 66. What is Z_a?

This is a typo, as it should refer to the pair decay: H -> Z_d Z_d

- Lines 68, 75, 78, 81. Paper --> paper


- Line 72. You have written twice the BR of H -> Z J/psi.

typo before → one BR of H->ZJ/psi is changed to ZPsi(2S)

- Line 81. (p p) --> (pp)


- Line 83. In the Abstract you have 137/fb and here the number is 133/fb.

Quarkonium pair trigger started later in 2017 and is missing in 2017 B, which makes this only 133 /fb.

- Line 109. It might be necessary to write what X_0 stands for.

radiation lengths - we use here standard text which assumes the common … interleaved with lead corresponding to a total of three radiation lengths …

- Line 110. Particles should be written in non-italic form, therefore you should write ${\rm Z \to ee}$.


- Line 116. In this sentence the MET doesn't take into account electrons or muons. Is this correct?

Yes. The sentence describes the standard Run-2 primary vertex calculation.

- Line 130. ZJ/psi channel --> the ZJ/psi channel


- Line 135. probability of greater --> probability greater


- Lines 138-139. As you have already required an invariant mass for the dimuon system, I would remove the requirement of pt > 0 GeV for 2016. In fact this is not a requirement.


- Line 165. In this Paper --> In this paper


- Table 1. I would explicitly add "Acceptance relative change [%]" (or something like that) in Table 1.

Use your proposal - done

- Line 214. selectrion --> selection


- Line 215. shonw --> shown


- Line 229. with each pT > 4 GeV --> with pT > 4 GeV each


- Overall, a table with the different selections would be very helpful. Maybe also another table with the yields.

Selection criteria are described in the text and final yields are presented with the 4-lepton invariant mass distributions. The intermediate yields do not contribute to the calculation of the BF upper limits at 95% C.L. These are our arguments for not including such a table and saving space.

- Line 252. invariant mass distribution --> invariant mass distributions


- Line 257. Which alternative functional forms have been tried?

exponential only, power law, double exponential, low order (2) Chebyshev polynomials

- Line 331. this Paper --> this paper


Follow Up Comments From ARC member Keith

I still don't see why track-based isolation would be better that PF isolation.

Both PF and track-based isolation are available for muons. For both, Run-2 studies are published and the scale factors are established. Simulation and data agree, with scale factors close to 100% and systematic uncertainties of less than 0.5% for both methods. In this analysis, as in the previous one, we chose track-based isolation, as approved by the Muon POG.

I'm sorry, but I still don't understand what the "full range" of the fit is here. Where in the AN is it described?

In each channel, the fits are performed within the ranges as chosen for the final results. These visualization plots were requested to focus on the regions near the signal to demonstrate the level of difference under the signal. The ‘full ranges’ are described where applied to the different channels. We will collect the information in a table here.

All of the different fits need to be spelled out clearly in the AN. Let me see if I understand it. For Z+J/Psi, you simultaneously fit the 4mu mass and the 2e+2mu mass. Those shapes are both fit with an exponential + a constant. You float a single value for the exponential and apply it to both backgrounds? What about the constant part? Is it common between the 4mu and 2mu2e as well? What would really be the most helpful is a table with all of the floating and fixed parameters for each fit.

Then you have limits for both Z+J/Psi and Z+Psi(2S). Do you do two different versions of the 4mu and 2mu2e shapes, one to get the Z+J/Psi limit and one to get the Z+Psi(2S) result? Or you have a single signal shape that includes both of them in the appropriate ratio? Again, I think a table with all of the fit parameters would help me a lot.

The above example is for Z+J/Psi, but similar information is needed for the J/Psi+J/Psi and Upsilon+Upsilon fits as well.

We will add this description to the AN to address these points.

OK. So is the signal line shape in Fig. 21 bottom right including all of the Upsilon(nS) states then?


I guess you mean AN table 16, right? I don't really understand the "observed yield" column. For example for the H -> YY case you observe 2.86 events? From the plot in Fig. 21 bottom right it looks like your fitted yield should be very close to zero. Also include uncertainties on the fitted yields.

The "observed yield" column corresponds to the 95% CL upper limit on the signal yield. The corresponding BF is the observed upper limit on the BF. In the case of H->YY, with no signal and marginal background, the yield is close to the zero-counting value.
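
For orientation, the "zero-counting value" can be illustrated with a short calculation. This is a minimal sketch, assuming the term refers to the textbook Poisson upper limit for zero observed events and zero expected background; it is not the procedure used to produce AN Table 16, and the function name is illustrative.

```python
import math

def poisson_zero_count_upper_limit(cl: float = 0.95) -> float:
    """Upper limit on a Poisson signal mean s for n = 0 observed events
    and no background, from the condition exp(-s) = 1 - cl."""
    if not 0.0 < cl < 1.0:
        raise ValueError("confidence level must be between 0 and 1")
    return -math.log(1.0 - cl)

limit = poisson_zero_count_upper_limit(0.95)
print(f"95% CL upper limit for zero counts: {limit:.3f} events")
```

Under these assumptions the limit is about 3 events, which is the scale against which the quoted 2.86 can be compared.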

- Table 17 gives the expected significances for various scenarios. Did you do a test where you artificially inject such a signal into MC backgrounds and check that your fitting procedure would recover the signal as expected?

Yes. Fit diagnostics with 0 and 1 injected signals were run, and we verified that the fit returns the same signal as was injected. This is one of the exercises for the data-card approval.

That study should go in the AN for the next version.

We will add the information in the next version of the AN.
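
The idea of such a signal-injection closure test can be sketched with a toy. This is only an illustration under stated assumptions, not the fit diagnostics actually run for the data cards: the fit range, the Gaussian signal shape, the exponential background slope, the event counts and all function names below are hypothetical, and only the signal fraction is floated in an unbinned maximum-likelihood fit with fixed shapes.

```python
import math
import random

LO, HI = 110.0, 140.0        # hypothetical fit range in GeV
MEAN, SIGMA = 125.0, 1.5     # hypothetical fixed signal shape
SLOPE = 0.05                 # hypothetical background slope in 1/GeV
BKG_NORM = 1.0 - math.exp(-SLOPE * (HI - LO))

def _gauss_cdf(x):
    return 0.5 * (1.0 + math.erf((x - MEAN) / (SIGMA * math.sqrt(2.0))))

SIG_NORM = _gauss_cdf(HI) - _gauss_cdf(LO)

def bkg_pdf(m):
    """Exponential background normalized on [LO, HI]."""
    return SLOPE * math.exp(-SLOPE * (m - LO)) / BKG_NORM

def sig_pdf(m):
    """Gaussian signal normalized on [LO, HI]."""
    g = math.exp(-0.5 * ((m - MEAN) / SIGMA) ** 2)
    return g / (SIGMA * math.sqrt(2.0 * math.pi) * SIG_NORM)

def generate(n_bkg, n_sig, rng):
    """Background by inverse-CDF sampling, signal by Gaussian sampling."""
    events = [LO - math.log(1.0 - rng.random() * BKG_NORM) / SLOPE
              for _ in range(n_bkg)]
    while len(events) < n_bkg + n_sig:
        m = rng.gauss(MEAN, SIGMA)
        if LO <= m <= HI:
            events.append(m)
    return events

def fit_signal_yield(events):
    """Unbinned ML fit of the signal fraction f (shapes fixed), f in [0, 0.999].
    The log-likelihood is concave in f, so a ternary search finds its maximum."""
    s = [sig_pdf(m) for m in events]
    b = [bkg_pdf(m) for m in events]

    def log_l(f):
        return sum(math.log(f * si + (1.0 - f) * bi) for si, bi in zip(s, b))

    lo, hi = 0.0, 0.999
    for _ in range(60):
        a = lo + (hi - lo) / 3.0
        c = hi - (hi - lo) / 3.0
        if log_l(a) < log_l(c):
            lo = a
        else:
            hi = c
    return 0.5 * (lo + hi) * len(events)

def run_closure(n_bkg, n_sig, seed):
    rng = random.Random(seed)
    return fit_signal_yield(generate(n_bkg, n_sig, rng))

fit_no_signal = run_closure(1000, 0, seed=1)
fit_injected = run_closure(1000, 300, seed=2)
print(f"injected 0   -> fitted {fit_no_signal:.1f}")
print(f"injected 300 -> fitted {fit_injected:.1f}")
```

Closure in this toy means that the fitted yield agrees with the injected one within the statistical uncertainty of the fit, for both the background-only and the signal-injected pseudo-data.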

Comments on AN From ARC member Nuno


+ selection

it was not clear, incl. from the AN, whether a specific criterion, i.e. a quantitative figure of merit, was used to decide on the selection; I trust this will have been discussed extensively in previous review steps leading to unblinding. this element being always central for a search, it would be appropriate to try to motivate the rationale for the selection cuts, in addition to listing them.

We will describe this in the new version of the AN. Please see the answer to Keith on that, too. We also propose to add the description to the paper.

I noted this addition in paper v6 (wrt v3) "Further selection criteria are applied to the different four-lepton final states to achieve the lowest expected upper limit at 95% CL. This optimization was performed with data with the direct decay signal regions removed [64]." It is nice to see the reference to the blinding technique; indeed the main point here would be to assure the reader that our selection is determined in an unbiased fashion. But is the optimization really done with signal (sideband) only, not MC?? Could some more info (or reference!) on the method used to quantify the lowest 'expected upper limit at 95% CL' be added perhaps?

We added the following details:

Further selection criteria are applied to the different four-lepton final states to achieve the lowest expected upper limit at 95\% \CL~\cite{junk,read,combiner}. This optimization was performed with data with the direct decay signal regions removed~\cite{blinding} and replaced by simulated events, and with the simulated signal shape.

+ combination of the different data taking years

this does not seem to be mentioned in the paper; all results refer to the (somehow) 'merged' dataset. The paper should clarify how the 'combination' is achieved. This is relevant because of course the datasets have different features (resolutions, backgrounds, calibrations, triggers, etc.), which then affect the fitting and the efficiency correction parts of the analysis.

The combination of the 2016, 2017 and 2018 datasets is now described in AN Section 11.2.

suitable to mention in paper, too?

We do not consider mentioning it for this letter.

from the appendix of the AN, sections 18 and 19, I understand systematic changes on the fit model are estimated in sec. 19; these systematics seem to be missing, however, in the systematics table

This uncertainty is part of the signal model uncertainty.

I also understand that some MC ensemble is constructed merging the 3 years (sec. 18)

how can such a merging be justified as most suitable choice; i.e. couldn’t the three years be included in a simultaneous procedure, employing corresponding parameters (in data fit and mc efficiency) per year

was the closure of the adopted averaging procedure confirmed?

The samples are not merged. Shapes are determined for each year separately.

ok, for fitting; but I understood they were merged for efficiency calculation purposes (?)

The efficiencies are combined numerically. The change in the UL is found to be small when the three years are treated individually and then combined in the calculation. The difference is included in the efficiency systematic uncertainty (AN Sec. 19).
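
One common way to combine per-year efficiencies numerically is a luminosity-weighted average. The sketch below assumes that this is what is meant here, which the reply does not state explicitly; the per-year luminosities and efficiencies are placeholder values chosen for illustration, not numbers from the AN.

```python
def lumi_weighted_efficiency(lumi_by_year, eff_by_year):
    """Luminosity-weighted average of per-year efficiencies."""
    if set(lumi_by_year) != set(eff_by_year):
        raise ValueError("luminosity and efficiency must cover the same years")
    total_lumi = sum(lumi_by_year.values())
    weighted = sum(lumi_by_year[y] * eff_by_year[y] for y in lumi_by_year)
    return weighted / total_lumi

# Placeholder inputs (fb^-1 and fractional efficiencies), for illustration only.
lumi = {"2016": 36.3, "2017": 41.5, "2018": 59.8}
eff = {"2016": 0.20, "2017": 0.22, "2018": 0.24}

combined = lumi_weighted_efficiency(lumi, eff)
print(f"combined efficiency: {combined:.4f}")
```

Comparing the UL obtained with such a combined efficiency to the UL obtained with the three years entered separately is the kind of cross-check described in the reply.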

+ likelihood description

this seems to be missing in both paper and AN. Modelling of separate channels and components is described, but the actual fit that is finally used in the analysis is not; e.g. it is unclear if it is one fit at a time or all at once, and if direct and feed-down channels are treated separately.

We clarified this in Section 8.

no fit-related systematics appear listed in Table 2 or section 6. What does "signal modelling" in l.228 correspond to in the table?

All the parameter uncertainties of the signal models are included in the upper limit estimate as nuisance parameters.

is the fit stability and closure verified, e.g. with toy MC

a 1D fit to the 4-lepton invariant mass is adopted in the analysis; couldn’t e.g. inclusion of the dilepton mass add a more complete characterization of the data? How are the plots in Fig. 3 of the AN used in the analysis?

Yes, fit stability and closure were verified and certified. The dimuon signals are selected by mass interval only. The choices are supported by Fig. 3. No peaking background is expected.

great, thanks; could you for convenience specify relevant section(s) in the AN

This is AN Sec. 8, but also the exchange with the statistics committee to follow the standard approval process: see the discussion above on the fit and data card procedures followed here: https://twiki.cern.ch/twiki/bin/viewauth/CMS/HiggsWG/HiggsPAGPreapprovalChecks

+ observable

what’s measured is presumably the product cross-section x BF; this could be clarified in the paper, as well as how the final BF results are arrived at

Eqn. 6 in the AN describes sigma x BF = ... . The final results are obtained by dividing by the quoted Higgs (Z) production cross section.

+ parameter fixing vs constraining

various parameters are extracted from simulation and fixed in the fit to the data; couldn’t constraining rather than fixing be more suitable? E.g. it would automatically account for associated systematics.

Only signal parameters are fixed.

+ data vs MC

MC simulation is used in both the fitting and efficiency components. Were data-vs-MC comparisons performed, e.g. using calibration samples? Were corrections thus derived applied to the MC in the nominal setting, and leftover discrepancies propagated as systematic errors?


great, thanks; could you for convenience specify relevant section(s) in the AN

In AN-19-232 we refer to the previous analysis HIG-18-025 (related note AN-18-106 (Sec 6) and for muon T&P for quarkonium channels also therein to AN-18-201).

+ psi(2S)

excited charmonium and bottomonium states appear to be treated slightly differently in the analysis; also the choice was made not to search for the psi(2S) using the same exclusive decay (mumu) but rather via the inclusive decay (Jpsi X)

was the search in the exclusive mode extended to the psi(2S) dimuon mass region (e.g. as a check)?

does the increased BF compensate the inherent loss of resolution and dependence on simulation when adopting the inclusive approach? In AN section 12 I realise this is briefly discussed; I’m curious whether the estimated factor of ~2 favouring the inclusive approach is significant (i.e. whether its uncertainty is not large)

I wonder if some justification could be at any rate added to the paper draft

B(psi(2S)→J/psi) x B(J/psi→2 mu) is 3.7% and B(psi(2S)→2 mu) is 0.8%. Given that the acceptance times efficiency is comparable, and that we use the optimum signal from the exclusive decay, the factor is 2.2.
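
The arithmetic behind these numbers can be checked quickly. The ratio of the two quoted branching-fraction products is about 4.6, while the quoted factor of 2.2 is numerically close to the square root of that ratio. Reading the 2.2 as a sensitivity gain that scales with the square root of the signal yield is an assumption made here for illustration only; the reply does not spell out how the factor was obtained.

```python
import math

# Branching-fraction products as quoted in the reply above.
bf_inclusive = 0.037   # B(psi(2S) -> J/psi X) x B(J/psi -> mu mu)
bf_exclusive = 0.008   # B(psi(2S) -> mu mu)

yield_ratio = bf_inclusive / bf_exclusive   # ratio of expected signal yields
sqrt_ratio = math.sqrt(yield_ratio)         # assumed sqrt scaling of sensitivity

print(f"yield ratio: {yield_ratio:.2f}")
print(f"square root of the ratio: {sqrt_ratio:.2f}")
```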

Comments on Paper From ARC member Nuno

(line numbers refer to paper version 3)

83 given 3 data taking years are listed, luminosity per year could suit here?

We propose to use total luminosity as in other CMS letter publications.

130 how is the "Z boson specific” trigger specific/dedicated for the analysis, can this be reinforced with additional info other than the pt threshold

We removed 'specific' - it is not warranted, as these are widely used triggers.

Table 1, move caption to top of table, as standardly done, and for self consistency w/ other figure


217 “as to fit to a common vertex with a probability”, as stated reader may wonder how this probability was estimated

Changed to: ... a common vertex with a probability of greater than 0.5%, as determined by a Kalman vertex fit.

255 “exponential plus a uniform function” can a uniform (constant?) function for the background be physically motivated?

It fits the data in the given range.

263 "mass resolution”: as it is not mentioned, I presume the per-event resolution is not employed in the analysis; here you may mean the standard deviation parameter of the Gaussian functions and corresponding parameters of the CB functions

We now address the parameters more specifically here: the standard deviation in the Voigtian function.

264 “fixed in the fit to the data”, wouldn't constraining be more suitable here? How are associated systematics estimated? Are the signal shapes fixed/constrained from simulation, or are relative normalisations free to float? Could the choice be motivated, and in any case be made more clear in the text?

See above.

270 on feed-down signals. Feed-down fractions are taken from simulation. Are these all well known, resulting in negligible systematics? It is not clear how the fit to the feed-down signals is done, i.e. what shape parameters are free or otherwise constrained, nor how the feed-down contribution is considered in the fits to extract the direct components.

They are taken from the PDG; uncertainties are small and are included as nuisance parameters.

272 “For the fits to the feed-down channels”, are these really different fits? surely the direct channels could not be discarded, right?

The feed-down channels are fit separately, with the same background as in the direct channels. With no signal, no correlation is expected, and none is found.

275 “No significant correlations between the different signal contributions are found.” clarify which contributions are being referred here

Signals from the feed-down and direct channels.

Table 2 Items in section do not correspond exactly to listing in the table? reword last item, limited sample size

Done. In the table, "lepton efficiency" is changed to "lepton identification", and "muon isolation" is added.

actually, I no longer see the systematics table in v6 ... (I assume this was dropped following separate review requests?)

Yes, to adjust for the letter.

290 here some text is missing (!), introducing the listing that follows

It is a latex formatting error (nothing is missing).

I see the connection between text and listing is still missing in v6. Suggest to replace full stop with colon in line 283.


295 “the method used to measure the efficiency”, which is it? There is no section in the paper describing how the efficiency is measured. In its absence it is inferred this is done from MC (which would render unclear what was the method and the source of its uncertainty). What is missing then is the description of the procedure employed to correct the MC with measurements from data, and of the associated systematics. Presumably, different components of the efficiency were estimated/calibrated with different methods?

Efficiency is obtained from MC simulation and the T&P method is used to obtain scale factors and uncertainty estimates.

thanks for having added info to the paper, now specifying the method in item #ii of the systematics listing

299 same comment for electron, as above for muon

Efficiency is obtained from MC simulation and the T&P method is used to obtain scale factors and uncertainty estimates.

while this has now been specified for the muons; for the electrons, the statement remains "It arises from the method used for the efficiency measurement"; if it is indeed similar as for the muons, try to reword/drop statement about "the method" that may remain unspecified.

We shorten the sentence to refer to [71].

302 “fit efficiency”, what is it?

It is the percentage of events surviving the 4 lepton vertex fit criterion when applied last.

so it is not so much an efficiency of a fit, as it is the efficiency of a selection criterion based on a fit quality ... not a big deal but suggest to simply refine accordingly

We replaced it with 'criterion'.

304 need to specify how energy scales and resolutions were calibrated

Differences in the lepton resolution and momentum scale in data and simulation were estimated from J/psi and Z dilepton signals and extrapolated to the four-lepton signals. The systematic uncertainty is the relative change of the upper limit when varying the signal mass mean and width by these differences. This text is added.

thanks, more clear in paper now

325 “coupling strength of the bosons to any Y(nS) pairing is assumed to be the same"

Done. The italic (nS) is changed to upright.

Table 3 caption “will be updated after unblinding”, confirm whether this was done

Removed "(will be updated after unblinding)". The numbers are final.

// type A

46, 47 branching fraction


81 and elsewhere, ensure consistent use of symbol “pp” i.e. use tdr/latex macro

{\Pp\Pp} is now used throughout the text.

214 selection

The selection → selection

217 case of the jpsi


222 in the jpsi


229 each with


Fig 4 caption: plots show, observed -> existing

Done. We keep "observed".

note that (i) 'observed limit' may sound self-contradictory, and (ii) not clear whether used limits are 'existing' ones or obtained here

... the observed UL BF at 95% CL as obtained in this analysis ...

234 and the azimuthal


236, 238 use consistent symbols, i.e. \newcommand

The space in YY is deleted.

238 distributions


246 space missing


289 Tab -> Table


291 uncertainty … is


296 uncertainties … are


332 in association with -> and


335 ", are used"


342 Y consider all -> stands for the


462 CMS Collaboration


465 Higgs


491 fix symbols in title


492 drop page range


505 ATLAS Collaboration


506 fix symbols, names in title


527 Higgs


536 fix reference, incl. collaboration name, symbols, remove repeated symbols etc


543 revise reference, missing author field, doi seems not openly available etc

Fixed reference.

594 fix particle symbol


611 revise reference

Done.

613 check authorship, CMS


Comments on AN From ARC member Hwi dong

Thank you very much for the excellent analysis. It looks very nice and studies rare decays of the Higgs boson widely, extending the previous analysis.

I have the following first set of comments below, including my major concerns. They might overlap with other ARC members’ comments, which I have not yet had a chance to follow up on. Please point me to answers already delivered if my comments and questions are the same, and we can discuss further details in the upcoming ARC-author meeting.

1. Target journal - Do you target a letter style journal (ex. PRL or PLB)? In my opinion, overall description written in the paper draft is not so much friendly for readers, and therefore I think it’s less comprehensive or minimized to learn something. For example, I think event selections are quite important but the description of event selection (how to decide the working point and optimize the performance in each channel) is only delivering the facts (what you did). If you target to PRL (we should discuss whether it’s feasible), then of course it’s fine. But if not, I think JHEP style is more suitable for your study. I think readers will learn a lot if you share the details how to optimize and motivation of the selection cuts etc.

After discussion, we decided to target PLB.

2. Trigger strategy and description - you describe the trigger selections in LL129-139. But as your analysis contains several channel and therefore your trigger strategy is quite different w.r.t. the channels, it is always better to provide a table for the summary of pt, eta, and other requirement w.r.t. such online selection. I have same comment for the event selection parts. It’s quite unclear in the text of the paper draft.
- According to AN section 3.2, you use HLT_IsoMu27, HLT_Ele27_xxx for the channel with H->ZJ/psi (Z->ee or mumu), and the others are based on BPH trigger. It looks reasonable approach but it’s a bit strange w.r.t. your offline event selection. I will comment more in the event selection issue.
- AN table 5, 6, you counted the yield with the order of preselection => trigger etc. It looks strange to me why you count the preselection (it’s offline lepton based) => trigger. Certainly trigger is online selection and it should affect all aspects of your samples you start this analysis. It’s looks strange order. Moreover, preselection contains dilepton vertex cut etc. which is very dangerous order in the selection strategy.
- How did you get the reco+id, iso and trigger efficiency for muon and electron? First of all, the description is not well available in either paper or AN. And if you applied the efficiencies (and SFs) and systematic uncertainties provided by muon POG, did you report and discuss details in muon POG and its contact in HIG group? I think your application on the muons particularly looks a bit unusual, I think it’s quite important step.

We do not include such a table, as letter publications usually point out the important features but not in such detail. The selection criteria and steps are common, particularly for quarkonium analyses, and are very open with respect to the signal. In our final states the order of preselection and trigger was checked and does not matter. The lepton efficiencies and uncertainties were discussed with the analysis object groups and the analysis group.

3. Offline event selection - The most difficult part to understand. I think the section 4 should be reorganized and improved significantly. As mentioned above, if you target the letter and deliver only facts you used, at least you need a table to summarize the requirement, cut value, etc. for each channel.
- If your selection is quite standard in BPH (I am not working in the group unfortunately), then please include some references that I can also follow up the standard treatment in similar BPH.

We add text to describe the selection strategy. We checked the order of the criteria. All the signal selection criteria are very open to maintain a high efficiency.

Comments on Paper From ARC member Hwi dong

L184: pt > 3 GeV, but at least in ZJ/psi channel you used IsoMu27. So one muon should be greater than 27 GeV in offline. In general, most of muon POG analyses, at least 1 GeV is added for offline muon pt due to the difference of the resolution performance in trigger. Moreover, 3 GeV is quite low for the accuracy of muon reconstruction. For instance, the GLB muon has turn-on effect around 6-7 GeV.

The IsoMu27 trigger requires one isolated muon with pT greater than 27 GeV. Soft muons have been studied from pT = 2 GeV upwards. The trigger muon is the leading muon, which is safely above 27 GeV according to simulation.

LL195-196: you mention only the BB and EE cases, but didn’t you accept BE case?

We did. The sentence, though, is the standard description from the publication board.

L204: same question as muons. EGM POG usually used +3 GeV in offline pt of electrons compared to the trigger threshold. It will be more significant problem because electrons’ accuracy is worse than muons. What is fake rate of electrons in such low pt region?

As all of our electrons are high pT electrons from Z boson decays we do not have that problem (see Fig. 4).

L205: what about the ECAL gap? Is it not applied?

It is included in simulation and corrections.

L207: how did you treat the electron mis-charge identification? The mis-charge id of muon is negligible up to 1 TeV, so it’s not a problem but electron is larger, moreover you are accepting very low electrons as well, so it might be problematic.

The electron pT range is 10 - 100 GeV.

LL208-215: I understand the motivation of common vertex cuts to suppress fake and combinatorial backgrounds, but I don’t see any optimization discussion for the cut values. Are they optimized and so how did you do? The details (plots, procedure, test etc.) should be included in the documents, at least AN.

We follow the overall optimization strategy to obtain the lowest UL at 95% CL.

L209: why do you need the dilepton pt cut 5 GeV (and L211)? Before applying the cut, how did you decide the pairs? Most of channels are based on 4 muons and you have several choices of the combination. How did you select the best pairs? Then what is the order of the additional dilepton cuts? Did you select pairs first then apply the dilepton cuts or vice versa? Is there any dependence on the order of the selections?

We first apply very loose selection criteria, defined as preselection in the AN. We then select di-leptons by mass. The 5 GeV cut is found in our optimization for the best expected UL. After the charge requirement (pairing +- +-), combinatorics is strongly suppressed. There is no dependence on the order of the selection in the optimization cut.
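
To make the combinatorics concrete: four leptons can be split into two pairs in three ways, and the opposite-sign requirement removes at least one of them. The sketch below only counts the allowed pairings; it is an illustration, not the analysis code, and the function name and charge encoding (+1/-1) are hypothetical. The subsequent choice among the surviving pairings by dilepton mass is not modelled here.

```python
def opposite_sign_pairings(charges):
    """Return all ways to split four leptons into two opposite-sign pairs.

    `charges` is a sequence of four values, +1 or -1.
    Each pairing is a tuple of two index pairs.
    """
    if len(charges) != 4:
        raise ValueError("expected exactly four leptons")
    pairings = []
    for partner in (1, 2, 3):   # lepton 0 is paired with each other lepton once
        first = (0, partner)
        second = tuple(i for i in (1, 2, 3) if i != partner)
        if all(charges[a] * charges[b] < 0 for a, b in (first, second)):
            pairings.append((first, second))
    return pairings

neutral_event = opposite_sign_pairings([+1, +1, -1, -1])
charged_event = opposite_sign_pairings([+1, +1, +1, -1])
print(neutral_event)   # [((0, 2), (1, 3)), ((0, 3), (1, 2))]
print(charged_event)   # []
```

A charge-neutral four-lepton candidate therefore leaves two pairings to choose from, while a candidate with net charge is rejected outright.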

- Figure 2: your bin size is 3 GeV. Though statistics are not so big, I think it should be 2 GeV or lower because of the dilepton (and four lepton) mass resolution around Higgs mass. As you see the signal distribution, your data points are just 1-2 bins to denote them. If you have 20 bins per plot for example, it might look better to see resonance structure (statistics are not so poor to do, I think).

The actual fit is an un-binned ML fit. The plots serve to demonstrate the background distribution.

- Figure 2: maybe stupid question but why the peak of H->Zpsi(2S) (green distribution) is lower than 125 GeV?

The Psi(2S) is measured from the daughter J/psi in the decay Psi(2S)->J/Psi X, where the X is not detected. It is dominated by X->pi pi.

- If you estimate backgrounds with data distribution by the fitting, is it not helpful to relax your background suppression on the event selection? It is related to the optimization of all event selections.

Yes, we follow the same strategy, i.e. we relax the selection criteria.

L218: why 0.5%?

The dimuon vertexing probability becomes uniformly distributed above 0.5%.
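
The flatness argument can be illustrated with a toy: for candidates whose tracks really come from a common vertex, the fit probability is uniformly distributed, so a cut at 0.5% removes about 0.5% of them, while poorly vertexed combinations pile up at low probability. The sketch below assumes a chi-square with one degree of freedom for a two-track vertex fit and Gaussian track uncertainties; both are simplifying assumptions and the function names are illustrative.

```python
import math
import random

def vertex_fit_probability(chi2: float) -> float:
    """Chi-square upper-tail probability for one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))

def fraction_below(threshold: float, n_events: int = 100_000, seed: int = 7) -> float:
    """Fraction of well-modelled toy candidates failing a probability cut."""
    rng = random.Random(seed)
    below = 0
    for _ in range(n_events):
        chi2 = rng.gauss(0.0, 1.0) ** 2   # chi2 with ndf = 1 for a correct vertex hypothesis
        if vertex_fit_probability(chi2) < threshold:
            below += 1
    return below / n_events

loss = fraction_below(0.005)
print(f"fraction of well-modelled candidates removed: {loss:.4f}")
```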

L218: 3.5 GeV pt cut of J/psi candidate but according to table 3 in AN, the pt cut is not applied in 2016 triggers. Has it been applied in 2016 as well?


L218: same question above, how did you select the J/psi candidate pairs with which order?

The same order as defined for Z J/psi.

L221: dimuon mass resolution should be expressed with % (same as dielectron case above)


L224: 5% vtx cut additionally for 4 muons, how much impact after apply the common dilepton vtx cuts?

The 4-muon vertex cut reduces the background, in relative terms, twice as much as the signal.

Figure 3: same question, why some signals' peaks are lower than 125 GeV?

Same as above.

L231: The dimuon invariant masses are overlapped. How many events are overlapped between two plots in figure 4? Why do you need to separate them in the plots?

We are testing two different di-lepton mass ranges with two different models (in the Y(1S)Y(1S) case we do not make assumptions about the Higgs decays). The plots demonstrate these separate exercises.

LL233-234: same questions for cut values, why they need and the justification of the cut values

We will add the optimization strategy to the AN.

I have more questions on the event selections but I can ask more if I am still not clear with your answers for above questions.

4. Systematic uncertainties - ii and iv contains four muon (lepton) vertex fit. It should be more clarified.

We now explain in the AN how we obtain the 4-muon vertexing.

- vi: is it the effect from Parton Distribution Function choice? How did you estimate it?

We added a reference and text.

Comments on Paper From ARC Chair Keith

Thanks for the updated documentation. Below you can find a first pass through the paper for editorial comments. In general, I found it to be very nicely written. I also think it would make sense to assign a Language Editor at this point. We will try to get back to you as soon as possible with any final physics questions.

Title -You should probably mention the Z decays, too

Here is a new proposed title: Search for Higgs boson decays into $\cPZ \JPsi$ and Higgs and Z boson decays into $\JPsi$ or $\PgU$ meson pairs at CMS

Abstract -Mention CMS somewhere

Included it in the first sentence

-line 2 drop "decays"


-line 6-7 I think you can drop the whole sentence "Different polarization scenarios..." just to shorten the abstract


General: all plots should have "Preliminary" removed.


Introduction -line 8 "the SM" => "SM"


-lines 12, 18 you define "(CL)" twice

Removed from line 18

-line 17 "... have been searched for" make clear by which experiment(s)

Added “have been searched for at LHC”

-line 19 "... expected values in the SM" better to add in the theory reference with these predictions already here

Sentence line 16 refers to the theory. In line 19 the references [19-21] quoted in the previous sentence also evaluate these deviations from the SM with the corresponding theory references.

-Fig. 1 caption "indirect quark" => "indirect (middle, right) quark"


-line 45 "... experimentally at the LHC" add in by which experiment

Added "… studied by the ATLAS collaboration ..."

-line 69 or so. I think it would help make clear what you are actually searching for to explicitly list out all of the final states around this point.

We merged the subsequent two paragraphs.

Simulated samples -line 142. "in the four-muon invariant mass" you also use the MC for the 2mu+2e invariant mass, right?

Replaced by "four-lepton".

-line 169 I don't think you've defined "acceptance" yet here. You might want to just move this whole table down to when you are talking about systematics.

Moved table behind systematics.

Changes are given to 2 digits only (30.2 -> 30)

-Table 1. "Transversal" => "Transverse" and I also think "Uniform" => "Unpolarized" is better to be consistent with the language used in the text


Event reconstruction -line 209 "with data in the direct" => "with data with the direct"


-line 213. you first mention the vertex fit probability here. Down in line 221 you mention that this is "determined by Kalman vertex fit probability" Move up the description of how you get the vertex fit probability to the first mention in line 213.


-line 217. Not clear what "threshold" is meant here.

added “.. close to the Z J/psi threshold..”

-line 233 "A Y pair" => "An Y pair"


-line 233 Not clear (to me at least) what "Y" means here as distinct from "Y(1S)" I would normally think "Y" alone would mean all of the Y states, but then the mass window for "Y" couldn't be different from "Y(1S)" so I'm confused.

Here, Y refers generically to all Upsilon states. We went through the paper to correct inconsistent use (and formatting) of Y(nS)Y(mS) and Y(1S)Y(1S) occurrences.

-line 244-245. Need to mention the electron efficiency scale factors as well, I think.


-line 251-252. 28 and % get split across the lines.


Signal extraction -line 256-258 I thought the background fits were now done across the whole mass window including the signal region.

New sentence: The background shapes in the four-lepton invariant mass distributions are obtained from data.

-line 259 "uniform" => "constant"


-line 262 you use "direct channel" quite a bit, but I don't think it is ever explicitly defined. There can also be confusion with the way "direct" is used in Fig. 1 so some care is needed.

We define direct signal as "... a combination of the same functions as used for the Higgs directly decaying into ground state mesons (direct signal)."

-line 269 "line" => "lines"


-line 275 and 353 "Psi(2s) -> J/Psi" I think would be better with a "+ X" or similar added.

L275: changed to “...involving the inclusive transition from psi(2S) to J/\psi….”
L353: changed to “...result of an inclusive Psi(2S) to J/\psi transition….”

-line 274-285. Say explicitly that separate fits are performed to the 4 lepton mass distributions for the different signal hypotheses.

We adopted the sentence: “Separate fits are performed to the four-lepton mass distributions for the different signal hypotheses.”

Systematic uncertainties -In point ii I think you need to mention the electrons as well.


-line 317 "1.73" => "1.7"


-line 319 I think those numbers are in "%" right?


Results -line 327 and 345 "Higgs (Z)" => "Higgs or Z"


-after line 339. Add some comparison between your limits and those expected from the SM and/or BSM possibilities.

We add a statement such as: The observed upper limit in branching fractions agree with the expected limits and are 826 times in the case of H → ZJ/ψ [18] or higher than the SM predictions for these rare channels. With the increase in luminosity at the high luminosity LHC and the combination of final states for the channel H → ZJ/ψ, the observed branching fraction could reach the SM predicted value within an order of magnitude. In the H → Υ ( nS ) Υ ( mS ) channel, the observed upper limit in branching fraction is found to be a factor 5.8 higher than the value from earlier SM predictions [31], which could be reached with the high luminosity LHC data.

-Table 2. A few lines have leftover "(X)" values. I think those should be removed.


-Table 2. The J/Psi,J/Psi and Y(nS)Y(mS) lines have rather asymmetric expected ULs of 4.6+2.0-0.6 and 3.6+0.1-0.3. Do you understand the asymmetry? And why it would have different sign for the two channels?

The background is qualitatively very different in these channels. It is very small in the YY channels.

Comments From Language Editor Paul

Title : Ask pen names being used


Abstract : For the first time decays...>>For the first time, decays....


Abstract: Integrated luminosity of about 137 fb-1 >> Integrated luminosity of 137 fb-1


Abstract: While an observation >>An observation


Abstract: the standard model, no significant excess.... >> standard model, and no significant excess


Abstract: Upper limit at 95% CL >>Upper limit at the 95% CL


Abstract: Put some numbers in the abstract?

Done

L5: boson are so far consistent >> boson are, so far, consistent

L:18 at LHC>>at the LHC


L39: fermions [11]. An example is a version of >> fermions [11], for example, a version of


L40: Yukawa couplings are possible.... >>Yukawa couplings are also possible


L.43: with Froggatt-Nielsen >>with the Froggatt-Nielsen


L:52 in the leftmost, direct amplitudes the Z boson >> in the leftmost, the direct amplitudes of the Z boson


L:64 sentence is confusing

changed to "The Higgs boson is expected to couple to quarkonium pairs that include radially excited states with comparable strength [37]."

L:72: B>>branching fractions


L:166 for Z boson>>for the Z boson


L:202 The selection on >> The selection based on


L:214 by Kalman vertex >> by a Kalman vertex


L237: two candidate dimuons the absolute value >> two candidate dimuons, the absolute value


Comments From Approval Presentation From Nick and Jan

Many thanks again for the very clear approval presentation yesterday during the Higgs PAG meeting. We have collected the comments from the discussion and added them below. Once these are addressed (and there is really only one to address), we think we can approve the analysis and move towards the paper. As was mentioned, after that we should aim to get the GL from the Language Editor on the paper and then go for CWR as soon as we can.

On the extrapolation to the HL-LHC in the summary slide, we should not include this in the paper and instead we think it will make a very interesting PAS in the UPSG for Snowmass. Note that we should figure out the luminosity gain since the background is not 0 so the sensitivity should go more like the sqrt(L) given the current analysis and if the analysis is improved to cut away more background, this should be studied and reviewed in the UPSG.

Sentences regarding HL-LHC are removed.
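
Illustration of the scaling argument in the comment above (not part of the paper): in a simplified counting experiment the expected upper limit improves roughly like 1/L when the background is negligible and like 1/sqrt(L) when the background dominates. The function name and the 3000 fb-1 target used below are assumptions made for this sketch only; a real projection would have to be studied and reviewed as the comment says.

```python
import math


def limit_improvement(lumi_now, lumi_future, background_dominated):
    """Factor by which an expected upper limit shrinks when the integrated
    luminosity grows from lumi_now to lumi_future (same units).

    Simplified counting-experiment scaling (an assumption of this sketch):
      negligible background -> limit ~ 1/L
      background dominated  -> limit ~ 1/sqrt(L)
    """
    if lumi_now <= 0 or lumi_future <= 0:
        raise ValueError("luminosities must be positive")
    ratio = lumi_future / lumi_now
    return math.sqrt(ratio) if background_dominated else ratio


# Hypothetical extrapolation from 138 fb^-1 to an assumed 3000 fb^-1.
gain_bkg_free = limit_improvement(138.0, 3000.0, background_dominated=False)
gain_bkg_dominated = limit_improvement(138.0, 3000.0, background_dominated=True)
print(f"background-free gain: {gain_bkg_free:.1f}")
print(f"background-dominated gain: {gain_bkg_dominated:.1f}")
```

The two numbers bracket the possible gain; the current analysis, with non-zero background, would sit closer to the sqrt(L) case.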

Please check the luminosity rounding since it should be 138/fb if using the full Golden JSON. We note that there is in fact one channel where there is less luminosity available however.

Luminosity is changed to 138 fb-1

Comments From Language Editor Paul



Events with \cPZ boson decays into an electron or muon pair, or with quarkonium resonances decaying into muon pairs are selected.

Change to

Events with \cPZ boson decays into an electron or muon pair, or with quarkonium resonances decaying into muon pairs, are selected.



For the $\PGg \psi\text{(2S)}$ and $\PGg \PgU\rm{(nS)}$ decays, the corresponding upper limits are, respectively, 3 and 5 orders of magnitude larger than the SM expectation.

Change to

For the $\PGg \psi\text{(2S)}$ and $\PGg \PgU\rm{(nS)}$ decays, the corresponding upper limits are, respectively, 3 and 5 orders of magnitude larger than the expected SM branching fraction.



Recently, the CMS collaboration published upper limits on the branching fraction for $\PH\to \cPZ\rho$ and $\PH\to \cPZ\phi$ at the 95\% CL that exceed SM expectations by more than a factor of 730~\cite{Sirunyan:2020mds}.

Change to

Recently, the CMS collaboration published upper limits on the branching fraction for $\PH\to \cPZ\rho$ and $\PH\to \cPZ\phi$ at the 95\% CL that are larger than the expected SM branching fraction by more than a factor of 730~\cite{Sirunyan:2020mds}.



In the rightmost diagram both photons could be gluons in which case additional soft-gluon exchange occurs.

Change to

In the rightmost diagram, both photons could be gluons in which case, additional soft-gluon exchange occurs.



The results presented in this paper are based on proton-proton ($\Pp\Pp$) collision data recorded in 2016, 2017 and 2018 with the CMS detector at a center-of-mass energy of $\sqrt{s}=13\TeV$, amounting to an integrated luminosity of 138\fbinv in $ \cPZ \JPsi$ channels and 133\fbinv in the quarkonium pair channels.

Change to

The results presented in this paper are based on proton-proton ($\Pp\Pp$) collision data recorded in 2016, 2017, and 2018 with the CMS detector at a center-of-mass energy of $\sqrt{s}=13\TeV$, amounting to an integrated luminosity of 138\fbinv in $ \cPZ \JPsi$ channels and 133\fbinv in the quarkonium pair channels.



The distribution of the decay angle $\theta$, defined as the angle between the positive lepton direction of flight in the rest frame of the intermediate particle (\JPsi meson or \PgU meson or \cPZ boson) with respect to intermediate particle's direction in the parent particle's (Higgs or \cPZ boson's) rest frame, is proportional to $(1 + \lambda_\theta \cos^2\theta)$.

Change to

The distribution of the decay angle $\theta$, defined as the angle between the positive lepton direction of flight in the rest frame of the intermediate particle (\JPsi meson or \PgU meson or \cPZ boson) with respect to the intermediate particle's direction in the parent particle's (Higgs or \cPZ boson's) rest frame, is proportional to $(1 + \lambda_\theta \cos^2\theta)$.
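
Illustration (not part of the paper text): the shape $(1 + \lambda_\theta \cos^2\theta)$ in the sentence above can be normalized over $\cos\theta \in [-1, 1]$ and evaluated for the usual limiting cases. The function name is hypothetical; the mapping of $\lambda_\theta = +1, 0, -1$ to transverse, unpolarized, and longitudinal polarization is the standard convention and is assumed here.

```python
def decay_angle_pdf(cos_theta, lam):
    """Normalized density in cos(theta) for a shape 1 + lam * cos^2(theta).

    The integral of 1 + lam * c^2 over c in [-1, 1] is 2 + 2 * lam / 3,
    so the normalized density is 3 * (1 + lam * c^2) / (2 * (3 + lam)).
    """
    if not -1.0 <= cos_theta <= 1.0:
        raise ValueError("cos_theta must lie in [-1, 1]")
    if not -1.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [-1, 1] for a non-negative density")
    return 3.0 * (1.0 + lam * cos_theta ** 2) / (2.0 * (3.0 + lam))


# lam = +1: transverse, lam = 0: unpolarized, lam = -1: longitudinal
for lam in (1.0, 0.0, -1.0):
    print(lam, decay_angle_pdf(0.0, lam), decay_angle_pdf(1.0, lam))
```

For longitudinal polarization the density vanishes at $|\cos\theta| = 1$, which is why the lepton acceptance, and hence the limit, depends on the polarization assumption.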



To suppress muons originating from nonprompt hadron decays, the impact parameter of each muon track,

Change to

To suppress muons originating from non prompt hadron decays, the impact parameter of each muon track,



In case of the $\JPsi$ pair channel, each dimuon has to fit to a common vertex with a probability of greater than 0.5\%.

Change to

In the case of the $\JPsi$ pair channel, each dimuon has to be fit to a common vertex with a probability of greater than 0.5\%.


In the case of the $\JPsi$ pair channel, each dimuon has to have a common vertex fit with a probability of greater than 0.5\%.

changed to

In the case of the $\JPsi$ pair channel, each dimuon has to be fit to a common vertex with a probability of greater than 0.5\%.


For the channel $\cPZ \JPsi\to 4\Pgm$ ($\cPZ \JPsi \to 2\Pe 2\Pgm$), the Higgs boson signal is parameterised with a sum of a Gaussian and a Crystal Ball function~\cite{crystalball} (two Crystal Ball functions) with common mean. Similarly, the Higgs boson signal in the \JPsi (\PgU) pair channel is described with a double Gaussian function (combination of Gaussian and Crystal Ball function) with common mean.

Change to

For the channel $\cPZ \JPsi\to 4\Pgm$ ($\cPZ \JPsi \to 2\Pe 2\Pgm$), the Higgs boson signal is parameterized with a sum of a Gaussian and a Crystal Ball function~\cite{crystalball} (two Crystal Ball functions) with a common mean. Similarly, the Higgs boson signal in the \JPsi (\PgU) pair channel is described with a double Gaussian function (combination of Gaussian and Crystal Ball function) with a common mean.
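
Illustration (not part of the paper text): a minimal sketch of a "Gaussian plus Crystal Ball with a common mean" signal shape as described in the sentence above. The function names and all numerical parameter values are hypothetical and are not taken from the analysis; the Crystal Ball tail is placed on the low-mass side, which is an assumption of this sketch.

```python
import math


def gaussian(x, mean, sigma):
    """Unit-area Gaussian density."""
    t = (x - mean) / sigma
    return math.exp(-0.5 * t * t) / (sigma * math.sqrt(2.0 * math.pi))


def crystal_ball(x, mean, sigma, alpha, n):
    """Unit-area Crystal Ball density: Gaussian core with a power-law tail
    on the low side. Requires alpha > 0 and n > 1."""
    t = (x - mean) / sigma
    a = (n / alpha) ** n * math.exp(-0.5 * alpha * alpha)
    b = n / alpha - alpha
    # Integrals of the tail and of the core, in units of sigma
    tail = n / alpha / (n - 1.0) * math.exp(-0.5 * alpha * alpha)
    core = math.sqrt(math.pi / 2.0) * (1.0 + math.erf(alpha / math.sqrt(2.0)))
    norm = 1.0 / (sigma * (tail + core))
    if t > -alpha:
        return norm * math.exp(-0.5 * t * t)
    return norm * a * (b - t) ** (-n)


def signal_pdf(x, mean, sigma_g, sigma_cb, alpha, n, frac_gauss):
    """Sum of a Gaussian and a Crystal Ball that share the same mean."""
    return (frac_gauss * gaussian(x, mean, sigma_g)
            + (1.0 - frac_gauss) * crystal_ball(x, mean, sigma_cb, alpha, n))


# Hypothetical parameter values, for illustration only
peak_value = signal_pdf(125.0, mean=125.0, sigma_g=1.0, sigma_cb=1.5,
                        alpha=1.5, n=5.0, frac_gauss=0.4)
print(peak_value)
```

Sharing the mean between the two components removes one free parameter and keeps the peak position well defined when the widths differ.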



The scale correction factors in muon identification, isolation and trigger are observed to deviate from unity by less than 2(2), 0.5(0.5) and 1(3)\% for $\cPZ \JPsi$ (QQ) final state.

Change to

The scale correction factors in muon identification, isolation, and trigger are observed to deviate from unity by less than 2(2), 0.5(0.5) and 1(3)\% for $\cPZ \JPsi$ (QQ) final state.


CWR Comments from William Ford

In the context of the Z->J/psi J/psi limit, we might add a reference to BPH-16-001, which observed Z->J/psi mu mu with a BF of about 8E-7. The results here are compatible with that one, being slightly higher limits for a more exclusive final state. Bill

We propose to write: The inclusive Z decay into a J/\psi meson and a lepton pair, which is dominated by the electromagnetic fragmentation process, was observed at a rate consistent with the SM prediction [PhysRevLett.121.141801].

For context: Z->J/psi ll is mostly an electromagnetic process, whereas the J/psi pair decay requires the Z boson to couple to charm quarks. The lepton pair kinematics in Z->J/psi ll mostly do not match those of a lepton pair from a J/psi decay (discussed with A.V. Luchinsky). The SM consistency holds for the BF extrapolated to the full phase space with an assumption about a Z decay ratio, but I think the proposed statement is adequate.

CWR Comments from Vassili Kachanov

Comments to a CMS PAPER HIG-20-008 "Search for Higgs boson decays into Z and J/ψ and for Higgs and Z boson decays into J/ψ or Υ pairs at CMS"

The article is written pretty well. We have several comments and propositions.

1) In order to facilitate comparisons with other results, it may make sense to present exact values of the branching fractions for Z, J/psi, psi(2S), Upsilon(nS) decays into leptons.

We prefer to refer to the specific PDG issue we used.

2) Page 5, lines 186-187 "... candidate from other hadronic activity in the event, a cone of size Delta R = sqrt((Delta eta)^2 + (Delta phi)^2) is constructed around its momentum direction, where PHI is the azimuthal angle in radians." Apparently it is necessary to add a description of the eta: "eta is the pseudorapidity"

Introduced in line 91 (detector description)

3) Pages 5 and 6 (Event reconstruction) The selection criteria for events with 5 or more leptons are not clear. What criterion was used to select exactly four leptons among all selected leptons?

Candidates are built from four leptons taken from a lepton candidate list. Only single four-muon candidate events survive all criteria.

CWR Comments from Albert De Roeck

Thanks for your analysis and paper on the search for Higgs boson decays into Z and J/ψ and for Higgs and Z boson decays into J/ψ or Υ pairs at CMS

The paper contains several searches for different channels and constitutes a nice study, based on using leptonic Z-boson and quarkonium state objects. Congrats on these results. Limits get improved but we are of course still far from expected SM sensitivity. The paper is in general in good shape and I have relatively few comments.

General Comments - I had to read the introduction a few times to get the overview of what the SM expected values on these branching fractions were, for the different channels. I assume for some we do not have a number (eg for the process Z->YY which is neither mentioned here or in [36])? Others are mentioned with a large TH uncertainty it seems (see further comment below).

given in lines 57 and 58

However, if you do get more comments on this section and the organisation of it, with requests for improvements, I support that.

We reworked the introduction.

- Line 45: So do I understand that the ATLAS limits are much better than ours for this channel? I believe they made use of the di-jet decay channel instead of the leptonic one, so this is not unexpected. But by how much are they better at the end? Ie can we give their result here in a BR fraction to be compared directly with ours later?

The opposite: ATLAS publishes a BF > 100% which is unphysical. We improve their limit by about 3 orders of magnitude.

Detailed comments

- line 13: We do claim in CMS to have evidence for H->cc, so the presentation of that result here (factor 2 on the 2-sigma upper limit) is -while correct-- a somewhat "negative" way to present that...

We changed this part. We have added the CMS evidence for H-> mu mu.

- line 33: by how much -- as far as we know-- can these BSM effects enhance the production rate within, say, pheno studies that have been conducted so far? Could we give a yardstick here, for the reader, as a potential possibility for BR enhancement? Even if this is -- as is likely- much below what we can possibly reach with the present data, it will still be important to make the search in the data and report it in a paper as we do here, so no worries with that.

Depending on the channel, the predictions range from 1 to 3 orders of magnitude enhancements (e.g. in H->gamma Y). We prefer to present the references for the SM predictions and BSM enhancements. We spell it out now.

- line 50: "the decay" -> prefer: "the Higgs decay"


- line 59: The predictions for H-> jpsi/jpsi and H->YY in [31] and [35] differ a lot! Do we understand what causes this large theoretical uncertainty and perhaps if the latest one is supposed to be the most reliable? Conversely, if that is not the case, it is an extra motivation/argument to actually search for these decays, as it will also advance the SM theory understanding for these processes. As I understand our result is already closing in on the expected values reported in [31].

The relative contribution of direct versus indirect amplitudes is chosen differently in the literature. If the YY channel BF is not found to be at O(10^-5) as in [31], this would indeed help to sort this out. We now point out that this measurement is also still relevant for the SM calculations.

- lines 98-100 are a bit of unusual detail we give normally in the standard detector descriptions in our CMS papers. Is that information of particular relevance for this specific analysis? If not, it sticks out as being odd.

Following your observation, we remove the last (standard) sentence in this paragraph.

- line 141 I assume there is then no specific pT requirement on the dimuon system for the 2016 data period, as it is not mentioned here..? just checking


- line 185: "leading muon" is not defined, I believe, but is continued to be used later. So I suggest to define it here (the muon with the largest pt?)

we add … leading (highest pT) muon ..

- lines 267-268 are just an exact copy of the lines before -> remove!


- line 261: More details should be given on the alternative background functions or reference given to it. "negligible" means exactly what here: much less than eg 1%? Perhaps add that here, as it does not come back in the systematics section.

We now write that a possible bias from the choice of the background parameterization is probed with alternative functional forms (a second-order Chebyshev polynomial or a power-law function) and is found to be negligible.
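
Illustration (not the procedure used in the analysis): a much simplified stand-in for such a bias probe. A reference background shape (second-order Chebyshev) is replaced by a power law fitted to it, and the predicted yields in a signal window are compared. The mass range, window, coefficients, and all function names are hypothetical; the real study would fit data or pseudo-experiments together with a signal component.

```python
import math


def chebyshev2(m, lo, hi, c1, c2):
    """Second-order Chebyshev shape 1 + c1*T1(x) + c2*T2(x), with the mass m
    mapped linearly onto x in [-1, 1]."""
    x = 2.0 * (m - lo) / (hi - lo) - 1.0
    return 1.0 + c1 * x + c2 * (2.0 * x * x - 1.0)


def integrate(f, a, b, steps=2000):
    """Midpoint-rule integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


def fit_power_law(f, lo, hi, points=60):
    """Return (norm, exponent) of norm * m**exponent from a straight-line
    least-squares fit of log f versus log m. Requires f > 0 on [lo, hi]."""
    ms = [lo + (i + 0.5) * (hi - lo) / points for i in range(points)]
    xs = [math.log(m) for m in ms]
    ys = [math.log(f(m)) for m in ms]
    xbar = sum(xs) / points
    ybar = sum(ys) / points
    slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((x - xbar) ** 2 for x in xs))
    return math.exp(ybar - slope * xbar), slope


def window_bias(f_ref, lo, hi, win_lo, win_hi):
    """Relative difference of the window yield, alternative vs reference."""
    norm, exponent = fit_power_law(f_ref, lo, hi)

    def f_alt(m):
        return norm * m ** exponent

    # Rescale the alternative so both shapes have the same total yield
    scale = integrate(f_ref, lo, hi) / integrate(f_alt, lo, hi)
    ref = integrate(f_ref, win_lo, win_hi)
    alt = scale * integrate(f_alt, win_lo, win_hi)
    return (alt - ref) / ref


LO, HI = 110.0, 140.0  # hypothetical fit range in GeV


def reference(m):
    return chebyshev2(m, LO, HI, c1=-0.3, c2=0.05)  # hypothetical coefficients


bias = window_bias(reference, LO, HI, 120.0, 130.0)
print(f"relative yield difference in window: {bias:+.3%}")
```

A small relative difference in the window is what "negligible" would mean in this toy picture; the threshold for calling it negligible is a choice left to the analysis.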

- line 354: expected -> observed


CWR Comments from Sijin Qian

The Type-A comments were all addressed

CWR Comments from Andreas Meyer

thanks a lot for the interesting analysis and paper draft which I found well written. Below are a few comments and questions.

All the best and good luck towards swift publication! Andreas

line by line (type B)

* lines 11-14: The formulations in this passage don't quite reflect the current status. There is evidence for H->mumu (3.0 sigma observed by CMS alone), and a signal strength has been measured at the 30% level ~1.2+/-0.4.

line 11: suggest to replace "measuring" by "observing" or rephrase completely

line 13: suggest to also state that current results on H->mumu are consistent with SM expectation.

Updated the two lines to include the recent H->mumu results: Rare exclusive decays of the Higgs boson to mesons provide experimentally clean final states to study Yukawa couplings to quarks, and physics beyond the SM branching fractions that cannot be obtained with inclusive measurements. The required sensitivity for observing Yukawa couplings to second- and first-generation fermions has not yet been reached. Recently, the CMS collaboration published evidence for the Higgs boson decay to a pair of muons []. The $\rm{\bar{c}c}$ in inclusive measurements is found to be approximately 70 times the SM expectation~\cite{Aaboud:2018fhh,2021135980}.

* Figure 1: Curious, can/should we say sth about the size of the interference terms of these amplitudes?

The theory publications conduct such discussions. Estimates are very channel dependent and in some cases can be very sizable (where in the SM destructive interference between dominant amplitudes is expected). We rather generalize the discussion to include all types of decay channels here and refer to literature.

* line 47: would replace "1" by "unity"


* line 84: should we give an explanation about why, in the quarkonium pair channels, the luminosity is lower, 133fb-1 ?

This is due to a delayed deployment of the quarkonium triggers at the beginning of Run-2. We decided to add this information to the sentence.

* line 152: Suggest to write sth like "Z boson events are generated using the Pythia 8.226 generator [56] which implements leading order matrix element calculation interfaced with parton shower, for which the tune CUETP8M1 is used".


* line 153: suggest sth like: "the INCLUSIVE production cross section PREDICTION includes"


* line 155: suggest sth like: "The generated events are then reweighted to match the pt-spectrum of the Z-boson predicted at NLO."


* line 226: It is unclear to me how the widened mass window is used to monitor the sideband population. Naively, one would expect that one can always plot the m_mumu distribution to look at the side band, and then, for the fit of the Higgs candidate, only take events from within a narrow window consistent with signal.

Your statement is correct. A small amount of background shows up in the sub-leading J/psi at this stage. The change in interval size, though, does not weigh into the final optimization after all criteria are applied, as the background becomes marginal. The sentence informs the reader that the interval was chosen wider to follow the background during optimization.

* line 313: It is unusual to assign systematic uncertainties due to changes in detector conditions. We usually model detector conditions year-by-year (or era-by-era) and apply corresponding uncertainties, including estimates for their correlations. If this is not the case here, it will be useful to add a sentence to explain this.

A common parameterization for the signal model was used for the entire run period. The relative dependence of the upper limit branching fraction was found to be very small.

Type A

The Type-A comments were all addressed

CWR comments Pisa Group

Congratulations on the analysis and on the nice results. Here are the comments of the Institutional Review of the paper HIG-20-008 by the CMS Pisa Group.

General Comments: The paper is very well written. To further improve its clarity we would like to suggest the following changes:

● make clearer when you are addressing the Z J/ψ decay or the quarkonia pair decay modes;

● throughout the full paper (for reasons of compactness?) you always write sentences like "X (Y) ... is ..... V (W)". We think that it would be clearer and less heavy to read if you wrote "X and Y ... are ... V and W, respectively".

Included as part of several improvements.

Type A Comments

The Type-A comments were all addressed

Type B:

Abstract: For the first time, decays of the Higgs boson into a Z boson and J/ψ meson (...) at the LHC -> Not relevant since ATLAS has done this measurement already. It should be removed here and in lines 69 and 346.

First time in leptonic final states. While we present the ATLAS attempt, it is an unphysical result.

Abstract 4th line write “ Higgs and Z boson decays” like in the title and drop the last sentence.

Done, but also fix the next sentence: … The Z boson in the Higgs boson decay is detected …

"Quarkonium resonance" is a pleonasm. Onia are by nature short-lived particles.


Abstract 6th line: remove the sentence and start the next sentence with “As no significant excess is observed …”

Abstract: remove everything after “Higgs decay branching ratios …”: there are no numbers, so leave the list for the summary; instead add the numerical upper limit for H to Z J/ψ for the longitudinal polarization case. This way the abstract is shorter and much more appealing.


1. Introduction

Lines 7-15: This paragraph would benefit from a reorganization of the statements: first the lack of understanding of why the Yukawa couplings are what they are, then the lack of sensitivity for the 2nd generation couplings and, finally, the possibility to recover such lack of sensitivity by exploiting rare decays.


Line 21: missing references about 3 and 5 orders of magnitude larger…

Same references as quoted in sentence before.

line 28-29 the sentence is unclear. In the last graph there are two loops.

The loop connecting to the Higgs boson – we make this more clear.

It is not clear if the sentence about the study of the indirect processes (lines 36-38) and the following sentence about Yukawa couplings (lines 38-44) are related. If they are related this should be clarified, if they are not, the second sentence should be modified to make clear we are discussing something different

Sentence moved now into first paragraph. Sentence L36-38 removed.

In the paragraph of line 45-49 it is not nice that in one case the absolute value of the result is presented (BR limit close to 1) and in the other case the ratio with respect to the SM predicted value is quoted (730 times larger than the SM value). Both results should be presented in the same way.

For our result we also state BF, only, as does ATLAS.

line 46-47 what does it mean a BF that exceeds 1 ?


line 50 “third related” what was the second ?

reorganized the introduction

the sentence of lines 52-53 should be improved: what does it mean that an amplitude is replaced by “the same direct coupling”?


line 61-62. The sentence “leading to an increase of an order of magnitude in the related B…”: Do they mean that the BR(H->J/psi J/psi) is an order of magnitude larger than BR(H->J/psi gamma)? Sentence to be improved

revised: Inclusion of the mechanism ..

The sentence of lines 64-65 should be moved earlier, at the end of the review of the predictions and before the sentence that starts in line 62 with “Recently…”


line 66-68 this sentence is unclear, what are the benefits? Clarify or delete.

The three sets of results in this paper should be listed in a clearer and more easily readable manner. For example the fact that only for the third set a new paragraph is started is a bit confusing. In particular, the sentence of lines 75-78 has to be improved. The attempt to clarify which analyses are new (ψ(2S) and Y(1S)Y(1S)) and which are just an update with higher luminosity (J/ψ J/ψ and Y(mS)Y(nS)) turned into a sentence very difficult to read and with formal inconsistencies (when “quarkonium pair decay channels” are referred to in line 76, all the cases are covered!)

Revised. We now write specifically "in \JPsi and $\Upsilon$ pair decay channels": Furthermore, this Letter presents an update of the searches in \JPsi and $\Upsilon$ pair decay channels with higher luminosity, and new channels accessible via the inclusive decay of psi(2S) into a \JPsi meson.

2. CMS Detector

Line 87: Central feature of the CMS apparatus is a superconducting solenoid. Remove or explain "Central Feature".

Reformulated the sentence.

Missing description of the PPS detectors.

Not part of the standard description and not used here.

3. Simulated Samples

Since only signal MC samples are generated and the signal is a rare decay, we should find a way to convince the reader that the simulated events are meaningful and sensible.

The simulation procedure is well established (e.g. H->ZZ*). The dependence on signal shape parameters is weak (see systematics).

line 166: please define " "

The parameters are defined within this paragraph.

4. Event Reconstruction

Figs. 2, 3, 4. When the candidates are selected, are the four momenta of the two muons originating from Onia refitted taking advantage of the knowledge of the Onia mass (e.g. is the information that the J/ψ is at 3.1 GeV used at all)? This is a standard procedure to improve invariant masses such as the ones shown in the figures, however it is not clear if it is done or not.

The dependence on the variation in the onia invariant mass is negligible compared to the momentum resolution of the high momentum muons. In general, while not relevant here, a fit constrained to an intermediate particle mass introduces a bias in signal and background.

Missing control plots for the , and of the J/ψ and other mesons.... Moreover, this is useful for future references.

The relevant information is included in this Letter with the description of intervals. CMS published measured di-lepton spectra. In the extreme that there is no four-muon signal, as is in the YY channel, there is no corresponding di-lepton signal.

elements like “stations” (line 177), “muon station" (line 180) and “x and y” (line 180) are not defined and should not be referred to unless they are defined.


In addition the reader is invited to look at the definition of the reference system in line 129-130 and there “x and y” definitions are not the one of line 180 (which refers to the muon chamber local coordinates).

remove both ..

line 185: “to isolate” sounds odd, as if the authors of the analysis “isolate” the muon while they only measure its isolation

To measure the isolation of ..

line 192. It is not clear how the electron candidates are defined in the first place.

The energy of electron candidates .. Removed duplication and define candidate in detector section now.

lines 206-208. It is not clear if this sentence includes the statement of the sentence in lines 184-185. After all, two muons and two leptons includes the four muons case. A better description of the event selection(s) is needed. It has to be clarified that what is described in lines 184-185 and 206-208 is a PRE-selection which is followed by the selection described later

Place sentence upfront.

line 211: the definition of “direct decay” should be added here. Is it similar to the one of “direct signal” of line 280?

It is the same. Fixed the sentence.

line 227: the meaning of the word “monitoring” is not clear for the purpose of this analysis. To be improved

We state now: … to allow monitoring the reduction of the sideband population

line 234 remind that n, m = 1, 2, 3


5. Signal Extraction

Figure 4: The definitions of the red lines (dashed and dot-dashed) in the left plot are swapped: dashed is the Higgs and dot-dashed is the Z


lines 267-268. This sentence appears twice (lines 265-266) and it has to be deleted


line 277: please specify the meaning of "feed-down signal" as J/ψ from ψ(2S)

The sentence is defining this:

Explain better which fits are performed. Lines 276-277 seem to indicate that more than one fit is performed: are they direct signal only, feed-down signal only and direct+feed-down combined signal fits?

Separate fits are performed for the different signal hypotheses: only direct, only feed-down. While direct+feed-down was also performed, it makes no difference. This will likely change once a signal is found.

6. Systematic Uncertainties

Line 286: Most of the systematic uncertainties affect only the normalisation of the simulated signals -> Not really true. You should consider the systematic on the final reconstructed physics objects candidates. Furthermore, the event yields when reconstructing J/ψ using only low tracks (tracker) instead of only the tracker+muon detector candidate types, the PU correction per year, the detector degradation and so on… We suggest removing this line.


7. Results

The consequence of the statement of line 340-341 is not clear: does it mean that B(H->Y(1S)Y(1S)), for example, is actually a measurement of B(H->Y(1S)Y(1S) X) where “X” are the possible decay products of Y(2S) into Y(1S) X? The interpretation and the presentation of the results of B(H->Y(1S)Y(1S)) and B(H->Y(mS)Y(nS)) have to be improved, otherwise we obtain that the limit on B(H->Y(nS)Y(mS)) with m=1 and n=1 is more stringent than the limit on B(H->Y(1S)Y(1S)). A possible solution is not to consider the assumption of line 336, to quote the limit on B(H->Y(nS)Y(mS)) for each pair (n,m), and to quote the limit on the sum of those branching ratios. The same comment applies to the limits of the Z decays into Y pairs (lines 356 and 357).

More transitions in the wider interval lead to a lower limit. We reworded. Also: … the Y states could be the result of .. (not just one)

8. Summary

line 362: replace “uniformly polarized” with “non polarized”

removed sentence

CWR comments from the University of Zurich

Congratulations on the analysis and nice results. Here are the comments from the University of Zurich for the Institutional Review.

Type A

The Type-A comments were all addressed

Type B:

Everything else (e.g. strategy, paper structure, emphasis, additions/subtractions, etc) Generic.

It’s not very clear if for the acceptance measurement the full space is taken into account or if it’s more fiducial-like.

By definition, these are fully corrected BFs

L 34/35 - Please give a reference for this statement

References are the same in next sentence.

Possible self-plagiarism in the introduction. L36-L44 contain some sentences that are almost exactly the same as the paragraph below Fig1 here: https://arxiv.org/pdf/2007.05122.pdf. Perhaps this could be slightly rephrased to avoid having the exact same wording

Done – introduction is significantly reorganized and reworded

L50: Adding Feynman diagrams may help to better understand this class of decays.

- Reorganized/reworded the paragraph. This class is mentioned as related, only now.

L54: “both photons could be gluons” → “both bosons could be gluons”? Perhaps it’d be better “could be replaced by”

… both vector bosons could be also gluons …

L77: Adding a clear definition of "feed-down decays" helps


L84 Maybe we missed it, but is it explained anywhere why the integrated luminosity for the quarkonium pair channels is lower? Was the trigger for these channels introduced later, or was it always there from the beginning but simply inactive during part of data taking?

Added : … somewhat less due to a delayed trigger deployment.

L 148: I’d add here the references you have on L166.

Remove longitudinal polarization here, as explanation follows.

L165 - It is hard to understand the definition of theta. Maybe rewording or adding a figure helps.


Sec. 4 - A table per each channel could nicely summarize the event selection criteria. Reading through all the details of the cuts and criteria like vertex probabilities per decay channel distracts the reader from the big picture in the event reconstruction.

We do not opt for a table as each of the criteria and variables anyhow need to be described in the text. We keep repetition to a minimum.

L229: why the two separate vtx probability cuts (jpsi+jpsi vs Z+jpsi)?

Each channel is optimized separately due to differences in kinematics and backgrounds.

L231: it is stated that the criterion removes 20% of selected events… perhaps it is better to have a statement to justify the criterion so we don’t rely on observed data?

We state now upfront that further optimization is performed to obtain the best 95% CL upper limit.

L234-L235 Y(nS) (n>1, presumably) and Y(1S) are treated differently here, and Y(1S)Y(1S) is also treated differently from Y(nS)Y(mS) later on in the paper. Why is this necessary? Maybe you can add a sentence to the paper to explain this as well.

The only difference is the di-lepton invariant mass interval as described in the sentence in line 234: An Y pair candidate …

L237 "to suppress random combinations": possibly rephrase; we think what's meant is combinations where you pick up two muons with one of them from one Y and the other from the other Y, but it's not totally clear and "random" could also mean something else (other objects misidentified, etc.)

The ‘in-channel’ combinatorial background is negligible. This really refers to random lepton pairings.

L270 This might be a good place to quote the world-average value that is used as well (the reader might not want to look it up in the reference)

We prefer to refer to the specific PDG issue.

L319-L321 and Table 1: These seem a bit out of place as it describes acceptance of signals with different polarization assumptions, not with systematic uncertainties. Consider moving to Section 3 or 4.

We apply these variations in the result table – hence the table and description are not needed and we erase them.

L338 "It is assumed that one of the Y states could be the result of one step transition" -> Two questions: 1) "one step transition" sounds like transitions Y(nS)->Y((n-1)S) are meant, but Y(3S)->Y(1S) is also listed. Would that count as a one-step transition?

yes: this is a direct transition

2) It is mentioned that one of the Y states could be the result of such a transition, do you also include the case where both Y candidates are the result of a transition, or is this a negligible contribution?

This is negligible

L340 "Consequently, in the Y(1S)Y(1S) final state the feed-down transitions from higher Y states are included" -> But this is also the case for generic Y(nS)Y(mS) final states, as long as n and m are not both 3, correct? Why is Y(1S)Y(1S) mentioned separately here?

For the Y(1S)Y(1S) final state, the di-lepton invariant mass interval is restricted to include only Y(1S). The sentence is there to state clearly that the assumption of equal Higgs coupling is applied here, too.

L344 "higher than the value from earlier SM predictions" -> This sentence refers to the upper limit on the branching fraction, but you then say they are higher than 'earlier SM predictions' - what is earlier about them? This seems a contrast with the previous sentence where it is simply stated that the upper limit on the H->ZJ/psi branching fraction is 826 times above the SM prediction.

Remove earlier

L344: Why are only the limits on the H->ZJ/psi branching fraction and the H->YY branching fraction compared with the SM expectation? Several other decay channels are considered in this paper.

These two come closest. We add the sentence:

The factors are larger in all other channels.

L348 The search for Z boson decays to Jpsi or Y pairs is not mentioned here

Covered with additional sentence above.

L360 Consider removing the reference to table 2? If you do include it, 'table' should always be written in full.


CWR Comments from Greg Landsberg

Congratulations on completing a nice search for a number of quarkonia decay modes for the Z and Higgs bosons and a well-written paper! Please, find my comments below, split into the physics and style categories.

Type A

The Type-A (Style) comments were all addressed


- Abstract, LL7-8: ... beyond the standard model. However, no evidence for these decays has been observed in any of the channels. Upper limits at 95\% confidence ...


- L3: Refs. [1-7] show a mixture of the original discovery and later follow-up papers. Please, use only the three standard discovery references in this sentence: [1,3,4] and drop the rest.

done

- LL10-14: the discussion is completely outdated, as CMS has already published evidence for the H→μ+μ- decays at over three standard deviations, which you do not even cite [JHEP {\bf 01} (2021) 148]! Thus, you need to soften the statement about the couplings to second-generation fermions [say something along the lines that while there is evidence, the precision has not reached the interesting level yet, and also obviously cite the evidence paper].

Done. We now write: Rare exclusive decays of the Higgs boson to mesons provide experimentally clean final states to study Yukawa couplings to quarks, and physics beyond the SM branching fractions that cannot be obtained with inclusive measurements. The required sensitivity for observing Yukawa couplings to second- and first-generation fermions has not yet been reached. Recently, the CMS collaboration published evidence for the Higgs boson decay to a pair of muons []. The $\rm{\bar{c}c}$ in inclusive measurements is found to be approximately 70 times the SM expectation~\cite{Aaboud:2018fhh,2021135980}.

- LL29-30: the sentence makes little sense, as there are no tree-level HZγ couplings; the only way this decay could proceed is through a loop. Please, remove or modify accordingly this incorrect sentence.


- L51-54: the discussion is very confusing; please rephrase as follows: "... in each diagram, the on-shell Z boson in the lower part is replaced by a quarkonium decay, similar to the process depicted in the upper part. Particularly, in the leftmost diagram, the direct ... to a quark-antiquark pair as in the upper vertex. In the rightmost diagram, both vector bosons could be also gluons, in which case, ...


- L145: specify the order at which it was generated [next-to-leading order (NLO)].

We write: The SM Higgs boson signals are simulated at next-to-leading order (NLO) in perturbative QCD with the POWHEG 2.0 …

- LL163-164: particles don't "fly"; planes do! Please rephrase as "positive lepton momentum in the ...".

done

- LL217,242: the use of "single" is very confusing here, as it could be understood that you require only one reconstructed Z/meson. I think what you mean is that you do not accept events with multiple candidates. If so, just state this once, after you discuss the selection. Or simply drop the word "single" here.

We drop ‘single’; no events with multiple candidates survive.

- Figures 2-4: in the legends, add the × sign to all the branching fractions, e.g., "B=1.9×10−3".


- LL267-268: remove the sentence, as it's a repetition of the previous one!


- LL282-283: I don't understand this sentence. Found how? Since you do not have any signal in the data, you can't infer this from data. Since you simulate signals separately, how can you make such a statement based on simulated samples? Please, rephrase for clarity or drop this sentence.

Remove sentence.

- LL314-316: remove this sentence. You treat all the systematic uncertainties as nuisance parameters, as you state later on LL326-327; there is no need to single out the signal modeling systematic uncertainty here.


- LL358-360: remove this sentence as too detailed for a summary [also, the summary shouldn't point back to the main text!];


CWR Comments from Sergey Polikarpov

Congratulations on this important analysis reaching the final stages of the review!

I have read the CWR version of the paper and have some comments listed below.

Type A

The Type-A comments were all addressed


Abstract: It seems the third and the last sentences can be merged.

We modified the abstract.

General: why did you not add a search for H --> Z Y decays? (should be rather straightforward in the framework of the existing analysis)

Executive decision. They are under investigation; different channels require different optimizations.

General: I think you should cite BPH-16-001 result, which is quite relevant for your paper: http://cms-results.web.cern.ch/cms-results/public-results/publications/BPH-16-001/.

We include a reference to the Z-> Psi ll analysis paper now.

Introduction: When giving branching fractions, it is not clear if the values provided include the quarkonium branching fractions into dimuon (or Z to dilepton). I suggest to clarify this.

By convention, the branching fractions refer to the decay up to the level stated in the brackets.

Section 4: How clean are the Z, Jpsi, Upsilon peaks after the selection? As far as I understand, it is assumed that every dilepton pair in the range 80-100 GeV is a Z boson, every dimuon with 3.0<m<3.2 GeV is a J/psi, and every dimuon with 9.0<m<9.7 GeV is a Y(1S). How reliable are these assumptions? If I understand correctly, using these assumptions brings you to a conservative upper limit estimation. Is this correct?

The selection was optimized, including di-lepton mass intervals, to obtain best upper limits. Non-resonant background becomes insignificant.

Sergey Follow-up: Please show the Z, Jpsi, and Upsilon peaks after all the selection requirements (widening, for each of the 3 cases, the respective mass window, to have a clear view on the peak sidebands). What do you mean "become insignificant", please quantify it (I think this would also be useful information for the paper, i.e. provide the numbers for the purity of the Y, Z, Jpsi signals). Checking the AN, I found quite significant background under Jpsi in Figure 3. Cutting the mass window 3.0--3.2 GeV would leave you with approximately 30% nonresonant mumu and 70% signal Jpsi-->mumu. This means that the upper limit you are setting is not on H-->Z Jpsi process, but on H-->Z mumu process where mumu mass is in the range [3.0, 3.2] GeV. If the background from non-resonant dimuons was subtracted, you would have obtained stricter upper limit on H-->Z Jpsi, compared to what you currently report. This is because H-->Z Jpsi events are a subset of H-->Z mumu you select.

The situation is even worse in the case of the Y(1S) signal, where in figure 24 of AN2019_232_v10.pdf it is seen that the signal and background are of the same level for both the first and second Y(1S) candidates. This basically means that about 50% or more of your 4mu candidates in the YY channel are from fake Y meson candidates. By subtracting the non-Y background, you would reduce very significantly the background in the m(4mu) distribution and arrive at a considerably stronger upper limit. What you currently report for Br(H-->Y(1S)Y(1S)) is an upper limit on Br(H-->4mu) where the two OS dimuons fall into the mass range of the Y(1S) meson, and, as your data shows, more than half of such dimuons are not from Y(1S) decay. I understand that the currently reported limits are the first and/or world-best, but I don't think this is a good enough argument to throw away the possibility to improve them even more, if it would not imply significant additional effort. And, from what I can see, these are not 1% better or 2% better limits, but a significant improvement, at least for the YY case.

Have you considered subtracting the non-resonant backgrounds? It can be done in several ways, e.g. using mass sidebands of a resonance, or using wrong-sign data, or (the most sensitive approach) through a multidimensional likelihood fit (in case, e.g. H->Z Jpsi, a 3-dimensional fit to Jpsi mass, Z mass and JpsiZ mass).

The fits are based on 4l invariant masses and no improvement is obtained including the dilepton invariant mass distributions.

Sergey follow-up: I think you did not understand my question/suggestion. You claim that "no improvement is obtained including the dilepton invariant mass distributions" --- is this documented somewhere? I have not found anything related to this in the AN. How did you conclude that there is no improvement? What I propose is, for example, for H-->Y(1S)Y(1S) case, -- to perform a 3-dimensional fit that would allow to extract directly the signal of H-->Y(1S)Y(1S), with non-Y(1S) dimuons statistically subtracted. You can refer to Y(1S)Y(1S) cross section measurement paper BPH-18-002 where a two-dimensional fit on the two dimuon masses was performed to extract YY signal.

If the non-resonant backgrounds in the selected mass windows are significant, using these approaches would result in considerable improvement in the obtained upper limits.

see above
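
For illustration only (not part of the analysis or of the answers above): the mass-sideband subtraction suggested by the reviewer could be sketched as follows. The sketch assumes that the non-resonant background is linear in the dimuon mass and that the sidebands sit symmetrically around the signal window, so that the sideband count scaled by the width ratio estimates the background under the peak. The function name, the sideband edges, and the toy masses are hypothetical.

```python
def sideband_subtracted_yield(masses, window, sidebands):
    """Estimate the resonant yield in `window` from a scaled sideband count.

    Assumes a background that is linear in mass and sidebands placed
    symmetrically around the window (assumptions of this sketch).
    """
    lo, hi = window
    n_window = sum(1 for m in masses if lo <= m < hi)
    n_sideband = sum(1 for m in masses if any(a <= m < b for a, b in sidebands))
    sideband_width = sum(b - a for a, b in sidebands)
    # Scale the sideband count to the width of the signal window.
    n_background = n_sideband * (hi - lo) / sideband_width
    n_signal = n_window - n_background
    purity = n_signal / n_window if n_window else 0.0
    return n_signal, n_background, purity


# Toy dimuon masses in GeV (hypothetical numbers, not analysis data).
toy_masses = [2.95, 3.05, 3.10, 3.10, 3.10, 3.15, 3.25]
n_sig, n_bkg, purity = sideband_subtracted_yield(
    toy_masses, window=(3.0, 3.2), sidebands=[(2.9, 3.0), (3.2, 3.3)]
)
print(n_sig, n_bkg, purity)
```

A multidimensional likelihood fit, as also proposed by the reviewer, would use the full mass shapes instead of simple counts; the counting version above only conveys the idea of a purity estimate.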

Section 4: Why are you not using mass constraints on the narrow dilepton resonances in the reconstruction? These procedures are very commonly used in B-physics analyses with quarkonium resonances and allow one to improve the invariant mass resolution very significantly. For example, when reconstructing Z Jpsi events, in the vertex fit to 4 muons, one can add a requirement that the post-fit value of the mass of the Jpsi-dimuon exactly matches the PDG value. This should significantly improve the mass resolution, allowing to reach stronger limits (since the peak will become narrower). Same for all other channels considered in the paper. In case this is not possible for some reason, at this stage of the analysis, you can try to use the mass difference variable, e.g. M*(Z Jpsi) = M(Z Jpsi) - M(Jpsi) + MPDG(Jpsi). This would approximately cancel the Jpsi reconstruction uncertainty and should result in a narrower peak in the M* distribution. This also makes sense for Y, while for Z the natural width is comparable to the resolution and it should be checked in MC if there will be a relevant improvement.

The variation in the dilepton invariant mass is negligible compared to the momentum resolution of the high-pT muons.

Sergey follow-up: the answer is very unclear. I don't understand what you mean by the sentence above and which studies you have done to arrive at this conclusion. Please make it clearer and elaborate in more detail. Example studies I would like to see in the answer are:
- comparison of the signal shapes (on the signal MC) for the H --> Z Jpsi process, for the two mass variables: M(4l) and M*(4l) = M(4l) - M(mumu from Jpsi) + MPDG(Jpsi).
- comparison of the signal shapes (on the signal MC) for the H --> Y(1S) Y(1S) process, for the two mass variables: M(4mu) and M*(4mu) = M(4mu) - M(mumu from 1st Y(1S)) - M(mumu from 2nd Y(1S)) + 2*MPDG(Y(1S)).

In each of the two cases I would expect the modified M* mass variables to provide a better mass resolution, allowing to set a stricter upper limit. The mass resolution affects only the signal, however, any improvement (even as low as 5-10%) in the mass resolution would result in a better upper limit. And, if this comes for very little effort, I don't see why not to do it. One may have a concern that the mass difference variable can bias the background shape; this can be checked on data sidebands and in MC simulations. Even if the shape will be slightly changed, it will remain smooth and can be fitted with an analytical function, as is done now. Many analyses have successfully used this mass difference approach, including BPH-18-002. http://cms-results.web.cern.ch/cms-results/public-results/publications/BPH-18-002/index.html
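
For illustration only: the mass difference variables M* defined in the follow-up above can be computed as in the sketch below. The PDG masses are rounded values in GeV, and the function name and the example input masses are hypothetical; no statement about the resulting resolution is implied.

```python
# Rounded PDG masses in GeV; the exact digits are an assumption of this sketch.
M_PDG = {"Jpsi": 3.0969, "Y1S": 9.4603}


def m_star(m_total, reco_resonance_masses, pdg_resonance_masses):
    """M* = M(total) - sum of reconstructed dimuon masses + sum of PDG masses."""
    return m_total - sum(reco_resonance_masses) + sum(pdg_resonance_masses)


# H --> Z Jpsi: M*(4l) = M(4l) - M(mumu from Jpsi) + MPDG(Jpsi)
m_star_zjpsi = m_star(125.30, [3.15], [M_PDG["Jpsi"]])

# H --> Y(1S) Y(1S): M*(4mu) = M(4mu) - M(1st Y) - M(2nd Y) + 2*MPDG(Y(1S))
m_star_yy = m_star(125.10, [9.40, 9.55], [M_PDG["Y1S"]] * 2)

print(m_star_zjpsi, m_star_yy)
```

Comparing the widths of the M and M* distributions on signal MC would then show whether the substitution narrows the peak, which is the study requested in the follow-up.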

Fig 4 caption: there is no description of the left panel. I would suggest putting the Y1S Y1S plot on the left and YnS YmS on the right.


Fig 4: why is the signal shape so asymmetric for H-->YnsYms, while all other signal shapes appear to be symmetric?

The different Y-state transitions show up here.

Fig 4: signal line styles appear to be swapped between Z and H for the left panel in the legend.


Line 270-271: how different are the mean values from the nominal masses of Z and H? (in other words, how miscalibrated in the momentum scale, according to MC?).

This is described in the systematic uncertainties and accounted for.

Figures 2-4: Guidelines explicitly say not to use horizontal error on data points if the bin widths do not change.

Since we have empty bins for which the points are not shown, we keep the horizontal bars.

Sergey follow-up: From TWiki https://twiki.cern.ch/twiki/bin/view/CMS/Internal/PubGuidelines#FigsandTabs

"Horizontal bars on data points in one-dimensional plots should only be used to indicate the binning if the bin width varies across the plot. In that case, the meaning of the horizontal bars should be clearly explained in the accompanying figure caption."

I think this clearly states that you should remove the horizontal error bars. In case of doubt you can consult with PubComm. In addition, here is a list of several CMS results with empty bins:
http://cms-results.web.cern.ch/cms-results/public-results/publications/BPH-21-004/index.html
http://cms-results.web.cern.ch/cms-results/public-results/publications/BPH-20-004/index.html
http://cms-results.web.cern.ch/cms-results/public-results/publications/BPH-17-004/index.html
http://cms-results.web.cern.ch/cms-results/public-results/publications/EXO-19-019/index.html
All of them go without horizontal error bars.

CWR Comments from Chris Hill

Congratulations on completing a nice analysis searching for quarkonia decay modes of the Z and Higgs bosons. Please, find the comments from Ohio State below (note: due to the holiday period not all of the members of the group were able to provide comments in time for the deadline, nevertheless we hope you find our input useful).

Main comment:

0) The introductory paragraphs where theory/motivation are discussed go on too long for a paper of this length. It simply takes too long to get to the point of the paper. ~1 page instead of >2 would be more appropriate. This reduction should be easy to achieve as there is some discussion that seems to be superfluous. For example, the entire second paragraph can, and should be, removed. Similarly for L66-68.

We revised the introduction, including your suggestions.

Type A

The Type-A comments were all addressed

CWR Comments from Colin Jessop

Congratulations on the multiple nice results presented in this paper. The analysis appears sound and the paper is well written. I have no significant criticisms and the minor comments I had were already made by others so I recommend to proceed to publication after addressing these

CWR Comments from Ioannina group representatives

I am submitting comments as HIG PubCom chair on behalf of Ioannina group representatives (Costas Fountas and John Strologas), because of a technical problem in their submission.

The Type-A comments were all addressed

Final Reading Comments from Michel Della Negra

Dear authors,

This is a clear analysis and a well written paper. I have only few comments below.

HIG-20-008 v18 PLB FR comments




"at CMS" -> "in pp collisions at sqrt(s) = 13 TeV"




Break the first sentence:

"Decays of the Higgs boson into a Z boson and a Jpsi or psi(2S) meson are searched for in four-lepton final states with the CMS detector at the LHC. A data set of pp collisions collected at sqrt(s) = 13 TeV and corresponding to an integrated luminosity of 138 fb−1 is used."


Last sentence: "upper limits" -> "observed upper limits"


Type B


Line 98-101: replace [40] by EGM-17-001 published in JINST for electron reconstruction at 13 TeV. Also update the text "ranges from 1.7 to 4.5%." -> "ranges from 2 to 5%."


Line 277: [68] is a duplicate of [40], both should be replaced by EGM-17-001?

done - and reference fixed

Line 281:"Differences in the lepton resolution and momentum scale" -> "Differences in the lepton momentum scale and resolution" (lepton resolution has no meaning)


Lines 286-291: "Theoretical uncertainties..." uncertainties in which physical quantity? Looking at the AN I understand that these uncertainties are in the production cross-section due to the PDF and alpha_s choice, whilst in the next sentence you quote uncertainties on the same production cross-sections due to the choice of the QCD scale. I think the whole paragraph could be simplified as: "The theoretical uncertainties in the production cross-sections for the Higgs boson (Z boson) are +/-3.2% (+/-1.7%) due to the choice of the PDFs and the value of the strong coupling constant [7, 48, 69], and +4.6/−6.7% (+/-3.5%) due to the renormalization and factorization scale choice [69-72]."


Lines 322-337 (Summary): The summary section does not read well and is a bit too dry. Can you consider using some of the statements in the outlook slide 22 of the approval presentation (Nov 23, 2021)? The second sentence (lines 323-325) should not be on upper limits but on search for quarkonium pairs. In addition it should be moved after the definition of the data set used. Lines 333,334: For limits replace "=" by "<"

Add two sentences at the end of the summary: "The observed upper limit branching fraction for $\PH \to \cPZ \JPsi$ is about 826 times the value predicted by the standard model. For $\PH \to \PgU\text{(nS)} \PgU\text{(mS)}$ it is about 6 times the value from earlier standard model calculations."

Find below a suggestion for a modified summary:

"This Letter presents the first search for decays of the Higgs (H) boson into a Z boson and J/psi meson in four-lepton final states. Data from proton-proton collisions at integrated luminosity of about 138 fb−1, are used. Using the same data set, decays of the Higgs and Z boson into quarkonium pairs are also searched for. No excess of a Higgs or Z boson signal above background is found in any [...] B(H -> Zpsi(2S)) < 6.6 x 10−3 [...] B(H -> psi(2S)J/psi) < 2.1 x 10−3, B(H -> psi(2S)psi(2S)) < 3.0 x 10−3 [...] and B(Z -> Υ(1S)Υ(1S)) < 1.8 x 10−6."

Implemented the suggestion.



Fig. 3: the dotted green line for H->psi(2S)J/psi is not visible

Change color green to magenta in Figs 2a/b and Fig.3



[40] -> [EGM-17-001] CMS Collaboration, "Electron and photon reconstruction and identification with the CMS experiment at the CERN LHC", JINST 16 (2021) P05014, doi:10.1088/1748-0221/16/05/P05014, arXiv:2012.06888. [68] = [40]


Final Reading Comments from Philippe BLOCH

Thanks for this nice paper of H/Z into quarkonia or Z+quarkonia. The paper is clear and I have very few comments.

L18. One could introduce here the notation (Q) used later on.


Figure 1 rightmost diagram: is it gamma or gamma* , as mentioned in the text L29 ?

gamma* (as in middle figure the top Z is a Z*)

L22: one talks about loops in the last graph, but isn’t it true also for the central diagram?

The middle diagram represents the direct coupling H->ZZ* (following Ref. [16]).

L33. It took me time and I had to go back to ref [21] to understand what exceed unity means. In the ATLAS paper, it is said: assuming the SM prediction for inclusive Higgs boson production, the limits on charmonium decay modes correspond to branching fraction limits in excess of 100%, which is a bit clearer. Alternatively, one could say that this does not yield any limit on the BR !

The reader may actually wonder why, with essentially the same data set, ATLAS does not give any limit, while you are obtaining eventually a 1.9E-3 limit. The answer is that they used a different approach, looking for light resonance decay into hadrons (in a light jet). Would it not be worth mentioning that ATLAS used a totally different approach?

We write now: ... ATLAS Collaboration in hadronic final states, arriving ...

Are all the lines 174-180 (about isolation) relevant for the 4-muon channels only, or also for the 2e2mu case? This is not clear.

-if so, I would suggest to create a separate paragraph, starting at end of line 173 (Events with at least 4muons..)

-if not, it should be clarified.

The isolation for electrons is included in the electron MVL selector, described in lines 187 ff, with isolation addressed in line 189.

I would also suggest starting a new paragraph at line 197. This way we would have one paragraph on Z J/psi, one on the J/psi pair channel (L207) and one on the Upsilon pair (L220). One of the difficulties of the paper is that one jumps from one channel to the next, and it is important to separate the various presentations well.


L224 the DeltaY or simply DeltaY ?

not sure; write: .., the |DeltaY| value has ..

L247 The bias is found to be negligible (otherwise ‘it’ refers to the subject of the previous sentence which is the background)


L 324 (summary) Higgs boson and Z boson decays

done (also in answer to Michel)


The green on Figures 2 and 3 is totally invisible in black and white. Maybe try a darker green?

Changed color to magenta.

Something strange happened on the HP printer I used at CERN: the Psis in the legends of Figures 2 and 3 do not come right, while they are nice on the laptop screen. Please check on your side, and make sure this is not due to a bizarre choice of font by ROOT (or the program you used to draw the figures).

I printed the paper on the HP printer in Bldg 40 (section B, near elevator: 40-RB-HPBWCOR) last week. The printout is ok.

As often, the FR unfortunately takes place in parallel with an HGCAL meeting (this time our Institution Board, which I must attend). I’ll read the answers to the comments in the twiki and will send an email in case of any remaining issues.

Comments from Sijin

Page 1

(1) L8, the "BSM" should be explained at its 1st appearance in text here; but since it has not been used afterward in whole paper, so can be simply spelled out, i.e.

"Several BSM frameworks predict enhanced Yukawa" --> "Several beyond-SM frameworks predict enhanced Yukawa"

done: Several beyond the SM (BSM) frameworks ...

Pages 3-9

(2) L107-108, L121, L125, L130-L131, L177, L210, L222, L228, L237-238, L257, L268, L288 and L290. These lines may be shortened from

In general, in PLB we are not so pressed for space.

(a) L107: (as the "PU" has been used for only one time in whole paper in the 2nd half of L107, thus it is not necessary to be introduced in the middle of L107, and be spelled out at the end of L121)

"referred to as pileup (PU). The average number of PU interactions" --> "referred to as pileup. The average number of pileup interactions"

We use it now also in Sec. 3

(b) L108, L125 and L268: (to follow some good examples on L70 and L269, etc., the year ranges can be shortened)

#) L108 and L125: (two places)

"2017 and 2018" -- "2017-2018"

These are two distinct data taking (full year) periods.

##) L268: "for the 2016, 2017, and 2018 data-taking years have 1.2–2.5%" --> "for the 2016-2018 data-taking years have 1.2–2.5%"

use standard sentence as provided

(c) L121: "three muons with pT greater than 2 GeV." --> "three muons with pT > 2 GeV."


(d) L130-131: (as the "MC" and "ggF" have not been used afterward in whole paper)

"with the POWHEG v2.0 Monte Carlo (MC) event generator [46, 47], which includes the gluon-gluon fusion (ggF) and vector ..." -->

"with the POWHEG v2.0 Monte Carlo event generator [46, 47], which includes the gluon-gluon fusion and vector ..."


(e) L177: (to follow the good example of L105 and L182, etc. for using the "pT sum" to shorten)

"The sum of the pT of the reconstructed inner-detector tracks" --> "The pT sum of the reconstructed inner-detector tracks"


(f) L210: "have to be within 0.10 and 0.15 GeV," --> "have to be within 0.10-0.15 GeV,"

These are two distinct values referred to as such in the sentence (respectively).

(g) L222: "within the range 9.0–10.7 GeV and 9.0–9.7 GeV," --> "within the range 9.0–10.7 and 9.0–9.7 GeV,"

Not sure - intervals are quite separated in this sentence.

(h) L228 and L257: (two places, as the "m4mu" has been introduced on L218)

"four-muon invariant mass" --> "m4mu"


(i) L237-238: (two places)

"are about 31% and 30%, respectively. For the Z boson, the corresponding values are about 28% and 32%." -->

"are about 31 and 30%, respectively. For the Z boson, the corresponding values are about 28 and 32%."


(j) L288: "are +-3.2% and +-1.7%" --> "are +-3.2 and +-1.7%"

reworded (second value in brackets now)

(k) L290: (also, an extra space before and after the "-" of "- 6.7%" may should be removed, and the last word of line should be plural)

"is about +4.6/ - 6.7% and +-3.5% for the Higgs and Z boson," --> "is about +4.6/-6.7 and +-3.5% for the Higgs and Z bosons,"

changed: stacked first values, second in brackets now.

(3) Figs.2-4, in the legend of each plot, the last two lines, to be consistent in this paper, two spaces should be added before and after the symbol "=", e.g.


"H -> ZJ/psi, B=1.9x10**(-3) H -> Zpsi(2S), B=6.6x10**(-3) " -->

"H -> ZJ/psi, B = 1.9x10**(-3) H -> Zpsi(2S), B = 6.6x10**(-3) "

Fig.3 (the last 4 lines of legend) and Fig.4 (the last 2 lines of legend) are similar.

Figures are independent of the text. And we need to save space.

(4) L285, to be consistent in this paper, the font of electron "e" should be changed from

"in the 4mu (2e(italic)2mu) channel." --> "in the 4mu (2e(non-italic)2mu) channel."


Page 10

(5) L310-311, it will look and sound better if a comma is added after the 1st clause in the sentence, i.e. (also, I'm not sure whether the 1st word of L311 should be plural)

"To calculate their contribution to the corresponding H and Z boson branching fraction the coupling strength of the bosons to ..." -->

"To calculate their contribution to the corresponding H and Z boson branching fractions, the coupling strength of the bosons to ..."


(6) L322-323, in the Summary Section

(a) L322, to be consistent in this paper (i.e. L6) and with all other CMS papers, the "(H)" should be placed after "boson" instead of before, i.e. "of the Higgs (H) boson into a Z boson and J/psi" --> "of the Higgs boson (H) into a Z boson and J/psi"


(b) L323, the branching fraction "B" used in L332-337 should be explained at its 1st appearance in this Section (to be consistent with all other CMS papers, since some readers may only read the Summary Section instead of whole paper), i.e.

"results on upper limits of branching fractions" --> "results on upper limits of branching fractions (Bs)"

done (at first appearance - summary reworded now).

Pages 12-16, in the References Section

(7) L434-436, in [17], this Ref. is identical with the [18], so should be removed.


(8) L538, in [55], in the author part, as the "Group" has the meaning of Collaboration, so the last word can be removed, i.e.

"[55] Particle Data Group Collaboration, “Review of ..." --> "[55] Particle Data Group, “Review of ..."


(9) L549-550, in [60], to be consistent with other CMS papers on this Ref., the author part can be shortened from

"[60] The ATLAS Collaboration, The CMS Collaboration, The LHC Higgs Combination Group Collaboration," -->

"[60] ATLAS and CMS Collaborations, and the LHC Higgs Combination Group,"


(10) L560, in [64], an extra dot before the "Ph.D." should be removed, i.e. (also, an extra space before the closing quotation symbol after the article title should be removed)

"... gammagammapsi ”. Ph.D. Thesis (1980)," --> "... gammagammapsi” Ph.D. Thesis (1980),"

Copy reference from other publications.

Final Reading comments from Rainer

very appealing and interesting and do not see big conceptual issues. Nevertheless, I think the text is still a bit unclear in places, and for a PRL letter it should read very smoothly.

The proposal is to publish in Physics Letters B


General comments:

- the natural width of the quarkonia is very small, at least for the ground states. Would it not be an ideal application for a mass constraint fit?


- "Kalman" (=jargon) should be always replaced by "Kalman filter", which is the proper name of the method. And it is enough to mention the method the first time you refer to the vertex fit, you can just call it "vertex fit" thereafter.


- it is not really clear to me how you reconstruct excited quarkonia states like Y(2S) which frequently cascade down to Y(1S), emitting pion pairs or photons. I can only speculate that you ignore the accompanying particles and just take the dilepton daughters from the ground state, at some expense of the H mass resolution/tails, but from your text it is just not clear. (The vague indication in L309f is not enough and anyway too late.)

Add sentence in line 63 ff: Furthermore, this Letter presents an update of Higgs boson searches in \JPsi\JPsi and $\PgU\rm{(nS)}\PgU\rm{(mS)}$ decay channels with higher luminosity. New channels are accessed via the inclusive decay of $\psi\text{(2S)}$ into an \JPsi meson. For the $\PgU\rm{(nS)}$ states the possibilities that they are the result of transitions from higher $\PgU$ states before decaying into muon pairs are included.


Type B comments:

- L9: leading to branching fractions (B) that are enhanced by ...


- L15: "rate" is imprecise here

We re-write (see next point)

- L16: cite the latest CMS VHcc measurement

The observed upper limit for B(H -> $\rm{\bar{c}c}$) times the cross section for Higgs production in association with vector bosons as measured by the ATLAS and CMS experiments is found to be approximately 70 times the SM expectation [15, 16].

- L51: this nomenclature, Upsilon(nS)Upsilon(mS), is very unhandy. Can you think of a shorthand that you define in the beginning, like "Upsilon" (I believe this greek letter is only used for Y(nS) states, so it would make sense).

We used it in the previous paper (now adopted by PDG). Here we also single out Y(1S)Y(1S). Hence, we propose to keep the present nomenclature.

- L65 and elsewhere: the symbols in these decay modes appear a bit crowded. Consider a small space e.g. between the two J/Psi here.

We removed any space in an earlier version due to a comment on style compliance.

- L72: "channels, where the second number is slightly smaller due to..."


- L84: "after 2016" is imprecise, maybe: "in the technical stop between the 2016 and 2017 data-taking periods"

Done. We write: ... during a technical stop between the 2016 and 2017 data-taking periods,

- L108: "23, and increased to 32 …"


- L125: for the years 2017 and 2018


- L159-161: Ordering issue: It is strange that you refer to the use of muons etc before you have introduced these objects. It is better to move this part after the object definition.

This part is introductory to guide the reader through the following paragraphs - as requested during CWR. Muon and electron are here referred to as physical particles.

- L159: this sentence sounds awkward because muons are also leptons. Better: "two muons and two additional leptons"

done; also use symbols now

- L160: "oppositely charged muons": why not write “mu+mu- pairs“ here, like you do in the following? Be consistent.


- L173: "longitudinal axis" -> "along the longitudinal axis"?

yes, done.

- L199: Each dilepton candidate must fit to a common vertex with…


- L204: Why is the region close to the threshold excluded?

Because the distribution rapidly changes with the opening of the threshold which is difficult to model, while the signal is far enough away.

- L210f: "This cut is motivated by the dimuon mass resolution of about ..."


- L212: as the selection progresses


- L220f: "The ... candidates": this sounds ambiguous because the Y(1S) seems to be included in the list of Y(nS)

Distinguishing criterion is in line 222

- L227: do the 59 Y(nS)Y(mS) candidate events include the 18 Y(1S)Y(1S) candidate events? Not clear.

yes - the criterion is described in line 222

- Fig. 4: Similar question: does the left plot include the events from the right plot?


- L266, L268: strong coupling constant ?

L286 L288 - yes; re-written

- L290: stack asymmetric errors vertically

done. But not sure about style (appearance)

- L298: any of these channels


- Table 1: as the H boson has spin 0, what is the point in making different assumptions about the Z etc polarization?

In general, the polarization of the vector mesons is not constrained. BSM might introduce different spin contributions. In line with previous publications (Phys. Lett. B 797 (2019) 134811, JHEP 2011 (2020) 039) we include the different possibilities.

Final Reading Comments from Avto


However, no evidence for these decays has been observed in any of the channels. Upper limits at the 95% confidence level are placed on the branching fractions of these decays. => No evidence for these decays are observed and upper limits at the 95% confidence level are placed on the corresponding branching fractions.

No evidence for these decays HAS BEEN observed and upper limits at the 95% confidence level are placed on the corresponding branching fractions.

Type A

L15: Use ccbar penname


L42: by Refs. => in Refs.


L45: values of about => values of

We keep 'about' as they are estimates without uncertainties.

L159: at least two muons plus two leptons => at least 2mu plus 2\ell (\ell = e or mu)


LL242-244: Combine these two sentences? E.g. ... and are described by ...


L306: branching fraction => branching fractions


Type B

L8: Define BSM

done: Several beyond the SM (BSM) frameworks ...

L98: 3X_0 => three radiation lengths (or define X_0)


L128: Drop m_ll here since it is defined at L203; define \ell at L159 or L203

done (\ell in 159)

L134: for parton showering and hadronization according to the CUETP8M1 [52] tune. => for parton showering, hadronization, and underlying event simulation using CUETP8M1 tune [52].


L138: Similar to L134. Or mention the CUETP8M1 tune [52] once.

done similar

LL145-147: PU is introduced earlier. Say instead: Simulated events are weighted so that the PU distribution reproduces the one observed in data, which has an average of 23 (32) interactions per bunch crossing in 2016 (2017–2018).

We write: Simulated events include additional pp interactions that are weighted so that the PU distribution reproduces the one observed in data.

L217: of the selected events => of the background events


L273: the tag-and-probe using => the tag-and-probe method using



Fig. 1, caption: q a quark => q -- a quark

done: ... and q is a quark.

Table 1

1st column: channels => channel


Final Reading comments from Emanuele

= Type B comments =

* Abstract / Title: the title seems incomplete, since the search includes H->Z + J/psi and H->Z+Psi(2S). The title only mentions the former.

We propose to keep the title - to limit its length. It refers to the ground-states which are directly reconstructed.

* Abstract L [-3,-1]: you state the result only of the first search (H->Z J/psi or H->ZPsi(2S)). You should mention also the result of the search in quarkonium pairs, which is cited in the abstract beginning and in the title (even if summarized in words, if you don't want to repeat the long list of the summary).

This abstract has been condensed after several iterations to the present form with the thought that it does not have to be complete. Please advise.

* L2: either say "a mass of about 125 GeV" or cite the exact value of the most precise mass measurement (125.38 +/- 0.14 GeV) citing Phys. Lett. B 805 (2020) 135425, doi:10.1016/j.physletb.2020.135425, arXiv:2002.06398.

done - about

* Fig. 1: the "gamma" in the 3rd Feynman diagram should be a gamma*, right? This should be the amplitude that you mention in L29 (H->Z gamma* contributing "significantly")

We consistently do not label virtual particles with a star in the diagrams. Here we follow the convention of the central theory paper Ref. [16].

* L32: "have been studied" => "have been searched", not to give the impression that these decays have been observed already

done: .. searched for by ..

* L72: "somewhat less due to a delayed trigger deployment." This detail cannot stay here, where the "trigger" is not even defined. This can go e.g. after L125 when you have described the dedicated trigger.

We write less specifically: .. and 133\fbinv in the quarkonium pair channels, where the second number is slightly smaller due to a delayed trigger deployment.

* L147: "...pileup, are included in simulated samples.". You should also mention that the simulated samples are reweighted to match the data distribution. I suggest the usual: "Events are then reweighted to match the pileup profile observed in data."


* L152: please define "\lambda_theta" right after the formula "where \lambda_theta is a coefficient depending on the polarization of the intermediate decay products"

We add: The $\lambda_\theta$ is the average polar anisotropy parameter [P. Faccioli, .. Eur. Phys. J. C 69 (2010) 657]
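
A minimal sketch of the angular distribution discussed in this comment, for illustration only: it normalizes the shape (1 + lambda_theta cos^2 theta) over cos(theta) in [-1, 1]. The function name `polar_angle_pdf` and the restriction to -1 <= lambda_theta <= +1 are assumptions of this sketch and are not taken from the paper or the analysis code.

```python
def polar_angle_pdf(cos_theta: float, lambda_theta: float) -> float:
    """Density in cos(theta) on [-1, 1], proportional to 1 + lambda_theta * cos^2(theta).

    Hypothetical helper for illustration; not part of the analysis software.
    """
    if not -1.0 <= cos_theta <= 1.0:
        raise ValueError("cos_theta must lie in [-1, 1]")
    if not -1.0 <= lambda_theta <= 1.0:
        raise ValueError("this sketch assumes -1 <= lambda_theta <= +1")
    # Integral of (1 + lambda * x^2) over x in [-1, 1] is 2 + 2*lambda/3.
    norm = 2.0 * (3.0 + lambda_theta) / 3.0
    return (1.0 + lambda_theta * cos_theta ** 2) / norm
```

With lambda_theta = -1 this reduces to (3/4) sin^2 theta, with lambda_theta = +1 to (3/8)(1 + cos^2 theta), and with lambda_theta = 0 to a flat distribution.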

* L190-193: the cited efficiency vs fake rejection rate (90% vs 2-5%) is for what range of pT? They seem more likely the numbers for the MVA ID for high pT electrons (pT>~20 GeV) than for the spectrum starting from pT> 3 GeV that you are considering. If this is the case, either specify the typical range for the cited numbers, or the numbers averaged over your spectrum of electrons.

We write more specifically: The selection based on the multivariate identification discriminant has an electron identification efficiency of 90\% while the rate of misidentifying other particles as electrons is 2--5\%. In the next sentence we removed pT > 3 GeV as it is superfluous - the electron momenta in the Z decay are much higher.

* L201: justify the requirement of the recoil of the 4l candidate pT(4l)>5 GeV. Is it to help the vertex fitting?

This is to suppress low-pT background while maintaining high signal efficiency to achieve the best upper limit in this sample.

* L211: " The dimuon mass resolution is about 1%." Why do you cite the dimuon mass resolution only for the case of 2J/psi? This is relevant also for the J/Psi Z channel before, so move up this after L199 (before "Each dilepton...")


* Fig. 2,3,4: the blue line is labelled everywhere "Sig+bkg fit". But in each plot there are multiple signals, so it is not clear which of these signals are in the blue fit. Since results are ULs, wouldn't it be cleaner to show the Bkg-only fit in blue? Otherwise at least specify in the caption and in the legend what "Sig" is. In Fig. 3 it seems that there is a positive signal fitted, for m(4mu)~mZ, so maybe that is the signal in that specific case.

We specify in the caption: The result of the maximum likelihood fit to direct signals plus background is superimposed (solid blue line).

* L258: " The m(4mu) distribution below 80 GeV is well described solely by an exponential function." What is the purpose of this sentence after you stated the results? If it is to describe the bkg PDF, then it should be before the results. Since you just have few events above 60 GeV, how do you account for the possible variations of the bkg PDF in the signal region (~mZ or mH)?

This paragraph describes the central parameterizations. Results are described in L298 ff. The functions are defined in the full m_4mu range.

* Systematics: I am surprised to see that there is no systematic associated to the background parameterization. Given the fits shown in Figs. 2,3, and especially Fig. 4, where the lever arm of the bkg constraint is only 1-sided, the impact of changing the bkg shape could be huge. There is only a mention of the possible fit bias estimated changing the function, but in addition to this, there should be also a systematic from its possible variations (eg. like it is done in H->gg with the discrete profiling method of the envelope PDFs).

We float background shape parameters - hence this (dominant) uncertainty is statistical. Different shape functions have been tested systematically.

* Summary: any comment on the improvement or novelty of this list of results with respect to the current knowledge? Can you connect them with the models that you have described in the introduction?

We add: The observed upper limit branching fraction for $\PH \to \cPZ \JPsi$ is about 826 times the value predicted by the standard model. For $\PH \to \PgU\text{(nS)} \PgU\text{(mS)}$ it is about 6 times the value from earlier standard model calculations.

= Type A comments =

* Title: "at CMS" => "with the CMS experiment"

removed 'at CMS' (see comment Michel)

* L6: Since you defined the "H" symbol at the beginning of the line, you should use it later: "the Higgs boson to mesons" => "the H to mesons". Same for L17, and caption of Fig. 1, L58... Try to use it throughout the paper.


* L15: maybe it is more typical to write "c cbar" rather than "cbar c"


* L247: "It is found to be negligible" => "The bias is found to be negligible"


* L290: I would use the symbol "^{+4.6}_{-6.7}%" rather than "+4.6/ − 6.7%"


Follow up comments Michel Della Negra

My comment was to REPLACE "at CMS" by "in pp collisions at sqrt(s) = 13 TeV". Remove "at CMS" from the title in v19


Abstract: ===== You should define the symbol "B" for branching fraction used in the last sentence: "corresponding branching fractions (B)"


Type B ===

Line 4:Remove "from ATLAS and CMS". Refs [4-6] contain also a combined measurement from CMS alone [6]


Line 17: "70 times" ?? ATLAS [15] gives a limit of 110 and CMS [16] a limit of 14 times the SM expectation. Suggestion: "Upper limits for B(H->cc) times the cross section for H production in association with vector bosons have been reported by the ATLAS and CMS experiments [15, 16]. The current best observed limit at 95% confidence level (CL) is found to be 14 times the SM expectation [16]."

From the abstract of Ref. [16]: The two analyses are combined to yield a 95% confidence level observed (expected) upper limit on the cross section σ(VH)B(H→ccbar) of 4.5 (2.4^{+1.0}_{-0.7}) pb, corresponding to 70 (37) times the standard model prediction. The new result (14 times) is not yet published.
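
A quick arithmetic cross-check of the numbers quoted from Ref. [16], as a sketch: dividing each limit by its quoted multiple of the SM prediction should give the same implied SM value. The variable names are illustrative, and the implied value is derived here from the rounded numbers above rather than quoted from the reference.

```python
# Numbers quoted from the abstract of Ref. [16]: limits in pb and their ratios to the SM prediction.
observed_limit_pb, observed_over_sm = 4.5, 70.0
expected_limit_pb, expected_over_sm = 2.4, 37.0

# SM value of sigma(VH) x B(H -> ccbar) implied by each pair (derived, not quoted).
implied_sm_from_observed = observed_limit_pb / observed_over_sm
implied_sm_from_expected = expected_limit_pb / expected_over_sm

# Relative difference between the two implied values; small if the quoted numbers are consistent.
relative_difference = (
    abs(implied_sm_from_observed - implied_sm_from_expected) / implied_sm_from_observed
)
```

Both implied values come out between 0.064 and 0.065 pb and agree to about 1%, which is compatible with the rounding of the quoted numbers, so "70 times" matches the published abstract.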

Lines 83-84:"with higher luminosity" -> "with higher integrated luminosity"? or "with the full available data sample"?

(L63-64) we write now: with the full available data sample.

Lines 59-71: This paragraph repeats three times "this letter presents". Can we say instead "we present" at lines 62 and 66?

Find formulation in passive style.

Lines 136-137: "The JHUGen 7.1.4 generator [49, 50] is used to decay the Higgs boson into Z bosons and Q mesons." I suppose this covers the decays HZQ and HQQ ? What about the decays ZQQ described in the next paragraph?

write: ... produced and decayed ...

Lines 147-148: "The total cross section is obtained with the B(Z->mumu) value from Ref. [55]." I don't understand this sentence. Why do you need B(Z->mumu)? How do you get the ZQQ signals shown in Figs 3 and 4? You need sigma(pp->Z)xB(ZQQ)xacceptance?

The cross section value is not needed here but for the final BF calculations. It was recommended to describe all related information here.

Line 156: "proportional to (1 + lambda_theta cos^2 theta). The lambda_theta" -> "proportional to (1 + lambda_theta cos^2 theta), where lambda_theta"


Lines 244-245:"The Higgs and Z boson invariant mass distributions are derived from simulation." Remove this sentence, the signal shapes are described in the next paragraph?


Line 258: I agree with Emanuele that "signal + background function" is unclear as you have more than one signal. You have to comment at least that since the various fitted signals are consistent with zero the blue lines are essentially background-only curves independent of a particular signal.

In line 258 we only introduce the 'direct' channels. Feed-down channels are addressed in the next paragraph and further clarification is provided in L263-268. In response to Emanuele, we added text in the captions of these figures.

Table 1: in the first column use "channel" or "channels" consistently for Higgs and Z boson

done - channel

Line 327: "branching fractions (Bs)" -> "branching fractions (B)". Remove the "s"


Line 336: "about 826 times" -> "about 800 times" (as in line 317)


Lines 337 and 341: repeat the theory refs [23] and [47]

refer to [17-19] (ZJ/psi) and [23] (YY)

Lines 341-342: Suppress the last sentence. Check the new end of the summary (lines 335-342) with the CCLE.

remove last sentence

Final Reading Comments

Notes from HIG-20-008 Final Reading. Line numbers refer to v19 of the paper draft. General comments

From Michel:

Check ccbar is correctly used in all places.


lines 59-71. use "... are presented ..." instead


line 147-148 remove "...the total cross section is obtained with the B(Z->mumu) value from Ref. [55]..."


Add [55] somewhere else. Do it around where the EW correction is mentioned.

done - reference number shifted

Fig. 2 and Fig. 3. Show background-only fit line. Need to change figure for Fig. 3, and update captions for Fig. 2 and Fig. 3.


Summary L337, the language editor has checked this line. Remove sentence: "This indicates ...".

Comment from Rainer: line 63: Feed-down decays. Write a sentence in the paper to explain feed-down: text is not clear → Make explicit that you ignore the extra particles from the Upsilon(nS)->Upsilon(1S), but do account for the different Higgs mass shape from the missing particles.


line 204. Include the sentence about the threshold effects in the paper (from the replies to Rainer)


Table 1 : Table 1 caption: add "longitudinal" and "transverse" as lambda_theta = -1 and lambda_theta = +1


Comments From Emanuele:

Figure 2 remove "preliminary"


L247-L250 : remove these lines from here, and add the sentence as a new point 6 or 7 in the systematics items


Formal review of the title, introduction, and summary

Title: Remove “at CMS”


Abstract L3 collision->collisions


Use (B) for the BF


Introduction : 1st paragraph: L4 : remove ATLAS and CMS


line 17. Check if H->cc paper is submitted before HIG-20-008 goes out.

L10 : 3-> three


line 14 move comma to after "reached"


L8: drop acronym BSM


line 8 make it "Several models beyond the SM predict..."


2nd paragraph
line 33: "arriving at 95% CL" => "reaching 95% CL"

3rd paragraph
arrives -> reaches values of about


line 45 make it "More recently, Ref. [27] predicts values of ..."


L46: drop "of about"

4th paragraph
L56: 2 -> two
L58: 3 -> three, 5 -> five

5th paragraph


L66: we have discussed the way to avoid “This letters presents…..”


Summary L327: remove s from (Bs)


->remove last sentence (already done)


L322: J/\psi -> a J/\psi


L330: Y(n)->Y(nS) (twice)


L332-333: make B(H->Jpsi Jpsi) in a line ( use latex tilde to bind values together)


→ consistently use ccbar notation at all places ( reference [28] make it c,c-bar not c-bar,c)


CWR comments: Authors will propose a sentence to clarify that the non-resonant contributions under J/psi and Upsilon are small and included in the H or Z 4 lepton distributions to extract the final upper limits. This goes most likely around lines 214 and 225.

Add sentence in line 221 to describe non-resonant background in the dilepton spectra.

-- HimalAcharya - 2021-02-17

Topic attachments
HIG20-008ResultsWithAsimov.pdf (r1, 882.2 K, 2021-02-17 21:20, HimalAcharya): Expected branching fraction calculation and impacts of all the channels studied in HIG-20-008
Topic revision: r51 - 2022-04-19 - StefanSpanier