Paper Explainer: Digging Deeper for New Physics in the LHC Data

This is a post about a new paper I wrote with my fellow professor at Rutgers, David Shih; two Rutgers postdocs, Anthony DiFranzo and Angelo Monteux; and David’s grad student, Pouya Asadi.


Is there any new physics at the LHC?

The answer appears to be “no.” If there were obvious evidence of new physics at the LHC, trust me, you would have heard about it by now.

But how do we know? The LHC produces a truly ridiculous amount of data. For each event (and the LHC writes to permanent record 400 events per second) the LHC records all information from all the detector elements. But nowhere in that information is a little flag that says “New Physics!” Indeed, most new physics we can imagine can be aped by physics of the Standard Model. We look for new physics via statistical evidence: we hope to see more events with a particular character than we would have expected. 

But we haven’t seen such an excess, correct?

Correct. At least, we haven’t in the places we’ve looked.

Because the statement “excess over the Standard Model” means an excess of events with some set of final states (so, production of jets, or b-quarks, or leptons, or photons, or some combination of the above) with some set of kinematic characteristics (large missing energy, one high momentum jet, jets that appear to have come from a top quark, etc.). We can’t just ask if there is an excess everywhere in the LHC data; we have to be specific.

So if you go to the ATLAS and CMS webpages, you’ll find searches for new physics broken out by, among other things, the final states. Then, an individual search will define “signal regions,” where only events with a defined range of kinematic variables end up. CMS these days tends to define many signal regions, and each signal region is exclusive, meaning they don’t overlap. ATLAS tends to define larger signal regions, and those regions can overlap with each other.

What the experiments then do is select a set of models of possible new physics (as motivated by theoretical physicists), and use those to predict how many events should be seen in each signal region. Most signal regions should never be populated by events which were the result of new physics, while some smaller number of regions would have those events pass the selection criteria. A model of gluino pair production in supersymmetry, for example, might suggest that you should see 1500 extra events in signal region #3, and 250 in signal region #10, and 8 in signal region #45 in a particular search. Or whatever. This allows you to statistically combine the different signal regions: looking at how many events we expected to see from the Standard Model, how certain we are of that prediction, how many events we actually saw, and how many events the model of new physics predicts. By picking a model, we weight the “interesting” signal regions more than the “boring” ones. Without the model, we would not have a prayer of seeing new physics.
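To make the weighting concrete, here is a minimal sketch of how a model can turn several signal regions into one number. It assumes independent Poisson counts with perfectly known backgrounds, which is a simplification (the real searches also fold in the uncertainty on the prediction), and every number in the example is made up for illustration.

```python
import math

def poisson_log_likelihood(observed, expected):
    """Log of the Poisson probability of `observed` events when `expected` are predicted."""
    return observed * math.log(expected) - expected - math.lgamma(observed + 1)

def log_likelihood_ratio(observed, background, signal):
    """Sum over regions of ln L(background + signal) - ln L(background only).

    A region where the model predicts no signal contributes exactly zero,
    so the model decides which regions carry weight in the combination.
    """
    total = 0.0
    for n, b, s in zip(observed, background, signal):
        total += poisson_log_likelihood(n, b + s) - poisson_log_likelihood(n, b)
    return total

# Hypothetical numbers for three signal regions (not from any real search).
observed = [120, 45, 9]
background = [100.0, 44.0, 4.0]
signal = [15.0, 0.0, 4.0]

# A positive value means the data prefer background + signal over background alone.
print(log_likelihood_ratio(observed, background, signal))
```

In this toy, the second region has no predicted signal and drops out of the sum entirely, which is the sense in which the model ignores the “boring” regions.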


But we haven’t seen new physics. So what’s with that?

Maybe there is no new physics at the LHC. Maybe it is there, but we’re not writing those events to permanent storage for one reason or another. Maybe it is there, we are recording it, it is in the ATLAS and CMS searches, but we’re just looking in the wrong signal regions, because our models have led us astray. Those models were designed by smart people, but maybe they were wrong, because the Universe is confusing.

So what to do? There are certainly signal regions which have large statistical excesses away from the Standard Model, but we can’t just pick and choose those and say “Hey! New Physics right here!” For one thing, if you have thousands of signal regions, you expect to see some deviations; that’s just how statistics works. For another, we don’t expect the new physics to populate only one signal region. A realistic new physics model is unlikely to produce events with such a narrow range of kinematics and topologies that they all end up in only one of CMS’s narrow signal regions, for example. Further, the LHC detectors are not perfect: they don’t see every jet, or measure the energies to infinite resolution. There should be some slop where events that “should” be in one signal region are mismeasured and end up in others.

So what to do?

David Shih, our postdocs, David’s grad student, and I were delving into the LHC data collected last year. We were finding no obvious excesses when we compared with the benchmark models, but we sometimes found combinations of signal regions that gave interesting deviations from the predictions. But it was haphazard. David had the idea to systematize this exploration. 

For that, we needed a way to look for excesses without appealing to a specific model of new physics. As I said, if many events at the LHC are being produced by a particular process, we sort of expect them to have relationships between the number of jets, or leptons, or whatever, as well as having characteristic kinematics. So these events should be “near” each other in the CMS signal regions (the ATLAS choices of signal regions turned out to be more difficult to use in our approach, as you’ll see). 

So we did the most straightforward thing. We looked for “rectangular aggregations.” We combined CMS signal regions, but only if they fell into “rectangles” of the kinematic and topological divisions used by CMS. So, for example, we could combine regions with one jet, and two jets, and three jets, but not one jet and three jets. Here’s a visual example of how we drew our rectangles. 

We then cycled through every possible rectangle we could draw in a particular search. If there were more than two variables that defined a signal region, we extended our rectangles into those multiple dimensions, but always demanding that we only combined signal regions which were “next to” each other. And there were a lot of ways to do that. We specialized to two CMS searches, both looking at events with jets and missing energy. We called them CMS033 and CMS036, after their collaboration ID numbers. CMS036, for example, had 213 individual signal regions, and over 33,000 rectangular aggregations. 
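For the curious, here is a minimal sketch of what “cycling through every possible rectangle” can look like. It assumes the signal regions sit on a perfect grid, with one axis per variable, which real searches only approximately satisfy; the binning below is hypothetical and is not the CMS033 or CMS036 binning.

```python
from itertools import product

def contiguous_ranges(n_bins):
    """All (start, stop) pairs selecting adjacent bins start..stop-1 along one axis."""
    return [(lo, hi) for lo in range(n_bins) for hi in range(lo + 1, n_bins + 1)]

def rectangular_aggregations(shape):
    """Every axis-aligned rectangle of adjacent bins in a grid of the given shape.

    Each rectangle is a tuple of (start, stop) ranges, one per variable.
    """
    return product(*(contiguous_ranges(n) for n in shape))

def bins_in(rectangle):
    """List the individual bins (signal regions) covered by one rectangle."""
    return list(product(*(range(lo, hi) for lo, hi in rectangle)))

# Hypothetical binning: 3 jet-multiplicity bins x 2 b-jet bins x 4 missing-energy bins.
shape = (3, 2, 4)
rectangles = list(rectangular_aggregations(shape))
print(len(rectangles))  # 6 * 3 * 10 = 180 rectangles from only 24 bins
```

An axis with n bins has n(n+1)/2 contiguous ranges, and the counts multiply across axes, which is why the number of rectangles grows so much faster than the number of signal regions.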

When we looked at each rectangle, we compared the number of events seen with the number of events predicted, and the error on that prediction. We used the correlations between the errors in different signal regions to calculate this final error bar, which was absolutely necessary. When signal regions are near each other, we expect the errors to be related: similar physics underlies the number of predicted events in nearby regions, so being wrong in one region tells you about how you should be wrong in a related one. CMS provides these correlations now, and that’s hugely helpful of them.
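Here is a small sketch of why the correlations matter when regions are added together. It uses the standard rule that the variance of a sum is the sum of all entries of the covariance matrix within the group; the two-region numbers are invented for illustration, and this is not the full statistical treatment used in the paper.

```python
import math

def aggregate(indices, observed, predicted, covariance):
    """Combine the signal regions listed in `indices` into one counting experiment.

    Returns (observed total, predicted total, uncertainty on the predicted total).
    The variance of a sum includes every covariance entry within the group,
    so the off-diagonal (correlated) terms count as well as the diagonal ones.
    """
    n_obs = sum(observed[i] for i in indices)
    n_pred = sum(predicted[i] for i in indices)
    variance = sum(covariance[i][j] for i in indices for j in indices)
    return n_obs, n_pred, math.sqrt(variance)

# Hypothetical example: two regions, each with a background uncertainty of 3 events.
observed = [30, 25]
predicted = [20.0, 18.0]
uncorrelated = [[9.0, 0.0], [0.0, 9.0]]
correlated = [[9.0, 7.2], [7.2, 9.0]]  # correlation coefficient of 0.8

print(aggregate([0, 1], observed, predicted, uncorrelated))  # error = sqrt(18), about 4.2
print(aggregate([0, 1], observed, predicted, correlated))    # error = sqrt(32.4), about 5.7
```

With positive correlations the combined error bar is larger than the naive sum in quadrature, so ignoring them would make an aggregated excess look more significant than it is.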

In the end, we identified a number of rectangles containing a number of events that, by our calculation, had a less than 1% chance of being due only to the Standard Model. We could group these into three excesses in CMS033, and five in CMS036. Now, that doesn’t mean that there are eight regions with evidence of new physics. Sometimes, 1% fluctuations happen. But this is a manageable number of places to look at in more detail.

So we went through all eight regions, one by one. In most of them, we could say, with some confidence, that it was unlikely that the excess could be anything but a random fluctuation. For example, if we found an excess with one b-quark, there need to be some events with two, because the LHC detectors aren’t perfect and sometimes non-b-quarks get identified as b’s. Or, the excess in one search was ruled out by the lack of an excess in the other. In the end, we judged only two excesses in each search plausible as new physics. Again, that’s just plausible, not definite. For one thing, 1% is not the $5\sigma$ evidence needed for discovery (think back to the diphoton anomaly for a case of a highly significant anomaly being nothing but a fluctuation). But we think these four anomalies bear watching, and hopefully will grow in the future, as more data is collected.
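To put 1% and $5\sigma$ on the same scale, here is a quick conversion between p-values and Gaussian significances. It assumes the one-sided convention commonly used in particle physics; the function names are just illustrative.

```python
from statistics import NormalDist

def p_value_to_sigma(p):
    """One-sided Gaussian significance corresponding to a p-value."""
    return NormalDist().inv_cdf(1.0 - p)

def sigma_to_p_value(z):
    """One-sided p-value corresponding to a Gaussian significance of z sigma."""
    return 1.0 - NormalDist().cdf(z)

print(round(p_value_to_sigma(0.01), 2))  # 2.33
print(sigma_to_p_value(5.0))             # about 2.9e-7
```

So a 1% fluctuation is only a bit over $2\sigma$, while $5\sigma$ corresponds to odds of roughly one in a few million.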


We also noticed that one anomaly in CMS033 was consistent with one anomaly in CMS036. That is, both were characterized by similar kinematics and topology: few jets, no b-jets, and low missing energy. We can’t just combine these two searches: some of the same LHC events recorded in CMS appear in both, so the statistical likelihoods are not independent. But we can combine them with the equivalent ATLAS search (ATLAS022) — ATLAS is on the other side of the LHC ring from CMS. Its events are completely independent. We found that ATLAS also had a mild excess with similar kinematics (remember that we can’t play our rectangular aggregation search game with ATLAS to find interesting regions, but we can focus in on a particular region if we already know to be interested in it). That’s interesting.

So we decided to try to fit this anomaly. We tried fitting it with a bunch of different models, including standard supersymmetry models and dark matter models. None worked: to get the observed number of events in the regions identified by our rectangles, they all predicted too many events in other signal regions which had no excesses. Which is what you’d expect: if you could fit this anomaly with a known model, it would have been found already. So what does work? Something kinda strange.

We found we could fit all the data from CMS033 or CMS036 as well as ATLAS, using a model where we produced a new, strongly interacting particle that then decayed to a single quark and a particle invisible to the detector (so something related to dark matter, perhaps). It sort of looks like a model called R-parity violating supersymmetry, but there are some issues with that identification. But it’s a kind of ugly model. Again, maybe that’s what we should expect from new physics at the LHC now: if it were pretty, someone would have thought it up for deep aesthetic reasons, right? The Standard Model is kind of ugly, so maybe the new physics is too. Either way, questions of ugliness are not the important issue here. We found that we could fit the data with a $3.5\sigma$ preference for new physics. That’s pretty interesting. Adding in a different CMS search puts some pressure on this fit, but avoiding the excluded region from that search, we still find a $3\sigma$ region that prefers new physics. Again: interesting.


Global statistical significance as a function of local significance, as estimated from pseudoexperiments using the best-fit model to the anomaly of CMS036 and ATLAS022

One thing to worry about is the dreaded “Look-Elsewhere-Effect.” We found some evidence at $3\sigma$ significance. But we looked in so many places, so surely we must have found something eventually, right? Even if there was no new physics.

It is hard to get a quantitative measure on the look-elsewhere-effect, especially if you don’t have a model. It might be that any rectangular aggregation search will identify anomalies (though several other CMS searches we’ve looked at don’t, so that’s something). But the specific statistical question that can be answered really requires a specific model against which to compare. The question “should I see any anomaly if I look in a bunch of places, due to random fluctuations?” is not as well defined as we’d like, but “should I have seen a random fluctuation that looks like this model, if I look in a bunch of places?” is. So we asked it, and we found that our $3\sigma$ fluctuation gets knocked down to $2\sigma$, after this effect is applied. That’s completely standard for the look-elsewhere-effect, so we don’t necessarily suffer greatly for using our rectangles.
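As a toy illustration of how pseudoexperiments turn a local significance into a global one, here is a minimal sketch. It assumes a fixed number of independent, Gaussian-distributed regions with no signal, which is far simpler than the overlapping, correlated rectangles and the model-based pseudoexperiments described above, so its numbers should not be compared with ours. The choice of 50 regions is arbitrary.

```python
import random
from statistics import NormalDist

def global_p_value(local_z, n_regions, n_pseudoexperiments=20000, seed=1):
    """Fraction of background-only pseudoexperiments whose largest fluctuation
    among `n_regions` independent regions is at least `local_z` sigma."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_pseudoexperiments):
        # One pseudoexperiment: draw a pure-noise fluctuation in every region.
        largest = max(rng.gauss(0.0, 1.0) for _ in range(n_regions))
        if largest >= local_z:
            hits += 1
    return hits / n_pseudoexperiments

local_z = 3.0
p_global = global_p_value(local_z, n_regions=50)  # toy assumption: 50 places to look
z_global = NormalDist().inv_cdf(1.0 - p_global)
print(p_global, z_global)
```

For independent regions the answer can also be written down directly as $1 - (1 - p_{\rm local})^{N}$, which makes a handy cross-check on the toy. The pseudoexperiment approach earns its keep when the regions overlap or are correlated and no such formula exists.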


So where does this leave us? Do we have evidence of new physics?

No. Don’t be silly. 

But what we do have is some evidence that things are not clear-cut. People were not only worried that we didn’t have $5\sigma$ evidence of new physics, they were worried that we didn’t even have $3\sigma$ hints. There just seemed to be nothing there in the LHC data.

But the data is incredibly complicated. There are so many places to look for new physics. The experimentalists are really good at doing the hard work of looking for new physics. But how they look is influenced by how we theorists advised them on where to look. Maybe we were wrong. What this paper shows is that there are interesting hints that might be new physics, existing in the data as it stands. Right now.

We have this incredible LHC data. We must use it to its full power, which means we need to cast a wide net for new physics. There are certainly improvements we can think of on our rectangular aggregation technique. We’re contemplating that right now. Maybe the particular anomaly we found will be a mirage, just like so many other excesses with $2\sigma$ global significance (hello, diphotons). But we should start tracking these anomalies now, find out which ones are compatible with all the data, and which ones grow as more data is collected.

Maybe there is no new physics at the LHC. But the only way to answer that question is to do a hell of a lot more work over many more years. There’s a lot to be done.