Model Talk
Pressure revisited (again)
Roughly once a year I get to thinking about how to best model pressure in professional golf. Our early efforts on the subject weren't that rigorous but are fun to look back on. Our most recent pressure post, from two years ago, provided a ton of data on how golfers perform near the lead, and also introduced an interesting statistic called conditional expected wins, which estimates how likely a player was to win a tournament conditional on their final round performance (while remaining ignorant of the other golfers' performance). The example of Tiger Woods at the 2019 Masters illustrates the basic idea. Tiger began the day 2 strokes behind Francesco Molinari (with a bunched leaderboard) and ultimately beat the field by 1.5 strokes in the final round; knowing only these starting conditions and Tiger's final round performance, we estimated Tiger's likelihood of winning to be just 5%! This indicates that Tiger was very lucky, in some sense, to earn his 5th Green Jacket.

My main goal in this latest return to the pressure question is to better understand how to estimate a "pressure-free" baseline that can be used as the comparison point for a golfer's performance when they are playing under pressure. As in past posts, pressure is defined based on the player's position on the leaderboard before the round. (Obviously this isn't perfect, but for a round-level analysis it's about as good as we can do — strokes back of the leader could also be taken into account, but I don't focus on that here.) The purpose of this baseline is to provide a prediction of performance assuming the golfer of interest starts their round in a neutral pressure position (40th place or so).

The pressure-free baseline should primarily depend on a golfer's performance in the earlier rounds of the current tournament and their pre-tournament skill. The key difficulty, however, is that these variables — pre-tournament skill and performance in previous rounds of the tournament — are also the main determinants of a player's starting position in the final rounds of a tournament. Therefore it seems potentially difficult to estimate the "pressure-free" relationship between, for example, round 3 (R3) performance and round 4 (R4) performance, because a good R3 moves you up the leaderboard into a position for R4 that adds more pressure.

(Throughout this post I will focus on the example of estimating the impact of R3 performance on R4 performance, but the same considerations and analysis can be applied to estimating the relationship between R1 and R4, or R2 and R3, etc.) To concretely see the problem, consider a regression of 4th-round performance on pre-tournament skill and relative (to pre-tournament skill) performance in rounds 1-3: this yields small, positive coefficients on a golfer's R1-R3 performances. At first blush, you might draw the surprising conclusion that performance in rounds 1-3 of a tournament doesn't have much predictive power for final round performance. But what's actually going on is more nuanced than that: the best performers in the first three rounds of an event will find themselves near the lead, and playing with or near the lead has a negative effect on performance. This negative effect masks the larger positive "true" effect of R1-R3 play on R4 play, resulting in a correlation that is only slightly positive.

For readers with a statistical background, the next step should be clear: control for pressure when estimating the effect of R1-R3 performance on R4 performance. That is, to better estimate our pressure-free baseline, final round starting position needs to be added to the regression described above. However, this quickly leads us to another problem: as mentioned, a golfer's pre-tournament skill plus their relative performance in rounds 1-3 of a tournament will be very highly correlated with 4th-round starting position. Intuitively, estimating the effect of R3 performance while controlling for pre-tournament skill, R1 and R2 performance, and R4 starting position requires a comparison between two golfers who start the final round in the same position and had the same pre-tournament skill and relative performances in rounds 1-2, but had different 3rd round performances. Are comparisons like this even possible? Well, there is at least one tournament where they are: the Tour Championship (since 2019, when starting strokes came into effect). Two players with different starting strokes could have the same skill level, shoot identical scores in rounds 1 and 2, shoot different scores in round 3, but find themselves in the same starting position heading into the final round (and therefore facing the same amount of pressure)! Comparing their subsequent final round performances would provide an estimate of the impact of R3 performance on R4 performance that isn't confounded by pressure.
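To make the masking effect concrete, here is a minimal simulation sketch in Python with NumPy. Everything in it is an assumption chosen for illustration: the effect sizes, the noise levels, the single `pressure` index standing in for final round starting position, and the helper `ols`. None of it comes from the actual data. It only shows that a regression that omits pressure returns a small positive R3 coefficient, while adding pressure as a control recovers the larger assumed effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical data-generating process; all numbers are made up.
TRUE_R3_EFFECT = 0.10    # assumed pressure-free carry-over from R3 to R4
PRESSURE_EFFECT = -0.50  # assumed cost in strokes per unit of pressure

r3 = rng.normal(0.0, 2.8, n)  # R3 strokes gained relative to skill

# Pressure rises with good R3 play, but not perfectly: the "same" round
# can lead to different positions on differently shaped leaderboards.
pressure = 0.15 * r3 + rng.normal(0.0, 0.3, n)

r4 = TRUE_R3_EFFECT * r3 + PRESSURE_EFFECT * pressure + rng.normal(0.0, 2.8, n)


def ols(y, *regressors):
    """Least-squares coefficients of y on an intercept plus the regressors."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs


naive = ols(r4, r3)[1]                 # pressure omitted
controlled = ols(r4, r3, pressure)[1]  # pressure included as a control

print(f"R3 coefficient without pressure control: {naive:.3f}")
print(f"R3 coefficient with pressure control:    {controlled:.3f}")
```

In this setup the omitted-variable bias is the pressure effect times the slope of pressure on R3 (here -0.50 × 0.15), which is what pulls the naive coefficient toward zero.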

Of course, data from a few players at the last four playings of the Tour Championship is nowhere near enough to say anything definitive about the relationships between relative performance in each round. All is not lost, however: while the Tour Championship provides the cleanest example of a disconnect between pre-tournament skill, early-round performance, and final round starting position, there are others once you start making comparisons across tournaments. For example, consider two tournaments that proceed identically except in one the final round leader plays really well in the 3rd round to build a 6-shot lead, while in the other the leader plays a pedestrian 3rd round to only hold a 1-stroke advantage. This is a situation where two golfers face the same pressure (i.e. both are leading) and have played the same except for their R3 performance. In theory, comparisons like this can be used to estimate the "pressure-free" effect of R3 performance on R4 performance. However, this might not be the kind of variation we want to use to estimate the R3-R4 relationship; holding a 6-stroke lead is probably different from a 1-stroke lead in terms of pressure and strategy, and so the comparison between the final round performances of the two golfers described above is not solely picking up the effect of their differential performance in the 3rd round.

Fortunately, less problematic comparisons can also be made at the bottom of the leaderboard: for a golfer who starts the final round in last place, it wouldn't have mattered (from a pressure standpoint) whether they had performed 5 strokes worse the previous day. Further, it turns out that there are many less obvious situations where relatively clean comparisons can be made to estimate the effect of R3 performance. Golf leaderboards come in many different flavours: sometimes there are a few golfers who have separated from the field, while other times 20 players are within 4 strokes of the lead. Consequently, the "same" performances in different tournaments can land you in different positions on the leaderboard heading into the final round, and this can be exploited to estimate our pressure-free baseline.

In econometrics there is something called the Frisch-Waugh-Lovell (FWL) theorem, which states that in a regression of \( Y \) on \( X_{1} \) and \( X_{2} \), the coefficient on \( X_{1} \) can be estimated by first regressing \( X_{1} \) on \( X_{2} \) and keeping the residuals, and then regressing \( Y \) on these residuals. The previous three paragraphs have attempted to describe situations where these residuals will take on non-zero values, which is a preliminary requirement if we hope to even be able to estimate a coefficient on \( X_{1} \) (R3 performance in our context). Under a few assumptions, it's possible to determine how much each observation contributes to the overall estimate: the larger the residual, the larger the contribution of that observation. This makes sense intuitively: recall that in a simple regression of \( Y \) on \( X \), observations with \( x \) equal to the mean of \( X \) actually contribute nothing to the coefficient estimate on \( X \)! If you don't believe me, try it yourself: create a dataset in R, and run the regression of \( Y \) on \( X \); then add another observation with any value for \( y \) and with \( x \) equal to the mean of \( X \). Re-run the regression and you'll find the coefficient estimate on \( X \) unchanged! When you have multiple regressors, as we do, the same logic applies except it is the residual from a regression of \( X \) on the other regressors that we are concerned with, instead of \( X \) itself.
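Both claims are easy to check numerically. The sketch below does so in Python with NumPy rather than R, on simulated data; the variable names, the coefficients and the helper `ols` are all made up for illustration. It first confirms that the two-step FWL procedure reproduces the coefficient from the full regression, then repeats the "add an observation at the mean of \( X \)" experiment.

```python
import numpy as np


def ols(y, *regressors):
    """Least-squares coefficients of y on an intercept plus the regressors."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs


rng = np.random.default_rng(1)
n = 500

# Simulated regressors: x1 is correlated with x2, as R3 play is with position.
x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(size=n)
y = 1.0 + 0.5 * x1 - 0.3 * x2 + rng.normal(size=n)

# Coefficient on x1 from the full regression of y on x1 and x2.
b_full = ols(y, x1, x2)[1]

# FWL: residualize x1 on x2, then regress y on those residuals.
a = ols(x1, x2)
resid = x1 - (a[0] + a[1] * x2)
b_fwl = ols(y, resid)[1]

# Same number written as a sum: observations with larger residuals get
# more say, because each one enters through resid_i * y_i.
b_sum = (resid @ y) / (resid @ resid)

# Simple-regression experiment: add a point with x at the sample mean
# and an arbitrary (here extreme) y, then compare slopes.
x = rng.normal(size=50)
yy = 2.0 + 3.0 * x + rng.normal(size=50)
slope_before = ols(yy, x)[1]
slope_after = ols(np.append(yy, 100.0), np.append(x, x.mean()))[1]

print(b_full, b_fwl, b_sum)
print(slope_before, slope_after)
```

The extra point changes the intercept but not the slope, since its deviation from the mean of \( X \) is zero and so it adds nothing to either the numerator or the denominator of the slope formula.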

Returning to our context of estimating the effect of R3 performance on R4 performance, consider the observation for Garrett Willis in Round 4 at the 2010 Honda Classic. Willis started the 3rd round in 61st position and started the final round in 74th position. Given his pre-tournament skill, R1 performance, R2 performance, and R4 starting position, we predicted that his R3 performance would have been 2.6 strokes below baseline (the baseline is pre-tournament skill here); in fact, it was 8.7 strokes below baseline, yielding a residual of -6.1 strokes. This is one of the larger residual values, and, as described above, is possible because you can't move lower on a leaderboard than last place. As stated earlier, the size (squared, actually) of the residual indicates how much that observation contributes to the estimate of the effect of R3 on R4 performance. The table below summarizes the weight contributed from observations for each final round starting position:

R4 Start Position    # of Observations    Weight
1                    696                  9.1%
2-5                  2572                 12.6%
6-15                 5808                 14.1%
16-25                5735                 11.1%
26-35                5003                 6.6%
36-45                4973                 5.1%
46-55                4567                 7.8%
56-65                4076                 9.3%
66+                  3081                 23.8%
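For readers who want to produce a table like this themselves, here is a rough Python sketch on simulated leaderboards. It assumes the weight of a group is its share of the total squared residual from regressing R3 performance on the other regressors, and that starting position enters that regression linearly. Both are my simplifications for illustration, as are the field size, the noise levels and the helper `residualize`, so the numbers it prints will not match the table.

```python
import numpy as np

rng = np.random.default_rng(2)
n_events, n_players = 400, 70
shape = (n_events, n_players)

# Simulated inputs (all made up): skill, and R1-R3 play relative to skill.
skill = rng.normal(0.0, 0.8, shape)
r1, r2, r3 = (rng.normal(0.0, 2.8, shape) for _ in range(3))

# 54-hole total decides R4 starting position within each event (1 = leader).
total = 3 * skill + r1 + r2 + r3
position = (-total).argsort(axis=1).argsort(axis=1) + 1

skill, r1, r2, r3, position = (
    a.ravel() for a in (skill, r1, r2, r3, position)
)


def residualize(target, *controls):
    """Residuals from regressing target on an intercept plus the controls."""
    X = np.column_stack([np.ones(target.size), *controls])
    coefs, *_ = np.linalg.lstsq(X, target, rcond=None)
    return target - X @ coefs


# The part of R3 play not explained by skill, R1, R2 and R4 position.
resid = residualize(r3, skill, r1, r2, position)

# Group positions as in the table: 1, 2-5, 6-15, ..., 56-65, 66+.
labels = ["1", "2-5", "6-15", "16-25", "26-35",
          "36-45", "46-55", "56-65", "66+"]
group = np.digitize(position, [2, 6, 16, 26, 36, 46, 56, 66])

# Each group's weight is its share of the total squared residual.
sq = resid ** 2
weights = np.bincount(group, weights=sq, minlength=len(labels)) / sq.sum()
counts = np.bincount(group, minlength=len(labels))

for label, count, weight in zip(labels, counts, weights):
    print(f"{label:>6} {count:>7} {weight:6.1%}")
```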

Not surprisingly, golfers who start the final round near the top or bottom of the leaderboard contribute the most to the estimate. This is because their round 3 performance has less impact on their final round starting position than it does for golfers elsewhere on the leaderboard. Overall, however, the weight is fairly evenly distributed across starting positions. I would have been concerned about what exactly I was estimating if most of the variation were coming from final round leaders or players occupying the last few positions.

At this point it's probably worth recapping my thought process in this post. The goal was to generate predictions for golfers' final round performance if they were to play it with "no pressure" — this is the so-called pressure-free baseline. I decided that this baseline (for the final round) should depend on pre-tournament skill and performance in rounds 1-3. I argued that simply regressing R4 performance on R1-R3 performance plus pre-tournament skill doesn't get me what I want because the estimates are confounded by 4th-round pressure (i.e. final round starting position is correlated with all of these variables). Therefore starting position needs to be added to the regression; this introduces the additional problem of a strong correlation between regressors. As those familiar with regression will know, to precisely estimate a coefficient on some variable you need to have sufficient variation in that variable. I argued with various examples that there should still be some variation in R3 performance even after controlling for R1-R2 performance, skill, and R4 starting position. I then examined, with the help of the FWL theorem, which observations are actually providing that variation. If the main source of variation had been from final round leaders, I would not have been comfortable using the resulting estimate to predict a pressure-free baseline for R4. However, given that the variation is fairly evenly distributed across final round starting positions, I think I'm picking up something close to the true (i.e. pressure-independent) effect of R3 performance on R4 performance. To actually predict the R4 pressure-free baseline, the full regression that includes skill, R1/R2/R3 performance, and R4 starting position is estimated, and then predictions from this regression are generated with golfers' actual R4 starting position replaced with a pressure-neutral position (e.g. 40th place). 
All of this discussion and analysis could be applied to each of the other variables involved in the prediction of the final round pressure-free baseline (pre-tournament skill, R1 performance, and R2 performance).
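The final prediction step can be sketched in a few lines of Python. As before, the data are simulated and every modelling choice here is an assumption for illustration: the linear position term, the effect sizes, and the constant `NEUTRAL_POSITION` set to 40. The idea is simply to fit the full regression, then re-predict with every golfer's starting position overwritten by the neutral one.

```python
import numpy as np

rng = np.random.default_rng(3)
n_events, n_players = 500, 70
shape = (n_events, n_players)
NEUTRAL_POSITION = 40  # assumed pressure-neutral starting position

# Simulated inputs (all made up): skill, and R1-R3 play relative to skill.
skill = rng.normal(0.0, 0.8, shape)
r1, r2, r3 = (rng.normal(0.0, 2.8, shape) for _ in range(3))
total = 3 * skill + r1 + r2 + r3
pos2d = (-total).argsort(axis=1).argsort(axis=1) + 1  # 1 = leader

# Assumed truth: some carry-over from R1-R3, plus a cost of starting
# near the top (a positive slope on position means leaders play worse).
r4 = (skill + 0.05 * (r1 + r2 + r3)
      + 0.01 * (pos2d - NEUTRAL_POSITION)
      + rng.normal(0.0, 2.8, shape))

skill, r1, r2, r3, r4, pos = (
    a.ravel() for a in (skill, r1, r2, r3, r4, pos2d)
)

# Full regression: skill, R1/R2/R3 performance and R4 starting position.
X = np.column_stack([np.ones(pos.size), skill, r1, r2, r3, pos])
coefs, *_ = np.linalg.lstsq(X, r4, rcond=None)
fitted = X @ coefs

# Pressure-free baseline: same regression, position replaced by neutral.
X_neutral = X.copy()
X_neutral[:, -1] = NEUTRAL_POSITION
baseline = X_neutral @ coefs

# Average R4 performance of leaders relative to their baseline.
leaders = pos == 1
leader_gap = (r4 - baseline)[leaders].mean()
print(f"Leaders vs. pressure-free baseline: {leader_gap:.2f} strokes")
```

With a linear position term, the baseline differs from the ordinary fitted value by exactly the position coefficient times the distance from the neutral position; a more flexible specification (position bins, for example) would change that but not the overall recipe.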

To conclude, the table below displays the average strokes-gained relative to the pressure-free baseline for final round leaders in each season since 2010, separately for the PGA and European Tours.

Season    PGA Tour    European Tour
2010      -1.43       -0.66
2011      0.02        0.25
2012      -0.67       0.01
2013      -0.69       -0.20
2014      -0.72       -1.15
2015      -0.73       -0.51
2016      -0.50       0.10
2017      -1.03       -0.46
2018      -0.39       0.08
2019      -0.36       -0.16
2020      -0.18       -0.76
2021      -0.71       -0.56
2022      -0.88       -0.41
ALL       -0.65       -0.34

I think the most important takeaway here is just how much variation we see from year to year. This isn't surprising: variation in golf performance is mostly noise, and with only 35-45 final rounds per season, the yearly estimates are still heavily influenced by this randomness. It's interesting to look at the values from the 2019 season, which was a season where we had some success making in-play bets on leaders (we probably got lucky). Another interesting observation is that leaders on the European Tour appear to underperform their baselines by much less than their PGA Tour counterparts (-0.34 versus -0.65 strokes overall)! However, it's unclear what this difference in underperformance actually means. As I've tried to illustrate throughout this post, it's tricky to disentangle the separate effects of pre-tournament skill, R1-R3 performance, and R4 starting position on R4 performance. The larger the effect estimates for R1-R3 performance, the higher the pressure-free baselines for final round leaders (because they will have played well in R1-R3 to get in the lead), and the larger the estimated effect of pressure. In the case of the European Tour, there is a weaker relationship between skill / R1-R3 performance and final round performance than on the PGA Tour, which explains the smaller "underperformance" by leaders. In practice, what actually matters is the overall prediction (i.e. pressure-free baseline + effect of pressure) we make for leaders in the final round. These won't differ that much between the PGA and European Tours.