Model Talk
Does our model overvalue leaders?
Using our PGA Tour live model archives, which date back to Nov 2018 (with a few missing events since then for various reasons), 339 players have entered the final round with at least a share of the lead. Using our pre-final-round win probabilities, we predicted 127.8 wins from this group (a 37.6% win rate); in fact, the group won 134 times (39.5%).

Using our European Tour archives, which only go back to Feb 2023, 83 players had a lead or co-lead heading into the final round. We predicted 26.6 wins from this group (32%), and 29 of them went on to win (34.9%).
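For clarity, the "predicted wins" figures are just the sum of the model's pre-final-round win probabilities across leaders, compared against the realized win count. Here is a minimal sketch of that check, assuming a hypothetical leaders DataFrame with win_prob and won columns:

```python
import pandas as pd

# A minimal sketch of the calibration check above, assuming a hypothetical
# `leaders` DataFrame with one row per 54-hole (co-)leader, a `win_prob`
# column (pre-final-round win probability), and a boolean `won` column.
def calibration_summary(leaders: pd.DataFrame) -> pd.Series:
    return pd.Series({
        "count": len(leaders),
        "model_wins": leaders["win_prob"].sum(),   # expected number of wins
        "model_rate": leaders["win_prob"].mean(),  # predicted win rate
        "actual_wins": leaders["won"].sum(),
        "actual_rate": leaders["won"].mean(),
    })
```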

So, interestingly, our model has slightly underestimated the actual win rate of 54-hole leaders. This is notable because it's one of the few spots where our model still consistently diverges from the betting market. Unfortunately, we don't archive in-play betting odds, but anyone who follows our model closely knows that we are consistently higher on leaders than the sportsbooks and exchanges. This divergence seems to occur most often with unproven players, or players with a history of losing leads. Here is how the win rates look when grouped by the leader's pre-tournament skill:

tour   player type      count   model wins (rate)   actual wins (rate)
pga    elite               73   38.3 (52.4%)        43 (58.9%)
pga    sub-elite           97   39.2 (40.4%)        36 (37.1%)
pga    above average      115   36.8 (32.0%)        41 (35.7%)
pga    below average       54   13.5 (25.1%)        14 (25.9%)
euro   elite               11    4.9 (44.6%)         5 (45.5%)
euro   sub-elite           22    8.4 (38.3%)         9 (40.9%)
euro   above average       34   10.2 (30.1%)        12 (35.3%)
euro   below average       16    3.1 (19.1%)         3 (18.8%)

(On the PGAT, elite players were defined as those with a pre-tournament skill above +1.6 strokes, sub-elite between +0.8 and +1.6, above-average between 0 and +0.8, and below-average below 0. For the European Tour, the breakpoints are all 0.9 strokes lower than on the PGA Tour.)
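A sketch of that bucketing, assuming hypothetical skill and tour columns (euro skills are shifted up by 0.9 so that one set of bins serves both tours):

```python
import pandas as pd

# A sketch of the skill bucketing described above; the `skill` and `tour`
# column names are assumptions. Euro breakpoints sit 0.9 strokes below the
# PGA Tour's, so shifting euro skills up by 0.9 lets one set of bins serve both.
PGA_BINS = [-float("inf"), 0.0, 0.8, 1.6, float("inf")]
LABELS = ["below average", "above average", "sub-elite", "elite"]

def skill_bucket(leaders: pd.DataFrame) -> pd.Series:
    shifted = leaders["skill"] + leaders["tour"].eq("euro") * 0.9
    return pd.cut(shifted, bins=PGA_BINS, labels=LABELS)
```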

Combining across tours, elite players outperformed our model's win expectation the most, while sub-elite players were the only ones to underperform it. The sample sizes in these subgroups are pretty small, so I wouldn't read into them too much. For a useful reference point, the standard error of a proportion estimate is \( \sqrt{\frac{p \cdot (1-p)}{N}} \); for \( N=339 \) and \( p=0.39 \), that works out to about 2.6%.
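Or, as a quick check in code:

```python
from math import sqrt

# Standard error of a proportion estimate, per the formula above.
def se_proportion(p: float, n: int) -> float:
    return sqrt(p * (1 - p) / n)

print(se_proportion(0.39, 339))  # ~0.026, i.e. 2.6%
```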

The model estimates shown above are true out-of-sample predictions in that they were made before the outcome had occurred. However, to get a bigger sample size, it’s useful to look back at what our model predictions would have been for tournaments that pre-dated our live model.

Mainly out of laziness, I'm going to use the probabilities that appear on our pressure page, which go back to 2004 for the PGA and European Tours. Importantly, these probabilities don't include an adjustment for pressure (i.e. a player's position on the leaderboard), and as a result they are, on average, ~3% higher for 54-hole leaders than the probabilities from our live model (which does account for pressure). The two sets of probabilities also differ for various idiosyncratic reasons (the live model accounts for weather, and updates skill in a more complex way during the event), but these differences should even out in a sufficiently large sample. The upshot is that we'd expect the average live model probability to be about 3% lower than the "model" probabilities shown below. First, here are the overall predicted and actual win rates by tour since 2004:

tour   count   model wins (rate)   actual wins (rate)
pga     1231   492.1 (40.0%)       464 (37.7%)
euro     996   389.8 (39.1%)       390 (39.1%)

The actual win rate on the PGAT is 2.3% lower than our pressure-free model probabilities, and therefore roughly in line with what we would have expected our live model to project. Shockingly, 54-hole leaders on the European Tour have won exactly as often as the pressure-free model predicted. This is a bit puzzling because, when looking at strokes-gained relative to expectation, leaders on the European Tour underperform significantly. However, this underperformance is not as large as on the PGA Tour, and the chasers also underperform more on the European Tour than on the PGAT. With a sample size of roughly 1000, the standard errors of these proportion estimates are about 1.5%, meaning that a true win rate within +/- 3% of what we've observed is still within the realm of possibility.
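To make the noise argument a bit more concrete, here is one way to gauge how far an observed win count sits from the pressure-free model expectation; a sketch that assumes independent outcomes and approximates the variance using the average predicted rate:

```python
from math import sqrt

# A rough z-score for an observed win count against the model's expected
# count, treating outcomes as independent and approximating the variance of
# the win count with n * p * (1 - p) at the average predicted rate.
def win_count_z(expected_wins: float, actual_wins: int, n: int) -> float:
    p = expected_wins / n
    return (actual_wins - expected_wins) / sqrt(n * p * (1 - p))

print(win_count_z(492.1, 464, 1231))  # ~ -1.6 vs the pressure-free model
```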

Here are the win rates divided up by the leader's skill as before:

tour   player type      count   model wins (rate)   actual wins (rate)
pga    elite              233   134.8 (57.8%)       142 (60.9%)
pga    sub-elite          377   160.9 (42.7%)       139 (36.9%)
pga    above average      397   134.7 (33.9%)       127 (32.0%)
pga    below average      224    61.8 (27.6%)        56 (25.0%)
euro   elite              202   106.9 (52.9%)       114 (56.4%)
euro   sub-elite          287   119.7 (41.7%)       114 (39.7%)
euro   above average      331   116.4 (35.2%)       117 (35.3%)
euro   below average      176    46.7 (26.5%)        45 (25.6%)

Again, remember that our hypothetical live model numbers would be expected to be 3% lower than all the model figures here. On both tours, elite players have won more than expected by a substantial margin. The only group that was more than 3% lower than the pressure-free prediction was sub-elite players on the PGA Tour, which is probably noise (i.e. I don't see a good reason why they would underperform more than average players). Seeing this makes me think we should consider allowing pressure effects to vary by the leader's skill. Of course, any analysis of professional golf in the 2000s is incomplete without a Tiger adjustment: the elite win rates sans Tiger on the PGA Tour are 56.4% (predicted) and 57.2% (actual), meaning that Tiger's ability to close did significantly inflate elite players' win rate.
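(The Tiger adjustment is just the same calibration check re-run with one player filtered out; a sketch reusing the hypothetical leaders frame from the earlier snippets:)

```python
# The Tiger adjustment: the same calibration check with one player removed;
# the `player_name`, `tour`, and `bucket` column names are assumptions.
elite_pga = leaders[(leaders["tour"] == "pga") & (leaders["bucket"] == "elite")]
sans_tiger = elite_pga[elite_pga["player_name"] != "Tiger Woods"]
print(sans_tiger["win_prob"].mean(), sans_tiger["won"].mean())  # ~0.564, ~0.572
```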

Another interesting dimension to examine is how win rates varied with the size of the 54-hole lead:

tour   lead (strokes)   count   model wins (rate)   actual wins (rate)
pga    0 (tied)           527   135.3 (25.7%)       132 (25.0%)
pga    1                  303   109.6 (36.2%)       101 (33.3%)
pga    2                  165    78.2 (47.4%)        74 (44.8%)
pga    3                  107    65.6 (61.3%)        54 (50.5%)
pga    4+                 129   103.4 (80.2%)       103 (79.8%)
euro   0 (tied)           435   111.9 (25.7%)       106 (24.4%)
euro   1                  253    93.4 (36.9%)       102 (40.3%)
euro   2                  132    61.8 (46.8%)        58 (43.9%)
euro   3                  102    62.7 (61.5%)        66 (64.7%)
euro   4+                  74    59.9 (80.9%)        58 (78.4%)

The PGA Tour numbers tell a pretty consistent story, with larger leads usually resulting in fewer wins relative to expectation (the 3-stroke lead numbers for the PGAT are a bit crazy, but still within 2 standard errors of the live model prediction). The European Tour numbers don't show the same pattern.
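For completeness, the lead-size grouping follows the same pattern as the skill buckets; a sketch with the same hypothetical columns, where lead is strokes ahead after 54 holes (0 for a tie):

```python
import pandas as pd

# A sketch of the lead-size grouping above; `lead` (strokes ahead after 54
# holes, 0 for a tie) and the other column names are assumptions.
def by_lead_size(leaders: pd.DataFrame) -> pd.DataFrame:
    lead = leaders["lead"].clip(upper=4)  # pool 4+ stroke leads together
    return leaders.groupby(["tour", lead]).agg(
        count=("won", "size"),
        model_wins=("win_prob", "sum"),
        actual_wins=("won", "sum"),
    )
```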

It would be nice to have an archive of in-play betting odds to complete this analysis, but it seems the betting market must have underestimated 54-hole leaders' win probabilities in recent years (market odds are normally lower than our model's, and our model has been slightly lower than the observed win rates). The market is generally in line with, or maybe even a bit higher than, our model when a top player is leading, so that subset of odds might be more accurate. It is important to remember that even with 20 years of data, sample sizes are still relatively small: as mentioned earlier, the win rate for 1000 golfers has a standard error of roughly 1.5%.

Extra: Here are the predicted and actual win rates for 36-hole leaders from our live model (we haven't added probabilities back to 2004 for 36-hole leaders yet):

tour   count   model wins (rate)   actual wins (rate)
pga      357   82.0 (23.0%)        82 (23.0%)
euro      80   17.7 (22.1%)        20 (25.0%)

And here is the breakdown by player skill:

tour   player type      count   model wins (rate)   actual wins (rate)
pga    elite               60   22.8 (38.0%)        28 (46.7%)
pga    sub-elite          113   28.6 (25.3%)        24 (21.2%)
pga    above average      118   22.2 (18.8%)        22 (18.6%)
pga    below average       66    8.3 (12.6%)         8 (12.1%)
euro   elite               11    4.0 (36.7%)         4 (36.4%)
euro   sub-elite           22    6.3 (28.6%)         4 (18.2%)
euro   above average       26    4.9 (18.9%)         8 (30.8%)
euro   below average       21    2.4 (11.6%)         4 (19.0%)