Model Talk
Understanding variation in variance
Scores from last week's RBC Canadian Open were very spread out. The first two rounds ranked among the top 50 in terms of their standard deviation of scores (using Shotlink PGA Tour events since 2015), while the weekend rounds were both in the top 250. Part of this high variance was due to the fact that the skill distribution of the field was unusually wide. However, even after accounting for this variance in skill, the variance in scores for rounds 1 and 2 at St. George's was in the top 6-8% of rounds since 2015. This made me think of something I've come back to several times over the years: when trying to measure performance in golf, should we standardize scores on a given day? That is, if player A beats the field average by 9 strokes on a day where the standard deviation in scores was 3 strokes, while player B beats the field by 7.5 strokes when the standard deviation was 2.5 strokes, should we judge player A's performance to be better than, or equal to, player B's performance?

To clarify, by "standardize" I mean divide by the standard deviation of the variable of interest. Standardizing is useful for framing comparisons between quantities measured in different units. For a straightforward example from golf, how do we judge whether Viktor Hovland's length (9.6 yards longer than PGA Tour average) or accuracy (4.4% more fairways per round than average) is more impressive? We standardize: the standard deviation in our measure of driving distance skill is 8 yards, while the standard deviation in accuracy skill is 4.7%, which gives Hovland a standardized skill of 1.2 in distance and 0.92 in accuracy. These two values can also be roughly converted to percentiles, allowing us to say that Hovland is in the 88th percentile for distance and the 82nd percentile for accuracy. Returning to round scores, another way to frame the question of whether to standardize is: are strokes "worth" the same in every round, that is, should we think of them as measured in the same units? To answer this question, we need to think a bit more deeply about why scores spread out more in some rounds than others.
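As a minimal sketch of the arithmetic above, the Python snippet below standardizes the quoted Hovland figures and the player A / player B example from the opening paragraph. The function names are illustrative, and the percentile conversion assumes skill is roughly normally distributed, which is an assumption of this sketch rather than something stated above. Note that the rounded inputs quoted here give an accuracy value of about 0.94 rather than 0.92, presumably because the quoted figures are themselves rounded.

```python
from statistics import NormalDist


def standardize(value_vs_average, standard_deviation):
    """Express a difference from average in standard-deviation units."""
    return value_vs_average / standard_deviation


def approx_percentile(z_score):
    """Rough percentile, ASSUMING the underlying skill is normally distributed."""
    return 100 * NormalDist().cdf(z_score)


# Hovland's driving, using the rounded figures quoted in the text.
distance_z = standardize(9.6, 8.0)  # yards above average / SD in yards
accuracy_z = standardize(4.4, 4.7)  # fairway % above average / SD in %

# The two hypothetical rounds from the opening paragraph.
player_a_z = standardize(9.0, 3.0)  # 9 strokes better, SD of 3
player_b_z = standardize(7.5, 2.5)  # 7.5 strokes better, SD of 2.5

print(f"distance: z = {distance_z:.2f}, percentile ~ {approx_percentile(distance_z):.1f}")
print(f"accuracy: z = {accuracy_z:.2f}, percentile ~ {approx_percentile(accuracy_z):.1f}")
print(f"player A: z = {player_a_z:.1f}, player B: z = {player_b_z:.1f}")
```

Once standardized, players A and B are tied at 3 standard deviations better than the field, which is exactly why the choice of whether to standardize matters.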

Let's consider two extreme examples. The first will make the case for standardizing scores while the second makes the case for not doing so. First, suppose that two golf courses are identical except that one of them has bunkers where the other has water hazards. If golfer A plays both courses and never finds a bunker or hazard, while golfer B plays both courses and finds 4 of them, then A's strokes-gained over B will be higher at the course with water hazards than at the one with bunkers. I think scores should be standardized here because the higher SG on the water-hazard course does not reflect better play, but rather the different features of the courses that were played. Second, suppose that two golf courses are identical except one has only 17 holes while the other has the standard 18. If golfer A always gains 0.05 strokes per hole over golfer B, then A will have a higher SG on the 18-hole course. I think it's obvious that scores shouldn't be standardized here because the higher SG on the 18-hole course has been earned through good play.
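The two thought experiments can be put into numbers with a short Python sketch. The per-mishap costs for a bunker and a water hazard are hypothetical values chosen only for illustration; the 0.05 strokes per hole and the 4 bunkers/hazards come from the examples above.

```python
# Hypothetical average cost, in strokes, of finding each type of trouble.
BUNKER_COST = 0.4
WATER_COST = 1.0


def sg_from_mishaps(own_mishaps, opponent_mishaps, cost_per_mishap):
    """Strokes gained over an opponent purely from avoiding trouble."""
    return (opponent_mishaps - own_mishaps) * cost_per_mishap


def sg_from_per_hole_edge(edge_per_hole, holes):
    """Strokes gained over an opponent from a constant per-hole edge."""
    return edge_per_hole * holes


# Example 1: identical play, different course features.
sg_bunker_course = sg_from_mishaps(0, 4, BUNKER_COST)
sg_water_course = sg_from_mishaps(0, 4, WATER_COST)

# Example 2: identical per-hole skill gap, one extra hole to show it.
sg_17_holes = sg_from_per_hole_edge(0.05, 17)
sg_18_holes = sg_from_per_hole_edge(0.05, 18)

print(f"bunker course: {sg_bunker_course:.2f}, water course: {sg_water_course:.2f}")
print(f"17 holes: {sg_17_holes:.2f}, 18 holes: {sg_18_holes:.2f}")
```

In the first case the gap in SG comes entirely from the cost assigned to each mishap, while in the second it comes from one more hole of the same quality of play.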

With these simple examples in mind, let's turn to the data to understand what drives variance in golf. To start, I calculate the standard deviation (or variance, which equals the standard deviation squared; I'll use the two interchangeably) in total strokes-gained, as well as in each strokes-gained category, for every Shotlink-measured PGA Tour round since 2015. (All of these are adjusted for the variance in skill of the players in each round, as I'm not interested in that component of variance.) The goal of this analysis is to understand, or explain, the variation we observe in standard deviations across PGA Tour rounds. A simple first step is to look at how the standard deviations of the SG categories vary across rounds; the standard deviation of SG:APP varies the most, followed by PUTT, and then OTT/ARG. Put more simply, when we observe an above-average variance in round scores, the most likely cause is a higher variance in SG:APP in that round. This is related to (but not completely driven by) the fact that this is roughly the ordering of the SG categories in terms of their variability across players within each round.

We'll return later on to the findings in the previous paragraph, but first we are on to more interesting things. I next perform a round-level regression of the variance in total SG on various characteristics of the course. About 25% of the differences in variance across rounds can be explained by these course characteristics. There were 6 course variables at our disposal that had meaningful explanatory power: the strongest determinant of variance was the difficulty of the course (i.e. scoring average relative to par), followed by the course's par and the number of penalty strokes per round, then the penalty for a missed fairway, the average standard deviation in driving distance on each hole at the course, and finally the GIR rate (lower GIR = more variance). Interestingly, course yardage doesn't predict variance at all (after controlling for all of the above variables). Of the courses that are played regularly on the PGA Tour, Muirfield Village has the highest average standard deviation in scores; given the above results, this is explained (in part) by the fact that it's a par 72, it usually plays very difficult, it has one of the highest penalty rates on tour, and it has a very high penalty for missing the fairway.
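To make the idea of a round-level regression concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the data are synthetic, only two standardized course variables (hypothetically named `difficulty` and `penalty_rate`) are used instead of six, and the coefficients and noise level were picked so that roughly a quarter of the variation is explained. It is not the regression described above, only the same kind of calculation.

```python
import random


def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ols(x_rows, y):
    """Ordinary least squares with an intercept; returns (coefficients, R^2)."""
    design = [[1.0] + list(row) for row in x_rows]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1 - ss_res / ss_tot


# Synthetic rounds: hypothetical intercept, two slopes, and noise level.
TRUE_COEFS = (2.8, 0.10, 0.05)
NOISE_SD = 0.19
rng = random.Random(0)

rows, round_sds = [], []
for _ in range(2000):
    difficulty = rng.gauss(0, 1)    # standardized scoring average vs par
    penalty_rate = rng.gauss(0, 1)  # standardized penalty strokes per round
    rows.append((difficulty, penalty_rate))
    round_sds.append(
        TRUE_COEFS[0]
        + TRUE_COEFS[1] * difficulty
        + TRUE_COEFS[2] * penalty_rate
        + rng.gauss(0, NOISE_SD)
    )

coefs, r_squared = ols(rows, round_sds)
print("intercept, difficulty, penalty_rate:", [round(c, 3) for c in coefs])
print("share of variation explained (R^2):", round(r_squared, 3))
```

The R^2 printed at the end is the quantity behind a statement like "about 25% of the differences in variance across rounds can be explained by these course characteristics".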

Let's pause and consider whether the sources of variance listed above say anything about whether scores should be standardized within rounds. The par of the course and the scoring average relative to par are essentially telling us the same thing: the more shots golfers have to hit in a round, the higher we expect the variance in those scores to be. This makes perfect sense, and is no different from the 17-versus-18-hole example described above. These sources of variance provide golfers with more opportunities (i.e. shots) to showcase their skill, and therefore standardizing scores would not be appropriate. Conversely, penalties (both of the actual and missed-fairway variety) as drivers of variance do make me think standardizing scores could lead to a better measure of performance. For example, the same off-the-tee performance (in terms of distance and dispersion) will be rewarded differently at a course with a high penalty for missing the fairway. Standardizing scores may be too extreme a solution, but it does seem like some adjustment would be appropriate here.
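The "more shots, more variance" point can be checked with a small exact calculation. The sketch below assumes that holes are independent and that each hole has the same made-up distribution of score relative to par; both are simplifications for illustration. Under those assumptions the variance of a round total grows in proportion to the number of holes.

```python
# Hypothetical per-hole distribution of score relative to par:
# birdie 20%, par 60%, bogey 20%.
HOLE_DISTRIBUTION = {-1: 0.2, 0: 0.6, 1: 0.2}


def convolve(dist_a, dist_b):
    """Distribution of the sum of two independent discrete variables."""
    out = {}
    for a, prob_a in dist_a.items():
        for b, prob_b in dist_b.items():
            out[a + b] = out.get(a + b, 0.0) + prob_a * prob_b
    return out


def variance(dist):
    """Variance of a discrete distribution given as {value: probability}."""
    mean = sum(value * prob for value, prob in dist.items())
    return sum(prob * (value - mean) ** 2 for value, prob in dist.items())


def round_distribution(holes):
    """Exact distribution of the round total over independent holes."""
    total = {0: 1.0}
    for _ in range(holes):
        total = convolve(total, HOLE_DISTRIBUTION)
    return total


var_per_hole = variance(HOLE_DISTRIBUTION)
var_17 = variance(round_distribution(17))
var_18 = variance(round_distribution(18))

print(f"per hole: {var_per_hole:.2f}, 17 holes: {var_17:.2f}, 18 holes: {var_18:.2f}")
```

The extra variance on the 18-hole course comes from one more chance to separate from the field, not from a change in what a stroke is worth.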

I also run regressions similar to the one described above for each of the SG categories. For example, I regress the standard deviation in SG:PUTT on a few putting-related characteristics of a course: difficulty of putts inside 5 feet, difficulty of putts in the 5-15 foot range, and difficulty of putts greater than 15 feet. The sole driver of putting variance (among these 3 characteristics) is the difficulty of putts inside 5 feet. It should come as no surprise, then, that the 6 highest standard deviations in putting performance since 2015 have come at Torrey Pines or Pebble Beach (both courses known for their bumpy Poa annua greens). In the ARG regression, the two main predictors of variance were a course's GIR rate (lower GIR = more variance) and the difficulty of bunker shots. For APP and OTT, a course's penalty rate is easily the most predictive of variance in both categories.

As mentioned above, we only managed to explain ~25% of the differences in standard deviation of scores across rounds using the characteristics of the course, leaving 75% unexplained. The main driver of this 75% is randomness in golfer performance: if the same group of golfers plays many rounds under the exact same course conditions, each round will yield a different standard deviation in scores. This ties back to the earlier point about how the within-round variance of each SG category relates to its power to explain differences in variance across rounds. For example, because putting is high variance, some rounds may have a higher variance in scores simply because a few golfers had really good, or really bad, days on the greens. This can't happen with off-the-tee play to the same extent it happens on the greens, because off-the-tee performance is not as variable. This portion of the unexplained variance should definitely not be standardized away, as it occurs independently of the conditions of the course.
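The role of pure randomness is easy to see in a simulation. The sketch below assumes a field of 144 equally skilled golfers whose scores relative to the field average are independent normal draws with a fixed standard deviation of 2.75 strokes; the field size, the 2.75 figure, and the normality are all illustrative assumptions. Course conditions never change, yet the measured standard deviation still moves from round to round.

```python
import random
from statistics import mean, stdev

FIELD_SIZE = 144  # hypothetical number of golfers in each round
TRUE_SD = 2.75    # hypothetical "true" spread of scores, in strokes
N_ROUNDS = 500    # number of simulated rounds under identical conditions

rng = random.Random(2022)


def simulate_round_sd():
    """Standard deviation of scores in one simulated round."""
    scores = [rng.gauss(0.0, TRUE_SD) for _ in range(FIELD_SIZE)]
    return stdev(scores)


round_sds = [simulate_round_sd() for _ in range(N_ROUNDS)]

print(f"average round SD:    {mean(round_sds):.2f}")
print(f"lowest / highest SD: {min(round_sds):.2f} / {max(round_sds):.2f}")
print(f"SD of the round SDs: {stdev(round_sds):.2f}")
```

Even with identical golfers and identical conditions, some simulated rounds look noticeably "high variance" and others "low variance". Dividing scores by each round's standard deviation would treat that noise as if it were a property of the course.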

To wrap up, for any readers who have made it this far, my main takeaway is that, given the likely sources of variation in the standard deviation of scores across rounds, standardizing scores within each round seems like a bad idea. There are examples, such as high-variance rounds being driven by increased penalty strokes, where the idea of standardizing makes some sense. But without isolating the different sources of variance, we risk standardizing away useful information along with the penalty stroke variance we are after. A better approach would be to target penalty strokes directly as a component of performance that is not predictive.