Scores from last week's RBC Canadian Open were very spread out. The first two rounds ranked among the
top 50 in terms of their standard deviation of scores (using Shotlink PGA Tour events since 2015),
while the weekend rounds were both in the top 250. Part of this high variance was due to the fact that the skill
distribution of the field was unusually wide.
However, even after accounting for this variance in skill, the variance in scores for rounds 1 and 2
at St. George's were in the top 6-8% of rounds since 2015. This made me think of something I've come back to
several times over the years: when trying to measure performance in golf,
should we standardize scores on a given day? That is, if player A beats the field average by 9 strokes on a
day where the standard deviation in scores was 3 strokes, while player B beats the field
by 7.5 strokes when the standard deviation was 2.5 strokes, should we judge player A's
performance to be better than, or equal to, player B's?
To clarify, by "standardize" I mean divide by the standard deviation of the variable of interest.
Standardizing is useful for framing comparisons between quantities measured in different units. For a straightforward example from golf,
how do we judge whether Viktor Hovland's length (9.6 yards longer than PGA Tour average)
or accuracy (4.4% more fairways per round than average) is more impressive? We standardize: the
standard deviation in our measure of
driving distance skill is 8 yards, while the standard deviation in accuracy skill is 4.7%, which gives
Hovland a standardized skill of 1.2 in distance and 0.92 in accuracy. These standardized values can also
be roughly converted to percentiles,
allowing us to say that Hovland is in the 88th percentile for distance and the 82nd
percentile for accuracy. Returning to round scores, another way to
frame the question of whether to standardize is: are strokes "worth" the same
in every round, that is, should we think of them as measured in the same units?
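The arithmetic behind this kind of standardization is simple. Here is a minimal sketch in Python using the Hovland numbers quoted above, with a normal-CDF percentile conversion; the assumption that skill is roughly normally distributed is mine, and small differences from the article's rounded figures are expected:

```python
from math import erf, sqrt

def standardize(value, mean, sd):
    """Convert a raw value to a z-score (standard deviations from the mean)."""
    return (value - mean) / sd

def percentile(z):
    """Percentile implied by a z-score under a normal distribution."""
    return 100 * 0.5 * (1 + erf(z / sqrt(2)))

# Hovland's driving skill, expressed relative to the PGA Tour average:
dist_z = standardize(9.6, 0, 8.0)  # yards gained vs. an SD of 8 yards
acc_z = standardize(4.4, 0, 4.7)   # fairway % gained vs. an SD of 4.7%

# 4.4 / 4.7 is closer to 0.94; the article's 0.92 presumably reflects
# rounding in the underlying inputs.
print(dist_z, percentile(dist_z))
print(acc_z, percentile(acc_z))
```

The percentile conversion is what makes the two skills directly comparable: both are now on a "share of the tour you beat" scale rather than yards versus fairway percentage.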
To answer this question, we need to think a bit more deeply about why scores spread out more in some rounds than others.
Let's consider two extreme examples. The first will make the case for standardizing scores while
the second makes the case for not doing so. First, suppose that two golf courses are identical except that
one of them has bunkers where the other has water hazards. If golfer A plays both courses and doesn't hit it
in a bunker/hazard, while golfer B plays both courses and hits it in 4 bunker/hazards, then
A's strokes-gained over B will be higher at the course with water hazards than the one with bunkers. I think scores
should be standardized here because the higher SG on the water-hazard course does not reflect better play, but rather
the different features of the courses that were played. Second, suppose that two golf courses are identical except one has only
17 holes while the other has the standard 18. If golfer A always gains 0.05 strokes per hole over golfer B,
then A will have a higher SG on the 18-hole course. I think it's obvious that scores shouldn't be
standardized here because the higher SG on the 18-hole course has been earned through good play.
With these simple examples in mind, let's turn to the data to understand what drives
variance in golf.
To start, I calculate the standard deviation (or variance — which equals standard deviation squared — I'll use the two interchangeably) in total strokes-gained,
as well as each strokes-gained category, for every Shotlink-measured PGA Tour round since 2015. (All of these are adjusted
for the variance in skill of the players in each round, as I'm not interested in that component of variance.)
The goal of this analysis is to understand, or explain, the variation we observe in standard deviations
across PGA Tour rounds. A simple first step is to look at how the standard deviations of
the SG categories vary across rounds; the standard deviation of SG:APP varies the most,
followed by PUTT, and then OTT/ARG. Put more simply, when we observe an above-average variance in round scores,
the most likely cause is a higher variance in SG:APP in that round.
This is related to (but not completely driven by) the fact that this is roughly the
ordering of the SG categories
in terms of their
variability across players within each round.
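A quick simulation illustrates why the noisiest categories also fluctuate the most from round to round: the sampling variability of a round-level standard deviation scales with the true standard deviation. The category SDs below are invented for the sketch, not estimates from the data:

```python
import random

random.seed(1)

# Assumed within-round SDs for each SG category (APP largest, ARG smallest);
# illustrative numbers only, not the article's estimates.
category_sd = {"APP": 1.8, "PUTT": 1.3, "OTT": 0.9, "ARG": 0.8}

n_rounds, field_size = 500, 150
spread_by_cat = {}

for cat, sd in category_sd.items():
    round_sds = []
    for _ in range(n_rounds):
        # One simulated round: draw a field of scores with the category's true SD
        scores = [random.gauss(0, sd) for _ in range(field_size)]
        m = sum(scores) / field_size
        round_sds.append((sum((s - m) ** 2 for s in scores) / (field_size - 1)) ** 0.5)
    # How much does this category's realized round-level SD vary across rounds?
    mu = sum(round_sds) / n_rounds
    spread_by_cat[cat] = (sum((s - mu) ** 2 for s in round_sds) / (n_rounds - 1)) ** 0.5

print(spread_by_cat)
```

Even though each category's true SD is held fixed, the realized round-level SD of the highest-variance category (APP here) swings the most, which is the pattern described above.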
We'll return to these findings later; first, on to more interesting things.
I next perform a round-level regression of the variance in total SG on various
characteristics of the course. About 25% of the differences in variance across rounds can be explained by these
course characteristics. There were 6 course variables at our disposal that had meaningful explanatory power:
the strongest determinant of variance was the difficulty of the course (i.e. scoring average relative to par),
followed by the course's par and the number of penalty
strokes per round, then the penalty for a missed fairway,
the average standard deviation in driving distance on each hole at the course, and finally the GIR rate (lower GIR = more variance).
Interestingly, course yardage doesn't predict variance at all (after controlling for all of the above variables).
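A stripped-down version of this exercise shows the mechanics, using synthetic data and a single course characteristic (difficulty). The coefficients below are chosen arbitrarily so that difficulty explains roughly a quarter of the variation in round-level variance, mirroring the ~25% figure; nothing here is estimated from real data:

```python
import random

random.seed(7)

# Synthetic rounds: round-level score variance driven partly by course
# difficulty (scoring average vs. par, standardized) and partly by noise.
n = 400
difficulty = [random.gauss(0, 1) for _ in range(n)]
noise = [random.gauss(0, 1) for _ in range(n)]
round_var = [9 + 0.6 * d + 1.0 * e for d, e in zip(difficulty, noise)]

# Closed-form simple OLS of round variance on difficulty
mx = sum(difficulty) / n
my = sum(round_var) / n
beta = sum((x - mx) * (y - my) for x, y in zip(difficulty, round_var)) \
    / sum((x - mx) ** 2 for x in difficulty)
alpha = my - beta * mx

# R^2: share of the variance in round-level variance explained by difficulty
resid = [y - (alpha + beta * x) for x, y in zip(difficulty, round_var)]
r2 = 1 - sum(e ** 2 for e in resid) / sum((y - my) ** 2 for y in round_var)
print(round(beta, 2), round(r2, 2))
```

The actual analysis uses six course characteristics rather than one, but the logic is the same: regress realized round-level variance on course features and read off how much is explained.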
Of the courses that are played regularly on the PGA Tour,
has the highest average standard deviation in scores;
given the above results, this is explained (in part) by the fact that it's a par 72, it usually plays very difficult, it has one of the highest penalty rates on tour,
and it has a very high penalty for missing the fairway.
Let's pause and consider whether the sources of variance listed above say anything about whether scores should
be standardized within rounds. The par of the course and the scoring average relative to par are essentially telling us the same thing:
the more shots golfers have to hit in a round, the higher we expect the variance in those scores to be. This makes perfect sense,
and is no different than the 17-versus-18-hole example described above. These sources of variance provide golfers with
more opportunities (i.e. shots) to showcase their skill, and therefore standardizing scores would not be appropriate.
Conversely, the role of penalties (both of the actual and missed-fairway variety) in driving variance does make me think
standardizing scores could lead to a better measure of performance. For example,
the same off-the-tee performance (in terms of distance and dispersion)
will be rewarded differently at a course with a high penalty for missing the fairway. Standardizing scores
may be too extreme of a solution, but it does seem like some adjustment would be appropriate here.
I also perform similar regressions to the one described above for each of the SG categories.
For example, I regress the standard deviation in SG:PUTT on a few putting-related characteristics of a course:
difficulty of putts inside 5 feet, difficulty of putts in the 5-15 foot range, and difficulty of putts greater than 15 feet.
The sole driver of putting variance (among these 3 characteristics) is the difficulty of putts inside 5 feet. It should come as no surprise, then,
that the 6 highest standard deviations in putting performance since 2015 have come at Torrey Pines or Pebble Beach (courses both known for their
bumpy Poa Annua greens). In the ARG regression,
the two main predictors of variance were a course's GIR rate (lower GIR = more variance) and the
difficulty of bunker shots. For APP and OTT, a course's penalty rate is easily the most predictive of
variance in both categories.
As mentioned above, we only managed to explain ~25% of the differences in standard deviation of scores across rounds
using the characteristics of the course, leaving 75% unexplained. The main driver
of this 75% is randomness in golfer performance: if the same group of golfers plays many rounds under the exact same course conditions,
each round will yield a different standard deviation in scores. This ties back to the earlier
point about how the within-round
variance of SG categories relates to their explanatory power for differences in variance across rounds.
For example, because putting is high variance, some rounds may have a higher variance in scores simply because a few golfers
had really good, or really bad, days on the greens. This can't happen with off-the-tee play to the same extent it happens
on the greens, because off-the-tee performance is not as variable. This portion of the unexplained variance should definitely not be standardized,
as it occurs independently of
the conditions of the course.
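This point can be sketched directly: hold the field and the true score distribution fixed and replay the same round many times, and the realized standard deviation still differs from replay to replay. The 2.8-stroke true SD and field size below are assumptions for illustration:

```python
import random

random.seed(3)

# The same 150 golfers under identical "course conditions" (a fixed true SD
# of 2.8 strokes), replayed 200 times.
true_sd, field, replays = 2.8, 150, 200
realized = []
for _ in range(replays):
    scores = [random.gauss(0, true_sd) for _ in range(field)]
    m = sum(scores) / field
    realized.append((sum((s - m) ** 2 for s in scores) / (field - 1)) ** 0.5)

# Every replay yields a different realized SD despite identical conditions
print(round(min(realized), 2), round(max(realized), 2))
```

Nothing about the course changed between replays, so any round-to-round differences in realized SD here are pure sampling noise, which is exactly the component that shouldn't be standardized away.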
To wrap up, for any readers who have made it this far, my main takeaway is that given the likely
sources of variation in the standard deviation of scores across rounds, standardizing scores within each round
seems like a bad idea. There are examples, such as high-variance rounds being driven by increased penalty strokes,
where the idea of standardizing makes some sense. But without isolating the different sources of variance we risk standardizing away
useful information along with the penalty stroke variance we are after. A better approach would be to target penalty strokes
directly, treating them as a component of performance that is not predictive of future play.