When the claim is made that driving distance has become more ‘important’
to performance in professional golf, I think what is meant, intuitively,
is that having above-average driving distance now affords a larger advantage
than it did in the past. Statistically, we might rephrase the question as,
does driving distance ‘account’ for a larger portion of the variance
(i.e. the spread or dispersion) of scores in today’s game than it did in the past?
If we agree that this is the claim being made, there are then two different ways
it could be true. First, it might be the case that being 10 yards above average
in driving distance is now “worth” more (in terms of total strokes-gained)
than it was in the past. This could be true due to changing course setups:
as courses get longer, players are forced to hit driver off every tee,
which makes having 10 extra yards more useful than when players were
sometimes choosing shorter clubs off the tee for strategic purposes.
(There are, of course, other reasons why this claim could be true or false.)
Second, it might be the case that 10 yards is still worth the same in terms
of strokes-gained, but now there are larger differences across golfers
in their driving distances. That is, maybe it’s the case that being
in the top 10% in driving distance in 2019 means you hit the ball 15 yards further than average,
but in 1990 it meant you only hit it 10 yards further than average.
If 10 extra yards is still worth the same in 2019 as it was in 1990,
this would also result in driving distance being more ‘important’ in 2019,
in the sense defined in the opening paragraph.
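The two channels can be made concrete with a little arithmetic. The sketch below assumes a linear payoff to distance. The per-yard values (0.03 and 0.045 strokes) are illustrative numbers chosen for the example, not estimates from the data; the 10- and 15-yard gaps are the hypothetical figures used above.

```python
def advantage(strokes_per_yard, yards_above_average):
    """Expected strokes gained per round from a distance edge, assuming a linear payoff."""
    return strokes_per_yard * yards_above_average

# Baseline: top-10% hitters are 10 yards above average (the hypothetical 1990 figure).
baseline = advantage(strokes_per_yard=0.03, yards_above_average=10.0)

# Channel 1: each yard is worth more, while the spread of distances is unchanged.
higher_value_per_yard = advantage(strokes_per_yard=0.045, yards_above_average=10.0)

# Channel 2: each yard is worth the same, but top-10% hitters are now 15 yards above average.
wider_spread = advantage(strokes_per_yard=0.03, yards_above_average=15.0)

print(f"baseline: {baseline:.2f} strokes per round")
print(f"channel 1: {higher_value_per_yard:.2f} strokes per round")
print(f"channel 2: {wider_spread:.2f} strokes per round")
```

With these made-up numbers both channels raise the long hitter's edge from 0.30 to 0.45 strokes per round, which is why looking at the advantage alone cannot tell the two explanations apart.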
For a moment let’s focus on the question, “how much is 10 extra yards
of driving distance worth?” One interpretation of this is,
“holding all other attributes of a golfer constant, how much
of an increase in performance can a golfer expect by
increasing their average drive by 10 yards?”
Another interpretation is, “on average, how many strokes better
per round are golfers who hit it 10 yards further than the average golfer?”
The answer to these two questions will differ because golfers who hit
the ball far also tend to do other things well, or poorly. For example,
players who hit the ball above-average distances on the PGA Tour
are also above-average approach players. Therefore, the simple
correlation between a golfer’s average driving distance and their
performance in part captures the fact that longer players are,
on average, better approach players. This part of the
correlation may or may not be of interest.
(For more thoughts on this, see [1].)
In the analysis that follows, we examine how both the ‘raw’
relationship and the ‘conditional’ (on other attributes)
relationship between distance and performance have evolved over time.
Let’s now turn to the data. Below we do two related analyses,
which are likely best explained with examples. First, for each
PGA Tour season from 1984-2019,
we estimate by how much we would expect golfer A to beat golfer
B given that A hits it 10 yards further than B; second,
we estimate by how much we would expect A to beat B given
that A is 1 standard deviation longer than B,
where the standard deviation is specific to the PGA Tour season.
The latter estimate accounts for the fact
that differences across golfers in terms of driving distance
have been changing over time; it will provide us with a more
complete answer to the question we laid out in the
introduction. Also, as previously mentioned, for
each analysis we present two versions of the estimate: one that
holds constant, or controls for, each golfer’s driving accuracy, strokes-gained approach,
strokes-gained around-the-green, and strokes-gained putting
(the conditional relationship), and one that does not
hold any golfer attributes constant (the raw relationship).
The statistical details are here [2].
Briefly, we are correlating a golfer’s skill in each attribute
(driving distance, driving accuracy, etc.) at the time of a
tournament, with their subsequent performance (total strokes-gained) in
that tournament. The conditional relationships are estimated
using regressions with all 5 attributes included, while the raw relationships
are just simple regressions with the relevant attribute as the only independent variable.
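To see why the raw and conditional estimates can differ, here is a minimal sketch on synthetic data. It is not the specification behind the plots: it uses only two attributes instead of five, and every number in it (sample size, effect sizes, noise levels) is an assumption made for illustration. The helper `ols_slopes` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000  # synthetic golfer-tournament observations (made up)

# Hypothetical skills: approach is built to be positively correlated with distance.
distance = rng.normal(0.0, 8.0, n)                    # yards relative to average
approach = 0.01 * distance + rng.normal(0.0, 0.3, n)  # strokes per round

# Assumed "true" model: 0.02 strokes per yard, plus approach skill one-for-one.
total_sg = 0.02 * distance + 1.0 * approach + rng.normal(0.0, 2.0, n)

def ols_slopes(y, *columns):
    """Least-squares slopes of y on the given columns (an intercept is fit, then dropped)."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]

# Raw: distance is the only regressor, so it also picks up the approach effect.
raw_slope = ols_slopes(total_sg, distance)[0]

# Conditional: approach is held constant.
cond_slope = ols_slopes(total_sg, distance, approach)[0]

print(f"raw:         {10 * raw_slope:.2f} strokes per 10 yards")
print(f"conditional: {10 * cond_slope:.2f} strokes per 10 yards")
```

Because longer hitters are also better approach players in this synthetic setup, the raw slope comes out larger than the conditional one. That is the same mechanism described above for the real data.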
This first plot shows the raw relationship between driving distance
(measured in yards) and total strokes-gained, as well as that between
driving accuracy (measured in % fairways hit) and total strokes-gained,
from 1984-2019. Stated with examples, the raw relationship
tells us by how much we would expect golfer A to beat golfer B given
that the only thing we know about them is that A hits it 10 yards
further than B; the conditional relationship (use toggle to view) between driving
distance and strokes-gained performance will tell us by how
much we expect golfer A to beat golfer B given that A hits
it 10 yards further than B, but they are equally skilled in all other respects.
[Plot: How much is 10 yards of distance, or 5% more fairways, worth? Series: Distance, Accuracy.]
Focusing first on the raw relationship, we see that
the average difference in performance between two players with
a 10 yard driving distance difference has bounced around
0.3-0.35 strokes per round since 1984, with
a recent uptick since 2015.
Toggling to view the conditional estimates, we see an
upward trend since 2004 (however, we also see this in the raw correlations).
Note that since we are holding constant driving accuracy
(percentage of fairways hit per round), the driving distance
estimates are likely higher than they “should” be, since driving accuracy
should mechanically decrease when distance goes up. Finally,
note that, all else equal, you would probably expect the impact of
10 yards to decline as the average driving distance increases:
e.g. 10 yards is worth more when the average is 270 yards
than when the average is 300 yards. (We will discuss the driving accuracy
patterns in the next plot, as they are virtually identical to these.)
For the next plot, we repeat the analysis above except that each
attribute is normalized to have a mean of zero and a standard
deviation of 1 in each season. (Recall that for normally distributed
variables the standard deviation is approximately equal to the
difference between the median and the 84th percentile.)
Returning to our earlier example with golfers A and B, the interpretation is the same
except now the difference we are considering
between A and B is equal to 1 standard deviation, and
therefore can vary by season.
We also include in this plot the estimates for the strokes-gained categories; hover
over a data point to see its standard deviation in that season.
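The per-season normalization can be sketched as follows. The distance values are invented for the example and are not real tour data; `strokes_per_yard` is likewise an illustrative number. The sketch also checks the 84th-percentile rule of thumb for normal variables.

```python
from statistics import NormalDist, mean, pstdev

# Hypothetical driving-distance skill estimates (yards); not real tour data.
seasons = {
    1990: [255.0, 260.0, 262.0, 265.0, 270.0, 278.0],
    2019: [280.0, 288.0, 293.0, 296.0, 302.0, 315.0],
}

def standardize(values):
    """Rescale one season's values to mean 0 and standard deviation 1."""
    center, spread = mean(values), pstdev(values)
    return [(v - center) / spread for v in values]

z_scores = {season: standardize(vals) for season, vals in seasons.items()}

# A per-yard slope converts to a per-SD slope by multiplying by that season's SD.
strokes_per_yard = 0.03  # illustrative value only
strokes_per_sd = {
    season: strokes_per_yard * pstdev(vals) for season, vals in seasons.items()
}

# Share of a normal distribution lying below (mean + 1 SD).
share_below_one_sd = NormalDist().cdf(1.0)

for season, vals in seasons.items():
    print(f"{season}: SD = {pstdev(vals):.1f} yards, "
          f"1 SD is worth {strokes_per_sd[season]:.2f} strokes")
print(f"P(Z < 1) = {share_below_one_sd:.4f}")
```

Holding the per-yard value fixed, the season with the wider spread of distances gets the larger per-SD estimate, which is exactly the effect the normalized analysis is designed to capture.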
[Plot: How much is a 1 standard deviation improvement in each skill worth? Series: Distance, Accuracy, SG: Approach, SG: Around Green, SG: Putting.]
Looking at the raw relationships, we see that the
distance trend is increasing slightly more than in
the first plot because the standard deviation in driving distance
has increased over time.
The driving accuracy estimate has steadily decreased since 1984, with
a slowdown in the decline occurring since 2004.
It is interesting to note that the highest coefficient
by far among the raw correlations is that of strokes-gained approach.
This is in large part due to the fact that good approach players
tend to be good at all other aspects of the game (see correlation matrix below).
Moving to the conditional relationships, the distance estimates are again
trending up; we also see that the driving accuracy estimates have been
flat or slightly increasing since 2004. It is also interesting to note
the relative sizes of the estimates here: improving driving accuracy
by 1 standard deviation (~4.5 percentage points) yields an expected improvement
in overall performance that is similar in magnitude to improving
SG: approach by 1 standard deviation (~0.35 strokes).
It might be illuminating to consider the interpretation of
the strokes-gained category estimates.
These estimates tell us by how much we would predict golfer A to beat golfer B
given that A is 1 standard deviation better in a given SG category, and equal
in all other respects.
Given the nature of our skill estimates, we would expect that a 1 stroke difference
in skill for an SG category would result in a 1 stroke difference in total strokes-gained
going forward. If this were true, the estimate in the plot above would equal
the standard deviation in that season. (We see deviations from this due to
a combination of noise and the fact that some categories predict others, e.g. ARG predicts APP slightly.)
The fact that putting has a smaller coefficient than approach
in every season reflects the fact that skill differences across players
in SG: putting are smaller than skill differences across players
in SG: approach [3].
Therefore, approach play is more important in the sense that
improving from the 50th best approach player to the 10th best approach
player is worth more than the same improvement in ranking in putting.
The differences between the raw and conditional relationships can be,
for the most part, reconciled by examining the correlation matrix below, which
displays the pair-wise correlation between our skill estimates in each season.
[Interactive figure: Correlation Matrix of Golfer Skill Estimates, 2004-2019. Rows and columns: Distance, Accuracy, Approach, Around Green, Putting.]
There are many interesting patterns here; a few of them are:
1. Skill in SG: around-the-green and SG: putting are significantly positively correlated.
This could be due to selection, or due to a common shared skill required for
around-the-green play and putting.
2. Skill in SG: approach is positively correlated with every
other attribute in every single season! This is pretty remarkable;
the positive correlation with driving and around-the-green performance is
intuitive (driving and approach share a common ball-striking skill;
around-the-green performance includes some wedge shots),
but the positive correlation with putting is harder to explain.
3. Skill in SG: approach is more strongly positively correlated with
driving accuracy than it is with driving distance.
4. Driving distance is negatively correlated with around-the-green and
putting performance in every season. This is likely driven by selection bias:
if you don’t hit the ball far but are on the PGA Tour, you probably have a good short game.
As with most things in golf analytics, Mark Broadie has already noted a similar pattern and
provided this rationale (p. 24 of one of the original SG papers).
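The selection explanation in item 4 can be illustrated with a small simulation. This is a sketch under strong assumptions, not a model of the actual tour: both skills are drawn as independent standard normals in a hypothetical population of 100,000 players, and "making the tour" is taken to mean being in the top 1% on the sum of the two.

```python
import numpy as np

rng = np.random.default_rng(1)
n_population = 100_000  # hypothetical pool of aspiring professionals

# Assumption: in the full population the two skills are independent,
# each measured in arbitrary units relative to the population average.
distance_value = rng.normal(0.0, 1.0, n_population)
short_game = rng.normal(0.0, 1.0, n_population)

# Only the best overall players "make the tour" (top 1% by combined skill).
overall = distance_value + short_game
cutoff = np.quantile(overall, 0.99)
on_tour = overall >= cutoff

corr_population = np.corrcoef(distance_value, short_game)[0, 1]
corr_on_tour = np.corrcoef(distance_value[on_tour], short_game[on_tour])[0, 1]

print(f"correlation in full population: {corr_population:+.2f}")
print(f"correlation among tour players: {corr_on_tour:+.2f}")
```

Even though the skills are unrelated in the population, they are negatively correlated among the selected players: anyone on tour who is short off the tee must be making up for it somewhere else.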
There is a lot of information to absorb from this analysis. To wrap things up,
here are the key takeaways with respect to the question we set out to answer:
1. The raw correlation between a golfer’s average driving distance and their
total strokes-gained has increased slightly since 1984. Conversely, the raw correlation
between driving accuracy and total strokes-gained has steadily declined since 1984;
this decline has flattened out since 2004. This is consistent with
work done by Jake Nichols.
Overall, it's fairly striking how little the raw correlation between distance and performance
has changed since 1984.
2. The raw correlation between distance and performance hit an all-time low
in 2008, but has since risen to reach an all-time high in 2018. Part of this increase
is driven by the fact that driving distance
has become more positively correlated with skill in SG: approach,
and less negatively correlated with skill in SG: putting and SG: around-the-green in recent seasons.
The degree to which the increasing correlation between distance and skill in SG: approach
reflects something 'causal' is up for debate.
3. After controlling for a golfer’s other attributes, we see an
increase in the correlation between driving distance and performance since 2004.
This is therefore also contributing to the rising
raw correlation since 2008 mentioned above. Looking only at the
conditional estimates, it is tempting to conclude that the influence
of distance has risen steadily over time. However, given that the raw
relationship actually declined slightly from 1984-2008, it's not unreasonable
to assume the conditional relationship did something similar over that period.
Unfortunately, without the strokes-gained categories before 2004, we cannot know this.
4. In 2004 (again considering the relationship holding constant other attributes),
improving SG: approach by 1 standard deviation was worth
more than a 1 standard deviation improvement in any other attribute. In 2019,
the gains from both distance and accuracy have surpassed SG: approach.
This again might be surprising, but remember that driving distance and accuracy
are strongly negatively correlated (some of which is mechanical). Repeating
this analysis using SG: off-the-tee instead of driving distance and
accuracy (not shown), we find a similar pattern (i.e. APP > OTT in 2004,
and OTT > APP in 2019). This indicates that in today's game,
off-the-tee performance accounts for more of the variance in
overall performance than any other SG category.
5. So, what is our answer to the question we laid out in the
introduction? Unfortunately, I think the answer is 'it depends'.
Only looking at data in the strokes-gained era of 2004-onwards,
it seems unambiguously true that distance is playing a larger role
in overall performance on the PGA Tour in recent years. However, taking the longer view from
1984-2019, we see that this relationship has fluctuated around a
pretty flat trend line, and this recent uptick does not look like
that much of an aberration.