Analytics Blog
Jan 22, 2020
The relationship between driving distance and performance on the PGA Tour
- January 22, 2020
When the claim is made that driving distance has become more ‘important’ to performance in professional golf, I think what is meant, intuitively, is that having above-average driving distance now affords a larger advantage than it did in the past. Statistically, we might rephrase the question as, does driving distance ‘account’ for a larger portion of the variance (i.e. the spread or dispersion) of scores in today’s game than it did in the past?

If we agree that this is the claim being made, there are then two different ways it could be true. First, it might be the case that being 10 yards above average in driving distance is now “worth” more (in terms of total strokes-gained) than it was in the past. This could be true due to changing course setups: as courses get longer, players are forced to hit driver off every tee, which makes having 10 extra yards more useful than when players were sometimes choosing shorter clubs off the tee for strategic purposes. (There are of course other reasons why this claim could be true, or false.) Second, it might be the case that 10 yards is still worth the same in terms of strokes-gained, but now there are larger differences across golfers in their driving distances. That is, maybe it’s the case that being in the top 10% in driving distance in 2019 means you hit the ball 15 yards further than average, but in 1990 it meant you only hit it 10 yards further than average. If 10 extra yards is still worth the same in 2019 as it was in 1990, this would also result in driving distance being more ‘important’ in 2019, in the sense defined in the opening paragraph.

For a moment let’s focus on the question, “how much is 10 extra yards of driving distance worth?” One interpretation of this is, “holding all other attributes of a golfer constant, how much of an increase in performance can a golfer expect by increasing their average drive by 10 yards?” Another interpretation is, “on average, how many strokes better per round are golfers who hit it 10 yards further than the average golfer?” The answers to these two questions will differ because golfers who hit the ball far also tend to do other things well, or poorly. For example, players who hit the ball above-average distances on the PGA Tour are also above-average approach players. Therefore, the simple correlation between a golfer’s average driving distance and their performance in part captures the fact that longer players are, on average, better approach players. This part of the correlation may or may not be of interest. (For more thoughts on this, see [1].) In the analysis that follows, we examine how both the ‘raw’ relationship and the ‘conditional’ (on other attributes) relationship between distance and performance have evolved over time.

Let’s now turn to the data. Below we do two related analyses, which are likely best explained with examples. First, for each PGA Tour season from 1984-2019, we estimate by how much we would expect golfer A to beat golfer B given that A hits it 10 yards further than B; second, we estimate by how much we would expect A to beat B given that A is 1 standard deviation longer than B, where the standard deviation is specific to the PGA Tour season. The latter estimate accounts for the fact that differences across golfers in terms of driving distance have been changing over time; it will provide us with a more complete answer to the question we laid out in the introduction. Also, as previously mentioned, for each analysis we present two versions of the estimate: one that holds constant, or controls for, each golfer’s driving accuracy, strokes-gained approach, strokes-gained around-the-green, and strokes-gained putting (the conditional relationship), and one that does not hold any golfer attributes constant (the raw relationship).

The statistical details are here [2]. Briefly, we are correlating a golfer’s skill in each attribute (driving distance, driving accuracy, etc.) at the time of a tournament, with their subsequent performance (total strokes-gained) in that tournament. The conditional relationships are estimated using regressions with all 5 attributes included, while the raw relationships are just simple regressions with the relevant attribute as the only independent variable.
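
To make the raw versus conditional distinction concrete, here is a minimal sketch on synthetic data. It is not the analysis behind the plots: the sample size, the coefficients, and the helper `ols` are all assumptions made for illustration, and only two attributes (distance and SG: approach) are used instead of five.

```python
import numpy as np

# Synthetic data only: none of these numbers come from the article.
rng = np.random.default_rng(0)
n = 20000

# Hypothetical skill estimates: distance in yards relative to average.
distance = rng.normal(0.0, 8.0, n)
# Longer hitters are deliberately made slightly better approach players.
approach = 0.01 * distance + rng.normal(0.0, 0.35, n)

# Assumed "true" model: 0.02 strokes per yard, 1 stroke per stroke of approach skill.
total_sg = 0.02 * distance + 1.0 * approach + rng.normal(0.0, 1.0, n)


def ols(y, *columns):
    """Least-squares fit of y on the given columns plus an intercept.

    Returns the slope coefficients only (the intercept is dropped).
    """
    X = np.column_stack([np.ones(len(y)), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]


# Conditional relationship: distance and approach in the same regression.
cond_distance, cond_approach = ols(total_sg, distance, approach)
# Raw relationship: distance as the only independent variable.
raw_distance = ols(total_sg, distance)[0]
# How much approach skill "comes along" with each extra yard.
approach_on_distance = ols(approach, distance)[0]

print(f"raw:         {10 * raw_distance:.2f} strokes per 10 yards")
print(f"conditional: {10 * cond_distance:.2f} strokes per 10 yards")
```

In this toy setup the raw slope exceeds the conditional slope by exactly `cond_approach * approach_on_distance`, which is the part of the raw relationship that reflects longer players also being better approach players.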

This first plot shows the raw relationship between driving distance (measured in yards) and total strokes-gained, as well as that between driving accuracy (measured in % fairways hit) and total strokes-gained, from 1984-2019. Stated with examples, the raw relationship tells us by how much we would expect golfer A to beat golfer B given that the only thing we know about them is that A hits it 10 yards further than B; the conditional relationship (use toggle to view) between driving distance and strokes-gained performance tells us by how much we expect golfer A to beat golfer B given that A hits it 10 yards further than B, but they are equally skilled in all other respects.

[Interactive plot: How much is 10 yards of distance, or 5% more fairways, worth? Series: Distance, Accuracy. A toggle shows the effects while holding other variables constant.]

Focusing first on the raw relationship, we see that the average difference in performance between two players with a 10-yard driving distance difference has bounced around 0.3-0.35 strokes per round since 1984, with a recent uptick since 2015. Toggling to view the conditional estimates, we see an upward trend since 2004 (however, we also see this in the raw correlations). Note that since we are holding constant driving accuracy (percentage of fairways hit per round), the driving distance estimates are likely higher than they “should” be, since driving accuracy should mechanically decrease when distance goes up. Finally, note that, all else equal, you would probably expect the impact of 10 yards to decline as the average driving distance increases: e.g. 10 yards is worth more when the average is 270 yards than when the average is 300 yards. (We will discuss the driving accuracy patterns in the next plot, as they are virtually identical to these.)

For the next plot, we repeat the analysis above except that each attribute is normalized to have a mean of zero and a standard deviation of 1 in each season. (Recall that for normally distributed variables the standard deviation is approximately equal to the difference between the median and the 84th percentile.) Returning to our earlier example with golfers A and B, the interpretation is the same except now the difference we are considering between A and B is equal to 1 standard deviation, and therefore can vary by season. We also include in this plot the estimates for the strokes-gained categories; hover over a data point to see its standard deviation in that season.
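
As a rough illustration of the per-season normalization, the sketch below standardizes a handful of made-up distance values within each season. The seasons, the yardages, and the function name `standardize` are hypothetical, and the choice of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev

# Hypothetical driving-distance skill estimates (yards) for two seasons.
distance_by_season = {
    1990: [255.0, 260.0, 265.0, 270.0, 275.0],
    2019: [280.0, 290.0, 295.0, 300.0, 310.0],
}


def standardize(values):
    """Rescale values to mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # population SD; an assumption, not taken from the article
    return [(v - mu) / sigma for v in values]


# Standardize within each season separately, so "1 SD" is season-specific.
z_by_season = {
    season: standardize(values) for season, values in distance_by_season.items()
}

for season, values in distance_by_season.items():
    print(season, f"1 SD = {pstdev(values):.1f} yards")
```

With these made-up numbers, a 1 standard deviation edge corresponds to more yards in 2019 than in 1990, which is why the standardized estimates can trend differently from the per-10-yards estimates.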

[Interactive plot: How much is a 1 standard deviation improvement in each skill worth? Series: Distance, Accuracy, SG: Approach (APP), SG: Around Green (ARG), SG: Putting (PUTT). A toggle shows the effects while holding other variables constant.]

Looking at the raw relationships, we see that the distance trend is increasing slightly more than in the first plot because the standard deviation in driving distance has increased over time. The driving accuracy estimate has steadily decreased since 1984, with a slowdown in the decline occurring since 2004. It is interesting to note that the highest coefficient by far among the raw correlations is that of strokes-gained approach. This is in large part due to the fact that good approach players tend to be good at all other aspects of the game (see correlation matrix below).

Moving to the conditional relationships, the distance estimates are again trending up; we also see that the driving accuracy estimates have been flat or slightly increasing since 2004. It is also interesting to note the relative sizes of the estimates here: improving driving accuracy by 1 standard deviation (~4.5 percentage points) yields an expected improvement in overall performance that is similar in magnitude to improving SG: approach by 1 standard deviation (~0.35 strokes).

It might be illuminating to consider the interpretation of the strokes-gained category estimates. These estimates tell us by how much we would predict golfer A to beat golfer B given that A is 1 standard deviation better in a given SG category, and equal in all other respects. Given the nature of our skill estimates, we would expect that a 1 stroke difference in skill for an SG category would result in a 1 stroke difference in total strokes-gained going forward. If this were true, the estimate in the plot above would equal the standard deviation in that season. (We see deviations from this due to a combination of noise and the fact that some categories predict others, e.g. ARG predicts APP slightly.) The fact that putting has a smaller coefficient than approach in every season reflects the fact that skill differences across players in SG: putting are smaller than skill differences across players in SG: approach [3]. Therefore, approach play is more important in the sense that improving from the 50th best approach player to the 10th best approach player is worth more than the same improvement in ranking in putting.
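
The arithmetic behind this interpretation can be sketched as follows. The function `per_sd_effect` is a hypothetical helper, the approach standard deviation is the rough figure quoted above, and the putting standard deviation is a made-up value chosen only to be smaller.

```python
def per_sd_effect(strokes_per_unit, sd_in_units):
    """Expected gain in total strokes-gained from a 1 SD edge in a skill."""
    return strokes_per_unit * sd_in_units


# If 1 stroke of category skill carries over as 1 stroke of total strokes-gained,
# the per-SD estimate is simply the category's standard deviation.
approach_sd = 0.35  # roughly the SG: approach SD quoted in the article
putting_sd = 0.25   # hypothetical value, chosen only to be smaller than approach

approach_effect = per_sd_effect(1.0, approach_sd)
putting_effect = per_sd_effect(1.0, putting_sd)

print(f"approach: {approach_effect:.2f} strokes per SD")
print(f"putting:  {putting_effect:.2f} strokes per SD")
```

Under the 1-for-1 assumption, a category with a smaller spread of skill across players necessarily has a smaller per-SD estimate, even though a stroke gained is worth the same in either category.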

The differences between the raw and conditional relationships can be, for the most part, reconciled by examining the correlation matrix below, which displays the pair-wise correlation between our skill estimates in each season.

[Interactive table: Correlation Matrix of Golfer Skill Estimates, 2004-2019. Rows and columns: Distance, Accuracy, Approach, Around Green, Putting. A slider shows how the correlations have changed over time (values not shown).]

There are many interesting patterns here; a few of them are:
1. Skill in SG: around-the-green and SG: putting are significantly positively correlated. This could be due to selection, or due to a common shared skill required for around-the-green play and putting.

2. Skill in SG: approach is positively correlated with every other attribute in every single season! This is pretty remarkable; the positive correlation with driving and around-the-green performance is intuitive (driving and approach share a common ball-striking skill; around-the-green performance includes some wedge shots), but the positive correlation with putting is harder to explain.

3. Skill in SG: approach is more strongly positively correlated with driving accuracy than it is with driving distance.

4. Driving distance is negatively correlated with around-the-green and putting performance in every season. This is likely driven by selection bias: if you don’t hit the ball far but are on the PGA Tour, you probably have a good short game. As with most things in golf analytics, Mark Broadie has already noted a similar pattern and provided this rationale (p. 24 of one of the original SG papers).
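
The selection argument in the last point can be illustrated with a toy simulation. Everything here is an assumption: two skills are drawn independently for a hypothetical population, and only players whose combined skill is in roughly the top 5% "make the tour". This is a sketch of the mechanism, not an estimate of its size on the PGA Tour.

```python
import random
from statistics import mean

random.seed(0)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical population: (distance skill, short-game skill), drawn independently.
population = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(20000)]

# Keep only players whose combined skill is in roughly the top 5%.
cutoff = sorted(d + s for d, s in population)[int(0.95 * len(population))]
tour = [(d, s) for d, s in population if d + s >= cutoff]

r_all = pearson(*zip(*population))
r_tour = pearson(*zip(*tour))

print(f"correlation in full population: {r_all:.2f}")
print(f"correlation among selected players: {r_tour:.2f}")
```

Even though the two skills are unrelated in the full population, they are negatively correlated among the selected players, because a player who is weak in one skill needs to be strong in the other to clear the cutoff.
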
There is a lot of information to absorb from this analysis. To wrap things up, here are the key takeaways with respect to the question we set out to answer:
1. The raw correlation between a golfer’s average driving distance and their total strokes-gained has increased slightly since 1984. Conversely, the raw correlation between driving accuracy and total strokes-gained has steadily declined since 1984; this decline has flattened out since 2004. This is consistent with work done by Jake Nichols. Overall, it's fairly striking how little the raw correlation between distance and performance has changed since 1984.

2. The raw correlation between distance and performance hit an all-time low in 2008, but has since risen to reach an all-time high in 2018. Part of this increase is driven by the fact that driving distance has become more positively correlated with skill in SG: approach, and less negatively correlated with skill in SG: putting and SG: around-the-green in recent seasons. The degree to which the increasing correlation between distance and skill in SG: approach reflects something 'causal' is up for debate.

3. After controlling for a golfer’s other attributes, we see an increase in the correlation between driving distance and performance since 2004. Therefore this is also contributing to the rising raw correlation since 2008 mentioned above. While only looking at the conditional estimates makes it tempting to conclude that the influence of distance has risen steadily over time, given that the raw relationship actually declined slightly from 1984-2008, it's not unreasonable to assume the conditional relationship has done something similar. Unfortunately, without the strokes-gained categories before 2004, we cannot know this.

4. In 2004 (again considering the relationship holding constant other attributes), improving SG: approach by 1 standard deviation was worth more than a 1 standard deviation improvement in any other attribute. In 2019, the gains from both distance and accuracy have surpassed SG: approach. This again might be surprising, but remember that driving distance and accuracy are strongly negatively correlated (some of which is mechanical). Repeating this analysis using SG: off-the-tee instead of driving distance and accuracy (not shown), we find a similar pattern (i.e. APP > OTT in 2004, and OTT > APP in 2019). This indicates that in today's game, off-the-tee performance accounts for more of the variance in overall performance than any other SG category.

5. So, what is our answer to the question we laid out in the introduction? Unfortunately, I think the answer is 'it depends'. Only looking at data in the strokes-gained era of 2004-onwards, it seems unambiguously true that distance is playing a larger role in overall performance on the PGA Tour in recent years. However, taking the longer view from 1984-2019, we see that this relationship has fluctuated around a pretty flat trend line, and this recent uptick does not look like that much of an aberration.