Importance of Driving Distance

When the claim is made that driving distance has become more ‘important’ to performance in professional golf, I think what is meant, intuitively, is that having above-average driving distance now affords a larger advantage than it did in the past. Statistically, we might rephrase the question as, does driving distance ‘account’ for a larger portion of the variance (i.e. the spread or dispersion) of scores in today’s game than it did in the past?

If we agree that this is the claim being made, there are then two different ways it could be true. First, it might be the case that being 10 yards above average in driving distance is now “worth” more (in terms of total strokes-gained) than it was in the past. This could be true due to changing course setups: as courses get longer, players are forced to hit driver off every tee, which makes having 10 extra yards more useful than when players were sometimes choosing shorter clubs off the tee for strategic purposes. (There are of course other reasons why this claim could be true, or false.) Second, it might be the case that 10 yards is still worth the same in terms of strokes-gained, but now there are larger differences across golfers in their driving distances. That is, maybe it’s the case that being in the top 10% in driving distance in 2019 means you hit the ball 15 yards further than average, but in 1990 it meant you only hit it 10 yards further than average. If 10 extra yards is still worth the same in 2019 as it was in 1990, this would also result in driving distance being more ‘important’ in 2019, in the sense defined in the opening paragraph.
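To make the two channels concrete, here is a minimal sketch with made-up numbers. The strokes-per-yard values (0.03 and 0.04) are illustrative assumptions, not estimates from the data, and the function name is hypothetical. It treats a long hitter's expected edge as the value of a yard multiplied by the number of yards they are above average:

```python
def distance_advantage(strokes_per_yard: float, yards_above_average: float) -> float:
    """Expected strokes-gained edge of a long hitter over an average hitter."""
    return strokes_per_yard * yards_above_average

# Hypothetical baseline: a top-10% hitter is 10 yards long, each yard worth 0.03 strokes.
baseline = distance_advantage(0.03, 10.0)
# Channel 1: the same 10-yard gap, but each yard is now worth more.
worth_more = distance_advantage(0.04, 10.0)
# Channel 2: each yard is worth the same, but the top 10% are now 15 yards long.
wider_spread = distance_advantage(0.03, 15.0)

print(f"baseline={baseline:.2f}, worth_more={worth_more:.2f}, wider_spread={wider_spread:.2f}")
```

Either channel raises the long hitter's edge over the baseline, which is why both need to be examined.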

For a moment let’s focus on the question, “how much is 10 extra yards of driving distance worth?” One interpretation of this is, “holding all other attributes of a golfer constant, how much of an increase in performance can a golfer expect by increasing their average drive by 10 yards?” Another interpretation is, “on average, how many strokes better per round are golfers who hit it 10 yards further than the average golfer?” The answers to these two questions will differ because golfers who hit the ball far also tend to do other things well, or poorly. For example, players who hit the ball above-average distances on the PGA Tour are also above-average approach players. Therefore, the simple correlation between a golfer’s average driving distance and their performance in part captures the fact that longer players are, on average, better approach players. This part of the correlation may or may not be of interest. (For more thoughts on this, see [1].) In the analysis that follows, we examine how both the ‘raw’ relationship and the ‘conditional’ (on other attributes) relationship between distance and performance have evolved over time.

Let’s now turn to the data. Below we do two related analyses, which are likely best explained with examples. First, for each PGA Tour season from 1984-2019, we estimate by how much we would expect golfer A to beat golfer B given that A hits it 10 yards further than B; second, we estimate by how much we would expect A to beat B given that A is 1 standard deviation longer than B, where the standard deviation is specific to the PGA Tour season. The latter estimate accounts for the fact that differences across golfers in terms of driving distance have been changing over time; it will provide us with a more complete answer to the question we laid out in the introduction. Also, as previously mentioned, for each analysis we present two versions of the estimate: one that holds constant, or controls for, each golfer’s driving accuracy, strokes-gained approach, strokes-gained around-the-green, and strokes-gained putting (the conditional relationship), and one that does not hold any golfer attributes constant (the raw relationship).

The statistical details are here [2]. Briefly, we are correlating a golfer’s skill in each attribute (driving distance, driving accuracy, etc.) at the time of a tournament, with their subsequent performance (total strokes-gained) in that tournament. The conditional relationships are estimated using regressions with all 5 attributes included, while the raw relationships are just simple regressions with the relevant attribute as the only independent variable.
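The difference between the raw and conditional estimates can be sketched on synthetic data. Everything in this example is an assumption made for illustration: the data are simulated, only two attributes (distance and approach) are used instead of all 5, and the "true" effects (0.02 strokes per yard, 1 stroke per stroke of approach skill) are invented. Because longer hitters are simulated to be better approach players, the simple regression credits distance with part of the approach effect:

```python
import random

def mean(values):
    return sum(values) / len(values)

def raw_slope(x, y):
    """Slope from a simple regression of y on x alone."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def conditional_slopes(x1, x2, y):
    """OLS slopes from regressing y on x1 and x2 together (with an intercept)."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

rng = random.Random(0)
n = 20000
# Hypothetical skills: distance in yards above average, approach in strokes.
distance = [rng.gauss(0.0, 8.0) for _ in range(n)]
approach = [0.01 * d + rng.gauss(0.0, 0.3) for d in distance]
total_sg = [0.02 * d + 1.0 * a + rng.gauss(0.0, 1.0)
            for d, a in zip(distance, approach)]

raw = raw_slope(distance, total_sg)
conditional, _ = conditional_slopes(distance, approach, total_sg)
print(f"10 yards, raw: {10 * raw:.2f}; conditional: {10 * conditional:.2f}")
```

In this simulation the raw slope recovers roughly the direct effect plus the part that runs through approach skill, while the conditional slope recovers roughly the direct effect alone.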

This first plot shows the raw relationship between driving distance (measured in yards) and total strokes-gained, as well as that between driving accuracy (measured in % fairways hit) and total strokes-gained, from 1984-2019. Stated with examples, the raw relationship tells us by how much we would expect golfer A to beat golfer B given that the only thing we know about them is that A hits it 10 yards further than B; the conditional relationship (use toggle to view) between driving distance and strokes-gained performance tells us by how much we expect golfer A to beat golfer B given that A hits it 10 yards further than B, but they are equally skilled in all other respects.

[Interactive plot: How much is 10 yards of distance, or 5% more fairways, worth? Series: Distance, Accuracy. A toggle switches to the effects while holding other variables constant.]

Focusing first on the raw relationship, we see that the average difference in performance between two players with a 10-yard driving distance difference has bounced around 0.3-0.35 strokes per round since 1984, with an uptick since 2015. Toggling to view the conditional estimates, we see an upward trend since 2004 (however, we also see this in the raw correlations). Note that since we are holding constant driving accuracy (percentage of fairways hit per round), the driving distance estimates are likely higher than they “should” be, since driving accuracy should mechanically decrease when distance goes up. Finally, note that, all else equal, you would probably expect the impact of 10 yards to decline as the average driving distance increases: e.g. 10 yards is worth more when the average is 270 yards than when the average is 300 yards. (We will discuss the driving accuracy patterns in the next plot, as they are virtually identical to these.)

For the next plot, we repeat the analysis above except that each attribute is normalized to have a mean of zero and a standard deviation of 1 in each season. (Recall that for normally distributed variables the standard deviation is equal to the difference between the median and the 84th percentile.) Returning to our earlier example with golfers A and B, the interpretation is the same except now the difference we are considering between A and B is equal to 1 standard deviation, and therefore can vary by season. We also include in this plot the estimates for the strokes-gained categories; hover over a data point to see its standard deviation in that season.
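As a rough illustration of the normalization, the sketch below standardizes one hypothetical season of driving distances and checks two facts used here: the slope on the standardized attribute equals the per-yard slope multiplied by that season's standard deviation, and about 84% of a normal distribution lies below one standard deviation above the mean. The seven data points are invented, and the helper names are hypothetical:

```python
from statistics import NormalDist, mean, pstdev

def standardize(values):
    """Rescale one season's values to mean 0 and standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def slope(x, y):
    """Slope from a simple regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical season: driving distance skill (yards) and total strokes-gained.
distance = [280.0, 285.0, 290.0, 295.0, 300.0, 305.0, 310.0]
total_sg = [-0.4, -0.1, -0.2, 0.1, 0.0, 0.3, 0.3]

per_yard = slope(distance, total_sg)             # strokes per extra yard
per_sd = slope(standardize(distance), total_sg)  # strokes per extra standard deviation
season_sd = pstdev(distance)                     # this season's spread, in yards

# Share of a normal distribution below one standard deviation above the mean.
share_below_one_sd = NormalDist().cdf(1.0)

print(f"per yard: {per_yard:.4f}, per SD: {per_sd:.4f}, SD: {season_sd:.1f} yards")
```

This is why a per-SD estimate can rise over time even if the per-yard estimate is flat: a wider spread in driving distance scales it up.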

[Interactive plot: How much is a 1 standard deviation improvement in each skill worth? Series: Distance, Accuracy, SG: Approach (SG: APP), SG: Around Green (SG: ARG), SG: Putting (SG: PUTT). A toggle switches to the effects while holding other variables constant.]

Looking at the raw relationships, we see that the distance trend is increasing slightly more than in the first plot because the standard deviation in driving distance has increased over time. The driving accuracy estimate has steadily decreased since 1984, with a slowdown in the decline occurring since 2004. It is interesting to note that the highest coefficient by far among the raw correlations is that of strokes-gained approach. This is in large part due to the fact that good approach players tend to be good at all other aspects of the game (see correlation matrix below).

Moving to the conditional relationships, the distance estimates are again trending up; we also see that the driving accuracy estimates have been flat or slightly increasing since 2004. It is also interesting to note the relative sizes of the estimates here: improving driving accuracy by 1 standard deviation (~4.5 percentage points) yields an expected improvement in overall performance that is similar in magnitude to improving SG: approach by 1 standard deviation (~0.35 strokes).

It might be illuminating to consider the interpretation of the strokes-gained category estimates. These estimates tell us by how much we would predict golfer A to beat golfer B given that A is 1 standard deviation better in a given SG category, and equal in all other respects. Given the nature of our skill estimates, we would expect that a 1 stroke difference in skill for an SG category would result in a 1 stroke difference in total strokes-gained going forward. If this were true, the estimate in the plot above would equal the standard deviation in that season. (We see deviations from this due to a combination of noise and the fact that some categories predict others, e.g. ARG predicts APP slightly.) The fact that putting has a smaller coefficient than approach in every season reflects the fact that skill differences across players in SG: putting are smaller than skill differences across players in SG: approach [3]. Therefore, approach play is more important in the sense that improving from the 50th best approach player to the 10th best approach player is worth more than the same improvement in ranking in putting.

The differences between the raw and conditional relationships can be, for the most part, reconciled by examining the correlation matrix below, which displays the pair-wise correlation between our skill estimates in each season.

[Interactive figure: Correlation Matrix of Golfer Skill Estimates, 2004-2019. Rows and columns: Distance, Accuracy, Approach, Around Green, Putting. A slider shows how the correlations have changed over time.]

There are many interesting patterns here; a few of them are:

1. Skill in SG: around-the-green and SG: putting are significantly positively correlated. This could be due to selection, or due to a common shared skill required for around-the-green play and putting.

2. Skill in SG: approach is positively correlated with every other attribute in every single season! This is pretty remarkable; the positive correlation with driving and around-the-green performance is intuitive (driving and approach share a common ball-striking skill; around-the-green performance includes some wedge shots), but the positive correlation with putting is harder to explain.

3. Skill in SG: approach is more strongly positively correlated with driving accuracy than it is with driving distance.

4. Driving distance is negatively correlated with around-the-green and putting performance in every season. This is likely driven by selection bias: if you don’t hit the ball far but are on the PGA Tour, you probably have a good short game. As with most things in golf analytics, Mark Broadie has already noted a similar pattern and provided this rationale (p. 24 of one of the original SG papers).
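The selection-bias explanation in the last point can be illustrated with a small simulation. It is only a sketch under strong assumptions: a hypothetical population in which distance skill and short-game skill are completely independent, and a "tour" made up of the top 10% by the sum of the two. Even so, a clearly negative correlation appears among the selected players:

```python
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(1)
# Hypothetical population: (distance skill, short-game skill), drawn independently.
population = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(20000)]
# Only the top 10% by combined skill "make the tour".
population.sort(key=lambda skills: skills[0] + skills[1], reverse=True)
tour = population[: len(population) // 10]

corr_population = pearson(*zip(*population))
corr_tour = pearson(*zip(*tour))
print(f"population: {corr_population:.2f}, tour: {corr_tour:.2f}")
```

A short hitter can only clear the cutoff with an unusually good short game, so the two skills trade off within the selected group even though they are unrelated in the population.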

There is a lot of information to absorb from this analysis. To wrap things up, here are the key takeaways with respect to the question we set out to answer:

1. The raw correlation between a golfer’s average driving distance and their total strokes-gained has increased slightly since 1984. Conversely, the raw correlation between driving accuracy and total strokes-gained has steadily declined since 1984; this decline has flattened out since 2004. This is consistent with work done by Jake Nichols. Overall, it's fairly striking how little the raw correlation between distance and performance has changed since 1984.

2. The raw correlation between distance and performance hit an all-time low in 2008, but has since risen to reach an all-time high in 2018. Part of this increase is driven by the fact that driving distance has become more positively correlated with skill in SG: approach, and less negatively correlated with skill in SG: putting and SG: around-the-green in recent seasons. The degree to which the increasing correlation between distance and skill in SG: approach reflects something 'causal' is up for debate.

3. After controlling for a golfer’s other attributes, we see an increase in the correlation between driving distance and performance since 2004. Therefore this is also contributing to the rising raw correlation since 2008 mentioned above. While only looking at the conditional estimates makes it tempting to conclude that the influence of distance has risen steadily over time, given that the raw relationship actually declined slightly from 1984-2008, it's not unreasonable to assume the conditional relationship has done something similar. Unfortunately, without the strokes-gained categories before 2004, we cannot know this.

4. In 2004 (again considering the relationship holding constant other attributes), improving SG: approach by 1 standard deviation was worth more than a 1 standard deviation improvement in any other attribute. In 2019, the gains from both distance and accuracy have surpassed SG: approach. This again might be surprising, but remember that driving distance and accuracy are strongly negatively correlated (some of which is mechanical). Repeating this analysis using SG: off-the-tee instead of driving distance and accuracy (not shown), we find a similar pattern (i.e. APP > OTT in 2004, and OTT > APP in 2019). This indicates that in today's game, off-the-tee performance accounts for more of the variance in overall performance than any other SG category.

5. So, what is our answer to the question we laid out in the introduction? Unfortunately, I think the answer is 'it depends'. Only looking at data in the strokes-gained era of 2004-onwards, it seems unambiguously true that distance is playing a larger role in overall performance on the PGA Tour in recent years. However, taking the longer view from 1984-2019, we see that this relationship has fluctuated around a pretty flat trend line, and this recent uptick does not look like that much of an aberration.

[1] The distinction here is that of correlation versus causation. However, even focusing on the first, causal, interpretation, there is the follow-up question of what should be ‘held constant’. Are we thinking about a golfer who gets a new driver shaft that adds 10 yards (while keeping his iron distances constant)? Or, are we thinking about a golfer who adds some muscle which increases his driver distance by 10 yards, but also increases his iron distances? Or, are we talking about a golfer who improves his swing mechanics, thereby adding distance and probably improving the quality of his strikes? These are all distinct interpretations of the causal effect of adding 10 yards of distance. In this analysis, the first interpretation is likely the one closest to what we are estimating when holding other golfer attributes constant.

More generally, teasing out cause and effect is very difficult, even in a relatively controlled environment like professional golf. As the old adage goes, correlation does not imply causation. As with the distance-performance relationship, there are many examples in golf where correlation may not imply causation. For example, the claim is often made that going for the green on par 5s in two shots is a better strategy than laying up. If the evidence for this claim is that players who go for the green (from the same spot on the hole) score better on average than the players who don’t, you shouldn’t be convinced by that alone. Intuitively, this comparison is between a player like Rory McIlroy (who will more often go for it from 250, all else equal) and a player like Rob Oppenheim (who will more often lay up from 250, all else equal). Not exactly an apples-to-apples comparison. In a perfect world, we would want to know the difference in scoring average between going for it and laying up for Rory McIlroy. Or, for Rob Oppenheim. This requires knowing something that is fundamentally unknowable: the counterfactual, that is, the score Rory would have made if he had not gone for the green. (In this specific instance, if you assume that players are optimizing strategy already, the claim is by definition false, because if it were better to go for it, the player would have gone for it.)

[2] Some details on the analysis. For each PGA Tour event from 1984-2019, we estimate each golfer’s ‘skill’ in each of 5 attributes: driving distance, driving accuracy, SG: approach, SG: around-the-green, and SG: putting, at the time of that event. These skill estimates are weighted averages of historical adjusted performance in each attribute, with recent rounds receiving more weight and the estimates regressed towards zero according to their predictive power (averages composed of smaller samples regress more). Different weighting schemes are used for each attribute; the weighting scheme for driving distance weights recent rounds much more heavily than the scheme for putting does, for example. As usual, ‘adjusted’ performances are interpreted relative to an average PGA Tour player, and as such account for differences in field strength across tournaments. An important point to note is that these skill estimates are predictive: a golfer who is 1 stroke above average in SG: approach will, on average, perform 1 stroke above average in SG: approach in their next round. Similarly, a golfer who is 10 yards above average in driving distance will, on average, be 10 yards above average in driving distance in their next round. With these skill estimates in hand, regressions are then run using data from each season at the round level: specifically, total strokes-gained is regressed on all 5 attributes to estimate the conditional relationships, and 5 separate regressions are performed using each attribute one at a time to estimate the raw relationships.
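The general shape of such a skill estimate can be sketched as follows. This is not the actual weighting scheme, which is not specified here: the exponential decay, the `prior_weight` shrinkage rule, and the parameter values are all assumptions chosen only to show the two ingredients, recency weighting and regression towards zero for small samples.

```python
def skill_estimate(rounds, decay=0.98, prior_weight=20.0):
    """Recency-weighted average of adjusted performances, shrunk towards zero.

    rounds: adjusted performances relative to an average player, oldest first.
    decay: hypothetical per-round decay; smaller values favor recent rounds more.
    prior_weight: hypothetical constant; larger values shrink small samples harder.
    """
    n = len(rounds)
    if n == 0:
        return 0.0  # no information: fall back to an average player
    # The most recent round gets weight 1, the one before it `decay`, and so on.
    weights = [decay ** (n - 1 - i) for i in range(n)]
    total_weight = sum(weights)
    weighted_mean = sum(w * r for w, r in zip(weights, rounds)) / total_weight
    # Less accumulated weight (a smaller sample) means more regression to zero.
    shrinkage = total_weight / (total_weight + prior_weight)
    return shrinkage * weighted_mean

few_rounds = skill_estimate([1.0] * 5)
many_rounds = skill_estimate([1.0] * 200)
print(f"5 rounds: {few_rounds:.2f}, 200 rounds: {many_rounds:.2f}")
```

Under this sketch, a golfer who has averaged 1 stroke above average over 5 rounds gets a much smaller estimate than one who has done so over 200 rounds. An attribute like driving distance would use a smaller `decay` than putting.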

[3] If you were to look at season-long standings in each SG category you might disagree with this because it appears, for example, that improving from 100th to 50th in SG: off-the-tee is worth a similar number of strokes to improving from 100th to 50th in SG: putting. But why should we choose 1 year as our sample size? Why not 1 month? 1 day? 5 years? As it turns out, 1 year of data is much too small to get a good assessment of a golfer’s putting skill. To see this simply, consider the following: I calculate each golfer’s performance in each SG category over the following 3 time periods: in the 2019 season (1-year sample), since 2017 (3-year sample), and since 2015 (5-year sample). I then make some restrictions based on rounds played to get a similar sample size for each time period (~250 players). Here are the results, shown as the gap between the 100th- and 50th-ranked players in each category:

Sample   OTT    APP    ARG    PUTT
1-year   0.23   0.24   0.14   0.26
3-year   0.24   0.18   0.13   0.14
5-year   0.22   0.17   0.11   0.14

The main takeaway is that 1-year samples of SG putting still have a lot of noise in them; that’s why we see the differences between the 100th and 50th ranked players decline once we increase the sample size. Conversely, the ranking gaps for the other 3 categories are relatively stable, indicating that a 1-year sample is sufficient to capture the real skill differences between golfers. This is consistent with Mark Broadie’s findings in his original papers on strokes-gained where he does a variance decomposition using golfers’ strokes-gained over a 3-year period. The variance decomposition is a more robust method to determine which SG categories are responsible for the variation in total strokes-gained than the simple 100th-to-50th ranking gap, but generally they will convey similar information.
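The way noise inflates ranking gaps in small samples can be seen in a simulation. The numbers below are assumptions for illustration only (a true skill spread of 0.2 strokes, round-to-round noise of 1.7 strokes, 80 rounds for a 1-year sample and 400 for a 5-year sample); none of them come from the data above. Each player's observed average is their true skill plus sampling noise that shrinks with the number of rounds:

```python
import random

def rank_gap(values, better_rank=50, worse_rank=100):
    """Difference between the players ranked `better_rank` and `worse_rank`."""
    ordered = sorted(values, reverse=True)
    return ordered[better_rank - 1] - ordered[worse_rank - 1]

def average_gap(rounds_played, true_sd=0.2, round_noise_sd=1.7,
                players=250, simulations=200, seed=0):
    """Average 100th-to-50th gap in observed averages across simulated tours."""
    rng = random.Random(seed)
    # Noise in a sample mean shrinks with the square root of the sample size.
    mean_noise_sd = round_noise_sd / rounds_played ** 0.5
    gaps = []
    for _ in range(simulations):
        observed = [rng.gauss(0.0, true_sd) + rng.gauss(0.0, mean_noise_sd)
                    for _ in range(players)]
        gaps.append(rank_gap(observed))
    return sum(gaps) / len(gaps)

gap_1yr = average_gap(rounds_played=80)   # hypothetical 1-year sample
gap_5yr = average_gap(rounds_played=400)  # hypothetical 5-year sample
print(f"1-year gap: {gap_1yr:.3f}, 5-year gap: {gap_5yr:.3f}")
```

The true skill spread is identical in both cases, so the larger gap in the short sample is produced by noise alone. A category whose gap barely changes with sample size is one where the noise is small relative to real skill differences.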