Analytics Blog
July 28, 2023
What makes a golf tournament entertaining?
In this post we propose 3 simple metrics to capture how entertaining a PGA Tour tournament was. The motivation for these metrics comes from the idea that golf tournaments are more entertaining when 1) there are lots of players with a chance to win, 2) there are frequent and sudden changes in who is likely to win, and 3) there are top players with a chance to win. When all 3 of these conditions are met, it usually makes for an entertaining tournament. (However, none of these conditions are required for an entertaining tournament: watching a lesser-known player try to hold on to a 3+ shot lead in a major can be highly entertaining while not meeting any of the above criteria.)

Inevitably, these metrics will be unsatisfying in certain ways, given the subjectivity involved in quantifying something like entertainment value. Each of the 3 proposed metrics maps to a point on the list above. They are, in the same order, 1) Excitement, 2) Volatility, and 3) Star Power. The main input for each metric is our live win probability data, which is generated every 5 minutes during a tournament.
Excitement
This metric is meant to capture how much uncertainty there is around who will win throughout a tournament. Consider first the simpler case of only 2 players: we want this metric to be maximized when both golfers have a 50% win probability and minimized when 1 player has a 100% win probability (and the other has 0%). For cases involving different numbers of golfers, we want 3 golfers at 33.3% to have a higher value in this metric than 2 golfers at 50%. Similarly, 10 golfers at 10% should have a higher value than 3 golfers at 33.3%. This last point seems debatable, as at some point if there are too many golfers with a chance to win it probably becomes less interesting to watch. However, in practice I don't think we ever actually reach that point: if there are 10 players with a realistic chance to win on the back nine on Sunday (a very rare occurrence) that should be pretty compelling. Here's the formal definition:
Excitement: One minus the sum of win probabilities squared at each point in time, averaged over time
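
As a rough illustration of this definition, here is a minimal Python sketch. The function names and the list-of-lists layout for the win probability snapshots are assumptions made for this example; they are not taken from the actual live model code.

```python
from statistics import mean

def excitement_at(probs):
    """One minus the sum of squared win probabilities at a single timestamp."""
    return 1.0 - sum(p * p for p in probs)

def excitement(snapshots):
    """Average the per-timestamp values over every timestamp in the tournament."""
    return mean(excitement_at(probs) for probs in snapshots)

# Hypothetical data: each inner list holds every player's win probability
# at one timestamp (the real data has one snapshot every 5 minutes).
snapshots = [
    [0.5, 0.5],   # two players, dead even
    [1.0, 0.0],   # one player has it locked up
]

print(excitement_at([0.5, 0.5]))  # 0.5
print(excitement_at([1.0, 0.0]))  # 0.0
print(excitement(snapshots))      # 0.25
```

With more players at more equal probabilities the per-timestamp value moves closer to 1: ten golfers at 10% each give roughly 0.9.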

With only 2 players, the sum of their squared win probabilities is minimized when both probabilities are 50% (a sum of 0.5) and maximized when they are 100% and 0% (a sum of 1). More generally, the squared sum is lower when there are more players involved and the probabilities are more equal. Using our example above, 3 golfers at 33.3% gives a value of 0.333, while 10 golfers at 10% yields a value of 0.1. Given that we want higher values for this metric to mean more excitement, we reverse the ordering by subtracting the squared sum from 1. Once we've calculated the squared sum for each timestamp in our live model data (and subtracted the sum from 1), we simply average these values across all times. Here are the tournaments with the 5 highest and 5 lowest Excitement values in our database (at the time of writing):

Top 5 in Excitement:
Bottom 5 in Excitement:
Volatility
This metric is meant to capture the amount of turnover in who is likely to win throughout a tournament. Here's the formal definition:
Volatility: Sum of the absolute value of changes in player-level win probabilities
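
A minimal Python sketch of this definition, including the rule used in this post that changes smaller than 1% are treated as zero. The function name, the dictionary layout (player name to win probability) and the sample numbers are all hypothetical, chosen only to illustrate the calculation.

```python
def volatility(snapshots, threshold=0.01):
    """Sum absolute changes in each player's win probability between
    consecutive timestamps, treating changes below `threshold` as zero."""
    total = 0.0
    for prev, curr in zip(snapshots, snapshots[1:]):
        for player, prob in curr.items():
            # Assumption: a player missing from the previous snapshot had 0% there.
            change = abs(prob - prev.get(player, 0.0))
            if change >= threshold:
                total += change
    return total

# Hypothetical snapshots, one per 5-minute interval.
snapshots = [
    {"A": 0.60, "B": 0.30, "C": 0.10},
    {"A": 0.45, "B": 0.45, "C": 0.10},     # A drops 15 points, B gains 15
    {"A": 0.455, "B": 0.445, "C": 0.10},   # half-point wiggles, ignored as noise
]

print(volatility(snapshots))  # approximately 0.30
```

Only the first step contributes here; the half-point moves in the second step fall under the threshold, which is how the metric avoids rewarding simulation noise.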

For each timestamp in our live model data we calculate the change in every player's win probability from the previous timestamp (5 minutes earlier). We then take the absolute value of these changes (meaning that a win probability changing from 1% to 2% is treated the same as moving from 2% to 1%; we only care about the size of the change, not the direction). Importantly, if the change is smaller than 1% (i.e. 1 percentage point) we treat it as zero. This is done so that we don't pick up changes that are due to "simulation error", which would favour tournaments with larger fields. (If we simulate a tournament 25K times and get one set of win probabilities, and then repeat the 25K sims to get a 2nd set of win probabilities, there will be slight differences between the two sets of probabilities; these differences are what we call simulation error.) We then simply sum up all these win probability changes during the tournament to get our Volatility metric. Shown below are the tournaments with the 5 highest and 5 lowest Volatility values in our database:

Bottom 5 in Volatility:
Star Power
This is the most straightforward of the 3 metrics to understand. Intuitively it's meant to capture the average skill of the golfers who are contending to win throughout the tournament. If Rory McIlroy is in the field but 15 shots out of the lead come Sunday, then his presence isn't really adding anything to the entertainment value of that tournament (on Sunday). Here's the formal definition:
Star Power: Win-probability-weighted average of player skill at each point in time, averaged over time
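
Here is a minimal Python sketch of this definition. The skill numbers below are made up, and the post does not specify the skill scale, so treat the `skill` dictionary, the function names and the data layout as assumptions for illustration only.

```python
from statistics import mean

def star_power_at(probs, skill):
    """Win-probability-weighted average of skill at a single timestamp."""
    total_weight = sum(probs.values())
    # Dividing by the total weight guards against probabilities that
    # do not sum to exactly 1 because of rounding.
    return sum(p * skill[player] for player, p in probs.items()) / total_weight

def star_power(snapshots, skill):
    """Average the per-timestamp weighted averages over the tournament."""
    return mean(star_power_at(probs, skill) for probs in snapshots)

# Hypothetical skill ratings (higher = better player).
skill = {"A": 2.0, "B": 0.5}

# Hypothetical win probability snapshots.
snapshots = [
    {"A": 0.5, "B": 0.5},
    {"A": 1.0, "B": 0.0},
]

print(star_power_at(snapshots[0], skill))  # 1.25
print(star_power(snapshots, skill))        # 1.625
```

A player with 0% win probability gets zero weight, which matches the intuition that a star who is far out of contention adds nothing to this metric.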

For each point in time in our live model data (5-minute intervals) we take a weighted average of skill, using a player's win probability at that time as their weight. We then average these weighted averages across all times during the tournament. Shown below are the tournaments with the 5 highest and 5 lowest Star Power values in our database excluding TOUR Championships (the 2019-2022 TOUR Championships—which use starting scores—have the 4 highest Star Power values):

Bottom 5 in Star Power:
Before combining these 3 metrics to get overall entertainment scores, it's worth noting a couple of things. First, it's clear that the Excitement and Volatility metrics will be correlated: when Excitement is high this means that win probabilities are more equalized across players, which makes it easier for large win probability changes to occur (which results in higher Volatility). It's easiest to see why this is the case by considering a situation with zero Excitement (100% of the win probability is on 1 player). If the probabilities can be trusted, this means there won't be any win probability movement for the rest of the tournament, which means, all else equal, a lower value for Volatility. In our actual data the Excitement and Volatility metrics have a correlation of 0.7.

Second, it's reasonable to think that field size could have an effect on the Excitement and Volatility metrics. More players means more chances for win probabilities to move (higher Volatility) and it also means more uncertainty around who will win (higher Excitement). However, in the data the correlation between field size and Volatility is roughly zero. As mentioned above, when calculating Volatility any win probability change below 1% is not counted; this was done to ensure that we don't pick up simulation noise which would give larger fields naturally higher Volatility values. That adjustment appears to have worked well. The correlation between field size and Excitement is 0.3. The Excitement metric tends to be higher in larger fields early on in the tournament because, for example, one hundred players at 1% win probability yields a higher Excitement value than twenty players at 5%. After removing the TOUR Championships with starting scores this correlation falls to 0.2, and if we only consider fields with at least 120 players the correlation goes to zero. So for smaller field sizes it will be harder to get a high Excitement value due to their naturally lower values in the early stages of the tournament. This could be desirable if you think that small-field tournaments are inherently less exciting.
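
To make the field-size example concrete: when every player has the same win probability, the per-timestamp Excitement value reduces to 1 - 1/n for a field of n players. A small Python sketch (the function name is hypothetical, and perfectly equal probabilities are a simplifying assumption):

```python
def uniform_excitement(n_players):
    """Excitement at one timestamp when all players share equal win probability."""
    p = 1.0 / n_players
    return 1.0 - n_players * p * p

print(round(uniform_excitement(100), 2))  # 0.99
print(round(uniform_excitement(20), 2))   # 0.95
```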

To calculate overall entertainment scores we take a simple average of our 3 metrics. For ease of interpretation, each metric is converted to percentile form: for example, a value of 98 means that the tournament scored higher than 98% of tournaments in that metric. Here are the tournaments with the 10 highest and 10 lowest overall entertainment scores:

Top 10:
2023 Arnold Palmer (Excitement: 96.4, Volatility: 98.5, Star Power: 91.3; Overall: 95.4)
2022 U.S. Open (91.8, 90.3, 83.6; 88.6)
2019 Players Champ. (71.3, 91.3, 90.3; 84.3)
2019 Masters (84.6, 84.1, 81.0; 83.2)
2020 Charles Schwab (97.9, 91.8, 59.5; 83.1)
2022 BMW Champ. (87.2, 75.4, 84.1; 82.2)
2023 Memorial Tournament (95.9, 71.8, 75.9; 81.2)
2022 RBC Heritage (97.4, 94.9, 49.2; 80.5)
2021 Sentry ToC (75.4, 76.9, 88.7; 80.3)
2022 WM Phoenix Open (92.3, 89.2, 56.4; 79.3)
Bottom 10:
2019 Rocket Mortgage (Excitement: 6.2, Volatility: 1.0, Star Power: 14.4; Overall: 7.2)
2023 3M Open (8.7, 1.0, 15.8; 8.5)
2023 Mayakoba (0.5, 0.5, 39.0; 13.3)
2021 RBC Heritage (9.2, 4.6, 31.8; 15.2)
2023 Puerto Rico Open (23.6, 22.1, 0.0; 15.2)
2022 3M Open (13.8, 23.6, 9.2; 15.5)
2022 John Deere Classic (14.9, 12.8, 24.1; 17.3)
2020 AT&T Pebble Beach (28.7, 15.4, 7.7; 17.3)
2022 RSM Classic (15.9, 11.8, 25.1; 17.6)
2023 Houston Open (1.0, 0.0, 61.0; 20.6)
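
The overall scores in these lists can be reproduced from the three percentiles with a simple average. The Python sketch below also shows one plausible way to convert raw metric values to percentiles. The exact percentile formula is not spelled out in this post, so `to_percentiles` (share of the other tournaments with a strictly lower value) is an assumption, as are the function names.

```python
def to_percentiles(values):
    """Percent of the *other* tournaments with a strictly lower value.
    Assumed formula for illustration; requires at least 2 values."""
    n = len(values)
    return [100.0 * sum(v < x for v in values) / (n - 1) for x in values]

def overall_score(excitement_pct, volatility_pct, star_power_pct):
    """Simple average of the three percentile scores."""
    return (excitement_pct + volatility_pct + star_power_pct) / 3

# Hypothetical raw metric values for three tournaments.
print(to_percentiles([0.2, 0.5, 0.9]))  # [0.0, 50.0, 100.0]

# Percentiles for the 2023 Arnold Palmer from the Top 10 list.
print(round(overall_score(96.4, 98.5, 91.3), 1))  # 95.4
```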

Looking at the top-rated tournaments by overall score, it seems like these metrics do a decent job of identifying the most entertaining tournaments. (Obviously one important element of entertainment that is missing here is the prestige of the tournament. Star Power captures this to an extent, but there is probably no world where the Hero World Challenge should be rated as more entertaining than a major championship.) Because these metrics are averaged over the entirety of an event, tournaments are rewarded for sustained entertainment value. For example, the 2023 Sentry Tournament of Champions had a very exciting 45-minute stretch where Collin Morikawa's win probability fell from 94% to 3%, but outside of that short window it was a very boring golf tournament in that most of the win probability was on a single player (Morikawa, and then later, Jon Rahm). As a result that tournament is in the 4th percentile for Excitement.

Another possible weakness of these metrics is that they assign equal weight to every part of a tournament. However, that's one of the nice things about working with win probability: a given % change should mean the same thing in terms of entertainment value regardless of what round we are in (whereas something like a stroke doesn't). For example, a 20% jump in win probability can easily occur in the final round from a single birdie, whereas a similar-sized move in the first round would require a much more unexpected sequence of events (e.g. a quadruple bogey from the tournament leader). Because win probabilities are more volatile and closer to 0/100 the later we are in a tournament, the later stages of a tournament naturally have a much bigger impact on each of our metrics, even though we aren't explicitly assigning more weight to them.