In this post we propose 3 simple metrics to capture how entertaining a PGA Tour tournament was.
The motivation for these metrics comes from the idea that golf tournaments are more entertaining when
1) there are lots of players with a chance to win, 2) there are frequent and sudden changes in
who is likely to win, and 3) there are top players with a chance to win.
When all 3 of these conditions are met, it usually makes for an entertaining tournament. (However,
none of these conditions are required for an entertaining tournament:
watching a lesser-known player try to hold on to a 3+ shot lead in a major can be highly entertaining
while not meeting any of the above criteria.)
Inevitably, these metrics will be unsatisfying in certain ways given the subjectivity involved in quantifying something like
entertainment value.
Each of the 3 proposed metrics maps to a point on the list above. They are, in the same order as previously shown, 1) Excitement, 2) Volatility,
and 3) Star Power. The main input for each metric is our live win probability data, which is generated every 5 minutes during
a tournament.
The Excitement metric is meant to capture how much uncertainty there is around
who will win throughout a tournament.
Consider first the simpler case of only 2
players: we want this metric to be
maximized when both golfers have a 50% win probability and minimized when 1 player
has a 100% win probability (and the other has 0%). For cases involving different
numbers of golfers, we want 3 golfers at 33.3% to have a higher value in this metric
than 2 golfers at 50%.
Similarly, 10 golfers at 10% should have a higher value than 3 golfers at 33.3%. This last
point seems debatable, as at some point if there are too many golfers with a chance to win
it probably becomes less interesting to watch. However, in practice I don't think we ever actually
reach that point: if there are 10 players with a realistic chance to win on the back nine on Sunday
(a very rare occurrence), that should be pretty compelling. Here's the formal definition:
Excitement: One minus the sum of squared win probabilities at each point in time, averaged over time
With only 2 players, the sum of their squared win probabilities is
minimized when
both probabilities are 50% (a sum of 0.5) and maximized when they are equal to 100% and 0% (a sum of 1). More generally,
the sum of squares is lower when there are more players involved and the probabilities are more equal. Using our example above,
3 golfers at 33.3% gives a value of 0.333, while 10 golfers at 10% yields a value of 0.1.
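As a rough illustration of the Excitement definition, here is a minimal Python sketch. The function names and the toy probability lists are made up for this example; this is not our production code, and it assumes each timestamp is given as a plain list of win probabilities that sum to 1.

```python
def excitement_at(win_probs):
    """One minus the sum of squared win probabilities at one timestamp."""
    return 1.0 - sum(p * p for p in win_probs)


def excitement(snapshots):
    """Average the per-timestamp values over all timestamps."""
    values = [excitement_at(probs) for probs in snapshots]
    return sum(values) / len(values)


# Toy cases from the text (hypothetical data, one list per timestamp).
two_even = [0.5, 0.5]      # sum of squares 0.5
three_even = [1 / 3] * 3   # sum of squares 0.333
ten_even = [0.1] * 10      # sum of squares 0.1
decided = [1.0, 0.0]       # sum of squares 1

for probs in (two_even, three_even, ten_even, decided):
    print(round(excitement_at(probs), 3))

# A two-timestamp "tournament": a coin flip that is then decided.
print(excitement([two_even, decided]))
```

More golfers with more equal probabilities push the per-timestamp value toward 1, while a decided tournament sits at 0.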
Given that we want higher values for this metric to mean more excitement,
we reverse the ordering by subtracting the sum of squares from 1. Once we've calculated this value
for each timestamp in our live model data, we simply average these values
across all times. Here are the tournaments with the 5 highest and 5 lowest Excitement values in our
database (at the time of writing):
The Volatility metric is meant to capture the amount of turnover in who is likely to win throughout a tournament.
Here's the formal definition:
Volatility: Sum of the absolute values of changes in player-level win probabilities between consecutive points in time
For each timestamp in our live model data we calculate the change
in every player's win probability from the previous timestamp (5 minutes earlier).
We then take the
absolute value of these changes (meaning that a win probability changing from 1% to 2% is treated the same as moving from 2% to
1%; we only care about the size of the change, not the direction). Importantly, if the change is smaller than 1% we treat it
as zero. This is done so that we don't pick up changes that are due to "simulation error", which would favour
tournaments with larger fields. (If we simulate a tournament 25K times and
get one set of win probabilities, and then repeat the 25K sims to get a 2nd set of win probabilities,
there will be slight differences between the two sets of probabilities—these
differences are what we call simulation error.) We then simply sum up all these win probability changes during the tournament
to get our Volatility metric.
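The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each timestamp is a dictionary mapping a (hypothetical) player name to a win probability, and the 1% cutoff is read as one percentage point (0.01 on a 0-1 scale). The function name and sample data are invented for the example.

```python
def volatility(snapshots, threshold=0.01):
    """Sum of absolute win probability changes between consecutive
    timestamps, treating changes smaller than `threshold` as zero."""
    total = 0.0
    for prev, curr in zip(snapshots, snapshots[1:]):
        for player, prob in curr.items():
            change = abs(prob - prev.get(player, 0.0))
            if change >= threshold:  # ignore simulation-sized wiggles
                total += change
    return total


# Hypothetical 3-player tournament observed at three timestamps.
snapshots = [
    {"A": 0.50, "B": 0.30, "C": 0.20},
    {"A": 0.40, "B": 0.395, "C": 0.205},
    {"A": 0.70, "B": 0.10, "C": 0.20},
]

print(round(volatility(snapshots), 3))
print(round(volatility(snapshots, threshold=0.0), 3))
```

In this toy data player C only ever moves by half a percentage point, so C contributes nothing at the default cutoff but is counted when the cutoff is set to zero.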
Shown below are the tournaments with the 5 highest and 5 lowest Volatility values in our database:
Star Power is the most straightforward of the 3 metrics to understand. Intuitively it's meant to capture the average skill
of the golfers who are contending to win throughout the tournament. If Rory McIlroy is in the field but 15 shots out of the lead
come Sunday, then his presence isn't really adding anything to the entertainment value of that tournament (on Sunday).
Here's the formal definition:
Star Power: Win-probability-weighted average of player skill at each point in time, averaged over time
For each point in time in our live model data (5-minute intervals)
we take a weighted average of skill, using a player's win probability at that time as their
weight. We then average these weighted averages across all times during the tournament.
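Here is a minimal Python sketch of that calculation. The player names, skill numbers, and function names are hypothetical placeholders, and the skill scale is arbitrary; the only things taken from the definition are the win-probability weighting and the averaging over time.

```python
def star_power_at(win_probs, skill):
    """Win-probability-weighted average of skill at one timestamp."""
    total_weight = sum(win_probs.values())
    weighted = sum(p * skill[name] for name, p in win_probs.items())
    return weighted / total_weight


def star_power(snapshots, skill):
    """Average the per-timestamp weighted averages over all timestamps."""
    values = [star_power_at(probs, skill) for probs in snapshots]
    return sum(values) / len(values)


# Hypothetical skill ratings (higher means a better player).
skill = {"A": 2.0, "B": 1.0, "C": -0.5}

# Hypothetical win probabilities at two timestamps.
snapshots = [
    {"A": 0.2, "B": 0.3, "C": 0.5},  # weaker player C is the favourite
    {"A": 0.7, "B": 0.2, "C": 0.1},  # star player A takes over
]

print(round(star_power_at(snapshots[0], skill), 2))
print(round(star_power_at(snapshots[1], skill), 2))
print(round(star_power(snapshots, skill), 2))
```

A highly skilled player only raises the value while they hold meaningful win probability, which matches the point about a star who is 15 shots back on Sunday.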
Shown below are the tournaments with the 5 highest and 5 lowest Star Power values in our database,
excluding TOUR Championships (the 2019-2022 TOUR Championships, which use starting scores, have
the 4 highest Star Power values):
Before combining these 3 metrics to get overall entertainment scores,
it's worth noting a couple of things. First, it's clear that the Excitement and Volatility metrics
will be correlated: when Excitement is high this means
that win probabilities are more equalized across players, which makes it easier for large win
probability changes to occur (which results in higher Volatility).
It's easiest to see why this is the case by considering a situation with zero Excitement (100% of the win probability is
on 1 player). If the probabilities can be trusted, this means there won't be any win probability
movement for the rest of the tournament, which means, all else equal, a lower value for Volatility.
In our actual data the Excitement and Volatility metrics have a correlation of 0.7.
Second, it's reasonable to think that field size could have an effect on the Excitement and Volatility metrics.
More players means more chances for win probabilities to move (higher Volatility) and it also means more uncertainty
around who will win (higher Excitement).
However, in the data the correlation between field size and Volatility is roughly zero.
As mentioned above, when calculating Volatility any win probability change below 1% is not counted; this was done to ensure that
we don't pick up simulation noise which would give larger fields naturally higher
Volatility values. That adjustment appears to have worked well.
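To see why the cutoff matters, here is a toy Python sketch of simulation noise. It is not our model: it assumes every player is equally likely to win (a deliberately unrealistic simplification), uses invented function names, and simply compares two independent sets of 25K simulations for a small field and a large field.

```python
import random
from collections import Counter


def simulate_win_probs(n_players, n_sims, rng):
    """Estimate win probabilities by drawing a random winner n_sims times."""
    winners = Counter(rng.choices(range(n_players), k=n_sims))
    return [winners[i] / n_sims for i in range(n_players)]


def noise_between_runs(n_players, n_sims=25_000, threshold=0.0, seed=0):
    """Sum of absolute differences between two independent simulation runs,
    treating differences smaller than `threshold` as zero."""
    rng = random.Random(seed)
    first = simulate_win_probs(n_players, n_sims, rng)
    second = simulate_win_probs(n_players, n_sims, rng)
    changes = [abs(a - b) for a, b in zip(first, second)]
    return sum(c for c in changes if c >= threshold)


for field_size in (30, 150):
    raw = noise_between_runs(field_size)
    cut = noise_between_runs(field_size, threshold=0.01)
    print(field_size, round(raw, 4), round(cut, 4))
```

In this toy setup the raw noise total is larger for the bigger field even though nothing has actually happened on the course, while the 1% cutoff removes it for both field sizes.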
The correlation between field size and
Excitement is 0.3. The Excitement metric tends to be higher in larger fields
early on in the tournament because, for example, one hundred players at 1% win probability yields a higher Excitement value (0.99)
than twenty players at 5% (0.95). After removing the TOUR Championships with starting scores this correlation falls to 0.2, and if we only consider fields
with at least 120 players the correlation goes to zero. So for smaller field sizes it will be harder to get a high
Excitement value due to their naturally lower values in the early stages of the tournament.
This could be desirable if you think that small-field tournaments are inherently less
exciting.
To calculate overall entertainment scores we take a simple average of our 3 metrics.
For ease of interpretation, each metric is converted to percentile form: for example, a value
of 98 means that the tournament scored higher than 98% of tournaments in that metric.
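A minimal Python sketch of this scoring step follows. The four tournaments and their metric values are invented, and the sketch assumes a simple percentile rule (the share of tournaments with a strictly lower value), which may differ from the exact handling of ties and endpoints in our numbers.

```python
def percentiles(values):
    """Percent of tournaments with a strictly lower value, per tournament."""
    n = len(values)
    return [100.0 * sum(other < v for other in values) / n for v in values]


def overall_scores(excitement, volatility, star_power):
    """Simple average of the three metrics after converting to percentiles."""
    columns = [percentiles(m) for m in (excitement, volatility, star_power)]
    return [sum(row) / 3 for row in zip(*columns)]


# Hypothetical metric values for four tournaments (same order in each list).
excitement = [0.9, 0.5, 0.7, 0.8]
volatility = [10.0, 4.0, 12.0, 6.0]
star_power = [1.0, 0.2, 0.5, 1.5]

for score in overall_scores(excitement, volatility, star_power):
    print(round(score, 1))
```

Converting to percentiles first puts the three metrics on a common 0-100 scale, so none of them dominates the average simply because of its units.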
Here are the tournaments with the 10 highest and 10 lowest overall entertainment scores:
Looking at the top-rated tournaments by overall score, it seems like these metrics
do a decent job of identifying the most entertaining tournaments. (Obviously one important element of entertainment that
is missing here is the prestige of the tournament. Star Power captures this to an extent, but there is probably
no world where the Hero World Challenge should be rated as more entertaining than a major championship.)
Because these metrics are averaged over
the entirety of an event, tournaments are rewarded
for sustained entertainment value. For example, the 2023 Sentry Tournament of Champions had a very
exciting 45-minute stretch where Collin Morikawa's win probability fell from 94% to 3%, but outside
of that short window it was a very boring golf tournament in that most of the win probability was on a
single player (Morikawa, and then later, Jon Rahm). As a result that tournament is in the 4th percentile
for Excitement.
Another possible weakness of these metrics is that they assign equal weight to every part
of a tournament. However, that's one of the nice things about working with win probability: a given % change
should mean the same thing in terms of entertainment value regardless of what round we are in (whereas something like a stroke doesn't).
For example, a 20% jump in win probability can easily occur in the final round from a single birdie,
whereas a similar-sized move in the first round would require a much more unexpected
sequence of events (e.g. a quadruple bogey from the tournament leader).
Because win probabilities are more volatile and closer to 0/100 the later we are in a tournament,
the later stages of a tournament naturally have a much bigger impact on each of our metrics, even
though we aren't explicitly assigning more weight to them.