In
last week's Model Talk I discussed why modelling the tie
probability in head-to-head match-ups is important when the tie is offered as a separate
bet. This post discusses why modelling the tie is important even when ties are void.
In the early days of Data Golf, we would sometimes simulate tournaments or match-ups such that ties weren't possible.
There are many ways you could go about this, but one way is to think of golf scores as continuously distributed
(e.g. following a normal distribution). Because the probability of drawing any specific value from a continuous distribution
is zero, ties won't happen. This was nice in some sense because it ensured that our finish probabilities (e.g. Top 5 probabilities)
added up to the correct number (500%), as opposed to some slightly higher figure if ties were possible in the simulations.
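As a rough illustration of this point, the sketch below simulates a hypothetical 20-golfer field with normally distributed single-round scores. The means, the 2.8-stroke standard deviation, the simulation count and the function name are all assumptions for illustration, not Data Golf's model. A golfer counts as finishing top 5 if fewer than five others beat them outright. With continuous scores exactly five golfers qualify in every simulation, so the Top 5 probabilities sum to 500%. Once scores are rounded to whole strokes, ties at 5th place push the sum above 500%.

```python
import random


def top5_probability_sum(means, sd, n_sims, round_scores, seed=1):
    """Sum of Top 5 probabilities across a field, counting ties as top-5 finishes.

    Assumes each golfer's score is an independent normal draw (lower is better).
    """
    rng = random.Random(seed)
    counts = [0] * len(means)
    for _ in range(n_sims):
        scores = [rng.gauss(m, sd) for m in means]
        if round_scores:
            # Rounding to whole strokes is what makes ties possible.
            scores = [round(s) for s in scores]
        for i, s in enumerate(scores):
            # Number of golfers who beat golfer i outright (ties excluded).
            beaten_by = sum(1 for t in scores if t < s)
            if beaten_by < 5:
                counts[i] += 1
    return sum(c / n_sims for c in counts)


# Hypothetical field: 20 golfers with mean scores from 69.0 to 72.8.
field_means = [69.0 + 0.2 * k for k in range(20)]

continuous_sum = top5_probability_sum(field_means, 2.8, 2000, round_scores=False)
rounded_sum = top5_probability_sum(field_means, 2.8, 2000, round_scores=True)
print(f"continuous scores: {continuous_sum:.3f}")  # sums to 5, i.e. 500%
print(f"rounded scores:    {rounded_sum:.3f}")     # above 5: everyone tied at 5th counts
```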
But why is 500% the "correct" number? It is correct when the probabilities are used to assess the value of bets
with dead-heat rules applied: Top 5
probabilities that account for dead-heat rules
should add up to 500%. However, there are many methods you can use
to ensure that your Top 5 probabilities add up to 500%, and they are not all equivalent.
In fact, simulating tournaments without ties will not yield the same expected value estimates
as simulating with ties and then applying dead-heat rules (although they should be very similar).
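A minimal sketch of how dead-heat rules could split a Top N payout, assuming the common convention that golfers tied across the cutoff share the remaining places equally (the function name and example scores are hypothetical). Because the shares in any one simulation always total N, averaging them over simulations gives dead-heat-adjusted Top 5 probabilities that sum to 500%.

```python
def dead_heat_fractions(scores, n_places):
    """Fraction of a full Top-N payout each golfer receives under dead-heat rules.

    Lower scores are better. Golfers tied across the cutoff split the places
    that remain equally among themselves.
    """
    fractions = []
    for s in scores:
        better = sum(1 for t in scores if t < s)  # golfers strictly ahead
        tied = sum(1 for t in scores if t == s)   # includes the golfer themself
        places_left = n_places - better
        if places_left <= 0:
            fractions.append(0.0)
        elif tied <= places_left:
            fractions.append(1.0)
        else:
            fractions.append(places_left / tied)
    return fractions


# Hypothetical integer scores: three golfers tie at 69 for the last Top 5 place.
scores = [66, 67, 68, 68, 69, 69, 69, 70]
fractions = dead_heat_fractions(scores, n_places=5)
print(fractions)       # each golfer at 69 receives one third of a payout
print(sum(fractions))  # the shares total 5 whenever the field has at least 5 golfers
```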
But now back to match-ups. When ties are void, it also seems like simulating without ties is a
reasonable way to assess expected value.
Consider a match-up between Golfers 1 and 2,
where Golfer 1 wins with probability \( win_1 \), Golfer 2 wins with probability \( win_2 \),
and they tie with probability \( tie \).
Expected value on Golfer 1 is then equal to \( win_1 \cdot (odds_1-1) + win_2 \cdot (-1) + tie \cdot (0) \), where
\( odds_1 \) are Golfer 1's offered (decimal) odds.
Setting this expression equal to zero and re-arranging, we find that \( 1/odds_1 = win_1/(win_1 + win_2) \).
We'll call the expression on the right the "break-even implied probability".
If the bookmaker's implied probability (\( 1/odds_1 \)) is below this break-even probability,
it's a +EV bet; if it is above, it's a -EV bet. As an important aside to make sure you are with me,
note that the break-even implied probability for a simple bet (i.e. one that has only two possible outcomes, win or loss)
is equal to \( win_1 \), which is derived by setting the expected value \( win_1 \cdot odds_1 - 1 \)
equal to zero, yielding \( 1/odds_1 = win_1 \). This is where
the idea of the "implied probability of a book's odds" comes from.
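The two expected value expressions above can be checked numerically. This is a small sketch; the function names and the 25%/65% probabilities are illustrative choices, not part of the original analysis. At odds of exactly \( 1/p^* \), where \( p^* = win_1/(win_1 + win_2) \), the ties-void bet has zero expected value; a lower bookmaker implied probability makes it +EV and a higher one makes it -EV.

```python
def ev_ties_void(win1, win2, odds1):
    """Expected profit per unit staked on Golfer 1 at decimal odds odds1, ties void.

    A void tie refunds the stake, so the tie term (tie * 0) drops out.
    """
    return win1 * (odds1 - 1) + win2 * (-1)


def breakeven_ties_void(win1, win2):
    """Implied probability 1/odds1 at which the ties-void bet has zero expected value."""
    return win1 / (win1 + win2)


def ev_simple(win1, odds1):
    """Expected profit on a two-outcome (win/loss) bet: win1 * (odds1 - 1) - (1 - win1)."""
    return win1 * odds1 - 1


win1, win2 = 0.25, 0.65  # illustrative probabilities; ties take the remaining 10%
p_star = breakeven_ties_void(win1, win2)

for implied in (p_star - 0.02, p_star, p_star + 0.02):
    ev = ev_ties_void(win1, win2, 1 / implied)
    print(f"implied {implied:.3f} -> EV {ev:+.4f}")  # +EV below p_star, -EV above

# For a simple bet, the break-even implied probability is win1 itself.
print(f"simple bet EV at odds 1/win1: {ev_simple(win1, 1 / win1):+.4f}")
```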
Up to this point I've undoubtedly succeeded in taking a simple concept and
making it complicated. But bear with me.
Suppose we run our simulations with ties allowed (e.g. by rounding the output from a normal distribution) and find
Golfer 1's win probability is 25%, Golfer 2's is 65%, and they tie 10% of the time.
With ties void, using the expression above, the break-even implied probability for Golfer 1 is equal to 27.8%.
One way to think about what happened here is we took the 10%
tie probability and assigned 27.8% of those ties as "wins" for
Golfer 1, and 72.2% as "wins" for Golfer 2.
That is, when calculating the break-even implied probability,
the tie probability was distributed
in proportion to the players' respective outright win probabilities.
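The two descriptions agree because \( win_1 + win_2 + tie = 1 \): multiplying the first identity below through by \( win_1 + win_2 \) reduces it to \( win_1 = win_1 \cdot (win_1 + win_2 + tie) \). Plugging in the example's numbers reproduces the 27.8% / 72.2% split of the ties.

\[
\frac{win_1}{win_1 + win_2} = win_1 + tie \cdot \frac{win_1}{win_1 + win_2}
\]

\[
\frac{0.25}{0.25 + 0.65} \approx 0.25 + 0.10 \cdot 0.278 \approx 0.278,
\qquad
\frac{0.65}{0.25 + 0.65} \approx 0.65 + 0.10 \cdot 0.722 \approx 0.722
\]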
If you
were to simulate this match-up without ties using a normal distribution, Golfer 1 would win something like 45%
of the simulations that resulted in a tie previously,
while Golfer 2 would win 55%. This yields a break-even implied probability for Golfer 1 of
\( 25\% + 0.45 \cdot 10\% = 29.5\% \). An intuitive way to think about this is that simulating without ties is like having a sudden-death playoff in the
event of a tie after 18 holes. Golfer 1 is the worse golfer so they will still win less than 50% of the playoffs,
but it will be much closer to 50-50 than their 18-hole win probability.
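The playoff intuition can be sketched with a small Monte Carlo simulation. It assumes, purely for illustration, that each golfer's 18-hole score is a single normal draw (Golfer 1 two strokes worse on average, a 2.7-stroke standard deviation for both); these parameters and the function name are hypothetical, and they should give outcomes in the same ballpark as the example, though not exactly the same split. Rounding each draw gives the "ties allowed" result, while the unrounded values of the same draws act like a playoff that settles every rounded tie.

```python
import random


def simulate_matchup(mean1, mean2, sd, n_sims, seed=7):
    """Simulate a two-golfer, 18-hole match-up where the lower score wins.

    Rounded draws give the ties-allowed outcome; the unrounded draws give the
    no-ties outcome for the very same simulation.
    """
    rng = random.Random(seed)
    win1 = win2 = tie = 0
    win1_no_ties = 0
    ties_won_by_1 = 0
    for _ in range(n_sims):
        s1, s2 = rng.gauss(mean1, sd), rng.gauss(mean2, sd)
        r1, r2 = round(s1), round(s2)
        if r1 < r2:
            win1 += 1
        elif r2 < r1:
            win2 += 1
        else:
            tie += 1
            if s1 < s2:  # Golfer 1 "wins the playoff"
                ties_won_by_1 += 1
        if s1 < s2:
            win1_no_ties += 1
    return {
        "win1": win1 / n_sims,
        "win2": win2 / n_sims,
        "tie": tie / n_sims,
        "win1_no_ties": win1_no_ties / n_sims,
        "share_of_ties_won_by_1": ties_won_by_1 / tie,
    }


res = simulate_matchup(mean1=71.0, mean2=69.0, sd=2.7, n_sims=200_000)

# Ties allowed, then voided: Golfer 1 is credited win1 / (win1 + win2) of the ties.
breakeven_void = res["win1"] / (res["win1"] + res["win2"])
# No ties: equivalent to win1 + share_of_ties_won_by_1 * tie.
breakeven_no_ties = res["win1_no_ties"]
print(res)
print(f"ties void: {breakeven_void:.3f}   no ties: {breakeven_no_ties:.3f}")
```

Golfer 1's share of the rounded ties should come out below 50% but well above the share implied by voiding ties.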
Finally, consider a match-up that is offered with dead-heat rules. Now, expected value will be equal to
\( win_1 \cdot (odds_1) + tie \cdot (odds_1/2) - 1 \). Setting to zero and re-arranging yields a break-even
implied probability of \( win_1 + tie/2 \). Plugging in our numbers from the simulations with ties yields a break-even
implied probability for Golfer 1 of 30%. Therefore with dead-heat rules, we take the 10% of tied simulations and assign 50%
of them as "wins" for Golfer 1.
I've shown how to estimate break-even implied probabilities in 3
different scenarios:
1) simulating with ties allowed and voiding those ties,
2) simulating without ties allowed (which means the tie rules are irrelevant), and
3) simulating with ties allowed and assigning payouts according to dead-heat rules.
In our example above, with probabilities of 25%, 65%, and 10% for Golfer 1 win, Golfer 2 win, and Tie,
the estimates in the 3 cases were: 27.8%, 29.5%, and 30%.
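The three estimates can be reproduced directly from the formulas above. In this sketch the function name and the dictionary labels are illustrative, and the 45% playoff share for method 2) is the rough figure quoted earlier, not an exact value.

```python
def breakeven_probabilities(win1, win2, tie, playoff_share1):
    """Break-even implied probabilities for a bet on Golfer 1 under three approaches.

    playoff_share1 is the (estimated) fraction of 18-hole ties that Golfer 1
    would win if ties were impossible.
    """
    return {
        "1) ties simulated, then void": win1 / (win1 + win2),
        "2) no ties simulated": win1 + playoff_share1 * tie,
        "3) ties simulated, dead-heat rules": win1 + tie / 2,
    }


for method, p in breakeven_probabilities(0.25, 0.65, 0.10, playoff_share1=0.45).items():
    print(f"{method}: {p:.1%}")
```

Each approach has the same form, Golfer 1's outright win probability plus some share of the ties; only the share credited to Golfer 1 differs.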
Intuitively, Golfer 1, who is the worse golfer in this match-up, is disadvantaged when ties are voided, as
they are only assigned "wins" at a rate equal to their share of the decided match-ups, \( win_1/(win_1 + win_2) \) (27.8% here). When ties aren't
possible, that rate increases to some number closer to (but still below) 50%, and finally, with dead-heat rules applied
to ties, the rate equals 50%.
We correctly estimated break-even probabilities in cases 1) and 3); however, the estimate in 2) is not valid,
for the obvious reason that the break-even probability obtained using that method
is the same whether ties are void or settled by dead-heat rules, but we know the two should
be different. The important comparison is between methods 1) and 2), as these estimates
can be meaningfully different.
At the start of this post, I was careful with my wording: I said that
modelling the tie probability matters
for bets where ties
are void. I didn't say that the tie probability itself matters because, as was shown above, it doesn't: the
break-even implied probability for ties-void bets, \(win_1/(win_1 + win_2) \),
is not a function of the tie probability! However, the
decision on
how to model ties does matter: if you exclude them (e.g. by using a continuous distribution to model
golf scores), you will overestimate the break-even probability on the worse golfer in a match-up (and, correspondingly, underestimate it on the better golfer).
The more lopsided the match-up, and the more often
ties actually occur, the more this modelling decision will matter.
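One way to see this is to write each break-even probability for Golfer 1 as \( win_1 + s \cdot tie \), where \( s \) is the share of tied outcomes credited to Golfer 1: \( s = win_1/(win_1 + win_2) \) when ties are void (which follows from \( win_1 + win_2 + tie = 1 \)), and \( s = s_{\text{playoff}} \) (a label introduced here for the playoff share) when ties are excluded from the simulations. The gap between methods 2) and 1) is then

\[
\left( win_1 + s_{\text{playoff}} \cdot tie \right) - \frac{win_1}{win_1 + win_2}
= tie \cdot \left( s_{\text{playoff}} - \frac{win_1}{win_1 + win_2} \right)
\]

With the example's numbers this is \( 0.10 \cdot (0.45 - 0.278) \approx 0.017 \), the gap between 29.5% and 27.8%. It grows with \( tie \), and with the distance between the two shares, which widens as the match-up becomes more lopsided and \( win_1/(win_1 + win_2) \) falls further below \( s_{\text{playoff}} \).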
None of this is particularly surprising: the data-generating
process for golf scores only produces integer scores, so your simulations should only produce integer scores as well!