rounds are included in the data, with one exception: Round 1 of the 2020 PLAYERS Championship.
Four players failed to finish their final hole of the first round before the event was cancelled;
we assigned these players the most likely score given their position on the hole.
The only tricky cases for determining round completion from our primary data sources are withdrawals and disqualifications.
It is not easy to identify automatically whether a round
that resulted in a WD/DQ was completed.
Currently we manually vet each week's data update from the PGA, EUR, KFT, CHA, CAN, SAM, and CHAMP tours to
ensure all completed rounds are included and all incomplete rounds are dropped.
Listed below are the time periods (inclusive of the listed date) for which we did not rely on an algorithm to filter WD/DQ rounds:
EUR: 2020-10-11 to present
KFT: 2020-10-11 to present
CHA: 2020-11-14 to present
CAN: 2021-06-26 to present
SAM: 2020-10-09 to present
CHAMP: 2020-08-21 to present
For rounds not included in these date ranges or from a tour not listed above, we apply a simple algorithm to label each round as complete/incomplete.
If a player's tournament ends with a WD or DQ, only their last round played is considered potentially incomplete.
We then look at their strokes-gained for that round (more precisely, the strokes-gained implied
by their listed round score) and apply two basic filters:
if they withdraw after round 2, 3, or 4 and gain more than 2 strokes on the field in their final round, we drop the round;
if they withdraw after round 1 and gain any strokes on the field (strokes-gained above zero), we drop the round.
This is a fairly conservative filter, as we feel the cost of including an incomplete round
is higher than the cost of omitting a complete one.
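The two filters above can be sketched in a few lines of Python. This is only an illustration of the rule as described, not our production code: the `FinalRound` class and its field names are hypothetical, and the sketch assumes the same thresholds apply to both WDs and DQs.

```python
from dataclasses import dataclass


@dataclass
class FinalRound:
    """Last round a player has a listed score for (hypothetical structure)."""
    round_num: int        # 1-4
    sg_implied: float     # strokes-gained implied by the listed round score
    ended_wd_or_dq: bool  # True if the player's tournament ended with a WD/DQ


def is_dropped(r: FinalRound) -> bool:
    """Return True if the round would be labeled incomplete and dropped."""
    if not r.ended_wd_or_dq:
        return False              # only WD/DQ final rounds are candidates
    if r.round_num == 1:
        return r.sg_implied > 0.0  # round 1: any positive strokes-gained
    return r.sg_implied > 2.0      # rounds 2-4: more than 2 strokes gained


# A WD after round 3 with +2.5 implied strokes-gained is dropped;
# the same player at +1.5 is kept.
print(is_dropped(FinalRound(3, 2.5, True)), is_dropped(FinalRound(3, 1.5, True)))
```

Note that a complete round can be dropped under this rule (for example, a player who shoots well and then withdraws), which is the trade-off described above.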
All stroke-play tournaments are included (or the stroke-play portion of events with a Match Play component, e.g. 2019
ISPS Handa World Super 6 Perth).
For the PGA Tour, a few tournaments are included only in select years:
Reno-Tahoe (event_id=472) 2019 and later,
Zurich Classic (event_id=18) before it became a team event (2016 and earlier).
: Total score in a specific round.
sg_app, sg_arg, sg_ott, sg_putt
: Strokes-gained categories. Only available at Shotlink-equipped PGA Tour events.
The values reported here are pulled directly from the PGA Tour website. In theory, each SG category should have a mean
of zero by tournament-round-course (i.e. the PGA Tour subtracts off the mean in each category). This is almost always true,
but there are a few exceptions:
If a player completes their round and then withdraws / is DQ'd, sometimes (very rarely) their data from that
round is not included in the PGA Tour's SG calculation. The most consequential example of this is Round 2
of the 2021 Arnold Palmer Invitational: Robert Gamez fired a 92 but was then DQ'd, and as a result he is not included when
the SG categories are demeaned by the PGA Tour. Therefore, the strokes-gained categories have a mean of zero excluding Gamez
but not when he is included (as in our data).
2021 Olympic Golf Competition. Shot-level tracking was not administered by the PGA Tour,
and the strokes-gained category data was not demeaned.
Ignoring the above cases, the 4 SG categories should add up to strokes-gained total (defined as the difference between a player's score and the
average score for that round and course). However, there are a small number of what appear to be mistakes on the part of the PGA Tour.
One somewhat common mistake is that when a player doesn't hit a shot around-the-green they are given a value of 0 for SG:ARG,
meaning that the mean SG:ARG for the field was not subtracted off.
To fix this you can calculate sg_arg as the difference between sg_t2g and sg_ott+sg_app. We don't make this correction in the API data, as it's meant to be raw data.
The remaining errors (20-30 of them, all since the 2020 season) appear to be fairly idiosyncratic, and the source of the discrepancy is not always clear.
To see these for yourself, calculate the difference between sg_total and the sum of sg_ott, sg_app, sg_arg, and sg_putt.
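Both checks can be sketched in plain Python as follows. The row is made-up data, the helper names are hypothetical, and the sketch assumes each round is available as a dict keyed by the field names used in this document.

```python
SG_PARTS = ("sg_ott", "sg_app", "sg_arg", "sg_putt")


def sg_residual(row):
    """sg_total minus the sum of the four categories (0 when consistent)."""
    return row["sg_total"] - sum(row[k] for k in SG_PARTS)


def fix_sg_arg(row):
    """Return a copy of row with sg_arg recomputed as sg_t2g - (sg_ott + sg_app)."""
    fixed = dict(row)
    fixed["sg_arg"] = row["sg_t2g"] - (row["sg_ott"] + row["sg_app"])
    return fixed


# Made-up round in which sg_arg was reported as 0 (no around-the-green shot).
row = {"sg_ott": 0.5, "sg_app": 1.0, "sg_arg": 0.0, "sg_putt": -0.25,
       "sg_t2g": 1.25, "sg_total": 1.0}

print(sg_residual(row))              # non-zero: categories do not sum to sg_total
print(sg_residual(fix_sg_arg(row)))  # residual after recomputing sg_arg
```

In practice you would probably compare the residual against a small tolerance rather than exactly zero, and apply the sg_arg correction only to rounds where sg_arg is reported as 0.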
There is a single instance (as of September 2021) where SG category data was not reported for a golfer in a round
that should have had Shotlink data:
Viktor Hovland at 2021 U.S. Open #2 (event_id=536) Round 1.
sg_t2g
: Strokes-gained from tee to green. Defined as the sum of sg_ott, sg_app,
and sg_arg. The values here are pulled directly
from the PGA Tour website (i.e. we do not perform the calculation ourselves), which means we
can't guarantee the individual components always add up.
sg_total
: Total strokes-gained. Calculated (by us)
as the difference between a player's score
and the average score for that round (and course, for events with multiple courses). When strokes-gained
categories are available, this should be equal to the sum of sg_ott, sg_app, sg_arg, and sg_putt, but
for the reasons detailed above this will not quite be true in all cases.
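As a rough illustration of this calculation, the sketch below averages scores within each tournament-round-course group and subtracts each player's score from that average. It assumes the usual sign convention (positive means better than the field) and uses hypothetical field names `tour`, `round_num`, and `round_score`; the scores are made up.

```python
from collections import defaultdict


def group_key(r):
    """One group per tournament, round, and course."""
    return (r["tour"], r["season"], r["event_id"], r["round_num"], r["course_num"])


def add_sg_total(rounds):
    """Return copies of the round dicts with an added sg_total field."""
    scores = defaultdict(list)
    for r in rounds:
        scores[group_key(r)].append(r["round_score"])
    means = {k: sum(v) / len(v) for k, v in scores.items()}
    # Positive sg_total = fewer strokes than the group average.
    return [{**r, "sg_total": means[group_key(r)] - r["round_score"]} for r in rounds]


base = {"tour": "pga", "season": 2021, "event_id": 1, "round_num": 1}
rounds = [
    {**base, "course_num": 10, "round_score": 70},
    {**base, "course_num": 10, "round_score": 72},
    {**base, "course_num": 10, "round_score": 74},
    {**base, "course_num": 11, "round_score": 68},  # second course, own average
]
result = add_sg_total(rounds)
print([r["sg_total"] for r in result])
```

The lone player on the second course gets an sg_total of zero, because each course is demeaned separately.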
dg_id
: Player ID. There is a single dg_id for each player. Please notify us if you find a player who
has multiple dg_ids. Use dg_id when performing operations by player. Any changes we make
retroactively to a player's dg_id will be posted in the changelog.
player_name
: Player's name. Will not necessarily be the same for all data points for a
given player, although it should be.
Use dg_id instead of player_name when performing operations by player.
event_id
: Tournament ID. For the PGA, KFT, SAM, and CAN tours, event_id is constant across years.
(However, note that in seasons where an event was played twice, e.g. the Masters in 2021, a new
tournament number is used for the second playing of the event.) For all other tours, event_id changes by year.
Within a tour and season, event_id uniquely identifies a tournament. Within a tour and calendar year,
event_id will not necessarily uniquely identify a tournament.
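A minimal sketch of a tournament key built on this rule is shown below. The rows are made up, and the `tour` and `year` field names are assumptions for illustration.

```python
def event_key(row):
    """Composite key that identifies one tournament: (tour, season, event_id)."""
    return (row["tour"], row["season"], row["event_id"])


# Made-up rows: the same event_id played twice in calendar year 2013,
# once in each of two seasons.
rows = [
    {"tour": "pga", "season": 2013, "year": 2013, "event_id": 7},
    {"tour": "pga", "season": 2014, "year": 2013, "event_id": 7},
]

by_season = {event_key(r) for r in rows}
by_year = {(r["tour"], r["year"], r["event_id"]) for r in rows}
print(len(by_season), len(by_year))  # keying on calendar year merges the two events
```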
: Event name. May change by year for any tour.
course_num
: Course ID. For the PGA, KFT, SAM, and CAN tours, course_num is constant for a given course.
However, a course number may "change" if the course undergoes substantial changes.
For example, Pebble Beach Golf Links at the 2019 U.S. Open is assigned a different number
than its typical assignment for the AT&T Pro-Am. Here is the full list of PGA Tour courses with multiple IDs:
Muirfield Village (23, 893), Pebble Beach (5, 666), Quail Hollow (872, 241, 698),
Ridgewood (745, 873), Hamilton (694, 874), Liberty National (762, 886),
Bethpage Black (689, 880), Sea Island Plantation (231, 889),
TPC Four Seasons (19, 882), Chambers Bay (818, 100).
If you find a PGA Tour course that has multiple course numbers and is not listed above,
please notify us.
When performing operations by course on the above-listed tours, use course_num.
For all other tours, course_num varies by course-year. For tours other than
PGA, EUR, KFT, CHA, CAN, SAM, and CHAMP, course_num is not meaningful, and therefore multi-course
events are not distinguishable from single-course events.
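If you want to treat the PGA Tour courses listed above as a single course each, one option is to map every listed number to one representative number. The sketch below does this with the IDs from the list above; picking the first listed ID as the representative is an arbitrary choice, and the function name is hypothetical. Whether merging is appropriate depends on your use case, since a new number can reflect substantial changes to the course.

```python
# Course numbers taken from the list of PGA Tour courses with multiple IDs.
PGA_COURSE_ID_GROUPS = {
    "Muirfield Village": (23, 893),
    "Pebble Beach": (5, 666),
    "Quail Hollow": (872, 241, 698),
    "Ridgewood": (745, 873),
    "Hamilton": (694, 874),
    "Liberty National": (762, 886),
    "Bethpage Black": (689, 880),
    "Sea Island Plantation": (231, 889),
    "TPC Four Seasons": (19, 882),
    "Chambers Bay": (818, 100),
}

# Map every listed ID to the first ID in its group (arbitrary representative).
CANONICAL = {cid: ids[0] for ids in PGA_COURSE_ID_GROUPS.values() for cid in ids}


def canonical_course_num(course_num):
    """Return the representative ID for a course; unlisted IDs pass through."""
    return CANONICAL.get(course_num, course_num)


print(canonical_course_num(666), canonical_course_num(698))
```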
course_name
: Course name. For the European Tour (EUR), we have made course_name
constant for a given course (i.e. spelling and naming convention are identical across years).
Use course_name when performing operations by course on the European Tour.
For all other tours, course_name may not be constant for a given course. If you find a
course that has different values for course_name, please notify us.
: Official finishing position.
: Official season as defined by each tour.
: Official date of the final
round of the tournament (e.g. if the event is delayed 1 day
to a Monday, this date will still be that of the Sunday).