General Notes
Only completed rounds are included in the data, with a few exceptions:
Round 1 of the 2020 PLAYERS Championship.
Four players failed to finish their final hole of the first round before the event was cancelled;
we assigned to these players the most likely score given their position on the hole.
Rounds 1 and 2 of the 2022 Joburg Open (event_id=2022100).
Due to the sudden travel restrictions imposed on South Africa, 6 golfers withdrew after
playing at least 15 holes in their last round. We assigned to these players the most likely score on their remaining holes.
Withdrawals and disqualifications are the only tricky cases when determining round completion from our primary data sources.
It is hard to identify, in an automated fashion, whether a round
that resulted in a WD/DQ was completed or not.
Currently we manually vet the data update each week from PGA, EUR, CHA, CAN, SAM, and CHAMP tours to
ensure all completed rounds are included and all incomplete rounds are dropped.
Listed below are the time periods (inclusive of the listed date) for which we didn't rely on an algorithm to filter WD/DQ rounds:
EUR: 2020-10-11 to present
KFT: 2020-10-11 to present
CHA: 2020-11-14 to present
CAN: 2021-06-26 to present
SAM: 2020-10-09 to present
CHAMP: 2020-08-21 to present
For rounds not included in these date ranges or from a tour not listed above, we apply a simple algorithm to label each round as complete/incomplete.
If a player's tournament ends with a WD or DQ, only their last round played will be considered as a potentially incomplete round.
We then look at their strokes-gained for the day (or, more accurately, the strokes-gained implied
by their listed round score), and apply some basic filters:
if they withdraw after rounds 2-4 and gain more than 2 strokes on the field in their final round, we drop the round;
if they withdraw after round 1 and gain positive strokes on the field, we drop the round.
This filter is deliberately conservative: we consider including an incomplete round
more costly than omitting a complete one.
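The filter described above can be sketched in a few lines. This is a minimal illustration, not our production code: the function name and its arguments are hypothetical, and sg_implied stands for the strokes-gained implied by the listed round score.

```python
def is_round_kept(round_num, sg_implied, ended_wd_dq, is_last_round_played):
    """Return True if the round is treated as complete under the filter above.

    All argument names are hypothetical; sg_implied is the strokes-gained
    implied by the player's listed round score.
    """
    # Only the last round played by a player whose event ended in a WD/DQ
    # is a candidate for being incomplete.
    if not (ended_wd_dq and is_last_round_played):
        return True
    if round_num == 1:
        # Round 1: drop the round if the player gained positive strokes.
        return sg_implied <= 0
    # Rounds 2-4: drop the round if the player gained more than 2 strokes.
    return sg_implied <= 2


examples = [
    (1, 0.5, True, True),    # WD in round 1 with positive SG: dropped
    (1, -3.0, True, True),   # WD in round 1 with negative SG: kept
    (3, 2.5, True, True),    # WD in round 3 with SG above 2: dropped
    (3, 1.0, True, True),    # WD in round 3 with SG of 1: kept
    (2, 4.0, False, True),   # no WD/DQ: always kept
]
kept = [is_round_kept(*example) for example in examples]
print(kept)  # [False, True, False, True, True]
```

An incomplete round has an unusually low listed score, which shows up as a large positive implied strokes-gained value; that is why the thresholds are applied to the positive side only.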
All stroke-play tournaments are included, as is the stroke-play portion of events with a match-play component
(e.g. the 2019 ISPS Handa World Super 6 Perth).
For the PGA Tour, a few tournaments are included only in select years:
Reno-Tahoe (event_id=472) 2019 and later,
Zurich Classic (event_id=18) before it became a team event (2016 and earlier).
Data Dictionary
sg_categories: Only shown in JSON format. (Also listed in the Historical Raw Data Event IDs
endpoint). Value of "yes" indicates that SG category data
is available for all rounds; "partial" indicates that SG category data is only available for some rounds;
"no" indicates that SG category data is not available in any round.
traditional_stats: Only shown in JSON format. (Also listed in the Historical Raw Data Event IDs
endpoint). Value of "yes" indicates that all traditional stats (DD, DA, GIR,
Scrambling, Proximity) are available for all rounds and are derived from shot-level data (see details on specific variables below);
"partial" indicates that traditional stats derived from shot-level data are only available for some of the rounds; "basic" indicates
that only some of the traditional stats (DD, DA, GIR) are available and they are not derived from shot-level data but instead use
PGA Tour definitions (see details on specific variables below); "no" indicates that traditional stats are not available in any round.
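As a rough illustration of how the traditional_stats value might be used, the sketch below maps each value to the stat fields one could expect to find for an event. The helper name and the grouping of fields are assumptions based on the variable descriptions in this document; they are not part of the API.

```python
# Fields described in this document as available without shot-level data.
BASIC_FIELDS = ["driving_dist", "driving_acc", "gir"]
# Fields described in this document as requiring shot-level data.
SHOT_LEVEL_FIELDS = ["scrambling", "prox_rgh", "prox_fw", "great_shots", "poor_shots"]


def expected_stat_fields(traditional_stats):
    """Hypothetical helper: stat fields to look for, given traditional_stats."""
    if traditional_stats in ("yes", "partial"):
        # With "partial", these fields are present for only some rounds.
        return BASIC_FIELDS + SHOT_LEVEL_FIELDS
    if traditional_stats == "basic":
        # Only DD, DA and GIR, using PGA Tour definitions.
        return list(BASIC_FIELDS)
    if traditional_stats == "no":
        return []
    raise ValueError(f"unexpected traditional_stats value: {traditional_stats!r}")


fields_by_flag = {
    flag: expected_stat_fields(flag) for flag in ("yes", "partial", "basic", "no")
}
print(fields_by_flag["basic"])  # ['driving_dist', 'driving_acc', 'gir']
```

Note that the same field can carry a different definition depending on the value (for example driving_dist under 'yes' versus 'basic'), so mixing events with different values in one average should be done with care.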
round_score: Total score in a specific round.
sg_app, sg_arg, sg_ott, sg_putt: Strokes-gained categories. Only available at Shotlink-equipped PGA Tour events.
The values reported here are directly pulled from the PGA Tour website. In theory, each SG category should have a mean
of zero by tournament-round-course (i.e. the PGA Tour subtracts off the mean in each category). This is almost always true,
but there are a few exceptions:
if a player completes their round and then withdraws / is DQ'd, sometimes (very rarely) their data from that
round is not included in the PGA Tour's SG calculation. The most consequential example of this is Round 2
of the 2021 Arnold Palmer Invitational: Robert Gamez fired a 92 but was then DQ'd, and as a result he is not included when
the SG categories are demeaned by the PGA Tour. Therefore, the strokes-gained categories have a mean of zero when Gamez is excluded,
but not when he is included (as in our data).
2021 Olympic Golf Competition. Shot-level tracking was not administered by the PGA Tour,
and the strokes-gained category data was not demeaned.
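The first exception can be illustrated with a small made-up example. The numbers below are invented and the helper name is hypothetical; the point is only that values demeaned over a subset of the field stop averaging to zero once an omitted player is added back.

```python
from statistics import fmean


def category_mean(rows, category):
    """Mean of one SG category over the given rows (one round, one course)."""
    return fmean(row[category] for row in rows)


# Three players whose sg_putt values were demeaned among themselves.
field = [{"sg_putt": 1.0}, {"sg_putt": 0.0}, {"sg_putt": -1.0}]
# A player left out of the demeaning step but present in the data.
omitted_player = {"sg_putt": -4.0}

mean_excluding = category_mean(field, "sg_putt")
mean_including = category_mean(field + [omitted_player], "sg_putt")
print(mean_excluding, mean_including)  # 0.0 -1.0
```

Running a check like this by tournament-round-course is one way to locate the rounds affected by these exceptions.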
Ignoring the above cases, the 4 SG categories should add up to strokes-gained total (defined as the difference between a player's score and the
average score for that round and course). However, there are a small number of what appear to be mistakes on the part of the PGA Tour. One somewhat common
mistake is that when a player doesn't hit a shot around-the-green they are given a value of 0 for SG:ARG, meaning that the mean SG:ARG for the field was not subtracted off.
To fix this, you can calculate sg_arg as sg_t2g - (sg_ott + sg_app). We don't make this correction in the API data, as it's meant to be raw data. The remaining errors
(20-30 of them, all since the 2020 season) appear to be fairly idiosyncratic and the source of the discrepancy is not always clear. To see these for yourself, calculate the difference
between sg_total and the sum of sg_ott, sg_app, sg_arg, sg_putt.
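Both checks described in this paragraph can be sketched as follows. The rows are made up, the helper names and the tolerance are assumptions, and treating every sg_arg of exactly 0 as the known issue is a simplification.

```python
TOLERANCE = 1e-6  # assumed threshold for treating two values as equal


def sg_arg_from_t2g(row):
    """Implied SG:ARG, using sg_t2g = sg_ott + sg_app + sg_arg."""
    return row["sg_t2g"] - (row["sg_ott"] + row["sg_app"])


def sg_discrepancy(row):
    """sg_total minus the sum of the four SG categories."""
    category_sum = row["sg_ott"] + row["sg_app"] + row["sg_arg"] + row["sg_putt"]
    return row["sg_total"] - category_sum


# Made-up rows; the second has sg_arg reported as 0 (field mean not subtracted).
rows = [
    {"dg_id": 1, "sg_ott": 0.5, "sg_app": 1.0, "sg_arg": 0.25,
     "sg_putt": -0.75, "sg_t2g": 1.75, "sg_total": 1.0},
    {"dg_id": 2, "sg_ott": -0.5, "sg_app": 0.25, "sg_arg": 0.0,
     "sg_putt": 1.0, "sg_t2g": -0.5, "sg_total": 0.5},
]

# Rows where the categories do not add up to sg_total.
flagged = [r["dg_id"] for r in rows if abs(sg_discrepancy(r)) > TOLERANCE]

# Apply the sg_arg correction to copies, leaving the raw rows untouched.
corrected = [
    dict(r, sg_arg=sg_arg_from_t2g(r)) if r["sg_arg"] == 0 else r for r in rows
]
still_flagged = [r["dg_id"] for r in corrected if abs(sg_discrepancy(r)) > TOLERANCE]

print(flagged, still_flagged)  # [2] []
```

Rows that remain flagged after the sg_arg correction would correspond to the idiosyncratic errors mentioned above.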
There is a single instance (as of September 2021) where SG category data was not reported for a golfer in a round
that should have had Shotlink data:
Viktor Hovland at 2021 U.S. Open #2 (event_id=536) Round 1.
sg_t2g: Strokes-gained from tee to green. Defined as the sum of sg_ott, sg_app,
and sg_arg. The values here are directly pulled
from the PGA Tour website (i.e. we do not perform the calculation ourselves), which means we
can't guarantee the individual components always add up.
sg_total: Total strokes-gained. Calculated (by us) as the average score for that round and course
(courses are treated separately at events with multiple courses) minus the player's score, so positive values
indicate a better-than-average round. When strokes-gained
categories are available, this should be equal to the sum of sg_ott, sg_app, sg_arg, and sg_putt, but
for the reasons detailed above this will not quite be true in all cases.
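A minimal sketch of this calculation is shown below, assuming the usual convention that a positive sg_total means the player beat the average. The round_num field, the course numbers, and the scores are hypothetical, and the rows are assumed to come from a single event.

```python
from collections import defaultdict
from statistics import mean


def add_sg_total(rows):
    """Return copies of rows with sg_total = round-course average - round_score."""
    scores = defaultdict(list)
    for row in rows:
        scores[(row["round_num"], row["course_num"])].append(row["round_score"])
    averages = {key: mean(values) for key, values in scores.items()}
    return [
        dict(row, sg_total=averages[(row["round_num"], row["course_num"])] - row["round_score"])
        for row in rows
    ]


# Made-up round 1 of a two-course event (course numbers 901 and 902 are invented).
rows = [
    {"dg_id": 1, "round_num": 1, "course_num": 901, "round_score": 70},
    {"dg_id": 2, "round_num": 1, "course_num": 901, "round_score": 72},
    {"dg_id": 3, "round_num": 1, "course_num": 901, "round_score": 74},
    {"dg_id": 4, "round_num": 1, "course_num": 902, "round_score": 68},
    {"dg_id": 5, "round_num": 1, "course_num": 902, "round_score": 70},
]
sg_totals = {row["dg_id"]: row["sg_total"] for row in add_sg_total(rows)}
print(sg_totals)
```

Here the player who shot 70 on course 901 gains 2 strokes (the course average is 72), while the player who shot 70 on course 902 loses 1 stroke (the course average is 69).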
driving_dist: Driving distance. When shot-level data is available (traditional_stats='yes' or traditional_stats='partial'),
this equals the average distance of all drives on Par 4s and 5s (reloads included). When there is no shot-level data (traditional_stats='basic'),
this equals the average of the two measured drives (selected by the PGA Tour) in each round.
driving_acc: Driving accuracy. When shot-level data is available (traditional_stats='yes' or traditional_stats='partial'),
this equals the percentage of fairways hit counting intermediate rough, fringe and greens as fairway; reloads are not included.
When there is no shot-level data (traditional_stats='basic'),
this equals the percentage of fairways hit not counting intermediate rough as fairway.
gir: Greens in regulation. When shot-level data is available (traditional_stats='yes' or traditional_stats='partial'),
this equals the percentage of greens hit in regulation counting fringes; when there is no shot-level data (traditional_stats='basic'),
fringes in regulation are not counted as greens hit.
scrambling: Scrambling percentage. Only available at events with shot-level data (traditional_stats='yes' or traditional_stats='partial').
Equal to the percentage of around-the-green shots from 50 yards and in that were holed out in 2 strokes or fewer. Shots deemed to be chip outs are excluded.
prox_rgh: Rough proximity. Only available at events with shot-level data (traditional_stats='yes' or traditional_stats='partial').
Equal to the average proximity of all shots hit from locations other than the fairway or intermediate rough from a distance greater than 50 yards.
Shots deemed to be layups or punch outs are excluded.
prox_fw: Fairway proximity. Only available at events with shot-level data (traditional_stats='yes' or traditional_stats='partial').
Equal to the average proximity of all shots hit from the fairway (par 3 tee shots included) or intermediate rough from a distance greater than 50 yards.
Shots deemed to be layups or punch outs are excluded.
great_shots: Great shots.
Only available at events with shot-level data (traditional_stats='yes' or traditional_stats='partial').
Equal to the number of "great" shots in the round, where a great shot is defined as the top 5% of strokes-gained values in each
category. The strokes-gained cutoffs by category are OTT: 0.3, APP: 0.55, ARG: 0.55, PUTT: 0.65. SG values are adjusted at the hole-level.
poor_shots: Poor shots.
Only available at events with shot-level data (traditional_stats='yes' or traditional_stats='partial').
Equal to the number of "bad" shots in the round, where a bad shot is defined as the bottom 5% of strokes-gained values in each
category. The strokes-gained cutoffs by category are OTT: -0.4, APP: -0.5, ARG: -0.6, PUTT: -0.5. SG values are adjusted at the hole-level.
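For illustration, the great_shots and poor_shots counts could be reproduced from shot-level strokes-gained values roughly as follows. The shot list is made up, the function name is hypothetical, the values are assumed to be already adjusted at the hole level, and treating the cutoffs as inclusive is an assumption.

```python
# Cutoffs by category, as listed in the great_shots and poor_shots entries.
GREAT_CUTOFFS = {"ott": 0.3, "app": 0.55, "arg": 0.55, "putt": 0.65}
POOR_CUTOFFS = {"ott": -0.4, "app": -0.5, "arg": -0.6, "putt": -0.5}


def count_great_and_poor(shots):
    """Count great and poor shots from (category, strokes_gained) pairs."""
    # Assumption: a shot exactly at the cutoff counts toward the total.
    great = sum(1 for category, sg in shots if sg >= GREAT_CUTOFFS[category])
    poor = sum(1 for category, sg in shots if sg <= POOR_CUTOFFS[category])
    return great, poor


# Made-up shots for one round: (category, hole-adjusted strokes-gained).
shots = [
    ("ott", 0.35),    # great: at or above 0.3
    ("app", 0.10),
    ("app", -0.70),   # poor: at or below -0.5
    ("arg", 0.60),    # great: at or above 0.55
    ("putt", 0.20),
    ("putt", -0.45),
    ("putt", 0.90),   # great: at or above 0.65
]
great_shots, poor_shots = count_great_and_poor(shots)
print(great_shots, poor_shots)  # 3 1
```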
dg_id: Player ID. There is a single dg_id for each player. Please notify us if you find a player who
has multiple dg_ids. Use dg_id when performing operations by player. Any changes we make
retroactively to a player's dg_id will be posted in the changelog.
player_name: Player's name. Will not necessarily be the same for all data points for a
given player, although it should be.
Use dg_id instead of player_name when performing operations by player.
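As a small example of why dg_id is the safer key, the sketch below averages round_score by dg_id for made-up rows in which one player's name is spelled two ways. The IDs, names, and scores are invented.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows: the same player (dg_id 101) appears under two name spellings.
rows = [
    {"dg_id": 101, "player_name": "Smith, John", "round_score": 70},
    {"dg_id": 101, "player_name": "Smith, J.", "round_score": 72},
    {"dg_id": 202, "player_name": "Doe, Alex", "round_score": 68},
]

# Group scores by dg_id rather than by player_name.
scores_by_player = defaultdict(list)
for row in rows:
    scores_by_player[row["dg_id"]].append(row["round_score"])

avg_score = {dg_id: mean(scores) for dg_id, scores in scores_by_player.items()}
names_seen = len({row["player_name"] for row in rows})
print(avg_score, names_seen)
```

Grouping by player_name here would split the first player's two rounds into separate players (three names for two dg_ids).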
teetime: Player's tee time. In the case of a weather delay changing tee times, we use the updated times. Within
a tournament-round-course, teetime + start_hole identify groupings.
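A minimal sketch of recovering groupings from teetime and start_hole is given below. It assumes the rows have already been restricted to one tournament-round-course; the tee time format and the dg_id values are made up.

```python
from collections import defaultdict


def build_groupings(rows):
    """Map (teetime, start_hole) to the dg_ids that share it.

    Assumes rows all belong to the same tournament, round, and course.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row["teetime"], row["start_hole"])].append(row["dg_id"])
    return dict(groups)


# Made-up rows; the teetime string format is an assumption.
rows = [
    {"dg_id": 11, "teetime": "8:00am", "start_hole": 1},
    {"dg_id": 12, "teetime": "8:00am", "start_hole": 1},
    {"dg_id": 13, "teetime": "8:00am", "start_hole": 10},
    {"dg_id": 14, "teetime": "8:00am", "start_hole": 10},
    {"dg_id": 15, "teetime": "8:11am", "start_hole": 1},
]
groupings = build_groupings(rows)
print(groupings[("8:00am", 1)])  # [11, 12]
```

The start_hole is needed because, with split tees, two different groups can share the same tee time.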
start_hole: Player's starting hole.
event_id: Tournament ID. For PGA, KFT, SAM, and CAN tours, event_id is constant across years.
(However, note that in seasons where an event was played twice, e.g. the Masters in 2021, a new
tournament number is used for the second playing of the event). For all other tours, event_id changes by year.
Within a tour and season, or within a tour and calendar year, event_id uniquely identifies a tournament.
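Because event_id repeats across years on some tours, a tournament is best identified by a composite key. The sketch below uses made-up rows; the helper name and the format of the tour value are assumptions.

```python
def tournament_key(row):
    """Composite key that identifies one playing of a tournament."""
    return (row["tour"], row["season"], row["event_id"])


# Made-up rows: the same event_id appears in two seasons on the same tour.
rows = [
    {"tour": "PGA", "season": 2015, "event_id": 18, "dg_id": 1},
    {"tour": "PGA", "season": 2016, "event_id": 18, "dg_id": 1},
    {"tour": "PGA", "season": 2016, "event_id": 18, "dg_id": 2},
]

distinct_event_ids = {row["event_id"] for row in rows}
distinct_tournaments = {tournament_key(row) for row in rows}
print(len(distinct_event_ids), len(distinct_tournaments))  # 1 2
```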
event_name: Event name. May change by year for any tour.
course_num: Course ID. For PGA, KFT, SAM, and CAN, course_num is constant for a given course.
However, a course number may "change" if the course undergoes substantial changes.
For example, Pebble Beach Golf Links at the 2019 U.S. Open is assigned a different number
than its typical assignment for the AT&T Pro-Am. Here is the full list of PGA Tour courses with multiple IDs:
Muirfield Village (23, 893), Pebble Beach (5, 666), Quail Hollow (872, 241, 698),
Ridgewood (745, 873), Hamilton (694, 874), Liberty National (762, 886),
Bethpage Black (689, 880), Sea Island Plantation (231, 889),
TPC Four Seasons (19, 822), Chambers Bay (818, 100),
Torrey Pines South Course (4, 744),
Keene Trace (823, 884), Winged Foot (502, 891), TPC Craig Ranch (894, 921), Oak Hill Country Club (558, 514),
Silverado Resort North Course (552, 926).
If you find a PGA Tour course that has multiple course numbers and is not listed above,
please notify us.
When performing operations by course on the above-listed tours, use course_num.
For all other tours, course_num varies by course-year. For tours other than
PGA, EUR, KFT, CHA, CAN, SAM, and CHAMP, course_num is not meaningful, and therefore multi-course
events are not distinguishable from single-course events.
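If an analysis needs to treat the multiple IDs of one PGA Tour course as a single course, a lookup along the following lines is one option. This is only a sketch: the helper name is hypothetical, just two of the courses listed above are included, and choosing the smallest ID as the canonical one is arbitrary. Since separate IDs can reflect substantial changes to a course, merging them may not suit every analysis.

```python
# Groups of course_num values listed in this document as the same course.
# Only two of the listed courses are included here, for brevity.
SAME_COURSE = [
    (5, 666),   # Pebble Beach
    (4, 744),   # Torrey Pines South Course
]

# Map every listed ID to the smallest ID in its group (an arbitrary choice).
CANONICAL = {num: min(group) for group in SAME_COURSE for num in group}


def canonical_course_num(course_num):
    """Return a single ID per listed course; unlisted IDs pass through unchanged."""
    return CANONICAL.get(course_num, course_num)


print(canonical_course_num(666), canonical_course_num(744), canonical_course_num(23))  # 5 4 23
```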
course_name: Course name. For the European Tour (EUR) and Challenge Tour (CHA),
we have made course_name constant for a given course (i.e. spelling, naming convention is identical across years
and tours).
Use course_name when performing operations by course on the European and/or Challenge tours.
For all other tours, the course_name variable may not be constant for a given course. If you find a
course that has different values for course_name (on EUR or CHA), please notify us.
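The guidance on which variable to use per tour can be summarized in a small helper. This is a hedged sketch: the function name, the format of the tour value, and the decision to keep the tour inside the course_num key are all assumptions, and the rows are made up.

```python
# Tours where course_num is described as constant for a given course.
COURSE_NUM_TOURS = {"PGA", "KFT", "SAM", "CAN"}
# Tours where course_name is described as constant for a given course.
COURSE_NAME_TOURS = {"EUR", "CHA"}


def course_key(row):
    """Return a key for grouping by course, or None if no stable key exists."""
    tour = row["tour"]
    if tour in COURSE_NUM_TOURS:
        # Assumption: keep the tour in the key rather than sharing IDs across tours.
        return (tour, row["course_num"])
    if tour in COURSE_NAME_TOURS:
        # course_name is described as identical across EUR and CHA.
        return ("EUR/CHA", row["course_name"])
    return None


# Made-up rows; the course names and numbers are invented.
rows = [
    {"tour": "PGA", "course_num": 5, "course_name": "Example Links"},
    {"tour": "EUR", "course_num": 1234, "course_name": "Example Park"},
    {"tour": "CHA", "course_num": 5678, "course_name": "Example Park"},
    {"tour": "LIV", "course_num": 1, "course_name": "Example Club"},
]
keys = [course_key(row) for row in rows]
print(keys)
```

In this example the EUR and CHA rows share a key despite having different course_num values, and the last row gets no key because neither variable is described as stable for that tour.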
course_par: Course par. Available for PGA, EUR, KFT, CHA, CAN, SAM, CHAMP, and LIV tours.
fin_text: Official finishing position.
season: Official season as defined by each tour.
event_completed: Official date of the final round of the tournament (e.g. if the event is delayed 1 day
to a Monday, this date will still be that of the Sunday).
tour: Professional tour that the event was played on.