Raw Data Archive Notes & Updates
API Changelog
2022-11-01
The course_name variable is now consistent across the European and Challenge Tours.
2022-10-10
Fixed a bug affecting the traditional stats (DD, DA, GIR, proximity, scrambling) for the first two events of the 2023 PGA Tour season.
2022-08-03
Fixed a few bugs in The Open Championship strokes-gained data.
2022-06-19
Added course par variable. Available for pga, eur, kft, cha, can, sam, champ, and liv tours.
2022-04-18
Replaced SG category data in Round 1 of the 2022 Valero Texas Open with our own SG calculated from the shot-level data. There were mistakes in the PGA Tour's data for this round; we will put it back if/when they fix the issue.
2022-02-08
Added 6 new variables to PGA Tour data (when available): driving distance, driving accuracy, GIR, scrambling percentage, fairway proximity, and rough proximity.
2021-11-22
Changed course_name variable for European Tour events involving Randpark GC (event_id = 2020103, 2020006, 2019004, 2017098) to correct an inconsistency between the events involving a single course and events with multiple courses.
2021-10-29
Added 2021 Masters strokes-gained category data. These SG numbers are generated using our own baseline functions (but are designed to be similar to what the PGA Tour uses).
2021-08-16
Updated the filtering method for WDs and DQs. Affects all events before the tour-specific date cutoffs listed in the general notes below.
2021-08-09
Corrected course_name at 2021 ISPS HANDA World Invitational (event_id=2021124) to account for multiple courses. This was a bug on our end.
Notes / Comments
General Notes
Only completed rounds are included in the data, with a few exceptions:
Round 1 of the 2020 PLAYERS Championship. Four players failed to finish their final hole of the first round before the event was cancelled; we assigned these players the most likely score given their position on the hole.
Rounds 1 and 2 of the 2022 Joburg Open (event_id=2022100). Due to the sudden travel restrictions imposed on South Africa, 6 golfers withdrew after playing at least 15 holes in their last round. We assigned these players the most likely score on their remaining holes.
The only tricky cases for determining round completion from our primary data sources are withdrawals and disqualifications. It is not easy to identify, in an automated fashion, whether a round that resulted in a WD/DQ was completed. Currently we manually vet the data update each week from the PGA, EUR, CHA, CAN, SAM, and CHAMP tours to ensure all completed rounds are included and all incomplete rounds are dropped. Listed below are the time periods (inclusive of the listed date) for which we did not rely on an algorithm to filter WD/DQ rounds:
PGA: full time period
EUR: 2020-10-11 to present
KFT: 2020-10-11 to present
CHA: 2020-11-14 to present
CAN: 2021-06-26 to present
SAM: 2020-10-09 to present
CHAMP: 2020-08-21 to present
For rounds not included in these date ranges or from a tour not listed above, we apply a simple algorithm to label each round as complete/incomplete. If a player's tournament ends with a WD or DQ, only their last round played will be considered as a potentially incomplete round. We then look at their strokes-gained for the day (or, more accurately, the strokes-gained implied by their listed round score), and apply some basic filters: if they withdraw after rounds 2-4 and gain more than 2 strokes on the field in their final round, we drop the round; if they withdraw after round 1 and gain positive strokes on the field, we drop the round. This is a pretty conservative filter, as we feel the cost of including an incomplete round is higher than omitting a complete one.
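As a rough illustration, the date cutoffs and the filter described above could be sketched in Python as follows. The function names, the argument layout, the lowercase tour codes used as dictionary keys, and the choice of which date to compare against the cutoff are all assumptions made for this example; only the dates and the strokes-gained thresholds come from the text above.

```python
from datetime import date

# First date (inclusive) from which rounds were vetted manually, per the list
# above. None means the full time period was vetted manually.
# The lowercase tour codes are assumed keys, not confirmed API values.
MANUAL_VETTING_START = {
    "pga": None,
    "eur": date(2020, 10, 11),
    "kft": date(2020, 10, 11),
    "cha": date(2020, 11, 14),
    "can": date(2021, 6, 26),
    "sam": date(2020, 10, 9),
    "champ": date(2020, 8, 21),
}


def uses_algorithm(tour, event_date):
    """Return True if a round falls outside the manually vetted periods."""
    if tour not in MANUAL_VETTING_START:
        return True  # tours not listed above always go through the algorithm
    start = MANUAL_VETTING_START[tour]
    return start is not None and event_date < start


def drop_as_incomplete(ended_wd_dq, is_last_round_played, round_num, implied_sg):
    """Apply the strokes-gained filter to a single round.

    implied_sg is the strokes gained on the field implied by the listed score.
    """
    if not (ended_wd_dq and is_last_round_played):
        return False  # only the last round played before a WD/DQ is a candidate
    if round_num == 1:
        return implied_sg > 0  # any positive strokes gained: drop
    return implied_sg > 2  # rounds 2-4: drop only above 2 strokes gained
```

For instance, a final round 3 with an implied strokes-gained of 1.5 would be kept, while a final round 1 with an implied strokes-gained of 0.5 would be dropped.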
All stroke-play tournaments are included (or the stroke-play portion of events with a Match Play component, e.g. 2019 ISPS Handa World Super 6 Perth). For the PGA Tour, a few tournaments are included only in select years: Reno-Tahoe (event_id=472) 2019 and later, Zurich Classic (event_id=18) before it became a team event (2016 and earlier).
Data Dictionary
sg_categories: Only shown in JSON format. (Also listed in the Historical Raw Data Event IDs endpoint). Value of "yes" indicates that SG category data is available for all rounds; "partial" indicates that SG category data is only available for some rounds; "no" indicates that SG category data is not available in any round.
traditional_stats: Only shown in JSON format. (Also listed in the Historical Raw Data Event IDs endpoint). Value of "yes" indicates that all traditional stats (DD, DA, GIR, Scrambling, Proximity) are available for all rounds and are derived from shot-level data (see details on specific variables below); "partial" indicates that traditional stats derived from shot-level data are only available for some of the rounds; "basic" indicates that only some of the traditional stats (DD, DA, GIR) are available and they are not derived from shot-level data but instead use PGA Tour definitions (see details on specific variables below); "no" indicates that traditional stats are not available in any round.
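One way to act on the traditional_stats flag when reading the data is sketched below. The helper name and its return shape are assumptions for illustration; the field names are those defined later in this dictionary.

```python
# Fields derived from shot-level data (flag "yes" or "partial").
SHOT_LEVEL_STATS = (
    "driving_dist", "driving_acc", "gir", "scrambling", "prox_rgh", "prox_fw",
)
# Fields available under PGA Tour definitions only (flag "basic").
BASIC_STATS = ("driving_dist", "driving_acc", "gir")


def traditional_stat_fields(flag):
    """Return (fields that may be populated, whether they are shot-level derived).

    With "partial", the fields are populated for only some of the rounds.
    """
    if flag in ("yes", "partial"):
        return SHOT_LEVEL_STATS, True
    if flag == "basic":
        return BASIC_STATS, False
    if flag == "no":
        return (), False
    raise ValueError(f"unexpected traditional_stats value: {flag!r}")
```

Keeping the shot-level indicator alongside the field list matters because driving_dist, driving_acc, and gir are defined differently in the two cases.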
round_score: Total score in a specific round.
sg_app, sg_arg, sg_ott, sg_putt: Strokes-gained categories. Only available at ShotLink-equipped PGA Tour events. The values reported here are pulled directly from the PGA Tour website. In theory, each SG category should have a mean of zero by tournament-round-course (i.e. the PGA Tour subtracts off the mean in each category). This is almost always true, but there are a few exceptions:
If a player completes their round and then withdraws or is DQ'd, sometimes (very rarely) their data from that round is not included in the PGA Tour's SG calculation. The most consequential example of this is Round 2 of the 2021 Arnold Palmer Invitational: Robert Gamez fired a 92 but was then DQ'd, and as a result he is not included when the SG categories are demeaned by the PGA Tour. Therefore, the strokes-gained categories have a mean of zero excluding Gamez but won't when he is included (as in our data).
2021 Olympic Golf Competition. Shot-level tracking was not administered by the PGA Tour, and the strokes-gained category data was not demeaned.
Ignoring the above cases, the 4 SG categories should add up to strokes-gained total (defined as the difference between a player's score and the average score for that round and course). However, there are a small number of what appear to be mistakes on the part of the PGA Tour. One somewhat common mistake is that when a player doesn't hit a shot around-the-green they are given a value of 0 for SG:ARG, meaning that the mean SG:ARG for the field was not subtracted off. To fix this you can calculate sg_arg as the difference between sg_t2g and sg_ott+sg_app. We don't make this correction in the API data, as it's meant to be raw data. The remaining errors (20-30 of them, all since the 2020 season) appear to be fairly idiosyncratic and the source of the discrepancy is not always clear. To see these for yourself, calculate the difference between sg_total and the sum of sg_ott, sg_app, sg_arg, sg_putt.
There is a single instance (as of September 2021) where SG category data was not reported for a golfer in a round that should have had ShotLink data:
Viktor Hovland at 2021 U.S. Open #2 (event_id=536) Round 1.
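The two checks suggested above for the SG categories (recomputing sg_arg from sg_t2g, and comparing sg_total with the sum of the four categories) might look like this for a single round stored as a dictionary. The function names and the sample values are made up for illustration.

```python
def implied_sg_arg(row):
    """sg_arg implied by the other tee-to-green components."""
    return row["sg_t2g"] - (row["sg_ott"] + row["sg_app"])


def category_gap(row):
    """Difference between sg_total and the sum of the four SG categories."""
    parts = row["sg_ott"] + row["sg_app"] + row["sg_arg"] + row["sg_putt"]
    return row["sg_total"] - parts


# Hypothetical round where sg_arg was left at 0 without being demeaned.
round_row = {
    "sg_ott": 0.5, "sg_app": 1.0, "sg_arg": 0.0, "sg_putt": -0.25,
    "sg_t2g": 1.25, "sg_total": 1.0,
}

gap_before = category_gap(round_row)                      # -0.25
fixed_row = {**round_row, "sg_arg": implied_sg_arg(round_row)}
gap_after = category_gap(fixed_row)                       # 0.0
```

With real data, compare the gap against a small tolerance rather than exactly zero, since the published values are rounded.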
sg_t2g: Strokes-gained from tee to green. Defined as the sum of sg_ott, sg_app, and sg_arg. The values here are directly pulled from the PGA Tour website (i.e. we do not perform the calculation ourselves), which means we can't guarantee the individual components always add up.
sg_total: Total strokes-gained. Calculated (by us) as the difference between a player's score and the average score for that round and course (for events with multiple courses). When strokes-gained categories are available, this should be equal to the sum of sg_ott, sg_app, sg_arg, and sg_putt, but for the reasons detailed above this will not quite be true in all cases.
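A minimal sketch of the sg_total calculation, assuming rounds are held as dictionaries. The round_num key is a hypothetical name for the round number, and the sign follows the usual strokes-gained convention that beating the field average gives a positive value.

```python
from collections import defaultdict


def group_key(row):
    """Identify one tournament-round-course group."""
    return (row["tour"], row["year"], row["event_id"],
            row["round_num"], row["course_num"])


def with_sg_total(rows):
    """Return copies of rows with sg_total = field average score - round_score."""
    scores = defaultdict(list)
    for row in rows:
        scores[group_key(row)].append(row["round_score"])
    field_avg = {key: sum(vals) / len(vals) for key, vals in scores.items()}
    return [
        {**row, "sg_total": field_avg[group_key(row)] - row["round_score"]}
        for row in rows
    ]


# Hypothetical two-course event: the averages are taken per course.
base = {"tour": "pga", "year": 2021, "event_id": 5, "round_num": 1}
sample = [
    {**base, "course_num": 5, "round_score": 68},
    {**base, "course_num": 5, "round_score": 70},
    {**base, "course_num": 5, "round_score": 72},
    {**base, "course_num": 205, "round_score": 71},
]
result = with_sg_total(sample)
```

Grouping on course_num as well as the round is what handles events played over multiple courses.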
driving_dist: Driving distance. When shot-level data is available (traditional_stats='yes' or traditional_stats='partial'), this equals the average distance of all drives on Par 4s and 5s (reloads included). When there is no shot-level data (traditional_stats='basic'), this equals the average of the two measured drives (selected by the PGA Tour) in each round.
driving_acc: Driving accuracy. When shot-level data is available (traditional_stats='yes' or traditional_stats='partial'), this equals the percentage of fairways hit counting intermediate rough, fringe and greens as fairway; reloads are not included. When there is no shot-level data (traditional_stats='basic'), this equals the percentage of fairways hit not counting intermediate rough as fairway.
gir: Greens in regulation. When shot-level data is available (traditional_stats='yes' or traditional_stats='partial'), this equals the percentage of greens hit in regulation counting fringes; when there is no shot-level data (traditional_stats='basic'), fringes in regulation are not counted as greens hit.
scrambling: Scrambling percentage. Only available at events with shot-level data (traditional_stats='yes' or traditional_stats='partial'). Equal to the percentage of around-the-green shots from 50 yards and in that were holed out in 2 strokes or fewer. Shots deemed to be chip outs are excluded.
prox_rgh: Rough proximity. Only available at events with shot-level data (traditional_stats='yes' or traditional_stats='partial'). Equal to the average proximity of all shots hit from locations other than the fairway or intermediate rough from a distance greater than 50 yards. Shots deemed to be layups or punch outs are excluded.
prox_fw: Fairway proximity. Only available at events with shot-level data (traditional_stats='yes' or traditional_stats='partial'). Equal to the average proximity of all shots hit from the fairway (par 3 tee shots included) or intermediate rough from a distance greater than 50 yards. Shots deemed to be layups or punch outs are excluded.
dg_id: Player ID. There is a single dg_id for each player. Please notify us if you find a player that has multiple dg_ids. Use dg_id when performing operations by player. Any changes we make retroactively to a player's dg_id will be posted in the changelog.
player_name: Player's name. Will not necessarily be the same for all data points for a given player, although it should be. Use dg_id instead of player_name when performing operations by player.
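To illustrate why dg_id is the safer key, the sketch below groups rounds by dg_id and lists any dg_id that appears under more than one player_name. The function names, IDs, and names are hypothetical.

```python
from collections import defaultdict


def rounds_by_player(rows):
    """Group rounds by dg_id rather than by player_name."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["dg_id"]].append(row)
    return dict(grouped)


def name_variants(rows):
    """Map each dg_id that has more than one spelling to its sorted names."""
    names = defaultdict(set)
    for row in rows:
        names[row["dg_id"]].add(row["player_name"])
    return {dg_id: sorted(n) for dg_id, n in names.items() if len(n) > 1}


# Hypothetical rows: the same player appears under two spellings.
sample = [
    {"dg_id": 1, "player_name": "Smith, John", "round_score": 70},
    {"dg_id": 1, "player_name": "Smith, J.", "round_score": 68},
    {"dg_id": 2, "player_name": "Jones, Alex", "round_score": 71},
]
grouped = rounds_by_player(sample)
variants = name_variants(sample)
```

Grouping on player_name here would split the first player's rounds in two.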
event_id: Tournament ID. For PGA, KFT, SAM, and CAN tours, event_id is constant across years. (However, note that in seasons where an event was played twice, e.g. the Masters in 2021, a new tournament number is used for the second playing of the event). For all other tours, event_id changes by year. Within a tour and season, or within a tour and calendar year, event_id uniquely identifies a tournament.
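The uniqueness rule above suggests a composite key when combining tours or years. The following is a sketch only: the function names are invented, and the lowercase tour codes are assumed values of the tour field.

```python
# Tours where event_id is constant across years, per the description above.
STABLE_EVENT_ID_TOURS = {"pga", "kft", "sam", "can"}


def event_key(row):
    """Key that identifies one tournament: tour, calendar year, event_id."""
    return (row["tour"], row["year"], row["event_id"])


def same_event_across_years(a, b):
    """True if two rows share an event_id on a tour where it is stable.

    This can miss matches: a second playing within a season gets a new number.
    """
    return (
        a["tour"] == b["tour"]
        and a["tour"] in STABLE_EVENT_ID_TOURS
        and a["event_id"] == b["event_id"]
    )


# event_id=18 is the Zurich Classic on the PGA Tour, as noted earlier.
zurich_2015 = {"tour": "pga", "year": 2015, "event_id": 18}
zurich_2016 = {"tour": "pga", "year": 2016, "event_id": 18}
```

Here the two rows have different event keys but are recognised as the same tournament in different years.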
event_name: Event name. May change by year for any tour.
course_num: Course ID. For PGA, KFT, SAM, and CAN, course_num is constant for a given course. However, a course number may "change" if the course undergoes substantial changes. For example, Pebble Beach Golf Links at the 2019 U.S. Open is assigned a different number than its typical assignment for the AT&T Pro-Am. Here is the full list of PGA Tour courses with multiple IDs:
Muirfield Village (23, 893), Pebble Beach (5, 666), Quail Hollow (872, 241, 698), Ridgewood (745, 873), Hamilton (694, 874), Liberty National (762, 886), Bethpage Black (689, 880), Sea Island Plantation (231, 889), TPC Four Seasons (19, 822), Chambers Bay (818, 100), Torrey Pines South Course (4, 744), Keene Trace (823, 884), Winged Foot (502, 891).
If you find a PGA Tour course that has multiple course numbers and is not listed above, please notify us.
When performing operations by course on the above-listed tours, use course_num. For all other tours, course_num varies by course-year. For tours other than PGA, EUR, KFT, CHA, CAN, SAM, and CHAMP, course_num is not meaningful, and therefore multi-course events are not distinguishable from single-course events.
course_name: Course name. For the European Tour (EUR) and Challenge Tour (CHA), we have made course_name constant for a given course (i.e. spelling, naming convention is identical across years and tours). Use course_name when performing operations by course on the European and/or Challenge tours. For all other tours, the course_name variable may not be constant for a given course. If you find a course that has different values for course_name (on EUR or CHA), please notify us.
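The guidance on course_num and course_name could be combined into a single grouping key along these lines. The helper name, the key layout, the lowercase tour codes, and the sample course name are assumptions for illustration.

```python
# Tours where course_num is constant for a given course.
COURSE_NUM_TOURS = {"pga", "kft", "sam", "can"}
# Tours where course_name is constant across years and across both tours.
COURSE_NAME_TOURS = {"eur", "cha"}


def course_key(row):
    """Return a key for grouping by course, or None if no field is reliable."""
    tour = row["tour"]
    if tour in COURSE_NUM_TOURS:
        return (tour, "course_num", row["course_num"])
    if tour in COURSE_NAME_TOURS:
        # Shared prefix so the same course matches across EUR and CHA.
        return ("eur/cha", "course_name", row["course_name"])
    return None


# Hypothetical rows: the course name is invented for the example.
eur_row = {"tour": "eur", "course_num": 101, "course_name": "Example GC"}
cha_row = {"tour": "cha", "course_num": 202, "course_name": "Example GC"}
pga_row = {"tour": "pga", "course_num": 5, "course_name": "Pebble Beach"}
```

Courses listed above with multiple course numbers would still produce separate keys under this sketch; whether to merge them is left to the analyst.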
course_par: Course par. Available for PGA, EUR, KFT, CHA, CAN, SAM, CHAMP, and LIV tours.
fin_text: Official finishing position.
season: Official season as defined by each tour.
year: Calendar year.
event_completed: Official date of the final round of the tournament (e.g. if the event is delayed 1 day to a Monday, this date will still be that of the Sunday).
tour: Professional tour that the event was played on.