Evaluating the MLB Draft: Creating a Model
The MLB Draft, first held in 1965, lets teams add talent to their organizations by selecting players. The Oakland Athletics' 2002 draft became known as the Moneyball Draft because the organization tried to use statistical analysis to select the best possible players under a budget constraint, deeming Nick Swisher the best player in the draft.
A drafted player enters the team's minor league system, and once that player is added to the 40-man roster, the club has control for six full seasons. Because teams have an incentive to hold a player in the minor leagues long enough to gain an extra season before free agency, the metric used here is WAR_7: the average WAR (Fangraphs version) over the first seven seasons of a player's career. A player who does not reach the Major Leagues is assigned a WAR_7 of -2.5, following the research of Nate Werner of the now-defunct The Point of Pittsburgh. Ideally, data would be available on the highest level each player reached, since a Triple-A player would be closer to replacement level and a Low-A player would be closer to -5 wins given the distance from the Major Leagues.
The draft data comes from Baseball-Reference’s draft tracker and was collected with Chris Long’s web scraper, written in Ruby. This study covers drafts from 2000 through 2015, which allows time for players to reach MLB. Average debut time was calculated by school type (high school, junior college, or college) and position type (pitcher or position player); both inputs are also included in the eventual model. The average time until debut (rounded to the nearest full year) for those who made the Major Leagues is:
|School Type||Pitcher||Position Player|
|High School||5 years||5 years|
|Junior College||5 years||5 years|
|College||4 years||4 years|
Only players who signed are included, which simplifies the model. To estimate the value a player produces, an expected WAR, called WARE_7, must first be calculated. It comes from an OLS regression of the form WAR_7 = log(Overall Pick) + Position + School, whose fitted values are the WARE_7 of each player drafted from 2000 through 2015. Surplus WAR, sWAR_7, is the WAR_7 a player produces in excess of that expectation (a player with a WAR_7 of 3.0 and a WARE_7 of 1.5 has an sWAR_7 of 1.5).
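As a rough illustration of the regression described above, here is a minimal sketch in Python using NumPy. The toy data, the dummy-variable encoding (high school as the baseline), and all variable names are assumptions made for the example; they are not the data or code behind the actual model.

```python
import numpy as np

# Hypothetical toy draft data (not real players).
picks = np.array([1, 5, 20, 60, 150, 400, 10, 35, 90, 250, 700, 1000], dtype=float)
is_pitcher = np.array([0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1], dtype=float)
school = np.array(["HS", "College", "College", "HS", "JC", "College",
                   "HS", "JC", "College", "HS", "College", "JC"])
# Realized WAR_7; players who never debuted get -2.5, as in the text.
war_7 = np.array([3.1, 1.2, 0.4, -2.5, -2.5, -2.5,
                  2.0, -2.5, 0.1, -2.5, -2.5, -2.5])

# Design matrix: intercept, log(pick), pitcher dummy, school dummies.
X = np.column_stack([
    np.ones_like(picks),
    np.log(picks),
    is_pitcher,
    (school == "JC").astype(float),
    (school == "College").astype(float),
])

# Ordinary least squares fit.
beta, *_ = np.linalg.lstsq(X, war_7, rcond=None)

ware_7 = X @ beta          # expected WAR (fitted values)
s_war_7 = war_7 - ware_7   # surplus WAR (residuals)

for pick, expected, surplus in zip(picks, ware_7, s_war_7):
    print(f"pick {int(pick):4d}: WARE_7 = {expected:5.2f}, sWAR_7 = {surplus:5.2f}")
```

Because the fit includes an intercept, the sWAR_7 values sum to zero across the sample, so a positive sWAR_7 always means a player beat the average outcome for comparable picks.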
The goal is to express value in financial terms, which requires the dollar/win framework: the amount one win is worth on the open market. All dollar values from this point on are in millions. Because the dollar/win value changes over time, its net present value (NPV) is calculated over the seven years a player is in the Major Leagues. If a player debuts in 2005, their average dollar/win is the NPV of the dollar/win for the years 2005-2011. For players with fewer than seven seasons since their debut, the dollar/win is the NPV through 2019. The discount rate comes from the change in dollar/win between 2000 and 2019, which works out to 7.7 percent per year. Those values are in the table below:
|Year||NPV Dollar/Win||Year||NPV Dollar/Win|
Dollar/Win estimated from Fangraphs
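The discounting described above can be sketched as follows. This is only one plausible reading of the method: it assumes the 7.7 percent figure is a compound annual rate, that each season's dollar/win is discounted back to the debut year, and that the discounted values are averaged over the window. The dollar/win series is hypothetical, and the function names are invented for the example.

```python
def annual_rate(start_value, end_value, years):
    """Compound annual rate of change between two values (assumed interpretation)."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


def npv_dollar_per_win(dollar_per_win, debut_year, rate, window=7, last_year=2019):
    """Average discounted dollar/win over a player's first `window` seasons.

    The window is truncated at `last_year` for recent debuts.
    """
    end_year = min(debut_year + window - 1, last_year)
    discounted = [
        dollar_per_win[year] / (1.0 + rate) ** (year - debut_year)
        for year in range(debut_year, end_year + 1)
    ]
    return sum(discounted) / len(discounted)


# Hypothetical dollar/win series (in millions) growing at exactly 7.7 percent a year.
rate = 0.077
dollar_per_win = {year: 5.0 * (1.0 + rate) ** (year - 2005) for year in range(2005, 2020)}

print(annual_rate(dollar_per_win[2005], dollar_per_win[2019], 14))
print(npv_dollar_per_win(dollar_per_win, 2005, rate))   # full seven-year window
print(npv_dollar_per_win(dollar_per_win, 2016, rate))   # truncated at 2019
```

In this toy series the dollar/win grows at exactly the discount rate, so the discounted average collapses to the debut-year value; with real data the two would differ from year to year.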
Surplus Value (SV) is then simply WAR_7 * Dollar/Win, a measure of the dollar value a player generates. Since a team could have taken a different player, a comparison is needed: the Expected Surplus Value, or ESV. ESV is estimated with a loess model, which weights the surplus values generated at nearby picks. A player drafted 10th overall could have been drafted eighth, ninth, eleventh, twelfth, etc., and since there is asymmetric information across draft boards, weighting these points matters. The model is SV = log(Overall Pick), and its fitted values are the ESV for each player drafted. Surplus Value Over Expectation (SVOE) is then SV - ESV, the estimated value a player has contributed to the team over the expected player taken at that draft position.
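To make the SV, ESV, and SVOE steps concrete, here is a small sketch of a loess-style smoother: a local linear regression with tricube weights, fitted on log(Overall Pick). It is a simplified stand-in, not necessarily the loess implementation or span used for the site, and the picks, WAR_7 values, and flat dollar/win figure are hypothetical.

```python
import numpy as np


def loess(x, y, frac=0.5):
    """Local linear regression with tricube weights, evaluated at each x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = max(3, int(np.ceil(frac * n)))  # neighbours used in each local fit
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        # Bandwidth: distance to the k-th nearest point (guarded against zero).
        h = max(np.sort(dist)[k - 1], 1e-12)
        w = np.clip(1.0 - (dist / h) ** 3, 0.0, None) ** 3  # tricube weights
        # Weighted least squares for a local intercept and slope.
        A = np.column_stack([np.ones(n), x - x[i]])
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        fitted[i] = coef[0]  # local intercept is the fit at x[i]
    return fitted


# Hypothetical players: overall pick, WAR_7, and NPV dollar/win (millions).
picks = np.array([1, 2, 3, 5, 8, 12, 20, 35, 60, 100, 200, 400], dtype=float)
war_7 = np.array([3.7, -2.5, 1.0, 0.5, -2.5, 2.1, -2.5, -2.5, 0.3, -2.5, -2.5, -2.5])
dollar_per_win = np.full_like(picks, 7.0)

x_log = np.log(picks)
sv = war_7 * dollar_per_win   # Surplus Value
esv = loess(x_log, sv)        # Expected Surplus Value at each pick
svoe = sv - esv               # Surplus Value Over Expectation

for pick, s, e, d in zip(picks, sv, esv, svoe):
    print(f"pick {int(pick):3d}: SV = {s:7.2f}, ESV = {e:7.2f}, SVOE = {d:7.2f}")
```

The `frac` parameter controls how many neighbouring picks inform each local fit: a larger value gives a smoother ESV curve, while a smaller value lets the curve follow individual picks more closely.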
As an example, consider the Astros' last two number one overall picks who signed: Carlos Correa in 2012 and Mark Appel in 2013. The table below illustrates the metrics above:
Carlos Correa, a high school position player, was expected to provide the Astros with 1.06 WAR per season, worth $7.54 million annually to the club, over his first seven years. Mark Appel, a college pitcher, was expected to provide 1.19 wins per season over his first seven years in MLB.
Correa has ended up producing an average of 3.70 wins per season over his first five years after debuting in 2015, and helped the Astros win the 2017 World Series, hitting .288/.325/.562 in the playoffs. Appel never made the Major Leagues and has since retired from baseball. Correa has produced $17.75 million on average for the Astros, while Appel has cost the Astros $45.58 million a year relative to what the expected player would provide, though Appel was eventually traded to the Philadelphia Phillies for Ken Giles.
All estimates for players can be found in the MLB Draft tab of the site. A quick explanation of the model and its variables precedes the results. The results feature two tabs: a player tab with the draft value of each player, and a team tab that sums the results for each draft class. Teams produce negative WAR_7 in every year simply because of the volume of players who don’t reach MLB. To assess how well a team drafted in a given class, the sWAR_7 and SVOE columns are the most important.