
Using Projection Models for 2015 Fantasy Baseball Drafts

If you’ve ever participated in a fantasy draft of any kind, you’re familiar with the concept of projections. Projections are (almost) as simple as they sound: what do you project a given player to accomplish, based on his past accomplishments? Projections are not restricted to the realm of fantasy sports, however; teams use projections as well, to assist them in player valuation. In this post I’ll explain how you can use projections for player valuation in your own fantasy baseball league, using a model based on either Standing Points Gained above replacement or Fantasy Points above replacement, depending upon the scoring system of the league [1]. In addition, I’ll be posting the R code used to run the models, as well as Draft Cheat Sheets containing relevant draft info from many of the sources we searched.

5X5Roto10TeamDraftsheet

5x5Roto12TeamDraftsheet

Points10TeamDraftsheet

Points12TeamDraftsheet

Special Request (AL-only 4×4 10 Team no R or K): 4x4ALonly10Team

Update (3/21/2015): The projection database and draftsheets have all been updated, and the R code used to calculate TOTspgAR and FPtsAR has been posted. Enjoy.

I’ve compiled a database of as many publicly available projection systems as I could find (ESPN, Yahoo, CBS, Steamer, Razzball, Zips, Clay Davenport’s and TG), and used this database to produce my own projection system, which I’ll refer to as the Duffy model (you’re right, I suck at naming things) [2-9]. I will have a later piece outlining how the various projection systems go about their business (tentatively titled “99 Projections, But a Glitch in Each One?”). For now, though, I’ll say that the Duffy model takes the average of all the models’ rate statistics (H-rate, BB-rate, etc.) for each player, excluding TG as noted below, and projects them over the average number of plate appearances (or innings pitched, for pitchers) predicted by the models [10]. In addition, because the Zips model makes no attempt to project playing time in the major leagues, we modified it by applying its rate-based statistics over the average projected playing time from the other models, again excluding the TG model for reasons explained below [11].
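The averaging step can be sketched in Python as follows. This is a minimal illustration under assumptions, not the posted R code: the per-source layout (a dict of projected playing time and counting stats for one player), the function name `duffy_projection`, the stat keys and the example numbers are all hypothetical. TG is dropped entirely, and Zips supplies rates but not playing time; for pitchers, pass `time_key="IP"`.

```python
from statistics import mean


def duffy_projection(projections, stats=("H", "BB", "HR"), time_key="PA",
                     exclude=("TG",), rates_only=("Zips",)):
    """Average per-PA (or per-IP) rates across sources, then scale by mean playing time.

    projections: {source: {time_key: ..., stat: ...}} for a single player
    (hypothetical layout). Sources in `exclude` are ignored entirely; sources
    in `rates_only` contribute rates but not playing time (the ModZips idea).
    """
    used = {src: p for src, p in projections.items() if src not in exclude}
    # Playing time: mean over sources that actually project MLB playing time.
    playing_time = mean(p[time_key] for src, p in used.items()
                        if src not in rates_only)
    projected = {time_key: playing_time}
    for stat in stats:
        # Each source's rate is its own stat divided by its own playing time.
        rate = mean(p[stat] / p[time_key] for p in used.values())
        projected[stat] = rate * playing_time
    return projected


# Hypothetical projections for one player (not real 2015 numbers).
example = {
    "Steamer": {"PA": 600, "H": 150, "BB": 60, "HR": 20},
    "CBS": {"PA": 620, "H": 155, "BB": 62, "HR": 22},
    "Zips": {"PA": 500, "H": 130, "BB": 45, "HR": 15},
    "TG": {"PA": 300, "H": 70, "BB": 20, "HR": 8},
}
print(duffy_projection(example))
```

Averaging rates rather than raw counts keeps a source with unusual playing-time assumptions from dragging every counting stat along with it.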

The Yahoo dataset used here is significantly smaller than the other datasets (approximately 300 players vs 900 players). The players it projects are the top players, i.e. the ones receiving the most plate appearances, which accounts for its skewed distribution shown below. This is a selection bias caused by my refusal to manually scrape projections for 1000 hitters and pitchers, though the supremely thorough among you might call it laziness (if you know how to programmatically scrape Yahoo fantasy sports, please let me know; I’m pretty sure I’m just running into the Great Pay Wall of RotoWire). On another note, the TG model has a curiously flat density function while every other model is clearly bimodal, so I’m somewhat skeptical of it. For instance, TG projects Kolten Wong for a mere 303 PA, while every other model projects him for over 500 PA [12]. The Duffy model closely matches the other relatively large datasets, Steamer and ModZips (the modified Zips projections), in terms of projected plate appearances as well as the other projected statistics. The raw Duffy model projections used here to generate fantasy baseball values are also attached. The density functions of projected plate appearances for each model are plotted below in Figure 1, where the Duffy model is shown with increased line width (if you know how to display this increased line width in the legend using the ggplot2 R package, please let me know!).


Figure 1: Projected PA Population Density by Model

Standing Points Gained – Theory

One of the most common fantasy baseball formats is rotisserie, where each team is scored in a number of hitting and pitching categories, often 5 each [13]. The team with the highest total (or lowest, for ERA and WHIP) in each category receives the maximum number of Standing Points, equal to the number of teams in the league, while the team with the lowest total in each category receives 1 Standing Point. Each team is then ranked by the sum of its Standing Points across all stat categories. The question in player valuation then becomes: how valuable is a HR vs a SB? A SV vs a W? Or even ERA vs a W?
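The scoring rule just described can be sketched in a few lines of Python. Team names and totals here are hypothetical, and splitting points among tied teams by averaging their ranks is an assumed convention; leagues handle ties differently.

```python
def standings_points(team_totals, lower_is_better=("ERA", "WHIP")):
    """Award rotisserie standings points per category and sum them per team.

    team_totals: {team: {category: value}}. The best team in a category gets
    len(team_totals) points and the worst gets 1. Tied teams share the
    average of the points they span (an assumed tie rule).
    """
    teams = list(team_totals)
    n = len(teams)
    points = {team: 0.0 for team in teams}
    categories = next(iter(team_totals.values())).keys()
    for cat in categories:
        # Order teams worst to best, so the team at index i earns i + 1 points.
        ordered = sorted(teams, key=lambda t: team_totals[t][cat],
                         reverse=cat in lower_is_better)
        i = 0
        while i < n:
            j = i
            value = team_totals[ordered[i]][cat]
            while j + 1 < n and team_totals[ordered[j + 1]][cat] == value:
                j += 1
            shared = (i + j) / 2 + 1  # average of points i + 1 .. j + 1
            for team in ordered[i:j + 1]:
                points[team] += shared
            i = j + 1
    return points


teams = {
    "A": {"R": 800, "ERA": 3.50},
    "B": {"R": 750, "ERA": 3.20},
    "C": {"R": 700, "ERA": 4.00},
}
print(standings_points(teams))  # {'A': 5.0, 'B': 5.0, 'C': 2.0}
```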

In order to evaluate the comparative worth of one statistic versus another, we must see how each stat correlates with Standings Points or, mathematically speaking, determine the slope of the function Team Stat vs Standings Points Earned. As it was easiest to find previous league standings for my own fantasy league, which goes by the acronym VSNIC, such a plot is shown below for that league’s run scoring history in Figure 2. Interestingly, there appears to be a decrease in the scoring environment from 2011 to 2014 (in 2009 and 2010 the league had fewer than 10 teams), perhaps further evidence that it’s a good time to be a pitcher.
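The slope can be estimated with an ordinary least-squares fit, with Standings Points on the x-axis and the team stat on the y-axis, so the slope comes out in stat units per standings point and the intercept is the stat total associated with zero points. The sketch below is a minimal Python version; the run totals are hypothetical, and seasons can be pooled by concatenating their lists. On Python 3.10+, `statistics.linear_regression(x, y)` returns the same slope and intercept.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical single season: standings points earned (x) and team runs (y).
points = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
runs = [731, 752, 779, 812, 830, 858, 889, 912, 936, 970]
slope, intercept = fit_line(points, runs)
print(f"{slope:.1f} runs per standings point, intercept {intercept:.0f} runs")
```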


Figure 2: VSNIC Runs vs Standing Points Earned

Similar functions can be described for each of the statistics (R, HR, RBI, SB, AVG, W, SV, K, ERA, WHIP in standard 5×5 scoring). For the SPG model on the attached draft sheets, the slope values were taken from Razzball, and are shown below in Table 1 [14]. (Note: if performing this modeling yourself, it is always best to use your own league’s scoring history, if possible.)


Table 1: Razzball Standing Points Gained Slopes

Standing Points Gained – Calculation

Using the slope of the simple linear model shown above, on average an additional 26.9 runs scored will result in one standings point gained (in VSNIC’s format). A player such as Mike Trout, whom we project to score 110 runs this season, is then worth 110 / 26.9 ≈ 4.1 standings points in runs alone. The same calculation, shown below, can be performed for each of a player’s counting statistics to determine his standings points gained in each counting category.

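$$\text{Rspg} = \frac{R}{m_R} = \frac{110}{26.9} \approx 4.1$$

where $m_R$ is the slope of the VSNIC runs regression (26.9 runs per standings point).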
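The counting-stat conversion is a single division per category. A minimal Python sketch, assuming a hypothetical dict of slopes keyed by category; only the 26.9 runs-per-point VSNIC slope and Trout’s 110 R come from the text.

```python
def counting_spg(player_stats, slopes):
    """Standings points gained per counting category: stat / (stat per point).

    slopes: league-specific regression slopes, e.g. {"R": 26.9} for VSNIC runs.
    Categories missing from either dict are skipped.
    """
    return {cat: player_stats[cat] / slope
            for cat, slope in slopes.items() if cat in player_stats}


trout = {"R": 110}
print(round(counting_spg(trout, {"R": 26.9})["R"], 2))  # 4.09
```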

The calculation for rate statistics such as AVG and ERA is slightly more complicated, however, as each player’s contribution to the team’s rate is weighted by both his and his team’s number of attempts (AB or IP). To determine a given player’s SPG for a rate statistic, we must determine how adding his stats to those of an average team composed of one too few players (mathematically speaking, n-1 players) compares to the fantasy league average rate statistic. This concept is perhaps best demonstrated by the equations shown below for slugging percentage standings points gained, again taking Mike Trout and the settings from 10-team VSNIC as our example (VSNIC uses OBP and SLG, not AVG or HR), where bars over variables indicate the average of the given variable. Mike Trout’s projected 322 TB in 584 AB would shift a team composed of n-1 average players from the league average SLG of 0.436 to a SLG of 0.444, meaning he is responsible for approximately 1.4 standings points gained for SLG.

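$$\text{SLGspg} = \frac{\dfrac{(n-1)\,\overline{TB} + TB}{(n-1)\,\overline{AB} + AB} - \overline{SLG}}{m_{SLG}} = \frac{0.444 - 0.436}{m_{SLG}} \approx 1.4$$

where $TB$ and $AB$ are Trout’s projected 322 TB and 584 AB, $n$ is the number of hitters on a team, $\overline{SLG}$ is the league average SLG (0.436), and $m_{SLG}$ is the slope of the SLG regression.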
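The rate-stat version adds the player to a team of n-1 average players before comparing with the league average. A minimal Python sketch; the function name is hypothetical, and in the usage line the roster size, average-player AB and slope are placeholders rather than VSNIC values, so the printed number should not be read as Trout’s actual SLG value.

```python
def rate_spg(num, den, avg_num, avg_den, n_players, league_rate, slope,
             lower_is_better=False):
    """Standings points gained for a rate stat, e.g. SLG with num=TB, den=AB.

    The player is added to a team of n_players - 1 average players
    (avg_num / avg_den per player). The resulting team rate is compared with
    the league-average rate and divided by the category slope. For ERA pass
    num = 9 * ER and den = IP (likewise for the average player).
    """
    others = n_players - 1
    team_rate = (others * avg_num + num) / (others * avg_den + den)
    delta = team_rate - league_rate
    if lower_is_better:  # ERA, WHIP: a lower team rate earns points
        delta = -delta
    return delta / slope


# Trout's 322 TB in 584 AB and the 0.436 league SLG come from the text; the
# roster size, average-player AB and slope are hypothetical placeholders.
avg_ab = 550
print(rate_spg(num=322, den=584, avg_num=0.436 * avg_ab, avg_den=avg_ab,
               n_players=14, league_rate=0.436, slope=0.006))
```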

By summing over the standings points gained for each statistical category we can obtain a single metric of total standings points gained for each player. This is again illustrated below using Mike Trout as the example player, with VSNIC scoring.

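$$\text{TOTspg} = \sum_{c} \text{spg}_c$$

where the sum runs over each of the league’s scoring categories $c$.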

Throughout this discussion we’ve used offensive statistics as examples; however, similar computations are possible for each pitching statistic, again by properly distinguishing between counting and rate statistics.

Replacement Level – Theory

At this point we’ve created a single metric by which to value every MLB player for whom we have projected stats. This is a powerful tool, but not one you should draft your lineup by, as this metric neglects to account for some things, notably position scarcity. Not every position’s hitters are created equal; it is much easier to find a “good” hitter at 1B or OF, as these are positions where teams can hide good hitters who struggle defensively. Accounting for this position scarcity requires us to determine replacement level for each position, in a way similar to Wins Above Replacement, or WAR. Think of replacement level as “that guy” you can pick up on the waiver wire the day after the draft; Matthew Berry once termed this “The Wandy Line” [15].

In fantasy baseball, replacement level for each position is defined by the first player at that position who won’t get drafted, based upon the number of teams and the roster settings of the fantasy league. This can be slightly confusing to wrap your head around, so we’ll illustrate using standard 10-team ESPN roster settings (1x C, 1x 1B, 1x 2B, 1x 3B, 1x SS, 1x 2B/SS, 1x 1B/3B, 5x OF, 1x Util, 9x P, 3x Bench). With 10 teams each needing exactly one C, the result is simple: replacement level is the 11th C as ranked by Total Standings Points Gained. For 2B and SS, however, each team requires 1x 2B, 1x SS and 1x 2B/SS; therefore, assuming an even split of the additional mixed roster spot between 2B and SS, (1 + 0.5) * 10 = 15 of each are drafted, and replacement level is the 16th 2B or SS.

Before we define replacement level for 1B, 3B, or OF we must determine how to distribute the bench slots between hitters and pitchers. For this analysis we determined replacement level by allotting 2 bench slots to hitters and 1 bench slot to pitchers, leaving us with 1x Util and 2x BN slots to distribute to 1B, 3B, and OF. Our model allots 1.5 of these 3 slots to OF, 0.3 to 3B and 1.2 to 1B (I’ll explain how I got these values in a bit). This gives us (5 + 1.5) * 10 = 65 OF drafted, or 66th OF as replacement level. Splitting the corner infield spot evenly among 1B and 3B, gives us (1 + 0.5 + 0.3) * 10 = 18 3B, or 19th 3B as replacement level, and finally (1 + 0.5 + 1.2) * 10 = 27 1B, or 28th 1B as replacement level. The levels of replacement that were used to compile the attached draftsheets are summarized in Table 2 below.
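The slot arithmetic above reduces to “teams times expected slots per team, plus one.” A minimal Python sketch; the function name is hypothetical, and rounding fractional counts (e.g. 21.6 third basemen in a 12-team league) to the nearest whole player is an assumption.

```python
def replacement_ranks(n_teams, slots_per_team):
    """Rank of the first undrafted player at each position.

    slots_per_team: expected roster spots per team at each position,
    including fractional shares of 2B/SS, 1B/3B, Util and bench slots.
    Fractional totals are rounded to the nearest player (an assumption).
    """
    return {pos: round(n_teams * slots) + 1
            for pos, slots in slots_per_team.items()}


# Allocation used for the 10-team ESPN settings described above.
espn_10 = {"C": 1, "2B": 1.5, "SS": 1.5, "3B": 1 + 0.5 + 0.3,
           "1B": 1 + 0.5 + 1.2, "OF": 5 + 1.5}
print(replacement_ranks(10, espn_10))
# {'C': 11, '2B': 16, 'SS': 16, '3B': 19, '1B': 28, 'OF': 66}
```

As a check, the hitter slots sum to 15 per team (150 hitters in a 10-team league), matching 1 C, 1 each of 1B/2B/3B/SS, 2 mixed spots, 5 OF, 1 Util and 2 bench hitters.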


Table 2: Position Replacement Level

These replacement levels are by no means intended as definitive, as the distribution of utility, mixed-position and bench spots is left to the modeler’s discretion. The rankings produced using these levels gave me the “looks right” vibe; however, tuning these levels is an iterative process.

The quantitative approach that was used in this iteration was to examine the replacement level statistics for each position as shown in Table 3 in the next section. Notice how we’ve tuned replacement level total standing points gained for each position, other than C and SS (which are exceptions in hitting ability due to the large defensive requirements), to be approximately the same level.

Standing Points Gained Above Replacement – Calculation

Now that we’ve defined replacement level for each position, we can calculate the standings points gained in each stat category that we might expect to be able to pick up off the waiver wire for a given position. We determined each position’s replacement level standing points gained in each category by taking the local average around replacement level, using the two players just above and below replacement level. This is shown below for catchers’ replacement level runs standing points gained in a 10-team format. Values for each position’s replacement level standing points gained in each category of a 5×5 rotisserie 10-team format are shown below in Table 3 (a RepPos of NA is used for players lacking position information; for these, Repspg in each category was taken as the average across all positions).
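A minimal Python sketch of the local-average step. The phrase “the two players just above and below replacement level” is read here as two players on each side plus the replacement player itself; that window (`window=2`), the function name and the data layout are assumptions.

```python
from statistics import mean


def replacement_spg(position_players, rep_rank, category, window=2):
    """Local average of one category's SPG around a position's replacement level.

    position_players: one position's players as dicts, sorted best-first by
    total SPG. rep_rank is 1-based (11 for the 11th C in a 10-team league).
    The replacement player and `window` players on each side are averaged.
    """
    i = rep_rank - 1
    lo = max(0, i - window)
    hi = min(len(position_players), i + window + 1)
    return mean(p[category] for p in position_players[lo:hi])


# Hypothetical catcher pool, already sorted best-first.
catchers = [{"name": f"C{k}", "R": 4.0 - 0.2 * k} for k in range(20)]
print(replacement_spg(catchers, 11, "R"))
```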


Table 3: Replacement Level SPG by Statistic and Position (10-Team Roto Scoring)


Performing this calculation for each statistic and position allows us to compare each player’s standing points gained in each category to that of a replacement player, i.e. his value above replacement. We show this calculation taking Buster Posey’s projected 69 R, good for 2.79 Rspg, as our example to compare against a replacement catcher.

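$$\text{RspgAR} = \text{Rspg} - \text{Repspg}_{R,C} = 2.79 - \text{Repspg}_{R,C}$$

where $\text{Repspg}_{R,C}$ is the replacement-level catcher’s runs SPG from Table 3.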
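Subtracting replacement level category by category and summing gives TOTspgAR. A minimal Python sketch; the function name is hypothetical, and apart from Posey’s 2.79 Rspg every number in the usage lines is a placeholder, not a value from Table 3.

```python
def spg_above_replacement(player_spg, rep_spg):
    """Per-category SPG above replacement and their total (TOTspgAR).

    player_spg: {category: spg} for one player.
    rep_spg: {category: Repspg} for his position (one row of a table like Table 3).
    """
    by_cat = {cat: player_spg[cat] - rep_spg[cat] for cat in player_spg}
    return by_cat, sum(by_cat.values())


# Posey's 2.79 Rspg is from the text; every other number is hypothetical.
posey = {"R": 2.79, "HR": 2.10}
rep_catcher = {"R": 1.90, "HR": 1.20}
by_cat, total = spg_above_replacement(posey, rep_catcher)
print(by_cat, round(total, 2))
```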

We now have a single metric measuring each player’s total standing points gained above his hypothetical replacement, and this is the quantity we should base our draft decisions around.

Points Leagues

Points leagues, an alternative scoring system for fantasy baseball, present us with a simpler analysis to determine each player’s comparative worth [16]. Rather than having to determine how valuable each type of statistic is, there is only one thing that counts: Points. In order to account for position scarcity, replacement level must be determined and accounted for, in the same fashion as above for rotisserie scoring. The roster settings and thus replacement levels used in our analysis are the same as used above, while the points scoring settings are shown below in Table 4.
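Points-league value follows the same pattern, with a weighted sum in place of SPG. A minimal Python sketch; the weights below are illustrative placeholders keyed to the stat columns on the points draft sheets, not the settings in Table 4, and `replacement_points` would come from the same local-average procedure used for SPG.

```python
# Illustrative weights only; substitute your league's actual scoring settings.
HITTER_WEIGHTS = {"TB": 1, "R": 1, "RBI": 1, "SB": 1, "BB": 1, "K": -1}
PITCHER_WEIGHTS = {"IP": 3, "W": 5, "SV": 5, "H": -1, "K": 1, "BB": -1, "ER": -2}


def fantasy_points(stats, weights):
    """Weighted sum of projected stats; stats missing from `stats` count as 0."""
    return sum(w * stats.get(stat, 0) for stat, w in weights.items())


def points_above_replacement(stats, weights, replacement_points):
    """FPtsAR: projected points minus replacement-level points at the position."""
    return fantasy_points(stats, weights) - replacement_points


hitter = {"TB": 300, "R": 100, "RBI": 90, "SB": 20, "BB": 60, "K": 120}
print(fantasy_points(hitter, HITTER_WEIGHTS))
```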

Table 4: Standard Points League Scoring Setting


Draftsheets

The Draft Sheets attached should be fairly self-explanatory, though we’ll explain what all the data means and where it came from. They are sorted by either total standing points gained above replacement (TOTspgAR) or total fantasy points above replacement (TOTFPtsAR), depending on the scoring type.

Take note that each draft sheet’s value above replacement transitions from positive to negative as you cross the “draftable” threshold, i.e. “The Wandy Line” (the 251st player for 10-team leagues and the 301st for 12-team leagues, given 25-man rosters). The draftable threshold on each sheet is highlighted in red, and the borders indicate the projected round structure of a snake-style (i.e. non-auction) draft.

If performing this modeling yourself, you should check the player values around the draftable threshold to see if you screwed up: if the value doesn’t go from positive to negative near the correct threshold, start debugging.
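That sanity check can be automated. A minimal Python sketch, assuming the values are already sorted best-first as on the draft sheets; the function names and the ±10-player tolerance are arbitrary choices, not part of the original model. The expected threshold is teams × roster size + 1 (10 × 25 + 1 = 251 with the ESPN roster settings used here).

```python
def sign_change_rank(values_above_replacement):
    """1-based rank of the first player whose value above replacement is <= 0.

    Expects values sorted best-first. Returns None if every value is positive.
    """
    for rank, value in enumerate(values_above_replacement, start=1):
        if value <= 0:
            return rank
    return None


def threshold_looks_right(values, n_teams, roster_size, tolerance=10):
    """True if the sign change lands within `tolerance` of n_teams * roster_size + 1."""
    expected = n_teams * roster_size + 1
    actual = sign_change_rank(values)
    return actual is not None and abs(actual - expected) <= tolerance


# Hypothetical sorted values whose sign flips at rank 301.
values = [300 - i for i in range(400)]
print(sign_change_rank(values), threshold_looks_right(values, 12, 25))
```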

  • nameLast : Not going to explain…
  • nameFirst : Nor this.
  • teamID : Team info was taken from Zips projections, then converted to Lahman Database’s format
  • ESPNPos : Position eligibility according to ESPN Projections
  • YahooPos : Position eligibility according to Yahoo Projections
  • Hand : Player’s handedness (L, R, or B for switch hitters). Taken from Zips projections. No, we didn’t accommodate switch pitchers [17].
  • Age : How many trips around the sun the player has experienced. Taken from Zips.
  • ESPN : Ranking in ESPN’s Top 300 players (if available) [18]
  • Yahoo : Ranking according to Yahoo Projections (if available)
  • ESPNKp : Ranking according to ESPN analyst Tristan Cockcroft’s Keeper Rankings (if available) [19]
  • ESPNPosRnk : ESPN’s position ranking (only available for players that appear in the ESPN Top 300 rankings)
  • SPGPosRnk (FPtsPosRnk) : Position rank according to the TOTspgAR or TOTFPtsAR model
  • TOTspgAR (TOTFPtsAR) : …you should reread the above.
  • Statistics Projected by Duffy Model
      • Rotisserie Format
          • Hitting Stats (HR, R, RBI, SB, AVG)
          • Pitching Stats (W, SV, K, ERA, WHIP)
      • Points Format
          • Hitting Stats (TB, R, RBI, SB, BB, K, TOTFPts)
          • Pitching Stats (IP, W, SV, H, K, BB, ER, TOTFPts)

R Coding

I only recently began teaching myself how to code in R, so I’m sure there are better ways to go about a lot of this, and any suggestions are more than welcome.

Github Code

Concluding Remarks

Although we think we’ve put together a pretty good model, it is by no means perfect. When determining replacement level, this model requires defining a single position for each player, and it has trouble accounting for multi-position eligibility, though in truth each player should be judged at the “weakest hitting” position he is eligible at [20]. This is difficult to incorporate, and somewhat subjective, since the “weakest” hitting position is itself affected by the setting of replacement level, so we’ve simply used the positions listed by Zips in our modeling. Therefore, when valuing uber-utility men Ben Zobrist or Josh Harrison, you may want to note their Zips position as well as where they are eligible in your league, and adjust their ranking accordingly. In addition, in order to define a position for DHs, I’ve lumped them in with the 1B, though this is a relatively minor issue, as only a handful of DHs are relevant anyway.

The standing points gained model used here has an additional oversimplification. It assumes that all of the stats a player accrues go toward increasing his team’s standing points, which is not entirely true [21]. Each stat category requires a certain amount of production simply to reach last place (remember that y-intercept from our linear regression?); only after reaching last place do additional stats contribute to standing points in the way we’ve assumed.

We obviously were not the first to try to determine fantasy value above replacement quantitatively (see, for example, Sanders [22] and Zola and Leibowitz [23]), and we’d be seriously remiss if we didn’t mention Tanner Bell, on whose work this model was largely based [24,1]. I still can’t figure out how he does this stuff in Excel, though.

As the projection system used here is compiled from 7 sources, each of which updates its projections only occasionally, our rankings are slow to respond to players whose playing-time expectations have recently shifted. Put simply, Cliff Lee, Yu Darvish and Marcus Stroman are all projected to pitch this year in these rankings; to put it mildly, that doesn’t look likely to happen, and I can’t suggest you draft any of them [25].

Although I don’t believe this model is 100% mathematically rigorous, given the discussion points above, it does possess a certain simplicity and common-sense approach that appeals to me as an engineer. Generally speaking, don’t draft in a vacuum: adjust these rankings as you see fit, because if you didn’t, it’d be our team, not yours.

References

  1. Bell, T. (2013, March 22) Create Your Own Fantasy Baseball Rankings: Part 5 – Understanding Standings Gain Points.
  2. ESPN Stats & Info (2015, March 01) Sortable 2015 Projections (Hitters and Pitchers).
  3. Yahoo Fantasy Sports (2015, February 21) 2015 Projections (Hitters and Pitchers).
  4. CBS Sports Fantasy Baseball (2015, March 01) 2015 Projections (Hitters and Pitchers).
  5. Steamer Projections (2015, March 01) 2015 Projections (Hitters and Pitchers).
  6. Razzball Projections (2015, March 01) 2015 Projections (Hitters and Pitchers).
  7. Szymborski, D. (2015, March 01) 2015 Projections (Hitters and Pitchers).
  8. Davenport, C. (2015, March 01) 2015 Major League Projections (Hitters and Pitchers).
  9. TG Fantasy Baseball (2015, March 01) 2015 Fantasy Baseball Projections (Hitters and Pitchers).
  10. Carter, S.C. (2015, March 01) 99 Problems.
  11. Cockcroft, T.H. (2015, March 01) Inside the Projections Process.
  12. ESPN Stats & Info (2015, March 01) Kolten Wong.
  13. Yahoo Sports (2015, March 01) Rotisserie (Roto) Scoring System Overview.
  14. Grey (2015, March 01) Stats Needed to Win Your Fantasy Baseball League.
  15. Berry, M. (2011, April 27) The Wandy Line.
  16. ESPN Stats & Info (2015, March 01) Scoring Formats.
  17. Wikipedia (2015, March 01) Pat Venditte.
  18. ESPN Stats & Info (2015, March 01) 2015 Fantasy Baseball Roto Top 300.
  19. Cockcroft, T.H. (2015, March 13) Top 250 Keeper League Rankings.
  20. Bell, T. (2013, December 26) A Warning About Calculating Replacement Level.
  21. Zola, T. (2010) A Discussion of Standing Gains Points Theory: The Epiphany.
  22. Sanders, Z. (2011, October 17) Fantasy Value Above Replacement: New and Improved!
  23. Zola, T. and Leibowitz, R. (2015, March 01) Player Valuation.
  24. Bell, T. (2015, March 01) Smart Fantasy Baseball.
  25. Cockcroft, T.H. (2015, March 14) Darvish, Stroman, Lee Shift in Ranks.

Doug Duffy | Author

I'm just a data scientist that likes to do cool things with data in my spare time. Let's talk about it.
