One of my favorite rating systems for NBA teams is the Simple Rating System (SRS). A team’s SRS rating is made up of two things: average margin of victory and strength of schedule. The rating is denominated in points above or below average, where zero is average.

These ratings can be used to estimate the probability that home team A will defeat visiting team B. Let me explain how I went about the process of converting SRS ratings into win probabilities.

I started by looking at all regular season games from 1976-77 through 2012-13 in seasons that ended in an odd number (i.e., 1976-77, 1978-79, etc.).

For each game I computed the difference between the home team’s SRS and the visiting team’s SRS.* In other words:

dSRS = hSRS – vSRS

* I should note that I used each team’s SRS at the end of the season, not its SRS when the game was played.

I then built a logistic regression model using a home win indicator as the dependent variable (1 = home win, 0 = home loss) and dSRS as the independent variable. That model yielded the following equation for expected win probability:

p = 1 / (1 + e^{-(0.616591 + 0.166622 × dSRS)})
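
For readers who want to try a fit of this kind, here is a minimal sketch of a one-variable logistic regression fitted by Newton-Raphson in plain Python. The game data is not included in this post, so the sketch runs on synthetic data; the names `fit_logistic`, `TRUE_A`, and `TRUE_B` are hypothetical, and the "true" coefficients are made-up values chosen only to be on the same scale as the equation above. It is not the software used for the model in this post.

```python
import math
import random


def fit_logistic(xs, ys, iterations=15):
    """Fit P(y=1) = 1 / (1 + exp(-(a + b*x))) by Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        # Gradient of the log-likelihood and entries of the negated Hessian.
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g_a += y - p
            g_b += (y - p) * x
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        # Newton step: solve the 2x2 system H * step = gradient.
        det = h_aa * h_bb - h_ab * h_ab
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


# Synthetic stand-in for (dSRS, home win) pairs. These are NOT real games.
random.seed(0)
TRUE_A, TRUE_B = 0.6, 0.17  # made-up coefficients for the simulation
xs = [random.gauss(0.0, 6.0) for _ in range(10000)]
ys = [
    1 if random.random() < 1.0 / (1.0 + math.exp(-(TRUE_A + TRUE_B * x))) else 0
    for x in xs
]

a_hat, b_hat = fit_logistic(xs, ys)
print(a_hat, b_hat)  # should land near TRUE_A and TRUE_B
```

With real data, `xs` would hold each game's dSRS and `ys` the home win indicator; a statistics package would give the same coefficients plus standard errors.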

For example, if the 2012-13 Miami Heat played the San Antonio Spurs in Miami, Miami’s win probability would be:

dSRS = 7.03 – 6.67 = 0.36

p = 1 / (1 + e^{-(0.616591 + 0.166622 × 0.36)}) = 0.663
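
The same calculation as a small Python sketch. The function name `home_win_probability` is mine; the coefficients are the ones from the equation above.

```python
import math

# Coefficients from the model fitted on the odd-ending seasons.
INTERCEPT = 0.616591
SLOPE = 0.166622


def home_win_probability(home_srs, visitor_srs, intercept=INTERCEPT, slope=SLOPE):
    """Logistic win probability for the home team given both teams' SRS."""
    d_srs = home_srs - visitor_srs
    return 1.0 / (1.0 + math.exp(-(intercept + slope * d_srs)))


# 2012-13 Heat (SRS 7.03) hosting the Spurs (SRS 6.67).
heat_vs_spurs = home_win_probability(7.03, 6.67)
print(round(heat_vs_spurs, 3))  # 0.663
```

Passing two equal ratings returns about 0.649, which is the home-court edge the intercept encodes.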

To test this model, I looked at all regular season games from 1976-77 through 2012-13 in seasons that ended in an even number (i.e., 1977-78, 1979-80, etc.).

I created “buckets” of games by rounding the SRS difference to the nearest one, then computed the actual and expected home winning percentage within each bucket. Here are the results for the buckets that contained at least 250 games:

dSRS | Games | Actual home W% | Expected home W% |
---|---|---|---|
-11 | 306 | 0.206 | 0.229 |
-10 | 376 | 0.239 | 0.259 |
-9 | 435 | 0.278 | 0.293 |
-8 | 577 | 0.341 | 0.328 |
-7 | 631 | 0.385 | 0.366 |
-6 | 752 | 0.370 | 0.405 |
-5 | 901 | 0.425 | 0.446 |
-4 | 938 | 0.506 | 0.488 |
-3 | 1114 | 0.530 | 0.529 |
-2 | 1142 | 0.572 | 0.570 |
-1 | 1291 | 0.607 | 0.611 |
0 | 1248 | 0.646 | 0.649 |
1 | 1288 | 0.686 | 0.686 |
2 | 1147 | 0.733 | 0.721 |
3 | 1093 | 0.745 | 0.753 |
4 | 937 | 0.811 | 0.783 |
5 | 887 | 0.794 | 0.810 |
6 | 754 | 0.837 | 0.834 |
7 | 630 | 0.849 | 0.856 |
8 | 582 | 0.878 | 0.875 |
9 | 436 | 0.890 | 0.892 |
10 | 380 | 0.895 | 0.907 |
11 | 306 | 0.931 | 0.921 |
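
A check like the one in the table above could be sketched as follows. Two details are my assumptions, since the post does not spell them out: halves are rounded up when assigning a game to a bucket, and the expected figure is the average of the per-game model probabilities in the bucket (rather than the probability at the bucket's center). The function names and the toy games are hypothetical.

```python
import math
from collections import defaultdict


def expected_home_win(d_srs, intercept=0.616591, slope=0.166622):
    """Model probability of a home win for a given SRS difference."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * d_srs)))


def calibration_table(games, min_games=250):
    """games: iterable of (d_srs, home_won) pairs, with home_won equal to 1 or 0.

    Returns (bucket, games, actual, expected) rows for buckets that hold
    at least min_games games.
    """
    buckets = defaultdict(list)
    for d_srs, home_won in games:
        # Assumption: round to the nearest integer with halves rounded up.
        buckets[math.floor(d_srs + 0.5)].append((d_srs, home_won))
    rows = []
    for bucket in sorted(buckets):
        members = buckets[bucket]
        if len(members) < min_games:
            continue
        actual = sum(won for _, won in members) / len(members)
        # Assumption: expected = mean of per-game model probabilities.
        expected = sum(expected_home_win(d) for d, _ in members) / len(members)
        rows.append((bucket, len(members), actual, expected))
    return rows


# Toy input, made up purely to show the call; not real game results.
toy_games = [(0.2, 1), (0.4, 0), (-0.3, 1), (1.6, 1), (2.4, 0)]
for row in calibration_table(toy_games, min_games=1):
    print(row)
```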

I was satisfied with the out-of-sample results, so I rebuilt the model using all regular season games from 1976-77 through 2012-13 and obtained the following win probability formula:

p = 1 / (1 + e^{-(0.613230 + 0.167546 × dSRS)})

And here are the results for that model across all seasons:

dSRS | Games | Actual home W% | Expected home W% |
---|---|---|---|
-14 | 251 | 0.131 | 0.150 |
-13 | 307 | 0.189 | 0.173 |
-12 | 478 | 0.205 | 0.198 |
-11 | 594 | 0.224 | 0.226 |
-10 | 725 | 0.263 | 0.257 |
-9 | 904 | 0.291 | 0.290 |
-8 | 1208 | 0.330 | 0.326 |
-7 | 1394 | 0.371 | 0.364 |
-6 | 1551 | 0.393 | 0.403 |
-5 | 1839 | 0.450 | 0.444 |
-4 | 2018 | 0.495 | 0.486 |
-3 | 2223 | 0.524 | 0.528 |
-2 | 2386 | 0.567 | 0.569 |
-1 | 2525 | 0.604 | 0.610 |
0 | 2443 | 0.640 | 0.649 |
1 | 2504 | 0.687 | 0.686 |
2 | 2400 | 0.725 | 0.721 |
3 | 2208 | 0.740 | 0.753 |
4 | 2011 | 0.795 | 0.783 |
5 | 1837 | 0.801 | 0.810 |
6 | 1545 | 0.844 | 0.835 |
7 | 1388 | 0.851 | 0.856 |
8 | 1205 | 0.885 | 0.876 |
9 | 904 | 0.894 | 0.893 |
10 | 729 | 0.909 | 0.908 |
11 | 600 | 0.928 | 0.921 |
12 | 484 | 0.934 | 0.932 |
13 | 308 | 0.929 | 0.942 |
14 | 254 | 0.945 | 0.951 |

For example, in a matchup where the home team has a +2 advantage in SRS, the home team would be expected to win 72.1 percent of the time, a figure that is slightly less than the actual result of 72.5 percent.

Now as I mentioned earlier, I used end-of-season results when calculating the difference in SRS, but you can use this within the current season with some minor modifications.

Based on some work by Tom Tango that is summarized here, I’ve found that a team’s SRS within season should be adjusted as follows:

aSRS = (G × SRS + 12 × 0) / (G + 12) = (G × SRS) / (G + 12)

In other words, I am adding 12 games of league average performance (SRS = 0) in order to get a better estimate of the team’s “true” talent level.

For example, tonight the New Orleans Pelicans (G = 7, SRS = 2.85) are playing the Los Angeles Lakers (G = 8, SRS = -4.91) in Los Angeles. In order to calculate the Lakers win probability, we first have to adjust each team’s SRS:

LAL aSRS = (8 * (-4.91)) / (8 + 12) = -1.96

NOP aSRS = (7 * 2.85) / (7 + 12) = 1.05

Next compute the SRS difference:

dSRS = -1.96 – 1.05 = -3.01

Finally, plug this number into the win probability formula:

p = 1 / (1 + e^{-(0.613230 + 0.167546 × (-3.01))}) = 0.527

So we would estimate that the Lakers have about a 52.7 percent chance to win tonight.
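
The whole in-season calculation, as a short Python sketch. The function names are mine; the 12-game constant and the coefficients are the ones given above.

```python
import math


def regressed_srs(games_played, srs, prior_games=12):
    """Blend observed SRS with prior_games of league-average (SRS = 0) play."""
    return (games_played * srs + prior_games * 0.0) / (games_played + prior_games)


def home_win_probability(home_srs, visitor_srs, intercept=0.613230, slope=0.167546):
    """Win probability for the home team from the all-seasons model."""
    d_srs = home_srs - visitor_srs
    return 1.0 / (1.0 + math.exp(-(intercept + slope * d_srs)))


lal = regressed_srs(8, -4.91)  # Lakers, home team
nop = regressed_srs(7, 2.85)   # Pelicans, visitors
p_lakers = home_win_probability(lal, nop)

print(round(lal, 2), round(nop, 2))  # -1.96 1.05
print(round(p_lakers, 3))            # 0.527
```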

Awesome work! I have recently been considering the same problem but hadn’t had time to get around to it. Great job!

Thanks Ian.

I’m curious where you arrived at the 12 extra games value? Tango seems to suggest only 14 games are needed for the NBA total, so shouldn’t it be (14-g)?

Great work regardless however.

Using data from the last three 82-game seasons (2009-10, 2010-11, 2012-13):

Var(obs) = 0.1569^2

Var(rand) = 0.0552^2

Var(true) = 0.1569^2 – 0.0552^2 = 0.1469^2

0.5^2 / n = 0.1469^2 => n = 0.5^2 / 0.1469^2 = 11.59

Rounding up to the next higher whole number, we get a sample size of 12 games.
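
The arithmetic in this comment can be reproduced with a few lines of Python. The variable names are hypothetical. The remark that 0.0552 matches the binomial standard deviation of a .500 team over 82 games is my reading of the number, not something the comment states.

```python
import math

sd_observed = 0.1569  # spread of team winning percentages, as quoted above
sd_random = 0.0552    # random (luck) component, as quoted above

# My reading: sqrt(0.5 * 0.5 / 82) is about 0.0552, the binomial
# standard deviation for a .500 team over an 82-game season.
binomial_sd_82 = math.sqrt(0.25 / 82)

var_true = sd_observed ** 2 - sd_random ** 2
games_to_add = 0.5 ** 2 / var_true

print(round(binomial_sd_82, 4))       # 0.0552
print(round(math.sqrt(var_true), 4))  # 0.1469
print(round(games_to_add, 2))         # 11.59
print(math.ceil(games_to_add))        # 12
```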

But shouldn’t it still be 12-g? Aren’t you arriving at an overly regressed to the mean aSRS total there? So shouldn’t it be 4 league average games, not the full 12, given they’ve already played 8?

I think I’m misunderstanding what the 12 games represent there – will that number adjust as we get more data? 50 games into the season for instance, we presumably no longer need to regress aSRS to the mean at all, right?

You are misunderstanding.

That’s the beauty of the regression equation: you add 12 games regardless of how many games you have played, be it two or two hundred.

You would still add 12 games of league average to regress the observed performance no matter how many games you have. It’s just that as you get more and more observed data, the 12 games worth of regression has a smaller and smaller effect. You don’t have to shrink the number of games worth of regression to add–the fact that the number of observed games keeps growing while the regression constant doesn’t is what makes the regressed portion less significant the more data you have.

The 12 games represents the idea that after 12 games, a team’s true talent is most likely to be approximately halfway between their observed performance and league average. For example, if you look at every team that wins 9 out of 12 games (.750 W%), and then look at how those teams perform the rest of the season, they’ll probably win about 62.5% of their remaining games on average. Adding 12 games of league average performance to 12 games of observed performance reflects that by estimating that true talent is halfway between the observed and average performance at that point.

It turns out that regression to the mean works by always adding the same number of games worth of regression for any number of observed games. If you have 50 games of data for each team, then compare that 50-game performance to the rest of the season, teams will tend to regress about 20% toward the mean (50/(50+12)~80%).
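
A small Python sketch of the two numbers used in this explanation, with hypothetical function names and the 12-game constant from the post:

```python
def weight_on_observed(games, prior_games=12):
    """Share of the estimate that comes from observed games."""
    return games / (games + prior_games)


def regressed_win_pct(wins, games, prior_games=12, league_avg=0.5):
    """Winning percentage after adding prior_games of league-average play."""
    return (wins + prior_games * league_avg) / (games + prior_games)


print(regressed_win_pct(9, 12))          # 0.625
print(round(weight_on_observed(50), 3))  # 0.806

# The constant stays at 12; only its share of the total shrinks.
for games in (2, 12, 50, 82, 200):
    print(games, round(1 - weight_on_observed(games), 3))
```

The loop prints the fraction regressed toward the mean at each sample size, which falls as games accumulate even though the 12 never changes.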

Great stuff, Justin!

My question as it pertains to the “add x games of 0.0 SRS” variant: is 12 the correct number for SRS? I know 12 is the correct number for WPct, at least for recent seasons, because it’s derived from the variance in winning percentages. But is it valid to apply that same number to SRS? My gut instinct is that it’s not, because 12 games into the season, SRS actually conveys a lot more info than WPct. So I’d expect to have to add fewer than 12 games of 0.0 SRS to get a team’s “true” talent estimate.

(Fwiw, I found that number to be 7 here, using a different method: http://t.co/LCkdjvp9SK)

I agree with Neil. Great point!

@Neil Paine: I’ll have to take a look at that, but since you only looked at one season I would caution against concluding too much from your work. That said, you have a valid point.

Now, how would you convert that expected WP into a point spread? Certainly it won’t be the dSRS, although for small differences it will be close.

Actually, he shows that, by simply regressing the SRS (adding 12 games of average). You simply need to add the home side advantage.

However, Justin I think went a bit too far with his estimate of the Lakers win % tonight. See, his original equation works, because it used unregressed SRS.

But with the Lakers example, he’s using regressed SRS. So, if he wants to have one formula, he needs to adjust the SRS first in the main table as well.

@MGL: You can convert average point differential into an expected winning percentage using this formula:

W% = 1 / (1 + e^(-0.13959 * (Avg. Pt. Diff)))

In the Spurs at Heat example, we know the expected winning percentage of the Heat is 0.663:

0.663 = 1 / (1 + e^(-0.13959 * P))

Solving for P (i.e., the point spread) we get:

P = -ln(1/0.663 – 1) / 0.13959 = 4.85
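
The conversion in both directions, as a Python sketch. The function names are hypothetical; the 0.13959 constant is the one quoted in this comment.

```python
import math

K = 0.13959  # constant from the formula quoted above


def win_pct_from_margin(avg_point_diff, k=K):
    """Expected winning percentage from average point differential."""
    return 1.0 / (1.0 + math.exp(-k * avg_point_diff))


def point_spread(win_prob, k=K):
    """Invert the formula: point margin implied by a win probability."""
    return -math.log(1.0 / win_prob - 1.0) / k


spread = point_spread(0.663)
print(round(spread, 2))                       # 4.85
print(round(win_pct_from_margin(spread), 3))  # 0.663
```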

Justin’s equation can be rewritten as:

x = EXP(-dSRS * .167)

win% = 1.844 / (1.844 + x)

However, that dSRS in the model is the UNREGRESSED value.

If you used a regressed value (meaning Games / (Games + 12) x dSRS), then the “x” equation becomes:

x = EXP(-reg_dSRS * .191)
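
A quick Python check that the rewritten form for the unregressed model matches the logistic formula from the post. The function names are hypothetical. Note that e^0.613230 is about 1.846, so the 1.844 and .167 in this comment are rounded versions of the post's coefficients. The .191 figure for regressed values is not checked here.

```python
import math

A, B = 0.613230, 0.167546  # all-seasons coefficients from the post


def logistic_form(d_srs, a=A, b=B):
    return 1.0 / (1.0 + math.exp(-(a + b * d_srs)))


def odds_form(d_srs, home_odds=math.exp(A), slope=B):
    # Multiply numerator and denominator of the logistic form by e^a.
    x = math.exp(-d_srs * slope)
    return home_odds / (home_odds + x)


print(round(math.exp(A), 3))  # 1.846

for d in range(-14, 15):
    exact = logistic_form(d)
    rounded = odds_form(d, home_odds=1.844, slope=0.167)
    assert abs(exact - odds_form(d)) < 1e-12  # same formula, rearranged
    assert abs(exact - rounded) < 0.002       # rounding changes little
```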

I have a question about the games where both teams are equal. Have you tried to see if the homecourt advantage is bigger when both teams are really good (5+ SRS) than when both are average or bad (-5 SRS)? I think it’s possible that HCA between evenly matched good teams is higher than between bad teams. That could also explain why HCA is bigger in the playoffs.

I think the main reason HCA appears to be bigger in the playoffs is because they give more home games in a given series to the higher seed, which is likely to also be the better team in the series, thus amplifying the apparent HCA of playoff games.

HCA is big in the playoffs for both sides. That is despite the fact that both teams have equal travel and rest times, whereas in the regular season some road teams face back-to-backs and long travel times.

http://apbr.org/metrics/viewtopic.php?f=2&t=8021

You’d do better to include HCA as a factor in the equation. It has not remained constant over time. Around 1990 or so the HCA took a big drop. Huge, really. In the mid 80s the HCA was as high as 5.5 pts/game. Only one or two teams in a season would have winning road marks.

Most likely explanation? The move from two refs to three.

How would you do this with a sport like football or baseball, where the HFA is significantly less than in basketball?