Sunday, December 4, 2011
On Ranking Systems
It's that time of year again: No, not for spreading tidings of comfort and joy, but rather time to spread complaints about team ranking systems and the BCS.
So I figured I'd create my own ranking system.
Implicit in the idea of the ranking system is that there is a total ordering of the teams. So we should be able to assign each team a real value by which they can be sorted.
While in many sports, a team's record should suffice, in college football in particular teams play very few games. There are about 730 college football teams and they each play between 12 and 14 games a season. Frankly, the idea that we can accurately describe all of these teams and their relative abilities with a single number is farcical. But it's a very popular farce and I want to play, too. And I, of course, want my farce to be both my own and, in some sense, better than everybody else's.* And for me, better means "saner."
*And that's why it's so popular.
Here's a rough description. I wrote a Python program that implements all of this.
So, since this is a ranking problem, I decided to start off by assigning every team a "strength" score, s. Teams with higher strengths tend to beat teams with lower strengths. The team rankings are simply the teams sorted by strength. Now, we just need some sort of model that relates two team's strengths to something observable, like winning. A function that estimates the chance of winning is a good choice. It should take the team A's strength and team B's strength as arguments and return one team's (let's say team A's) chances of winning. As team A's strength increases (holding team B's constant), the function should increase and asymptotically approach 1.0. As team A's strength decreases, the function should asymptotically approach 0.0. For two teams with equal strengths, the function should give a win probability of 0.5.
Meet the logistic function. If we pass it team A's score minus team B's score, it fits our criteria.
Now we have our model: we assign each team a strength and predict the outcome of a game as one team's likelihood of winning by taking the difference of the teams' strength scores and passing these to the logistic function. We can try and come up with these strengths ourselves, or we can automatically tune the model to fit our observations -- the results of each game.
To do that, we know we want to minimize the error between the model's prediction and the game result. Let's assign each game a 1.0 if Team A wins and a 0 if Team B wins. If we allow for ties, we'll assign them a 0.5. We can then take our team strengths and our model and compare our model results from each game to observed results. We can then take the difference of our model result and the game result value and square this to get rid of the sign (it doesn't matter). This is the method of least squares.
We sum all of the squared differences -- the model's errors -- from each game. To make our model as accurate as possible, we want to minimize this sum. So we assume a set of strengths, say 1.0 for each team, and use a minimization tool to find the set of team strength scores that minimizes the error between the model and the observed games. Since we're solving for all of the teams' strengths simultaneously while using all of the games played by all of the teams that season, we automatically account for strength of schedule. There are a lot of minimization (optimization) techniques. I used SciPy's implementation of BFGS.
Once we have the solution vector of team strengths, we can sort by strength and that's our ranking.
Anyway, that's the basic idea. I'll hopefully talk more about it later. Maybe about how you can include things like score in the results and maybe improve on this write up.