Each week we will be posting updates to the Intro to Football Analytics story stream. The goal is to use data about college football to learn more about the game, show how analytics principles can yield insights, and educate our readers about the basic theories of Football Analytics. If you have doubts about the Moneyball revolution in sports, hopefully this will show you both the power of football analytics and the limits of data analysis in sports. Hopefully you will tune in each week to learn something, and we will be able to teach it well enough.
This post should be seen as an introduction to the whole realm of Football Analytics. It won't even scratch the surface, but if you have an interest in learning more about the game of football, it should help you on your journey. I have gathered a collection of articles from some of my favorite football blogs: Advanced Football Analytics, Football Perspective, Football Outsiders, and Football Study Hall. These aren't the most "ground-breaking" or "revolutionary" articles; they are just a collection of thoughts and ideas that I thought represented the Football Analytics movement. If you have any articles that meant a lot to you or did a great job of using analytics principles, please put them in the comments. Hopefully you enjoy.
Advanced Football Analytics
We can measure the values of situations and, by extension, the outcomes of plays by establishing an equivalence in terms of points. To do this we can start by looking back through recent NFL history at the ‘next points scored’ for all plays. For example, if we look at all 1st and 10s from an offense’s own 20-yard line, the team on offense will score next slightly more often than its opponent. If we add up all the ‘next points’ scored for and against the offense’s team, whether on the current drive or subsequent drives, we can estimate the net point advantage an offense can expect for any football situation. For a 1st and 10 at an offense’s own 20, it’s +0.4 net points, and at the opponent’s 20, it’s +4.0 net points.
WPA [Win Probability Added] is very sensitive to the context of the game. That same interception that cost -0.08 when a team was down by 7 points in the 2nd quarter would cost much more if the offense were leading by a point late in the 4th quarter. Putting your opponent in immediate field goal range would be nearly fatal.
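The "next points scored" idea above can be sketched in a few lines. This is only an illustration, not the model the quoted article actually uses: the function name `expected_points`, the situation tuple layout, and every number in `sample` are made up for the example.

```python
from collections import defaultdict

def expected_points(plays):
    """Average the signed 'next points scored' for each situation.

    plays: iterable of (situation, next_points) pairs. next_points is taken
    from the offense's point of view: +7 if the offense scores the next
    touchdown, -3 if the opponent kicks the next field goal, 0 if nobody
    scores again before the half ends.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for situation, next_points in plays:
        totals[situation] += next_points
        counts[situation] += 1
    return {s: totals[s] / counts[s] for s in totals}

# Hypothetical data. A situation is (down, distance, yards from own goal line).
OWN_20 = (1, 10, 20)
OPP_20 = (1, 10, 80)
sample = (
    [(OWN_20, pts) for pts in (7, -3, 3, -7, 0, 3)]
    + [(OPP_20, pts) for pts in (7, 3, 7, 3, 0, 7)]
)

ep = expected_points(sample)
print(ep[OWN_20], ep[OPP_20])
```

With real play-by-play data the same averaging would be run over every down, distance, and field position, usually with some smoothing because many situations are rare.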
A variable's standard deviation (SD) is a measure of the width of its distribution. The ratio of the SD of offensive SR [Success Rate] to the SD of defensive SR is 1.25. The ratio for EPA [Expected Points Added] is 1.27. And the ratio for WPA [Win Probability Added] is 1.26. Offenses are spread out 25% wider than defenses in terms of performance and impact on outcomes.
So NFL coaches are playing minimax after all! They’re just using a very simple payoff function for the value of each play: either success or failure. The correlation between offensive run and pass EPA is smaller than for SR, at 0.35. This is remarkable because it suggests that coaches are not as sensitive to the magnitudes of the payoffs as they are to the simple dichotomy of "success." This is understandable, because without an EPA model running in your brain, it’s impossible to accurately assess the value of the myriad possible outcomes on any given play. Coaches are human, and the easiest and simplest way to value outcomes is to say, "Yeah, I think we’re better off than before," or "Nope, that didn’t work."
That -0.57 is meaningful because it's measured in terms of the impact on winning, but it only measures half of the story. It only captures the half of plays that result in setbacks for the offense, so even the best teams will have a negative sum. Going a step further, however, we can calculate that the NFL average so far in 2010 is -0.38 -WPA for all 32 offensive lines. Chicago's offensive line, therefore, is responsible for -0.19 WPA per game more than the average team. After all that theorizing and number crunching, I can still wrap my head around that result: The Bears offensive line has played badly enough to cost their team a 19% chance of winning each game.
Football Perspective
Of course, not all patterns are real, and sometimes that rustle is just the sound of the wind. Just because you see a surprising split — maybe a player dominated the second half of the season after a slow start — doesn’t mean that the "trend" is real. For example, here are some splits from the 2011 season:
Reggie Wayne was much better against teams that wear the color blue than when facing teams that have no blue in their uniforms.
Being ahead late in games is strongly correlated with winning games, of course. And think what that means: quarterbacks who are ahead late in games play conservatively, which increases their completion percentage and decreases their interception rate. As it turns out, passer rating significantly overvalues those two statistics, which compounds the problem. Conversely, teams trailing in games are likely to lose, and also likely to throw riskier passes, lowering completion percentages and increasing interception rates. As a result, the driving factor behind the correlation between passer rating and wins is a third factor that causes both: leading late in games.
Pythagorean records aren’t perfect predictors of the future, and no one claims that they are. To the contrary, it is established that there are better models one could use to predict a team’s future record. That said, for predictive purposes, Pythagorean records certainly have one benefit: they are more predictive than actual win-loss records.
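For readers who have not seen a Pythagorean record computed, here is a minimal sketch. The excerpt above does not give a formula, so both the functional form and the exponent are assumptions: the form is the standard Pythagorean expectation, and 2.37 is a value often quoted for the NFL (the original baseball version uses 2). The function names are hypothetical.

```python
def pythagorean_win_pct(points_for, points_against, exponent=2.37):
    """Expected winning percentage from points scored and allowed.

    exponent=2.37 is an assumed, commonly quoted NFL value; it is not
    taken from the article quoted above.
    """
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)

def pythagorean_wins(points_for, points_against, games=16, exponent=2.37):
    """Expected wins over a season of `games` games."""
    return games * pythagorean_win_pct(points_for, points_against, exponent)

# Hypothetical team: 400 points scored, 300 allowed over 16 games.
print(round(pythagorean_wins(400, 300), 1))
```

A team whose actual win total sits well above or below this number is the kind of team the excerpt has in mind: its points differential says more about next year than its record does.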
Once we mark the time of score, it’s pretty easy to come up with an average score margin — the Game Script — at any point in the game. Let’s use a simple game as an example. In ’04, the Chargers went into Cleveland and won 21-0, scoring touchdowns in each of the first three quarters. How do we come up with the average score margin — the number I’m calling the Game Script?
The Chargers' first touchdown came with 2:13 to go in the first quarter, so for the first 12:47 (or 12.8 minutes), the Chargers were tied with the Browns, 0-0. The next score didn’t come until there was 2:24 left in the half, so for 14.8 minutes, the Chargers led by 7 points. The final score came with 8:36 left in the third quarter, which means the Chargers led by 14 points for 8.8 minutes, and then by 21 points for the final 23.6 minutes. The game script score for this game is therefore 12.04, the average scoring margin for the Chargers during this game.
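The Game Script arithmetic above is a time-weighted average of the score margin, and it can be reproduced with a short sketch. The helper names `elapsed` and `game_script` are made up for this example, and it assumes a 60-minute game with each touchdown and extra point counted as 7 points at the moment of the score.

```python
def elapsed(quarter, clock):
    """Seconds elapsed in the game, given a quarter and the time left in it."""
    minutes, seconds = map(int, clock.split(":"))
    return quarter * 900 - (minutes * 60 + seconds)

def game_script(score_events, game_seconds=3600):
    """Time-weighted average score margin.

    score_events: list of (elapsed_seconds, new_margin), sorted by time,
    where new_margin is the margin right after that score.
    """
    weighted_total = 0.0
    prev_time, margin = 0, 0
    for time, new_margin in score_events:
        weighted_total += margin * (time - prev_time)
        prev_time, margin = time, new_margin
    # Margin after the last score holds until the final whistle.
    weighted_total += margin * (game_seconds - prev_time)
    return weighted_total / game_seconds

# The Chargers-Browns game described above, from the Chargers' side.
chargers_events = [
    (elapsed(1, "2:13"), 7),
    (elapsed(2, "2:24"), 14),
    (elapsed(3, "8:36"), 21),
]
print(round(game_script(chargers_events), 2))  # 12.04
```

Working in seconds rather than rounded minutes avoids small rounding differences, and still lands on the 12.04 figure from the text.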
Football Study Hall
Football basically boils down to creating opportunities and converting them. You create opportunities based on where you start your drives (field position) and how well you move the ball (efficiency and explosiveness); that's a significant part of the equation. FBS teams created between 3.0 (Miami-OH) and 9.7 (Baylor) scoring opportunities (i.e. trips inside the 40) in 2013. They created scoring opportunities on between 23.5 percent (Miami-OH again) and 67.0 percent (Florida State) of their drives. You can overpower a team by simply creating more opportunities than they can either thwart or create themselves.
Dean Oliver's Four Factors for basketball, to which I made liberal reference in the initial Five Factors post, are unrelated to each other. From a statistical standpoint, your ability to shoot efficiently has nothing to do with your ability to rebound, and your turnover rates aren't tied to your ability to draw or avoid fouls. These football factors as originally structured, however, are tangled into knots. Success Rate, the primary efficiency measure, has a strong correlation with Points Per Play, my original explosiveness measure. Field position is dictated in large part by the quality of the offense and defense (i.e. efficiency and explosiveness). While finishing drives is its own skill to an extent, it still has a strong correlation to pure quality as well. And even turnovers are tied in part to quality among the randomness.
To describe those results as "compelling" would be selling them short. It's a landslide. On the final count, the higher-ranked team according to the recruiting rankings won roughly two-thirds of the time, and every "class" as a whole had a winning record against every class ranked below it every single year. (The only exception came last year, when "three-star" teams came up short in head-to-head meetings with "one-star" teams. Otherwise, the hierarchy held across every line.) The gap on the field also widened with the gap in the recruiting scores: While "one-star" recruiting teams fared slightly better against blue-chip opponents than "two-star" teams, both groups combined managed a grand total of 19 wins over "five-star" opponents in 112 tries. Broadly speaking, the final results on the field broke along a straight line demarcated on signing day.
In most games, teams were trying to average in the 32-36 range (win percentage in this range: 66 percent) instead of the 24-28 range (win percentage: 32 percent). [...]
Just go buy this year's or next year's Football Outsiders Almanac. As a football fan you will not regret it.
14% of the variation in a team's non-garbage time margin of victory can be explained by the difference between their sack rate on offense and their sack rate on defense. Teams that sack their opponents more than they get sacked themselves tend to outscore their opponents (I'm telling you guys, this is revolutionary stuff right here). An eight percent increase in sack rate margin (1 fewer sack on offense and 1 more sack on defense per 25 drop-backs) would be expected to increase your margin of victory by almost 5 points.
So we really do not see that much improvement as a kicker gets older. How many field goals you make on average is more about talent than experience (duh), but I thought there would be a little more evidence of progress in something that is so skill-based.
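To make the "eight percent" figure concrete, here is a small sketch of the sack rate margin arithmetic. The function names and the sign convention (defensive sack rate minus sack rate allowed, so positive is good) are assumptions for illustration, and the baseline of 2 sacks per 25 drop-backs on each side is made up.

```python
def sack_rate(sacks, dropbacks):
    """Sacks per drop-back."""
    return sacks / dropbacks

def sack_rate_margin(sacks_made, opp_dropbacks, sacks_allowed, own_dropbacks):
    """Assumed convention: defensive sack rate minus sack rate allowed."""
    return sack_rate(sacks_made, opp_dropbacks) - sack_rate(sacks_allowed, own_dropbacks)

# Hypothetical baseline: 2 sacks made and 2 allowed per 25 drop-backs.
before = sack_rate_margin(2, 25, 2, 25)
# One more sack on defense and one fewer allowed on offense.
after = sack_rate_margin(3, 25, 1, 25)

change = after - before
print(f"{change:.2%}")
```

Each single sack per 25 drop-backs moves one side's rate by 4 percentage points, which is where the combined 8-point swing in the margin comes from.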
To me it is hard to conclude anything from this plot other than this: How you get the ball has little impact on how you perform on that drive. It is mostly based on where you get the ball. Just in case, here is a plot showing the largest difference in the data, receiving the ball from a turnover on downs or from a punt.
If you get the ball inside your opponent's 10 and turn it over on downs, you are more likely than not to have your opponent just go ahead and punt the ball right back to you (the top line is Punt %, the line that crosses it at the end is Touchdown %). Perhaps these numbers should encourage more coaches to go for the first down on fourth down in the red zone rather than kick a field goal.
The main effect of the rule change is definitely the touchback frequency: the spike at your own 25-yard line is huge. But other than that, there have only been small shifts in kickoff returns. In total, the new kickoff rule changes moved the receiving team back by about an extra yard and a half per drive on average. Teams used to start the drive with about 72 yards to go, and teams now start their drives with 73.5 yards to go, on average. We actually could have calculated this without using any data on returns whatsoever. If we assume that 35% of kickoffs in 2012 were going to get the 5 yard boost from the new touchback spot, our average start spot would increase, in favor of the offense, by 1.75 yards (.35*5). If we assume that the other kickoffs that were returned had a 5 yard penalty from the new kickoff spot, that would hurt the offense by 3.25 yards (.65*5). Netting these out gives a difference of 1.5 yards in favor of the kicking team, which is exactly what we see empirically.
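The back-of-the-envelope calculation above can be written as a tiny function, which makes it easy to see how the result moves with the touchback rate. The function name and parameter names are hypothetical; the 35% rate and the two 5-yard shifts are the assumptions stated in the text.

```python
def net_start_shift(touchback_rate, touchback_gain=5.0, kickoff_move=5.0):
    """Average change in starting field position, in yards.

    Positive values favor the kicking team (the offense starts farther back).
    touchback_rate: share of kickoffs that end in a touchback.
    touchback_gain: yards the offense gains from the new touchback spot.
    kickoff_move: yards returned kickoffs lose from the new kickoff spot.
    """
    offense_gain = touchback_rate * touchback_gain      # .35 * 5 = 1.75
    offense_loss = (1 - touchback_rate) * kickoff_move  # .65 * 5 = 3.25
    return offense_loss - offense_gain

print(round(net_start_shift(0.35), 2))  # 1.5
```

Under these assumptions the two effects cancel exactly at a 50% touchback rate, and the rule would start favoring the offense above that.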
And my single most favorite sports analytics post of all time, The Dennis Rodman Series from Skeptical Sports. I still like to re-read this every couple of months.