When last we left the large internet tome on fourth-down decision-making in college football in February 2021, I closed (in part) with this:
[T]ruth be told, it seems like some teams are using data-driven thinking as that way forward — the national decision-making chart above seems to indicate that there’s an increasing trend towards trusting the data in the sport in the last four years, much like there has been across sport in general over the same timeframe. It seems grandiose to stake that claim, especially given that we established that only 10% of 130 FBS programs are truly optimizing their fourth-down decision-making. However, seeing the underlying trend towards higher and higher “go” rates gives me hope that there are football programs out there that are putting the copious amounts of data they have to work to find and optimize not just fourth down performance, but also other general program inefficiencies.
At the time and based on the four years of data we had at our disposal, that hope in data-driven decision-making seemed justified: as more coaches versed in insights on in-game margins (like fourth-down decision-making) would matriculate into higher and higher positions on coaching staffs, coaches would (on the whole) get better at making these kinds of decisions.
To that end, there’s some evidence to support a continued commitment to mining these in-game margins: earlier this year, Max Olson at The Athletic reported on the use of Championship Analytics’s fourth-down charts book at a variety of FBS programs across the country, and the 2023 season itself featured a number of high-leverage fourth-down decisions (albeit failures, in the most famous of cases). Additional reporting from The Athletic’s Seth Emerson suggests that after decades of conservatism on fourth downs, some coaches believe their counterparts have now gone too far in the opposite direction.
The coach who went on record with this latter belief has pedigree, but frankly I’m not convinced he’s the most reliable arbiter on this matter, nor do I agree with his position. The fundamentals of decision-making on in-game margins are simple. As I mentioned in a mailbag a few months ago:
In these situations, we have to think less of risk minimization and more of reward maximization — at the end of the day, you play to win the game.
With another college football season now complete, there’s no better time to interrogate publicly stated assumptions about the sport, and with newer developments in public college football analytics, we can now take a look at fourth-down decision-making across the entire four-team College Football Playoff era (2014 to 2023). Did our hope for more data-driven decision-making in 2021 actually bear fruit? Let’s find out.
As before, I’ll defer to the cfb4th model built by Jared Lee to evaluate coaches’ fourth-down decisions. Jared introduced this model in full detail on UteZone back in 2021, but here are the meat and potatoes:
[Y]ou start by seeing how likely every possible number of yards gained is based on past fourth down attempts and on a number of different factors such as distance to the first down, field position, strength of the offense, and strength of the defense...Adding in 3rd down plays helps to tell how teams perform when they’re trying to get a first down. Once you have the probabilities for each possible outcome, you do the same thing as field goals and punts and calculate what the win probability would be for that result and average the results together....Once you know the expected win probability of each decision, it is as simple as picking the play call that gives you the highest chance of winning the game.
If you’re interested in reading through the source code for the model, it’s available here. This model makes heavy use of cfbfastR’s win probability model, which you can read about here (the link is to the NFL version, but the CFB one is similar). For our purposes, I’ve only considered FBS versus FBS games. The code used to build the following charts and tables can be found here.
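To make Jared's description a bit more concrete, here is a toy Python sketch of the final step: averaging win probability over each choice's possible outcomes and picking the highest. This is not the cfb4th code; the `Outcome` class, the function names, and every number below are made up for illustration, and the real model estimates the outcome probabilities and win probabilities from play-by-play data.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    probability: float  # chance this outcome happens (hypothetical input)
    win_prob: float     # offense's win probability if it does (hypothetical input)


def expected_win_prob(outcomes):
    """Probability-weighted average of win probability across outcomes."""
    total = sum(o.probability for o in outcomes)
    if abs(total - 1.0) > 1e-6:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(o.probability * o.win_prob for o in outcomes)


def best_choice(options):
    """Return (choice, expected_wp) for the choice with the highest expected WP."""
    scored = {name: expected_win_prob(outs) for name, outs in options.items()}
    name = max(scored, key=scored.get)
    return name, scored[name]


# Illustrative numbers only; these do not come from the model or any real game.
options = {
    "go": [Outcome(0.60, 0.58), Outcome(0.40, 0.44)],          # convert / fail
    "field_goal": [Outcome(0.37, 0.55), Outcome(0.63, 0.45)],  # make / miss
    "punt": [Outcome(1.00, 0.49)],                             # single typical result
}

choice, wp = best_choice(options)
print(choice, round(wp, 3))
```

The real model considers every possible yardage gained on a conversion attempt rather than a simple convert/fail split, but the averaging and the "pick the max" step work the same way.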
Throughout this piece, I’ll refer to some fourth-down situations as “obvious”. In the parlance of our study, this will mean a fourth down that meets the following criteria:
- The optimal choice (i.e., the choice with the highest expected win probability) has an expected win probability at least 1.5% higher than that of every other choice, AND:
- It is the first quarter, OR:
- The offense has a win probability greater than 10% before the fourth down attempt.
Why an arbitrary difference of 1.5%? This dividing line is borrowed from the NFL version of this analysis, which inspired Jared’s development of a CFB-specific model and Twitter bot (RIP).
Why the added filters on quarter and win probability? We want the situations we discuss to be “meaningful”: that is, they should have some impact on the game. A ‘go’ decision by a team down 25 points near the end of the fourth quarter is not going to have a significant bearing on the final result, while one in the first quarter in a (usually) tied or close game probably will. These filters are also borrowed from the NFL version of this analysis.
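For the code-minded, the criteria above can be sketched as a small Python filter. A few assumptions on my part: win probabilities are fractions between 0 and 1, the 1.5% gap is treated as an absolute difference of 0.015 between the best and second-best choices, and ‘obvious-go’ rate means the share of obvious "go" situations in which the team actually went for it. The function and constant names are hypothetical and are not taken from cfb4th.

```python
OBVIOUS_MARGIN = 0.015   # assumed: 1.5% read as an absolute gap in win probability
MIN_PRE_SNAP_WP = 0.10   # offense must have more than a 10% chance to win


def classify(expected_wp, quarter, pre_snap_wp):
    """Return the optimal choice if the situation is 'obvious', else None.

    expected_wp maps each choice (e.g. "go", "punt") to its expected win probability.
    """
    if len(expected_wp) < 2:
        raise ValueError("need at least two choices to compare")
    ranked = sorted(expected_wp.items(), key=lambda kv: kv[1], reverse=True)
    (best, best_wp), (_, second_wp) = ranked[0], ranked[1]
    clear_edge = best_wp - second_wp >= OBVIOUS_MARGIN
    meaningful = quarter == 1 or pre_snap_wp > MIN_PRE_SNAP_WP
    return best if clear_edge and meaningful else None


def obvious_go_rate(decisions):
    """Share of obvious-go situations where the team actually went for it.

    decisions is a list of (recommended_or_None, actual_choice) pairs.
    Returns None when there are no obvious-go situations.
    """
    obvious_go = [actual for recommended, actual in decisions if recommended == "go"]
    if not obvious_go:
        return None
    return sum(actual == "go" for actual in obvious_go) / len(obvious_go)


# Illustrative values only.
print(classify({"go": 0.524, "field_goal": 0.487, "punt": 0.49}, quarter=2, pre_snap_wp=0.50))
print(classify({"go": 0.08, "punt": 0.04}, quarter=4, pre_snap_wp=0.05))
```

The second call returns `None` even though the gap is large, because it is neither the first quarter nor a game the offense has a realistic shot at: exactly the garbage-time case the filters are meant to exclude.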
On the whole, college coaches are getting more aggressive on fourth downs — and as they do, they’re taking the time to be smarter. There could be a variety of reasons for this: the proliferation of the aforementioned Championship Analytics fourth-down book; younger, more data-savvy assistants; or the development of homegrown analytics departments. However, I wonder if the answer is higher level and much simpler: football is just a copy-cat sport. If a coach sees one of their counterparts doing something that leads to successful outcomes, it’s possible they implement that same thing in their own program. We’ve seen ideas diffuse this way before: see RPOs, motion, etc — and at the end of the day, like those concepts, data-driven thinking is just another tool in a coach’s toolkit.
Let’s take a look at the 2023 season in particular:
- Louisiana Tech topped the net and total win-probability gained leaderboards by a substantial margin. They made critical decisions when it mattered — but football is a capricious sport and LaTech couldn’t close the gap in other ways in a bunch of one-score games on their schedule (versus North Texas, versus WKU, at MTSU, and versus NMSU).
- Washington and Oregon both made some high-profile fourth-down decisions this season (a few in games versus each other), but the Huskies got the most bang for their buck, along with a nice, shiny Sugar Bowl trophy and a National Championship Game appearance.
- Texas Tech played five games that ended within one score, going 2-3 — it’s possible that without their aggressiveness on fourth downs, things could have gone even worse.
- To no one’s surprise, Iowa’s lack of bite on offense stretched to fourth downs. At least their punter set a record or two.
- For a team that ended up winning the Mountain West, Boise State forfeited a surprising amount of win probability (in aggregate) in ‘obvious-go’ situations. Could more aggression in one-score games (like those against Central Florida, Colorado State, and Memphis) have helped Andy Avalos keep his job earlier in the year? Maybe!
The ACC is...volatile. The conference has generally tracked with national trends in ‘obvious-go’ rate, but you’ll notice a few blips above and below the national average in the table above. Very curious.
- Virginia...what are we doing here?
- In a critical year for Jeff Hafley, the Boston College coach nailed all eight of his team’s important fourth-down decisions and (most likely) bought himself another year on the job.
- Wake Forest sitting atop the ACC in win-probability gained does not surprise me one bit; we know Dave Clawson has a penchant for offensive aggression and forward thinking.
- Miami negating its own accrued advantages is so wonderfully on-brand.
It’s best we break Tech’s data down coach by coach.
Paul Johnson: very aggressive
Johnson obviously had his faults when it came to fourth down tactics (an oft-used hard count play that usually failed and burned a timeout comes to mind), but his performance overall was impressive. Given my memory of his tenure, I expected nothing less.
Geoff Collins: somehow? aggressive?
I had to do a double take after generating this table: this result definitely did not line up with my priors. It’s a fascinating thought experiment to think about how Collins showed this much prowess at these critical decisions, despite the rest of his tenure being so poor. It’s possible the answer is as simple as “game state matters”: these attempts may have been at lower win-probability marks in an effort to claw back into games rather than salt them away OR continue fighting blow for blow.
Brent Key: conservative to an extreme fault, but in small sample
Here’s where things get particularly squirrely: Key has proven himself...divisive on fourth-down conversion attempts. He’s a first-time head coach, so we can expect some growing pains, but I’ve felt that there have been more than a few missed opportunities to shift the odds in Tech’s favor in the past 21 games. Two decisions I found particularly lacking:
- 2022 versus Duke: Key punted on 4th & 2 from the Duke 36 with a minute left in the first half. The punt ended up being a touchback, so Duke was only really set back 16 yards. Our model suggested that Key go for it here, with an expected win probability gain of 4.5%. He also had the option to attempt a field goal, but given an average field goal accuracy of 37% from 53 yards out, that would have resulted in nearly the same net win probability as punting. This game ended up going to overtime with Tech forcing Duke into a difficult 52-yard field goal to extend the game. Duke missed, Tech won: case closed, right? I’m not so sure: it’s entirely possible that if Tech could have gained two yards in this spot and scored before the end of the half, it could have avoided overtime entirely.
- 2023 versus Athens: We discussed this one at length in a mailbag and a podcast shortly after the game, so I’ll reuse my answer from the mailbag (mostly). To recap the game situation: Tech had a 4th and 1 at the Athens 8 with the game tied near the end of the first quarter.
Listen to the clip for a longer, more nuanced discussion of the matter, but bottom line: based on the data (even with the caveats we discuss), it’s a nearly indefensible decision to kick...at the end of the day, you play to win the game. This bears out in the data, which effectively states the following:
1. You’re at the +8. If you fail to convert, the most likely outcome is that the opponent gets the ball back at their own 8, which is a far more difficult position for them to be in than their own 25 off a touchback.
2. You’re playing the #1 team in the country (in your own house!). You are expected to lose by 24+ points. You have nothing to lose by making the most high-risk, high-reward choices whenever possible (and this one doesn’t even have particularly high risk!).
Our model once again suggested that Key go for it here, with an expected win probability gain of 2.5%. In choosing to kick here (and therefore not at worst extending the drive and at best scoring a touchdown), Key most likely left four points on the table in a game that ended in an eight-point margin.
Despite the talk of coaches attempting conversions too often on fourth down, they sure don’t seem to actually do that.
All too often, we tend to remember the public failures of such decisions and judge them by gut feeling rather than thinking about all of the times these decisions worked out. Remember: the goal is reward maximization instead of risk minimization.
Bottom line: by our measures, college football coaches have indeed gotten smarter and smarter on fourth down, and there’s limited evidence to support any notion of dogmatic overcorrection. Moreover, it’s reasonable to expect coaches to continue to get smarter at this as they collect more information to have directly at hand (e.g., the Championship Analytics book, StatsBomb’s American Football product) or hire more analytics-minded staffers (see: Michigan, Pitt, UVA, etc.).
While we’ve discussed the outcomes of the fourth-down model to death, it’s important to note some of its inherent limits.
The expected points model that backs the fourth-down model has built-in sample size bias: better teams drive more often and score more often than worse teams (great talk on this here if you’d like to learn more). Thus, we’re producing expected point values based on a skewed distribution of historic performance rather than a sample that represents every team evenly. Since win-probability models depend on expected points to measure in-game performance, this underlying data skew also affects how we calculate win probability for each available choice on a fourth down and thus which option we deem “optimal” based on the expected win probability of its outcomes.
Additionally, these models don’t account for all situational factors: the fourth-down and win-probability models don’t know things like weather, quality of defense faced (though using the Vegas spread provides us some idea of relative team strength), offensive and defensive injuries, kicker quality (see: Tucker, Justin), etc. It’s possible that internal team versions of these models have the ability to account for these factors, as might the Championship Analytics fourth-down book — but alas, no one has yet been willing to give up either of these golden geese.
Finally, ‘obvious-go’ rate is not particularly stable (i.e., the correlation of this rate for a team in Year 1 and the same team’s rate in Year 2 is positive but weak: only 0.16), and as a result, it’s hard to say anything definitive about a specific coach’s tendencies over time. At the same time, football is a small-sample-size sport, and these ‘obvious-go’ situations are no different: they’re inherently affected by game flow, opponent strength, and opportunity (to that end, the stability of the number of ‘obvious-go’ situations is 0.24, so similarly positive and weak). If we had more than a (small) handful of these situations to evaluate coaches with year-over-year, maybe we’d be able to come to firmer conclusions about coaching philosophy.
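For anyone who wants to run a similar stability check on their own numbers, here is a minimal Python sketch of a year-over-year correlation. It assumes a plain Pearson correlation between each team's rate in one season and its rate in the next; the data layout, function names, and team labels are hypothetical, and this is not the code behind the 0.16 and 0.24 figures above.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


def year_over_year_pairs(rates):
    """Pair each (team, season) rate with the same team's rate one season later.

    rates maps (team, season) -> rate. Team-seasons with no following season
    in the data are skipped.
    """
    current, following = [], []
    for (team, season), rate in rates.items():
        next_rate = rates.get((team, season + 1))
        if next_rate is not None:
            current.append(rate)
            following.append(next_rate)
    return current, following


# Made-up rates for hypothetical teams, for illustration only.
rates = {
    ("Team A", 2021): 0.40, ("Team A", 2022): 0.55, ("Team A", 2023): 0.35,
    ("Team B", 2021): 0.60, ("Team B", 2022): 0.50, ("Team B", 2023): 0.70,
    ("Team C", 2021): 0.25, ("Team C", 2022): 0.45, ("Team C", 2023): 0.40,
}
current, following = year_over_year_pairs(rates)
print(len(current), round(pearson(current, following), 2))
```

A value near 1 would mean a team's rate this year tells you a lot about next year's; values near zero, like the ones reported above, mean it tells you very little.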
Even with all of these caveats, we can still pick out overarching trends: the direction and the idea of progress here are still valid, even if the magnitude may (possibly) be less so. With that in mind, compare where college football is right now to the pros. The NFL treats fourth down as almost a solved problem: going for it on fourth down is the new normal.
The NFL has the market cap and the creativity of front offices to afford to innovate and research to find edges in the sport. Can college football materially benefit from drafting off innovation from the pros? Yes, certainly: 2023 Washington has been a pre-eminent example of how this can work out for programs that listen to the data in front of them (see above).
But consider this: the NFL mandates parity in roster resources and mechanisms, while college football...decidedly does not. This lack of built-in competitive balance naturally affects the importance of finding and taking advantage of in-game margins. One might argue that there’s more parity amongst the top 15 teams in the nation, but even then, we’ve seen how far ahead of the rest of the pack the three best-recruiting teams are. So what’s the point of winning on the margins if it might not lead to winning on the field at the end of the day?
Well, we know targeted strategies work in this sport. Georgia Tech fans saw this first hand for a decade with the flexbone option: there are ways to control your range of outcomes by controlling who has the ball, how long they have it for (measured by minutes/plays/yards/etc), and what they do with it. There are certainly paths for smart teams to utilize unique schemes and data to find success by analyzing in-game margins. In the face of overwhelming talent deficits, they just have to stack those advantages to make them count on the scoreboard.
Special thanks to those who built the resources that made this writeup possible: Jared Lee for cfb4th and his Twitter bot, Saiem Gilani and others for cfbfastR, and Andrew Weatherman for cbbplotR and cbbData (used to make very pretty tables).