Georgia Tech Football: Advanced Stats - Evaluating Coaches

Full-on nerdery afoot.

Georgia Tech v Boston College Photo by Maddie Malhotra/Getty Images

Yesterday, we focused on team evaluation metrics. Today, let’s switch gears and talk about evaluating coaches. To borrow my own words from 24 hours ago:

It’s incredibly important that we challenge traditional thinking in sport (regardless of if that’s on the field or off, but today, we’re focusing on the on-field stuff). Data analysis is not always about finding the most effective way to win; at its core, it’s about learning more about how the sport works.

And challenging traditional thinking is incredibly important when it comes to coaches, who are frequently some of the most traditional of traditionalists. As I mentioned in a previous piece on fourth down decision making:

Football traditionalists (which make up the vast majority of coaches) frequently preach conservatism and “trusting your defense” when it comes to fourth down. In years past, no one has ever been (seriously) hung out to dry in a press conference for punting on fourth down. Risk aversion is often rewarded, while risk taking is often criticized. Coaches will often argue that their gut told them that punting was the right decision, despite the butterfly effect that decision may have had in the game.

At the end of the day, coaches are evaluated on their trophy cases and win-loss record, but again, we’re concerned with process — how do they fill said trophy case? How do they win games? How do they create advantages for their teams on the field, on the playsheet, or on the practice field?


Why don’t we attempt to quantify the creation of said advantages? How can we effectively evaluate coaches on how they do their jobs and contribute to their win-loss records? Let’s first identify some important dimensions for evaluation:

  • Game management: how well does a coach do things that win them individual games? What trends are they following in their play calls? How have they performed on the margins in close games?
  • Recruiting: What is the level of talent that a coach brings to their campus?
  • Player Development: Once talent is on campus, how good is that talent? How well does that talent project to the pros?

The next step is evaluating these dimensions before we use them for evaluation (it’s evaluation all the way down!). How might we quantify each of these dimensions?

  • Game management: “Obvious Go” rate, early-downs pass rate, win percentage in one-score games

We talked about the concept of “obvious go” situations when faced with a punt-versus-go situation on fourth down in a previous piece. In short, this metric is the rate at which a coach attempts a conversion on a fourth down when expected to do so (by the model) and when a successful conversion would net them 1.5+ win probability points.
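To make that definition concrete, here is a minimal Python sketch of the calculation (the real work was done in R with cfbfastR, so treat this as an illustration only). The `FourthDown` record and its field names are hypothetical; the only things taken from the text are the "model says go" condition and the 1.5-point win probability threshold.

```python
from dataclasses import dataclass


@dataclass
class FourthDown:
    """One fourth-down decision. All field names are hypothetical."""
    model_says_go: bool          # the model recommended attempting a conversion
    wp_gain_if_converted: float  # win probability points gained by converting
    went_for_it: bool            # the coach actually attempted the conversion


def obvious_go_rate(plays, min_wp_gain=1.5):
    """Share of 'obvious go' situations in which the coach went for it."""
    obvious = [p for p in plays
               if p.model_says_go and p.wp_gain_if_converted >= min_wp_gain]
    if not obvious:
        return None  # no qualifying situations, so the rate is undefined
    return sum(p.went_for_it for p in obvious) / len(obvious)


plays = [
    FourthDown(True, 2.1, True),
    FourthDown(True, 1.8, False),
    FourthDown(True, 0.9, True),    # recommended, but below the 1.5 threshold
    FourthDown(False, 3.0, False),  # model says punt or kick
]
rate = obvious_go_rate(plays)
print(rate)  # 0.5: two obvious-go situations, one attempt
```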

Early-downs pass rate is a measure of in-game aggressiveness: passing on first and second down is known to net higher EPA than rushing on those downs, so how often does a coach’s offense pass in those situations?
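A minimal Python sketch of that rate, assuming each play is reduced to a hypothetical `(down, play_type)` pair; real play-by-play data would need its own filtering for penalties, kneel-downs, and so on, which this ignores.

```python
def early_down_pass_rate(plays):
    """plays: iterable of (down, play_type), with play_type 'pass' or 'rush'.

    Anything that is not a pass or a rush (punts, field goals) is ignored.
    """
    early = [ptype for down, ptype in plays
             if down in (1, 2) and ptype in ("pass", "rush")]
    if not early:
        return None  # no early-down plays to measure
    return early.count("pass") / len(early)


sample = [
    (1, "pass"), (1, "rush"), (2, "pass"), (2, "rush"),
    (3, "pass"),  # third down, not counted
    (1, "pass"),
    (4, "punt"),  # not an early down and not a pass or rush
]
pass_rate = early_down_pass_rate(sample)
print(pass_rate)  # 0.6: three passes on five early-down plays
```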

Win percentage in one-score games is how we are evaluating “winning on the margins”. This is the simplest of our metrics (and potentially flimsiest for analysis purposes, at least on the face), but in lieu of something along the lines of robust post-game win probabilities or second-order wins, this is the best we’ve got for now.
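For illustration, a minimal Python sketch of this metric. It assumes a "one-score game" means a final margin of eight points or fewer (a touchdown plus a two-point conversion); that cutoff is an assumption of this sketch, not something stated above, so it is left as a parameter.

```python
def one_score_win_pct(games, max_margin=8):
    """games: iterable of (points_for, points_against) final scores.

    max_margin=8 is an assumed definition of a one-score game.
    """
    close = [(pf, pa) for pf, pa in games if abs(pf - pa) <= max_margin]
    if not close:
        return None  # no close games, so the percentage is undefined
    wins = sum(pf > pa for pf, pa in close)
    return wins / len(close)


games = [(24, 21), (17, 20), (35, 10), (28, 27), (14, 21)]
close_pct = one_score_win_pct(games)
print(close_pct)  # 0.5: the 35-10 blowout is excluded, leaving 2 wins in 4
```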

  • Recruiting: class score

This is nothing more than a measure of the strength of a team’s recruiting class as calculated by 247Sports. You can see sample class scores from 2021 here.

  • Player Development: Draft Value Over Expected (DVOE), created by other-friend-of-the-program @drmartylawrenc1

There’s a lot that went into building DVOE (you can read our friend’s entire paper on the matter here), but here are the CliffsNotes:

The Draft Value data is from work by @statsbylopez. Draft Value and Draft Monetary Value are utilized for this analysis because they allow us to quantify the value of undrafted players (i.e., 0), whereas if the draft pick were used, the value to assign to undrafted players would be ambiguous.

To define an Expected Draft Value and Expected Draft Monetary Value, a blended logarithmic function was fitted to the rolling average data (black line in Fig 1 & 2). These “Expected” values follow a smoother monotonic relationship than the simple rolling average.

The difference between a recruit’s Actual Draft Value (DV) and Expected Draft Value (EDV) can be expressed as Draft Value Over Expected (DVOE).

Low-ranked recruits that were drafted early have high, positive values of DVOE. Conversely, high-ranked recruits that went undrafted have low and/or negative values of DVOE.
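The arithmetic is just DVOE = DV - EDV. Here is a minimal Python sketch of it; note that `expected_draft_value` is a made-up stand-in (a plain logarithmic curve with invented coefficients and an invented value scale), not the blended logarithmic fit from the paper.

```python
import math


def expected_draft_value(recruit_rank, a=80.0, b=10.0):
    """Hypothetical expected-value curve: decays with the log of recruit rank.

    recruit_rank 1 is the top recruit. The coefficients a and b are invented
    for illustration and are NOT the fitted values from the paper.
    """
    return max(0.0, a - b * math.log(recruit_rank))


def dvoe(actual_draft_value, recruit_rank):
    """Draft Value Over Expected: actual minus expected draft value."""
    return actual_draft_value - expected_draft_value(recruit_rank)


# A low-ranked recruit (rank 1500) who was drafted early: positive DVOE.
sleeper = dvoe(actual_draft_value=50.0, recruit_rank=1500)
# A high-ranked recruit (rank 5) who went undrafted (value 0): negative DVOE.
bust = dvoe(actual_draft_value=0.0, recruit_rank=5)
print(round(sleeper, 1), round(bust, 1))
```

Swapping in the real fitted curve would only change `expected_draft_value`; the subtraction stays the same.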

(A more rigorous scientific analysis would probably have checked our set of stats here and other stats for their correlations with win percentage or point differential, but we’re just here to play with data and have fun, aren’t we?)


With cfbfastR and the R programming language, we can grab and organize all of our metrics for all coaches between 2014 and 2020 (check the dataset and code out here). But now that we have our coaching evaluation dataset, what can we do with it? We could just pick out the top coaches in each metric, but wouldn’t it be nice to identify coaches who are similar to each other across all metrics? It turns out that by using a technique called k-means clustering, we can do just that. Friend-of-the-program @CFBNumbers on Twitter describes k-means clusters and how to build them thusly in his piece on quarterbacks in the NFL Draft:

K-Means Clustering is a very popular unsupervised method of machine learning that essentially attempts to take a dataset and group similar things together... It is a very simple and effective way to analyze similarities... I had to use principal component analysis (PCA) to reduce the number of mentions to the maximum of two dimensions. The number of clusters is something you have to manually choose, but using various techniques you can identify the optimal number of clusters.
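If you have never seen k-means up close, here is a bare-bones Python sketch of the idea: standardize each metric, then alternate between assigning every coach to the nearest centroid and moving each centroid to the mean of its members. This is not the code behind this piece (that was R); the coaches and numbers are invented, and the farthest-first seeding is a simplification chosen so the toy example is deterministic.

```python
import math


def standardize(rows):
    """Z-score each column so no single metric dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with deterministic farthest-first seeding."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids


# Invented rows: [obvious go rate, early-downs pass rate,
#                 one-score win pct, class score, DVOE]
coaches = {
    "Coach A": [0.70, 0.55, 0.40, 150.0, -2.0],
    "Coach B": [0.65, 0.52, 0.45, 160.0, -1.0],
    "Coach C": [0.68, 0.57, 0.42, 155.0, -1.5],
    "Coach D": [0.30, 0.40, 0.65, 280.0, 6.0],
    "Coach E": [0.35, 0.42, 0.60, 290.0, 7.0],
    "Coach F": [0.32, 0.38, 0.62, 285.0, 6.5],
}
labels, _ = kmeans(standardize(list(coaches.values())), k=2)
print(dict(zip(coaches, labels)))
```

The toy data uses two clusters to stay readable; `k` is just a parameter, and real data would usually want several random restarts rather than a single deterministic seeding.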

When we generate four k-means clusters (4-means clusters?) with our dataset, we get an image that looks something like this:

Doing some PCA on these clusters reveals what’s most important to each cluster (more on these later on):

Data from @cfbfastR. Figure by @akeaswaran.
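For the curious, here is a rough Python sketch of what "doing some PCA" on one cluster can look like: find the direction of greatest variance among that cluster's coaches and see which metric carries the largest loading. It uses power iteration on the covariance matrix to get only the leading component, which is a simplification; the metric names and numbers are invented, and this is one plausible reading of the step rather than the exact procedure used for the figure.

```python
import math


def first_principal_component(rows, iters=500):
    """Return (loadings, explained_share) for the leading component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d  # starting guess; assumed not orthogonal to the answer
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d))
                     for a in range(d))
    total_var = sum(cov[a][a] for a in range(d))
    return v, eigenvalue / total_var


# Invented, already-standardized metrics for the coaches in one cluster.
names = ["obvious_go_rate", "one_score_win_pct", "dvoe"]
cluster_rows = [
    [0.1, 0.0, -1.5],
    [-0.1, 0.1, -0.5],
    [0.1, -0.1, 0.5],
    [-0.1, 0.0, 1.5],
]
loadings, share = first_principal_component(cluster_rows)
top = max(range(len(names)), key=lambda j: abs(loadings[j]))
print(names[top], round(share, 3))
```

In this toy cluster the coaches differ almost entirely in DVOE, so that metric dominates the first component.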

The clusters aren’t perfectly separable, so the algorithm churned out two unique groups and two large, murky ones, but there are some interesting trends we can parse out given who is in each group and what dimensions are important to each group. For clarity, coaches’ names aren’t shown in the chart, so let’s do a roll call:

Cluster #1: “G5 Mains”

Notable Coaches: Nick Rolovich, Jay Norvell, Dino Babers, Chris Creighton, Josh Heupel

Analysis: Most of the coaches contained in this group have coached G5 teams or lower-tier P5 teams for most of their careers. From the PCA chart, you can see that this cluster generally had weaker recruiting classes and poor records in one-score games with not much more development than expected. These coaches are good at some of the game management pieces, but are held back (overall) by the talent on their rosters.

Cluster #2: “P5 Lifers”

Notable Coaches: Brian Kelly, Bryan Harsin, Dan Mullen, David Shaw, Lincoln Riley, Geoff Collins

Analysis: Most of these coaches were/are P5 head coaches, and boy, there are a lot of them. Remember how I said two clusters were unique and two clusters were murky? This one is definitely one of the murky ones (with the other being Cluster 1); there’s nothing that sets any coach in this group significantly apart from the others (another cluster would have shown up in some of the pre-clustering analysis if so), but on the whole, this group is almost the opposite of Cluster 1: these coaches pair strong recruiting classes with close wins, but they also do not generate much in the way of draftable talent.

Cluster #3: “Weird Offense Friends”

Notable Coaches: Willie Fritz, Jeff Monken, Lane Kiffin, Paul Johnson, Ken Niumatalolo

Analysis: Here’s where we get into the unique clusters. I found it extremely interesting that the clustering algorithm was able to identify this group of mostly triple-option or triple-option-adjacent offensive disciples from the data. There are a few outliers in this group (Steve Addazio, Rocky Long, and Luke Fickell in particular, at least to my eye), but on the whole, you see a lot of coaches here that run unique offenses with some sort of option element, whether that’s pure triple (Johnson + Lunsford + his academy counterparts), RPO sets (Kiffin, Drinkwitz, Satterfield), or a fusion of both (Healy and Fritz).

Cluster #4: “Consistently Good”

Notable Coaches: Nick Saban, Dabo Swinney, Ryan Day, Urban Meyer, Kirk Ferentz, Chris Petersen

Analysis: This group is the creme de la creme of college football in terms of consistent success, plus Kevin Sumlin and Ed Orgeron. At first glance, it doesn’t make a whole lot of sense for these two to be included in the same echelon as Saban and Dabo. However, the PCA for this cluster clues us in to why they’re named here: the most important dimension here was DVOE. These guys all put a bunch of talent into the draft, with Sumlin most likely living off of Johnny Manziel’s and Mike Evans’ selections in 2014 and Orgeron benefitting heavily from the exodus of players from his 2019 LSU team, tabbed by some as “possibly the best CFB team ever”.

Wrapping up

Using k-means clustering, we’ve identified groups of similar coaches and why they’re similar, but we can also identify what sets certain coaches (and certain styles of play) apart. Like we’ve been trying to do all along, we’ve uncovered trends in process: we’ve identified how coaches operate in similar ways, and, by using metrics that (in theory) correlate with wins, we’re seeing how those operations affect their results. The bottom line is always going to be the bottom line — wins are always gonna be the deciding factor for a coach’s fate — but identifying and evaluating how different coaches get to that bottom line can be incredibly valuable.

All data provided by @cfbfastR. If you’re interested in toying around with the coaching dataset, you can find that here.