Georgia Tech Football: Advanced Stats - Team Evaluation Metrics

Question everything.

Georgia Tech v Florida State Photo by Don Juan Moore/Getty Images

In the latter stages of 2019, I wrote a novella walking through my first efforts building models based on the work of ESPN’s Bill Connelly. Of all of the words I wrote in that piece, this short selection remains most salient:

College football is an inherently human and physical endeavor, but quantifying and analyzing each and every play helps us question and verify football conventions (“establishing the running game”, “punting is always good”, “don’t risk fourth down attempt in no-man’s-land”) and discover new ways to optimize and evaluate performance in the sport.

It’s incredibly important that we challenge traditional thinking in sport (whether that’s on the field or off, though today we’re focusing on the on-field stuff). Data analysis is not always about finding the most effective way to win; at its core, it’s about learning more about how the sport works. We’re popping open the metaphorical hood and figuring out what makes the sport and the actions inside it tick. What teams do with more information about trends and underlying metrics in their sport is somewhat secondary (at least if you’re not working for one of those teams). What matters is the process: building data sets, identifying and evaluating trends, and communicating those trends.

This opening philosophical bluster brings us to today’s target of analytical criticism: traditional team evaluation metrics — how do they work? Are they Bad™️? Why?

Let’s break this down one rhetorical question at a time:

What do we mean when we talk about “traditional team evaluation metrics”? Well, you might have seen a pop-up on a broadcast that looks something like this:

Here, the broadcast is relying on offensive yardage to tell you something about San Diego’s former NFL team. Typically, these rankings are based on per-game averages or season totals. “Averaging these stats across games seems like a reasonable basis of comparison,” one might say. “They don’t seem bad to me.”

Well, mysterious anonymous fan: you’re not wrong! Using these stats (and similar ones) is a perfectly reasonable way to evaluate a team’s performance, and it’s one we’ve used for over 100 years at the pro level and 150 years at the college level. At a surface level, these metrics make sense: you want to compare teams’ averages because they’ve played different numbers of games, and using the ranking rather than the raw number provides a base level of comparison against other teams.

“So, what’s the problem here?”

Well, there are a few that I see:

  1. Per-game averages are still aggregates just like season totals are, just over a smaller time scale.
  2. In the NFL and college football, each team doesn’t play the same set of teams. There’s no adjustment based on the difficulty of a team’s schedule. We might talk about a team’s yards-per-game rank as inflated or affected by their schedule in passing, but the actual number doesn’t account for this.
  3. These are descriptive statistics, not predictive ones: they say nothing about projected performance in future games or against future opponents.
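To make the first point concrete, here is a minimal Python sketch using made-up numbers for two hypothetical teams (the `GameLog` type, the team names, and every figure are illustrations, not real data). Both teams post the same yards per game, but the per-game aggregate hides how many plays each one needed to get there.

```python
from dataclasses import dataclass


@dataclass
class GameLog:
    yards: int  # total offensive yards in the game
    plays: int  # total offensive plays in the game


def yards_per_game(games):
    """The traditional broadcast-style aggregate: total yards / games played."""
    return sum(g.yards for g in games) / len(games)


def yards_per_play(games):
    """A per-play view of the same data: total yards / total plays."""
    return sum(g.yards for g in games) / sum(g.plays for g in games)


# Made-up game logs for two hypothetical teams.
team_a = [GameLog(yards=450, plays=90), GameLog(yards=350, plays=70)]
team_b = [GameLog(yards=450, plays=60), GameLog(yards=350, plays=60)]

for name, games in (("Team A", team_a), ("Team B", team_b)):
    print(name, yards_per_game(games), round(yards_per_play(games), 2))
```

A yards-per-game ranking would treat these two teams as equals, even though the second one gains noticeably more on each snap. Neither number says anything about who those yards came against, which is the second problem on the list.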

Let’s double back to a point we covered when we talked about traditional box scores last year:

Big numbers look great on the stat sheet and racking up yardage does help teams win games, but (they) tell us nothing beyond what happened. We want to know how it happened.

Simply put, these on-screen rankings tell an incomplete story. The time interval they’re collected over and the context they provide are useful for at-a-glance comparison, but they raise too many questions under further analytical scrutiny. Therefore, a better evaluation metric must:

  1. Evaluate team performance on a per-play or per-drive basis, as those are more basic building blocks to success during a game, AND:
  2. Adjust for scheduling differences, AND:
  3. Effectively project team performance in the future.

This is where our oft-mentioned friend Bill Connelly comes into play: SP+ (née S&P+), developed in the late aughts, accomplishes these three bits (well... let’s call it 2.5; more on this in a bit). The system and its formulas have evolved over the past almost-decade, but the original conceit (which we’ll focus on here) is in the name: SP+ stands for Success Rate (and) Isolated Points per Play (AKA: IsoPPP) plus, with the plus denoting comparison to league average. If this looks familiar to you, it’s because baseball’s OPS+ is built the same way: two rate stats combined and indexed to league average. In fact, in his 2013 book Study Hall, Connelly notes that OPS was what inspired the original formula. So, how does SP+ meet our criteria?

  1. Success Rate and IsoPPP are inherently per-play metrics. We talked about EqPPP here last year, and Robert defined success rate for us here yesterday.
  2. Remember how I noted that SP+ only met 2.5 of our three criteria? Here’s where the half comes into play: adjusting this composite such that its national average equals the national average in points per game isn’t a true opponent adjustment, but it removes some team-specific context from the raw metric.
  3. Success Rate and IsoPPP consistently show a strong positive correlation with per-game point differential. Simply put, a team that has good success rate and IsoPPP marks will tend to continue to have good success rate and IsoPPP marks, AND if said team were compared to an opponent who had worse success rate and IsoPPP marks, that team would be expected to win a game versus said opponent.
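For the curious, here is a rough Python sketch of how those pieces could fit together. It is a toy under stated assumptions, not Connelly's actual formula. The success thresholds (50% of the needed yards on first down, 70% on second, 100% on third and fourth) are the commonly cited definition and are assumed here rather than taken from this article. IsoPPP is treated as average equivalent points on successful plays only. The `eq_points` values are made up. `plus_index` borrows the OPS+ arithmetic, and `rescale_to_points` assumes a simple multiplicative scaling for the "national average equals national points per game" step. All names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Play:
    down: int         # 1 through 4
    distance: int     # yards needed for a first down
    gained: int       # yards gained on the play
    eq_points: float  # made-up equivalent-points value of the play


# Assumed share of the needed yardage a play must gain to count as a success.
SUCCESS_SHARE = {1: 0.5, 2: 0.7, 3: 1.0, 4: 1.0}


def is_success(play):
    return play.gained >= SUCCESS_SHARE[play.down] * play.distance


def success_rate(plays):
    """Share of plays that were successful (efficiency)."""
    return sum(is_success(p) for p in plays) / len(plays)


def iso_ppp(plays):
    """Average equivalent points on successful plays only (explosiveness)."""
    values = [p.eq_points for p in plays if is_success(p)]
    return mean(values) if values else 0.0


def plus_index(sr, iso, league_sr, league_iso):
    """OPS+-style index: 100 means league average (assumed combination)."""
    return 100 * (sr / league_sr + iso / league_iso - 1)


def rescale_to_points(ratings, national_ppg):
    """Scale ratings so their mean equals national points per game (assumed)."""
    factor = national_ppg / mean(ratings.values())
    return {team: rating * factor for team, rating in ratings.items()}


# Four made-up plays from a hypothetical drive.
plays = [
    Play(down=1, distance=10, gained=6, eq_points=0.5),
    Play(down=2, distance=4, gained=2, eq_points=0.1),
    Play(down=3, distance=2, gained=15, eq_points=1.5),
    Play(down=1, distance=10, gained=3, eq_points=0.2),
]

rating = plus_index(success_rate(plays), iso_ppp(plays),
                    league_sr=0.4, league_iso=0.8)
print(success_rate(plays), iso_ppp(plays), round(rating, 1))
print(rescale_to_points({"Team A": 150.0, "Team B": 50.0}, national_ppg=28.0))
```

Note that the rescaling step only changes the units of the rating. It does not know who a team has played, which is why it earns half a point on the opponent-adjustment criterion rather than a full one.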

Thus, by using SP+ and other stats that meet our three criteria, we’re able to better compare teams by identifying where they’re going compared to national average, rather than where they’ve been. We’re able to evaluate teams based on whether they would beat other teams in a vacuum (i.e., on a neutral field), rather than based on whom they have already beaten. We haven’t even touched on more complex metrics like EPA and CPOE here; despite relying on less advanced stats, we’ve still been able to better analyze college football games and team performances just by popping open the hood, running a couple of diagnostics, and seeing what makes post-game score-lines tick. This is what modern sports analytics is all about: thinking critically about traditional forms of team evaluation, figuring out where they fall short, and building better, more informative ways forward.

Want to learn more about college football analytics? Revisit some of FTRS’ previous analytics articles here. Robert’s advanced stats recaps from the 2020 season can be found on his profile. You can also visit Football Study Hall for more analytics thinkpieces.