Projecting and Predicting the NCAAT: A discussion with Dr. Joel Sokol

Every year, people ranging from the hoops diehard to the casual sports fan race to fill out their brackets following Selection Sunday. Methodologies for filling out the brackets include picking based on RPI, picking based on site proximity to competing universities, and even picking based on which mascots would win in a cage fight. Joel Sokol is an associate professor in the School of Industrial and Systems Engineering at the Georgia Institute of Technology and has been an influential figure in devising the "LRMC System," a new methodology for predicting the outcome of the NCAA tournament.


Dr. Joel Sokol

Professor Sokol's method has been featured on ESPN and even CNN thanks to the impressive accuracy of his system. FTRS wanted to talk to Dr. Sokol directly and learn more about his system and methodology. Here is the first of three parts in a series of questions Winfield and I e-mailed to Professor Sokol in an effort to better grasp his NCAA hoops prediction system.

FTRS: Since we are a Georgia Tech blog, can you give us a synopsis of your relationship with Georgia Tech? Do you root for Georgia Tech or any other universities in college athletics?

Sokol: This is my 11th year as a professor at Tech, so I certainly do root for them. I also root for my undergraduate alma mater, Rutgers. If they're playing head-to-head, I'll usually root for whichever team needs the victory more.

FTRS: What is the LRMC system?

Sokol: The LRMC system is our method of predicting which teams are likely to do well in the NCAA (basketball) tournament. ("Our" means that I originally developed it with Paul Kvam (ISyE professor), and since then have also worked on it with George Nemhauser (ISyE professor) and, most recently, with Mark Brown (math professor at CCNY -- which happens to be the only school to ever win the NCAA and NIT tournaments in the same year).)

"LRMC" stands for "logistic regression/Markov chain", which are the two mathematical models we combined to create our rankings.
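
[For readers who like to tinker: below is a toy sketch of how a logistic curve and a Markov chain can be chained together to rank teams. It is not Dr. Sokol's implementation. The game records, the `scale` and `home_edge` values, and the function names are all made up for illustration, and the real LRMC model fits its coefficients from actual results. The idea is that a logistic function turns each home margin into a guess at how likely the home team is the better one, and an imaginary voter then hops between teams according to those guesses. Teams where the voter spends the most time rank highest.]

```python
import math

# Hypothetical game records: (home team, away team, home margin of victory).
# A negative margin means the home team lost.
games = [
    ("A", "B", 12),
    ("B", "C", 3),
    ("C", "A", -15),
    ("A", "C", 8),
    ("B", "A", -1),
]


def prob_home_better(margin, scale=0.1, home_edge=3.0):
    """Logistic guess that the home team is the better team.

    `scale` and `home_edge` are illustrative assumptions, not fitted values.
    """
    adjusted = margin - home_edge  # discount the home-court advantage
    return 1.0 / (1.0 + math.exp(-scale * adjusted))


def markov_ratings(games, iterations=200):
    """Long-run share of time a random 'voter' spends backing each team."""
    teams = sorted({team for game in games for team in game[:2]})
    played = {team: 0 for team in teams}
    for home, away, _ in games:
        played[home] += 1
        played[away] += 1

    # transition[i][j]: chance a voter backing team i switches to team j.
    transition = {i: {j: 0.0 for j in teams} for i in teams}
    for home, away, margin in games:
        p = prob_home_better(margin)
        transition[home][home] += p / played[home]
        transition[home][away] += (1.0 - p) / played[home]
        transition[away][home] += p / played[away]
        transition[away][away] += (1.0 - p) / played[away]

    # Power iteration: start with an even split and let the voter wander.
    rating = {team: 1.0 / len(teams) for team in teams}
    for _ in range(iterations):
        rating = {
            j: sum(rating[i] * transition[i][j] for i in teams) for j in teams
        }
    return rating


ratings = markov_ratings(games)
for team in sorted(ratings, key=ratings.get, reverse=True):
    print(team, round(ratings[team], 3))
```

[In this made-up schedule, team A wins every game it plays, so the voter ends up spending the largest share of time on A.]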

FTRS: What variables are taken into account for the LRMC system? What factors seem to be the most indicative of a successful NCAA tournament team? Are there any factors that sports media talking heads like to attribute to successful teams that statistically appear to be unimportant?

Sokol: LRMC was designed to use only basic scoreboard data that's available to everyone -- it looks at the winner, loser, location, and margin of victory of each game between Division I teams. [At the time, that's all that was available. These days, you can find all sorts of other neat numbers about each team at Ken Pomeroy's site.]

Among the factors that LRMC considers, the teams that tend to do well in the NCAA tournament generally have blown out a lot of opponents (showing that they're not just better, but much better), won almost all their home games (given a home-court advantage, a very good team should be even less likely to lose), and never been blown out (or maybe been blown out once under some special circumstances, like Duke was last week when Georgetown had one of the best shooting days ever).

The "talking heads" often focus on different things that we've found to be somewhat irrelevant. Recent record (often "last 10 games") doesn't seem to be important; it might seem so because better teams are likely to do well in their last 10 games (simply because they're better!), but among two teams with near-equal rankings, the one with the better last-10 record is not significantly more likely to do well in the NCAA tournament. [Consider the last time Tech went to the Final Four, in 2004: the part of their record that looked strongest was early in the season (they even beat eventual-champion Connecticut by double figures at a neutral court), not late.]

But the biggest talking-head point that the numbers don't bear out is the effect of close games on win/loss record. We all probably hear over and over how some team "just knows how to win tight games" (or how another "hasn't learned how to win tight games yet"), but the numbers show that reality is different: close games are mostly determined by luck. So a team that has lost a lot of close games (especially close road games) will probably perform better in the NCAA tournament than its win/loss record would suggest, and a team with a lot of close wins (especially at home) will probably perform worse.

In fact, the reason we developed LRMC is related to this. Back around 2002 or so, Tech lost a holiday tournament game to Tennessee on a buzzer-beater half-court shot. At the end of the season, Tech went to the NIT with no "NCAA bubble team" discussion... but several of the prognosticators said Tech would've been a bubble team (not necessarily in the NCAA tournament, but at least in the conversation) if they had managed to win just one more game.

So basically, those people were saying that if Tennessee's half-court shot had missed, they would've thought Tech was an NCAA-caliber team... but because the half-court shot went in, somehow Tech was only good enough for the NIT.

That didn't seem to make sense to us -- our mathematical/basketball gut feeling was that winning a game by 1 or losing by 2 depending on the outcome of a last-second shot probably didn't make much difference; the real "story" of both games is that the two teams were essentially evenly-matched that day. And when we started looking at the data, it turned out we were right... and we were on our way to developing LRMC.
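
[A quick illustration of the point about last-second shots, using a made-up logistic curve rather than the coefficients Dr. Sokol's group actually fit: if margin of victory is converted into a "chance this team is the better one" number, a 1-point win and a 2-point loss land almost in the same place, while a blowout looks very different. The `scale` value and the function name here are assumptions chosen only for illustration.]

```python
import math


def chance_team_is_better(margin, scale=0.1):
    """Hypothetical logistic curve mapping a game margin to a probability.

    `scale` is an illustrative guess, not a value from the LRMC model.
    """
    return 1.0 / (1.0 + math.exp(-scale * margin))


win_by_1 = chance_team_is_better(1)    # the buzzer-beater misses
loss_by_2 = chance_team_is_better(-2)  # the buzzer-beater goes in
blowout = chance_team_is_better(20)    # a comfortable 20-point win

print("win by 1: ", round(win_by_1, 3))
print("lose by 2:", round(loss_by_2, 3))
print("win by 20:", round(blowout, 3))
```

[Under this assumed curve, the two close results differ by less than a tenth, which is the sense in which both games tell the same story of two evenly matched teams.]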

Note: Dr. Sokol will be at the Lindbergh Taco Mac in person to discuss his system and its methodologies on Saturday, February 13th at 7:15 pm right before the Wake Forest basketball game. This could be an excellent opportunity for you to learn more about the LRMC system on a deeper level.

Click here for an LRMC Summary written by Dr. Sokol himself

If you would like to see the accuracy of Dr. Sokol's system, click here to see how the LRMC predicted the 2009 basketball tournament.

Click here to view the LRMC's current 2010 rankings.


Parts II and III will be up Thursday and Friday at 8 AM.