There are 150 quintillion (i.e. the one after quadrillion) permutations to consider when completing your NCAA bracket. Some of us don’t have time to review them all; if you are likewise short on time, you can let MapR do the heavy lifting for you and get your personalized bracket from the Crystal B-Ball!
In this post, we describe the methodology of the Crystal B-Ball and use it to make some predictions about who’s going to the 2016 Final Four and offer some probabilities about which team will be crowned the national champion in Houston on April 4. We cannot guarantee victory in your office pool, but we can promise you won’t get a perfect bracket.
Completing a “smart” bracket is deceptively simple. All you need to do is determine a win probability for each team in every game (the two probabilities add to 100%, so you only need to find one of them), simulate a large number of matchups, and then tabulate the most likely outcomes. As is often the case, what appears simple turns out to be incredibly challenging, and determining accurate win probabilities for each team is the main obstacle. For background, see the NY Times summary of the current advanced metrics used for selecting brackets.
The goal of generating a customized bracket is to link the fundamentals of the game with historical outcomes and present those options to a user such that he or she can select the importance of each. Here is a list of those options:
Based on the weight the user supplies for each category, the probabilities are adjusted for each match (with a small dose of randomness to simulate the unpredictability of the tournament) and the games are “played”, round-by-round, until a winner has been determined and the bracket completed.
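To make the idea concrete, here is a minimal sketch of how one bracket might be "played" this way. It is not the Crystal B-Ball's actual code: the logistic formula, the steepness of 5.0, the noise level, the stat names, and the helper names `win_probability` and `play_bracket` are all assumptions chosen for illustration, and the team numbers are made up.

```python
import math
import random

def win_probability(team_a, team_b, weights, rng, noise=0.05):
    """Chance that team_a beats team_b (hypothetical logistic form)."""
    # Weighted score per team: user weight times the team's value in each category.
    score_a = sum(weights[k] * team_a["stats"][k] for k in weights)
    score_b = sum(weights[k] * team_b["stats"][k] for k in weights)
    total_weight = sum(weights.values()) or 1.0  # avoid dividing by zero
    diff = (score_a - score_b) / total_weight
    # Assumed mapping from score gap to probability; 5.0 is an arbitrary steepness.
    p = 1.0 / (1.0 + math.exp(-5.0 * diff))
    # The "small dose of randomness": nudge the probability with Gaussian noise.
    p += rng.gauss(0.0, noise)
    return min(max(p, 0.01), 0.99)  # keep every game winnable by either side

def play_bracket(teams, weights, rng):
    """Play round by round; returns the winners' names for each round.

    Assumes the number of teams is a power of two and that the list is
    ordered so that neighbors meet in the first round.
    """
    rounds = []
    field = list(teams)
    while len(field) > 1:
        winners = []
        for a, b in zip(field[0::2], field[1::2]):
            p = win_probability(a, b, weights, rng)
            winners.append(a if rng.random() < p else b)
        rounds.append([t["name"] for t in winners])
        field = winners
    return rounds

if __name__ == "__main__":
    # Made-up teams and values, purely illustrative.
    teams = [
        {"name": "Team A", "stats": {"foul_shots": 0.80, "reputation": 0.90}},
        {"name": "Team B", "stats": {"foul_shots": 0.65, "reputation": 0.20}},
        {"name": "Team C", "stats": {"foul_shots": 0.75, "reputation": 0.50}},
        {"name": "Team D", "stats": {"foul_shots": 0.70, "reputation": 0.60}},
    ]
    weights = {"foul_shots": 0.84, "reputation": 0.10}
    print(play_bracket(teams, weights, random.Random(2016)))
```

Because the noise is drawn fresh for every game, running this twice with different seeds will usually produce different brackets, which is exactly the behavior described above.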
The randomness in the methodology will generate an endless number of unique brackets, especially in the earlier rounds. To summarize the results, we need to generate many brackets and tabulate the outcomes. When numerous simulations of random events are aggregated, probabilities emerge in the long run, filtering out the “madness” of long shots, dark horses and Cinderella teams. Keep in mind while assessing these results that Goliath beats David 99 out of 100 times, but if they only fight once, there’s always a chance.
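The tabulation step can be sketched in a few lines. This is an illustration under simplifying assumptions, not the method used for the results in this post: each team gets a single hypothetical "strength" number, a game is won with probability proportional to strength, and the names `simulate_champion` and `championship_shares` are invented for the example.

```python
import random
from collections import Counter

def simulate_champion(teams, strength, rng):
    """Play one single-elimination bracket and return the champion's name.

    Assumes len(teams) is a power of two and neighbors meet in round one.
    """
    field = list(teams)
    while len(field) > 1:
        next_round = []
        for a, b in zip(field[0::2], field[1::2]):
            # Assumed rule: win chance proportional to relative strength.
            p_a = strength[a] / (strength[a] + strength[b])
            next_round.append(a if rng.random() < p_a else b)
        field = next_round
    return field[0]

def championship_shares(teams, strength, n_brackets, seed=0):
    """Fraction of simulated brackets in which each team wins the title."""
    rng = random.Random(seed)
    counts = Counter(
        simulate_champion(teams, strength, rng) for _ in range(n_brackets)
    )
    return {team: counts[team] / n_brackets for team in teams}

if __name__ == "__main__":
    # Goliath is given 99 times David's strength, so he should win about
    # 99 of every 100 fights once enough fights are simulated.
    shares = championship_shares(
        ["Goliath", "David"], {"Goliath": 99.0, "David": 1.0}, n_brackets=100_000
    )
    print(shares)
```

Any single simulated fight can still go to David; it is only after aggregating many of them that the long-run share settles near the underlying probability.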
The following results are based on 1 million brackets generated from the Crystal B-Ball with the following weights that favor fundamentals over reputation (you can set your own, plus award one team an automatic spot in the Final Four):
|Foul Shot Conversion||84%|
The table below shows the percent of simulations in which each team reached the Final Four.
Table 1 - Final Four Probabilities, by Region
Kansas represented the South in the Final Four in 34% of the brackets. This was the only region where the #1 seed dominated the others. In the Midwest region, Michigan State, the #2 seed, appeared to be an overwhelming favorite to reach the Final Four (44% of brackets compared to Virginia at 18%). Oklahoma and Xavier, also #2 seeds, rounded out the most likely Final Four participants.
Using a similar aggregation, the teams that were most frequently crowned as champions among the 1 million brackets appear in the table below.
Table 2 - Championship Probabilities
The numbers suggest Michigan State is the team to beat this year. It appears that the Jayhawks are poised to give them their toughest challenge, if they survive a potential semi-final against Oklahoma.
Predicting an event doesn’t seem to influence the outcome, but it does make it more fun, especially when this particular event can be considered, in statistical terms, a stochastic mystery wrapped in a random puzzle. In other words, the Spartans should wait to win before cutting down the nets in Houston.
Some of these games may come down to a buzzer-beater, questionable officiating, or one of those random bounces that makes a ball destined for the basket roll around the rim and out. We have yet to build a model that reliably accounts for those “one-in-a-quintillion” shots.