Bracketology for 2020 NCAAs

Started by dbilmes, December 13, 2019, 06:03:04 AM


Jim Hyla

Quote from: upprdeckAll I know is it's a lot more fun having 6 games left and wondering how Cornell can screw it up than having 6 games left and hoping other teams screw it up.

And a lot more fun than wondering how Schafer talks to his team.

Also, Adam's stats are probably better than anyone's for the above. ::crazy::
"Cornell Fans Made the Timbers Tremble", Boston Globe, March/1970
Cornell lawyers stopped the candy throwing. Jan/2005

Robb

Quote from: adamwWell if Jim read my article (I'm sure he was one of the ones rolling his eyes) - he'd know that those brackets are neither "projections" nor "predictions" at all. It bothers the ever loving hell out of me when they're called that.
Ok, but to be *really* pedantic, I wouldn't necessarily call a KRACH-based Monte Carlo simulation a "prediction" either.  The future results are based exactly on prior performance, and each team performs exactly that well through the remainder of the season - there's no new information being added.  Therefore, I am a little skeptical that it is much/any more predictive than a strict "if the season ended today" approach, because even in the Monte Carlo approach, the addition of new information ended today, too.  

This is especially true since you're boiling it down to a single predicted bracket (which is obviously not what the real bracket will be, no matter the methodology).  Looking at the histograms of the 20,000 final PWR ranks for each team would be more interesting/informative for me than trying to "predict" an exact bracket.
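Something like this is all I mean - made-up ranks standing in for real simulation output, just to show what the histogram view buys you:

[code]
# Toy sketch: given one team's final PWR rank from each of 20,000 simulated
# seasons, show the whole distribution instead of a single "predicted" spot.
# The ranks below are randomly generated placeholders, not real output.
from collections import Counter
import random

random.seed(0)
simulated_ranks = [max(1, min(16, round(random.gauss(3, 2)))) for _ in range(20000)]

counts = Counter(simulated_ranks)
for rank in sorted(counts):
    share = counts[rank] / len(simulated_ranks)
    print(f"rank {rank:2d}: {share:6.1%} {'#' * round(share * 100)}")
[/code]

Even if 3rd is the tallest bar, you see at a glance how much of the probability mass sits elsewhere.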
Let's Go RED!

David Harding

Quote from: BearLoverI found the old thread, which includes jfeath's incredibly helpful analysis:

[Ugh sorry, having problems posting the link on my phone, but it's post #12 or so on the thread titled "2018 ECAC Permutations"]
The thread is here http://elf.elynah.com/read.php?1,213740,page=1

adamw

Quote from: Robb
Quote from: adamwWell if Jim read my article (I'm sure he was one of the ones rolling his eyes) - he'd know that those brackets are neither "projections" nor "predictions" at all. It bothers the ever loving hell out of me when they're called that.
Ok, but to be *really* pedantic, I wouldn't necessarily call a KRACH-based Monte Carlo simulation a "prediction" either.  The future results are based exactly on prior performance, and each team performs exactly that well through the remainder of the season - there's no new information being added.  Therefore, I am a little skeptical that it is much/any more predictive than a strict "if the season ended today" approach, because even in the Monte Carlo approach, the addition of new information ended today, too.  

This is especially true since you're boiling it down to a single predicted bracket (which is obviously not what the real bracket will be, no matter the methodology).  Looking at the histograms of the 20,000 final PWR ranks for each team would be more interesting/informative for me than trying to "predict" an exact bracket.

Fair enough of course - but at least it's "predicting" future games and then factoring that into creating a final bracket. The others do nothing of the sort and are thus completely useless.
College Hockey News: http://www.collegehockeynews.com

adamw

Quote from: BearLoverI may be using the wrong terminology, but what do you mean you are not accounting for uncertainty? And would you put your money where your mouth is and assert that if this season were to be replicated from this point forward one million times, in 2/3 of those scenarios Cornell would finish with exactly the 3-seed? That Minn St would finish 3/4 of the time with exactly the 2-seed? That NoDak would finish 7/8 of the time with the 1-seed? Do you also believe Cornell is almost three times as likely to win the ECAC championship as Clarkson, and more likely to win it than the rest of the ECAC combined? I believe these numbers are quite a bit off, and a few years ago someone on here ran a regression [correct terminology?] showing that the tails of these models are off--the chances of a top team beating a bottom team are overstated by the model, which over the course of a lengthy stretch (in this case, ~12 remaining games) leads to significantly underrating volatility.

"uncertainty" in the mathematical sense, as I understand it, takes into consideration that only using relatively small sample sizes of past results, is too "precise" - so to speak - and therefore pulls things closer to the mean. This will account for the high odds you're talking about, and addresses your concerns - if we can get around to implementing it. I'm not the expert on this, however. The same dumb conversation from past years spurred us - i.e. John Whelan - to come up with an algorithm that adds uncertainty to the mix.

I do wonder - pray tell - how I'm supposed to put my money where my mouth is with regard to the season being played out 1,000,000 times. How are we supposed to test this hypothesis so that I can wager with you?

As it stands, we play out the season 20,000 times - which is pretty stable. Doing it 1,000,000 times isn't going to change things. Cornell will still be a 3 seed just about the same amount of times. I really don't know how else you'd like me to "prove" anything.
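To put a number on "pretty stable" - the sampling noise on a simulated probability is easy to bound:

[code]
# Standard error of a simulated probability p over n trials: sqrt(p*(1-p)/n).
p, n = 2 / 3, 20000
print(f"std. error at n=20,000: {(p * (1 - p) / n) ** 0.5:.4f}")  # ~0.0033
# At n=1,000,000 it only tightens to ~0.0005 - the extra trials buy almost nothing.
[/code]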
College Hockey News: http://www.collegehockeynews.com

BearLover

Quote from: adamw
Quote from: BearLoverI may be using the wrong terminology, but what do you mean you are not accounting for uncertainty? And would you put your money where your mouth is and assert that if this season were to be replicated from this point forward one million times, in 2/3 of those scenarios Cornell would finish with exactly the 3-seed? That Minn St would finish 3/4 of the time with exactly the 2-seed? That NoDak would finish 7/8 of the time with the 1-seed? Do you also believe Cornell is almost three times as likely to win the ECAC championship as Clarkson, and more likely to win it than the rest of the ECAC combined? I believe these numbers are quite a bit off, and a few years ago someone on here ran a regression [correct terminology?] showing that the tails of these models are off--the chances of a top team beating a bottom team are overstated by the model, which over the course of a lengthy stretch (in this case, ~12 remaining games) leads to significantly underrating volatility.

"uncertainty" in the mathematical sense, as I understand it, takes into consideration that only using relatively small sample sizes of past results, is too "precise" - so to speak - and therefore pulls things closer to the mean. This will account for the high odds you're talking about, and addresses your concerns - if we can get around to implementing it. I'm not the expert on this, however. The same dumb conversation from past years spurred us - i.e. John Whelan - to come up with an algorithm that adds uncertainty to the mix.

As others have said above, the model assumes that every team will perform as well as they have thus far--the third-best team will continue to perform as the third-best team, the second-best team will continue to perform as the second-best, etc. In reality, there is an extremely wide range of outcomes for how well a team can perform over its remaining games.

(Though, @Robb and @Tom Lento, why wouldn't each team have a 100% chance of finishing in exactly its current spot if what you are saying is true? And then what would be the point of running more than one simulation? I had understood that the possibility of the third-best team not performing as the third-best team going forward *is* built into the model, it just isn't weighted heavily enough.)

Quote from: adamwI do wonder - pray tell - how I'm supposed to put my money where my mouth is in regards to the season being played out 1,000,000 times. How are we supposed to test this hypothesis so that I can wager with you?
I wager $50 that Cornell finishes somewhere other than exactly third in NCAA seeding. You wager $100 that they finish exactly third. Would you make that bet?

Quote from: adamwAs it stands, we play out the season 20,000 times - which is pretty stable. Doing it 1,000,000 times isn't going to change things. Cornell will still be a 3 seed just about the same amount of times. I really don't know how else you'd like me to "prove" anything.
I did not mean to suggest 1,000,000 simulations would yield a different result than 20,000. I just used one million as a random large number. 20,000 will do just fine.

Robb

Quote from: BearLover(Though, @Robb and @Tom Lento, why wouldn't each team have a 100% chance of finishing in exactly its current spot if what you are saying is true? And then what would be the point of running more than one simulation? I had understood that the possibility of the third-best team not performing as the third-best team going forward *is* built into the model, it just isn't weighted heavily enough.)
Because according to KRACH, even the third-best team has a non-zero probability of losing to the worst team. That's why there would be a range of final PWR ranks for each team - in some of the trials, Cornell will be unlucky and lose a bunch of games to worse teams, and in some trials they won't. That's where the histogram comes in - so you can see the range of possibilities and get an estimate of how likely each of those possibilities appears to be (again, assuming that each team really is exactly as good as its current KRACH rating).

Maybe it really is "most likely" that Cornell winds up 3rd, but if the range of possibilities is that we could end up anywhere from 1st to 20th, then 3rd might only happen in 15% of the trials - so you'd get a sense that it's far more likely that Cornell finishes somewhere other than 3rd, even though 3rd is the most likely individual result.
Let's Go RED!

KenP

Cornell's KRACH SOS is 30; other than Clarkson (42, also ECAC), you have to go down to #17 Lowell to find a weaker SOS. What this means is that we are on a steep slope and need to keep our footing. One "bad" loss or tie will drop us more than a bad loss would drop North Dakota or Minnesota State, etc. Our percentages in these simulations will hold steady with wins but can change very quickly with a single loss.

Trotsky

Quote from: KenPWhat this means is that we are on a steep slope and need to keep our footing.
Perfect analogy.  Very nice!

adamw

Quote from: RobbThat's why there would be a range of final PWR ranks for each team - in some of the trials, Cornell will be unlucky and lose a bunch of games to worse teams, and in some trials they won't. That's where the histogram comes in - so you can see the range of possibilities and get an estimate of how likely each of those possibilities appears to be (again, assuming that each team really is exactly as good as its current KRACH rating).

Maybe it really is "most likely" that Cornell winds up 3rd, but if the range of possibilities is that we could end up anywhere from 1st to 20th, then 3rd might only happen in 15% of the trials - so you'd get a sense that it's far more likely that Cornell finishes somewhere other than 3rd, even though 3rd is the most likely individual result.

The Matrix shows a possibility of Cornell finishing anywhere from 1 to 13, currently. I'm really bad with graphical thingies - thus the Matrix, as opposed to a histogram. No real difference.

https://www.collegehockeynews.com/ratings/probabilityMatrix.php

As I said above - the gaps in RPI are significant from 2 to 3 and, more so, from 3 to 4. That is one reason Cornell is pretty stable in that spot. Doesn't mean it will definitely happen.

But for the umpteenth straight year, BearLover is focusing on the wrong thing. The simulation is what it is. Obviously it only goes by past results. It can't do anything else. The gripe you have is with how KRACH works, not how the simulation works. Ratings are based on relatively small samples of past results, and can't possibly take into account many things. Therefore, adding an "uncertainty" node into the algorithm will smooth things out a bit -- though it's a somewhat generic addition, and isn't based on any sort of analysis of team strength, injuries, or whatever.

Getting hung up on trashing the simulation really misses the point. Though, as I've said, we're definitely trying to "improve" it.
College Hockey News: http://www.collegehockeynews.com

upprdeck

If someone could create a better simulation, they'd be getting rich off it instead of posting to this board, I suspect.

adamw

Also - to further answer the question, which I thought was obvious, but I guess not ... things aren't 100%, because every game is "played" using KRACH as the weighted probability of a team winning that game. So, Team A has 600 KRACH ... Team B has 300 KRACH ... Team A has a 2/3 chance of winning that game. Play out every game for the rest of the season this way ... 20,000 times ... and you get your simulation.
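In code, that mechanic is about this much - with invented ratings, records, and schedule, and total wins as a crude stand-in for the full PWR computation:

[code]
# Minimal sketch of the game-by-game Monte Carlo mechanic described above.
# Ratings, records, and schedule are all made up; teams are ranked by total
# wins instead of the real PWR math.
import random

random.seed(1)
krach = {"Cornell": 600, "NoDak": 550, "MinnSt": 500, "Clarkson": 300}
current_wins = {"Cornell": 15, "NoDak": 16, "MinnSt": 14, "Clarkson": 12}
remaining = [("Cornell", "Clarkson"), ("NoDak", "MinnSt"),
             ("Cornell", "NoDak"), ("MinnSt", "Clarkson")] * 3  # 12 games

TRIALS = 20000
finish = {t: [0] * len(krach) for t in krach}   # finish[t][r]: times t ended rank r+1

for _ in range(TRIALS):
    wins = dict(current_wins)
    for a, b in remaining:
        p_a = krach[a] / (krach[a] + krach[b])  # e.g. 600 vs 300 -> 2/3
        winner = a if random.random() < p_a else b
        wins[winner] += 1
    for rank, team in enumerate(sorted(wins, key=wins.get, reverse=True)):
        finish[team][rank] += 1

for team, counts in finish.items():
    print(team, [f"{c / TRIALS:.1%}" for c in counts])
[/code]

Everything else - RPI, the PWR comparisons, bracket rules - is bookkeeping layered on top of those simulated results.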

The issue, then, is that Cornell's chances of winning each game are overweighted if you think the KRACH differences aren't realistic.

But taking offense to the idea that the top 3 ends up exactly how it is now, without really understanding why the simulations come out that way, is silly.
College Hockey News: http://www.collegehockeynews.com

KenP

Adam, I have a suggestion.  Cornell has guaranteed itself an ECAC home playoff series, i.e. our worst-case ECAC rank is 8.  Can you do something similar for NCAA participation?  Specifically:

Create a worst-case simulation for each team: set their remaining games to losses, rerun the Monte Carlo simulation for the rest of the field, and grab the lowest PWR from the modified simulation. Add that as a column (sketched below).

Everyone likes to know when their team officially "clinches" a playoff berth... this approach would give fans that answer.
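A rough sketch of what I mean, with everything invented (ratings, records, schedule) and total wins standing in for the real PWR math:

[code]
# Worst-case clinch check: force the target team to lose all remaining games,
# simulate everyone else's games as usual, and record the worst rank seen.
# All numbers are invented; ranking by total wins stands in for PWR.
import random

random.seed(2)
krach = {"Cornell": 600, "NoDak": 550, "MinnSt": 500, "Clarkson": 300}
current_wins = {"Cornell": 15, "NoDak": 16, "MinnSt": 14, "Clarkson": 12}
remaining = [("Cornell", "Clarkson"), ("NoDak", "MinnSt"),
             ("Cornell", "NoDak"), ("MinnSt", "Clarkson")] * 3

def worst_case_rank(target, trials=20000):
    worst = 1
    for _ in range(trials):
        wins = dict(current_wins)
        for a, b in remaining:
            if target in (a, b):                    # target drops every game
                winner = b if a == target else a
            else:
                p_a = krach[a] / (krach[a] + krach[b])
                winner = a if random.random() < p_a else b
            wins[winner] += 1
        standings = sorted(wins, key=wins.get, reverse=True)
        worst = max(worst, standings.index(target) + 1)
    return worst

print("Cornell worst-case rank:", worst_case_rank("Cornell"))
[/code]

If the worst-case rank across all the trials still clears the cut line, flag the team as clinched (with the same caveat as below - 20,000 trials may not find the true worst case).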

P.S. Other comments / suggestions:
* Check your number formatting. Some values round up to "1.0%" while others round down to "1%".
* Repeat the table header at the bottom.
* Not sure if you can have a slider bar at both the top and bottom, but that would be nice too.
* Consider similar best-case scenarios. playoffstatus.com has that, which is how I saw Vermont is officially out of the race.
* For the best-case and worst-case data, either show them as additional columns or use that information to enter "x" for impossible year-end rankings. ("Impossible" may not be the right word given this is only 20,000 simulations, but hopefully you and your readers will get the point.)

BearLover

Quote from: adamw
Quote from: RobbThat's why there would be a range of final PWR ranks for each team - in some of the trials, Cornell will be unlucky and lose a bunch of games to worse teams, and in some trials they won't. That's where the histogram comes in - so you can see the range of possibilities and get an estimate of how likely each of those possibilities appears to be (again, assuming that each team really is exactly as good as its current KRACH rating).

Maybe it really is "most likely" that Cornell winds up 3rd, but if the range of possibilities is that we could end up anywhere from 1st to 20th, then 3rd might only happen in 15% of the trials - so you'd get a sense that it's far more likely that Cornell finishes somewhere other than 3rd, even though 3rd is the most likely individual result.

The Matrix shows a possibility of Cornell finishing anywhere from 1 to 13, currently. I'm really bad with graphical thingies - thus the Matrix, as opposed to a histogram. No real difference.

https://www.collegehockeynews.com/ratings/probabilityMatrix.php

As I said above - the gaps in RPI are significant from 2 to 3 and, more so, from 3 to 4. That is one reason Cornell is pretty stable in that spot. Doesn't mean it will definitely happen.

But for the umpteenth straight year, BearLover is focusing on the wrong thing. The simulation is what it is. Obviously it only goes by past results. It can't do anything else. The gripe you have is with how KRACH works, not how the simulation works. Ratings are based on relatively small samples of past results, and can't possibly take into account many things. Therefore, adding an "uncertainty" node into the algorithm will smooth things out a bit -- though it's a somewhat generic addition, and isn't based on any sort of analysis of team strength, injuries, or whatever.

Getting hung up on trashing the simulation really misses the point. Though, as I've said, we're definitely trying to "improve" it.
Sorry, I don't agree. KRACH is not meant to be predictive, so a simulation that relies entirely on KRACH to predict future outcomes is flawed. The simulation is called the "Pairwise Probability Matrix" and assigns a percentage to every possible outcome. For a casual viewer like me, there is no note or disclaimer anywhere on the page suggesting the model is NOT predictive. In fact, below the chart there is a line that reads: "these numbers accurately reflect each team's possible finish to a high rate of precision."

I doubt that many people reading the matrix understand it is limited to extrapolating existing KRACH over the rest of the season, or what that reliance on KRACH entails. I have seen dozens of posts on this forum over the past several years that treat the percentages as good predictive data. I don't think the simulation provides good predictive data.

I don't mean to sound overly harsh. CHN is one of my most-visited websites and I'd love to see the simulation improved. But in its current form, the simulation is not particularly helpful and I'd argue it is actively misleading.

Dafatone

Quote from: BearLoverAs others have said above, the model assumes that every team will perform as well as they have thus far--the third-best team will continue to perform as the third-best team, the second-best team will continue to perform as the second-best, etc. In reality, there is an extremely wide range of outcomes for how well a team can perform over its remaining games.


There really isn't much else to be done, though. Models of what's going to happen in the future rely on what's happened in the past.

Could things be more complex? Sure. You could use goal differentials or advanced possession/shot metrics to estimate team quality. You could also weight for recency (Cornell has been weaker over the last month or so than over the rest of the season, so maybe we're "worse" than our overall record suggests). But that's all more complicated and not necessarily more accurate.

How do you go about introducing "uncertainty" other than taking an existing model and blurring the results by some factor?
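The closest I can picture is blurring the inputs instead of the outputs - redraw each team's rating at the start of every simulated season, with more noise for teams that have played fewer games. A toy sketch, everything invented:

[code]
# Resample ratings per trial instead of blurring the final percentages.
# All ratings and game counts here are made up for illustration.
import random

random.seed(3)
krach = {"Cornell": 600, "NoDak": 550, "MinnSt": 500, "Clarkson": 300}
games_played = {"Cornell": 17, "NoDak": 20, "MinnSt": 20, "Clarkson": 18}

def noisy_ratings():
    # Lognormal noise whose spread shrinks as a team plays more games.
    return {t: k * random.lognormvariate(0, games_played[t] ** -0.5)
            for t, k in krach.items()}

for trial in range(3):   # draw once per simulated season, then play it out
    print(noisy_ratings())
[/code]

That's still "blurring by some factor," just applied where the small-sample problem actually lives - in the ratings.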