Bracketology 2016-17 Style

Started by Jim Hyla, December 22, 2016, 06:54:56 AM


jkahn

Quote from: adamw
Quote from: BearLoverI don't know enough about the model/KRACH/etc. to say anything especially productive, but I will say that, as a casual observer, the 98% number, the 91% number from a different model, and the 85% number from the Matrix before the RPI game all failed the eye test.  If I had to guess why, it's because KRACH, as adamw said, is meant to be descriptive of what has occurred rather than predictive.  Thus, it doesn't account for regressions to the mean, etc.  Cornell is not going to beat RPI nine times out of ten, even if their past records would equate to such a mismatch.  And beating Clarkson is far closer to a coin flip than a sure thing, even though the models gave Cornell a very high chance of winning that too.

Yes and no. I'm not sure I'd use "regression to the mean" as the right way to put it, because that assumes you know what the mean is, which you really can't do from past results. But I know what you're driving at. Intuitively you think Cornell-Clarkson was closer. But the key is, how do you model that? Not just go by "feel." I think there may be a way to use goal differential and things like PDO and Team Corsi to come up with some sort of counter-weight on straight KRACH. But I couldn't tell you exactly what that would be. Goal diffs have always been a dicey thing in hockey. Corsi may be better, but has its flaws. There might be some balance there.

I'd also like to be able to definitively answer the question why Cornell was affected more than Penn State, for example. With an actual demonstration.

There's a flaw in the model somewhere. If a loss last night brings Cornell to a 65% NCAA chance, and there was a 35% KRACH chance of that happening, and if we assume that the other results were pretty much at an average expectation, then:
if there was a 35% chance of 65% after the game and a 65% chance of 98% or higher (say even 100%), then the chance before last night should have been no greater than .35 x .65 + .65 x 1.00, or about 88%. And even the 65% after last night feels way overstated to me (again based on KRACH). Using the KRACH prior to last night, there's now a 35% chance we'll be swept, and the odds are now against us winning the series. And if we lose in three, the added 1-1 to our record will drop our RPI further.
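
That arithmetic is just the law of total probability. A quick sketch (the 100% after a win is a deliberately generous upper bound, and the rest are the model's own numbers):

# Upper bound on the pre-game NCAA chance, given the model's own outputs.
p_loss = 0.35             # KRACH chance of losing Friday's game
chance_after_loss = 0.65  # model's NCAA chance after the actual loss
chance_after_win = 1.00   # generous cap on the post-win chance
upper_bound = p_loss * chance_after_loss + (1 - p_loss) * chance_after_win
print(f"pre-game chance should be at most {upper_bound:.1%}")  # ~87.8%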
Jeff Kahn '70 '72

Tom Lento

Quote from: adamw
Quote from: BearLoverI don't know enough about the model/KRACH/etc. to say anything especially productive, but I will say that, as a casual observer, the 98% number, the 91% number from a different model, and the 85% number from the Matrix before the RPI game all failed the eye test.  If I had to guess why, it's because KRACH, as adamw said, is meant to be descriptive of what has occurred rather than predictive.  Thus, it doesn't account for regressions to the mean, etc.  Cornell is not going to beat RPI nine times out of ten, even if their past records would equate to such a mismatch.  And beating Clarkson is far closer to a coin flip than a sure thing, even though the models gave Cornell a very high chance of winning that too.

Yes and no. I'm not sure I'd use "regression to the mean" as the right way to put it, because that assumes you know what the mean is, which you really can't do from past results. But I know what you're driving at. Intuitively you think Cornell-Clarkson was closer. But the key is, how do you model that? Not just go by "feel." I think there may be a way to use goal differential and things like PDO and Team Corsi to come up with some sort of counter-weight on straight KRACH. But I couldn't tell you exactly what that would be. Goal diffs have always been a dicey thing in hockey. Corsi may be better, but has its flaws. There might be some balance there.

I'd also like to be able to definitively answer the question why Cornell was affected more than Penn State, for example. With an actual demonstration.

To the extent Cornell (or any team) has over-achieved against expectations, you're going to see these weird scenarios in any model. The issue is one of modeling uncertainty. You know the input - raw game outcomes - is information-poor given the length of the college hockey season. Your model, therefore, has fairly high uncertainty baked into it. If you just do iterative Monte Carlo, you're going to get a particular distribution of results, but that doesn't mean you can take the probability output and say "Cornell is 90% likely" - you need to understand where you might be affected by hidden error and apply other measures to balance the outcome predictions the model uses. Basically, straight KRACH is not sufficiently information-rich to avoid this kind of overstated certainty.
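
To make that concrete, here's a toy illustration - not the CHN model, and the Beta prior and its strength are invented for the example - of how carrying input uncertainty into a Monte Carlo changes the answer:

import random

def advance_prob(p_point, n_sims=20000, uncertain=False):
    # Chance of winning a best-of-3, with the single-game win probability
    # either fixed at the point estimate or drawn fresh each simulation
    # from a Beta centered on it (a crude stand-in for rating uncertainty).
    advanced = 0
    for _ in range(n_sims):
        p = random.betavariate(20 * p_point, 20 * (1 - p_point)) if uncertain else p_point
        wins = sum(random.random() < p for _ in range(3))
        advanced += wins >= 2
    return advanced / n_sims

print(advance_prob(0.65, uncertain=False))  # ~72%
print(advance_prob(0.65, uncertain=True))   # a bit lower - uncertainty compresses the edge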

This is a big reason why the 538 election predictor had Trump at 30% to win (and 10% to win while losing the popular vote) when the Princeton Election model had Clinton at a 98%+ chance of victory. 538 assumed, based on past polling data, that polling errors tend to be correlated across clusters of states, and applying that correction shifted the 2016 prediction drastically in ways that it did not in 2012, when correlated error wasn't likely to matter.
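
A toy version of that correlated-error point (the leads and error sizes are made up, nothing to do with actual polling data):

import random

def upset_prob(n_states=10, lead=0.03, err=0.03, shared_sd=0.0, n_sims=20000):
    # The leader is up `lead` in every state; each state has an independent
    # polling error, plus (optionally) a miss shared across all states.
    upsets = 0
    for _ in range(n_sims):
        shared = random.gauss(0, shared_sd)
        states_won = sum(lead + shared + random.gauss(0, err) > 0 for _ in range(n_states))
        upsets += states_won <= n_states // 2
    return upsets / n_sims

print(upset_prob(shared_sd=0.0))   # independent misses: upset chance stays tiny
print(upset_prob(shared_sd=0.03))  # correlated misses: upset chance jumps several-fold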

To go back to hockey, KRACH helps account for the uncertainty caused by insular schedules, which either unfairly boost the records of teams in weak conferences or unfairly punish teams in strong conferences. I mean, that's pretty much what it does. However, it doesn't account for the uneven distribution of random outcomes you're likely to see over a small sample of college teams playing a short schedule. I don't know that reversion to the mean is the explanation in this case, but it's an explanation, and we can get a better model of what that "mean" should be than whatever the current record would indicate. Like you, I'm not sold on goal differential in general, but looking at the fraction of 1-goal outcomes might help. I do think PDO and Corsi are worth a look.

Something to consider:

Cornell: PDO ~102, Corsi ~50%
Penn State: PDO ~98, Corsi ~60%
Providence: PDO ~100, Corsi ~56%

Let's assume, and I know not everybody around here agrees, that these shot attempt rates are predictive of game success over a large sample but have high variance and low predictive certainty for individual games or even entire NCAA seasons.

Now consider schedule strength rankings and schedule outcomes - these three teams are pretty even on that front. But assuming those season averages for Corsi/PDO are representative, and not being dragged way up or down by a handful of outliers, that suggests Cornell's tournament prediction % should have been substantially lower than the predictions for Penn State or Providence from the beginning. Cornell's record was a lot better than one might expect relative to schedule strength and on-ice performance as reflected in these metrics. The pure KRACH-based model clobbers Cornell for losing a "should win" game because the next iteration accounts both for the damage to Cornell's record caused by the loss *and* for the sharper decline in predicted win rate as a result of the fall in past win %. In this case, at least, if you used just a Corsi weighting adjustment to the KRACH prediction, Cornell's predicted chances probably would've dropped drastically from the outset, because there would've been a downward adjustment to their raw KRACH-based win probabilities. The model would almost certainly be less volatile with respect to this specific comparison set.
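
For example - and this is purely hypothetical, not anything CHN actually runs - a counter-weight could be as simple as shrinking each KRACH head-to-head probability toward a Corsi-implied one:

def krach_win_prob(k_a, k_b):
    # Standard KRACH (Bradley-Terry) head-to-head win probability.
    return k_a / (k_a + k_b)

def blended_win_prob(k_a, k_b, corsi_a, corsi_b, weight=0.3):
    # Hypothetical: pull the KRACH probability toward a crude Corsi-implied
    # probability. `weight` is a made-up knob that would need empirical tuning.
    p_corsi = corsi_a / (corsi_a + corsi_b)
    return (1 - weight) * krach_win_prob(k_a, k_b) + weight * p_corsi

# Toy numbers: KRACH likes team A (~65%) but Corsi says the ice tilts the other way.
print(blended_win_prob(300.0, 160.0, 50.0, 60.0))  # ~0.59 - a less confident favorite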

Now, that doesn't mean the model would be better, and how to adjust it correctly is the hard part - there are whole bodies of literature about this. One way is to do it empirically - you've got a lot of college hockey playoff history to use, and you can simulate all manner of different models and test their predictions against the observed distributions. Even just 10 years * N teams, where N is the number of conference tournament QF participants, gives you something to model against. It's worth investigating in the offseason - if you've already got the shot data it's not going to be all that hard.
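
The comparison itself is easy once you have the predictions - something like a Brier score across historical playoff games (the history list below is invented purely for illustration):

def brier(preds, outcomes):
    # Mean squared error of probability forecasts; lower is better.
    return sum((p - o) ** 2 for p, o in zip(preds, outcomes)) / len(preds)

# Hypothetical rows: (straight-KRACH prob, adjusted-model prob, actual 1/0) per game.
history = [(0.90, 0.72, 0), (0.60, 0.57, 1), (0.75, 0.66, 1), (0.85, 0.62, 0)]
print("straight KRACH:", brier([g[0] for g in history], [g[2] for g in history]))
print("adjusted model:", brier([g[1] for g in history], [g[2] for g in history]))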

Dafatone

Quote from: jkahn
Quote from: adamw
Quote from: BearLoverI don't know enough about the model/KRACH/etc. to say anything especially productive, but I will say that, as a casual observer, the 98% number, the 91% number from a different model, and the 85% number from the Matrix before the RPI game all failed the eye test.  If I had to guess why, it's because KRACH, as adamw said, is meant to be descriptive of what has occurred rather than predictive.  Thus, it doesn't account for regressions to the mean, etc.  Cornell is not going to beat RPI nine times out of ten, even if their past records would equate to such a mismatch.  And beating Clarkson is far closer to a coin flip than a sure thing, even though the models gave Cornell a very high chance of winning that too.

Yes and no. I'm not sure I'd use "regression to the mean" as the right way to put it, because that assumes you know what the mean is, which you really can't do from past results. But I know what you're driving at. Intuitively you think Cornell-Clarkson was closer. But the key is, how do you model that? Not just go by "feel." I think there may be a way to use goal differential and things like PDO and Team Corsi to come up with some sort of counter-weight on straight KRACH. But I couldn't tell you exactly what that would be. Goal diffs have always been a dicey thing in hockey. Corsi may be better, but has its flaws. There might be some balance there.

I'd also like to be able to definitively answer the question why Cornell was affected more than Penn State, for example. With an actual demonstration.

There's a flaw in the model somewhere. If a loss last night brings Cornell to a 65% NCAA chance, and there was a 35% KRACH chance of that happening, and if we assume that the other results were pretty much at an average expectation, then:
if there was a 35% chance of 65% after the game and a 65% chance of 98% or higher (say even 100%), then the chance before last night should have been no greater than .35 x .65 + .65 x 1.00, or about 88%. And even the 65% after last night feels way overstated to me (again based on KRACH). Using the KRACH prior to last night, there's now a 35% chance we'll be swept, and the odds are now against us winning the series. And if we lose in three, the added 1-1 to our record will drop our RPI further.

Keep in mind that other games factor in, too.  It may be that yesterday's other results were unlikely and/or bad for us, skewing the numbers.

adamw

Quote from: Tom LentoNow, that doesn't mean the model would be better, and how to adjust it correctly is the hard part - there are whole bodies of literature about this. One way is to do it empirically - you've got a lot of college hockey playoff history to use, and you can simulate all manner of different models and test their predictions against the observed distributions. Even just 10 years * N teams, where N is the number of conference tournament QF participants, gives you something to model against. It's worth investigating in the offseason - if you've already got the shot data it's not going to be all that hard.

Tom, all of this perfectly explains why KRACH is not a good predictive model (or at least not good enough), and I greatly appreciate the discussion on ways the simulation can be improved.

But this still doesn't necessarily answer why Cornell dropped as much as they did while other teams did not. Or maybe you did explain it, and I'm not getting it.

In other words - there's no way for KRACH to know, beforehand, that Cornell has theoretically over-achieved. So while that might explain the flaw in its predictive value, it doesn't explain the math of why they dropped when others didn't.

Am I making sense?

By the way - we don't have Corsi-esque shot data before 2 seasons ago. We have general shots per game going back to 2002, but that's not as good.
College Hockey News: http://www.collegehockeynews.com

Dafatone

Quote from: adamw
Quote from: Tom LentoNow, that doesn't mean the model would be better, and how to adjust it correctly is the hard part - there are whole bodies of literature about this. One way is to do it empirically - you've got a lot of college hockey playoff history to use, and you can simulate all manner of different models and test their predictions against the observed distributions. Even just 10 years * N teams, where N is the number of conference tournament QF participants, gives you something to model against. It's worth investigating in the offseason - if you've already got the shot data it's not going to be all that hard.

Tom, all of this perfectly explains why KRACH is not a good predictive model (or at least not good enough), and I greatly appreciate the discussion on ways the simulation can be improved.

But this still doesn't necessarily answer why Cornell dropped as much as they did while other teams did not. Or maybe you did explain it, and I'm not getting it.

In other words - there's no way for KRACH to know, beforehand, that Cornell has theoretically over-achieved. So while that might explain the flaw in its predictive value, it doesn't explain the math of why they dropped when others didn't.

Am I making sense?

By the way - we don't have Corsi-esque shot data before 2 seasons ago. We have general shots per game going back to 2002, but that's not as good.

I'm almost certain it's our higher win %, which means one loss hurts our RPI more than it hurts other teams'. I can't dig up the RPI numbers from before yesterday, but CHN lets me set individual games to ties. We're at .5455. Had we tied (I know, impossible), we'd be at .5499. Had we won, .5543.

Doing the same with Penn State and Providence, Penn State is at .5507.  Had they tied, .5546, had they won, .5584.  Smaller boosts than we would have had, but significant.

Providence is at .5516, had they tied, .5554, had they won, .5589.  So, same as Penn State, roughly.
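
Putting those side by side (just arithmetic on the numbers above):

# Loss / tie / win RPI from the CHN calculator, per the figures above.
rpi = {
    "Cornell":    (0.5455, 0.5499, 0.5543),
    "Penn State": (0.5507, 0.5546, 0.5584),
    "Providence": (0.5516, 0.5554, 0.5589),
}
for team, (loss, tie, win) in rpi.items():
    print(f"{team}: win-vs-loss RPI swing = {win - loss:.4f}")
# Cornell 0.0088, Penn State 0.0077, Providence 0.0073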

The results are less striking than I expected.  Weirdly, we shuffled up to 12th in some of these scenarios.

In conclusion, I don't have much of a conclusion.

abmarks

How about the simplest explanation: we could just have had a number of PWR comparisons that were razor close? That matrix uses KRACH to simulate game results but ultimately fills out the PWR - so the issue is more likely in the PWR than the KRACH numbers.

Looking at the probability matrix, I think you need to look at the probabilities of landing in any given PWR final position. I'm curious what the distribution was for us across positions before last night, because as of 9:20 pm, we have crazy high probabilities of landing in 15 or 16, while Penn St. and Providence skew their individual probabilities much more towards the higher PWR finishes.

The answer to the riddle will be found by seeing which individual PWR comparisons are likely to get flipped against us, as opposed to the other two teams having the same thing happen.

Tom Lento

Quote from: adamw
Quote from: Tom LentoNow, that doesn't mean the model would be better, and how to adjust it correctly is the hard part - there are whole bodies of literature about this. One way is to do it empirically - you've got a lot of college hockey playoff history to use, and you can simulate all manner of different models and test their predictions against the observed distributions. Even just 10 years * N teams, where N is the number of conference tournament QF participants, gives you something to model against. It's worth investigating in the offseason - if you've already got the shot data it's not going to be all that hard.

Tom, all of this perfectly explains why KRACH is not a good predictive model (or at least not good enough), and I greatly appreciate the discussion on ways the simulation can be improved.

But this still doesn't necessarily answer why Cornell dropped as much as they did while other teams did not. Or maybe you did explain it, and I'm not getting it.

In other words - there's no way for KRACH to know, beforehand, that Cornell has theoretically over-achieved. So while that might explain the flaw in its predictive value, it doesn't explain the math of why they dropped when others didn't.

Am I making sense?

By the way - we don't have Corsi-esque shot data before 2 seasons ago. We have general shots per game going back to 2002, but that's not as good.

Drat, I thought the data went back at least 4-5 seasons before going into shot per game territory. :(

This isn't at all about KRACH "knowing" about the fact that Cornell over-achieved. It's just what you might see when a model suddenly hits a sharp correction at some extreme point in the distribution. The fact that the loss pulled Cornell's record in the direction of what Corsi would suggest was as much coincidence as anything. The real question is whether or not such a volatile result is reasonable.

The cause of this volatility for Cornell but not Penn State is much harder to tease apart, and frankly I didn't really answer your question directly because I can't. One hypothesis is that others here are right and the reason has to do with relative winning percentages - either how that interacts with RPI or some artifact of the KRACH-based predictive model that tends to over-emphasize high winning percentages. That's really the only difference I notice between those teams - they all have similar outcomes against reasonably similar competition (Providence less so than Penn State), but Cornell has a much higher win %. I'm not an expert here so I'm really guessing, but this makes intuitive sense to me when I think about how KRACH handles perfect teams - they're expected to be perfect against everybody, and as soon as they drop a point to somebody their rating falls sharply into line with a more reasonable projection.

You could test this win % hypothesis - run the model on Thursday's inputs. Then do it again but update Cornell's record to include a 2-2-1 season series against an imaginary KRACH-neutral opponent. See how it changes the model predictions. See how much Cornell's volatility changes after Friday's results are added to the model. If the volatility is due to the model having a self-reinforcing high win % expectation, you'd expect 1) Cornell's odds on the pre-Friday model to decrease and 2) Cornell's shift in odds on the post-Friday model to be less volatile.
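
Even before touching the real model, you can see the mechanism the phantom series is meant to test: padding a gaudy record with a .500 stretch pulls the win % - KRACH's raw input - toward the pack. A tiny sketch (the record is made up for illustration):

def win_pct(w, l, t):
    # Winning percentage with ties counted as half a win.
    return (w + 0.5 * t) / (w + l + t)

w, l, t = 21, 4, 5                   # hypothetical gaudy record
print(win_pct(w, l, t))              # ~0.783
print(win_pct(w + 2, l + 2, t + 1))  # ~0.743 after the phantom 2-2-1 series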

Of course it could also just be related to Cornell's relative position in the comparison rankings.

upprdeck

All it has shown is that at 2-0 we were sitting pretty good; anything less was killer.

The question will be what happens when the B10 teams beat up on each other next weekend.

Do we fall behind idle VT if we lose?

Does 1-1 get us past ND if they lose? Are we rooting for St. Cloud to come back against North Dakota?

Dafatone

Miami blows it late to a top team one last time.

St. Cloud and ND going to OT.  I think we want St. Cloud, after all.  At the very least, we want them to win tonight and take ND to 3 games.

Swampy

Two things strangely missing from this rather technical discussion are standard errors and confidence intervals. I'm neither familiar with, nor particularly interested in, the inner workings of the models. But it seems to me that, other things being equal, the standard errors of a prediction regarding a team's prospects will decrease as the team plays more games, and the impact of each additional game will decrease as the number of games increases. Since Cornell has played fewer games than the factory schools, one would therefore expect an additional game to have a bigger impact on Cornell's standard errors than on the competition's. Also, since confidence intervals are wider with smaller N, an additional datum will have a larger impact, percentage-wise.
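
To put rough numbers on that intuition - a sketch using the normal approximation, with a made-up win % and schedule lengths:

import math

def win_pct_interval(p_hat, n_games, z=1.96):
    # Standard error and ~95% CI for a true win probability estimated
    # from n_games independent games (crude, but it shows the effect of N).
    se = math.sqrt(p_hat * (1 - p_hat) / n_games)
    return se, max(0.0, p_hat - z * se), min(1.0, p_hat + z * se)

for n in (25, 30, 40):  # shorter vs longer schedules
    se, lo, hi = win_pct_interval(0.75, n)
    print(f"{n} games: SE = {se:.3f}, 95% CI = ({lo:.2f}, {hi:.2f})")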

But since such considerations have been absent from the discussion thus far, I have the impression that the predictions are essentially point estimates rather than interval estimates (with some plausibly associated probability distribution) -- the famous "margin of error" routinely misrepresented and misunderstood by the media when reporting survey data.

Comments from someone in the know?

Tom Lento

Quote from: SwampyBut since such considerations have been absent from the discussion thus far, I have the impression that the predictions are essentially point estimates rather than interval estimates (with some plausibly associated probability distribution) -- the famous "margin of error" routinely misrepresented and misunderstood by the media when reporting survey data.

Comments from someone in the know?

This is generally correct, yes. The prediction is made from a set of simulation results, which produces a frequency distribution of outcomes. You basically do some arithmetic on that distribution to get the percentage. Think of it as a rigorous initial odds-making.

Loads of detail on one such model:

https://fivethirtyeight.com/features/how-our-2015-16-nba-predictions-work/

Once the rating and simulation inputs are settled, they do something quite typical:

QuoteOnce the adjustments are made, we simulate the regular season 10,000 times to find the average final record of each team and the percentage of simulations that each team makes the playoffs. We use NBA tiebreaking rules to seed teams in the playoffs (including the change this year that makes overall record the top factor in seeding) and then simulate the playoffs 10,000 times to find the winner of the finals.

As with our other sports forecasts, we run our simulations "hot," meaning that a team's CARM-Elo rating is updated after each simulated game within a simulated season. This matters more than you might think; essentially, it accounts for the possibility of hot streaks and cold streaks, as well as the increased uncertainty in projecting a team's fortunes the further you go into the future. This tends to compress playoff and championship odds as compared with running the simulations cold. For instance, as of launch, our model gives the Warriors a 52 percent chance of winning the NBA title, which might sound high — but their probability would be even higher, 73 percent, without this adjustment.

The second paragraph suggests the CHN model's lack of updating on KRACH odds could very well be causing some havoc with their predictions.
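
A toy version of hot-vs-cold with a generic Elo-style rating (nothing like CARM-Elo's actual machinery, and CHN uses KRACH rather than Elo - this just shows the compression effect):

import random

def best_of_3_prob(r_a, r_b, hot=False, k=30, n_sims=20000):
    # Team A's chance of taking a best-of-3. Running "hot" re-rates both
    # teams after every simulated game, so simulated streaks compound.
    series_wins = 0
    for _ in range(n_sims):
        ra, rb, wa, wb = r_a, r_b, 0, 0
        while wa < 2 and wb < 2:
            p_a = 1 / (1 + 10 ** ((rb - ra) / 400))
            a_won = random.random() < p_a
            if hot:
                delta = k * ((1 if a_won else 0) - p_a)
                ra, rb = ra + delta, rb - delta
            wa, wb = wa + a_won, wb + (not a_won)
        series_wins += wa == 2
    return series_wins / n_sims

print(best_of_3_prob(1600, 1500, hot=False))  # ~0.70 for the favorite
print(best_of_3_prob(1600, 1500, hot=True))   # slightly compressed toward 0.5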

adamw

Quote from: Tom LentoThe second paragraph suggests the CHN model's lack of updating on KRACH odds could very well be causing some havoc with their predictions.

As I allude to in our explainer article on the site, I feel like a valid argument can be made to keep KRACH as a snapshot from when the simulation starts. But I can't articulate the reason very well.

On the other hand, I do know that re-calculating KRACH on the fly after every game would be all but impossible. As it is, running 20,000 simulations takes like 4 hours. And each simulation contains a few dozen games or so. At least. If KRACH were re-calculated after each simulated game within each simulation, I think it might take a week to run.  Of course, I allow for the fact that I might be doing it wrong.
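
For anyone wondering why that recompute is so expensive: KRACH isn't a formula you evaluate per team, it's a fixed point you iterate to over the whole league, so every simulated game would trigger a fresh league-wide solve. A minimal sketch of the iteration, on a toy 3-team league (not CHN's code):

def krach(wins, games, iters=500):
    # wins[i]: team i's wins (ties as 0.5); games[i][j]: games between i and j.
    # Iterate K_i = W_i / sum_j (n_ij / (K_i + K_j)) until it settles.
    n = len(wins)
    k = [1.0] * n
    for _ in range(iters):
        k = [wins[i] / sum(games[i][j] / (k[i] + k[j])
                           for j in range(n) if games[i][j])
             for i in range(n)]
        scale = n / sum(k)          # pin down the arbitrary overall scale
        k = [x * scale for x in k]
    return k

print(krach([3.0, 2.0, 1.0], [[0, 2, 2], [2, 0, 2], [2, 2, 0]]))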

I could run fewer simulations. The whole thing seems to stabilize at around 3,000 or less. 20,000 is probably overkill. But it does allow for picking up on some outlier possibilities.
College Hockey News: http://www.collegehockeynews.com

jkahn

Quote from: adamw
Quote from: Tom LentoThe second paragraph suggests the CHN model's lack of updating on KRACH odds could very well be causing some havoc with their predictions.

As I allude to in our explainer article on the site, I feel like a valid argument can be made to keep KRACH as a snapshot from when the simulation starts. But I can't articulate the reason very well.

On the other hand, I do know that re-calculating KRACH on the fly after every game would be all but impossible. As it is, running 20,000 simulations takes like 4 hours. And each simulation contains a few dozen games or so. At least. If KRACH were re-calculated after each simulated game within each simulation, I think it might take a week to run.  Of course, I allow for the fact that I might be doing it wrong.

I could run fewer simulations. The whole thing seems to stabilize at around 3,000 or less. 20,000 is probably overkill. But it does allow for picking up on some outlier possibilities.
Adam, your CHN model showed a 98% chance before this weekend. Given that we had a 35% KRACH chance of losing on Friday, 35% of the model's average Cornell NCAA chances after a Friday loss plus 65% of the average chances after a Friday win would have to get you to that 98%. That would mean, if the model was correctly programmed, that Cornell would still have a 94% average chance after a Friday loss. So if the 65% after Friday was right, that was quite a huge outlier. And the 94% chance after a loss does not at all pass the smell test. It's not the non-adjusting of KRACH that's causing the problem.
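
The algebra, for anyone checking (with the same generous assumption that a Friday win means roughly 100%):

# Solve 0.35 * x + 0.65 * 1.00 = 0.98 for x, the implied average
# post-loss NCAA chance baked into the 98% figure.
p_loss, pre_game, after_win = 0.35, 0.98, 1.00
implied_after_loss = (pre_game - (1 - p_loss) * after_win) / p_loss
print(f"{implied_after_loss:.1%}")  # ~94.3%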
Jeff Kahn '70 '72

Dafatone

Quote from: jkahn
Quote from: adamw
Quote from: Tom LentoThe second paragraph suggests the CHN model's lack of updating on KRACH odds could very well be causing some havoc with their predictions.

As I allude to in our explainer article on the site, I feel like a valid argument can be made to keep KRACH as a snapshot from when the simulation starts. But I can't articulate the reason very well.

On the other hand, I do know that re-calculating KRACH on the fly after every game would be all but impossible. As it is, running 20,000 simulations takes like 4 hours. And each simulation contains a few dozen games or so. At least. If KRACH were re-calculated after each simulated game within each simulation, I think it might take a week to run.  Of course, I allow for the fact that I might be doing it wrong.

I could run fewer simulations. The whole thing seems to stabilize at around 3,000 or less. 20,000 is probably overkill. But it does allow for picking up on some outlier possibilities.
Adam, your CHN model showed a 98% chance before this weekend. Given that we had a 35% KRACH chance of losing on Friday, 35% of the model's average Cornell NCAA chances after a Friday loss plus 65% of the average chances after a Friday win would have to get you to that 98%. That would mean, if the model was correctly programmed, that Cornell would still have a 94% average chance after a Friday loss. So if the 65% after Friday was right, that was quite a huge outlier. And the 94% chance after a loss does not at all pass the smell test. It's not the non-adjusting of KRACH that's causing the problem.

That 65% is dependent on other games. It's not just "if we lose on Friday, we have a 65% chance no matter what." Had other games gone differently, it could have been higher. Or lower.

These games aren't in a vacuum.

Dafatone

Quote from: abmarksHow about the simplest explanation: we could just have had a number of PWR comparisons that were razor close? That matrix uses KRACH to simulate game results but ultimately fills out the PWR - so the issue is more likely in the PWR than the KRACH numbers.

Looking at the probability matrix, I think you need to look at the probabilities of landing in any given PWR final position. I'm curious what the distribution was for us across positions before last night, because as of 9:20 pm, we have crazy high probabilities of landing in 15 or 16, while Penn St. and Providence skew their individual probabilities much more towards the higher PWR finishes.

The answer to the riddle will be found by seeing which individual PWR comparisons are likely to get flipped against us, as opposed to the other two teams having the same thing happen.

One thing to consider: PWR is no longer a mystery. It's just RPI. If a whole bunch of factors line up perfectly, a team can win a comparison against a higher-RPI team, but they need to win head-to-head against them. At any given time, there are only a few slight differences between the RPI rankings and the PairWise.