2018 ECAC Permutations

Started by Give My Regards, February 18, 2018, 11:38:41 PM


Trotsky

It is basic human psychology to look at a mathematical model that produces a counter-intuitive event probability and think "that model must be wrong."

It could be wrong, of course, but the "mustness" of the feeling inverts reality.  When a well-developed algorithm conflicts with your assessment of likelihood it's most likely indicating that your brain is wrong.  There are a host of cognitive biases that make our seat-of-the-pants judgments prone to error.

BearLover

Quote from: abmarks
Quote from: BearLover
Quote from: adamw
Quote from: imafrshmn... there's reason to believe that the CHN simulations don't really capture the low-probability outcomes as well as they should (which is to say it's all a little bit overconfident). Because KRACH ratings are assumed to be (1) constant,  (2) unbiased, and (3) not-uncertain, there is no way to account for recent trends (like Clarkson tanking, for example), variations in luck, and inherent uncertainty/variability of a team's strength. Of course, these assumptions simplify this exercise to a point where it's easily understood.

repeating: anyone wants to help write a better algorithm, be my guest.

All of the above is true.  (1) can be solved easily, but requires a lot of additional computing time - way too much to be worth it.  (3) there's a lot packed in there; some of it would be good to adjust for, other parts have more dubious value.
adamw, I wish I had the mathematical or computing background to help improve the algorithm. I can only say that the 59% chance the model gives us of winning the ECAC tournament has to be wrong. 40% would already be pushing it. Someone want to go back and check how often a team to which this model gave a "60% chance" of winning an 8-team tournament didn't end up winning it?

The frigging model is 100% correct, BearLover.   TLDR version: Bearlover doesn't understand instructions.

BearLover, if you'd read the explanation of the model, it clearly states that:

Quote from: CHN says:These are the results of 20,000 Monte Carlo simulations of the remaining games prior to Selection Day. The winner of each game in the simulation was determined randomly, weighted by KRACH.

The simple translation of this is that, based on the relative values of the KRACH ratings, IN THE LONG RUN (when variance is removed) we are expected to win 59% of the time, i.e., we'd be expected to win 11,800 of the 20,000 simulated tournaments.  So, as defined, the model is accurate.
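Mechanically, the simulation CHN describes is nothing more than repeated KRACH-weighted coin flips. Here's a rough sketch of that idea - NOT CHN's actual code; every rating, seed, and pairing below is a made-up placeholder, and it ignores ties, re-seeding, and the best-of-three quarterfinals the real tournament uses:

```python
import random

# Illustrative KRACH-style ratings for an 8-team bracket (placeholder values, not real ratings)
krach = {"Cornell": 500, "Union": 150, "Harvard": 140, "Clarkson": 130,
         "Princeton": 120, "Colgate": 110, "Dartmouth": 100, "Yale": 95}

def win_prob(a, b):
    """Bradley-Terry/KRACH probability that team a beats team b in one game."""
    return krach[a] / (krach[a] + krach[b])

def play_bracket(seeds):
    """Play one single-elimination bracket; each winner is a KRACH-weighted coin flip."""
    teams = list(seeds)
    while len(teams) > 1:
        teams = [a if random.random() < win_prob(a, b) else b
                 for a, b in zip(teams[0::2], teams[1::2])]
    return teams[0]

# Seed order here is just 1 v 8, 4 v 5, 2 v 7, 3 v 6 for illustration.
seeds = ["Cornell", "Yale", "Clarkson", "Princeton",
         "Union", "Dartmouth", "Harvard", "Colgate"]
wins = sum(play_bracket(seeds) == "Cornell" for _ in range(20000))
print(f"Cornell wins {wins / 20000:.1%} of 20,000 simulated tournaments")
```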

This model can't possibly be used to tell you the likelihood of winning the tournament when the tournament is only played out once, not 20,000 times. Because, variance.

Please stop shitting on the model as wrong. It's not. It is what it is.
Hey buddy, I understand what variance is and that no model can predict what will happen in any single instance. I am saying that over 20,000 ECAC tournaments, we would win it fewer than 11,800 times. Sorry, your attempt at being condescending missed the entire premise of my post.

You say, "as defined, the model is accurate." That's some impressively circular logic! If this predictor is, as you say, "based on the relative value of the KRACH ratings," my issue is with KRACH as a predictor of hockey games, not with the model's application of KRACH (I thought that was obvious from my initial post, but guess not). Specifically, my issue is that KRACH fails to account for the variance over a small sample size of games leading up to this point in the season. There is a very low degree of certainty that Cornell is actually the second-best team in the country. We could realistically be anywhere from 1-25. Does KRACH--and, by extension, the model--account for that?

BearLover

Quote from: TrotskyIt is basic human psychology to look at a mathematical model that produces a counter-intuitive event probability and think "that model must be wrong."

It could be wrong, of course, but the "mustness" of the feeling inverts reality.  When a well-developed algorithm conflicts with your assessment of likelihood it's indicating that your brain is wrong.  There are a host of cognitive biases that make our seat-of-the-pants judgments prone to error.
Yeah, we know, we're past this point.

Trotsky

Quote from: BearLover
Quote from: TrotskyIt is basic human psychology to look at a mathematical model that produces a counter-intuitive event probability and think "that model must be wrong."

It could be wrong, of course, but the "mustness" of the feeling inverts reality.  When a well-developed algorithm conflicts with your assessment of likelihood it's indicating that your brain is wrong.  There are a host of cognitive biases that make our seat-of-the-pants judgments prone to error.
Yeah, we know, we're past this point.
Evidently not.

BearLover

Quote from: Trotsky
Quote from: BearLover
Quote from: TrotskyIt is basic human psychology to look at a mathematical model that produces a counter-intuitive event probability and think "that model must be wrong."

It could be wrong, of course, but the "mustness" of the feeling inverts reality.  When a well-developed algorithm conflicts with your assessment of likelihood it's indicating that your brain is wrong.  There are a host of cognitive biases that make our seat-of-the-pants judgments prone to error.
Yeah, we know, we're past this point.
Evidently not.
The only person (erroneously) arguing this point is abmarks, who misinterpreted my argument. This model doesn't just look wrong--it looks heinously wrong. We can't know for sure until someone checks it against empirical data (past tournaments, as I suggested above). AFAIK, KRACH isn't meant to be predictive. In fact, from what I understand, it fails very badly at being predictive. No one is saying "this model is wrong because in my experience [this number] cannot be true!" People are saying, "this model looks very wrong, [this improperly weighted input] is probably why, someone please check it against empirical data to be sure."

jfeath17

I started doing some analysis on how accurate KRACH is at predicting results. I will follow up in the next post with some more details on what I did, but the basic gist is that I collected the KRACH-based projected winning percentage for the better-ranked team and the result of the game (win/tie/loss). I then fit a logistic regression to this data, using the KRACH prediction as the independent variable. I used the results of 1129 games over the two previous seasons.

KRACH pred. | Fitted result
------------+---------------
       0.50 | 0.5390
       0.55 | 0.5750
       0.60 | 0.6102
       0.65 | 0.6444
       0.70 | 0.6771
       0.75 | 0.7081
       0.80 | 0.7374
       0.85 | 0.7647
       0.90 | 0.7899
       0.95 | 0.8131
       1.00 | 0.8343

jfeath17

Some more details:

I used data from the past two complete seasons. I hope to add more seasons, but the dates for games on USCHO from 3 years ago seem to be in a mix of d/m/y and m/d/y, which kinda breaks things. (On that note, if anyone knows of an easily parsable database of game results, that would be great, since I'm currently copying the table from USCHO into Excel and exporting it as a CSV.)

I step through the schedule week by week and update the KRACH rating for every team. Then, I calculate the KRACH-based projection of winning percentage for the upcoming week's games. I always use the higher-ranked team's winning likelihood so they are all within the 0.5-1 range. I then save the result of the game (tie, higher-ranked team won, higher-ranked team lost) along with the KRACH-based likelihood. I start this process at the beginning of January to avoid the early-season variability of KRACH (and also the complexities of calculating KRACH for undefeated teams). Now that I am typing this up, I realize that stepping through on a weekly basis isn't really necessary, and I am thinking about changing it to a day-by-day step.

Now I am left with two variables: the KRACH-predicted win likelihood and the actual result of the game. I tried a couple of different things to find a good relationship between the two. Based on my research, I think the best way is to use a logistic regression, and those are the results shown in the post above. I don't consider myself an expert in this stuff at all, so I could very well be making some bad assumptions here. If anyone has a better method for comparing them, I'm interested to hear it.

If anyone has any questions or suggestions for further things to try out, I'd love to hear them.
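For anyone who wants to reproduce or extend this, the fit itself is only a few lines. This is just a sketch of the shape of it, not my exact script - the games.csv layout and column names are made up, ties are simply dropped here, and your coefficients will differ:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Assumed layout of games.csv (one row per game):
#   krach_prob : KRACH-implied win probability for the higher-ranked team (0.5 to 1.0)
#   result     : "W", "T", or "L" from the higher-ranked team's point of view
games = pd.read_csv("games.csv")

# For simplicity this sketch drops ties and fits a plain logistic regression of
# the binary outcome on the KRACH prediction. (Ties could instead be counted as
# half a win using sample weights.)
decided = games[games["result"] != "T"].copy()
decided["won"] = (decided["result"] == "W").astype(int)

X = sm.add_constant(decided["krach_prob"])
fit = sm.Logit(decided["won"], X).fit()
print(fit.summary())

# Fitted actual win rate over a grid of KRACH predictions, like the table above.
grid = sm.add_constant(pd.Series(np.arange(0.50, 1.001, 0.05), name="krach_prob"))
print(pd.DataFrame({"krach_prob": grid["krach_prob"], "fitted": fit.predict(grid)}))
```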

abmarks

Quote from: BearLover
Quote from: Trotsky
Quote from: BearLover
Quote from: TrotskyIt is basic human psychology to look at a mathematical model that produces a counter-intuitive event probability and think "that model must be wrong."

It could be wrong, of course, but the "mustness" of the feeling inverts reality.  When a well-developed algorithm conflicts with your assessment of likelihood it's indicating that your brain is wrong.  There are a host of cognitive biases that make our seat-of-the-pants judgments prone to error.
Yeah, we know, we're past this point.
Evidently not.
The only person (erroneously) arguing this point is abmarks, who misinterpreted my argument. This model doesn't just look wrong--it looks heinously wrong. We can't know for sure until someone checks it against empirical data (past tournaments, as I suggested above). AFAIK, KRACH isn't meant to be predictive. In fact, from what I understand, it fails very badly at being predictive. No one is saying "this model is wrong because in my experience [this number] cannot be true!" People are saying, "this model looks very wrong, [this improperly weighted input] is probably why, someone please check it against empirical data to be sure."

There will never be enough tourney results to create a dataset big enough to generate an empirical conclusion of any precision, once again, because variance.  


If you weren't so lazy, you might have read the FAQ page at CHN, which was answered by JTW (of this forum) himself.  https://www.collegehockeynews.com/info/?d=krach .  Here is part of it, with key info bolded and italicised.


Q. Can you tell us a little more?

A: Getting a bit more technical: The Bradley-Terry system is based on a statistical technique called logistic regression, in essence meaning that teams' ratings are determined directly from their won-loss records against one another. KRACH's strength of schedule component is calculated directly from the ratings themselves, which is a key point. It means that KRACH, unlike many ratings (including RPI), cannot easily be distorted by teams with strong records against weak opposition.

The ratings are on an odds scale, so if Team A's KRACH rating is three times as large as Team B's, Team A would be expected to amass a winning percentage of .750 and Team B a winning percentage of .250 if they played each other enough times. The correct ratings are defined such that the "expected" winning percentage for a team in the games it's already played is equal to its "actual" winning percentage.

Q. And so why is this so great?

A: In other words, if you took one team's schedule to date, and played a theoretical "game" for each game already actually played, using the KRACH ratings themselves in order to predict the winner, then the end result would be a theoretical won-loss percentage that matches the team's actual won-loss percentage. Pretty cool.

It is not possible to do any better than that with a completely objective method. Any other method would introduce arbitrariness and/or subjectivity.

Q. What are the limitations?

A: Well, KRACH can't predict the future. Nothing can. The idea behind such ratings systems is to use them in order to properly select and seed tournaments. Champions are then determined on the ice. All systems are designed to analyze past results, not necessarily predict future ones. Though, by theory, the more sound the analysis of the past, the better the ability to predict future results.

KRACH is "perfect" in its analysis of past results. But that should not be construed to mean that it definitively decides which team is better. When dealing with sample sizes like this, you never know. Team A could lose to Team B, be below them in KRACH, and then turn around and beat Team B the next three times. KRACH would then change. It does not invalidate what KRACH represented at the time, however.
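(To put the odds scale in concrete numbers: under Bradley-Terry, the expected single-game win probability is P(A beats B) = KRACH_A / (KRACH_A + KRACH_B). A 3:1 ratings ratio gives 3/(3+1) = .750 for Team A and .250 for Team B, which is exactly the split the FAQ describes.)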

BearLover

Quote from: abmarksIf you weren't so lazy, you might have read the FAQ page at CHN, which was answered by JTW (of this forum) himself.  https://www.collegehockeynews.com/info/?d=krach .  Here is part of it, with key info bolded and italicised.
You'll be happy to know I read that entire primer before my initial posts. It doesn't answer any of my questions or help your case in any way.

Quote from: abmarksThere will never be enough tourney results to create a dataset big enough to generate an empirical conclusion of any precision, once again, because variance.
There exists data from hundreds of tournaments and thousands of games that we can compare against KRACH-based predictions.

You're also being obtuse in your bolding/italicizing of clauses from the KRACH FAQ (thanks, by the way!). Yeah, KRACH isn't meant to be predictive. And yeah, nothing can better objectively measure past results. No one cares about those things. The question at hand is whether KRACH happens to be predictive to a significant enough degree that it's worth using in models that predict outcomes of hockey games. That was always the question, not whether KRACH is a nice way of seeding for tournaments or whether this predictor misapplied KRACH. Since there is absolutely nothing here in this thread, in the KRACH FAQ, or gleaned from comparing this model against other sports/hockey prediction models to suggest that KRACH is even a remotely good predictor of future hockey game outcomes, I'm going to assume KRACH is not a good predictor of future hockey game outcomes, and that therefore this model isn't good. Happy to be proven otherwise (in hopefully a more polite manner).

Why are you in such a foul mood, anyway?

David Harding

Quote from: jfeath17Some more details:

I used data from the past two complete seasons. I hope to add more seasons, but the dates for games on USCHO from 3 years ago seem to be in a mix of d/m/y and m/d/y which kinda breaks things. (On that note, if anyone knows of a easily parsable database of game results that would be great since I'm currently copying the table from USCHO into excel and exporting it as a csv.)

I step through the schedule week by week and update the KRACH rating for every team. Then , I calculate the KRACH based projection of winning percentage for the upcoming week's games. I always use the higher ranked teams winning likelihood so they are all within the 0.5-1 range. I then save the result of the game (tie, higher ranked won, higher ranked loss) along with the KRACH based likelihood. I start this process at the beginning of January to ignore the early season variability of KRACH (also to avoid any of the complexities of calculating KRACH on undefeated teams). Now that I am typing this up I realize that stepping through on a weekly basis isn't really necessary and I am thinking about changing it to a day by day step.

Now I am left with two variables, KRACH Prediction Win Likelihood and the actual result of the game. I tried a couple different things here to try to find a good correlation between the two. Based on my research, I think the best way is to use a Logistic Regression and those are the results shown in the above post. I don't consider myself an expert in this stuff at all so I very well could be making some bad assumptions here. If anyone has a better method to compare them, I'm interested to hear.

If anyone has any questions or suggestions for further things to try out, I'd love to hear them.

Thank you for tackling the question with data.  This may be too simplistic, but how about sorting the games into 10 bins based on the predicted probability of the favored team winning: 0.50<=x<0.55, 0.55<=x<0.60, etc.  For each bin, calculate the fraction of the games that the favored team won. Graph the actual fraction vs the calculated fraction.  You could multiply the number of games in each bin by the predicted fraction for the favored team to get the predicted number, then take the square root to get some notion of the size of the predicted error.
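Something like this sketch is what I have in mind (it reuses the hypothetical games.csv layout from jfeath17's posts above; ties are counted as half a win):

```python
import numpy as np
import pandas as pd

# Same assumed games.csv layout as above:
#   krach_prob : KRACH win probability for the favored team (0.5 to 1.0)
#   result     : "W", "T", or "L" for the favored team
games = pd.read_csv("games.csv")
games["points"] = (games["result"] == "W") * 1.0 + (games["result"] == "T") * 0.5

# Ten bins: 0.50 <= x < 0.55, 0.55 <= x < 0.60, ..., 0.95 <= x < 1.00
bins = np.arange(0.50, 1.0001, 0.05)
games["bin"] = pd.cut(games["krach_prob"], bins, right=False)

summary = games.groupby("bin").agg(
    n=("points", "size"),
    predicted=("krach_prob", "mean"),   # average KRACH prediction in the bin
    actual=("points", "mean"),          # fraction actually won (ties as half)
)
# Rough binomial error bar on the actual fraction, for judging the gaps
summary["stderr"] = np.sqrt(summary["predicted"] * (1 - summary["predicted"]) / summary["n"])
print(summary)
```

Plotting the actual column against the predicted column with those error bars would make any over- or under-confidence obvious at a glance.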

Beeeej

Isn't it wonderful that we're arguing so vehemently over whether the statistical models are accurate when they say we're fantastic because our record is 22-3-2?

I definitely don't recall having these arguments in 1993. "No, we must be worse than our 6-19-1 record would suggest!!"
Beeeej, Esq.

"Cornell isn't an organization.  It's a loose affiliation of independent fiefdoms united by a common hockey team."
   - Steve Worona

KenP

Is there a way to factor in the odds of a tie?  I know it is incorporated into B-T ratings... but my guess is that hockey's non-binary results are a significant source of forecast error.

Swampy

Quote from: David Harding
Quote from: jfeath17Some more details:

I used data from the past two complete seasons. I hope to add more seasons, but the dates for games on USCHO from 3 years ago seem to be in a mix of d/m/y and m/d/y which kinda breaks things. (On that note, if anyone knows of a easily parsable database of game results that would be great since I'm currently copying the table from USCHO into excel and exporting it as a csv.)

I step through the schedule week by week and update the KRACH rating for every team. Then , I calculate the KRACH based projection of winning percentage for the upcoming week's games. I always use the higher ranked teams winning likelihood so they are all within the 0.5-1 range. I then save the result of the game (tie, higher ranked won, higher ranked loss) along with the KRACH based likelihood. I start this process at the beginning of January to ignore the early season variability of KRACH (also to avoid any of the complexities of calculating KRACH on undefeated teams). Now that I am typing this up I realize that stepping through on a weekly basis isn't really necessary and I am thinking about changing it to a day by day step.

Now I am left with two variables, KRACH Prediction Win Likelihood and the actual result of the game. I tried a couple different things here to try to find a good correlation between the two. Based on my research, I think the best way is to use a Logistic Regression and those are the results shown in the above post. I don't consider myself an expert in this stuff at all so I very well could be making some bad assumptions here. If anyone has a better method to compare them, I'm interested to hear.

If anyone has any questions or suggestions for further things to try out, I'd love to hear them.

Thank you for tackling the question with data.  This may be too simplistic, but how about sorting the games into 10 bins based on the predicted probability of the the favored team winning. 0.50<=x<0.55, 0.55<=x<0.60, etc.  For each bin, calculate the fraction of the games that the favored team won. Graph the actual fraction vs the calculated fraction.  You could multiply the number of games in each bin by the predicted fraction for the favored team to get the predicted number, then take the square root to get some notion of the size of the predicted error.

Some things I'd want to add to the discussion:

1. We predict the future all the time, usually with good results. Do you want to bet the sun won't come up tomorrow? That if I take an umbrella in the rain I'll get less wet than without one? Etc.

2. Exactly how does variance play out in these methods? If Team A plays Team B, does P[Team A or Team B wins] = 1.0? Suppose Team A has P[winning] = 0.6 and Team B has 0.4, but Team A is erratic (I'm looking at you, Clarkson), while Team B is not. Does A's greater variance show up in the prediction?

3. Looking at Clarkson again, these things are time series. So recent performance should be weighted more heavily. Is it?

4. What about using other information besides past performance? Donato away at the Olympics? Cornell's top D-men and third-leading scorer are injured? Surely a pro from Las Vegas would use such information to handicap a team.

Tom Lento

Quote from: BearLover
Quote from: Trotsky
Quote from: BearLover
Quote from: TrotskyIt is basic human psychology to look at a mathematical model that produces a counter-intuitive event probability and think "that model must be wrong."

It could be wrong, of course, but the "mustness" of the feeling inverts reality.  When a well-developed algorithm conflicts with your assessment of likelihood it's indicating that your brain is wrong.  There are a host of cognitive biases that make our seat-of-the-pants judgments prone to error.
Yeah, we know, we're past this point.
Evidently not.
The only person (erroneously) arguing this point is abmarks, who misinterpreted my argument. This model doesn't just look wrong--it looks heinously wrong. We can't know for sure until someone checks it against empirical data (past tournaments, as I suggested above). AFAIK, KRACH isn't meant to be predictive. In fact, from what I understand, it fails very badly at being predictive. No one is saying "this model is wrong because in my experience [this number] cannot be true!" People are saying, "this model looks very wrong, [this improperly weighted input] is probably why, someone please check it against empirical data to be sure."

BearLover's complaints about the predictive model are valid, at least in terms of modeling decisions. The model in CHN (and playoffstatus, which is not meaningfully different) is based on a number of fairly strong assumptions, some of which are hidden, and that bears examining.

If it's meant to be an exercise in generating a valid distribution of outcomes without regard to predictive accuracy of empirical results, perhaps with the aim of starting conversation and giving fans something to gas about on weekdays, there's absolutely nothing wrong with it. If it's meant to be an effective predictor of tournament qualification and end of season results, it is questionable. I don't have the empirical data to know if it's *right* and I don't feel like doing that analysis, but the odds don't pass the sniff test. At this point, the predictions are so confident at the top end of the distribution that someone really would need to produce empirical outcomes showing the model's effectiveness before I'd believe those numbers. This doesn't mean I don't understand probability. It means I do understand modeling choices.

Failing to update KRACH along the way is a modeling choice everybody has talked about - that's actually a bad one because you're basically starting with a strong prior and failing to update it in any way. I understand the computational cost issues in play here but there's got to be a way to do that efficiently. If not, you could use something that has a lot of KRACH's desirable properties without the computational complexity (maybe Elo is better for this?) and then see how the models compare. That'll at least get you some sense of how much impact this decision has on your distributions.
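For what it's worth, the Elo route is attractive precisely because the update is O(1) per game, so it can run inside every simulated game. A minimal sketch (the K-factor and 400-point scale are the standard chess defaults, not anything tuned for college hockey):

```python
def elo_expected(rating_a, rating_b, scale=400.0):
    """Expected score for team A under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))

def elo_update(rating_a, rating_b, score_a, k=20.0):
    """Update both ratings after one game; score_a is 1 (win), 0.5 (tie), or 0 (loss)."""
    expected_a = elo_expected(rating_a, rating_b)
    return (rating_a + k * (score_a - expected_a),
            rating_b + k * ((1.0 - score_a) - (1.0 - expected_a)))
```

Inside each simulated run you'd update the two teams' ratings after every simulated game, so a run where Clarkson keeps tanking actually produces worse odds for Clarkson in that run's later games - which is exactly what the fixed-KRACH version can't do.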

Personally, I suspect the bigger problem with the predictive model CHN is using is that it treats KRACH as 100% accurate - all of the variance between KRACH's predictions and actual empirical results is missing from the model itself. Excluding that input makes the model over-confident with respect to empirical predictors, and I think that's something you could address. At the very least you can externally validate the underlying assumption. There's plenty of data at this point - you can actually just do single-point KRACH predictions and compare them with the distribution of empirical outcomes. There's no need to restrict to tournament games. If there's a lot of divergence (and early results in this thread suggest that this is, in fact, the case) the KRACH-based Monte Carlo predictors will do a pretty bad job of predicting empirical reality.
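To make that concrete, one cheap (hypothetical) way to inject the missing variance is to run each raw KRACH probability through an empirical calibration curve before flipping the weighted coin - for example, interpolating jfeath17's fitted values from earlier in the thread:

```python
import numpy as np

# jfeath17's fitted curve from above: KRACH-implied prob -> regression-fitted actual win rate
krach_grid  = np.arange(0.50, 1.001, 0.05)
fitted_rate = np.array([0.5390, 0.5750, 0.6102, 0.6444, 0.6771, 0.7081,
                        0.7374, 0.7647, 0.7899, 0.8131, 0.8343])

def calibrated_prob(p):
    """Map a raw KRACH win probability to the empirically observed win rate."""
    if p >= 0.5:
        return float(np.interp(p, krach_grid, fitted_rate))
    return 1.0 - float(np.interp(1.0 - p, krach_grid, fitted_rate))

print(calibrated_prob(0.84))   # a raw 84% favorite comes out at roughly 76%
```

That still treats the calibration curve itself as exact, but it at least pulls simulated favorites back toward what favorites actually do - which is the over-confidence BearLover is complaining about.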

One way to think about this point is to compare it to models of the presidential election (NO POLITICS - this is about modeling decisions). In 2016 a lot of models had these hugely, almost impossibly confident predictions of a Clinton victory. Several reputable polling-based predictive models had less than a 5% chance of Trump winning, while 538 had a 10% chance of Trump winning while losing the popular vote. If you look at the recaps one key reason was because those models took a fairly naive approach to modeling empirical error in polling predictors. Specifically, they failed to account for correlated polling errors across states with similar demographic characteristics. 538 took some (IMHO valid) criticism that their adjustments were being too strongly applied or that they weren't accounting for the error terms in those estimates, but in an empirical model I think that's a better class of mistake to make than just saying "this thing that happens every election doesn't happen in my model because I said so."

Back to hockey, one could imagine the same thing happening with KRACH. If KRACH systematically over-states the odds that highly rated Team A will beat lower ranked Team B, you'll get over-confident predictions for any team with a sufficiently strong record relative to its competition. Assuming jfeath17's data is correct, that is precisely what appears to be happening. This is not a flaw in KRACH, necessarily, because KRACH is meant to provide a ranked set rather than absolute determination of odds of victory. However, when using KRACH as a forward looking predictor you really do need to adjust for that variance if your model is to be empirically accurate.

This doesn't mean the pure KRACH model is useless - it's interesting, and it gives some baseline for discussion and adaptation, and it lets us talk about hockey (and math) on a Tuesday.

abmarks

OK BearLover, what you are really arguing, then, is that KRACH itself is worthless for any use, and you are making a completely specious argument based on your intuition, not on any actual examination of the data or of the methods the CHN model uses to estimate the likelihood of winning the conference tournament.

So I'll make this objective with data.  I'm curious (genuinely, not snidely) whether you agree with what KRACH says for comparisons against particular opponents.  I ask because, if the individual comparisons are correct, then that 59% number is correct, since it's just math at that point.

We'll use standings as of today and assume all seeds hold so as to simplify this model.  We'd have the following matchups:

Quarters:     (For simplicity, let's call this one a single game, not best of three.)
We play #8 Yale.  Yale's KRACH is 98.3. Our KRACH is 512.9.
-This means that we are beating Yale 84% of the time.  (Not sure how to do the math, but this should also imply that our chances of winning a best of three are even higher?)

Semis
We play #4 Harvard.  HVD KRACH is 136.9. Our KRACH is 512.9.
-This means that we are beating HVD  79% of the time.

Championship
We play #2 Union.  Union's KRACH is 142.5. Our KRACH is 512.9.
-This means that we are beating Union 78% of the time.

===> Odds of winning tournament are 52%.   (84% x 79% x 78%)

This is essentially what the model did that got 59%.  (The difference is due to the possibility of upsets elsewhere in the bracket and the fact that they ran a Monte Carlo simulation, not a single calculation.)

BearLover, the question is: do you disagree with the individual KRACH comparisons?
-A

p.s. someone correct me if I got the KRACH math wrong.
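Here's a quick sketch that redoes the arithmetic (and adds the best-of-three quarterfinal math I wasn't sure about). It ignores ties and home ice and assumes the seeds hold, so treat it as a sanity check rather than the model:

```python
def win_prob(our_krach, their_krach):
    """Bradley-Terry/KRACH single-game win probability."""
    return our_krach / (our_krach + their_krach)

def best_of_three(p):
    """Win a best-of-three given single-game win probability p: WW, WLW, or LWW."""
    return p * p + 2 * p * (1 - p) * p

cornell = 512.9
p_yale    = win_prob(cornell, 98.3)    # quarterfinal vs #8 Yale,    ~0.84
p_harvard = win_prob(cornell, 136.9)   # semifinal    vs #4 Harvard, ~0.79
p_union   = win_prob(cornell, 142.5)   # final        vs #2 Union,   ~0.78

print(f"single-game path:      {p_yale * p_harvard * p_union:.1%}")                  # ~52%
print(f"best-of-three QF path: {best_of_three(p_yale) * p_harvard * p_union:.1%}")   # ~57%
```

With the best-of-three quarterfinal included, the chalk path alone comes out around 57-58%, so most of the gap between my 52% and the model's 59% looks like the series format rather than anything exotic.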