2018 ECAC Permutations

Started by Give My Regards, February 18, 2018, 11:38:41 PM


adamw

Quote from: Tom LentoBack to hockey, one could imagine the same thing happening with KRACH. If KRACH systematically over-states the odds that highly rated Team A will beat lower ranked Team B, you'll get over-confident predictions for any team with a sufficiently strong record relative to its competition. Assuming jfeath17's data is correct, that is precisely what appears to be happening. This is not a flaw in KRACH, necessarily, because KRACH is meant to provide a ranked set rather than absolute determination of odds of victory. However, when using KRACH as a forward looking predictor you really do need to adjust for that variance if your model is to be empirically accurate.

This is where I'll again repeat that - if you have a better model - please feel free to share.
College Hockey News: http://www.collegehockeynews.com

Jim Hyla

Quote from: Tom LentoOne way to think about this point is to compare it to models of the presidential election (NO POLITICS - this is about modeling decisions). In 2016 a lot of models had these hugely, almost impossibly confident predictions of a Clinton victory. Several reputable polling-based predictive models had less than a 5% chance of Trump winning, while 538 had a 10% chance of Trump winning while losing the popular vote. If you look at the recaps one key reason was because those models took a fairly naive approach to modeling empirical error in polling predictors. Specifically, they failed to account for correlated polling errors across states with similar demographic characteristics. 538 took some (IMHO valid) criticism that their adjustments were being too strongly applied or that they weren't accounting for the error terms in those estimates, but in an empirical model I think that's a better class of mistake to make than just saying "this thing that happens every election doesn't happen in my model because I said so."


538's final polls-only forecast gave Trump a 29% chance. And they admit that their problem was not enough late state polling, so they couldn't have seen the state results that gave him the Electoral College.

So if you don't have the data, you can't get accuracy.

"Getting back to hockey", you have the same problem. Does anyone really think you can input the data on injuries, players in the Olympics, etc.?

This whole discussion is "worthless" unless someone can put up a better way.

We all can come up with problems with the "science", but unless someone is willing to put their money where their mouth is (or fingers are), we can carry on the discussion forever without anything changing.

Finally, it's interesting to see all the discussions that happen once we start winning again.

For that I'm happy.
"Cornell Fans Made the Timbers Tremble", Boston Globe, March/1970
Cornell lawyers stopped the candy throwing. Jan/2005

Tom Lento

Quote from: adamw
Quote from: Tom LentoBack to hockey, one could imagine the same thing happening with KRACH. If KRACH systematically over-states the odds that highly rated Team A will beat lower ranked Team B, you'll get over-confident predictions for any team with a sufficiently strong record relative to its competition. Assuming jfeath17's data is correct, that is precisely what appears to be happening. This is not a flaw in KRACH, necessarily, because KRACH is meant to provide a ranked set rather than absolute determination of odds of victory. However, when using KRACH as a forward looking predictor you really do need to adjust for that variance if your model is to be empirically accurate.

This is where I'll again repeat that - if you have a better model - please feel free to share.

I don't have a better model handy because making a better model requires a lot of effort, and I'm not currently unemployed (or employed in a place where I get paid to do this kind of thing). If I ever take a few months off work, trying to build something like this would be super fun, although as a follower of the game, trying to make the advanced stats more useful for me is probably what I'd do first.

That said, you can account for the variance against empirical reality by measuring it, adding uncertainty to the model (perhaps via weighting KRACH-predicted outcomes), and backtesting to validate.

More generally, though, I think you can start simpler by seeing how far off of empirical reality the model predictions have been. If you're 95% accurate, why bother? If you're way off, how much does the model improve by adjusting each individual assumption? Does adding the error variance into the simulation help? Or is the main issue the lack of KRACH updating? Or should you adjust KRACH weight by other factors (Corsi, PDO, whatever)?
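To make the "measure it" step concrete, here's a toy calibration check in Python. The game list, the bucket width, and the data layout are all invented for illustration; I have no idea what format the actual CHN data is in:

[code]
# Toy calibration check: do games the model calls "80%" actually get won
# ~80% of the time? The (predicted_prob, favorite_won) pairs are made up.
from collections import defaultdict

games = [
    (0.83, 1), (0.81, 1), (0.84, 0), (0.79, 1),
    (0.62, 1), (0.64, 0), (0.58, 1), (0.55, 0),
]

buckets = defaultdict(lambda: [0, 0])   # bucket -> [wins, games]
for prob, won in games:
    b = round(prob, 1)                  # group into 0.1-wide buckets
    buckets[b][0] += won
    buckets[b][1] += 1

for b in sorted(buckets):
    wins, n = buckets[b]
    print(f"predicted ~{b:.1f}: won {wins}/{n} = {wins / n:.2f}")

# If the "0.8" bucket only comes in around 0.70, shrink the model's
# confidence (or add an error term) before running the simulation.
[/code]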

BearLover

Quote from: abmarksOK Bearlover, what you are really arguing then is that KRACH itself is worthless for any use, and you are making a completely specious argument based on your intuition, not any actual examination of data or the methods used in the CHN model about likelihood of winning the Conference tournament.
Huh? KRACH is the best tool we have for ranking/seeding teams. It's just a poor predictive tool. Tom Lento explained this better than I could have, so please refer to his post. To say KRACH is not a good predictive tool because it does not account for the (significant) natural variance leading up to a specified point in a hockey season (which it shouldn't/can't, because it's not meant to be predictive) is not "specious" and is based on an "examination of the methods used in the CHN model." I'm not sure at this point what is confusing about what I am saying, to be totally honest. So, no, those underlying predictive numbers you cited are not correct, because they're taking as certain the outputs of a model that ranks teams in a very random sport based on 25-ish very random events. We are not 80% to beat Harvard and Union.

abmarks

Quote from: SwampySome things I'd want to add to the discussion:

2. Exactly how does variance play out in these methods? If Team A plays Team B, does the P[Team A or Team B wins] = 1.0? Suppose Team A has P[winning] = 0.6, and Team B has 0.4, but Team A is erratic (I'm looking at you, Clarkson), while Team B is not. Does A's greater variance show up in the prediction?


Let's say we know that in the long run, A beats B 75% of the time.   So, over 100 games, A wins 75.

What the P(A winning) does NOT tell you is which of those 100 games A wins.  A could go 0-10, then 75-5, then 0-10 over the course of those 100.

Taking that back to the topic at hand, short-term results (i.e., the one-game result in a tournament) are going to vary a lot vs. the long-term percentage.
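A quick simulation drives this home; the p = 0.75 and the sample sizes below are arbitrary:

[code]
# Even with a true long-run p = 0.75, short stretches swing a lot.
import random

random.seed(42)
p = 0.75  # long-run probability that A beats B

# Over 10,000 games the win rate settles near 0.75...
wins = sum(random.random() < p for _ in range(10000))
print("win rate over 10,000 games:", wins / 10000)

# ...but individual 10-game stretches are all over the place.
stretches = [sum(random.random() < p for _ in range(10)) for _ in range(1000)]
print("wins per 10-game stretch ranged from", min(stretches), "to", max(stretches))
[/code]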

adamw

Quote from: Tom LentoThat said, you can account for the variance against empirical reality by measuring it, adding uncertainty to the model (perhaps via weighting KRACH-predicted outcomes), and backtesting to validate.

More generally, though, I think you can start simpler by seeing how far off of empirical reality the model predictions have been. If you're 95% accurate, why bother? If you're way off, how much does the model improve by adjusting each individual assumption? Does adding the error variance into the simulation help? Or is the main issue the lack of KRACH updating? Or should you adjust KRACH weight by other factors (Corsi, PDO, whatever)?

If I had any idea how to do this correctly, it would already have been done.
College Hockey News: http://www.collegehockeynews.com

upprdeck

If you can improve the model, let's get it working for horse racing, as that would provide you the time to tweak the hockey model once we get rich..

Tom Lento

Quote from: Jim Hyla
Quote from: Tom LentoOne way to think about this point is to compare it to models of the presidential election (NO POLITICS - this is about modeling decisions). In 2016 a lot of models had these hugely, almost impossibly confident predictions of a Clinton victory. Several reputable polling-based predictive models had less than a 5% chance of Trump winning, while 538 had a 10% chance of Trump winning while losing the popular vote. If you look at the recaps one key reason was because those models took a fairly naive approach to modeling empirical error in polling predictors. Specifically, they failed to account for correlated polling errors across states with similar demographic characteristics. 538 took some (IMHO valid) criticism that their adjustments were being too strongly applied or that they weren't accounting for the error terms in those estimates, but in an empirical model I think that's a better class of mistake to make than just saying "this thing that happens every election doesn't happen in my model because I said so."


538's final polls-only forecast gave Trump a 29% chance. And they admit that their problem was not enough late state polling, so they couldn't have seen the state results that gave him the Electoral College.

So if you don't have the data, you can't get accuracy.

"Getting back to hockey", you have the same problem. Does anyone really think you can input the data on injuries, players in the Olympics, etc.?

This whole discussion is "worthless" unless someone can put up a better way.

We all can come up with problems with the "science", but unless someone is willing to put their money where their mouth is (or fingers are), we can carry on the discussion forever without anything changing.

Finally, it's interesting to see all the discussions that happen once we start winning again.

For that I'm happy.

Note all of the caveats in my statement about 538's prediction. Their model gave Trump a 10% chance of winning the election while losing the popular vote, and a 29% chance of winning overall. At one point late in the race, the Princeton Election Consortium gave Trump only a 2% chance of winning at all.

The way 538 approaches these problems is to tune a model based on parameters that explain variance from empirical reality, and then to back-test that model against the actual results. If you look at their CARMELO ratings for the NBA, they basically incorporate factors which might contribute to fatigue and injury (travel, back-to-back games) in ways that empirically affect performance, without worrying as much about whether or not Steph Curry the individual player will get hurt.

The same thing applies in hockey. The fact that the variance in single-game outcomes is larger in hockey than in basketball makes the problem harder, of course.

I can't really tell Adam exactly how to do this. I'm not that familiar with Monte Carlo methods, and I don't have enough direct experience in this domain to do more than provide vague suggestions; unless I take the time to get my hands dirty with it, I won't be able to speak intelligently about which approaches to consider and which to discard.
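That said, the skeleton of the simulation itself is presumably the easy part. Here's a toy four-team bracket in Python, using the standard KRACH-style head-to-head odds Ra/(Ra+Rb); the ratings are invented, and everything this sketch leaves out (reseeding, best-of-three series, rating updates) is exactly the hard part:

[code]
# Toy Monte Carlo bracket: invented ratings, KRACH-style head-to-head odds.
import random
from collections import Counter

ratings = {"Cornell": 300.0, "Clarkson": 200.0, "Harvard": 150.0, "Union": 120.0}

def beats(a, b):
    # Bradley-Terry / KRACH head-to-head: P(a beats b) = Ra / (Ra + Rb)
    return random.random() < ratings[a] / (ratings[a] + ratings[b])

champs = Counter()
for _ in range(100000):
    f1 = "Cornell" if beats("Cornell", "Union") else "Union"
    f2 = "Clarkson" if beats("Clarkson", "Harvard") else "Harvard"
    champs[f1 if beats(f1, f2) else f2] += 1

for team, n in champs.most_common():
    print(f"{team}: {n / 100000:.1%}")
[/code]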

Tom Lento

Quote from: adamw
Quote from: Tom LentoThat said, you can account for the variance against empirical reality by measuring it, adding uncertainty to the model (perhaps via weighting KRACH-predicted outcomes), and backtesting to validate.

More generally, though, I think you can start simpler by seeing how far off of empirical reality the model predictions have been. If you're 95% accurate, why bother? If you're way off, how much does the model improve by adjusting each individual assumption? Does adding the error variance into the simulation help? Or is the main issue the lack of KRACH updating? Or should you adjust KRACH weight by other factors (Corsi, PDO, whatever)?

If I had any idea how to do this correctly, it would already have been done.

Yeah, that's the hard part. There's a ton of literature on evaluating predictive models, but I haven't done anything even adjacent to this field for years so I wouldn't know what to recommend as an intro. When my work involved statistical modeling it wasn't in these domains anyway, so I don't have answers off the top of my head either. :(

Just to be clear, I like the models, and it's fun (for me, at least) to think of alternatives. If I stumble across a relevant approach for you I'll pass it along. Thanks for putting them up for us! :)

Jim Hyla

The hard part is that everyone complains about what is out there, including Adam's model, yet no one has an answer or is willing to help.

I think it's a lot of fun to look at these models, but I don't have a clue about what to do (easily) to improve them, so I enjoy what's there and keep my mouth shut about complaining they aren't good enough.

It seems we go through this every spring. At least every spring where it means something to us and the post-season.

Adam takes a lot of crap for no good reason; he's trying a lot harder than many others.

Now if he could only fix the app on my iPhone, so it wouldn't screw up so often, that would be nice.........:-D::bolt::
"Cornell Fans Made the Timbers Tremble", Boston Globe, March/1970
Cornell lawyers stopped the candy throwing. Jan/2005

BearLover

Quote from: Jim HylaThe hard part is that everyone complains about what is out there, including Adam's model, yet no one has an answer or is willing to help.

I think it's a lot of fun to look at these models, but I don't have a clue about what to do (easily) to improve them, so I enjoy what's there and keep my mouth shut about complaining they aren't good enough.

It seems we go through this every spring. At least every spring where it means something to us and the post-season.

Adam takes a lot of crap for no good reason; he's trying a lot harder than many others.

Now if he could only fix the app on my iPhone, so it wouldn't screw up so often, that would be nice.........:-D::bolt::
I have nothing against Adam and I love CHN. But that doesn't mean we should be quiet about predictions that are based on flawed assumptions. I also think it's better to have no prediction model at all than to have one that is based on flawed assumptions.

Coverage of the 2016 election would have been vastly improved had flawed models like HuffPost's not existed. America would have known that for almost the entirety of the race Hillary was only a slight favorite, that the Electoral College favored Trump, and that Comey's letter very likely cost Clinton the election. Instead, the media, in part because of models like HuffPost's and others', covered Hillary's victory as a foregone conclusion.

Obviously the stakes aren't as high here, but no one is helped by a model that wrongly portrays Cornell's odds against Union as 80%, or its odds of winning the ECAC as 60%.

Swampy

Quote from: abmarks
Quote from: SwampySome things I'd want to add to the discussion:

2. Exactly how does variance play out in these methods? If Team A plays Team B, does the P[Team A or Team B wins] = 1.0? Suppose Team A has P[winning] = 0.6, and Team B has 0.4, but Team A is erratic (I'm looking at you, Clarkson), while Team B is not. Does A's greater variance show up in the prediction?


Let's say we know that in the long run, A beats B 75% of the time.   So, over 100 games, A wins 75.

What the P(A winning) does NOT tell you is which of those 100 games A wins.  A could go 0-10, then 75-5, then 0-10 over the course of those 100.

Taking that back to the topic at hand, short-term results (i.e., the one-game result in a tournament) are going to vary a lot vs. the long-term percentage.

I understand this but was talking about variance in several other senses. I'll explain them here. WARNING: THE FOLLOWING IS QUITE WONKISH.


Assumptions

Assume two teams, Team C and Team H, belong to a 12-team league in which teams play each other twice during the season. So each team plays 22 league games. Also assume teams earn 0 points in the league standings for a loss, 1 for a tie, and 2 for a win.



Estimation Variance

For the moment, ignore ties. Any data-based estimate of a team's chances of winning a game can be thought of as a function. If pC is the probability Team C wins a game, then let p̂C be the estimate of that probability, so that:

(1) p̂C = f(data)

In other words, the estimated probability is a function of whatever data are used in the estimate. When we say "data," this includes the number of data points (sample size) used to make the estimate.

Now, if we know the mathematical properties of f(), we may be able to derive, mathematically, an expression for the variance of p̂C, var(p̂C). Call this the estimation variance, a measure of the estimate's precision.

If we do not know the estimating function's mathematical properties, we still may be able to estimate var(p̂C) using simulation and resampling techniques.
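For instance, here is a minimal bootstrap sketch in Python. The 18-7 record is invented, and the raw win fraction stands in for whatever f() really is:

[code]
# Bootstrap estimate of var(p̂C) when f()'s mathematical form is unknown.
import random

random.seed(1)
results = [1] * 18 + [0] * 7          # hypothetical 18-7 win/loss record

def f(data):
    # Stand-in for the real estimator; here, just the win fraction.
    return sum(data) / len(data)

boot = []
for _ in range(5000):
    resample = [random.choice(results) for _ in results]
    boot.append(f(resample))

mean = sum(boot) / len(boot)
var = sum((b - mean) ** 2 for b in boot) / (len(boot) - 1)
print(f"p̂C = {f(results):.3f}, bootstrap var(p̂C) = {var:.5f}")
[/code]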



Game and Game-Series Estimates


Think of a single game as an experiment with two possible outcomes: "success" and "failure." For simplicity, assume we actually know the real probability of each, so we don't have to use estimates like (1). To think about this, just consider Team C for now.

Let:

p = probability Team C wins
q = probability Team C loses = 1 - p

Furthermore, to convert the results into a number, define a random variable, X = 1 for a win and 0 for a loss. This is well known as a Bernoulli Trial, and X has a Bernoulli Distribution. The variance of X is given by:

(2) var(X) = pq

In the present context, call this "game variance" since it is the variance related to the outcome of a single game.


We can also think of a "series variance", which is the variance associated with a team winning a series. To simplify the math, let's disregard the fact that some series end after a team has won the majority of games in the series (e.g., 2 out of 3), and just think of the number of wins in a series. Define a second random variable, Yn, as the number of wins in a series of n games. If each game has the same win probability, then Yn is the sum of n X's. In other words, it has a binomial distribution, the variance of which equals:

(3) var(Yn) = np(1-p)

In both (2) and (3) the variance depends on the value of p. If p = 0, the variance is 0, and similarly for p = 1. The variance is at its maximum when p = 0.5: 0.25 for a single game, and n/4 for an n-game series.

It's important to note here that the variance depends on the underlying, real probabilities and is not a matter of estimation.
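Both formulas are easy to confirm by brute force; p = 0.75 and n = 20 below are arbitrary choices:

[code]
# Simulate X (one game) and Y_n (an n-game series) and compare the
# sample variances to pq and np(1-p).
import random

random.seed(7)
p, n = 0.75, 20
q = 1 - p

xs = [1 if random.random() < p else 0 for _ in range(100000)]
mx = sum(xs) / len(xs)
print("game variance:  ", sum((x - mx) ** 2 for x in xs) / len(xs), "vs pq =", p * q)

ys = [sum(1 if random.random() < p else 0 for _ in range(n)) for _ in range(20000)]
my = sum(ys) / len(ys)
print("series variance:", sum((y - my) ** 2 for y in ys) / len(ys), "vs np(1-p) =", n * p * q)
[/code]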


Comments on jfeath17's chart

1. The chart shows a relation between KRACH and actual game outcomes. Because of the properties of the Bernoulli and binomial distributions, the variance necessarily decreases as p moves away from 0.5 and closer to 1.0. So we would expect better predictions toward the right of the graph. But the graph is almost a straight line up to about p = 0.85 and then drops off slightly. Maybe this is due to a weakness in KRACH, which is not intended to predict outcomes. Or maybe that's why they play the game.

2. The chart would be improved with confidence bands, which are sensitive to variance and graphically show how confident one should be about the fitted line (see the sketch after this list). Notice, though, that confidence intervals plotted around a curve like this, which is based on empirical data themselves estimated from other data (as in Equation 1), have two sources of variance: estimation variance and game variance.
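For point 2, Wilson score intervals computed per KRACH bucket would give exactly such a band; the bucket counts below are invented for illustration:

[code]
# 95% Wilson score intervals per predicted-probability bucket -- the
# ingredients of a confidence band. Bucket data are made up.
from math import sqrt

Z = 1.96  # 95% confidence

def wilson(wins, n):
    phat = wins / n
    denom = 1 + Z * Z / n
    center = (phat + Z * Z / (2 * n)) / denom
    half = (Z / denom) * sqrt(phat * (1 - phat) / n + Z * Z / (4 * n * n))
    return center - half, center + half

# (KRACH-predicted probability, favorite's wins, games in bucket)
buckets = [(0.55, 60, 110), (0.65, 70, 105), (0.75, 68, 95), (0.85, 61, 80)]
for pred, wins, n in buckets:
    lo, hi = wilson(wins, n)
    print(f"predicted {pred:.2f}: observed {wins / n:.2f}, CI [{lo:.2f}, {hi:.2f}]")
[/code]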



Performance Variance

In addition to the above, we should consider the variance of a given team's performance. Some teams are reliable; others are erratic. This can be best explained with an example.

Suppose every one of the 10 "other" teams always scores exactly 3 goals in every game. Then if O is the number of goals one of these "other" teams scores, the expected number of goals is 3 (E[O] = 3), and the variance is zero (var[O] = 0).

Similarly, assume Team C always scores 4 goals when it plays. Then if C is a random variable equal to the number of goals Team C scores, E[C] = 4 and var[C] = 0.

We can see right away that over the season Team C will always win over the ten "other" teams, so just from playing them it will accumulate 40 points (10 teams, 2 games per team, 2 points per win).


But now consider Team H, which is more erratic. Let H be the number of goals it scores in any given game. Like Team C, let Team H's expected number of goals be 4: E[H] = 4. But unlike Team C, var[H] will not be zero.

Instead, suppose H has the following probability mass function: P[H = 2] = 0.10, P[H = 3] = 0.15, P[H = 4] = 0.50, P[H = 5] = 0.15, and P[H = 6] = 0.10. So here we can see different results when Team H plays its 20 games against the 10 "other" teams: the expected number of losses is 2, the expected number of ties is 3, and the expected number of wins is 15. So when Team H plays the other teams, the expected number of points is only 33, unlike Team C's 40!

What about when Team C and Team H play each other? Even though both have the same expected number of goals, the variance of Team H means it will be expected to lose to Team C 25% of the time, tie Team C 50% of the time, and beat Team C 25% of the time. In each of their 2 games against each other during the regular season, 2 points are at stake. So Team H can expect 0.5 points from a tie (1 point x 0.5 probability) and 0.5 points from a win (2 points x 0.25 probability), or 1 point in total. Similarly for Team C. With both teams playing each other twice during the season, each expects to get 2 points. This makes sense, because they're evenly matched.

But in terms of total points in the league, Team C expects to have 42 points at season's end, but Team H expects only 35 points. Which is how things should be, because Team H sucks.

Notice here that the only difference between the two teams is their respective variances, but it makes a big difference. If we look more closely at games against the 10 "other" teams, we are much more confident that Team C will beat them, whereas we expect Team H to lose to some of them. This is why performance variance is also important in thinking about which teams are likely to win particular games. Again, here there's no estimation issue. We know what the probabilities really are, yet variance affects the outcome.
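A few lines of Python reproduce all of the totals above straight from the pmf (nothing is estimated here; it's just the arithmetic):

[code]
# Expected points for Team H, computed directly from its goal pmf.
pmf_H = {2: 0.10, 3: 0.15, 4: 0.50, 5: 0.15, 6: 0.10}

# 20 games vs. the ten "other" teams, which always score 3:
p_win = sum(p for g, p in pmf_H.items() if g > 3)    # 0.75
p_tie = pmf_H[3]                                     # 0.15
pts_others = 20 * (2 * p_win + 1 * p_tie)
print("H vs others:", round(pts_others, 2))          # 33.0

# 2 games vs. Team C, which always scores 4:
p_win_c = sum(p for g, p in pmf_H.items() if g > 4)  # 0.25
p_tie_c = pmf_H[4]                                   # 0.50
pts_c = 2 * (2 * p_win_c + 1 * p_tie_c)
print("H vs C:", round(pts_c, 2))                    # 2.0

print("season totals: C =", 40 + 2, " H =", round(pts_others + pts_c, 2))
[/code]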



Technical Suggestion

jfeath17 asked for suggestions regarding the graphical analysis. For this kind of work I highly recommend the R Project's free, open-source statistical software, used in conjunction with the RStudio IDE. It would allow easy addition of things like confidence bands in the probability plots, weighting of recent time-series data, etc.

nshapiro

H sucks...sorry wrote this before I read your whole post
When Section D was the place to be

Trotsky

Quote from: nshapiroH sucks..
I believe you are referring to Team A.

Jeff Hopkins '82

Quote from: Trotsky
Quote from: nshapiroH sucks..
I believe you are referring to Team A.

The party of the first part, hereafter referred to as the "Party of the First Part"...

No, I don't like that part.

Which part?

The first part.