Here we go again, folks. First the short version:
Possible ECAC tournament seedings (the number in parentheses is the best
seed the team can get with no help):
Cornell 1-2 (1)
Union 1-3 (2)
Clarkson 2-4 (3)
Harvard 3-4 (4)
Dartmouth 5-9 (5)
Colgate 5-9 (6)
Yale 5-9 (7)
Princeton 5-9 (7)
Quinnipiac 5-9 (9)
Brown 10-11 (10)
Rensselaer 10-12 (11)
St. Lawrence 11-12 (12)
...and now the grotesquely long version:
Once again, it's time for the ECAC Playoff Permutations! Oddly enough,
this year the teams have slotted themselves nicely into three tiers; there
are four teams that can finish in first through fourth, five that can
finish from fifth through ninth, and three that can finish from tenth
through twelfth. If you love drama, the final ECAC weekend is a little
short of it this year, although it makes things a lot easier to figure out.
Going into the final weekend of league play, here's a breakdown of where
each team in the ECAC could finish. As always, I'm greatly indebted to
John Whelan's excellent playoff possibilities script at http://www.elynah.com/tbrw/2018/ecac.cgiframe.shtml
For each ECAC team, I've listed the following:
THIS WEEKEND: The team's weekend games, its last two of the season.
ON THEIR OWN: The highest the team could finish with no help from the
competition. Generally, this involves a weekend sweep.
BEST CASE: The highest the team could finish if everything goes right.
WORST CASE: The lowest the team could finish if everything goes wrong.
This generally involves getting swept while teams nearby in the
standings win.
TIEBREAKERS: How the team would fare if they finished the season tied with
some other team which is currently close (i.e. within 4 points) in the
standings. Note that there may be cases in which Team A "could win or
lose" the tiebreaker against Team B, depending on whether there are
more than just those two teams tied. For instance, Colgate wins the
head-to-head tiebreaker against Princeton with a 1-0-1 record; however,
in a four-way tie involving these two, Yale, and Dartmouth, Colgate
would actually be seeded lower than Princeton. If a listed tiebreaker
result depends on more than just those two teams being tied, it is
marked with an asterisk:
Colgate could win or lose* against Princeton
For two or more teams tied in the standings, the ECAC tiebreakers are:
1. Head-to-head record in ECAC games (non-conference meetings, such as in
tournaments, do not count).
2. League wins.
3. Record against the top four teams in the conference.
4. Record against the top eight teams in the conference.
5. Goal differential (net goals) head-to-head.
6. Goal differential against the top four teams in the conference.
7. Goal differential against the top eight teams in the conference.
Note that if the tie is among three or more teams, the tiebreaking steps are
used in order until a team, or multiple teams, is/are separated from the
"pack". Once that happens, the process starts all over to break the remaining
ties. For example, when the above steps are applied to a four-way tie, once
one team is separated out leaving a three-way tie, the procedure goes back to
the first step with the three remaining tied teams.
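For anyone who wants that procedure spelled out, here is a rough sketch in Python of the cascade as described above. It is purely illustrative, not the ECAC's actual implementation, and the step functions (head-to-head points among the tied group, league wins, and so on) are hypothetical placeholders you would have to supply:

def break_ties(tied_teams, tiebreaker_steps):
    # tiebreaker_steps: list of functions (team, tied_group) -> comparable key,
    # higher is better (e.g. head-to-head points within the group, league wins, ...)
    order = []
    group = list(tied_teams)
    while len(group) > 1:
        for step in tiebreaker_steps:
            keys = {t: step(t, group) for t in group}
            best = max(keys.values())
            leaders = [t for t in group if keys[t] == best]
            if len(leaders) < len(group):
                # Someone separated from the "pack": seed the leaders first
                # (recursively re-breaking any tie among them), then restart
                # the whole list of steps on whoever remains tied.
                if len(leaders) == 1:
                    order.append(leaders[0])
                else:
                    order.extend(break_ties(leaders, tiebreaker_steps))
                group = [t for t in group if t not in leaders]
                break
        else:
            # No step separates anyone; leave the remainder in their current order.
            order.extend(group)
            group = []
    order.extend(group)
    return order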
Without further ado, here's how the final weekend looks:
Cornell:
THIS WEEKEND: At Rensselaer, at Union.
ON THEIR OWN: Wraps up first place with one point on the weekend.
BEST CASE: First.
WORST CASE: Will finish second with two losses if Union also beats
Colgate.
TIEBREAKERS: Loses to Union.
Union:
THIS WEEKEND: Colgate, Cornell.
ON THEIR OWN: One point will clinch second place.
BEST CASE: Finishes first with a sweep if Cornell also loses to
Rensselaer.
WORST CASE: Drops to third if they lose twice and Clarkson sweeps.
TIEBREAKERS: Beats Cornell; loses to Clarkson.
Clarkson:
THIS WEEKEND: Princeton, Quinnipiac.
ON THEIR OWN: Clinches third with a pair of wins.
BEST CASE: Climbs to second with two wins if Union loses twice.
WORST CASE: Falls to fourth with two losses if Harvard gets at least
one point.
TIEBREAKERS: Beats Union; loses to Harvard.
Harvard:
THIS WEEKEND: At Brown, at Yale.
ON THEIR OWN: Has already wrapped up fourth place and can do no
better without help.
BEST CASE: Takes third with a sweep if Clarkson gets no more than three
points.
WORST CASE: Fourth.
TIEBREAKERS: Beats Clarkson and Dartmouth.
Dartmouth:
THIS WEEKEND: At Yale, at Brown.
ON THEIR OWN: Will guarantee fifth place with a sweep.
BEST CASE: Fifth.
WORST CASE: Would slide to ninth with two losses if Quinnipiac wins
twice and Colgate and Princeton each get at least two points.
TIEBREAKERS: Beats Quinnipiac; loses to Princeton; could win or lose
against Colgate and Yale.
Colgate:
THIS WEEKEND: At Union, at Rensselaer.
ON THEIR OWN: A sweep wraps up sixth place.
BEST CASE: Rises to fifth with two wins if Dartmouth does not sweep.
WORST CASE: Would finish ninth with two losses if Yale gets at least
one point, Princeton gets at least two points, and Quinnipiac gets at
least three points.
TIEBREAKERS: Loses to Yale; could win or lose against Dartmouth and
Quinnipiac; could win or lose* against Princeton.
Yale:
THIS WEEKEND: Dartmouth, Harvard.
ON THEIR OWN: Clinches seventh with a pair of wins.
BEST CASE: Climbs to fifth if they win twice, Dartmouth does not beat
Brown, and Colgate does not sweep.
WORST CASE: Falls to ninth if they lose twice, Princeton does not get
swept, and Quinnipiac gets at least two points.
TIEBREAKERS: Beats Colgate and Quinnipiac; could win or lose against
Dartmouth and Princeton.
Princeton:
THIS WEEKEND: At Clarkson, at St. Lawrence.
ON THEIR OWN: Guarantees seventh with two wins.
BEST CASE: Would finish fifth with a sweep if Colgate gets no more
than two points and the Dartmouth-Yale winner loses its other game
(or if they tie, Dartmouth does not win its other game).
WORST CASE: Slides to ninth if they lose twice and Quinnipiac gets
at least two points.
TIEBREAKERS: Beats Dartmouth; could win or lose against Yale and
Quinnipiac; could win* or lose against Colgate.
Quinnipiac:
THIS WEEKEND: At St. Lawrence, at Clarkson.
ON THEIR OWN: Has clinched ninth and can do no better without help.
BEST CASE: Takes fifth with a sweep if Dartmouth loses twice, Colgate
gets no more than one point, Princeton gets no more than two points,
and Yale loses to Harvard.
WORST CASE: Ninth.
TIEBREAKERS: Loses to Dartmouth and Yale; could win or lose against
Colgate and Princeton.
Brown:
THIS WEEKEND: Harvard, Dartmouth.
ON THEIR OWN: Guarantees tenth with a three-point weekend.
BEST CASE: Tenth.
WORST CASE: Drops to eleventh if they lose twice and Rensselaer
gets at least two points.
TIEBREAKERS: Beats Rensselaer and St. Lawrence.
Rensselaer:
THIS WEEKEND: Cornell, Colgate.
ON THEIR OWN: One point would give the Engineers eleventh place.
BEST CASE: Gets tenth with a sweep if Brown gets no more than two
points.
WORST CASE: Would finish twelfth if they lose twice and St. Lawrence
wins twice.
TIEBREAKERS: Beats St. Lawrence; loses to Brown.
St. Lawrence:
THIS WEEKEND: Quinnipiac, Princeton.
ON THEIR OWN: Can do no better than twelfth without help.
BEST CASE: Finishes eleventh with a sweep if Rensselaer loses twice.
WORST CASE: Twelfth.
TIEBREAKERS: Loses to Brown and Rensselaer.
Quote from: Give My Regards...and now the grotesquely long version:
Wow, impressed and thankful, but how long did it take you?
Not strictly ECAC related, but this seems to confirm we're in the NCAA's regardless of what happens the rest of the way:
https://twitter.com/chnews/status/965582957078556672
Quote from: scoop85Not strictly ECAC related, but this seems to confirm we're in the NCAA's regardless of what happens the rest of the way:
https://twitter.com/chnews/status/965582957078556672
Also shows 91% likelihood of being a #1 seed.
But this doesn't show what happens if we do lose the next 4, just that the likelihood of that is low. Even 2-4 probably keeps us high.
Quote from: upprdeckbut this doesn show what happens if we do lose the next 4. just the likelihood of that are low.. even 2-4 probably keeps us high.
This is true. As well, there's reason to believe that the CHN simulations don't really capture the low-probability outcomes as well as they should (which is to say it's all a little bit overconfident). Because KRACH ratings are assumed to be (1) constant, (2) unbiased, and (3) not-uncertain, there is no way to account for recent trends (like Clarkson tanking, for example), variations in luck, and inherent uncertainty/variability of a team's strength. Of course, these assumptions simplify this exercise to a point where it's easily understood.
Quote from: imafrshmn... there's reason to believe that the CHN simulations don't really capture the low-probability outcomes as well as they should (which is to say it's all a little bit overconfident). Because KRACH ratings are assumed to be (1) constant, (2) unbiased, and (3) not-uncertain, there is no way to account for recent trends (like Clarkson tanking, for example), variations in luck, and inherent uncertainty/variability of a team's strength. Of course, these assumptions simplify this exercise to a point where it's easily understood.
repeating: if anyone wants to help write a better algorithm, be my guest.
All of the above is true. (1) can be solved easily, but requires a lot of additional computing time - way too much to be worth it. (3) has a lot packed in there; some of it would be good to adjust for, other parts have more dubious value.
Wouldn't it require the ability to run the 20K simulations with the 1 known result?
Quote from: upprdeckwouldnt it require the ability to run the 20K simulations with the 1 know result?
You could do it where the KRACH gets re-computed after each 'day' in the simulation ... but that would require like 20 KRACH computations per simulation - multiplied by 20,000
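For the curious, here's a bare-bones sketch of what one KRACH-weighted pass over the remaining schedule looks like. This is my sketch of the general approach described in the CHN FAQ ("winner determined randomly, weighted by KRACH"), not CHN's actual code; the krach dict and schedule structure are placeholders, and the comment marks where the per-"day" refit being discussed would have to go:

import random

def simulate_game(team_a, team_b, krach):
    # KRACH is on an odds scale, so P(A beats B) = K_A / (K_A + K_B).
    p_a = krach[team_a] / (krach[team_a] + krach[team_b])
    return team_a if random.random() < p_a else team_b

def simulate_remaining_schedule(schedule, krach):
    # schedule: list of "days", each a list of (team_a, team_b) pairs.
    results = []
    for day in schedule:
        for team_a, team_b in day:
            results.append((team_a, team_b, simulate_game(team_a, team_b, krach)))
        # The "recompute after each day" idea would refit KRACH here on
        # real + simulated results -- roughly 20 extra KRACH solves per
        # simulated season, times 20,000 simulations.
    return results

Repeat that 20,000 times and tally the resulting standings and seedings to get the published percentages.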
you mean like the power of mining for crypto currency
if we have to finish #2, I'd like it to be behind St. Cloud or Mankato. Extra likelihood of us being in Allentown.
Quote from: adamwQuote from: imafrshmn... there's reason to believe that the CHN simulations don't really capture the low-probability outcomes as well as they should (which is to say it's all a little bit overconfident). Because KRACH ratings are assumed to be (1) constant, (2) unbiased, and (3) not-uncertain, there is no way to account for recent trends (like Clarkson tanking, for example), variations in luck, and inherent uncertainty/variability of a team's strength. Of course, these assumptions simplify this exercise to a point where it's easily understood.
repeating: anyone wants to help write a better algorithm, be my guest.
All of the above is true. 1. can be solved easily, but requires a lot of additional computing time - way too much to be worth it. 3. there's a lot packed there. Some would be good to adjust for, others have more dubious value
adamw, I wish I had the mathematical or computing background to help improve the algorithm. I can only say that the 59% chance the model gives us of winning the ECAC tournament has to be wrong. 40% would already be pushing it. Someone want to go back and check how often a team to which this model gave a "60% chance" of winning an 8-team tournament didn't end up winning it?
Quote from: BearLoverQuote from: adamwQuote from: imafrshmn... there's reason to believe that the CHN simulations don't really capture the low-probability outcomes as well as they should (which is to say it's all a little bit overconfident). Because KRACH ratings are assumed to be (1) constant, (2) unbiased, and (3) not-uncertain, there is no way to account for recent trends (like Clarkson tanking, for example), variations in luck, and inherent uncertainty/variability of a team's strength. Of course, these assumptions simplify this exercise to a point where it's easily understood.
repeating: anyone wants to help write a better algorithm, be my guest.
All of the above is true. 1. can be solved easily, but requires a lot of additional computing time - way too much to be worth it. 3. there's a lot packed there. Some would be good to adjust for, others have more dubious value
adamw, I wish I had the mathematical or computing background to help improve the algorithm. I can only say that the 59% chance the model gives us of winning the ECAC tournament has to be wrong. 40% would already be pushing it. Someone want to go back and check how often a team whom this model gave a "60% chance" of winning an 8-team tournament didn't end up winning it?
The frigging model is 100% correct, BearLover. TLDR version: Bearlover doesn't understand instructions.
Bearlover, If you'd read the explanation of the model, it clearly states that:
Quote from: CHN says:These are the results of 20,000 Monte Carlo simulations of the remaining games prior to Selection Day. The winner of each game in the simulation was determined randomly, weighted by KRACH.
The simple translation of this is that, based on the relative value of the KRACH ratings, IN THE LONG RUN (when variance is removed) we are expected to win 59% of the time. I.e., we'd be expected to win 11,800 of the 20,000 times. So, as defined, the model is accurate.
This model can't possibly be used to tell you the likelihood of winning the tournament when the tournament is only played out once, not 20,000 times. Because, variance.
Please stop shitting on the model as wrong. It's not. It is what it is.
I think it's easy to believe that we are worse than our record, given that we've slowed down a little bit as of late, we're beaten up, and that all year, we seem to win a lot of close games.
But KRACH goes by record, not recent record or margin of victory. And our record is really, really, REALLY good.
I'd expect the 2nd best team in the country to win its tournament often, especially given that the next best teams are 8th and 21st. And KRACH says we're #2.
Are we actually #2? I have no idea. I haven't been watching other teams all that often. I have to figure that the extreme difference between conferences is at least a little bit flukey, and that the ECAC isn't as far behind the others as the results have said. But that really works in our favor, not against.
We've given up 38 goals in 27 games. That's absurd. Maybe we aren't as good as our performance, but our performance has been fantastic.
It is basic human psychology to look at a mathematical model that produces a counter-intuitive event probability and think "that model must be wrong."
It could be wrong, of course, but the "mustness" of the feeling inverts reality. When a well-developed algorithm conflicts with your assessment of likelihood it's most likely indicating that your brain is wrong. There are a host of cognitive biases that make our seat-of-the-pants judgments prone to error.
Quote from: abmarksQuote from: BearLoverQuote from: adamwQuote from: imafrshmn... there's reason to believe that the CHN simulations don't really capture the low-probability outcomes as well as they should (which is to say it's all a little bit overconfident). Because KRACH ratings are assumed to be (1) constant, (2) unbiased, and (3) not-uncertain, there is no way to account for recent trends (like Clarkson tanking, for example), variations in luck, and inherent uncertainty/variability of a team's strength. Of course, these assumptions simplify this exercise to a point where it's easily understood.
repeating: anyone wants to help write a better algorithm, be my guest.
All of the above is true. 1. can be solved easily, but requires a lot of additional computing time - way too much to be worth it. 3. there's a lot packed there. Some would be good to adjust for, others have more dubious value
adamw, I wish I had the mathematical or computing background to help improve the algorithm. I can only say that the 59% chance the model gives us of winning the ECAC tournament has to be wrong. 40% would already be pushing it. Someone want to go back and check how often a team whom this model gave a "60% chance" of winning an 8-team tournament didn't end up winning it?
The frigging model is 100% correct, BearLover. TLDR version: Bearlover doesn't understand instructions.
Bearlover, If you'd read the explanation of the model, it clearly states that:
Quote from: CHN says:These are the results of 20,000 Monte Carlo simulations of the remaining games prior to Selection Day. The winner of each game in the simulation was determined randomly, weighted by KRACH.
The simple translation of this is that, based on the relative value of the KRACH ratings, IN THE LONG RUN (when variance is removed) We are expected to win 59% of the time. I.e. we'd be expected to win 11,800 of the 20,000 times. So, as defined, the model is accurate.
This model can't possibly be used to tell you the likelihood of wining the tournament when the tournament is only played out once, not 20,000 times. Because, variance.
Please stop shitting on the model as wrong. It's not. It is what it is.
Hey buddy, I understand what variance is and that no model can predict what will happen in any single instance. I am saying that over 20,000 ECAC tournaments, we would win it fewer than 11,800 times. Sorry your attempt at being condescending missed the entire premise of my post.
You say, "as defined, the model is accurate." That's some impressively circular logic! If this predictor is, as you say, "based on the relative value of the KRACH ratings," my issue is with KRACH as a predictor of hockey games, not with the model's application of KRACH (I thought that was obvious from my initial post, but guess not). Specifically, my issue is that KRACH fails to account for the variance over a small sample size of games leading up to this point in the season. There is a very low degree of certainty that Cornell is actually the second-best team in the country. We could realistically be anywhere from 1-25. Does KRACH--and, by extension, the model--account for that?
Quote from: TrotskyIt is basic human psychology to look at a mathematical model that produces a counter-intuitive event probability and think "that model must be wrong."
It could be wrong, of course, but the "mustness" of the feeling inverts reality. When a well-developed algorithm conflicts with your assessment of likelihood it's indicating that your brain is wrong. There are a host of cognitive biases that make our seat-of-the-pants judgments prone to error.
Yeah, we know, we're past this point.
Quote from: BearLoverQuote from: TrotskyIt is basic human psychology to look at a mathematical model that produces a counter-intuitive event probability and think "that model must be wrong."
It could be wrong, of course, but the "mustness" of the feeling inverts reality. When a well-developed algorithm conflicts with your assessment of likelihood it's indicating that your brain is wrong. There are a host of cognitive biases that make our seat-of-the-pants judgments prone to error.
Yeah, we know, we're past this point.
Evidently not.
Quote from: TrotskyQuote from: BearLoverQuote from: TrotskyIt is basic human psychology to look at a mathematical model that produces a counter-intuitive event probability and think "that model must be wrong."
It could be wrong, of course, but the "mustness" of the feeling inverts reality. When a well-developed algorithm conflicts with your assessment of likelihood it's indicating that your brain is wrong. There are a host of cognitive biases that make our seat-of-the-pants judgments prone to error.
Yeah, we know, we're past this point.
Evidently not.
The only person (erroneously) arguing this point is abmarks, who misinterpreted my argument. This model doesn't just look wrong--it looks heinously wrong. We can't know for sure until someone checks it against empirical data (past tournaments, as I suggested above). AFAIK, KRACH isn't meant to be predictive. In fact, from what I understand, it fails very badly at being predictive. No one is saying "this model is wrong because in my experience [this number] cannot be true!" People are saying, "this model looks very wrong, [this improperly weighted input] is probably why, someone please check it against empirical data to be sure."
I started doing some analysis on how accurate KRACH is at predicting results. I will follow up in the next post with some more details on what I did, but the basic gist was I collected the KRACH-based projected winning percentage for the better-ranked team and the result of the game (win/tie/loss). I then fit a logistic regression to this data using the KRACH prediction as the independent variable. I used the results of 1129 games over the two previous seasons.
KRACH | Result
------+-------
0.50 | 0.5390
0.55 | 0.5750
0.60 | 0.6102
0.65 | 0.6444
0.70 | 0.6771
0.75 | 0.7081
0.80 | 0.7374
0.85 | 0.7647
0.90 | 0.7899
0.95 | 0.8131
1.00 | 0.8343
Some more details:
I used data from the past two complete seasons. I hope to add more seasons, but the dates for games on USCHO from 3 years ago seem to be in a mix of d/m/y and m/d/y which kinda breaks things. (On that note, if anyone knows of an easily parsable database of game results that would be great, since I'm currently copying the table from USCHO into Excel and exporting it as a csv.)
I step through the schedule week by week and update the KRACH rating for every team. Then, I calculate the KRACH-based projection of winning percentage for the upcoming week's games. I always use the higher-ranked team's winning likelihood so they are all within the 0.5-1 range. I then save the result of the game (tie, higher-ranked team won, higher-ranked team lost) along with the KRACH-based likelihood. I start this process at the beginning of January to ignore the early season variability of KRACH (also to avoid any of the complexities of calculating KRACH on undefeated teams). Now that I am typing this up I realize that stepping through on a weekly basis isn't really necessary and I am thinking about changing it to a day by day step.
Now I am left with two variables, KRACH Prediction Win Likelihood and the actual result of the game. I tried a couple different things here to try to find a good correlation between the two. Based on my research, I think the best way is to use a Logistic Regression and those are the results shown in the above post. I don't consider myself an expert in this stuff at all so I very well could be making some bad assumptions here. If anyone has a better method to compare them, I'm interested to hear.
If anyone has any questions or suggestions for further things to try out, I'd love to hear them.
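In case it helps anyone reproduce this, here is a minimal sketch of the kind of fit described above. The file name and column names (krach_prob = KRACH-predicted win probability for the higher-ranked team, result = 1 if that team won, 0 if it lost) are my assumptions, and ties are simply dropped here, which is only one of several reasonable choices:

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

games = pd.read_csv("games.csv")                # hypothetical export of the USCHO data
games = games[games["result"].isin([0, 1])]     # drop ties for this simple version

X = games[["krach_prob"]].to_numpy()            # KRACH-predicted prob for the favorite
y = games["result"].to_numpy()                  # 1 = favorite won, 0 = favorite lost

model = LogisticRegression()
model.fit(X, y)

# Fitted "actual" win rate at a grid of KRACH-predicted probabilities,
# analogous to the table a few posts up.
grid = np.arange(0.50, 1.001, 0.05).reshape(-1, 1)
for k, f in zip(grid.ravel(), model.predict_proba(grid)[:, 1]):
    print(f"{k:.2f} | {f:.4f}")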
Quote from: BearLoverQuote from: TrotskyQuote from: BearLoverQuote from: TrotskyIt is basic human psychology to look at a mathematical model that produces a counter-intuitive event probability and think "that model must be wrong."
It could be wrong, of course, but the "mustness" of the feeling inverts reality. When a well-developed algorithm conflicts with your assessment of likelihood it's indicating that your brain is wrong. There are a host of cognitive biases that make our seat-of-the-pants judgments prone to error.
Yeah, we know, we're past this point.
Evidently not.
The only person (erroneously) arguing this point is abmarks, who misinterpreted my argument. This model doesn't just look wrong--it looks heinously wrong. We can't know for sure until someone checks it against empirical data (past tournaments, as I suggested above). AFAIK, KRACH isn't meant to be predictive. In fact, from what I understand, it fails very badly at being predictive. No one is saying "this model is wrong because in my experience [this number] cannot be true!" People are saying, "this model looks very wrong, [this improperly weighted input] is probably why, someone please check it against empirical data to be sure."
There will never be enough tourney results to create a dataset big enough to generate an empirical conclusion of any precision, once again, because variance.
If you weren't so lazy, you might have read the FAQ page at CHN, which was answered by JTW (of this forum) himself. https://www.collegehockeynews.com/info/?d=krach . Here is part of it, with key info
bolded and italicised.
Q. Can you tell us a little more?
A: Getting a bit more technical: The Bradley-Terry system is based on a statistical technique called logistic regression, in essence meaning that teams' ratings are determined directly from their won-loss records against one another. KRACH's strength of schedule component is calculated directly from the ratings themselves, which is a key point.
It means that KRACH, unlike many ratings (including RPI), cannot easily be distorted by teams with strong records against weak opposition.
The ratings are on an odds scale, so if Team A's KRACH rating is three times as large as Team B's, Team A would be expected to amass a winning percentage of .750 and Team B a winning percentage of .250 if they played each other enough times. The correct ratings are defined such that the "expected" winning percentage for a team in the games it's already played is equal to its "actual" winning percentage.
Q. And so why is this so great?
A: In other words, if you took one team's schedule to date, and played a theoretical "game" for each game already actually played, using the KRACH ratings themselves in order to predict the winner, then the end result would be a theoretical won-loss percentage that matches the team's actual won-loss percentage. Pretty cool.
It is not possible to do any better than that with a completely objective method. Any other method would introduce arbitrariness and/or subjectivity.
Q. What are the limitations?
A: Well, KRACH can't predict the future. Nothing can. The idea behind such ratings systems is to use them in order to properly select and seed tournaments. Champions are then determined on the ice. All systems are designed to analyze past results, not necessarily predict future ones. Though, by theory, the more sound the analysis of the past, the better the ability to predict future results.
KRACH is "perfect" in its analysis of past results. But that should not be construed to mean that it definitively decides which team is better. When dealing with sample sizes like this, you never know. Team A could lose to Team B, be below them in KRACH, and then turn around and beat Team B the next three times. KRACH would then change. It does not invalidate what KRACH represented at the time, however.
Quote from: abmarksIf you weren't so lazy, you might have read the FAQ page at CHN, which was answered by JTW (of this forum) himself. https://www.collegehockeynews.com/info/?d=krach . Here is part of it, with key info bolded and italicised.
You'll be happy to know I read that entire primer before my initial posts. It doesn't answer any of my questions or help your case in any way.
Quote from: abmarksThere will never be enough tourney results to create a dataset big enough to generate an empirical conclusion of any precision, once again, because variance.
There exists data from hundreds of tournaments and thousands of games that we can compare against KRACH-based predictions.
You're also being obtuse in your emboldening/italicizing of clauses from the KRACH FAQ (thanks, by the way!). Yeah, KRACH isn't meant to be predictive. And yeah, nothing can better objectively measure past results. No one cares about those things. The question at hand is whether KRACH happens to be predictive to a significant enough degree that it's worth using in models that predict outcomes of hockey games. That was always the question, not whether KRACH is a nice way of seeding for tournaments or whether this predictor misapplied KRACH. Since there is absolutely nothing, here in this thread, or included in the KRACH FAQ, or gleaned from comparing this model against other sports/hockey prediction models, to suggest that KRACH is even a remotely good predictor of future hockey game outcomes, I'm going to assume KRACH is not a good predictor of future hockey game outcomes, and that therefore this model isn't good. Happy to be proven otherwise (in hopefully a more polite manner).
Why are you in such a foul mood, anyway?
Quote from: jfeath17Some more details:
I used data from the past two complete seasons. I hope to add more seasons, but the dates for games on USCHO from 3 years ago seem to be in a mix of d/m/y and m/d/y which kinda breaks things. (On that note, if anyone knows of a easily parsable database of game results that would be great since I'm currently copying the table from USCHO into excel and exporting it as a csv.)
I step through the schedule week by week and update the KRACH rating for every team. Then , I calculate the KRACH based projection of winning percentage for the upcoming week's games. I always use the higher ranked teams winning likelihood so they are all within the 0.5-1 range. I then save the result of the game (tie, higher ranked won, higher ranked loss) along with the KRACH based likelihood. I start this process at the beginning of January to ignore the early season variability of KRACH (also to avoid any of the complexities of calculating KRACH on undefeated teams). Now that I am typing this up I realize that stepping through on a weekly basis isn't really necessary and I am thinking about changing it to a day by day step.
Now I am left with two variables, KRACH Prediction Win Likelihood and the actual result of the game. I tried a couple different things here to try to find a good correlation between the two. Based on my research, I think the best way is to use a Logistic Regression and those are the results shown in the above post. I don't consider myself an expert in this stuff at all so I very well could be making some bad assumptions here. If anyone has a better method to compare them, I'm interested to hear.
If anyone has any questions or suggestions for further things to try out, I'd love to hear them.
Thank you for tackling the question with data. This may be too simplistic, but how about sorting the games into 10 bins based on the predicted probability of the favored team winning: 0.50<=x<0.55, 0.55<=x<0.60, etc. For each bin, calculate the fraction of the games that the favored team won. Graph the actual fraction vs the calculated fraction. You could multiply the number of games in each bin by the predicted fraction for the favored team to get the predicted number, then take the square root to get some notion of the size of the predicted error.
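A rough sketch of that binning check, under the same assumptions as the regression sketch above (numpy arrays of the favorite's KRACH-predicted probability and a 0/1 result, ties excluded); the last column is a binomial-style error bar, one way to get the rough "size of the predicted error" mentioned here:

import numpy as np

def binned_calibration(krach_prob, result, n_bins=10):
    # krach_prob: favorite's KRACH-predicted win probability, in [0.5, 1.0]
    # result: 1 if the favorite won, 0 if it lost
    edges = np.linspace(0.5, 1.0, n_bins + 1)
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (krach_prob >= lo) & (krach_prob < hi)
        n = int(in_bin.sum())
        if n == 0:
            continue
        predicted = krach_prob[in_bin].mean()
        actual = result[in_bin].mean()
        stderr = np.sqrt(predicted * (1 - predicted) / n)  # expected scatter in the bin
        print(f"[{lo:.2f}, {hi:.2f})  n={n:3d}  predicted={predicted:.3f}  "
              f"actual={actual:.3f}  +/- {stderr:.3f}")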
Isn't it wonderful that we're arguing so vehemently over whether the statistical models are accurate when they say we're fantastic because our record is 22-3-2?
I definitely don't recall having these arguments in 1993. "No, we must be worse than our 6-19-1 record would suggest!!"
Is there a way to factor in the odds of a tie? I know it is incorporated into B-T ratings... but my guess is that hockey's non-binary results are a significant source of forecast error.
Quote from: David HardingQuote from: jfeath17Some more details:
I used data from the past two complete seasons. I hope to add more seasons, but the dates for games on USCHO from 3 years ago seem to be in a mix of d/m/y and m/d/y which kinda breaks things. (On that note, if anyone knows of a easily parsable database of game results that would be great since I'm currently copying the table from USCHO into excel and exporting it as a csv.)
I step through the schedule week by week and update the KRACH rating for every team. Then , I calculate the KRACH based projection of winning percentage for the upcoming week's games. I always use the higher ranked teams winning likelihood so they are all within the 0.5-1 range. I then save the result of the game (tie, higher ranked won, higher ranked loss) along with the KRACH based likelihood. I start this process at the beginning of January to ignore the early season variability of KRACH (also to avoid any of the complexities of calculating KRACH on undefeated teams). Now that I am typing this up I realize that stepping through on a weekly basis isn't really necessary and I am thinking about changing it to a day by day step.
Now I am left with two variables, KRACH Prediction Win Likelihood and the actual result of the game. I tried a couple different things here to try to find a good correlation between the two. Based on my research, I think the best way is to use a Logistic Regression and those are the results shown in the above post. I don't consider myself an expert in this stuff at all so I very well could be making some bad assumptions here. If anyone has a better method to compare them, I'm interested to hear.
If anyone has any questions or suggestions for further things to try out, I'd love to hear them.
Thank you for tackling the question with data. This may be too simplistic, but how about sorting the games into 10 bins based on the predicted probability of the the favored team winning. 0.50<=x<0.55, 0.55<=x<0.60, etc. For each bin, calculate the fraction of the games that the favored team won. Graph the actual fraction vs the calculated fraction. You could multiply the number of games in each bin by the predicted fraction for the favored team to get the predicted number, then take the square root to get some notion of the size of the predicted error.
Some things I'd want to add to the discussion:
1. We predict the future all the time, usually with good results. Do you want to bet the sun won't come up tomorrow? That if I take an umbrella in the rain I'll get less wet than without one? Etc.
2. Exactly how does variance play out in these methods? If Team A plays Team B, does P[Team A or Team B wins] = 1.0? Suppose Team A has P[winning] = 0.6, and Team B has 0.4, but Team A is erratic (I'm looking at you, Clarkson), while Team B is not. Does A's greater variance show up in the prediction?
3. Looking at Clarkson again, these things are time series. So recent performance should be weighted more heavily. Is it?
4. What about using other information besides past performance? Donato away at the Olympics? Cornell's top D-men and third-leading scorer are injured? Surely a pro from Las Vegas would use such information to handicap a team.
Quote from: BearLoverQuote from: TrotskyQuote from: BearLoverQuote from: TrotskyIt is basic human psychology to look at a mathematical model that produces a counter-intuitive event probability and think "that model must be wrong."
It could be wrong, of course, but the "mustness" of the feeling inverts reality. When a well-developed algorithm conflicts with your assessment of likelihood it's indicating that your brain is wrong. There are a host of cognitive biases that make our seat-of-the-pants judgments prone to error.
Yeah, we know, we're past this point.
Evidently not.
The only person (erroneously) arguing this point is abmarks, who misinterpreted my argument. This model doesn't just look wrong--it looks heinously wrong. We can't know for sure until someone checks it against empirical data (past tournaments, as I suggested above). AFAIK, KRACH isn't meant to be predictive. In fact, from what I understand, it fails very badly at being predictive. No one is saying "this model is wrong because in my experience [this number] cannot be true!" People are saying, "this model looks very wrong, [this improperly weighted input] is probably why, someone please check it against empirical data to be sure."
BearLover's complaints about the predictive model are valid, at least in terms of modeling decisions. The model in CHN (and playoffstatus, which is not meaningfully different) is based on a number of fairly strong assumptions, some of which are hidden, and that bears examining.
If it's meant to be an exercise in generating a valid distribution of outcomes without regard to predictive accuracy of empirical results, perhaps with the aim of starting conversation and giving fans something to gas about on weekdays, there's absolutely nothing wrong with it. If it's meant to be an effective predictor of tournament qualification and end of season results, it is questionable. I don't have the empirical data to know if it's *right* and I don't feel like doing that analysis, but the odds don't pass the sniff test. At this point, the predictions are so confident at the top end of the distribution that someone really would need to produce empirical outcomes showing the model's effectiveness before I'd believe those numbers. This doesn't mean I don't understand probability. It means I do understand modeling choices.
Failing to update KRACH along the way is a modeling choice everybody has talked about - that's actually a bad one because you're basically starting with a strong prior and failing to update it in any way. I understand the computational cost issues in play here but there's got to be a way to do that efficiently. If not, you could use something that has a lot of KRACH's desirable properties without the computational complexity (maybe Elo is better for this?) and then see how the models compare. That'll at least get you some sense of how much impact this decision has on your distributions.
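To be concrete about what that alternative would look like, here is a toy Elo-style update. Again, this is my sketch, not anything CHN runs, and the 400-point scale and K-factor of 20 are conventional chess-style guesses rather than values tuned for college hockey:

import random

def elo_expected(r_a, r_b):
    # Standard Elo expectation on a 400-point logistic scale.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def simulate_and_update(ratings, team_a, team_b, k=20.0):
    # Draw a winner from the current ratings, then nudge both ratings --
    # cheap enough to do after every simulated game, unlike a full KRACH refit.
    p_a = elo_expected(ratings[team_a], ratings[team_b])
    a_wins = random.random() < p_a
    ratings[team_a] += k * ((1.0 if a_wins else 0.0) - p_a)
    ratings[team_b] += k * ((0.0 if a_wins else 1.0) - (1.0 - p_a))
    return team_a if a_wins else team_b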
Personally, I suspect the bigger problem with the predictive model CHN is using is that it treats KRACH as 100% accurate - all of the variance between KRACH's predictions and actual empirical results is missing from the model itself. Excluding that input makes the model over-confident with respect to empirical predictors, and I think that's something you could address. At the very least you can externally validate the underlying assumption. There's plenty of data at this point - you can actually just do single point KRACH predictions and compare that with the distribution of empirical outcomes. There's no need to restrict to tournament games. If there's a lot of divergence (and early results in this thread suggest that this is, in fact, the case) the KRACH-based Monte Carlo predictors will do a pretty bad job of predicting empirical reality.
One way to think about this point is to compare it to models of the presidential election (NO POLITICS - this is about modeling decisions). In 2016 a lot of models had these hugely, almost impossibly confident predictions of a Clinton victory. Several reputable polling-based predictive models had less than a 5% chance of Trump winning, while 538 had a 10% chance of Trump winning while losing the popular vote. If you look at the recaps one key reason was because those models took a fairly naive approach to modeling empirical error in polling predictors. Specifically, they failed to account for correlated polling errors across states with similar demographic characteristics. 538 took some (IMHO valid) criticism that their adjustments were being too strongly applied or that they weren't accounting for the error terms in those estimates, but in an empirical model I think that's a better class of mistake to make than just saying "this thing that happens every election doesn't happen in my model because I said so."
Back to hockey, one could imagine the same thing happening with KRACH. If KRACH systematically over-states the odds that highly rated Team A will beat lower ranked Team B, you'll get over-confident predictions for any team with a sufficiently strong record relative to its competition. Assuming jfeath17's data is correct, that is precisely what appears to be happening. This is not a flaw in KRACH, necessarily, because KRACH is meant to provide a ranked set rather than absolute determination of odds of victory. However, when using KRACH as a forward looking predictor you really do need to adjust for that variance if your model is to be empirically accurate.
This doesn't mean the pure KRACH model is useless - it's interesting, and it gives some baseline for discussion and adaptation, and it lets us talk about hockey (and math) on a Tuesday.
OK Bearlover, what you are really arguing then is that KRACH itself is worthless for any use, and you are making a completely specious argument based on your intuition, not any actual examination of data or the methods used in the CHN model about likelihood of winning the Conference tournament.
So I'll make this objective with data. I'm curious (genuinely, not snidely) whether you agree with what KRACH says for comparisons against particular opponents? I ask, because if the individual comparisons are correct, then that 59% number is correct, since it's just math at that point.
We'll use standings as of today and assume all seeds hold so as to simplify this model. We'd have the following matchups:
Quarters: (For simplicity, let's call this one a single game, not best of three.)
We play #8 Yale. Yale's KRACH is 98.3. Our KRACH is 512.9.
-This means that we are beating Yale 84% of the time. (Not sure how to do the math, but this should also imply that our chances of winning a best of three are even higher?)
Semis
We play #4 Harvard. HVD KRACH is 136.9. Our KRACH is 512.9.
-This means that we are beating HVD 79% of the time.
Championship
We play #2 Union. Union's KRACH is 142.5. Our KRACH is 512.9.
-This means that we are beating Union 78% of the time.
===> Odds of winning tournament are 52%. (84% x 79% x 78%)
This is essentially what the model did that got 59%. (The difference is due to possibilities of upsets and the fact that they ran a Monte Carlo simulation, not a single calculation.)
Bearlover, the question is- do you disagree with the individual KRACH comparisons?
-A
p.s. someone correct me if I got the KRACH math wrong.
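The arithmetic checks out. Here is the same calculation spelled out, using the odds-scale formula from the KRACH FAQ, plus the best-of-three wrinkle from the quarterfinal aside - it does push the number a bit higher:

krach = {"Cornell": 512.9, "Yale": 98.3, "Harvard": 136.9, "Union": 142.5}

def p_win(a, b):
    # Odds scale: P(A beats B) = K_A / (K_A + K_B)
    return krach[a] / (krach[a] + krach[b])

def p_best_of_three(p):
    # Win in two, or split the first two (either order) and win the third.
    return p * p + 2 * p * (1 - p) * p

p_qf = p_win("Cornell", "Yale")      # ~0.84
p_sf = p_win("Cornell", "Harvard")   # ~0.79
p_f  = p_win("Cornell", "Union")     # ~0.78

print(p_qf * p_sf * p_f)                    # ~0.52 with a single-game quarterfinal
print(p_best_of_three(p_qf) * p_sf * p_f)   # ~0.57 treating the quarterfinal as best-of-three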
Quote from: Tom LentoBack to hockey, one could imagine the same thing happening with KRACH. If KRACH systematically over-states the odds that highly rated Team A will beat lower ranked Team B, you'll get over-confident predictions for any team with a sufficiently strong record relative to its competition. Assuming jfeath17's data is correct, that is precisely what appears to be happening. This is not a flaw in KRACH, necessarily, because KRACH is meant to provide a ranked set rather than absolute determination of odds of victory. However, when using KRACH as a forward looking predictor you really do need to adjust for that variance if your model is to be empirically accurate.
This is where I'll again repeat that - if you have a better model - please feel free to share.
Quote from: Tom LentoOne way to think about this point is to compare it to models of the presidential election (NO POLITICS - this is about modeling decisions). In 2016 a lot of models had these hugely, almost impossibly confident predictions of a Clinton victory. Several reputable polling-based predictive models had less than a 5% chance of Trump winning, while 538 had a 10% chance of Trump winning while losing the popular vote. If you look at the recaps one key reason was because those models took a fairly naive approach to modeling empirical error in polling predictors. Specifically, they failed to account for correlated polling errors across states with similar demographic characteristics. 538 took some (IMHO valid) criticism that their adjustments were being too strongly applied or that they weren't accounting for the error terms in those estimates, but in an empirical model I think that's a better class of mistake to make than just saying "this thing that happens every election doesn't happen in my model because I said so."
538's last polls-only forecast was 29% Trump. (https://projects.fivethirtyeight.com/election-night-forecast-2016/) And they admit that their problem was not enough late state polling. So they couldn't have seen the state results that gave him the Electoral College.
So if you don't have the data, you can't get accuracy.
"Getting back to hockey", you have the same problem. Does anyone really think you can input the data on injuries, players in the Olympics, etc.?
This whole discussion is "worthless" unless someone can put up a better way.
We all can come up with problems with the "science", but unless someone is willing to put their money where their mouth is(or fingers are), we can carry on the discussion forever without anything changing.
Finally, it's interesting all the discussions that happen once we start winning again.
For that I'm happy.
Quote from: adamwQuote from: Tom LentoBack to hockey, one could imagine the same thing happening with KRACH. If KRACH systematically over-states the odds that highly rated Team A will beat lower ranked Team B, you'll get over-confident predictions for any team with a sufficiently strong record relative to its competition. Assuming jfeath17's data is correct, that is precisely what appears to be happening. This is not a flaw in KRACH, necessarily, because KRACH is meant to provide a ranked set rather than absolute determination of odds of victory. However, when using KRACH as a forward looking predictor you really do need to adjust for that variance if your model is to be empirically accurate.
This is where I'll again repeat that - if you have a better model - please feel free to share.
I don't have a better model handy because making a better model requires a lot of effort and I'm not currently unemployed (or employed in a place where I get paid to do this kind of thing). If I ever take a few months off work trying to build something like this would be super fun, although as a follower of the game trying to make the advanced stats more useful for me is probably what I'd do first.
That said, you can account for the variance against empirical reality by measuring it, adding uncertainty to the model (perhaps via weighting KRACH-predicted outcomes), and backtesting to validate.
More generally, though, I think you can start simpler by seeing how far off of empirical reality the model predictions have been. If you're 95% accurate, why bother? If you're way off, how much does the model improve by adjusting each individual assumption? Does adding the error variance into the simulation help? Or is the main issue the lack of KRACH updating? Or should you adjust KRACH weight by other factors (corsi, PDO, whatever)?
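As one concrete (and admittedly crude) version of "weighting KRACH-predicted outcomes": run each head-to-head probability through an empirical calibration curve before drawing a winner. The linear mapping below is just a placeholder standing in for whatever fit you actually trust (for example, the logistic regression on jfeath17's data earlier in the thread):

import random

def calibrate(p_fav):
    # Placeholder: linearly deflate the favorite's raw KRACH probability toward
    # the observed win rates (roughly 0.54 at 0.50 and 0.83 at 1.00 in the data
    # above); a real version would interpolate a fitted calibration curve.
    return 0.539 + (p_fav - 0.5) * (0.834 - 0.539) / 0.5

def simulate_game_calibrated(team_a, team_b, krach):
    p_raw = krach[team_a] / (krach[team_a] + krach[team_b])
    fav, dog = (team_a, team_b) if p_raw >= 0.5 else (team_b, team_a)
    p_fav = calibrate(max(p_raw, 1 - p_raw))
    return fav if random.random() < p_fav else dog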
Quote from: abmarksOK Bearlover, what you are really arguing then is that KRACH itself is worthless for any use, and you are making a completely specious argument based on your intuition, not any actual examination of data or the methods used in the CHN model about likelihood of winning the Conference tournament.
Huh? KRACH is the best tool we have for ranking/seeding teams. It's just a poor predictive tool. Tom Lento explained this better than I could have, so please refer to his post. To say KRACH is not a good predictive tool because it does not account for the (significant) natural variance leading up to a specified point in a hockey season (which it shouldn't/can't, because it's not meant to be predictive) is not "specious" and is based on an "examination of the methods used in the CHN model." I'm not sure at this point what is confusing about what I am saying, to be totally honest. So, no, those underlying predictive numbers you cited are not correct, because they're taking as certain the outputs of a model that ranks teams in a very random sport based on 25-ish very random events. We are not 80% to beat Harvard and Union.
Quote from: SwampySome things I'd want to add to the discussion:
2. Exactly how does variance play out in these methods. If Team A plays Team B, does the P[Team A or Team B wins] = 1.0? Suppose Team A has P[winning] = 0.6, and Team B has 0.4, but Team A is erratic (I'm looking at you Clarkson), while Team B is not. Does A's greater variance show up in the prediction?
Let's say we know that in the long run, A beats B 75% of the time. So, over 100 games, A wins 75.
What the P(A winning) does NOT tell you is which of those 100 games A wins. A could go 0-10, then 75-5, then 0-10 over the course of those 100.
Taking that back to the topic at hand, short term results (ie the 1 game result in a tournament) are going to vary a lot vs. the long-term percentage.
Quote from: Tom LentoThat said, you can account for the variance against empirical reality by measuring it, adding uncertainty to the model (perhaps via weighting KRACH-predicted outcomes), and backtesting to validate.
More generally, though, I think you can start simpler by seeing how far off of empirical reality the model predictions have been. If you're 95% accurate, why bother? If you're way off, how much does the model improve by adjusting each individual assumption? Does adding the error variance into the simulation help? Or is the main issue the lack of KRACH updating? Or should you adjust KRACh weight by other factors (corsi, PDO, whatever)?
If I had any idea how to do this correctly, it would already have been done.
If you can improve the model, let's get it working for horse racing, as that would provide you the time to tweak the hockey model once we get rich.
Quote from: Jim HylaQuote from: Tom LentoOne way to think about this point is to compare it to models of the presidential election (NO POLITICS - this is about modeling decisions). In 2016 a lot of models had these hugely, almost impossibly confident predictions of a Clinton victory. Several reputable polling-based predictive models had less than a 5% chance of Trump winning, while 538 had a 10% chance of Trump winning while losing the popular vote. If you look at the recaps one key reason was because those models took a fairly naive approach to modeling empirical error in polling predictors. Specifically, they failed to account for correlated polling errors across states with similar demographic characteristics. 538 took some (IMHO valid) criticism that their adjustments were being too strongly applied or that they weren't accounting for the error terms in those estimates, but in an empirical model I think that's a better class of mistake to make than just saying "this thing that happens every election doesn't happen in my model because I said so."
538's last polls only forecast was 29% Trump. (https://projects.fivethirtyeight.com/election-night-forecast-2016/) And they admit that their problem was not enough late state polling. So they couldn't have seen the state results that gave him the Electoral College.
So if you don't have the data, you can't get accuracy.
"Getting back to hockey", you have the same problem. Does anyone really think you can input the data on injuries, players in the Olympics, etc.?
This whole discussion is "worthless" unless someone can put up a better way.
We all can come up with problems with the "science", but unless someone is willing to put their money where their mouth is(or fingers are), we can carry on the discussion forever without anything changing.
Finally, it's interesting all the discussions that happen once we start winning again.
For that I'm happy.
Note all of the caveats in my statement about 538s prediction. Their model gave Trump a 10% chance of winning the election while losing the popular vote, and a 29% chance of winning overall. The Princeton Election Consortium gave Trump a 2% chance of winning the election at all at one point late in the race.
The way 538 approaches these problems is to tune a model based on parameters that explain variance from empirical reality, and then to back-test that model against the actual results. If you look at their CARMElo ratings for the NBA they basically incorporate factors which might contribute to fatigue and injury (travel, back to back games) in ways that empirically affect performance, without worrying as much about whether or not Steph Curry the individual player will get hurt.
The same thing applies in hockey. The fact that the variance in single game outcomes is larger in hockey than basketball makes the problem harder, of course.
I can't really tell Adam exactly how to do this - I'm not that familiar with monte carlo and I don't have enough direct experience in this domain to do more than provide vague suggestions, and unless I take the time to get my hands dirty with it I won't be able to speak intelligently about which approaches to consider and which to discard.
Quote from: adamwQuote from: Tom LentoThat said, you can account for the variance against empirical reality by measuring it, adding uncertainty to the model (perhaps via weighting KRACH-predicted outcomes), and backtesting to validate.
More generally, though, I think you can start simpler by seeing how far off of empirical reality the model predictions have been. If you're 95% accurate, why bother? If you're way off, how much does the model improve by adjusting each individual assumption? Does adding the error variance into the simulation help? Or is the main issue the lack of KRACH updating? Or should you adjust KRACh weight by other factors (corsi, PDO, whatever)?
If I had any idea how to do this correctly, it would already have been done.
Yeah, that's the hard part. There's a ton of literature on evaluating predictive models, but I haven't done anything even adjacent to this field for years so I wouldn't know what to recommend as an intro. When my work involved statistical modeling it wasn't in these domains anyway, so I don't have answers off the top of my head either. :(
Just to be clear, I like the models, and it's fun (for me, at least) to think of alternatives. If I stumble across a relevant approach for you I'll pass it along. Thanks for putting them up for us! :)
The hard part is everyone complaining about what is out there, including Adam's, and no one has an answer, or is willing to help.
I think it's a lot of fun to look at these models, but I don't have a clue about what to do (easily) to improve them, so I enjoy what's there and keep my mouth shut about complaining they aren't good enough.
It seems we go through this every spring. At least every spring where it means something to us and the post-season.
Adam takes a lot of crap for no good reason, he's trying a lot harder than many others.
Now if he could only fix the app on my iPhone, so it wouldn't screw up so often, that would be nice.........:-D::bolt::
Quote from: Jim HylaThe hard part is everyone complaining about what is out there, including Adam's, and no one has an answer, or is willing to help.
I think it's a lot of fun to look at these models, but I don't have a clue about what to do (easily) to improve them, so I enjoy what's there and keep my mouth shut about complaining they aren't good enough.
It seems we go through this every spring. At least every spring where it means something to us and the post-season.
Adam takes a lot of crap for no good reason, he's trying a lot harder than many others.
Now if he could only fix the app on my iPhone, so it wouldn't screw up so often, that would be nice.........:-D::bolt::
I have nothing against Adam and I love CHN. But that doesn't mean we should be quiet about predictions that are based on flawed assumptions. I also think it's better to have no prediction model at all than to have one that is based on flawed assumptions. Coverage of the 2016 election would have been vastly improved had flawed models like HuffPost's not existed. America would have known that for almost the entirety of the race Hillary was only a slight favorite, that the electoral college favored Trump, that Comey's letter very likely cost Clinton the election. Instead, the media, in part because of models like HuffPost's and others', covered Hillary's victory as a foregone conclusion. Obviously the stakes aren't as high here, but no one is helped by a model that wrongly portrays Cornell's odds against Union as 80%, or its odds of winning the ECAC as 60%.
Quote from: abmarksQuote from: SwampySome things I'd want to add to the discussion:
2. Exactly how does variance play out in these methods. If Team A plays Team B, does the P[Team A or Team B wins] = 1.0? Suppose Team A has P[winning] = 0.6, and Team B has 0.4, but Team A is erratic (I'm looking at you Clarkson), while Team B is not. Does A's greater variance show up in the prediction?
Let's say we know that in the long run, A beats B 75% of the time. So, over 100 games, A wins 75.
What the P(A winning) does NOT tell you is which of those 100 games A wins. A could go 0-10, then 75-5, then 0-10 over the course of those 100.
Taking that back to the topic at hand, short term results (ie the 1 game result in a tournament) are going to vary a lot vs. the long-term percentage.
I understand this but was talking about variance in several other senses. I'll explain them here.
WARNING: THE FOLLOWING IS QUITE WONKISH.

Assumptions

Assume two teams, Team C and Team H, belong to a 12-team league in which teams play each other twice during the season. So each team plays 22 league games. Also assume teams earn 0 points in the league standings for a loss, 1 for a tie, and 2 for a win.
Estimation Variance

For the moment, ignore ties. Any data-based estimate of a team's chances of winning a game can be thought of as a function. If p_C is the probability Team C wins a game, then let p̂_C be the estimate of that probability, so that:

(1) p̂_C = f(data)

In other words, the estimated probability is a function of whatever data are used in the estimate. When we say "data," this includes the number of data points (sample size) used to make the estimate.

Now, if we know the mathematical properties of f() we may be able to derive, mathematically, an expression for the variance of p̂_C, var(p̂_C). Call this the estimation variance, a measure of the estimate's precision.

If we do not know the estimating function's mathematical properties, we still may be able to estimate var(p̂_C) using simulation and resampling techniques (https://www.wikiwand.com/en/Resampling_(statistics)).
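For example, here's a minimal bootstrap sketch (in Python, purely illustrative; the 18-4 record is made up) that resamples a 22-game season to approximate var(p̂_C):

[code]
import random

def bootstrap_var_phat(outcomes, n_boot=10_000, seed=0):
    """Estimate the sampling variance of p-hat (the observed win fraction)
    by resampling the game results with replacement."""
    rng = random.Random(seed)
    n = len(outcomes)
    estimates = []
    for _ in range(n_boot):
        resample = [rng.choice(outcomes) for _ in range(n)]
        estimates.append(sum(resample) / n)
    mean = sum(estimates) / n_boot
    return sum((e - mean) ** 2 for e in estimates) / (n_boot - 1)

# A hypothetical team that went 18-4 in its 22 league games (1 = win, 0 = loss)
results = [1] * 18 + [0] * 4
print(bootstrap_var_phat(results))  # close to p*q/n = (18/22)*(4/22)/22, about 0.0068
[/code]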
Game and Game-Series Estimates

Think of a single game as an experiment with two possible outcomes: "success" and "failure." For simplicity, assume we actually know the real probability of each, so we don't have to use estimates like (1). To think about this, just consider Team C for now. Let:

p = probability Team C wins
q = probability Team C loses = 1 - p

Furthermore, to convert the results into a number, define a random variable X = 1 for a win and 0 for a loss. This is well known as a Bernoulli trial (https://www.wikiwand.com/en/Bernoulli_trial), and X has a Bernoulli distribution (https://www.wikiwand.com/en/Bernoulli_distribution). The variance of X is given by:

(2) var(X) = pq

In the present context, call this "game variance" since it is the variance related to the outcome of a single game.
We can also think of a "series variance", which is the variance associated with a team winning a series. To simplify the math, let's disregard the fact that some series end after a team has won the majority of games in the series (e.g., 2 out of 3), and just think of the number of wins in a series. Define a second random variable, Y_n, as the number of wins in a series of n games. If each game has the same probabilities of its outcome, then Y_n is the sum of n X's. In other words, it has a binomial distribution (https://www.wikiwand.com/en/Binomial_distribution), the variance of which equals:

(3) var(Y_n) = np(1-p)

In both (2) and (3) the variance depends on the value of p. If p = 0, the variance is 0, and similarly for p = 1. The variance is at its maximum when p = 0.5 (0.25 for a single game, n/4 for the series).
It's important to note here that the variance depends on the underlying, real probabilities and is not a matter of estimation.
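A quick numerical check of (2) and (3) (a throwaway Python sketch) showing how both variances vanish at p = 0 or 1 and peak at p = 0.5:

[code]
def game_variance(p):
    """Variance of a single game outcome (Bernoulli): var(X) = p*(1-p)."""
    return p * (1 - p)

def series_variance(p, n):
    """Variance of the number of wins in an n-game series (Binomial): n*p*(1-p)."""
    return n * p * (1 - p)

for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(p, game_variance(p), series_variance(p, 3))
# game variance peaks at 0.25 when p = 0.5 and is 0 at p = 0 or p = 1;
# the 3-game series variance peaks at 0.75 (= 3/4)
[/code]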
Comments on jfeath17's chart

1. The chart shows a relation between the KRACH and actual game outcomes. Because of the properties of Bernoulli and binomial distributions, the variance necessarily decreases as p moves away from 0.5 and closer to 1.0. So we would expect better predictions to the right of the graph. But the graph is almost a straight line up to about p = 0.85 and then drops off slightly. Maybe this is due to a weakness in KRACH, which is not intended to predict outcomes. Or maybe that's why they play the game.

2. The chart would be improved with confidence bands (https://www.wikiwand.com/en/Confidence_and_prediction_bands), which are sensitive to variance and graphically show how confident one should be about the fitted line. Notice, though, that confidence intervals plotted around a curve like this, which is based on empirical data that are themselves estimated from other data (as in Equation 1), have two sources of variance: estimation variance and game variance.
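One rough way to get such bands point-by-point is a binomial interval around each observed win fraction. A small illustrative sketch using the Wilson score interval (my choice of interval; not necessarily what jfeath17 used):

[code]
import math

def wilson_interval(wins, games, z=1.96):
    """Wilson score interval for an observed win proportion -- one way to put
    rough 95% confidence bounds on each point of a calibration curve."""
    p = wins / games
    denom = 1 + z**2 / games
    center = (p + z**2 / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z**2 / (4 * games**2)) / denom
    return center - half, center + half

print(wilson_interval(75, 100))  # roughly (0.66, 0.82) for a 75% observed win rate
[/code]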
Performance Variance

In addition to the above, we should consider the variance of a given team's performance. Some teams are reliable; others are erratic. This can be best explained with an example.

Suppose every one of the 10 "other" teams always scores exactly 3 goals in every game. Then if O is the number of goals one of these "other" teams scores, the expected number of goals is 3 (E[O] = 3), and the variance is zero (var[O] = 0).
Similarly, assume Team C always scores 4 goals when it plays. Then if C is a random variable equal to the number of goals Team C scores, E[C] = 4 and var[C] = 0.
We can see right away that over the season Team C will always win over the ten "other" teams, so just from playing them it will accumulate 40 points (10 teams, 2 games per team, 2 points per win).
But now consider Team H, which is more erratic. Let H be the number of goals it scores in any given game. Like Team C let Team H's expected number of goals be 4: E[H] = 4. But unlike Team C, var[H] will not be zero.
Instead, suppose H has the following probability mass distribution: P[H = 2] = 0.10, P[H = 3] = 0.15, P[H = 4] = 0.50, P[H = 5] = 0.15, and P[H = 6] = 0.10. So here we can see different results when Team H plays its 20 games against the 10 "other" teams: the expected number of losses is 2, the expected number of ties is 3, and the expected number of wins is 15. So when Team H plays the other teams, the expected number of points is only 33, unlike Team C's 40!
What about when Team C and Team H play each other? Even though both have the same expected number of goals, the variance of Team H means it will be expected to lose to Team C 25% of the time, tie Team C 50% of the time, and beat Team C 25% of the time. In each of their 2 games against each other during the regular season, 2 points are at stake. So Team H can expect 0.5 points from a tie (1 point x 0.5 probability) and 0.5 points from a win (2 points x 0.25 probability), or 1 point in total. Similarly for Team C. With both teams playing each other twice during the season, each expects to get 2 points. This makes sense, because they're evenly matched.
But in terms of total points in the league, Team C expects to have 42 points at season's end, but Team H expects only 35 points. Which is how things should be, because Team H sucks.
Notice here that the only difference between the two teams is their respective variances, but it makes a big difference. If we look more closely at games against the 10 "other" teams, we are much more confident that Team C will beat them, whereas we expect Team H to lose to some of them. This is why performance variance is also important in thinking about which teams are likely to win particular games. Again, here there's no estimation issue. We know what the probabilities really are, yet variance affects the outcome.
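To make the Team C / Team H example concrete, here's a short script (using the assumed goal distributions above, nothing more) that reproduces the 42-point and 35-point season expectations:

[code]
# Assumed goal distributions from the example above
H_dist = {2: 0.10, 3: 0.15, 4: 0.50, 5: 0.15, 6: 0.10}  # erratic Team H
C_GOALS, OTHER_GOALS = 4, 3  # Team C and the ten "other" teams always score these

def points(gf, ga):
    """League points from one game: 2 for a win, 1 for a tie, 0 for a loss."""
    return 2 if gf > ga else 1 if gf == ga else 0

# Team H: 20 games vs the "others" plus 2 games vs Team C
h_total = 20 * sum(p * points(h, OTHER_GOALS) for h, p in H_dist.items()) \
        +  2 * sum(p * points(h, C_GOALS) for h, p in H_dist.items())

# Team C: 20 guaranteed wins vs the "others" plus 2 games vs Team H
c_total = 20 * 2 + 2 * sum(p * points(C_GOALS, h) for h, p in H_dist.items())

print(h_total, c_total)  # 35.0 42.0
[/code]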
Technical Suggestion
Jfeath17 asked for suggestions regarding the graphical analysis. For this kind of work I highly recommend the R Project's (https://www.r-project.org/) free, open-source statistical software used in conjunction with the RStudio (https://www.rstudio.com/) IDE. It would allow easy addition of things like confidence bands in the probability plots, weighting of recent time-series data, etc.
H sucks...sorry wrote this before I read your whole post
Quote from: nshapiroH sucks..
I believe you are referring to Team A.
Quote from: TrotskyQuote from: nshapiroH sucks..
I believe you are referring to Team A.
The party of the first part, hereafter referred to as the "Party of the First Part"...
No, I don't like that part.
Which part?
The first part.
Quote from: Jeff Hopkins '82Quote from: TrotskyQuote from: nshapiroH sucks..
I believe you are referring to Team A.
The party of the first part, hereafter referred to as the "Party of the First Part"...
No, I don't like that part.
Which part?
The first part.
You can't fool me. There is no sanity clause.
All I can say is I'm glad I never took statistics!
jfeath17, do you have an equation for that logistic regression? If you do, adamw could plug the KRACH-generated winning percentage into it to get an empirical winning % for the pairwise probability matrix. adamw, for the Monte Carlo with 20k samples, I estimate this will add a couple million extra computations to the model. I'm pretty sure that's not a big deal.
There may be confidence questions with jfeath17's model (ideally, we'd want more than 2 years of data), but it answers the question that's been raised: it takes the KRACH-reported winning percentage and turns it into the winning percentage that actually happened.
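Purely to illustrate what "plugging the KRACH percentage into the regression" could look like (hypothetical coefficients, not jfeath17's actual fit), a calibration on the log-odds scale might be:

[code]
import math

def calibrated_win_prob(krach_p, a=0.0, b=0.8):
    """Map a KRACH-implied win probability to a calibrated one via a logistic
    regression on the log-odds. a and b are hypothetical placeholders; in
    practice they would be fit to historical game outcomes."""
    log_odds = math.log(krach_p / (1 - krach_p))
    return 1 / (1 + math.exp(-(a + b * log_odds)))

print(calibrated_win_prob(0.80))  # about 0.75: a slope below 1 shrinks favorites toward 0.5
[/code]

The model's Monte Carlo could then use the calibrated probability for each game instead of the raw KRACH percentage.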
Quote from: TrotskyQuote from: Jeff Hopkins '82Quote from: TrotskyQuote from: nshapiroH sucks..
I believe you are referring to Team A.
The party of the first part, hereafter referred to as the "Party of the First Part"...
No, I don't like that part.
Which part?
The first part.
You can't fool me. There is no sanity clause.
+1
Quote from: Jim HylaAdam takes a lot of crap for no good reason, he's trying a lot harder than many others.
Thank you, sir.
Quote from: Jim HylaNow if he could only fix the app on my iPhone, so it wouldn't screw up so often, that would be nice.........:-D::bolt::
For the record, the correct person to address this issue to is much closer to you, vis-a-vis this forum, than I am. :)
Quote from: Swampy....
But now consider Team H, which is more erratic. Let H be the number of goals it scores in any given game. Like Team C let Team H's expected number of goals be 4: E[H] = 4. But unlike Team C, var[H] will not be zero.
Instead, suppose H has the following probability mass distribution: P[H = 2] = 0.10, P[H = 3] = 0.15, P[H = 4] = 0.50, P[H = 5] = 0.15, and P[H = 6] = 0.10. So here we can see different results when Team H plays its 20 games against the 10 "other" teams: the expected number of losses is 2, the expected number of ties is 3, and the expected number of wins is 15. So when Team H plays the other teams, the expected number of points is only 33, unlike Team C's 40!
....
I like where all this is headed. I just don't know how to implement it. But philosophically, it looks sound to me. Anything added to the model should be based on sound statistical principles like this that are repeatable and set, and not on vague "tweaking" based on assumptions.
Quote from: BearLoverI have nothing against Adam and I love CHN. But that doesn't mean we should be quiet about predictions that are based on flawed assumptions. I also think it's better to have no prediction model at all than to have one that is based on flawed assumptions. Coverage of the 2016 election would have been vastly improved had flawed models like HuffPost's not existed. America would have known that for almost the entirety of the race Hillary was only a slight favorite, that the electoral college favored Trump, that Comey's letter very likely cost Clinton the election. Instead, the media, in part because of models like HuffPost's and others', covered Hillary's victory as a foregone conclusion. Obviously the stakes aren't as high here, but no one is helped by a model that wrongly portrays Cornell's odds against Union as 80%, or its odds of winning the ECAC as 60%.
Thanks - and feel free to complain all you want. But I disagree the model is flawed. It might be incomplete, but I wouldn't call it flawed. There's nothing flawed about how the KRACH computes itself. The non-538 presidential race models were flawed, because they made really poor assumptions.
Quote from: adamwQuote from: Jim HylaNow if he could only fix the app on my iPhone, so it wouldn't screw up so often, that would be nice.........:-D::bolt::
For the record, the correct person to address this issue to is much closer to you, vis-a-vis this forum, than I am. :)
I know that, but I had to say something bad, didn't I?::nut::
ECAC end of 2017-18 regular season
1. Cornell
2. Union
3. Clarkson
4. Harvard
5. (t) Dartmouth gets 12 seed first round
5. (t) Colgate gets 11 seed
7. Princeton
8. Yale
9. Quinnipiac
10. Brown
11. Rensselaer
12. St. Lawrence
After the first round (where 1-4 have a bye) the tournament is reseeded. We get the lowest survivor. If higher seeds win the first and second weeks, we'd play 8 Yale in Ithaca and then 4 Harvard Friday (early game) in Lake Placid. What is the likeliest first round upset?
Quote from: billhowardWhat is the likeliest first round upset?
Brown or RPI (are tied for "likeliest").
8. Yale
9. Quinnipiac
10. Brown
11. RPI
12. St. Lawrence
We will face one of these teams in the QF.
Quote from: SwampyQuote from: billhowardWhat is the likeliest first round upset?
Brown or RPI (are tied for "likeliest").
If RPI beats Colton Point, I'll donate the refund they owe me for first round tickets. But I'll donate it to our Onion bruised Pep Band.
Updated probabilities for ECAC teams with byes to win the ECAC championship from the pairwise probability matrix:
Cornell: 55%
Clarkson: 22%
Union: 10%
Harvard: 6%
The average team in the quarterfinal has a 12.5% probability to win the championship; Cornell is 4.4x more likely than the average QF team to win it all. That's a testament to how great Cornell's season has been (or how bad the season's been for the rest of the ECAC).
Quote from: KGR11Updated probabilities for ECAC teams with byes to win the ECAC championship from the pairwise probability matrix:
Cornell: 55%
Clarkson: 22%
Union: 10%
Harvard: 6%
The average team in the quarterfinal has a 12.5% probability to win the championship; Cornell is 4.4x more likely than the average QF team to win it all. That's a testament how great Cornell's season has been (or how bad the season's been for the rest of the ECAC).
Clarkson has won 2 of their last 12 games. I'd give Harvard, Union and maybe even Colgate a better shot at winning the tournament.
Quote from: CU2007Quote from: KGR11Updated probabilities for ECAC teams with byes to win the ECAC championship from the pairwise probability matrix:
Cornell: 55%
Clarkson: 22%
Union: 10%
Harvard: 6%
The average team in the quarterfinal has a 12.5% probability to win the championship; Cornell is 4.4x more likely than the average QF team to win it all. That's a testament how great Cornell's season has been (or how bad the season's been for the rest of the ECAC).
Clarkson has won 2 of their last 12 games. I'd give Harvard, Union and maybe even Colgate a better shot at winning the tournament.
Let's hope we never can prove you're right.
Quote from: KGR11Updated probabilities for ECAC teams with byes to win the ECAC championship from the pairwise probability matrix:
Cornell: 55%
Clarkson: 22%
Union: 10%
Harvard: 6%
The average team in the quarterfinal has a 12.5% probability to win the championship; Cornell is 4.4x more likely than the average QF team to win it all. That's a testament how great Cornell's season has been (or how bad the season's been for the rest of the ECAC).
It's also a testament to how bad this prediction model is. STOP CITING THIS PREDICTION MODEL
Quote from: BearLoverQuote from: KGR11Updated probabilities for ECAC teams with byes to win the ECAC championship from the pairwise probability matrix:
Cornell: 55%
Clarkson: 22%
Union: 10%
Harvard: 6%
The average team in the quarterfinal has a 12.5% probability to win the championship; Cornell is 4.4x more likely than the average QF team to win it all. That's a testament how great Cornell's season has been (or how bad the season's been for the rest of the ECAC).
It's also a testament to how bad this prediction model is. STOP CITING THIS PREDICTION MODEL
please take a xanax. i also think the prediction model wildly overstates our chances but it literally doesn't matter.
Quote from: ugarteQuote from: BearLoverQuote from: KGR11Updated probabilities for ECAC teams with byes to win the ECAC championship from the pairwise probability matrix:
Cornell: 55%
Clarkson: 22%
Union: 10%
Harvard: 6%
The average team in the quarterfinal has a 12.5% probability to win the championship; Cornell is 4.4x more likely than the average QF team to win it all. That's a testament how great Cornell's season has been (or how bad the season's been for the rest of the ECAC).
It's also a testament to how bad this prediction model is. STOP CITING THIS PREDICTION MODEL
please take a xanax. i also think the prediction model wildly overstates our chances but it literally doesn't matter.
Every sports discussion literally doesn't matter, but if we're going to discuss sports we could at least stop basing those discussions off the same faulty model that people keep citing every five minutes.
Quote from: BearLoverQuote from: ugarteQuote from: BearLoverQuote from: KGR11Updated probabilities for ECAC teams with byes to win the ECAC championship from the pairwise probability matrix:
Cornell: 55%
Clarkson: 22%
Union: 10%
Harvard: 6%
The average team in the quarterfinal has a 12.5% probability to win the championship; Cornell is 4.4x more likely than the average QF team to win it all. That's a testament how great Cornell's season has been (or how bad the season's been for the rest of the ECAC).
It's also a testament to how bad this prediction model is. STOP CITING THIS PREDICTION MODEL
please take a xanax. i also think the prediction model wildly overstates our chances but it literally doesn't matter.
Every sports discussion literally doesn't matter, but if we're going to discuss sports we could at least stop basing those discussions off the same faulty model that people keep citing every five minutes.
It's perfectly fine for a model that doesn't bother with recent trends.
Which is a big weakness, sure, but that doesn't mean it's necessarily worth tossing out.
Quote from: BearLoverQuote from: ugarteQuote from: BearLoverQuote from: KGR11Updated probabilities for ECAC teams with byes to win the ECAC championship from the pairwise probability matrix:
Cornell: 55%
Clarkson: 22%
Union: 10%
Harvard: 6%
The average team in the quarterfinal has a 12.5% probability to win the championship; Cornell is 4.4x more likely than the average QF team to win it all. That's a testament how great Cornell's season has been (or how bad the season's been for the rest of the ECAC).
It's also a testament to how bad this prediction model is. STOP CITING THIS PREDICTION MODEL
please take a xanax. i also think the prediction model wildly overstates our chances but it literally doesn't matter.
Every sports discussion literally doesn't matter, but if we're going to discuss sports we could at least stop basing those discussions off the same faulty model that people keep citing every five minutes.
Why? If someone wants to discuss something that you think is faulty, why do you think you have the right to tell them to stop?
You have the right to point out that you think it's faulty, YOU DO NOT HAVE THE ABILITY TO MAKE THEM STOP.
THEY ARE NOT HURTING YOU. PLEASE LEAVE THEM ALONE AND LET THEM HAVE THEIR OWN FUN.
Quote from: DafatoneQuote from: BearLoverQuote from: ugarteQuote from: BearLoverQuote from: KGR11Updated probabilities for ECAC teams with byes to win the ECAC championship from the pairwise probability matrix:
Cornell: 55%
Clarkson: 22%
Union: 10%
Harvard: 6%
The average team in the quarterfinal has a 12.5% probability to win the championship; Cornell is 4.4x more likely than the average QF team to win it all. That's a testament how great Cornell's season has been (or how bad the season's been for the rest of the ECAC).
It's also a testament to how bad this prediction model is. STOP CITING THIS PREDICTION MODEL
please take a xanax. i also think the prediction model wildly overstates our chances but it literally doesn't matter.
Every sports discussion literally doesn't matter, but if we're going to discuss sports we could at least stop basing those discussions off the same faulty model that people keep citing every five minutes.
It's perfectly fine for a model that doesn't bother with recent trends.
Which is a big weakness, sure, but that doesn't mean it's necessarily worth tossing out.
The main critique of the model has nothing to do with it not accounting for recent trends--scroll up for a longer explanation, but the most glaring problem with the model is that it assumes the #3 team in the KRACH/PWR/RPI is the *actual* third-best team in the country, despite the incredibly high amount of randomness in a 30-game season. I believe the model is assigning Cornell (and everyone else) win probabilities based off them being ranked X after an infinite number of games rather than after 29 games. There was a study some number of years ago that found that several thousand baseball games (way more than just a 162-game season) are necessary to determine who the best team is. The study answered a bit of a different question in a different sport, but the idea is the same: there is way too much randomness in a small season to say with a high degree of certainty how good a team really is.
Quote from: Jim HylaQuote from: BearLoverQuote from: ugarteQuote from: BearLoverQuote from: KGR11Updated probabilities for ECAC teams with byes to win the ECAC championship from the pairwise probability matrix:
Cornell: 55%
Clarkson: 22%
Union: 10%
Harvard: 6%
The average team in the quarterfinal has a 12.5% probability to win the championship; Cornell is 4.4x more likely than the average QF team to win it all. That's a testament how great Cornell's season has been (or how bad the season's been for the rest of the ECAC).
It's also a testament to how bad this prediction model is. STOP CITING THIS PREDICTION MODEL
please take a xanax. i also think the prediction model wildly overstates our chances but it literally doesn't matter.
Every sports discussion literally doesn't matter, but if we're going to discuss sports we could at least stop basing those discussions off the same faulty model that people keep citing every five minutes.
Why? If someone wants to discuss something that you think is faulty, why do you think you have the right to tell them to stop.
You have the right to point out that you think it's faulty, YOU DO NOT HAVE THE ABILITY TO MAKE THEM STOP.
THEY ARE NOT HURTING YOU. PLEASE LEAVE THEM ALONE AND LET THEM HAVE THEIR OWN FUN.
The capital letters in my post were supposed to be funny...
Quote from: BearLoverQuote from: DafatoneQuote from: BearLoverQuote from: ugarteQuote from: BearLoverQuote from: KGR11Updated probabilities for ECAC teams with byes to win the ECAC championship from the pairwise probability matrix:
Cornell: 55%
Clarkson: 22%
Union: 10%
Harvard: 6%
The average team in the quarterfinal has a 12.5% probability to win the championship; Cornell is 4.4x more likely than the average QF team to win it all. That's a testament how great Cornell's season has been (or how bad the season's been for the rest of the ECAC).
It's also a testament to how bad this prediction model is. STOP CITING THIS PREDICTION MODEL
please take a xanax. i also think the prediction model wildly overstates our chances but it literally doesn't matter.
Every sports discussion literally doesn't matter, but if we're going to discuss sports we could at least stop basing those discussions off the same faulty model that people keep citing every five minutes.
It's perfectly fine for a model that doesn't bother with recent trends.
Which is a big weakness, sure, but that doesn't mean it's necessarily worth tossing out.
The main critique of the model has nothing to do with it not accounting for recent trends--scroll up for a longer explanation, but the most glaring problem with the model is that it assumes the #3 team in the KRACH/PWR/RPI is the *actual* third-best team in the country, despite the incredibly high amount of randomness in a 30-game season. I believe the model is assigning Cornell (and everyone else) win probabilities based off them being ranked X after an infinite number or games rather than after 29 games. There was a study some number of years ago that found that several thousands of baseball games (way more than just a 162-game season) are necessary to determine who the best team is. The study answered a bit of a different question in a different sport, but the idea is the same: there is way too much randomness in a small season to say with a high degree of certainty who the best teams are.
An entire branch of mathematics has been beavering away at this ever since Laplace and Gauss. There are tools to determine the actual quality of models. I've seen nobody post any objective evaluation of the model in question.
Until somebody does the math this is just somebody saying "well that doesn't feel right." Math doesn't give a shit about your feelings. Do the work, cite somebody who does, or go to the water wings side of the pool.
Quote from: TrotskyQuote from: BearLoverQuote from: DafatoneQuote from: BearLoverQuote from: ugarteQuote from: BearLoverQuote from: KGR11Updated probabilities for ECAC teams with byes to win the ECAC championship from the pairwise probability matrix:
Cornell: 55%
Clarkson: 22%
Union: 10%
Harvard: 6%
The average team in the quarterfinal has a 12.5% probability to win the championship; Cornell is 4.4x more likely than the average QF team to win it all. That's a testament how great Cornell's season has been (or how bad the season's been for the rest of the ECAC).
It's also a testament to how bad this prediction model is. STOP CITING THIS PREDICTION MODEL
please take a xanax. i also think the prediction model wildly overstates our chances but it literally doesn't matter.
Every sports discussion literally doesn't matter, but if we're going to discuss sports we could at least stop basing those discussions off the same faulty model that people keep citing every five minutes.
It's perfectly fine for a model that doesn't bother with recent trends.
Which is a big weakness, sure, but that doesn't mean it's necessarily worth tossing out.
The main critique of the model has nothing to do with it not accounting for recent trends--scroll up for a longer explanation, but the most glaring problem with the model is that it assumes the #3 team in the KRACH/PWR/RPI is the *actual* third-best team in the country, despite the incredibly high amount of randomness in a 30-game season. I believe the model is assigning Cornell (and everyone else) win probabilities based off them being ranked X after an infinite number or games rather than after 29 games. There was a study some number of years ago that found that several thousands of baseball games (way more than just a 162-game season) are necessary to determine who the best team is. The study answered a bit of a different question in a different sport, but the idea is the same: there is way too much randomness in a small season to say with a high degree of certainty who the best teams are.
An entire branch of mathematics has been beavering away at this ever since Laplace and Gauss. There are tools to determine the actual quality of models. I've seen nobody post any objective evaluation of the model in question.
Until somebody does the math this is just somebody saying "well that doesn't feel right." Math doesn't give a shit about your feelings. Do the work, cite somebody who does, or go to the water wings side of the pool.
jfeath17 did some math earlier in this thread, you should check it out
Quote from: TrotskyMath doesn't give a shit about your feelings. ...
How do you know this? Do you have some sort of mathematical model that you've tested with empirical data? ::demented::
Quote from: BearLoverQuote from: TrotskyQuote from: BearLoverQuote from: DafatoneQuote from: BearLoverQuote from: ugarteQuote from: BearLoverQuote from: KGR11Updated probabilities for ECAC teams with byes to win the ECAC championship from the pairwise probability matrix:
Cornell: 55%
Clarkson: 22%
Union: 10%
Harvard: 6%
The average team in the quarterfinal has a 12.5% probability to win the championship; Cornell is 4.4x more likely than the average QF team to win it all. That's a testament how great Cornell's season has been (or how bad the season's been for the rest of the ECAC).
It's also a testament to how bad this prediction model is. STOP CITING THIS PREDICTION MODEL
please take a xanax. i also think the prediction model wildly overstates our chances but it literally doesn't matter.
Every sports discussion literally doesn't matter, but if we're going to discuss sports we could at least stop basing those discussions off the same faulty model that people keep citing every five minutes.
It's perfectly fine for a model that doesn't bother with recent trends.
Which is a big weakness, sure, but that doesn't mean it's necessarily worth tossing out.
The main critique of the model has nothing to do with it not accounting for recent trends--scroll up for a longer explanation, but the most glaring problem with the model is that it assumes the #3 team in the KRACH/PWR/RPI is the *actual* third-best team in the country, despite the incredibly high amount of randomness in a 30-game season. I believe the model is assigning Cornell (and everyone else) win probabilities based off them being ranked X after an infinite number or games rather than after 29 games. There was a study some number of years ago that found that several thousands of baseball games (way more than just a 162-game season) are necessary to determine who the best team is. The study answered a bit of a different question in a different sport, but the idea is the same: there is way too much randomness in a small season to say with a high degree of certainty who the best teams are.
An entire branch of mathematics has been beavering away at this ever since Laplace and Gauss. There are tools to determine the actual quality of models. I've seen nobody post any objective evaluation of the model in question.
Until somebody does the math this is just somebody saying "well that doesn't feel right." Math doesn't give a shit about your feelings. Do the work, cite somebody who does, or go to the water wings side of the pool.
jfeath17 did some math earlier in this thread, you should check it out
jfeath17's curve is interesting, but what we really need to do is figure out a way to estimate, for each team, the variance of KRACH as a function of results and number of games played. If we're going to assume Gaussian distributions, we may need to work with a log-transformed KRACH value. This would allow us to make a meaningful improvement to the CHN prediction model without throwing out its basic structure. If I had a lot of free time, I would be willing to work on this, but I'm pretty busy.
Quote from: BearLoverQuote from: Jim HylaQuote from: BearLoverQuote from: ugarteQuote from: BearLoverQuote from: KGR11Updated probabilities for ECAC teams with byes to win the ECAC championship from the pairwise probability matrix:
Cornell: 55%
Clarkson: 22%
Union: 10%
Harvard: 6%
The average team in the quarterfinal has a 12.5% probability to win the championship; Cornell is 4.4x more likely than the average QF team to win it all. That's a testament how great Cornell's season has been (or how bad the season's been for the rest of the ECAC).
It's also a testament to how bad this prediction model is. STOP CITING THIS PREDICTION MODEL
please take a xanax. i also think the prediction model wildly overstates our chances but it literally doesn't matter.
Every sports discussion literally doesn't matter, but if we're going to discuss sports we could at least stop basing those discussions off the same faulty model that people keep citing every five minutes.
Why? If someone wants to discuss something that you think is faulty, why do you think you have the right to tell them to stop.
You have the right to point out that you think it's faulty, YOU DO NOT HAVE THE ABILITY TO MAKE THEM STOP.
THEY ARE NOT HURTING YOU. PLEASE LEAVE THEM ALONE AND LET THEM HAVE THEIR OWN FUN.
The capital letters in my post were supposed to be funny...
Really, you're not kidding me now, are you? Based upon your past posts, I would never expect you to be funny about this. It's hard for me to believe, but if you say so, I'll go with it. You should learn to use proper emojis; that's what they're for. You see, we can't see your facial expression when you're posting.
Quote from: Jim HylaQuote from: BearLoverQuote from: Jim HylaQuote from: BearLoverQuote from: ugarteQuote from: BearLoverQuote from: KGR11Updated probabilities for ECAC teams with byes to win the ECAC championship from the pairwise probability matrix:
Cornell: 55%
Clarkson: 22%
Union: 10%
Harvard: 6%
The average team in the quarterfinal has a 12.5% probability to win the championship; Cornell is 4.4x more likely than the average QF team to win it all. That's a testament how great Cornell's season has been (or how bad the season's been for the rest of the ECAC).
It's also a testament to how bad this prediction model is. STOP CITING THIS PREDICTION MODEL
please take a xanax. i also think the prediction model wildly overstates our chances but it literally doesn't matter.
Every sports discussion literally doesn't matter, but if we're going to discuss sports we could at least stop basing those discussions off the same faulty model that people keep citing every five minutes.
Why? If someone wants to discuss something that you think is faulty, why do you think you have the right to tell them to stop.
You have the right to point out that you think it's faulty, YOU DO NOT HAVE THE ABILITY TO MAKE THEM STOP.
THEY ARE NOT HURTING YOU. PLEASE LEAVE THEM ALONE AND LET THEM HAVE THEIR OWN FUN.
The capital letters in my post were supposed to be funny...
Really, you're not kidding me now, are you. Based upon your past posts, I would never expect you to be funny about this. It's hard for me to believe, but if you say so, I'll go with it. You should learn to use proper emojis, that's what they're for. You see, we can't see your facial expression when you're posting.
The caps were supposed to indicate playful yelling. I've noticed a lot of hostility being read into my posts here but I guess that comes with the territory of disagreeing with everybody.
Quote from: BearLoverQuote from: Jim HylaQuote from: BearLoverQuote from: Jim HylaQuote from: BearLoverQuote from: ugarteQuote from: BearLoverQuote from: KGR11Updated probabilities for ECAC teams with byes to win the ECAC championship from the pairwise probability matrix:
Cornell: 55%
Clarkson: 22%
Union: 10%
Harvard: 6%
The average team in the quarterfinal has a 12.5% probability to win the championship; Cornell is 4.4x more likely than the average QF team to win it all. That's a testament how great Cornell's season has been (or how bad the season's been for the rest of the ECAC).
It's also a testament to how bad this prediction model is. STOP CITING THIS PREDICTION MODEL
please take a xanax. i also think the prediction model wildly overstates our chances but it literally doesn't matter.
Every sports discussion literally doesn't matter, but if we're going to discuss sports we could at least stop basing those discussions off the same faulty model that people keep citing every five minutes.
Why? If someone wants to discuss something that you think is faulty, why do you think you have the right to tell them to stop.
You have the right to point out that you think it's faulty, YOU DO NOT HAVE THE ABILITY TO MAKE THEM STOP.
THEY ARE NOT HURTING YOU. PLEASE LEAVE THEM ALONE AND LET THEM HAVE THEIR OWN FUN.
The capital letters in my post were supposed to be funny...
Really, you're not kidding me now, are you. Based upon your past posts, I would never expect you to be funny about this. It's hard for me to believe, but if you say so, I'll go with it. You should learn to use proper emojis, that's what they're for. You see, we can't see your facial expression when you're posting.
The caps were supposed to indicate playful yelling. I've noticed a lot of hostility being read into my posts here but I guess that comes with the territory of disagreeing with everybody.
I'll say it again, if after being so negative on a point, you mean to be funny, then you need to make it clear.
Second, a large part of the "hostility" is your insistence that people who want to use a particular model stop doing it.
You've made your point that you don't feel it's valid. So if it makes someone else feel good to use it, LET IT GO (and I'm not trying to be funny). Let them have their fun. What harm does it do?
My purpose in posting the percentages was to add new context to the discussion of KRACH's AQ results. BearLover doesn't buy the probability attributed to Cornell to win the AQ and I wanted to outline how it compares to other bye teams' AQ probabilities. I don't think this changes anyone's opinion, but it's an interesting metric to show how KRACH judges the top 4 ECAC teams.
Of course, part of the reason Cornell's probability is so high is that they face the easiest path as far as ECAC standings go. They'd be less likely to win if the tournament didn't reseed (I believe KRACH and intuition agree on this).
Can we please create a separate thread for people to argue about mathematical models and statistics?
Quote from: KGR11My purpose in posting the percentages was to add new context to the discussion of KRACH's AQ results. BearLover doesn't buy the probability attributed to Cornell to win the AQ and I wanted to outline how it compares to other bye teams' AQ probabilities. I don't think this changes anyone's opinion, but it's an interesting metric to show how KRACH judges the top 4 ECAC teams.
Of course, part of the reason Cornell's probability is so high is that they face the easiest path as far as ECAC standings go. They'd be less likely to win if the tournament didn't reseed (I believe KRACH and intuition agree on this).
Okay, that's fair, and you make a good point about the last part. Though assuming there is no more than one upset in the first round, I don't think we have an easier path in the quarters than any of the other bye teams, since we get Q/Y. If one of the top four gets upset in the quarters, though, we would have an easier path.
In our six games this year against Union, Clarkson, and Harvard, we have a -1 goal differential (discounting Angello's empty-netter on the road vs. Harvard), and a -20 shot differential. We went 3-2-1 in those games, and each of those three teams has a good goalie. So it's hard to figure we are more than a slight favorite against any of them.
Quote from: BearLoverQuote from: KGR11My purpose in posting the percentages was to add new context to the discussion of KRACH's AQ results. BearLover doesn't buy the probability attributed to Cornell to win the AQ and I wanted to outline how it compares to other bye teams' AQ probabilities. I don't think this changes anyone's opinion, but it's an interesting metric to show how KRACH judges the top 4 ECAC teams.
Of course, part of the reason Cornell's probability is so high is that they face the easiest path as far as ECAC standings go. They'd be less likely to win if the tournament didn't reseed (I believe KRACH and intuition agree on this).
Okay, that's fair, and you make a good point about the last part. Though assuming there is no more than one upset in the first round, I don't think we have an easier path in the quarters than any of the other bye teams, since we get Q/Y. If one of the top four gets upset in the quarters, though, we would have an easier path.
In our six games this year against Union, Clarkson, and Harvard, we have a -1 goal differential (discounting Angello's empty-netter on the road vs. Harvard), and a -20 shot differential. We went 3-2-1 in those games, and each of those three teams have good goalies. So it's hard to figure we are more than a slight favorite against any of them.
Or to look at it through a different lens, we're 3-1 vs Harvard and union with the only loss being by one goal on the road. We're 0-1-1 vs Clarkson with that loss being fairly lopsided, but Clarkson has fallen off a cliff.
That's not to guarantee anything, of course.
Quote from: BearLoverThough assuming there is no more than one upset in the first round, I don't think we have an easier path in the quarters than any of the other bye teams, since we get Q/Y.
That last part isn't correct. We get the lowest remaining seed. There could be just one upset--say St. Lawrence over Dartmouth or RPI over Colgate--in which case we would play St. Lawrence or RPI.
Quote from: andyw2100Quote from: BearLoverThough assuming there is no more than one upset in the first round, I don't think we have an easier path in the quarters than any of the other bye teams, since we get Q/Y.
That last part isn't correct. We get the lowest remaining seed. There could be just one upset--say St. Lawrence over Dartmouth or RPI over Colgate--in which case we would play St. Lawrence or RPI.
::doh::
The beauty of it is, what anyone thinks of Cornell's chances vs. one team or another doesn't matter. KRACH is what it is. And there is nothing better that perfectly captures PAST results. There are many "flaws" if you will to the model when it comes to projecting odds of winning future games - but I don't think they're flaws. They're just incomplete. All of the reasons stated are valid. But, as someone said, you'd need to come to the issue with actual valid math, and a better algorithm, before getting all hot and bothered about it. Until then, saying that you "feel" that 52% chance is "flawed" is just as "flawed" of an argument as anything else.
Polls are a shit-ton more flawed than this is - doesn't stop Jim from posting them :) ... I've given him grief about it in the past, but all in fun. Would never tell him to stop.
Feel free to point out things all you want. But until you have a better model, and are willing to program it, then jeebus h. criminy, let people discuss it. It does a pretty fair job of giving you a portrait of what could happen. I think everyone here (unlike many other places) is smart enough to know to take it with some grain of salt. But it's as good a guideline as you've got.
Quote from: BearLoverQuote from: andyw2100Quote from: BearLoverThough assuming there is no more than one upset in the first round, I don't think we have an easier path in the quarters than any of the other bye teams, since we get Q/Y.
That last part isn't correct. We get the lowest remaining seed. There could be just one upset--say St. Lawrence over Dartmouth or RPI over Colgate--in which case we would play St. Lawrence or RPI.
::doh::
I can't tell if the head-smack is for me, because I'm missing something, or if it's an acknowledgment of the fact that you weren't thinking completely clearly when you made the initial post.
Quote from: andyw2100Quote from: BearLoverQuote from: andyw2100Quote from: BearLoverThough assuming there is no more than one upset in the first round, I don't think we have an easier path in the quarters than any of the other bye teams, since we get Q/Y.
That last part isn't correct. We get the lowest remaining seed. There could be just one upset--say St. Lawrence over Dartmouth or RPI over Colgate--in which case we would play St. Lawrence or RPI.
::doh::
I can't tell if the head-smack is for me, because I'm missing something, or if it's an acknowledgment of the fact that you weren't thinking completely clearly when you made the initial post.
Decide if you are right or not and give BearLover credit for the appropriate response.
Quote from: adamwThe beauty of it is, what anyone thinks of Cornell's chances vs. one team or another doesn't matter. KRACH is what it is. And there is nothing better that perfectly captures PAST results. There are many "flaws" if you will to the model when it comes to projecting odds of winning future games - but I don't think they're flaws. They're just incomplete. All of the reasons stated are valid. But, as someone said, you'd need to come to the issue with actual valid math, and a better algorithm, before getting all hot and bothered about it. Until then, saying that you "feel" that 52% chance is "flawed" is just as "flawed" of an argument as anything else.
Polls are a shit-ton more flawed than this is - doesn't stop Jim from posting them :) ... I've given him grief about it in the past, but all in fun. Would never tell him to stop.
Feel free to point out things all you want. But until you have a better model, and are willing to program it, then jeebus h. criminy, let people discuss it. It does a pretty fair job of giving you a portrait of what could happen. I think everyone here (unlike many other places) is smart enough to know to take it with some grain of salt. But it's as good a guideline as you've got.
Agreed. KRACH does an awesome job of ranking teams based on games played. For forecasting games, I think jfeath17's work could take it to the next level. In her logistic regression, the closer the KRACH winning percentage is to 100, the greater the difference between the KRACH winning percentage and the outcome winning percentage. I think this makes a lot of sense: even a team with a perfect record can still lose future games (example: 2007 Patriots and 2015 Kentucky basketball), so a team with a nearly perfect record should also have a lower probability of winning future games than its raw record implies.
I think the biggest challenge for jfeath17 is that there's only 2 years of data. I think the next step is to gather a couple more years of data and see how stable that logistic regression is year-to-year. jfeath17, I'd be interested to see a more detailed procedure that you used. That way, if you ended up stepping back, someone else could try this.
Quote from: KGR11Quote from: adamwThe beauty of it is, what anyone thinks of Cornell's chances vs. one team or another doesn't matter. KRACH is what it is. And there is nothing better that perfectly captures PAST results. There are many "flaws" if you will to the model when it comes to projecting odds of winning future games - but I don't think they're flaws. They're just incomplete. All of the reasons stated are valid. But, as someone said, you'd need to come to the issue with actual valid math, and a better algorithm, before getting all hot and bothered about it. Until then, saying that you "feel" that 52% chance is "flawed" is just as "flawed" of an argument as anything else.
Polls are a shit-ton more flawed than this is - doesn't stop Jim from posting them :) ... I've given him grief about it in the past, but all in fun. Would never tell him to stop.
Feel free to point out things all you want. But until you have a better model, and are willing to program it, then jeebus h. criminy, let people discuss it. It does a pretty fair job of giving you a portrait of what could happen. I think everyone here (unlike many other places) is smart enough to know to take it with some grain of salt. But it's as good a guideline as you've got.
Agreed. KRACH does an awesome job of ranking teams based on games played. For forecasting games, I think jfeath17's work could take it to the next level. In her logistic regression, the closer the KRACH winning percentage is to 100, the greater the difference between the KRACH winning percentage and the outcome winning percentage. I think this makes a lot of sense: if a team has a perfect record, they can still lose future games (example: 2007 Patriots and 2015 Kentucky basketball), so a team with a nearly perfect record should also have a lower probability of winning future games.
I think the biggest challenge for jfeath17 is that there's only 2 years of data. I think the next step is gather data for a couple of years and see how stable that logistic regression is year-to-year. jfeath17, I'd be interested to see a more detailed procedure that you used. That way, if you ended up stepping back, someone else could try this.
Two things are going on here. One is regression toward the mean (https://www.wikiwand.com/en/Regression_toward_the_mean), which is a valid observation because certain statistics necessarily behave this way. The other is the idea that if something happens once, the probability of it happening again is lower (https://www.youtube.com/watch?v=DBSAeqdcZAM). This is a fallacy.
Quote from: SwampyTwo things are going on here. One is regression toward the mean (https://www.wikiwand.com/en/Regression_toward_the_mean), which is a valid observation because certain statistics necessarily behave this way. The other is the idea that if something happens once the probability of it happening again is lower (https://www.youtube.com/watch?v=DBSAeqdcZAM).This is a fallacy.
Isn't the problem with regression towards the mean, knowing what the mean is? It's not the same for every team.
Just another call by the way - begging for any all of you who have ideas, to come with me and work on an enhanced model for next year. I'd be more than happy to publish it. Preferably more than one of you, so you can peer review each other :)
Quote from: adamwQuote from: SwampyTwo things are going on here. One is regression toward the mean (https://www.wikiwand.com/en/Regression_toward_the_mean), which is a valid observation because certain statistics necessarily behave this way. The other is the idea that if something happens once the probability of it happening again is lower (https://www.youtube.com/watch?v=DBSAeqdcZAM).This is a fallacy.
Isn't the problem with regression towards the mean, knowing what the mean is? It's not the same for every team.
Just another call by the way - begging for any all of you who have ideas, to come with me and work on an enhanced model for next year. I'd be more than happy to publish it. Preferably more than one of you, so you can peer review each other :)
Well, probability and statistics has different levels of reality. The most obvious is observed empirical data. But there's also an assumption of an underlying, unobserved-but-real process that has certain probabilistic outcomes. But this applies to individual teams as well as to all teams in combination. Unless an individual team's true mean is a perfect season, which implies its probability of winning every individual game = 1.0, then since individual teams will regress towards their own means, an undefeated team will regress towards its own mean, which is < 1.0.
I don't like to explain this kind of stuff by saying things like, "assume Team X were to replay the season over 10,000 times" because it misrepresents what's actually going on with the math, and it concretizes what's actually an abstract, mathematical conceptualization. But let's do this for now.
Assume the actual probability distribution of Cornell's 1970 team going undefeated has an expected value of 0.95. In other words, if the team could replay the season an infinite number of times, 95% of the time it would go undefeated.
Since the quality of opponents varies each game, the probabilities of winning the individual games vary too. But for any given game there are two probabilities of interest. If Cornell is playing Harvard, for example, there's the probability Cornell will beat Harvard. If these given Cornell and Harvard teams were to play each other an infinite number of times, there's a certain underlying probability that Cornell would beat Harvard, but unless the distribution function of that probability has zero variance, Harvard has a non-zero probability of winning sometimes. (See my earlier post on variance.) The mean of the distribution function, its expected value, is the expected percentage of the time that Cornell would win. Suppose it really is 0.9, but after the first 10,000 games Cornell hasn't lost yet. Then, statistical theory says the tendency going forward would be for Cornell to lose because the mean really is 0.9 and not 1.0, and one can prove mathematically that outcomes of probabilistic processes regress towards the mean.
The other probability is the probability of winning x games out of N games played. If x = N, then it's the probability of being undefeated at game N. The same logic applies. Knowing the "true" probabilities of winning individual games against given opponents, we have probability distributions of winning Game 1, Game 2, etc. From these, we can construct a new variable, the probability of winning x games out of N. Again, unless this probability is 100%, regression towards the mean implies that an undefeated team is more likely to lose at some point than to keep winning every game. This is because the expected proportion of wins as of Game N, i.e. the mean, is < 1, but the observed proportion up to that point = 1.
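As a toy illustration of that last point (an assumed true win probability of 0.9; not a claim about any real team), even after a long winning streak the win rate over the following games regresses to the underlying mean rather than staying at 1.0:

[code]
import random

rng = random.Random(42)
P_TRUE = 0.9      # assumed "true" per-game win probability
STREAK = 20       # length of the observed winning streak

# Among simulated seasons that start with a 20-game winning streak,
# look at the win fraction over the NEXT 20 games.
next_fracs = []
while len(next_fracs) < 1000:
    if all(rng.random() < P_TRUE for _ in range(STREAK)):
        next_fracs.append(sum(rng.random() < P_TRUE for _ in range(STREAK)) / STREAK)

print(sum(next_fracs) / len(next_fracs))  # about 0.9, not 1.0
[/code]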
Quote from: adamwBut, as someone said, you'd need to come to the issue with actual valid math, and a better algorithm, before getting all hot and bothered about it. Until then, saying that you "feel" that 52% chance is "flawed" is just as "flawed" of an argument as anything else.
I think this discussion is getting old too, but since some people keep saying those criticizing the model are doing so based on "feel," I just want to say that we really aren't. (a) jfeath17 already showed KRACH overstates the chances of higher-ranked teams winning an individual game. (b) When combining several artificially inflated individual probabilities together (Cornell's chances of winning the quarters, semis, and finals) to form one joint probability (Cornell winning the ECAC tournament), you end up with a very, very overly inflated likelihood (the 55% chance of Cornell winning the ECAC). (c) There are no betting odds for any NHL game that come close to the odds this model is assigning to many games every weekend.
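To put rough numbers on (b) (made-up probabilities, purely to show the compounding): a modest per-game inflation turns into a large inflation of the tournament probability.

[code]
# Hypothetical round-by-round win probabilities (QF, SF, F) -- not actual model output
model_probs = [0.85, 0.80, 0.80]  # what an inflated model might claim
other_probs = [0.75, 0.65, 0.60]  # what better-calibrated odds might look like

def tournament_prob(probs):
    out = 1.0
    for p in probs:
        out *= p
    return out

print(tournament_prob(model_probs))  # about 0.54
print(tournament_prob(other_probs))  # about 0.29
[/code]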
Quote from: KGR11There may be confidence questions with jfeath17's model (ideally, we'd want more than 2 years of data), but it answers the question that's been raised: it takes the KRACH reported winning percentage and turns into the winning percentage that actually happened.
Right, changing the KRACH-inferred winning percentage to empirically based winning percentages would fix this problem with the model.
To make the model even more accurate would require throwing out KRACH or any ranking system that looks at only wins and losses, and instead measuring a team by goal differential, or better yet, shot differential (and adjusting for strength of schedule), but that's beyond the scope of my specific gripe with the model. (This isn't to say that ranking teams for tournament seeding/qualification purposes should look at anything other than wins/losses--KRACH is still the best at that.)
Quote from: andyw2100I can't tell if the head-smack is for me, because I'm missing something, or if it's an acknowledgment of the fact that you weren't thinking completely clearly when you made the initial post.
Me being dumb.
Quote from: BearLoverRight, changing the KRACH-inferred winning percentage to empirically based winning percentages would fix this problem with the model.
Is KRACH not empirically based?
Quote from: BearLoverTo make the model even more accurate would require throwing out KRACH or any ranking system that looks at only wins and losses, and instead measuring a team by goal differential, or better yet, shot differential (and adjusting for strength of schedule), but that's beyond the scope of my specific gripe with the model. (This isn't to say that ranking teams for tournament seeding/qualification purposes should look at anything other than wins/losses--KRACH is still the best at that.)
It is not certain that looking at things beyond wins and losses is any better. Goal differential has major flaws, and might not mean much. Shot differential has its own issues, but could be a decent factor. Honestly, I'm not all that interested in things like goal and shot differential.
I wonder, is there anything like KRACH in any competitive team sport that anyone here believes does a good job predicting outcomes of individual games?
Quote from: SwampyI wonder, is there anything like KRACH in any competitive team sport that anyone here believes does a good job predicting outcomes of individual games?
SRS is used by Sports Reference and its family of sites - which I also work for. It's essentially KRACH but with score differential taken into account. I don't necessarily think that's better or worse. I've suggested they add something that gives less weight to the difference as it increases. It's being considered. Otherwise, I don't know.
it would be interesting to run a KRACH computation for last year's full NHL regular season, for example, and then see if the numbers pass people's gut checks or not.
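For anyone curious what that would involve, here is a bare-bones KRACH-style (Bradley-Terry) fixed-point iteration, sketched under the assumption of a plain list of (winner, loser) results with no ties (real KRACH treats a tie as half a win for each side):

[code]
from collections import defaultdict

def krach(results, iters=1000):
    """Bradley-Terry style ratings from a list of (winner, loser) results."""
    wins = defaultdict(float)
    games = defaultdict(lambda: defaultdict(int))
    teams = set()
    for w, l in results:
        wins[w] += 1
        games[w][l] += 1
        games[l][w] += 1
        teams.update((w, l))
    k = {t: 1.0 for t in teams}
    for _ in range(iters):
        new = {t: wins[t] / sum(n / (k[t] + k[o]) for o, n in games[t].items())
               for t in teams}
        scale = sum(new.values()) / len(new)  # renormalize so ratings stay comparable
        k = {t: v / scale for t, v in new.items()}
    return k

# Tiny example schedule; every team has at least one win and one loss
print(krach([("A", "B"), ("B", "C"), ("C", "A"), ("A", "B"), ("B", "C"), ("A", "C")]))
[/code]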
Quote from: SwampyAssume the actual probability distribution of Cornell's 1970 team going undefeated has an expected value of 0.95. In other words, if the team could replay the season an infinite number of times, 95% of the time it would go undefeated.
Why would you assume such a high number?
To achieve an expected value for the season of .95, you would have to assume the probability of winning each game is over .998.
If we assume the probability of Cornell '70 winning any game is .95, then the probability of an undefeated season is .226, and (don't pillory me please) that feels better.
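For reference, the arithmetic behind those two numbers, assuming a 29-game season and the same win probability in every game:

[code]
# Per-game win probability needed for a 95% chance of going 29-0
print(0.95 ** (1 / 29))  # about 0.998

# Chance of a 29-0 season if every game is won with probability 0.95
print(0.95 ** 29)        # about 0.226
[/code]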
Quote from: BearLoverI think this discussion is getting old too, but since some people keep saying those criticizing the model are doing so based on "feel," I just want to say that we really aren't. (a) jfeath17 already showed KRACH overstates the chances of higher-ranked teams winning an individual game. (b) When combining several artificially inflated individual probabilities together (Cornell's chances of winning the quarters, semis, and finals) to form one joint probability (Cornell winning the ECAC tournament), you end up with a very, very overly inflated likelihood (the 55% chance of Cornell winning the ECAC). (c) There are no betting odds for any NHL game that come close to the odds this model is assigning many games every weekend.
Just to be clear, jfeath17's analysis shows that KRACH overstates the winning percentage for higher ranked teams if the KRACH winning % is greater than 65%. Otherwise, KRACH slightly underestimates the favorite team. Because of Cornell's impressive record, this nuance would only impact games against teams as good or better than Clarkson. It'd be interesting to see if that holds true for future seasons.
Using jfeath17's work would imply that the "true" probability of a game's outcome is based on the historical results of games with a similar KRACH difference between opponents. This assumption cannot be exactly correct, because each team is different, but how far off it is depends on the variability.
The current probability matrix says "What happens if teams maintain their current winning percentage going forward (adjusted for strength of schedule)?"
Using jfeath17's methodology, it would say "What happens if teams perform as well as the historical average team with similar KRACH ratings?"
Both are great questions with meaningful answers, but for forecasting, I would trust a matrix with jfeath17's adjustment slightly more than the current probability matrix (with the BIG assumption that the variation in results isn't ridiculous).
Quote from: nshapiroQuote from: SwampyAssume the actual probability distribution of Cornell's 1970 team going undefeated has an expected value of 0.95. In other words, if the team could replay the season an infinite number of times, 95% of the time it would go undefeated.
Why would you assume such a high number?
To achieve an expected value for the season of .95, you would have to assume a probability of winning each game is over .998.
If we assume the probability of Cornell '70 winning any game is .95, then the probability of an undefeated season is .226, and (don't pillory me please) that feels better.
It was for expository, illustrative purposes only. That's why I said, "Assume ... ."
In general, when we talk about using any instrument for prediction, we're making very strong ceteris paribus assumptions. We're assuming or predicting that nothing that can affect the outcome changes.
Ironically, this may be legitimate for sports if we use fewer years rather than more. The more years we use, the greater the possibility of relevant differences -- rules changes, differences in conditioning, emphasis on speed vs size, injuries, etc.
Quote from: SwampyIronically, this may be legitimate for sports if we use fewer years rather than more. The more years we use, the greater the possibility of relevant differences -- rules changes, differences in conditioning, emphasis on speed vs size, injuries, etc.
Over time, I would expect that effect to be somewhat tempered by other multi-year factors such as recruiting parity and a more even distribution of injuries.
Quote from: KGR11Quote from: BearLoverI think this discussion is getting old too, but since some people keep saying those criticizing the model are doing so based on "feel," I just want to say that we really aren't. (a) jfeath17 already showed KRACH overstates the chances of higher-ranked teams winning an individual game. (b) When combining several artificially inflated individual probabilities together (Cornell's chances of winning the quarters, semis, and finals) to form one joint probability (Cornell winning the ECAC tournament), you end up with a very, very overly inflated likelihood (the 55% chance of Cornell winning the ECAC). (c) There are no betting odds for any NHL game that come close to the odds this model is assigning many games every weekend.
Just to be clear, jfeath17's analysis shows that KRACH overstates the winning percentage for higher ranked teams if the KRACH winning % is greater than 65%. Otherwise, KRACH slightly underestimates the favorite team. Because of Cornell's impressive record, this nuance would only impact games against teams as good or better than Clarkson. It'd be interesting to see if that holds true for future seasons.
Using jfeath17's work would imply that the "true" probability of a game's outcome is based on the historical results of games with a similar KRACH difference between opponents. This assumption cannot be correct because each team is different, but how incorrect it is is dependent on the variability.
The current probability matrix says "What happens if teams maintain their current winning percentage going forward (adjusted for strength of schedule)?"
Using jfeath17's methodology, it would say "What happens if teams perform as well as the historical average team with similar KRACH ratings?"
Both are great questions with meaningful answers, but for forecasting, I would trust a matrix with jfeath17's adjustment slightly more than the current probability matrix (with the BIG assumption that the variation in results isn't ridiculous).
Well said. Some people are taking jfeath's work as gospel, when in fact all we got was a simple chart showing KRACH win% vs. actual for specific, arbitrary bands. No one double-checked the work, either.
Quote from: abmarksit would be interesting to run a KRACH computation for last year's full NHL regular season, for example, and then see if the numbers pass people's gut checks or not.
This is honestly an unnecessary exercise. For past results, it's hard to improve on KRACH. The KRACH ratings, if you played the schedule that already happened, would come out to the actual results. That's the whole point of KRACH's existence.
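For anyone who wants to poke at that property themselves, here's a minimal sketch of the Bradley-Terry fixed point that KRACH solves. It ignores ties and the real implementation's conventions (real KRACH treats a tie as half a win, and handles winless/undefeated teams specially), so treat it as illustrative only.

```python
# Bradley-Terry / KRACH fixed point (ties ignored): a team's rating K_t satisfies
#   K_t = wins_t / sum_over_opponents( games_vs_opp / (K_t + K_opp) )
# At the fixed point, the expected wins implied by the ratings match the actual wins.
from collections import defaultdict

def krach(results, iters=500):
    """results: list of (winner, loser) tuples."""
    wins, games = defaultdict(float), defaultdict(float)
    teams = set()
    for w, l in results:
        wins[w] += 1.0
        games[(w, l)] += 1.0
        games[(l, w)] += 1.0
        teams.update((w, l))
    k = {t: 1.0 for t in teams}
    for _ in range(iters):
        new = {}
        for t in teams:
            denom = sum(n / (k[t] + k[opp])
                        for (a, opp), n in games.items() if a == t)
            new[t] = wins[t] / denom if denom else k[t]
        k = new
    return k

# Toy schedule (every team needs at least one win and one loss, or its rating runs
# away to 0 or infinity -- a known property of an unregularized Bradley-Terry fit).
print(krach([("Cornell", "Union"), ("Cornell", "Harvard"),
             ("Harvard", "Union"), ("Union", "Cornell")]))
```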
Just for youse guys - I was able to work up NHL KRACH - for sh**s and giggles.
Rank Team Rating RRWP W-L-T Pct Ratio SOS
1 TBL 189.7 .6555 40-19-6 .6615 1.955 97.0
2 VEG 189.4 .6552 39-20-4 .6508 1.864 101.7
3 NSH 177.5 .6401 35-18-10 .6349 1.739 102.0
4 BOS 154.6 .6076 36-21-5 .6210 1.638 94.4
5 WPG 147.0 .5953 35-24-4 .5873 1.423 103.3
6 TOR 127.3 .5603 33-25-8 .5606 1.276 99.8
7 MIN 126.9 .5595 33-26-5 .5547 1.246 101.9
8 PIT 124.6 .5548 34-27-4 .5538 1.241 100.4
9 DAL 122.6 .5509 32-26-6 .5469 1.207 101.6
10 LAK 119.2 .5438 34-29-2 .5385 1.167 102.2
11 COL 118.5 .5424 33-28-2 .5397 1.172 101.1
12 PHI 118.2 .5418 32-25-7 .5547 1.246 94.9
13 WSH 117.4 .5402 33-27-4 .5469 1.207 97.3
14 SJS 111.0 .5261 31-27-7 .5308 1.131 98.1
15 STL 110.7 .5254 32-30-3 .5154 1.063 104.1
16 CGY 109.1 .5218 30-28-7 .5154 1.063 102.6
17 ANA 104.9 .5121 27-26-11 .5078 1.032 101.7
18 NJD 100.8 .5020 29-27-8 .5156 1.065 94.7
19 FLA 100.6 .5016 28-28-5 .5000 1.000 100.6
20 CBJ 92.8 .4812 26-28-10 .4844 .939 98.7
21 NYI 80.5 .4458 26-33-5 .4453 .803 100.2
22 NYR 78.3 .4389 25-32-7 .4453 .803 97.5
23 CAR 77.8 .4375 25-33-6 .4375 .778 100.1
24 CHI 75.3 .4295 26-36-2 .4219 .730 103.2
25 EDM 70.6 .4135 24-36-4 .4062 .684 103.1
26 VAN 68.4 .4061 24-37-3 .3984 .662 103.3
27 MTL 65.7 .3963 22-34-7 .4048 .680 96.6
28 DET 62.1 .3828 22-36-5 .3889 .636 97.6
29 OTT 59.1 .3711 19-35-8 .3710 .590 100.2
30 ARI 52.4 .3431 18-39-6 .3333 .500 104.8
31 BUF 46.8 .3177 20-43-1 .3203 .471 99.2
The range is obviously much more narrow, which would make odds much lower even for top teams.
Compare to current NCAA range ... 533 to 15
That really quantifies how much Buffalo sucks.
Quote from: adamwQuote from: abmarksit would be interesting to run a KRACH computation for last year's full NHL regular season, for example, and then see if the numbers pass people's gut checks or not.
This is honestly an unnecessary exercise. For past results, it's hard to improve on KRACH. The KRACH ratings, if you played the schedule that already happened, would come out to the actual results. That's the whole point of KRACH's existence.
How about this exercise. Number all the NC$$ games in chron order. Calculate KRACH from the odd number games. Now compare how the even numbered games turned out against KRACH "predictions."
I know, I know, I know that KRACH reviews a data set and is not designed to be predictive. But... can you do that for shits and giggles? I'm not even sure how we'd interpret the results. What constitutes a reliable or unreliable percentage of accuracy? I mean, hopefully it's over 50%. Hopefully it's better than just taking whoever has the better winning percentage excluding games against each other.
Another method: start on game 1 and just march through the list constantly recalculating KRACH and using that as the prediction against the next game. Or since obviously KRACH gets better as the season goes on, iterate through say the first 10% of games and only start predicting after that. Now get your accuracy score. That's truly abusing KRACH as predictive. :)
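A sketch of that second method, leaning on the toy krach() function from a few posts up (so, same caveats), and assuming the games come as chronologically ordered (winner, loser) pairs:

```python
# Walk-forward "abuse" of KRACH as a predictor: after a burn-in, refit KRACH on every
# game played so far and call the higher-rated team the predicted winner of the next game.
def walk_forward_accuracy(games, burn_in_frac=0.10):
    """games: chronologically ordered list of (winner, loser) tuples."""
    burn_in = max(1, int(len(games) * burn_in_frac))
    correct = total = 0
    for i in range(burn_in, len(games)):
        ratings = krach(games[:i])          # the toy krach() sketched earlier
        winner, loser = games[i]
        if winner in ratings and loser in ratings:
            total += 1
            correct += ratings[winner] >= ratings[loser]
    return correct / total if total else float("nan")
```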
Quote from: adamwIs KRACH not empirically based?
The future win probabilities inferred from KRACH aren't, because they're not verifiable by observation/experience.
Quote from: adamwIt is not certain that looking at things beyond wins and losses is any better. Goal differential has major flaws, and might not mean much. Shot differential has its own issues, but could be a decent factor. Honestly, I'm not all that interested in things like goal and shot differential.
I think in the hockey analytics world it actually is pretty certain that looking at things beyond wins and losses is better. (http://grantland.com/the-triangle/the-nhls-analytics-awakening/)
I think it's helpful to think of it this way: only looking at game outcomes is a very small sample size. Goals, of which there are several in the average game, is a bigger sample. Shots is the biggest sample of all. And, in fact, shot differential, in serving as the best proxy we have for possession, does a tremendous job (https://deadspin.com/this-wonderful-graphic-proves-that-in-the-nhl-puck-pos-470045959) of measuring the strength of a team and thereby predicting the outcome of a hockey game.
This discussion often comes up on here when Cornell has a better record than its possession numbers would suggest and half of ELynah thinks the team will regress and the other half thinks they won't. Which necessarily leads to a discussion of whether possession is the be-all-end-all in college like it is in the pros. I won't rehash the arguments here, but shot differential is still so heavily correlated with wins in college that I highly doubt there exists a more predictive stat over the course of a regular season than shot differential.
Quote from: abmarksWell said. Some people are taking jfeath's work as gospel, when in fact all we got was a simple chart showing krach win% v actual for specific, arbitrary bands. Noone double-checked the work, either.
jfeath's work is just a very preliminary exercise that confirms what many of us suspected when we looked at some of the numbers these prediction models were pumping out. It also comports with NHL betting odds. (https://www.oddsshark.com/nhl/odds) NHL betting odds almost never give a team less than a 1-in-3 chance of winning. Yet a few weeks ago the KRACH-based model was giving Cornell an 80% chance of beating Union and Harvard!
Quote from: TrotskyQuote from: adamwQuote from: abmarksit would be interesting to run a KRACH computation for last year's full NHL regular season, for example, and then see if the numbers pass people's gut checks or not.
This is honestly an unnecessary exercise. For past results, it's hard to improve on KRACH. The KRACH ratings, if you played the schedule that already happened, would come out to the actual results. That's the whole point of KRACH's existence.
How about this exercise. Number all the NC$$ games in chron order. Calculate KRACH from the odd number games. Now compare how the even numbered games turned out against KRACH "predictions."
I know, I know, I know that KRACH reviews a data set and is not designed to be predictive. But... can you do that for shits and giggles? I'm not even sure how we'd interpret the results. What constitutes a reliable or unreliable percentage of accuracy? I mean, hopefully it's over 50%. Hopefully it's better than just taking whoever has the better winning percentage excluding games against each other.
Another method: start on game 1 and just march through the list constantly recalculating KRACH and using that as the prediction against the next game. Or since obviously KRACH gets better as the season goes on, iterate through say the first 10% of games and only start predicting after that. Now get your accuracy score. That's truly abusing KRACH as predictive. :)
I think your second method is essentially what jfeath did, right?
Sometimes I think it's pronounced krock.
Quote from: BearLoverQuote from: adamwIs KRACH not empirically based?
The future win probabilities inferred from KRACH aren't, because they're not verifiable by observation/experience.
What future probabilities of any kind are verifiable?
Quote from: BearLoverQuote from: adamwIt is not certain that looking at things beyond wins and losses is any better. Goal differential has major flaws, and might not mean much. Shot differential has its own issues, but could be a decent factor. Honestly, I'm not all that interested in things like goal and shot differential.
I think in the hockey analytics world it actually is pretty certain that looking at things beyond wins and losses is better. (http://grantland.com/the-triangle/the-nhls-analytics-awakening/)
There is no need to quote me articles about analytics. I deal with NHL analytics all day for my "real" job. The analytics community has also, finally, thank goodness, moved beyond its original rudimentary hypotheses about how hockey works. Shot differential as a proxy for possession was a nice tool in the toolbelt, but had/has a long way to go to create real understanding. There is shot quality, location data, rolling score effects, etc... finally being taken into consideration, and of course there is a lot in hockey that simply can't be measured yet. So while shot differential displayed some correlation to better predicting wins/losses than past wins and losses, it's really very rudimentary and there's plenty more to do.
However, you glossed over the fact that I said "goal differential" first. I don't know of any model that takes into account shot differential in ranking systems. Goal differential is another thing. There have been plenty of them that do. And that debate has gone on forever. My point was that goal differential has numerous flaws when it comes to hockey team ratings, which is why it's perfectly valid to ignore it when it comes to ratings systems, and probably predictive models. I also clearly said that shot differential could be a "decent factor" but has issues. So I'm not sure why the need to inform me that looking beyond wins and losses may be better. Well aware. No one ever said otherwise.
Quote from: BearLoverI think it's helpful to think of it this way: only looking at game outcomes is a very small sample size. Goals, of which there are several in the average game, is a bigger sample. Shots is the biggest sample of all. And, in fact, shot differential, in serving as the best proxy we have for possession, does a tremendous job (https://deadspin.com/this-wonderful-graphic-proves-that-in-the-nhl-puck-pos-470045959) of measuring the strength of a team and thereby predicting the outcome of a hockey game.
Please stop with the Analytic-splaining ... Believe me, we all understand about sample sizes. Again, goal differential in hockey is flawed. The greater sample size there is not necessarily an improvement. And I would not call shot differential metrics doing a "tremendous job" ... It does a better job. Not a tremendous job. There are more factors. But sure, on the team level, it holds some weight. If it can be incorporated into predictive models, then great. But it's not a panacea.
Quote from: BearLoverIt also comports with NHL betting odds. (https://www.oddsshark.com/nhl/odds) NHL betting odds almost never give a team a less than a 1-in-3 chance of winning. Yet a few weeks ago the KRACH-based model was giving Cornell an 80% chance of beating Union and Harvard!
There's nothing more flawed than quoting betting odds, which bear no resemblance to anything except where money goes. I'm pretty sure the NHL KRACH figures I posted earlier demonstrate that the variance between NHL teams is far smaller than college teams. So comparing betting odds of NHL games to college possibilities is silly. Of course betting odds are never that wide on NHL games.
Again - we all get that there are better ways to do things. But I don't understand all the bellyaching about it. Many of your solutions have plenty of issues themselves. Come up with a model, and lay out the math, have it reviewed for problems, and I'll be more than happy to put it together. I haven't seen anyone be willing to do that yet.
Quote from: adamwWhat future probabilities of any kind are verifiable?
What I mean to say is that the probabilities derived purely from KRACH are not back-tested against actual results, in the way 538's and a model like this one's (https://www.theglobeandmail.com/sports/hockey/2017-18-nhl-predictions-intro/article37609229/) are. The KRACH-based model instead takes what has happened in the past and extrapolates it into the future, and no one even checks to see how much it misses by.
Quote from: adamwThere is no need to quote me articles about analytics. I deal with NHL analytics all day for my "real" job. The analytics community has also, finally, thank goodness, moved beyond its original rudimentary hypotheses about how hockey works. Shot differential as a proxy for possession was a nice tool in the toolbelt, but had/has a long way to go to create real understanding. There is shot quality, location data, rolling score effects, etc... finally being taken into consideration, and of course there is a lot in hockey that simply can't be measured yet. So while shot differential displayed some correlation to better predicting wins/losses than past wins and losses, it's really very rudimentary and there's plenty more to do.
I agree with all of this. But just because shots as a statistic is rudimentary doesn't mean it isn't the best tool we have right now.
Quote from: adamwHowever, you glossed over the fact that I said "goal differential" first. I don't know of any model that takes into account shot differential in ranking systems. Goal differential is another thing. There have been plenty of them that do. And that debate has gone on forever. My point was that goal differential has numerous flaws when it comes to hockey team ratings, which is why it's perfectly valid to ignore it when it comes to ratings systems, and probably predictive models. I also clearly said that shot differential could be a "decent factor" but has issues. So I'm not sure why the need to inform me that looking beyond wins and losses may be better. Well aware. No one ever said otherwise.
Again, you're conflating "is rudimentary/has its own sets of issues" with "is worse." Goals and shots aren't perfect, but they're better predictors than wins.
Quote from: adamwPlease stop with the Analytic-splaining ...
Sorry about the analytic-splaining, but I can't recall an analytics study/article in the past six or so years I've been following this that concluded wins/losses is a better metric of future success than goals, and I don't know if I even recall an article/study that concluded goals were a better metric than shots. In fact, it seems every article/study leads with the assumption (http://www.sloansportsconference.com/wp-content/uploads/2012/02/NHL-Expected-Goals-Brian-Macdonald.pdf) that Corsi/Fenwick is the best predictor we currently have, and goes from there. So you writing that you are not interested in a model that looks at goals/shots rather than wins suggested to me you aren't as familiar with current prediction models. I was wrong about your lack of familiarity, so you are welcome to post things that would back up your disdain for shot differential as a predictive stat relative to win% and goal-differential.
Quote from: adamwThere's nothing more flawed than quoting betting odds, which bear no resemblance to anything except where money goes.
I posted betting odds because I wasn't aware of an actual NHL prediction model that gave probabilities for individual games. I've since found one (https://www.theglobeandmail.com/sports/hockey/nhl-predictions-2017-2018/article37590570/), and it turns out I was wrong about the upper bounds of hockey probabilities: while the majority of its predictions are closer to 50% than those of KRACH, this model yields probabilities for certain games that are as high as 80%. The model seems to care less about a team's entire body of work and instead weights factors such as recent performance and starting goalie quality very highly--though it doesn't appear to release its entire methodology. The model claims to be about 60% accurate, which is about as good as it gets for NHL prediction models. On the other hand, the fact that this model is deriving these relatively lopsided probabilities not from total record but from all these other stats doesn't really help the case for a KRACH model, which looks only at past win% (and in fact there are a lot of probabilities from this model that show a matchup between teams with similar records as lopsided).
Quote from: adamwI'm pretty sure the NHL KRACH figures I posted earlier demonstrate that the variance between NHL teams is far smaller than college teams.
Yes, this is true. But is the gap between Cornell and Union/Harvard as big as the largest gap between any two NHL teams?
Quote from: KGR11Quote from: TrotskyAnother method: start on game 1 and just march through the list constantly recalculating KRACH and using that as the prediction against the next game. Or since obviously KRACH gets better as the season goes on, iterate through say the first 10% of games and only start predicting after that. Now get your accuracy score. That's truly abusing KRACH as predictive. :)
I think your second method is essentially what jfeath did, right?
Yes, that is basically what I did except I stepped through week by week and started in January. I now have updated data which includes 4 seasons and steps through on a daily basis starting in January.
Quote from: BearLoverQuote from: adamwIs KRACH not empirically based?
The future win probabilities inferred from KRACH aren't, because they're not verifiable by observation/experience.
What future probabilities of any kind are verifiable?
While you can't perfectly verify a prediction model, you can get an idea of its performance by separating the past data into training and testing sets. The fact that we are trying to predict probabilities and not simple classification does make it much more difficult to evaluate the performance. For classification problems the predictor is either right or wrong, so it is easy to state an accuracy percentage. We, however, cannot directly observe the outcome probability of some matchup, only the outcome of one trial of that matchup. This brings us to what I am attempting to do. By looking at the outcomes of many games with a similar KRACH-predicted winning percentage, we can come up with an estimate for the actual winning percentage of a team in such a matchup.
My methodology for this was to use a Gaussian weighted average of the games centered at varying KRACH probabilities. I calculated the winning percentage using these weights. I also used the weights to come up with an average KRACH probability (this doesn't necessarily line up with the center of the Gaussian, particularly at the endpoints where all the games are to one side or the other). This is basically the logical extrapolation of the binning that was suggested earlier in the thread. The binning was actually the first analysis I did, but the data didn't look good. I think the major improvement here is not that I am using the Gaussian to come up with weights (that is probably overkill), but that I am using the weighted average of the KRACH probabilities rather than the center of the bin. What this trend line looks like can be changed significantly by changing the standard deviation of the Gaussian (effectively changing the bin size). Basically, the larger the bin the more underfit, and the smaller the more overfit.
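If it helps make the procedure concrete, here's roughly how I read it as code. The sigma value, the grid of centers, and the input layout are my guesses, not jfeath17's actual parameters.

```python
# Gaussian-weighted calibration curve: for each center c, weight every game by how close
# its KRACH probability is to c, then compare the weighted actual win rate against the
# weighted mean KRACH probability (rather than against the bin center).
import numpy as np

def gaussian_calibration(krach_probs, outcomes,
                         centers=np.linspace(0.05, 0.95, 19), sigma=0.1):
    """krach_probs: KRACH-implied win probability for one side of each game.
       outcomes:    1 if that side won, 0 if it lost."""
    p = np.asarray(krach_probs, dtype=float)
    y = np.asarray(outcomes, dtype=float)
    curve = []
    for c in centers:
        w = np.exp(-(p - c) ** 2 / (2 * sigma ** 2))
        curve.append((np.average(p, weights=w),    # weighted mean KRACH probability
                      np.average(y, weights=w)))   # weighted actual winning percentage
    return curve
```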
I also sought to measure the performance of KRACH in another way by looking at the R^2 (Coefficient of Determination). The Wikipedia page is a pretty good explanation of this. The R^2 value can be looked at as the percentage of variance in the dependent variable (game outcomes) that can be explained by the independent variable (KRACH probability, etc.). These values are all very low, which makes sense since there is a lot of variability in the outcome of hockey games.
Independent Variable | R^2
--------------------------------------------------
Krach Probability | 0.023
Logistic Regression | 0.100
Linear Fit on Gaussian Average (0.1) | 0.112
(y=.749x+.126)
Another improvement I made was to include the inverse of each game (prob = 1-prob and swap wins and losses). This improves the fit around 0.5 since it is a little nonsensical if the matchup of two equal KRACH teams is not 0.5 in a model only dependent on KRACH.
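Two small pieces of that, sketched out with made-up variable names: the mirror-image augmentation and an R^2 computed directly against the 0/1 game outcomes.

```python
# Mirror each game so the data set is symmetric around 0.5, then score a set of
# predicted probabilities against the 0/1 outcomes with a coefficient of determination.
import numpy as np

def mirror(probs, outcomes):
    p, y = np.asarray(probs, float), np.asarray(outcomes, float)
    return np.concatenate([p, 1 - p]), np.concatenate([y, 1 - y])

def r_squared(outcomes, predicted_probs):
    y, p = np.asarray(outcomes, float), np.asarray(predicted_probs, float)
    ss_res = np.sum((y - p) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot
```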
One final point which I think has been established at this point, but I want to make sure we are all on the same page: predictive models are going to have some subjectivity built into them. It is great that KRACH has no subjectivity and is a mathematically pure ranking for its goal of NCAA seeding, since that needs to be "fair" and should be based on the actual outcomes of the games. However, when creating a predictive model, we unfortunately do not have the luxury of a mathematically pure system. There are parameters and methods that must be chosen both when designing a predictor and when measuring its performance. It is the designer's goal to choose these such that the predictor is not over/underfit and does not have any biases built in.
Quote from: BearLoverAgain, you're conflating "is rudimentary/has its own sets of issues" with "is worse." Goals and shots aren't perfect, but they're better predictors than wins.
Actually, I think you're the only one conflating anything, because I never said "worse" - so I'm not sure where this is coming from.
This thread is very enjoyable to me, and I want to hear all of these things. And I'm pretty sure I've been clear that KRACH can be improved upon. And I agree with what you say, in general. What has rubbed me the wrong way with your posts is your unnecessary (to me) vehemence against KRACH, the twisting of what I've said, and the high level of self-confidence in what you're saying. A little humility is helpful here because no one really knows.
Quote from: BearLoverSorry about the analytic-splaining, but I can't recall an analytics study/article in the past six or so years I've been following this that concluded wins/losses is a better metric of future success than goals, and I don't know if I even recall an article/study that concluded goals were a better metric than shots. In fact, it seems every article/study leads with the assumption (http://www.sloansportsconference.com/wp-content/uploads/2012/02/NHL-Expected-Goals-Brian-Macdonald.pdf) that Corsi/Fenwick is the best predictor we currently have, and goes from there. So you writing that you are not interested in a model that looks at goals/shots rather than wins suggested to me you aren't as familiar with current prediction models. I was wrong about your lack of familiarity, so you are welcome to post things that would back up your disdain for shot differential as a predictive stat relative to win% and goal-differential.
The problem is you continuing to insist I said things that I never said. You are claiming that I am against shots/goals differential, when I never said anything of the sort. I just said it has its own issues. Particularly goal differential. Is a 6-2 win more indicative of quality than 6-4? Not necessarily in my opinion, and I think studies of such are flawed. Of course, there's empty-net goals to take into account, which is unique to hockey. Of course, those can be weeded out.
The older analytics studies have not just been improved upon - in some cases, they have been downright contradicted. There is too much going on in hockey to be so confident in what these things are telling us so far. I've had issues with these studies from day one. And thankfully now they are being improved upon and we are finding out different things. Heck, Bill James spent most of his career telling everyone that clutch didn't exist, and now he has spent the last year telling everyone that we should take into account situational hitting when it comes to voting for the MVP Award.
So again, all I'm saying is, there is room for all of this in the discussion - but the insistence that you "know" "for a fact" that you are right about this, and that KRACH is just a load of garbage, is what rubs me the wrong way.
Quote from: BearLoverI posted betting odds because I wasn't aware of an actual NHL prediction model that gave probabilities for individual games. I've since found one (https://www.theglobeandmail.com/sports/hockey/nhl-predictions-2017-2018/article37590570/), and it turns out I was wrong about the upper bounds of hockey probabilities
My site, hockey-reference.com, publishes Win Probabilities every day. As I mentioned upthread, it's based upon SRS, which is similar to KRACH, but takes into account goal differential.
Quote from: BearLoverYes, this is true. But is the gap between Cornell and Union/Harvard as big as the largest gap between any two NHL teams?
Doubtful, but I don't think anyone here said it was. We also, however, don't know that, either way.
Quote from: jfeath17One final point which I think has been established at this point, but I want to make sure we are all on the same page. Predictive models are going to have some subjectivity built into them. It is great that KRACH has no subjectivity and is a mathematically pure ranking for its goal of NCAA seeding since that needs to be "fair" and should be based on the actual outcomes of the games. However when creating a predictive model, we unfortunately do not have the luxury of a mathematically pure system. There are parameters and methods that must be chosen both when designing a predictor and measuring the performance. It is the designers goal to choose these such that the predictor is not over/underfit or have any bias's built in.
Absolutely. That is why I think it's a mistake to be so vehement about some better model, and so dismissive of what the current model is doing. There is something better, I'm sure, but exactly what is debatable.
BTW - I can't read your charts, so I have no idea what they're telling me. If there's an English translation, feel free.
Quote from: jfeath17While you can't perfectly verify a prediction model, you can get an idea of its performance by separating the past data into training and testing sets. The fact that we are trying to predict probabilities and not simple classification does make it much more difficult to evaluate the performance. For classification problems the predictor is either right or wrong so it is easy to state a accuracy percentage. We however cannot directly observe the outcome probability of some matchup but only outcome of one trial of this matchup. This brings us to what I am attempting to do. By looking at the outcomes of many games with a similar krach predicted winning percentage we can come up with an estimate for the actual winning percentage of a team in this matchup.
My methodology for this was to use a gaussian weighted average of the games centered at varying krach probabilities. I calculated the winning percentage using these weights. I also used the weights to come up with average krach probability (this doesn't necessarily line up with the center of the gaussian particularly at the endpoints where all the games are to one side or the other). This is basically the logical extrapolation of the binning that was suggested earlier in the thread. The binning was actually the first analysis I did but the data didn't look good. I think the major improvement here is not that I am using the gaussian to come up with weights (that is probably overkill), but that I am using the weighted average of the krach probabilities rather than the center of the bin. What this trend line looks like can be changed significantly by changing the std dev of the gaussian (effectively changing the bin size). Basically the larger the bin the more underfit and the smaller the more overfit.
I also sought to measure the performance of KRACH in another way by looking at the R^2 (Coefficient of Determination). The Wikipedia page is a pretty good explanation of this. The R^2 value can be looked at as the percentage of variance in the dependent variable (game outcomes) that can be explained by the independent variable (KRACH probability, etc.). These values are all very low, which makes sense since there is a lot of variability in the outcome of hockey games.
Independent Variable | R^2
--------------------------------------------------
Krach Probability | 0.023
Logistic Regression | 0.100
Linear Fit on Gaussian Average (0.1) | 0.112
(y=.749x+.126)
Another improvement I made was to include the inverse of each game (prob = 1-prob and swap wins and losses). This improves the fit around 0.5 since it is a little nonsensical if the matchup of two equal KRACH teams is not 0.5 in a model only dependent on KRACH.
One final point which I think has been established at this point, but I want to make sure we are all on the same page. Predictive models are going to have some subjectivity built into them. It is great that KRACH has no subjectivity and is a mathematically pure ranking for its goal of NCAA seeding since that needs to be "fair" and should be based on the actual outcomes of the games. However when creating a predictive model, we unfortunately do not have the luxury of a mathematically pure system. There are parameters and methods that must be chosen both when designing a predictor and measuring the performance. It is the designers goal to choose these such that the predictor is not over/underfit or have any bias's built in.
Many thanks to jfeath for this awesome work. One thing I'd be curious to know is how the R-squared for the KRACH, Logistic Regression, and Linear Fit on Gaussian Average varies from year to year. This could give an indication of the extent (if any) to which KRACH is a lesser predictor than your outputs.
Quote from: adamwBTW - I can't read your charts, so I have no idea what they're telling me. If there's an English translation, feel free.
My one sentence summary is that the current KRACH probabilities are biased towards the higher ranked team, a very simple modification which would greatly increase the accuracy is to change the formula to P(A Winning) = .749*(KRACH_A/(KRACH_A+KRACH_B)) + .126
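In code form, that one-line fix looks like this; the constants come straight from the linear fit quoted above, and the function name is just mine.

```python
def adjusted_win_prob(krach_a, krach_b, slope=0.749, intercept=0.126):
    raw = krach_a / (krach_a + krach_b)   # plain KRACH-implied probability
    return slope * raw + intercept

# Example: a 500-rated team against a 100-rated team.
# raw = 0.833, adjusted = 0.749 * 0.833 + 0.126 = 0.750 (instead of 0.833).
```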
Quote from: KGR11Many thanks to jfeath for this awesome work. One thing I'd be curious to know is how the R-squared for the KRACH, Logistic Regression, and Linear Fit on Gaussian Average varies from year to year. This could give an indication of the extent (if any) to which KRACH is a lesser predictor than your outputs.
I can try this out just as a sanity check to make sure the models aren't overfitting. I may even do it on a completely new season to be sure.
Quote from: jfeath17Quote from: adamwBTW - I can't read your charts, so I have no idea what they're telling me. If there's an English translation, feel free.
My one sentence summary is that the current KRACH probabilities are biased towards the higher ranked team, a very simple modification which would greatly increase the accuracy is to change the formula to P(A Winning) = .749*(KRACH_A/(KRACH_A+KRACH_B)) + .126
Wait, that's English? :) ... If you want to work with me on something going forward, feel free to drop me a line. adamw@collegehockeynews.com (same for anyone else who has chimed in here with something concrete to offer)
Quote from: adamwQuote from: jfeath17Quote from: adamwBTW - I can't read your charts, so I have no idea what they're telling me. If there's an English translation, feel free.
My one sentence summary is that the current KRACH probabilities are biased towards the higher ranked team, a very simple modification which would greatly increase the accuracy is to change the formula to P(A Winning) = .749*(KRACH_A/(KRACH_A+KRACH_B)) + .126
Wait, that's English? :) ... If you want to work with me on something going forward, feel free to drop me a line. adamw@collegehockeynews.com (same for anyone else who has chimed in here with something concrete to offer)
Basically, what's being suggested here is to use KRACH for 75% of the probability and split the other 25% equally. I was actually thinking of suggesting a lesser dampening effect, such as using 90% KRACH and then adding 5% for each team, just based upon the feeling that even the weakest team should have at least a 5% chance - but that's just based upon gut feel, no data analysis.
Nevertheless, I do appreciate and enjoy the KRACH model, and do believe it's the fairest way of ranking teams.
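Both readings fit the same general form; here's the knob made explicit (0.75 roughly matches the fitted adjustment quoted earlier, 0.90 is the gut-feel version, and the function name is just for illustration):

```python
# "Dampened KRACH": lam of the probability comes from KRACH, the remaining (1 - lam)
# is split evenly between the two teams. lam = 0.75 approximates the fitted adjustment
# above (0.749x + 0.126); lam = 0.90 gives each team a floor of about 5%.
def dampened_win_prob(krach_a, krach_b, lam=0.75):
    raw = krach_a / (krach_a + krach_b)
    return lam * raw + (1 - lam) / 2
```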
Quote from: adamwWhat future probabilities of any kind are verifiable?
Well, if we have a fair coin and flip it repeatedly, we can calculate the probabilities before they happen.
This hinges on the coin being "fair." We can decide this by measuring its dimensions, the smoothness of its surfaces, and the uniformity of its density and metallurgic composition.
Of course, none of these measurements can be without some error. But if the errors are not biased high or low, this shouldn't matter.
We believe dimensions, smoothness of surface, and uniformity of density and metallurgy are important because we understand gravity, the mechanics of flat bodies, etc. We understand such things because they've been studied in a wide variety of contexts, not only coin flipping, and these understandings have been corroborated by experimental results. Moreover, laboratory experiments are "closed" in the sense that researchers not only control the conditions for observation but can also intervene actively in experiments to create desired conditions (e.g. throw coins with varying strength).
In contrast, there are two problems with hockey games. 1) We do not have a credible theory of hockey that explains why teams win or lose. 2) Games occur in "open" settings, where what goes on is not under experimental control and is contingent on things that may themselves influence the outcome.
Quote from: jkahnQuote from: adamwQuote from: jfeath17Quote from: adamwBTW - I can't read your charts, so I have no idea what they're telling me. If there's an English translation, feel free.
My one sentence summary is that the current KRACH probabilities are biased towards the higher ranked team, a very simple modification which would greatly increase the accuracy is to change the formula to P(A Winning) = .749*(KRACH_A/(KRACH_A+KRACH_B)) + .126
Wait, that's English? :) ... If you want to work with me on something going forward, feel free to drop me a line. adamw@collegehockeynews.com (same for anyone else who has chimed in here with something concrete to offer)
Basically, what's being suggested here is to use KRACH for 75% of the probability and split the other 25% equally. I was actual thinking of suggesting a lesser dampening effect, such as using 90% KRACH and then adding that to 5% for each team, just based upon the feeling that even the weakest team should have at least a 5% chance - but that's just based upon gut feel, no data analysis.
Nevertheless, I do appreciate and enjoy the KRACH model, and do believe it's the fairest way of ranking teams.
You are implying that there was a subjective choice of dampening effect?
Isn't the suggested formula there because it's the equation that defines the result of the logistic regression between KRACH and actual win% (when KRACH is grouped into buckets)?
Quote from: abmarksQuote from: jkahnQuote from: adamwQuote from: jfeath17Quote from: adamwBTW - I can't read your charts, so I have no idea what they're telling me. If there's an English translation, feel free.
My one sentence summary is that the current KRACH probabilities are biased towards the higher ranked team, a very simple modification which would greatly increase the accuracy is to change the formula to P(A Winning) = .749*(KRACH_A/(KRACH_A+KRACH_B)) + .126
Wait, that's English? :) ... If you want to work with me on something going forward, feel free to drop me a line. adamw@collegehockeynews.com (same for anyone else who has chimed in here with something concrete to offer)
Basically, what's being suggested here is to use KRACH for 75% of the probability and split the other 25% equally. I was actual thinking of suggesting a lesser dampening effect, such as using 90% KRACH and then adding that to 5% for each team, just based upon the feeling that even the weakest team should have at least a 5% chance - but that's just based upon gut feel, no data analysis.
Nevertheless, I do appreciate and enjoy the KRACH model, and do believe it's the fairest way of ranking teams.
You are implying that there was a subjective choice of dampening effect?
Isn't the suggested formula there because it's the equation that defines the result of the logistic regression between KRACH and actual win% (when KRACH is grouped into buckets)?
No, all I'm saying is that my gut feel is subjective.
Quote from: jkahnQuote from: adamwQuote from: jfeath17Quote from: adamwBTW - I can't read your charts, so I have no idea what they're telling me. If there's an English translation, feel free.
My one sentence summary is that the current KRACH probabilities are biased towards the higher ranked team, a very simple modification which would greatly increase the accuracy is to change the formula to P(A Winning) = .749*(KRACH_A/(KRACH_A+KRACH_B)) + .126
Wait, that's English? :) ... If you want to work with me on something going forward, feel free to drop me a line. adamw@collegehockeynews.com (same for anyone else who has chimed in here with something concrete to offer)
Basically, what's being suggested here is to use KRACH for 75% of the probability and split the other 25% equally. I was actual thinking of suggesting a lesser dampening effect, such as using 90% KRACH and then adding that to 5% for each team, just based upon the feeling that even the weakest team should have at least a 5% chance - but that's just based upon gut feel, no data analysis.
Nevertheless, I do appreciate and enjoy the KRACH model, and do believe it's the fairest way of ranking teams.
Yes, this exactly. A much better simplification than mine. :)
Quote from: adamwActually, I think you're the only one conflating anything, because I never said "worse" - so I'm not sure where this is coming from.
Quote from: adamwIt is not certain that looking at things beyond wins and losses is any better. Goal differential has major flaws, and might not mean much. Shot differential has its own issues, but could be a decent factor. Honestly, I'm not all that interested in things like goal and shot differential.
So you wrote an entire post taking issue with how I put words in your mouth that you said shot differential/goal differential is a worse predictor than wins/losses, when in actuality you were merely saying both are equally flawed? Then just replace "worse" in my post with "just as bad" and my points still stand.
Quote from: adamwWhat has rubbed me the wrong way with your posts is your unnecessary (to me) vehemence against KRACH, the twisting of what I've said, and the high-level of self-confidence in what you're saying. A little humility is helpful here because no one really knows.
Oh I'm fully aware I don't really know anything. I don't have a background in statistics or mathematical modeling (or swimsuit modeling). I just watch a lot of hockey and read a lot about hockey, and nothing I've seen or read suggests a predictive model based entirely on win % over a 30-game season is going to be very accurate. You/CHN have done a convincing job arguing that KRACH is built to be (almost?) the best model for measuring past success. It isn't built to predict the future, though.
Quote from: SwampyQuote from: abmarksQuote from: SwampySome things I'd want to add to the discussion:
2. Exactly how does variance play out in these methods. If Team A plays Team B, does the P[Team A or Team B wins] = 1.0? Suppose Team A has P[winning] = 0.6, and Team B has 0.4, but Team A is erratic (I'm looking at you Clarkson), while Team B is not. Does A's greater variance show up in the prediction?
Let's say we know that in the long run, A beats B 75% of the time. So, over 100 games, A wins 75.
What the P(A winning) does NOT tell you is which of those 100 games A wins. A could go 0-10, then 75-5, then 0-10 over the course of those 100.
Taking that back to the topic at hand, short term results (ie the 1 game result in a tournament) are going to vary a lot vs. the long-term percentage.
I understand this but was talking about variance in several other senses. I'll explain them here. WARNING: THE FOLLOWING IS QUITE WONKISH.
Assumptions
Assume two teams, Team C and Team H, belong to a 12-team league in which teams play each other twice during the season. So each team plays 22 league games. Also assume teams earn 0 points in the league standings for a loss, 1 for a tie, and 2 for a win.
Estimation Variance
For the moment, ignore ties. Any data-based estimate of a team's chances of winning a game can be thought of as a function. If pC is the probability Team C wins a game, then let p̂C be the estimate of that probability. So that:
(1) p̂C = f(data)
In other words, the estimated probability is a function of whatever data are used in the estimate. When we say "data," this includes the number of data points (sample size) used to make the estimate.
Now, if we know the mathematical properties of f() we may be able to derive, mathematically, an expression for the variance of p̂C, var(p̂C). Call this the estimation variance, a measure of the estimate's precision.
If we do not know the estimating function's mathematical properties, we still may be able to estimate var(p̂C) using simulation and resampling techniques (https://www.wikiwand.com/en/Resampling_(statistics)).
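As a toy version of the resampling idea (names and numbers made up): resample a season's worth of win/loss outcomes with replacement and look at the spread of the re-estimated winning probability.

```python
# Toy bootstrap of the estimation variance: resample the observed win/loss outcomes
# with replacement and look at the variance of the re-estimated winning probability.
import numpy as np

def bootstrap_variance(outcomes, n_boot=10_000, seed=0):
    """outcomes: array of 1s (wins) and 0s (losses) for the team in question."""
    rng = np.random.default_rng(seed)
    outcomes = np.asarray(outcomes)
    estimates = [rng.choice(outcomes, size=outcomes.size, replace=True).mean()
                 for _ in range(n_boot)]
    return np.var(estimates)

# e.g. a 22-game season with 16 wins:
# bootstrap_variance([1]*16 + [0]*6)  ->  roughly p(1-p)/n = 0.727*0.273/22, about 0.009
```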
Game and Game-Series Estimates
Think of a single game as an experiment with two possible outcomes: "success" and "failure." For simplicity, assume we actually know the real probability of each, so we don't have to use estimates like (1). To think about this, just consider Team C for now.
Let:
p = probability Team C wins
q = probability Team C loses = 1 - p
Furthermore, to convert the results into a number, define a random variable, X = 1 for a win and 0 for a loss. This is well known as a Bernoulli Trial (https://www.wikiwand.com/en/Bernoulli_trial), and X has a Bernoulli Distribution (https://www.wikiwand.com/en/Bernoulli_distribution). The variance of X is given by:
(2) var(X) = pq
In the present context, call this "game variance" since it is the variance related to the outcome of a single game.
We can also think of a "series variance", which is the variance associated with a team winning a series. To simplify the math, let's disregard the fact that some series end after a team has won the majority of games in the series (e.g., 2 out of 3), and just think of the number of wins in a series. Define a second random variable, Yn as the number of wins in a series of n games. If each game has the same probabilities of its outcome, then Yn is the sum of n X's. In other words, it has a binomial distribution (https://www.wikiwand.com/en/Binomial_distribution), the variance of which equals:
(3) var(Yn) = np(1-p)
In both (2) and (3) the variance depends on the value of p. If p = 0, the variance is 0, and similarly for p = 1. The variance is at its maximum when p = 0.5: 0.25 for a single game, and n/4 for a series of n games.
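A quick simulation check of (2) and (3), since they carry the argument (the p and n values here are arbitrary):

```python
# Simulate a 3-game "series" many times and compare the sample variance of wins
# against the binomial formula n*p*(1-p) from (3).
import numpy as np

rng = np.random.default_rng(0)
p, n = 0.6, 3
wins = rng.binomial(n, p, size=200_000)
print(wins.var(), n * p * (1 - p))   # both come out around 0.72
```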
It's important to note here that the variance depends on the underlying, real probabilities and is not a matter of estimation.
Comments on jfeath17's chart
- The chart shows a relation between the Krach and actual game outcomes. Because of the properties of Bernoulli and Binomial distributions, the variance necessarily decreases as p moves away from 0.5 and closer to 1.0. So we would expect better predictions to the right of the graph. But the graph is almost a straight line up to about p = 0.85 and then drops off slightly. Maybe this is due to a weakness in the Krach, which is not intended to predict outcomes. Or maybe that's why they play the game.
- The chart would be improved with confidence bands (https://www.wikiwand.com/en/Confidence_and_prediction_bands), which are sensitive to variance and graphically show how confident one should be about the fitted line.
Notice, though, that confidence intervals plotted around a curve like this, which is based on probabilities that are themselves estimated from other data (as in Equation 1), have two sources of variance: estimation variance and game variance.
Performance Variance
In addition to the above, we should consider the variance of a given team's performance. Some teams are reliable; others are erratic. This can be best explained with an example.
Suppose every one of the 10 "other" teams always scores exactly 3 goals in every game. Then if O is the number of goals one of these "other" teams scores, the expected number of goals is 3 (E[O] = 3), and the variance is zero (var[O] = 0).
Similarly, assume Team C always scores 4 goals when it plays. Then if C is a random variable equal to the number of goals Team C scores, E[C] = 4 and var[C] = 0.
We can see right away that over the season Team C will always win over the ten "other" teams, so just from playing them it will accumulate 40 points (10 teams, 2 games per team, 2 points per win).
But now consider Team H, which is more erratic. Let H be the number of goals it scores in any given game. Like Team C let Team H's expected number of goals be 4: E[H] = 4. But unlike Team C, var[H] will not be zero.
Instead, suppose H has the following probability mass distribution: P[H = 2] = 0.10, P[H = 3] = 0.15, P[H = 4] = 0.50, P[H = 5] = 0.15, and P[H = 6] = 0.10. So here we can see different results when Team H plays its 20 games against the 10 "other" teams: the expected number of losses is 2, the expected number of ties is 3, and the expected number of wins is 15. So when Team H plays the other teams, the expected number of points is only 33, unlike Team C's 40!
What about when Team C and Team H play each other? Even though both have the same expected number of goals, the variance of Team H means it will be expected to lose to Team C 25% of the time, tie Team C 50% of the time, and beat Team C 25% of the time. In each of their 2 games against each other during the regular season, 2 points are at stake. So Team H can expect 0.5 points from a tie (1 point x 0.5 probability) and 0.5 points from a win (2 points x 0.25 probability), or 1 point in total. Similarly for Team C. With both teams playing each other twice during the season, each expects to get 2 points. This makes sense, because they're evenly matched.
But in terms of total points in the league, Team C expects to have 42 points at season's end, but Team H expects only 35 points. Which is how things should be, because Team H sucks.
Notice here that the only difference between the two teams is their respective variances, but it makes a big difference. If we look more closely at games against the 10 "other" teams, we are much more confident that Team C will beat them, whereas we expect Team H to lose to some of them. This is why performance variance is also important in thinking about which teams are likely to win particular games. Again, here there's no estimation issue. We know what the probabilities really are, yet variance affects the outcome.
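The example enumerates cleanly, for anyone who wants to check the 42 vs. 35 totals (variable names are mine):

```python
# Enumerate the Team C / Team H example: expected league points from the goal
# distributions alone (2 points for a win, 1 for a tie, 0 for a loss).
pmf_H = {2: 0.10, 3: 0.15, 4: 0.50, 5: 0.15, 6: 0.10}  # erratic Team H
pmf_C = {4: 1.00}                                        # Team C always scores 4
pmf_O = {3: 1.00}                                        # each "other" team always scores 3

def expected_points(pmf_us, pmf_them, n_games):
    pts = sum(p_us * p_them * (2 if g_us > g_them else 1 if g_us == g_them else 0)
              for g_us, p_us in pmf_us.items()
              for g_them, p_them in pmf_them.items())
    return pts * n_games

print(expected_points(pmf_C, pmf_O, 20) + expected_points(pmf_C, pmf_H, 2))  # 42.0
print(expected_points(pmf_H, pmf_O, 20) + expected_points(pmf_H, pmf_C, 2))  # 35.0
```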
Technical Suggestion
Jfeath17 asked for suggestions regarding the graphical analysis. For this kind of work I highly recommend the R Project's (https://www.r-project.org/) free, open-source statistical software used in conjunction with the RStudio (https://www.rstudio.com/) GUI interface. It would allow easy addition of things like confidence bands in the probability plots, weighting of recent time-series data, etc.
So I think I missed the depths of this discussion when it happened, although I did comment on another thread. But I was wondering if either of you happen to have this collected in a slightly more organized form than a forum post, like an RMarkdown document?
I think there's an obvious improvement on predictions, written up in https://arxiv.org/abs/2001.04226 , which starts with the posterior probability distribution for the Bradley-Terry ratings, rather than the maximum likelihood estimate, which is what KRACH is. But in the example looked at there, it doesn't change the probabilities very much.
It's a complex thing for sure.
You get the data over the course of a year and then create a system that could explain the results, but then the next year the teams have changed by 20-40% and you can throw many of the numbers out the window.
Then you have to decide how much back-to-back games matter, versus home ice, versus injuries.
Look at horse racing, with 100 years of stats and results measured, and people can't even pick winners at better than 30%.