Uncertainty in KRACH probability estimates (thread drift)

Posted by: jtwcornell91 (Moderator)

Date: February 26, 2020 08:13AM

jtwcornell91 wrote:

KGR11 wrote:
Here's an example of the issue with using KRACH to predict final pairwise:

Cornell (KRACH Rating: 526) is playing St Lawrence (KRACH Rating: 11) in a few weeks. My understanding is that the ratings can be used to come up with a pseudo-record between the two teams (Cornell with 526 wins, St Lawrence with 11). The Monte Carlo simulation uses this record to determine how often Cornell wins. In this case, the model predicts they win 98% of the time.
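For anyone who wants to check the arithmetic, here is a minimal sketch of the pseudo-record reading of KRACH described above. The function name `krach_win_prob` is made up for illustration, and ties are ignored.

```python
def krach_win_prob(rating_a: float, rating_b: float) -> float:
    """Bradley-Terry probability that team A beats team B, ignoring ties.

    Treats the two ratings as a pseudo-record: A has rating_a "wins"
    and B has rating_b "wins" out of rating_a + rating_b games.
    """
    return rating_a / (rating_a + rating_b)


# Ratings quoted in the post: Cornell 526, St Lawrence 11.
p_cornell = krach_win_prob(526, 11)
print(f"Cornell over St Lawrence: {p_cornell:.1%}")
```

This only reproduces the single-game probability; the Monte Carlo simulation of the rest of the season is a separate step.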

jfeath's regression analysis from 2 years ago shows that a team with a KRACH winning percentage of 100% theoretically wins about 83.4% of the time, about 14.5 percentage points lower than what KRACH gives for the Cornell-St Lawrence game.

I think it makes sense for 83.4% to be an upper bound on winning percentage. Any goalie can have an incredible/incredibly bad day. Also, the fact that there are ties in hockey means that the winning percentage should be more weighted to 50% than a sport where you can't have ties.

Ideally, jfeath's regression analysis would be an in-between step in the Pairwise probability matrix to convert the KRACH winning percentages (which show what happened to date) to predictive winning percentages.

Even if the Bradley-Terry model is "correct" (whatever that means) there are two potential problems with using KRACH to predict the outcome of a mismatch:

One, KRACH is a maximum-likelihood estimate of a team's Bradley-Terry strength, whereas any estimate of the Bradley-Terry parameters based on a finite amount of data has some uncertainty in it. Ordinarily that's not such a big deal for assigning probabilities to the outcome of one game: the ratio of Cornell's strength to Clarkson's might be higher or lower than our best guess, but that means we might have over- or under-estimated it, and so the uncertainty probably washes out. But when the best guess is something like 50-to-1, that uncertainty can make a big difference in a more careful estimate of the probabilities. As an oversimplified version, suppose the "correct" odds might be 100-to-1 or 25-to-1, but we don't know which. Then the probability of an upset would be the average of 1.0% and 3.8%, which is 2.4% or about 40-to-1 against, not 50-to-1. I.e., the uncertainty naturally biases our expectation of the true probability away from the extremes, because having maybe somewhat overestimated the magnitude of the upset is a bigger effect than having maybe somewhat underestimated it. This is the issue we addressed in this paper, with a specific example discussed on this forum of the Cornell-Quinnipiac quarterfinal series from a few years back: [dx.doi.org] [arxiv.org]
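The oversimplified two-scenario example above can be checked in a few lines. This is only a sketch of that toy calculation (equal weight on 100-to-1 and 25-to-1), not the method from the paper; the helper name `upset_prob` is made up for illustration.

```python
import math


def upset_prob(odds_against: float) -> float:
    """Probability of an upset when the favorite is odds_against-to-1."""
    return 1.0 / (odds_against + 1.0)


# Two equally plausible "true" odds from the example in the post.
candidate_odds = [100.0, 25.0]

# The "best guess" of 50-to-1 is the midpoint in log odds (geometric mean).
best_guess_odds = math.sqrt(candidate_odds[0] * candidate_odds[1])

# Average the upset probabilities, not the odds.
avg_upset = sum(upset_prob(o) for o in candidate_odds) / len(candidate_odds)

# Convert the averaged probability back to odds against the upset.
implied_odds = (1.0 - avg_upset) / avg_upset

print(f"best-guess odds: {best_guess_odds:.0f}-to-1")
print(f"average upset probability: {avg_upset:.1%}")
print(f"implied odds: about {implied_odds:.0f}-to-1")
```

The averaged probability comes out less extreme than the 50-to-1 best guess because the 25-to-1 scenario moves the upset probability up by more than the 100-to-1 scenario moves it down.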

Two, the maximum-likelihood analysis doesn't take into account any prior expectations about the possible discrepancies in teams' strengths, which means it's equivalent to making your prior information completely noninformative. This is a well-known effect which leads to undefeated teams having infinite KRACH ratings, and it's why Ken Butler put the "fictitious games" into KRACH for a while (the maximum-likelihood estimates with fictitious games turn out to be the maximum a posteriori estimates with a particular prior distribution). But this is almost always a pretty small effect by this point in the season, so we don't generally worry about it. (BTW, the basic problem is older than hockey, since Laplace was working on it circa 1800. What's your best guess probability that an event will happen, given that it's never happened in some number of chances? If you use the fraction of times you've already seen it as an estimate, you get zero, but you probably don't want to say it's literally impossible. The Bayes-Laplace rule of succession is basically what you get if you add two extra "fictitious trials", one where it occurred and one where it didn't.)
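As a small illustration of the rule of succession mentioned above, here is a sketch comparing the raw observed fraction with the estimate you get after adding the two fictitious trials. The function names are made up for illustration, and this is the textbook rule, not Ken Butler's fictitious-games implementation in KRACH.

```python
def observed_fraction(successes: int, trials: int) -> float:
    """Maximum-likelihood estimate: the fraction of trials where it happened."""
    return successes / trials


def rule_of_succession(successes: int, trials: int) -> float:
    """Bayes-Laplace estimate: add one fictitious success and one fictitious failure."""
    return (successes + 1) / (trials + 2)


# An event that has never happened in 10 chances.
print(observed_fraction(0, 10))   # exactly zero: "literally impossible"
print(rule_of_succession(0, 10))  # small but nonzero: 1 in 12
```

With many trials the two estimates get close, which matches the point that the prior matters less and less as the season goes on.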

Ties are a huge pain in the ass, and complicate everything, so it's often easier to pretend they don't exist (or rather that past ties are half wins and half losses and future ties are something we don't talk about), especially since they become impossible once the playoffs start.

I finally got around to addressing this point: does Bradley-Terry really think Cornell is a 2-to-1 favorite over Clarkson? Right now, using the ratio of KRACH ratings, I get a probability of 67.9% for Cornell to beat Clarkson:

Cornell probabilities to beat Clarkson (MAP)
0.679106211601309

But there's uncertainty in that, since we don't have an infinite set of results to base it on, so the actual relative strengths could quite plausibly be 1:1 or 4:1 rather than 2:1. If I make a Gaussian approximation (in the log odds ratio) to our uncertainty and average over the posterior probability (black curve) I get (depending on how I do the estimate, and on the randomness of some of the Monte Carlo estimation methods) somewhere between 65.9% and 66.7%:

Cornell probabilities to beat Clarkson (Gaussian)
0.6630671498583337
Cornell probabilities to beat Clarkson (Gaussian avg)
[0.66238181 0.66241547 0.66248589 0.66414951]
Cornell probabilities to beat Clarkson (Gaussian MC)
[0.6669  0.65965 0.6591  0.6632 ]
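To show the kind of averaging involved, here is a minimal sketch of a Gaussian approximation in the log odds ratio, averaged two ways (a deterministic grid and Monte Carlo draws). It is not the code that produced the numbers above: the standard deviation `sd_log_odds = 0.6` is a purely hypothetical value chosen for illustration, and all function names are made up.

```python
import math
import random


def logistic(x: float) -> float:
    """Convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def gaussian_avg_win_prob(mean: float, sd: float, n_grid: int = 2001,
                          width: float = 8.0) -> float:
    """Average logistic(x) over a Gaussian in x, using a simple grid sum."""
    if sd == 0.0:
        return logistic(mean)
    lo = mean - width * sd
    step = 2.0 * width * sd / (n_grid - 1)
    num = den = 0.0
    for i in range(n_grid):
        x = lo + i * step
        w = math.exp(-0.5 * ((x - mean) / sd) ** 2)  # unnormalized Gaussian weight
        num += w * logistic(x)
        den += w
    return num / den


def gaussian_mc_win_prob(mean: float, sd: float, n_samples: int = 20000,
                         seed: int = 0) -> float:
    """Same average, estimated from Monte Carlo draws (so it carries some noise)."""
    rng = random.Random(seed)
    return sum(logistic(rng.gauss(mean, sd)) for _ in range(n_samples)) / n_samples


p_map = 0.679                                  # point estimate quoted in the post
mean_log_odds = math.log(p_map / (1.0 - p_map))
sd_log_odds = 0.6                              # HYPOTHETICAL uncertainty, for illustration only

print(gaussian_avg_win_prob(mean_log_odds, sd_log_odds))
print(gaussian_mc_win_prob(mean_log_odds, sd_log_odds))
```

Whatever width you assume, the averaged probability lands between 50% and the point estimate, and the Monte Carlo version wobbles a little from run to run depending on the seed.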

But that approximation is not the actual posterior uncertainty according to the model, it's just an approximation that's easy to draw Monte Carlo samples from. We can try to correct for this with a method called importance sampling, but there's a bit of Monte Carlo variability (colored histograms). If we average over that corrected posterior estimate, we get something between 66.6% and 68.4%:

Cornell probabilities to beat Clarkson (importance sampling avg)
[0.67634776 0.67805747 0.67706685 0.68029994]
Cornell probabilities to beat Clarkson (importance sampling MC)
[0.66640855 0.68385622 0.66816863 0.68092229]
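For anyone unfamiliar with importance sampling, here is a toy sketch of the idea on a two-team "league" with a made-up head-to-head record and a flat prior on the log odds. This is an assumption-laden illustration, not the full Bradley-Terry posterior used for the numbers above. Samples are drawn from the Gaussian approximation and then reweighted by the ratio of the actual posterior to the Gaussian.

```python
import math
import random


def log_sigmoid(x: float) -> float:
    """Numerically stable log of the logistic function."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def log_posterior(lam: float, wins: int, losses: int) -> float:
    """Unnormalized log posterior of the log odds lam (flat prior on lam)."""
    return wins * log_sigmoid(lam) + losses * log_sigmoid(-lam)


def estimates(wins: int, losses: int, n_samples: int = 200000, seed: int = 1):
    """Return (Gaussian-approximation average, importance-sampling average)."""
    rng = random.Random(seed)
    mean = math.log(wins / losses)            # posterior mode of the log odds
    sd = math.sqrt(1.0 / wins + 1.0 / losses)  # from the curvature at the mode
    samples = [rng.gauss(mean, sd) for _ in range(n_samples)]
    probs = [1.0 / (1.0 + math.exp(-lam)) for lam in samples]

    # Plain average over the Gaussian draws.
    gaussian_est = sum(probs) / n_samples

    # Log importance weight: log target minus log proposal (constants dropped).
    log_w = [log_posterior(lam, wins, losses) + 0.5 * ((lam - mean) / sd) ** 2
             for lam in samples]
    shift = max(log_w)                         # avoid overflow when exponentiating
    weights = [math.exp(lw - shift) for lw in log_w]

    # Self-normalized importance-sampling average.
    is_est = sum(w * p for w, p in zip(weights, probs)) / sum(weights)
    return gaussian_est, is_est


gaussian_est, is_est = estimates(wins=8, losses=4)  # hypothetical record
print(gaussian_est, is_est)
```

In this toy case the exact posterior for the win probability is a Beta(wins, losses) distribution, whose mean is wins/(wins+losses), so the corrected estimate can be checked against 2/3. One caveat: a Gaussian proposal has lighter tails than this posterior, so a heavier-tailed proposal would be the safer choice in a careful analysis.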

So it really does look like 2-to-1 is a good representation of our state of knowledge.

___________________________
JTW

@jtwcornell91@hostux.social

Edited 2 time(s). Last edit at 02/26/2020 08:39AM by jtwcornell91.