# Uncertainty in KRACH probability estimates (thread drift)


**Posted by: jtwcornell91** (Moderator)

**Date:** February 26, 2020 08:13AM

jtwcornell91

KGR11

Here's an example of the issue with using KRACH to predict final pairwise:

Cornell (KRACH Rating: 526) is playing St Lawrence (KRACH Rating: 11) in a few weeks. My understanding is that the ratings can be used to come up with a pseudo-record between the two teams (Cornell with 526 wins, St Lawrence with 11). The Monte Carlo simulation uses this record to determine how often Cornell wins. In this case, the model predicts they win 98% of the time.

jfeath's regression analysis from 2 years ago shows that a team with a KRACH winning percentage of 100% theoretically wins about 83.4% of the time, 14.5% lower than what KRACH states for the Cornell-St Lawrence game.

I think it makes sense for 83.4% to be an upper bound on winning percentage. Any goalie can have an incredible/incredibly bad day. Also, the fact that there are ties in hockey means that the winning percentage should be more weighted to 50% than a sport where you can't have ties.

Ideally, jfeath's regression analysis would be an in-between step in the Pairwise probability matrix to convert the KRACH winning percentages (which show what happened to date) to predictive winning percentages.

Even if the Bradley-Terry model is "correct" (whatever that means) there are two potential problems with using KRACH to predict the outcome of a mismatch:

One, KRACH is a maximum-likelihood estimate of a team's Bradley-Terry strength, whereas any estimate of the Bradley-Terry parameters based on a finite amount of data has some uncertainty in it. Ordinarily that's not such a big deal for assigning probabilities to the outcome of one game: the ratio of Cornell's strength to Clarkson's might be higher or lower than our best guess, but that means we might have over- or under-estimated it, and so the uncertainty probably washes out. But when the best guess is something like 50-to-1, that uncertainty can make a big difference in a more careful estimate of the probabilities. As an oversimplified version, suppose the "correct" odds might be 100-to-1 or 25-to-1, but we don't know which. Then the probability of an upset would be the average of 1.0% and 3.8%, which is 2.4% or about 40-to-1 against, not 50-to-1. I.e., the uncertainty naturally biases our expectation of the true probability away from the extremes, because having maybe somewhat overestimated the magnitude of the upset is a bigger effect than having maybe somewhat underestimated it. This is the issue we addressed in this paper, with a specific example discussed on this forum of the Cornell-Quinnipiac quarterfinal series from a few years back: [dx.doi.org] [arxiv.org]
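The oversimplified two-scenario example above can be checked with a few lines of arithmetic. This is only a sketch of that toy calculation (the function names are made up for illustration); it is not the method used in the paper.

```python
def upset_probability(odds: float) -> float:
    """Underdog's win probability when the favorite is `odds`-to-1."""
    return 1.0 / (1.0 + odds)


def odds_against(prob: float) -> float:
    """Convert an upset probability back into odds against the upset."""
    return (1.0 - prob) / prob


# Two equally plausible "true" odds, as in the toy example in the post.
scenarios = [100.0, 25.0]

# Average the upset probabilities (not the odds) over the two scenarios.
avg_upset = sum(upset_probability(o) for o in scenarios) / len(scenarios)

# Upset probability implied by plugging in the single best guess of 50-to-1.
best_guess_upset = upset_probability(50.0)

print(f"averaged upset probability: {avg_upset:.4f}")
print(f"equivalent odds against:    {odds_against(avg_upset):.1f}-to-1")
print(f"best-guess (50-to-1) upset: {best_guess_upset:.4f}")
```

The averaged probability comes out larger than the best-guess one because the probability is a nonlinear function of the odds, so the 25-to-1 scenario pulls the average up more than the 100-to-1 scenario pulls it down.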

Two, the maximum-likelihood analysis doesn't take into account any prior expectations about the possible discrepancies in teams' strengths, which means it's equivalent to making your prior information completely noninformative. This is a well-known effect which leads to undefeated teams having infinite KRACH ratings, and it's why Ken Butler put the "fictitious games" into KRACH for a while (the maximum likelihood estimates with fictitious games turn out to be the maximum a posteriori estimates with a particular prior distribution). But this is almost always a pretty small effect by this point in the season, so we don't generally worry about it. (BTW, the basic problem is older than hockey, since Laplace was working on it circa 1800. What's your best guess probability that an event will happen, given that it's never happened in some number of chances? If you use the fraction of times you've already seen it as an estimate, you get zero, but you probably don't want to say it's literally impossible. The Bayes-Laplace rule of succession is basically what you get if you add two extra "fictitious trials", one where it occurred and one where it didn't.)
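As a small illustration of the rule of succession mentioned above, here is a sketch that compares the raw observed frequency with the estimate obtained by adding two fictitious trials. The function names and the ten-trial examples are hypothetical.

```python
from fractions import Fraction


def raw_frequency(successes: int, trials: int) -> Fraction:
    """Fraction of trials in which the event was actually observed."""
    return Fraction(successes, trials)


def rule_of_succession(successes: int, trials: int) -> Fraction:
    """Bayes-Laplace estimate: add one fictitious success and one fictitious failure."""
    return Fraction(successes + 1, trials + 2)


# An event never seen in 10 chances: the raw estimate says "impossible".
print(raw_frequency(0, 10), rule_of_succession(0, 10))

# An event seen in all 10 chances (think of an undefeated team): raw says "certain".
print(raw_frequency(10, 10), rule_of_succession(10, 10))
```

In both extreme cases the fictitious trials keep the estimate away from exactly 0 or 1, which is the same role the fictitious games play for undefeated teams.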

Ties are a huge pain in the ass, and complicate everything, so it's often easier to pretend they don't exist (or rather that past ties are half wins and half losses and future ties are something we don't talk about), especially since they become impossible once the playoffs start.

I finally got around to addressing this point: does Bradley-Terry really think Cornell is a 2-to-1 favorite over Clarkson? Right now, using the ratio of KRACH ratings, I get a probability 67.9% for Cornell to beat Clarkson:

Cornell probabilities to beat Clarkson (MAP) 0.679106211601309

But there's uncertainty in that, since we don't have an infinite set of results to base it on, so the actual relative strengths could quite plausibly be 1:1 or 4:1 rather than 2:1. If I make a Gaussian approximation (in the log odds ratio) to our uncertainty and average over the posterior probability (black curve) I get (depending on how I do the estimate, and on the randomness of some of the Monte Carlo estimation methods) somewhere between 65.9% and 66.7%:

Cornell probabilities to beat Clarkson (Gaussian) 0.6630671498583337

Cornell probabilities to beat Clarkson (Gaussian avg) [0.66238181 0.66241547 0.66248589 0.66414951]

Cornell probabilities to beat Clarkson (Gaussian MC) [0.6669 0.65965 0.6591 0.6632]

But that approximation is not the actual posterior uncertainty according to the model, it's just an approximation that's easy to draw Monte Carlo samples from. We can try to correct for this with a method called importance sampling, but there's a bit of Monte Carlo variability (colored histograms). If we average over that corrected posterior estimate, we get something between 66.6% and 68.4%:

Cornell probabilities to beat Clarkson (importance sampling avg) [0.67634776 0.67805747 0.67706685 0.68029994]

Cornell probabilities to beat Clarkson (importance sampling MC) [0.66640855 0.68385622 0.66816863 0.68092229]

So it really does look like 2-to-1 is a good representation of our state of knowledge.
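The Gaussian averaging step can be sketched in one dimension as follows. The center of the Gaussian is the log odds implied by the 67.9% figure quoted above; the width `sigma = log(2)` is purely an assumption for illustration (chosen so that 1:1 and 4:1 sit about one standard deviation from 2:1), not the width from the actual fit.

```python
import math
import random


def logistic(x: float) -> float:
    """Map a log odds ratio to a win probability."""
    return 1.0 / (1.0 + math.exp(-x))


map_prob = 0.679                           # best-guess win probability quoted in the post
mu = math.log(map_prob / (1.0 - map_prob))  # corresponding log odds ratio
sigma = math.log(2.0)                       # ASSUMED uncertainty in the log odds ratio

rng = random.Random(0)                      # fixed seed so the sketch is repeatable
n_samples = 100_000

# Draw plausible log odds ratios and average the implied win probabilities.
avg_prob = sum(logistic(rng.gauss(mu, sigma)) for _ in range(n_samples)) / n_samples

print(f"plug-in probability:  {map_prob:.3f}")
print(f"averaged probability: {avg_prob:.3f}")
```

Under these assumptions the averaged probability lands slightly closer to 50% than the plug-in value, which is the direction of the shift reported in the post.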

Edited 2 time(s). Last edit at 02/26/2020 08:39AM by jtwcornell91.

**Re: Bracketology for 2020 NCAAs**

**Posted by: jtwcornell91** (Moderator)

**Date:** February 26, 2020 08:16AM

But all of this is assuming we have no prior information about how good or bad any teams could be relative to each other, and it has built into it the usual problems with undefeated teams being modelled as infinitely better than everyone else. If we put in a prior equivalent to two "fictitious games" (i.e. before calculating the KRACH ratings, add to the actual results one win and one loss for each team against an average team with a KRACH of 100), we find a best estimate of Cornell's and Clarkson's ratings which translates to a 65.3% chance of Cornell beating Clarkson:

Cornell probabilities to beat Clarkson (MAP) 0.6527192780965341

If we use a Gaussian approximation to model the uncertainty in the estimate, we come up with a 63.3% to 64.4% chance:

Cornell probabilities to beat Clarkson (Gaussian) 0.6406057998515924

Cornell probabilities to beat Clarkson (Gaussian avg) [0.64105389 0.64044367 0.64099546 0.63969837]

Cornell probabilities to beat Clarkson (Gaussian MC) [0.6437 0.6353 0.63825 0.63345]

And finally, if we try to correct the Gaussian approximation, we get estimates between 63.3% and 65.9%:

Cornell probabilities to beat Clarkson (importance sampling avg) [0.65578108 0.65157989 0.65081472 0.64830749]

Cornell probabilities to beat Clarkson (importance sampling MC) [0.65913595 0.65544068 0.64221379 0.63335444]
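The "fictitious games" prior described in this post can be sketched with a toy Bradley-Terry fit. This is not the actual KRACH code: the three-team results table, the function name `fit_ratings`, and the simple fixed-point iteration are all assumptions made for the example. The only ingredient taken from the post is that each team gets one fictitious win and one fictitious loss against an opponent rated 100.

```python
import math

ANCHOR = 100.0  # rating of the fictitious "average" opponent (value from the post)

# HYPOTHETICAL results: TOY_WINS[i][j] = wins of team i over team j.
# Team 0 is undefeated (4-0); teams 1 and 2 split their games with each other.
TOY_WINS = [
    [0, 2, 2],
    [0, 0, 1],
    [0, 1, 0],
]


def fit_ratings(wins, fictitious=True, n_iter=5000):
    """Fixed-point iteration: rating_i = wins_i / sum_j games_ij / (rating_i + rating_j)."""
    n = len(wins)
    ratings = [ANCHOR] * n
    for _ in range(n_iter):
        updated = []
        for i in range(n):
            total_wins = float(sum(wins[i]))
            denom = 0.0
            for j in range(n):
                if j != i:
                    games = wins[i][j] + wins[j][i]
                    denom += games / (ratings[i] + ratings[j])
            if fictitious:
                total_wins += 1.0                     # one fictitious win
                denom += 2.0 / (ratings[i] + ANCHOR)  # two fictitious games vs. the anchor
            updated.append(total_wins / denom)
        ratings = updated
    return ratings


with_prior = fit_ratings(TOY_WINS, fictitious=True)
no_prior = fit_ratings(TOY_WINS, fictitious=False)

print("ratings with fictitious games:", [round(r, 1) for r in with_prior])
print("strength ratio (team 0 : team 1) with prior:   ", with_prior[0] / with_prior[1])
print("strength ratio (team 0 : team 1) without prior:", no_prior[0] / no_prior[1])
```

In this toy setup the undefeated team's ratio to the others settles at a finite value once the fictitious games are included, whereas without them the ratio just keeps growing the longer the iteration runs.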

**Re: Bracketology for 2020 NCAAs**

**Posted by: jtwcornell91** (Moderator)

**Date:** February 26, 2020 08:21AM

If we ask the same question about Cornell and St. Lawrence, whose 45:1 KRACH ratio translates into a 97.8% chance of a Cornell victory (or really, an expectation that Cornell would take 97.8% of the points in the long run if they played SLU a huge number of times):

Cornell probabilities to beat SLU (MAP) 0.9776358688353626

Modelling the uncertainty with a Gaussian approximation makes this look more like 97%:

Cornell probabilities to beat SLU (Gaussian) 0.9711439093854904

Cornell probabilities to beat SLU (Gaussian avg) [0.97104034 0.97112873 0.97100314 0.97119117]

Cornell probabilities to beat SLU (Gaussian MC) [0.972 0.969 0.96945 0.9733]

But correcting for the approximation brings it back up to 98% or so:

Cornell probabilities to beat SLU (importance sampling avg) [0.97851686 0.97889054 0.97832162 0.97876575]

Cornell probabilities to beat SLU (importance sampling MC) [0.97899249 0.97783017 0.97480149 0.97925968]

**Re: Bracketology for 2020 NCAAs**

**Posted by: jtwcornell91** (Moderator)

**Date:** February 26, 2020 08:28AM

Putting in the two-fictitious-games prior to reflect our supposition that there's probably not that much disparity gives us a probability of 96.3% from the difference of maximum a posteriori log-strengths:

Cornell probabilities to beat SLU (MAP) 0.9632300988035607

Averaging over the posterior uncertainty makes this 95.5% or so:

Cornell probabilities to beat SLU (Gaussian) 0.9553660102182032

Cornell probabilities to beat SLU (Gaussian avg) [0.95535942 0.95527023 0.95513885 0.9550743]

Cornell probabilities to beat SLU (Gaussian MC) [0.9577 0.95395 0.95435 0.95705]

Going from 97.8% to 96% doesn't seem like a big deal, but it's basically changed our estimate of the odds from 45:1 to 25:1.

And correcting with importance sampling makes it more like 96.2% to 96.5%:

Cornell probabilities to beat SLU (importance sampling avg) [0.96409882 0.96431912 0.96420754 0.9634074]

Cornell probabilities to beat SLU (importance sampling MC) [0.96474495 0.96275461 0.96386177 0.96205632]
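To see why a couple of percentage points matter so much near the extremes, it helps to convert the probabilities back to odds. The sketch below uses the rating values and MAP probabilities quoted in this thread; the helper names are made up for illustration.

```python
def win_probability(rating_a: float, rating_b: float) -> float:
    """Bradley-Terry win probability for A over B from their ratings."""
    return rating_a / (rating_a + rating_b)


def odds_for(prob: float) -> float:
    """Odds in favor (x-to-1) corresponding to a win probability."""
    return prob / (1.0 - prob)


# Ratings quoted earlier in the thread: Cornell 526, St. Lawrence 11.
print(f"526 vs 11 -> {win_probability(526, 11):.4f}")

# MAP probabilities without and with the two-fictitious-games prior.
for label, prob in [("no prior", 0.9776358688353626),
                    ("fictitious games", 0.9632300988035607)]:
    print(f"{label}: {prob:.4f} -> about {odds_for(prob):.0f}-to-1")
```

A drop of roughly 1.5 percentage points in win probability corresponds to the odds falling by close to half, which is the point being made about 45:1 versus 25:1.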

**Re: Uncertainty in KRACH probability estimates (thread drift)**

**Posted by: Trotsky** (---.dc.dc.cox.net)

**Date:** February 26, 2020 09:13AM

Having an *actual* genius on this forum is as useful as it is personally demoralizing.

Edited 1 time(s). Last edit at 02/26/2020 09:13AM by Trotsky.

**Re: Uncertainty in KRACH probability estimates (thread drift)**

**Posted by: ugarte** (---.177.169.163.IPYX-102276-ZYO.zip.zayo.com)

**Date:** February 26, 2020 10:21AM

save yourself by deeming it "boring" and then you don't have to admit that you can't read it without feeling stupid.

Trotsky

Having an *actual* genius on this forum is as useful as it is personally demoralizing.

**Re: Uncertainty in KRACH probability estimates (thread drift)**

**Posted by: Swampy** (---.ri.ri.cox.net)

**Date:** February 26, 2020 10:48AM

ugarte

save yourself by deeming it "boring" and then you don't have to admit that you can't read it without feeling stupid.

Trotsky

Having an *actual* genius on this forum is as useful as it is personally demoralizing.

I hardly do anything without feeling stupid.

**Re: Uncertainty in KRACH probability estimates (thread drift)**

**Posted by: Trotsky** (---.dc.dc.cox.net)

**Date:** February 26, 2020 11:21AM

Too late. Meeting John was the worst thing that ever happened to my ego. There was a time when I actually considered myself smarter than a door stop.

ugarte

save yourself by deeming it "boring" and then you don't have to admit that you can't read it without feeling stupid.

Trotsky

Having an *actual* genius on this forum is as useful as it is personally demoralizing.

**Re: Uncertainty in KRACH probability estimates (thread drift)**

**Posted by: ursusminor** (---.washdc.dsl-w.verizon.net)

**Date:** February 26, 2020 12:46PM

Trotsky

Too late. Meeting John was the worst thing that ever happened to my ego. There was a time when I actually considered myself smarter than a door stop.

ugarte

Trotsky

Having an *actual* genius on this forum is as useful as it is personally demoralizing.

Don't your alter ego's 1st and 2nd laws actually indicate that the smart money is on Clarkson?

**Re: Uncertainty in KRACH probability estimates (thread drift)**

**Posted by: Trotsky** (---.dc.dc.cox.net)

**Date:** February 26, 2020 12:47PM

They caution not to wager on us. That is a different thing. If I don't bet I don't collapse the wave function.

ursusminor

Trotsky

Too late. Meeting John was the worst thing that ever happened to my ego. There was a time when I actually considered myself smarter than a door stop.

ugarte

Trotsky

Having an *actual* genius on this forum is as useful as it is personally demoralizing.

Don't your alter ego's 1st and 2nd laws actually indicate that the smart money is on Clarkson?

**Re: Uncertainty in KRACH probability estimates (thread drift)**

**Posted by: BearLover** (198.232.50.---)

**Date:** February 28, 2020 12:43PM

Sorry if I'm misunderstanding (much of your posts went way over my head), but I'd think the plausible range of relative strengths should be between 1:2 and 3:1, or something to that effect. I think it is far more likely that Clarkson is a slightly better team than Cornell than it is that Cornell is much better than Clarkson. Does using those numbers change your outputs at all?

jtwcornell91

I finally got around to addressing this point: does Bradley-Terry really think Cornell is a 2-to-1 favorite over Clarkson? Right now, using the ratio of KRACH ratings, I get a probability 67.9% for Cornell to beat Clarkson.... But there's uncertainty in that, since we don't have an infinite set of results to base it on, so the actual relative strengths could quite plausibly be 1:1 or 4:1 rather than 2:1.

**Re: Uncertainty in KRACH probability estimates (thread drift)**

**Posted by: upprdeck** (---.fs.cornell.edu)

**Date:** February 28, 2020 01:57PM

just because wagering is what makes all this fun

Cornell -3735 tonight

Clarkson -435


**Re: Uncertainty in KRACH probability estimates (thread drift)**

**Posted by: toddlose** (76.117.252.---)

**Date:** February 28, 2020 02:15PM

upprdeck

just because wagering is what makes all this fun

Cornell -3735 tonight

Clarkson -435

Is there an early line on our game tmrw?

**Re: Uncertainty in KRACH probability estimates (thread drift)**

**Posted by: upprdeck** (---.fs.cornell.edu)

**Date:** February 28, 2020 02:17PM

not that i can find. just tonights games.

**Re: Uncertainty in KRACH probability estimates (thread drift)**

**Posted by: jtwcornell91** (Moderator)

**Date:** March 11, 2020 06:40PM

BearLover

Sorry if I'm misunderstanding (much of your posts went way over my head), but I'd think the plausible range of relative strengths should be between 1:2 and 3:1, or something to that effect. I think it is far more likely that Clarkson is a slightly better team than Cornell than it is that Cornell is much better than Clarkson. Does using those numbers change your outputs at all?

jtwcornell91

I finally got around to addressing this point: does Bradley-Terry really think Cornell is a 2-to-1 favorite over Clarkson? Right now, using the ratio of KRACH ratings, I get a probability 67.9% for Cornell to beat Clarkson.... But there's uncertainty in that, since we don't have an infinite set of results to base it on, so the actual relative strengths could quite plausibly be 1:1 or 4:1 rather than 2:1.

So you might intuitively expect that, but the posterior probability, which mostly comes from the likelihood function that relates win probability to probabilities of all the outcomes in the season, doesn't work out that way. The easiest approximation to use enforces that the posterior plausibility for the odds ratio, on a log scale, will be a symmetric bell curve (Gaussian), and so 1:2 is as far away from 2:1 as 8:1 is. Of course, that's just an approximation to the model; the "true" posterior is harder to compute, since you have to "marginalize" (integrate) over the 58 other dimensions you don't care about (e.g., how are Cornell or Clarkson expected to do against Minnesota). (In fact, one of the reasons we use the Gaussian approximation is that this 58-dimensional integral can be done analytically.) We can try to correct for the Gaussian approximation using a technique called importance sampling, which is where the noisy colorful step plots are coming from. It's kind of rough, but you can sort of see that the exact posterior is actually skewed a bit **away** from 1:1, so in fact more "extreme" odds ratios are slightly **more** likely than you'd expect from the Gaussian approximation.
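Importance sampling itself can be sketched on a one-dimensional toy problem. Everything specific here is an assumption for illustration: the target is the posterior you would get from nine games of which one team won six (with no fictitious games), not the full multi-team Bradley-Terry posterior, and the Gaussian proposal is matched to the peak and curvature of that toy target in log odds.

```python
import math
import random

WINS, LOSSES = 6, 3  # HYPOTHETICAL stand-in data for the toy target


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def log_target(x: float) -> float:
    """Unnormalized log posterior density in log odds x: p^WINS * (1 - p)^LOSSES."""
    return -WINS * math.log1p(math.exp(-x)) - LOSSES * math.log1p(math.exp(x))


# Gaussian approximation: centered on the peak, width from the curvature there.
mu = math.log(WINS / LOSSES)
sigma = math.sqrt(1.0 / WINS + 1.0 / LOSSES)


def log_proposal(x: float) -> float:
    """Unnormalized log density of the Gaussian we actually sample from."""
    return -0.5 * ((x - mu) / sigma) ** 2


rng = random.Random(1)
samples = [rng.gauss(mu, sigma) for _ in range(50_000)]

# Plain Gaussian average of the win probability.
gaussian_avg = sum(logistic(x) for x in samples) / len(samples)

# Importance weights: target / proposal, stabilized by subtracting the max log weight.
log_w = [log_target(x) - log_proposal(x) for x in samples]
max_log_w = max(log_w)
weights = [math.exp(lw - max_log_w) for lw in log_w]

# Self-normalized importance-sampling estimate of the same average.
is_avg = sum(w * logistic(x) for w, x in zip(weights, samples)) / sum(weights)

exact = WINS / (WINS + LOSSES)  # exact posterior mean for this toy target
print(f"Gaussian average:    {gaussian_avg:.4f}")
print(f"importance sampling: {is_avg:.4f}")
print(f"exact (toy target):  {exact:.4f}")
```

In this toy case the reweighting pulls the estimate back up toward the exact value, because the target has more weight at extreme odds than the Gaussian does. The target's tails are also heavier than the Gaussian's, so occasional samples far out in the tail get large weights, which is one reason estimates of this kind are noisy.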

**Re: Uncertainty in KRACH probability estimates (thread drift)**

**Posted by: jtwcornell91** (Moderator)

**Date:** March 11, 2020 06:49PM

Now, of course, this is within the context of the Bradley-Terry model, and you could say the probability relationships assumed by that model (odds ratio for A:C = odds ratio for A:B times odds ratio for B:C) are not right. But you see similar effects in a binomial experiment, where you're making the same comparison over and over again to try to estimate a single probability. The posterior on the Cornell-Clarkson odds ratio (using the results from when I made the original post, not the current ones) is pretty close to what you'd get if, instead of the whole season, the only results you had were nine games between the two teams, of which Cornell won 6 and Clarkson won 3. Then, with basically only the assumption that each game is an independent comparison with the same probability that Cornell will win (which might not be literally true if the two teams really played 9 times, but is more or less the case in the real season), you get a posterior for the win probability of a very specific form (the beta distribution), which in terms of odds ratio, comes out to the red dashed curve on the plot below. (In this simple model, we do have the exact posterior available to us.) We see that it's pretty close to the Gaussian approximation (blue dash-dot curve), but again skews slightly **away** from 50:50, just as in the Bradley-Terry case. (The Gaussian approximation to the marginal Bradley-Terry posterior is shown in light grey for comparison.)

**Re: Uncertainty in KRACH probability estimates (thread drift)**

**Posted by: jtwcornell91** (Moderator)

**Date:** March 11, 2020 06:55PM

For completeness, here are the same posteriors plotted in terms of win probability rather than odds ratio.

One other objection is that all of these posteriors use the Haldane prior (no fictitious games), and you might put on a prior that reflects the reasonable supposition that all log-odds-ratios are not equally likely a priori. But that doesn't change the fundamental nature of the underlying beta distribution. For instance, if you assumed a priori that any win probability was equally likely (the Bayes-Laplace prior for the binomial experiment) then the posterior would be a beta(7,4) distribution rather than the beta(6,3) plotted here, and it would still be skewed slightly to the right on a log-odds-ratio plot, although the peak itself would be at a lower value (7:4=1.75:1 rather than 6:3=2:1).
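The claims about the peak and the skew can be checked numerically on a grid in log odds. This is a sketch under the same binomial simplification as the post (nine games, six wins); the function names and grid settings are arbitrary choices for illustration. A beta(a, b) posterior for the win probability p has a density proportional to p^a (1 - p)^b when re-expressed in x = log(p / (1 - p)).

```python
import math


def log_density(x: float, a: float, b: float) -> float:
    """Unnormalized log density of a beta(a, b) posterior, expressed in log odds x."""
    return -a * math.log1p(math.exp(-x)) - b * math.log1p(math.exp(x))


def summarize(a: float, b: float, lo: float = -12.0, hi: float = 12.0, steps: int = 24001):
    """Return (odds ratio at the peak, mean log odds) from a simple grid sum."""
    h = (hi - lo) / (steps - 1)
    xs = [lo + i * h for i in range(steps)]
    ws = [math.exp(log_density(x, a, b)) for x in xs]
    peak_x = xs[max(range(steps), key=ws.__getitem__)]
    mean_x = sum(x * w for x, w in zip(xs, ws)) / sum(ws)
    return math.exp(peak_x), mean_x


for label, (a, b) in [("Haldane prior, beta(6,3)", (6, 3)),
                      ("Bayes-Laplace prior, beta(7,4)", (7, 4))]:
    peak_odds, mean_log_odds = summarize(a, b)
    print(f"{label}: peak at {peak_odds:.2f}:1, "
          f"mean log odds {mean_log_odds:.3f} vs log of peak {math.log(peak_odds):.3f}, "
          f"posterior mean win probability {a / (a + b):.3f}")
```

In both cases the mean of the log odds sits above the log of the peak, which is the rightward skew described above, and the peak moves from 2:1 to 1.75:1 when the uniform prior is used.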


**Re: Uncertainty in KRACH probability estimates (thread drift)**

**Posted by: ugarte** (---.sub-174-202-0.myvzw.com)

**Date:** March 11, 2020 08:35PM

John, stop fucking with me. The world is ending and I want to feel smart at the end.

**Re: Uncertainty in KRACH probability estimates (thread drift)**

**Posted by: scoop85** (---.hvc.res.rr.com)

**Date:** March 11, 2020 09:01PM

ugarte

John, stop fucking with me. The world is ending and I want to feel smart at the end.

**Re: Uncertainty in KRACH probability estimates (thread drift)**

**Posted by: ugarte** (---.sub-174-202-0.myvzw.com)

**Date:** March 11, 2020 09:03PM

hahaha the bananas are dancing

scoop85

ugarte

John, stop fucking with me. The world is ending and I want to feel smart at the end.

**Re: Uncertainty in KRACH probability estimates (thread drift)**

**Posted by: scoop85** (---.hvc.res.rr.com)

**Date:** March 11, 2020 09:05PM

ugarte

hahaha the bananas are dancing

scoop85

ugarte

John, stop fucking with me. The world is ending and I want to feel smart at the end.

Been a looooong time since I did me some dancing bananas and it seemed just right!

**Re: Uncertainty in KRACH probability estimates (thread drift)**

**Posted by: jtwcornell91** (Moderator)

**Date:** March 12, 2020 12:49PM

ugarte

John, stop fucking with me. The world is ending and I want to feel smart at the end.

We're all just seeking enlightenment.

**Re: Uncertainty in KRACH probability estimates (thread drift)**

**Posted by: Trotsky** (---.dc.dc.cox.net)

**Date:** March 12, 2020 12:51PM

Speak for yourself.

jtwcornell91

ugarte

John, stop fucking with me. The world is ending and I want to feel smart at the end.

We're all just seeking enlightenment.

**Re: Uncertainty in KRACH probability estimates (thread drift)**

**Posted by: ugarte** (---.177.169.163.IPYX-102276-ZYO.zip.zayo.com)

**Date:** March 12, 2020 12:56PM

i don't know who that is but ... uh ... do you know the provenance of that phrase

Trotsky

Speak for yourself.

jtwcornell91

ugarte

John, stop fucking with me. The world is ending and I want to feel smart at the end.

We're all just seeking enlightenment.

**Re: Uncertainty in KRACH probability estimates (thread drift)**

**Posted by: Trotsky** (---.dc.dc.cox.net)

**Date:** March 12, 2020 01:20PM

ugarte

i don't know who that is but ... uh ... do you know the provenance of that phrase

Of course.

**Re: Uncertainty in KRACH probability estimates (thread drift)**

**Posted by: adamw** (---.phlapa.fios.verizon.net)

**Date:** March 13, 2020 12:32AM

I'd just like to say - I should've accepted the bet on whether North Dakota, Minnesota State, Cornell would really finish 1-2-3 in the Pairwise.

It stayed that way for six weeks, and right to the bitter end. Never budged.

(and it wouldn't have budged, either)
