Mathematical Models

Started by Jeff Hopkins '82, February 26, 2018, 07:13:37 PM


Jeff Hopkins '82

For all of those who want to piss and moan about mathematical models, predictions, and similar subjects, do it here.

RichH

This is the funniest thread.

Jeff Hopkins '82

Quote from: RichH: This is the funniest thread.

Irony is funny, isn't it?

CU77

KRACH win probabilities would be more accurate with an additional win and loss for each team against a fictitious "average" team. This is explained by Wes Colley in Section 3 of this white paper on his ranking method (which is similar to KRACH):
http://www.colleyrankings.com/matrate.pdf
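As a sketch of how the fictitious-games idea enters a KRACH-style computation, here is a toy Bradley-Terry fixed-point solver; the three-team schedule and the choice of a strength-1 "average" opponent are illustrative assumptions, not taken from either paper:

```python
# Toy KRACH / Bradley-Terry solver (illustrative, not the official KRACH code).
# Each team gets n_fict fictitious wins and losses against a fixed
# "average" team of strength 1, which keeps undefeated teams' ratings finite.

def krach(wins, games, n_fict=1.0, iters=500):
    """wins[i][j]: wins of team i over j; games[i][j]: games between i and j."""
    n = len(wins)
    k = [1.0] * n
    for _ in range(iters):
        # Standard fixed-point update: K_i = W_i / sum_j n_ij / (K_i + K_j),
        # with the fictitious games folded into both numerator and denominator.
        k = [
            (sum(wins[i]) + n_fict)
            / (sum(games[i][j] / (k[i] + k[j]) for j in range(n) if j != i)
               + 2 * n_fict / (k[i] + 1.0))
            for i in range(n)
        ]
    return k

# Hypothetical round robin: A beat B and C; B beat C.
wins  = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
games = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
ratings = krach(wins, games)
```

With `n_fict=0` and an undefeated team, the iteration would push that team's rating toward infinity; the fictitious games pin every rating to a finite value.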

jfeath17

The Dealing With Perfection section on this page discusses this. It used to be used in KRACH, but better ways to handle it were found. http://elynah.com/tbrw/tbrw.cgi?krach

jtwcornell91

Quote from: jfeath17: The Dealing With Perfection section on this page discusses this. It used to be used in KRACH, but better ways to handle it were found. http://elynah.com/tbrw/tbrw.cgi?krach

I don't know that I'd say better.  Ken originally put in the fictitious games to give everyone finite ratings, but we later worked out a workaround to allow the computation to run: http://www.arxiv.org/abs/math.ST/0412232 .  There's still the question of whether it makes sense to let the data tell you to expect perfection, which has been a subject of debate for a long time.  (The add-a-win-and-a-loss trick is a version of the Bayes-Laplace rule of succession, which was spelled out in 1814.)  For that matter, it's not clear what the appropriate number of "fictitious games" is.  You might guess two, one, or even zero, but it's basically a measure of how much parity you expect: how mismatched are teams likely to be?  My student and I recently found that for Major League Baseball, for example, the level of parity is rather high, corresponding to something like 50 fictitious games: http://arxiv.org/abs/1712.05879
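As a numerical illustration of the rule-of-succession smoothing being described (the 10-0 record and the counts are made up, not from the papers):

```python
# Bayes-Laplace-style smoothing: add c fictitious games, half wins, half losses.
def smoothed_pct(wins, games, c):
    return (wins + c / 2) / (games + c)

# An undefeated 10-0 team under different parity assumptions:
raw      = smoothed_pct(10, 10, 0)    # 1.0 -- the data says "expect perfection"
laplace  = smoothed_pct(10, 10, 2)    # 11/12, the add-a-win-and-a-loss trick
mlb_like = smoothed_pct(10, 10, 50)   # 35/60, a high-parity prior like the MLB fit
```

The more fictitious games, the harder the estimate is pulled toward 0.5, which is what "more parity" means here.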

abmarks

Quote from: jtwcornell91
Quote from: jfeath17: The Dealing With Perfection section on this page discusses this. It used to be used in KRACH, but better ways to handle it were found. http://elynah.com/tbrw/tbrw.cgi?krach

I don't know that I'd say better.  Ken originally put in the fictitious games to give everyone finite ratings, but we later worked out a workaround to allow the computation to run: http://www.arxiv.org/abs/math.ST/0412232 .  There's still the question of whether it makes sense to let the data tell you to expect perfection, which has been a subject of debate for a long time.  (The add-a-win-and-a-loss trick is a version of the Bayes-Laplace rule of succession, which was spelled out in 1814.)  For that matter, it's not clear what the appropriate number of "fictitious games" is.  You might guess two, one, or even zero, but it's basically a measure of how much parity you expect: how mismatched are teams likely to be?  My student and I recently found that for Major League Baseball, for example, the level of parity is rather high, corresponding to something like 50 fictitious games: http://arxiv.org/abs/1712.05879

JTW- I read the paper about MLB and I have a question about table 5.  Is the error cited the error for a specific team? Is it the error found for all individual teams? Or is it the error for all games and all teams?

Meaning:  Is the April 15 error of 8.82 specific to a chosen single team (say the Yanks), or is it saying that when you look at all teams individually you found 8.82 games of error across the board, etc.?

Thanks!

CU77

Quote from: jtwcornell91: it's not clear what the appropriate number of "fictitious games" is.  You might guess two, one, or even zero, but it's basically a measure of how much parity you expect.
Right, but two fictitious games is the minimum. This corresponds to a flat prior on win probability for each team. Less than two, and the prior sags in the middle and rises at the ends. This predicts that winning percentages will cluster around zero and one; in real life, this never ever happens; winning percentages (at the end of the season) always cluster around 0.5. Reproducing this in KRACH requires more than two fictitious games. And as you found in baseball, it can be a lot more than two.
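The "sags in the middle / rises at the ends" claim can be checked numerically. In the single-probability case, c fictitious games correspond to a Beta(c/2, c/2) prior, so c = 2 is the flat Bayes-Laplace prior and c = 1 the Jeffreys prior; that correspondence is standard but stated here as the working assumption:

```python
# Unnormalized Beta(c/2, c/2) density: p^(c/2 - 1) * (1 - p)^(c/2 - 1).
def beta_shape(p, c):
    a = c / 2  # c fictitious games = c/2 fictitious wins and c/2 losses
    return (p * (1 - p)) ** (a - 1)

# Ratio of density near the end (p = 0.1) to density at the middle (p = 0.5):
ends_vs_mid_jeffreys = beta_shape(0.1, 1) / beta_shape(0.5, 1)    # c < 2: > 1
ends_vs_mid_flat     = beta_shape(0.1, 2) / beta_shape(0.5, 2)    # c = 2: = 1
ends_vs_mid_parity   = beta_shape(0.1, 50) / beta_shape(0.5, 50)  # c > 2: < 1
```

With fewer than two fictitious games the prior puts more weight near 0 and 1 than at 0.5; with more than two it concentrates around 0.5, i.e. a parity prior.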

jtwcornell91

Quote from: abmarks
Quote from: jtwcornell91
Quote from: jfeath17: The Dealing With Perfection section on this page discusses this. It used to be used in KRACH, but better ways to handle it were found. http://elynah.com/tbrw/tbrw.cgi?krach

I don't know that I'd say better.  Ken originally put in the fictitious games to give everyone finite ratings, but we later worked out a workaround to allow the computation to run: http://www.arxiv.org/abs/math.ST/0412232 .  There's still the question of whether it makes sense to let the data tell you to expect perfection, which has been a subject of debate for a long time.  (The add-a-win-and-a-loss trick is a version of the Bayes-Laplace rule of succession, which was spelled out in 1814.)  For that matter, it's not clear what the appropriate number of "fictitious games" is.  You might guess two, one, or even zero, but it's basically a measure of how much parity you expect: how mismatched are teams likely to be?  My student and I recently found that for Major League Baseball, for example, the level of parity is rather high, corresponding to something like 50 fictitious games: http://arxiv.org/abs/1712.05879

JTW- I read the paper about MLB and I have a question about table 5.  Is the error cited the error for a specific team? Is it the error found for all individual teams? Or is it the error for all games and all teams?

Meaning:  Is the April 15 error of 8.82 specific to a chosen single team (say the Yanks), or is it saying that when you look at all teams individually you found 8.82 games of error across the board, etc.?

Thanks!

It's the average of the errors for all the teams.  See equation (25) on the same page.  (The sd column shows the spread--measured as a standard deviation--of the errors among the teams, as defined in equation (26).)
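In generic terms (the actual definitions are equations (25) and (26) in the paper, which are not reproduced here), the two columns are an average over per-team errors and the spread of those errors; with made-up numbers:

```python
# Average per-team error and its spread among teams, with hypothetical
# per-team error values; the linked paper defines the exact formulas.
import statistics

team_errors = [7.0, 9.0, 11.0]          # made-up errors, one per team
avg = statistics.fmean(team_errors)     # the quoted number is this kind of average
sd = statistics.pstdev(team_errors)     # the "sd" column: spread among the teams
```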

jtwcornell91

Quote from: CU77
Quote from: jtwcornell91: it's not clear what the appropriate number of "fictitious games" is.  You might guess two, one, or even zero, but it's basically a measure of how much parity you expect.
Right, but two fictitious games is the minimum. This corresponds to a flat prior on win probability for each team. Less than two, and the prior sags in the middle and rises at the ends. This predicts that winning percentages will cluster around zero and one; in real life, this never ever happens; winning percentages (at the end of the season) always cluster around 0.5. Reproducing this in KRACH requires more than two fictitious games. And as you found in baseball, it can be a lot more than two.

I don't think there's necessarily a minimum.  In the simpler case of Bernoulli trials with a single probability, the Jeffreys prior splits the difference between the Bayes-Laplace uniform-in-probability and the Haldane uniform-in-log-odds-ratio, and corresponds to one rather than two "fictitious trials".  The distribution is peaked at p=0 and 1, but if you change variables to ln(p/(1-p)) it's peaked at zero (i.e., p=1/2).

If you consider a situation where there's no parity among the competitors (people drawn at random off the street arm-wrestling or something), you might imagine that if you grab a pair of people and have them compete again and again, one person is likely to win a high percentage of the time.  Winning percentages don't cluster at zero and one because schedules are constructed to match competitively comparable teams.  Consider this year's Olympic women's hockey: if they hadn't divided the eight teams into a "strong group" and a "weak group", you'd have had two teams (USA and CAN) that won about 92% of their games, four teams with winning percentages close to 50% (FIN, OAR, SUI and SWE), and two teams with long-term winning percentages around 8% (JPN and COR).  Something similar might have happened with Men's DI around 2000 when the MAAC teams became full DI, but their winning percentages were inflated because they mostly played each other.  (Compare winning percentage to RRWP in http://www.elynah.com/tbrw/tbrw.cgi?2000/rankings )

Also, note that the quantity that has a uniform prior with two fictitious games is not the expected winning percentage against a balanced schedule, but the expected winning percentage against an average team.  So in the extreme Haldane-like situation where the team's KRACH strengths are drawn from a uniform-in-log prior, you'd expect to get a situation where each team is infinitely better than the teams below them and infinitely worse than the teams above them.  That gives you a uniform distribution of winning percentages (the kth best team out of n beats the n-k teams below them and loses to the k-1 teams above, for a winning percentage of (n-k)/(n-1) = 1 - (k-1)/(n-1)), not one that clusters around 0 and 1.
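That winning-percentage formula is easy to check mechanically; n = 8 here is just an example size:

```python
# In the extreme no-parity case, team k of n beats the n - k teams below it
# and loses to the k - 1 teams above it in a single round robin.
n = 8
pcts = [(n - k) / (n - 1) for k in range(1, n + 1)]
# pcts runs uniformly from 1.0 down to 0.0 in equal steps of 1/(n - 1),
# rather than clustering at the two ends.
```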

Trotsky

Quote from: jtwcornell91
Quote from: CU77
Quote from: jtwcornell91: it's not clear what the appropriate number of "fictitious games" is.  You might guess two, one, or even zero, but it's basically a measure of how much parity you expect.
Right, but two fictitious games is the minimum. This corresponds to a flat prior on win probability for each team. Less than two, and the prior sags in the middle and rises at the ends. This predicts that winning percentages will cluster around zero and one; in real life, this never ever happens; winning percentages (at the end of the season) always cluster around 0.5. Reproducing this in KRACH requires more than two fictitious games. And as you found in baseball, it can be a lot more than two.

I don't think there's necessarily a minimum.  In the simpler case of Bernoulli trials with a single probability, the Jeffreys prior splits the difference between the Bayes-Laplace uniform-in-probability and the Haldane uniform-in-log-odds-ratio, and corresponds to one rather than two "fictitious trials".  The distribution is peaked at p=0 and 1, but if you change variables to ln(p/(1-p)) it's peaked at zero (i.e., p=1/2).

If you consider a situation where there's no parity among the competitors (people drawn at random off the street arm-wrestling or something), you might imagine that if you grab a pair of people and have them compete again and again, one person is likely to win a high percentage of the time.  Winning percentages don't cluster at zero and one because schedules are constructed to match competitively comparable teams.  Consider this year's Olympic women's hockey: if they hadn't divided the eight teams into a "strong group" and a "weak group", you'd have had two teams (USA and CAN) that won about 92% of their games, four teams with winning percentages close to 50% (FIN, OAR, SUI and SWE), and two teams with long-term winning percentages around 8% (JPN and COR).  Something similar might have happened with Men's DI around 2000 when the MAAC teams became full DI, but their winning percentages were inflated because they mostly played each other.  (Compare winning percentage to RRWP in http://www.elynah.com/tbrw/tbrw.cgi?2000/rankings )

Also, note that the quantity that has a uniform prior with two fictitious games is not the expected winning percentage against a balanced schedule, but the expected winning percentage against an average team.  So in the extreme Haldane-like situation where the team's KRACH strengths are drawn from a uniform-in-log prior, you'd expect to get a situation where each team is infinitely better than the teams below them and infinitely worse than the teams above them.  That gives you a uniform distribution of winning percentages (the kth best team out of n beats the n-k teams below them and loses to the k-1 teams above, for a winning percentage of (n-k)/(n-1) = 1 - (k-1)/(n-1)), not one that clusters around 0 and 1.

tldr.

Swampy

Quote from: jtwcornell91
Quote from: CU77
Quote from: jtwcornell91: it's not clear what the appropriate number of "fictitious games" is.  You might guess two, one, or even zero, but it's basically a measure of how much parity you expect.
Right, but two fictitious games is the minimum. This corresponds to a flat prior on win probability for each team. Less than two, and the prior sags in the middle and rises at the ends. This predicts that winning percentages will cluster around zero and one; in real life, this never ever happens; winning percentages (at the end of the season) always cluster around 0.5. Reproducing this in KRACH requires more than two fictitious games. And as you found in baseball, it can be a lot more than two.

I don't think there's necessarily a minimum.  In the simpler case of Bernoulli trials with a single probability, the Jeffreys prior splits the difference between the Bayes-Laplace uniform-in-probability and the Haldane uniform-in-log-odds-ratio, and corresponds to one rather than two "fictitious trials".  The distribution is peaked at p=0 and 1, but if you change variables to ln(p/(1-p)) it's peaked at zero (i.e., p=1/2).

If you consider a situation where there's no parity among the competitors (people drawn at random off the street arm-wrestling or something), you might imagine that if you grab a pair of people and have them compete again and again, one person is likely to win a high percentage of the time.  Winning percentages don't cluster at zero and one because schedules are constructed to match competitively comparable teams.  Consider this year's Olympic women's hockey: if they hadn't divided the eight teams into a "strong group" and a "weak group", you'd have had two teams (USA and CAN) that won about 92% of their games, four teams with winning percentages close to 50% (FIN, OAR, SUI and SWE), and two teams with long-term winning percentages around 8% (JPN and COR).  Something similar might have happened with Men's DI around 2000 when the MAAC teams became full DI, but their winning percentages were inflated because they mostly played each other.  (Compare winning percentage to RRWP in http://www.elynah.com/tbrw/tbrw.cgi?2000/rankings )

Also, note that the quantity that has a uniform prior with two fictitious games is not the expected winning percentage against a balanced schedule, but the expected winning percentage against an average team.  So in the extreme Haldane-like situation where the team's KRACH strengths are drawn from a uniform-in-log prior, you'd expect to get a situation where each team is infinitely better than the teams below them and infinitely worse than the teams above them.  That gives you a uniform distribution of winning percentages (the kth best team out of n beats the n-k teams below them and loses to the k-1 teams above, for a winning percentage of (n-k)/(n-1) = 1 - ((k-1)/(n-1)), not one that clusters around 0 and 1.

FYP. (-5%)

Trotsky

Quote from: Swampy: That gives you a uniform distribution of winning percentages (the kth best team out of n beats the n-k teams below them and loses to the k-1 teams above, for a winning percentage of (n-k)/(n-1) = 1 - ((k-1)/(n-1)), not one that clusters around 0 and 1.

FYP. (-5%)

Wrong.  Read it again.  And don't tug on Superman's cape.

Swampy

Quote from: Trotsky
Quote from: jtwcornell91
Quote from: CU77
Quote from: jtwcornell91: it's not clear what the appropriate number of "fictitious games" is.  You might guess two, one, or even zero, but it's basically a measure of how much parity you expect.
Right, but two fictitious games is the minimum. This corresponds to a flat prior on win probability for each team. Less than two, and the prior sags in the middle and rises at the ends. This predicts that winning percentages will cluster around zero and one; in real life, this never ever happens; winning percentages (at the end of the season) always cluster around 0.5. Reproducing this in KRACH requires more than two fictitious games. And as you found in baseball, it can be a lot more than two.

I don't think there's necessarily a minimum.  In the simpler case of Bernoulli trials with a single probability, the Jeffreys prior splits the difference between the Bayes-Laplace uniform-in-probability and the Haldane uniform-in-log-odds-ratio, and corresponds to one rather than two "fictitious trials".  The distribution is peaked at p=0 and 1, but if you change variables to ln(p/(1-p)) it's peaked at zero (i.e., p=1/2).

If you consider a situation where there's no parity among the competitors (people drawn at random off the street arm-wrestling or something), you might imagine that if you grab a pair of people and have them compete again and again, one person is likely to win a high percentage of the time.  Winning percentages don't cluster at zero and one because schedules are constructed to match competitively comparable teams.  Consider this year's Olympic women's hockey: if they hadn't divided the eight teams into a "strong group" and a "weak group", you'd have had two teams (USA and CAN) that won about 92% of their games, four teams with winning percentages close to 50% (FIN, OAR, SUI and SWE), and two teams with long-term winning percentages around 8% (JPN and COR).  Something similar might have happened with Men's DI around 2000 when the MAAC teams became full DI, but their winning percentages were inflated because they mostly played each other.  (Compare winning percentage to RRWP in http://www.elynah.com/tbrw/tbrw.cgi?2000/rankings )

Also, note that the quantity that has a uniform prior with two fictitious games is not the expected winning percentage against a balanced schedule, but the expected winning percentage against an average team.  So in the extreme Haldane-like situation where the team's KRACH strengths are drawn from a uniform-in-log prior, you'd expect to get a situation where each team is infinitely better than the teams below them and infinitely worse than the teams above them.  That gives you a uniform distribution of winning percentages (the kth best team out of n beats the n-k teams below them and loses to the k-1 teams above, for a winning percentage of (n-k)/(n-1) = 1 - (k-1)/(n-1)), not one that clusters around 0 and 1.

tldr.

Unfortunately, jtw's comments do make mathematical sense. I must admit, I'm either unfamiliar with or have forgotten (when you get to be as old as me, you'll understand) Haldane, but it's certainly true that a log-odds transformation converts probabilities from a range of [0,1] to [-oo, +oo], with the probability distribution for two teams having an equal chance of winning having a peak at 0. The first part of that sentence does have an error in that the peak is at p=0.5, not p=0 and 1.

In my earlier comments on this subject I deliberately simplified things by tacitly assuming each team's probability of winning is independent of its opponent. Obviously this is wrong, and jtw is correct in pointing out that any good predictor of outcomes should take into account the two teams as a pair rather than treating the winner as independent of who the opponent is. Even more, I'd say it needs to take into account the styles of play in any given match-up. If two teams emphasize defense, but one is better at it than the other, then the better one would be more likely to win. OTOH, if the opponent is a high-scoring team with so-so defense, the game's more likely to be a toss-up.

Oh, BTW, thanks for the link. Everyone on this list knows Harvard sucks, but it's nice every once in a while to be reminded that even popular culture knows how pretentious the assholes that go to Harvard can be.

Swampy

Quote from: Trotsky
Quote from: Swampy: That gives you a uniform distribution of winning percentages (the kth best team out of n beats the n-k teams below them and loses to the k-1 teams above, for a winning percentage of (n-k)/(n-1) = 1 - ((k-1)/(n-1)), not one that clusters around 0 and 1.

FYP. (-5%)

Wrong.  Read it again.  And don't tug on Superman's cape.

Wait. I just counted parentheses: your original has two left parentheses and three right parentheses. How can this be correct, Kal El?