RPI primer (brackets for dummies): Why #2/#3 Cornell may seed lower

Newman · February 28, 2005, 05:24:00 PM

The covariance in KRACH is an interesting problem. I think it's pretty clear that goal scoring shouldn't be a factor; hockey is about win or lose, not racking up the goals - otherwise Cornell would be ranked pretty darn low. The Bradley-Terry underlying KRACH is a logit model, so it shouldn't be difficult to dig into this and find how to do it. It might also be interesting to expand KRACH to a random parameters model, on the premise that teams can sometimes play well or poorly. I'm not sure there's enough information in a season of wins and losses to calculate good parameters, but it's certainly worth some investigation. (As if I need any more projects.)
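For anyone who wants to poke at this, here is a minimal Python sketch of the Bradley-Terry fit that KRACH rests on. It is an illustration only, not the code behind any published KRACH numbers: the function name `fit_krach`, the toy game list, and the choice to ignore ties are all assumptions made for the example. It uses the usual fixed-point update (each rating becomes wins divided by the sum over opponents of games / (own rating + opponent rating)) and rescales so the average rating is 100.

```python
def fit_krach(games, iters=500):
    """Fit Bradley-Terry ratings from a list of (winner, loser) pairs.

    Assumptions for this sketch: no ties, and every team has at least
    one win and one loss so the ratings stay finite and positive.
    """
    teams = sorted({team for game in games for team in game})
    wins = {t: 0.0 for t in teams}
    played = {}  # (team, opponent) -> games played, stored in both directions
    for winner, loser in games:
        wins[winner] += 1
        played[(winner, loser)] = played.get((winner, loser), 0) + 1
        played[(loser, winner)] = played.get((loser, winner), 0) + 1

    k = {t: 1.0 for t in teams}
    for _ in range(iters):
        new = {}
        for t in teams:
            denom = sum(n / (k[t] + k[opp])
                        for (team, opp), n in played.items() if team == t)
            new[t] = wins[t] / denom
        # Only ratios matter, so pin the average rating at 100.
        scale = 100.0 * len(teams) / sum(new.values())
        k = {t: v * scale for t, v in new.items()}
    return k


# Toy example: A beats B twice, B beats A once.
ratings = fit_krach([("A", "B"), ("A", "B"), ("B", "A")])
print(ratings)
```

With two teams and a 2-1 head-to-head record, the fitted ratio of A's rating to B's comes out to 2, which is just the 2:1 record restated.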

As for the "[Pairwise] For Dummies" concept that started this thread, I'd offer this basically non-math way to think about the pairwise rankings: The difference between your team (not just Cornell, but basically every team) ranking as number 4, 14, or 24 in the pairwise is based on the quality and skill of your team. The difference between 4 and 5 (a la, "earning a one seed"), or between 14 and 15 (a la, "making the tournament") is based more on the vagaries of the records of other teams, especially teams with records near .500, even teams which your team did not play this year. This is a principal reason why people complain about this system.

billhoward · February 28, 2005, 05:44:10 PM

Perhaps goals scored should not be a factor, nor margin of victory. But one might allow a bonus for margin of victory up to say 3 or 4 goals and that would decrease the desire to roll up the score to move up the rankings.

The very best sports mathematicians are not, I bet, on East Hill or at MIT (no offense), but probably in Vegas, because their livelihoods ride on it. And they consider MOV in their rankings of teams but they also shut off victories that are more than X touchdowns or Y basketball points. I think you should not count empty net goals in MOV.

MOV calculations would hurt Cornell unless there was some percentage-MOV, meaning a 2-0 win is more powerful than a 4-2 win.
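To make the two ideas concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the cap of 3 goals, the function names, and reading "percentage-MOV" as the winner's share of all goals scored. Nobody in this thread has pinned down an actual formula.

```python
def capped_margin(goals_for, goals_against, cap=3, empty_net_goals=0):
    """Margin of victory with empty-net goals removed and a ceiling applied.

    The cap of 3 is a hypothetical value; empty_net_goals is the number of
    the winner's goals scored into an empty net.
    """
    margin = goals_for - empty_net_goals - goals_against
    return min(margin, cap)


def goal_share(goals_for, goals_against):
    """One possible 'percentage-MOV': the winner's share of all goals."""
    return goals_for / (goals_for + goals_against)


print(capped_margin(7, 1))                      # a blowout counts only up to the cap
print(capped_margin(4, 2, empty_net_goals=1))   # the empty-netter is not counted
print(goal_share(2, 0), goal_share(4, 2))       # 2-0 rates higher than 4-2
```

Under the goal-share reading, a 2-0 win and a 4-2 win have the same raw margin but the shutout gets the higher score, which is the property described above.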

No one else has mentioned, I don't believe, that Cornell has so few out of conference games it can directly manipulate: two weaklings to start the year, Michigan State (two), then Florida Classic (two more). So for instance that one loss to BC has (I think) an effect on our rankings greater than 1/29 of the games we play.

jkahn · February 28, 2005, 05:48:32 PM

[Q]elliotb Wrote:

The TUC business is essentially a "weighted" winning percentage, where a game gets a weight of 1 if it's against a TUC and a weight of 0 otherwise. A natural alternative, which would eliminate the problem of teams popping in and out of TUC status would be to make the weights a continuous function of the opponent's RPI.

- Elliot[/q]

While I agree that weighted is better than 1 or 0, I personally don't like the TUC part of the PWR. Is losing two games to Mich. Tech but winning two from Wisc. (2-0 vs. TUCs) really better than losing two to Wisconsin and winning two from Mich. Tech (0-2 vs. TUCs)? Perhaps if we need to include a concept like TUC, it should be each team's wins vs. TUCs and losses vs. non-TUCs. Just as a win vs. a good team is better than a win vs. a weak team, it seems like a loss to a bad team should carry more weight than a loss to a good team (preferably using weighting as described above, but using 1-RPI for losses). However, as the ECAC has more non-TUCs than say the WCHA, there would be more chances for non-TUC losses, so this would not in general be good for the ECAC (although it would be excellent for us this year).
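A small Python sketch of the comparison, for what it's worth. The RPI values for Wisconsin and Michigan Tech are made up, the .500 cutoff for TUC status is an assumption, and `weighted_pct` is just one way to read "weight wins by RPI and losses by 1-RPI"; none of this is an official formula.

```python
TUC_CUTOFF = 0.500  # assumption: a TUC is an opponent with RPI at or above .500


def tuc_pct(results, cutoff=TUC_CUTOFF):
    """Binary weights: only games against TUCs count, each with weight 1."""
    counted = [won for opp_rpi, won in results if opp_rpi >= cutoff]
    return sum(counted) / len(counted) if counted else None


def weighted_pct(results):
    """Continuous weights: a win counts the opponent's RPI,
    a loss counts (1 - opponent's RPI)."""
    win_weight = sum(rpi for rpi, won in results if won)
    loss_weight = sum(1.0 - rpi for rpi, won in results if not won)
    return win_weight / (win_weight + loss_weight)


WISC, TECH = 0.58, 0.45  # hypothetical RPIs, not real 2005 numbers

# Each game is (opponent RPI, did we win?).
beat_wisc_lost_to_tech = [(WISC, True), (WISC, True), (TECH, False), (TECH, False)]
beat_tech_lost_to_wisc = [(TECH, True), (TECH, True), (WISC, False), (WISC, False)]

for label, results in [("beat Wisc, lost to Tech", beat_wisc_lost_to_tech),
                       ("beat Tech, lost to Wisc", beat_tech_lost_to_wisc)]:
    print(label, tuc_pct(results), round(weighted_pct(results), 3))
```

With these made-up RPIs the binary TUC record is perfect in one case and winless in the other, while the weighted version rates the two splits almost identically.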

Will · February 28, 2005, 06:01:18 PM

[Q]billhoward Wrote:

two weaklings to start the year, Michigan State (two), then Florida Classic (two more). [/q]

You forgot our Thanksgiving "weakling", Canisius.

LarryW · February 28, 2005, 06:17:05 PM

I am not saying goals should be a factor in the rating. But if you want to ask the question, "What does it mean that Podunck U is ranked #10 with a rating of 369.9 and State U is ranked #11 with a rating of 361.6," then you need to know something about whether that ratio is meaningful. If those two teams had played 1000 games against each other, you might conclude that the difference was significant (in the scientific sense). If those two teams have each played 10 games, against very different competition, you might conclude differently. If those two teams are North Dakota and Ohio State in the 2004-05 season, you might want to quantify if that difference has any significance or not.

Now, how do you do that? I'm suggesting that a game won 6-1 might have more information in it, than one won 2-1. Maybe. Not in a "determine the rating" kind of way, but maybe in a "How definitive are the ratings" kind of way. Writing that, I see a very slippery slope here, so perhaps not. But, I throw it out there anyway.

elliotb · February 28, 2005, 06:32:26 PM

[Q]LarryW Wrote:

To measure a variance requires you to define the extent to which the initial knowledge is imperfect. But, to what extent is the W-L-T(1-0-0.5) info of a given game imperfect? 20%, 10%, does it depend on the score?[/q]

What you're suggesting is considerably more complicated than what I had in mind. I was taking for granted that each game is a 1/0 outcome -- plus ties, of course. The Bradley-Terry model is based on that assumption, so if you question that, you're off thinking about different models.

My point was just that within the context of the existing model you can calculate standard errors for the parameter estimates and use them to get things like confidence intervals for the KRACH ratings. Even something as simple as a graph of 95% confidence intervals for each team's KRACH rating would give a nice picture of which teams are truly different and which are statistically indistinguishable.
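As a rough sketch of what that would look like in Python: since the ratings are positive and only their ratios matter, I'm assuming the standard error is estimated for the log of the rating, so the interval is multiplicative rather than plus-or-minus. The ratings and standard errors below are hypothetical, and the function names are mine.

```python
import math


def log_scale_interval(rating, se_log, z=1.96):
    """Confidence interval for a rating whose standard error is on the log scale.

    z = 1.96 gives an approximate 95% interval.
    """
    half_width = z * se_log
    return rating * math.exp(-half_width), rating * math.exp(half_width)


def overlaps(a, b):
    """True if two (lower, upper) intervals share any values."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical teams: ratings of 400 and 300, each with a log-scale SE of 0.7.
team_x = log_scale_interval(400.0, 0.7)
team_y = log_scale_interval(300.0, 0.7)
print(team_x, team_y, overlaps(team_x, team_y))
```

Overlapping intervals are only an eyeball check. A proper test of whether two teams differ also needs the covariance between their estimates.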

billhoward · February 28, 2005, 09:41:23 PM

Wait, wasn't Bradley-Terry the GOP/Bible Belt bill trying to keep Planned Parenthood from advising girls under 18 ... in the belief that with no contraception, they'd stay celibate?

billhoward · February 28, 2005, 09:45:15 PM

All this discussion might, just might, in some small way move the NCAA to think about more sophisticated statistical means of comparing teams. A columnist trolling these forums gets a column idea ... the column gets printed ... someone helps read the big words to them and eventually the NCAA gets religion.

Once you start to use a statistical rating as part of the bid-selection process, you (NCAA) kind of have to ask whether there's a better statistical tool. But like Detroit automakers, the NIH (not invented here, so it can't be any good) syndrome probably holds forth.

Dart~Ben · February 28, 2005, 10:20:14 PM

A Vegas spread is not based on what the bookie thinks the margin will be. It's based on what the bookie thinks will get even betting on both teams. The difference might be small, but there is a difference.

ugarte · February 28, 2005, 11:46:49 PM

[Q]billhoward Wrote:

All this discussion might, just might, in some small way move the NCAA to think about more sophisticated statistical means of comparing teams. A columnist trolling these forums gets a column idea ... the column gets printed ... someone helps read the big words to them and eventually the NCAA gets religion.

Once you start to use a statistical rating as part of the bid-selection process, you (NCAA) kind of have to ask whether there's a better statistical tool. But like Detroit automakers, the NIH (not invented here, so it can't be any good) syndrome probably holds forth. [/q]

I don't know what you are talking about, Bill. Adam has been begging the NCAA to use KRACH from his USCHO pulpit for a long time. If you'll notice, there is a link to KRACH on the site AND a current analysis singing its praises.

I am giving myself a three day moratorium on responding to your posts before you start thinking I am obsessed.

Newman · March 01, 2005, 01:23:18 AM

The English version:

KRACH is not a holy grail of rankings. It may be slightly better than pairwise, but a season of about 30-35 games per team doesn't conclusively prove who's the best, or what the overall rankings should be. So when there are apparent "major" aberrations, such as Dartmouth ranked 9th in Pairwise but 24th in KRACH, these differences are not really that huge, since the differences between adjacent teams in the rankings are minuscule. In fact, any objective ranking criterion that gets the teams in an order that is anything close to reasonable would probably be mathematically unrejectable.

Put another way, the existence of KRACH does not prove Pairwise invalid. There is a season length, probably on the order of hundreds of games per team with plenty of inter-conference play, at which KRACH would invalidate Pairwise, but the actual college hockey season doesn't meet that criterion.

This is, in part, why we have playoffs. Every college hockey team but one (the cellar-dweller of Hockey East) makes their conference playoffs, and thus has one last opportunity to win out and be national champions. Therefore, people should treat the playoffs as we did when we got Mankato in the first round two years ago; the system may hand you tough opponents or easy ones, but you've got to be able to beat any team on any day.

The mathematical version:
(If you don't know what the standard error of the estimate is, you won't lose anything by stopping now)
I've figured out the necessary transformations to get a covariance matrix for KRACH. Since KRACH is purely a system of ratios, it isn't possible to get a variance/covariance matrix for the whole league at once. This is solved for the scores themselves by arbitrarily setting the average to 100, but it doesn't work so neatly for variance, so I set Cornell's variance to zero and let all other teams vary compared to us. I've posted an Excel workbook at http://pubweb.northwestern.edu/~jpn714/KRACH_Analysis_022805.xls that has most of the data, including covariance tables and Hessians.
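For the curious, here is a sketch in Python of one standard way to get such a covariance matrix. It is not necessarily what the workbook does. Assumptions: ratings are modeled on the log scale, the information matrix is the usual Bradley-Terry one (games times p times 1-p for each pairing), and the reference team's row and column are dropped before inverting, which is what pinning its variance at zero amounts to. The team names, ratings, and schedule are invented.

```python
import math


def information_matrix(teams, ratings, n_games):
    """Fisher information for log-ratings under Bradley-Terry.

    n_games[(a, b)] is the number of games between a and b, one entry per pair.
    """
    idx = {t: i for i, t in enumerate(teams)}
    info = [[0.0] * len(teams) for _ in teams]
    for (a, b), n in n_games.items():
        p = ratings[a] / (ratings[a] + ratings[b])  # P(a beats b)
        w = n * p * (1.0 - p)
        i, j = idx[a], idx[b]
        info[i][i] += w
        info[j][j] += w
        info[i][j] -= w
        info[j][i] -= w
    return info


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (fine for small matrices)."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def covariance_vs_reference(teams, ratings, n_games, reference):
    """Covariance of log(rating / reference rating) for every other team."""
    info = information_matrix(teams, ratings, n_games)
    keep = [i for i, t in enumerate(teams) if t != reference]
    reduced = [[info[i][j] for j in keep] for i in keep]
    return [teams[i] for i in keep], invert(reduced)


# Invented three-team league; "A" plays the role of the reference team.
teams = ["A", "B", "C"]
ratings = {"A": 200.0, "B": 100.0, "C": 50.0}
n_games = {("A", "B"): 4, ("B", "C"): 4, ("A", "C"): 2}
names, cov = covariance_vs_reference(teams, ratings, n_games, "A")
for i, name in enumerate(names):
    print(name, "SE of log rating vs A:", round(math.sqrt(cov[i][i]), 3))
```

The full information matrix has rows that sum to zero, which is why it can't be inverted for the whole league at once and a reference team is needed.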

To sum up the results of the analysis, all at a 95% confidence level:

> Teams 1-15 in KRACH have a statistical claim on being #1 at a 95% confidence level (which, ignoring the fact that KRACH has little to do with tournament selection, makes having a 16 team tournament rather an auspicious size).
> Cornell has a statistically significantly better rating than all teams from #30 SLU downwards at a 95% confidence level.
> The standard errors of the estimates are relatively consistent across teams, although they're smaller for WCHA teams, probably due to longer schedules and more in-conference play. WCHA covariances with other teams are also notably smaller.
> Generally the 95% confidence interval will allow teams to move up or down roughly 15 ranks on the list, or a little more towards the middle of the pack, although this isn't constant across teams.
> It looks like the most important games for a team are the ones played against others close to them in rank, whether in conference or not.


KRACH with 95% Confidence Intervals

Rank  Team                 KRACH     Lower     Upper
   1  ColoradoCollege     954.75    218.30   4175.70
   2  Denver              794.61    188.06   3357.42
   3  Minnesota           601.30    147.85   2445.53
   4  Wisconsin           557.28    132.53   2343.28
   5  BostonCollege       520.47    135.92   1992.94
   6  Michigan            519.19    129.15   2087.12
   7  Cornell             479.60       n/a       n/a
   8  NewHampshire        417.49    109.83   1587.00
   9  BostonUniversity    397.14    105.27   1498.20
  10  NorthDakota         370.03     91.44   1497.41
  11  OhioState           361.70     89.71   1458.27
  12  MassLowell          315.65     82.24   1211.51
  13  Maine               284.07     78.32   1030.37
  14  Harvard             279.72     78.48    996.93
  15  NorthernMichigan    276.75     71.24   1075.04
  16  MinnesotaDuluth     227.32     57.31    901.66
  17  Northeastern        206.09     55.53    764.77
  18  AlaskaAnchorage     204.62     48.98    854.76
  19  Colgate             204.19     58.17    716.74
  20  MichiganState       203.56     55.36    748.57
  21  MinnesotaState      201.81     49.60    821.04
  22  Vermont             186.25     54.23    639.72
  23  StCloudState        184.26     46.08    736.74
  24  Dartmouth           183.34     51.41    653.83
  25  NebraskaOmaha       164.22     42.21    638.91
  26  BowlingGreen        159.41     41.18    617.09
  27  MichiganTech        144.17     34.60    600.75
  28  Miami               136.63     35.54    525.18
  29  AlaskaFairbanks     126.27     31.57    505.11
  30  StLawrence          114.36     33.29    392.78
  31  Brown               107.66     29.86    388.17
  32  BemidjiState         92.67     22.91    374.87
  33  WesternMichigan      92.28     23.47    362.91
  34  FerrisState          82.78     21.24    322.61
  35  Massachusetts        81.78     20.80    321.50
  36  LakeSuperior         81.40     21.30    311.13
  37  AlabamaHuntsville    78.64     18.85    328.06
  38  Providence           69.97     18.05    271.27
  39  Clarkson             51.62     14.43    184.71
  40  NotreDame            50.32     12.33    205.38
  41  Union                48.10     13.20    175.24
  42  Rensselaer           44.57     12.38    160.40
  43  Niagara              43.21     10.89    171.40
  44  Merrimack            41.68     10.56    164.50
  45  WayneState           38.77      9.96    150.96
  46  Princeton            34.47      8.93    133.06
  47  AirForce             21.23      4.99     90.36
  48  Yale                 18.40      4.23     80.05
  49  Quinnipiac           16.78      3.77     74.81
  50  HolyCross            15.78      3.66     68.10
  51  Canisius             14.76      3.35     65.06
  52  Mercyhurst           12.45      2.79     55.60
  53  RobertMorris         12.34      2.83     53.76
  54  SacredHeart          10.36      2.28     47.14
  55  Connecticut           8.04      1.85     35.00
  56  Bentley               4.94      1.06     22.93
  57  Army                  3.96      0.81     19.33
  58  AmericanIntl          2.58      0.51     13.10
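A quick way to read the table, sketched in Python with a few rows copied from it. The reading assumed here is that each interval describes a team's uncertainty relative to Cornell (since Cornell's variance was pinned at zero), so a team differs from Cornell at the 95% level when Cornell's 479.60 falls outside that team's interval. The helper names are mine, and the implied standard error assumes the intervals are symmetric on the log scale with z = 1.96.

```python
import math

CORNELL = 479.60  # Cornell's KRACH from the table

# (team, KRACH, lower, upper), copied from the table
rows = [
    ("ColoradoCollege", 954.75, 218.30, 4175.70),
    ("AlaskaFairbanks", 126.27, 31.57, 505.11),
    ("StLawrence", 114.36, 33.29, 392.78),
    ("Brown", 107.66, 29.86, 388.17),
]


def implied_log_se(lower, upper, z=1.96):
    """Back out the log-scale standard error from a (lower, upper) interval."""
    return math.log(upper / lower) / (2 * z)


def differs_from_reference(lower, upper, reference=CORNELL):
    """True if the reference rating lies outside the team's interval."""
    return reference < lower or reference > upper


for team, krach, lower, upper in rows:
    print(team,
          "implied SE:", round(implied_log_se(lower, upper), 2),
          "differs from Cornell:", differs_from_reference(lower, upper))
```

On these rows the check flags St. Lawrence and Brown but not Alaska Fairbanks or Colorado College, which lines up with the #30 cutoff in the summary. Each rating also sits at roughly the geometric mean of its bounds, which is what a log-scale interval would produce.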

puff · March 01, 2005, 01:38:22 AM

Impressive looking, but to be completely honest it's been too long since I took stats freshman year. It all just kinda blew over my head. Hopefully given some more time I'll sort this out; I feel slow on the uptake these days ::snore::

-tewinks
::stupid::

Beeeej · March 01, 2005, 08:42:28 AM

What I got out of Newman's post:

Blah blah blah blah blah blah
Blah blah blah blah blah
Blah blah blah blah blah

Can't you talk about something really useful, like the motion practice aspects or fourth amendment implications of KRACH, or something? :-O

Beeeej

DeltaOne81 · March 01, 2005, 09:52:09 AM

What I got out of the last part of Beeeej's post:
blah blah blah blah
blah blah blah blah
amendment blah blah KRACH
blah blah

:-P ::nut::

ninian '72 · March 01, 2005, 10:28:55 AM

Once you start to consider such factors as goal differentials, you introduce a host of new assumptions. And before you can start to sort these out, you need to decide what the various rating systems are intended to do. Are they an index of some abstract notion of team strength or quality? Or perhaps the likelihood of getting a W on any given night? Whatever the purpose, there are a lot of contextual subtleties that won't be picked up by any index as now constructed or by any attempt to incorporate information on goal differences. Hypothetical example: Two Western teams play a bombs-away goal fest and end with a score of 8-7. Cornell cranks out its usual 2-1 type of ECACHL win the same night. If these two sets of teams played each other ten times, how likely would it be for the outcomes (winning team) to be the same? I'd guess that the Western teams would be more likely to have .500 records in such a series, while Cornell would be more likely to have a record better than that, because the one goal differential is the result of the system they play. To the extent that there are differences in teams' style of play and that these differences result in different margins of victory, it's not clear that incorporating goals for/against information is going to do anything other than add another systematic source of error to the pot. Perhaps adding an interaction of "goals for/against ratio" x win% to a logit model might help, but I'm not sure this goes far enough.