PWR Rank vs. KRACH Rank

Started by Scersk '97, February 12, 2005, 05:59:52 PM


Scersk '97

Oh, and since I mentioned it in the other thread, I'll post my simplistic comparison of the two ranking schemes at this moment.  For the record, I'm a huge believer in KRACH as a more accurate representation of team strength.  So I feel, for example, that Wisconsin being undervalued in PWR is really going to screw someone in the tournament:


As of 2/12/05 4:50 PM CST:

PWR                     KRACH                   Delta
=====================================================
12  Dartmouth           24  Dartmouth           -12
27  Alabama-Huntsville  34  Alabama-Huntsville  -7
24  Western Michigan    31  Western Michigan    -7
26  Bemidji State       32  Bemidji State       -6
20  Vermont             26  Vermont             -6
7   Harvard             12  Harvard             -5
2   Boston College      5   Boston College      -3
22  Brown               25  Brown               -3
4   Cornell             7   Cornell             -3
20  Nebraska-Omaha      23  Nebraska-Omaha      -3
19  Northeastern        21  Northeastern        -2
15  Colgate             16  Colgate             -1
9   Ohio State          10  Ohio State          -1
2   Colorado College    2   Colorado College     0
1   Denver              1   Denver               0
6   Michigan            6   Michigan             0
17  Michigan State      17  Michigan State       0
4   Minnesota           4   Minnesota            0
11  North Dakota        11  North Dakota         0
15  Northern Michigan   15  Northern Michigan    0
28  Bowling Green       27  Bowling Green        1
14  Mass.-Lowell        13  Mass.-Lowell         1
10  Boston University   8   Boston University    2
17  Maine               14  Maine                3
12  New Hampshire       9   New Hampshire        3
22  Minnesota State     18  Minnesota State      4
24  St. Cloud State     19  St. Cloud State      5
8   Wisconsin           3   Wisconsin            5


Well, someone besides just Wisconsin.  To my mind, there's just no way to deny that the WCHA has been dominant as a conference this year, and the way that the RPI is set up absolutely screws them for beating up on each other.  (Something we usually complain about in the ECAC.)

Oh, and way to go Big Green!
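
For anyone who wants to reproduce the Delta column: it is just PWR rank minus KRACH rank, sorted from most negative to most positive, so a negative number means PWR places the team higher than KRACH does. A minimal Python sketch, using a handful of rows copied from the table above (the function name `rank_deltas` is made up for illustration):

```python
# Ranks copied from a few rows of the table above: team -> (PWR rank, KRACH rank).
ranks = {
    "Dartmouth": (12, 24),
    "Cornell": (4, 7),
    "Denver": (1, 1),
    "Wisconsin": (8, 3),
}

def rank_deltas(ranks):
    """Return (team, delta) pairs with delta = PWR rank - KRACH rank, most negative first."""
    deltas = [(team, pwr - krach) for team, (pwr, krach) in ranks.items()]
    return sorted(deltas, key=lambda pair: pair[1])

for team, delta in rank_deltas(ranks):
    print(f"{team:12s} {delta:+d}")
```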

Robb

Very interesting way to look at it - nice!

I pretty much don't want to face ANY of the likely WCHA teams this year.  The way things are shaking out, it doesn't look like there are going to be any low-seeded WCHA teams (like the Mankato patsies of '03) except for UND.  I don't care what their ranking is - 7 titles says you don't want to face them in a 1-and-done.

Let's Go RED!

Chris 02

You might want to adjust this to compensate for ties in the PWR.  
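
One possible way to make that adjustment (an assumption about what is meant, not something spelled out in the thread) is to give tied teams the average of the places they occupy, so two teams tied for 2nd each count as 2.5. A rough Python sketch, with a hypothetical helper `average_tied_ranks` and the top of the PWR column from the table:

```python
from collections import defaultdict

def average_tied_ranks(ranks):
    """Map team -> rank, replacing each tied rank with the mean of the places the tie spans.

    Assumes "standard competition" ranks (1, 2, 2, 4, ...), as in the PWR column.
    """
    groups = defaultdict(list)
    for team, rank in ranks.items():
        groups[rank].append(team)
    adjusted = {}
    for rank, teams in groups.items():
        # A tie of n teams starting at place `rank` spans places rank .. rank + n - 1.
        mean_place = rank + (len(teams) - 1) / 2
        for team in teams:
            adjusted[team] = mean_place
    return adjusted

pwr = {"Denver": 1, "Boston College": 2, "Colorado College": 2, "Cornell": 4, "Minnesota": 4}
print(average_tied_ranks(pwr))
```

With that convention Cornell's PWR place becomes 4.5 rather than 4, which would shrink its delta against a KRACH rank of 7 from -3 to -2.5.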

Scersk '97

Yep, better make sure that my "simplistic" comparison is hyper-accurate.  ::rolleyes::

Ken '70

From a common sense standpoint PWR seems more accurate to me than KRACH.  First, it's possible to understand why PWR comes up with the ranking it does.  Even though you may grasp KRACH's method and agree with its logic, the window into why any two teams relate the way they do is not obvious.  This transparency, alone, is enough to give PWR a significant advantage.

Instead of just simulation of how two teams relate because of schedules and their own results, PWR provides direct data in the form of COP. It creates a virtual "league of the best" in the form of TUC.  Most potently, it includes H2H which is the most direct evidence of two teams' relative strength.

Look at WI vs Cornell for example.  KRACH flips them.  The logic for WI being "better", per KRACH, is SOS being better accounted for in that it values WI SOS to be twice that of Cornell's. In KRACH's theoretical world WI does, or will do, better against all good teams, and in round robin play will do better against the same teams.

But Cornell does have a better record against good teams (the top 50% of the KRACH list is virtually the same as TUC).  And Cornell does have a better record against common opponents.  

Why model reality when it's there to see in the first place?  KRACH instead of RPI? Fine.  KRACH instead of PWR? Silly.

Trotsky

[Q]Ken '70 Wrote:
Why model reality when it's there to see in the first place?[/q]

This is pithy and it captures the absurdity of a lot of abstruse analysis, but I don't think it applies here.  PWR and RPI are not models of reality, like scientific theories.  They are not explanatory nor, even when applied correctly, predictive.  They are standings for teams which don't play each other in a balanced schedule.

Although transparency is appealing when following the horserace, that has no bearing on how well the ranking system measures past performance and relative difficulty of opposition.

The optimal solution will (I'm an optimist) be (1) sound mathematically, (2) intuitive, and (3) transparent.  But until we get there, that is the prioritization.

Ken '70

[Q]Trotsky Wrote:

 The optimal solution will (I'm an optimist) be (1) sound mathematically, (2) intuitive, and (3) transparent.  But until we get there, that is the prioritization.[/q]


Then we agree...COP, TUC (KRACH-ified if you want), and H2H are mathematically sound, intuitive and transparent.  RPI is the bone of contention for some people, but not here.  I yielded to the KRACH heads a long time ago, and it's OK with me if only one of four selection components doesn't meet 2 of your 3 criteria since the rest do.  

jtwcornell91

[Q]Ken '70 Wrote:
 From a common sense standpoint PWR seems more accurate to me than KRACH.  First, it's possible to understand why PWR comes up with the ranking it does.  Even though you may grasp KRACH's method and agree with its logic, the window into why any two teams relate the way they do is not obvious.  This transparency, alone, is enough to give PWR a significant advantage.
[/Q]

PWR is a hodgepodge of criteria thrown together to handle coaches' and fans' grouses about being ranked below another team when anecdotal evidence makes them sound better.  By the same logic, why should we bother using RPI at all when it's not "obvious" why one team ends up above another based on record and strength of schedule?

[Q]
Instead of just simulation of how two teams relate because of schedules and their own results, PWR provides direct data in the form of COP. It creates a virtual "league of the best" in the form of TUC.  Most potently, it includes H2H which is the most direct evidence of two teams' relative strength.
[/Q]

Common opponents and head-to-head results are a more direct way to compare teams than games against the rest of the NCAA, but they deal with a much smaller sample.  How you perform in a few games can override your performance for the whole rest of the season.  And the problem with the TUC criterion is that it compares straight winning percentage, and it's possible to have very different schedule strengths within the subset of TUCs.  Just look at the PWCs from 1999, when Quinnipiac had the 5th best record in the nation vs Teams Under Consideration: http://slack.net/~whelan/tbrw/1999/details.990321

[Q]
Look at WI vs Cornell for example.  KRACH flips them.  The logic for WI being "better", per KRACH, is SOS being better accounted for in that it values WI SOS to be twice that of Cornell's. In KRACH's theoretical world WI does, or will do, better against all good teams, and in round robin play will do better against the same teams.
[/Q]

Well, the actual logic is that the set of KRACH ratings constructed using all the games played by all the teams in the NCAA gives the best fit to actual results.  The breakdown into SOS and winning ratio is just an illustrative tool.
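
To make "best fit to actual results" concrete: KRACH is a Bradley-Terry rating, in which the chance of team A beating team B is taken to be K_A / (K_A + K_B), and the fitted ratings are the ones for which every team's expected number of wins equals its actual number. Below is a minimal Python sketch of the usual fixed-point iteration. Several things in it are assumptions of the sketch rather than details of any actual ratings script: ties count as half a win, the three-team game list is invented for illustration, and the function names, normalization, and iteration count are arbitrary choices.

```python
from collections import defaultdict

def krach_ratings(games, iterations=500):
    """Fit Bradley-Terry (KRACH-style) ratings by fixed-point iteration.

    games: list of (team_a, team_b, points_a), where points_a is 1 for an
    A win, 0 for a B win, and 0.5 for a tie.
    """
    wins = defaultdict(float)                       # win-equivalents per team
    played = defaultdict(lambda: defaultdict(int))  # games between each pair
    for a, b, points_a in games:
        wins[a] += points_a
        wins[b] += 1.0 - points_a
        played[a][b] += 1
        played[b][a] += 1

    ratings = {team: 1.0 for team in played}
    for _ in range(iterations):
        new = {}
        for team in ratings:
            # Update rule: K_i = W_i / sum_j [ N_ij / (K_i + K_j) ]
            denom = sum(n / (ratings[team] + ratings[opp])
                        for opp, n in played[team].items())
            new[team] = wins[team] / denom
        # Ratings are only defined up to a common factor; fix the mean at 1.
        scale = len(new) / sum(new.values())
        ratings = {team: k * scale for team, k in new.items()}
    return ratings

def expected_wins(team, ratings, games):
    """Expected win total for `team` over its schedule under the fitted ratings."""
    total = 0.0
    for a, b, _ in games:
        if team in (a, b):
            opp = b if team == a else a
            total += ratings[team] / (ratings[team] + ratings[opp])
    return total

# Invented results for three hypothetical teams (not real data).
games = [
    ("A", "B", 1), ("A", "B", 1), ("A", "B", 0),
    ("B", "C", 1), ("B", "C", 0.5),
    ("A", "C", 1), ("A", "C", 0),
]
ratings = krach_ratings(games)
for team in sorted(ratings):
    print(team, round(ratings[team], 3), round(expected_wins(team, ratings, games), 3))
```

The expected win totals printed at the end should reproduce each team's actual record, which is the sense in which the ratings "fit" the results. Note that the iteration breaks down if a team is unbeaten or winless, since its rating would run off to infinity or zero.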

[Q]But Cornell does have a better record against good teams (the top 50% of the KRACH list is virtually the same as TUC).[/Q]

Except that Cornell actually has a worse record against teams with a RRWP above .500 (which is the obvious way to define TUC using KRACH instead of RPI): http://slack.net/~whelan/tbrw/2005/cgi-bin/rankings.cgi?dispPWR=true;PWCdetails=true;PCTweight=25;OPPweight=50;OOPweight=25;topqual=15;homebon=.0010;neutbon=.0020;roadbon=.0030;rpifudge=playoff;PWCtb=RPI;PWCtbwt=1;PWCh2hwt=1;PWCh2h=per%20game;PWCtucwt=1;TUCdefcrit=rrwp;TUCdefrel=ge;TUCdefcut=.500;PWCtuccrit=pct;PWCtucomit=true;PWClastwt=0;PWClastnum=16;PWClastcrit=pct;PWCcomwt=1;PWCcommingm=1;PWCcommintm=1;PWCcomcrit=pct;scoresel=current;scores=#cCrWi  Cornell is 8-4-3 and Wisconsin is 16-7-1.
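
For reference, here is the arithmetic behind those two records, counting a tie as half a win (the usual hockey convention, assumed here; `win_pct` is just an illustrative helper):

```python
def win_pct(wins, losses, ties):
    """Winning percentage with each tie counted as half a win."""
    games = wins + losses + ties
    return (wins + 0.5 * ties) / games

print(f"Cornell   8-4-3  -> {win_pct(8, 4, 3):.3f}")
print(f"Wisconsin 16-7-1 -> {win_pct(16, 7, 1):.3f}")
```

That works out to roughly .633 for Cornell against .688 for Wisconsin.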

And if we keep the current definition of a TUC but consider the strength of the TUCs actually played, Wisconsin has the advantage in that criterion as well:
http://slack.net/~whelan/tbrw/2005/cgi-bin/rankings.cgi?dispPWR=true;PWCdetails=true;PCTweight=25;OPPweight=50;OOPweight=25;topqual=15;homebon=.0010;neutbon=.0020;roadbon=.0030;rpifudge=playoff;PWCtb=RPI;PWCtbwt=1;PWCh2hwt=1;PWCh2h=per%20game;PWCtucwt=1;TUCdefcrit=rpi;TUCdefrel=ge;TUCdefcut=.500;PWCtuccrit=hhwp;PWCtucomit=true;PWClastwt=0;PWClastnum=16;PWClastcrit=pct;PWCcomwt=1;PWCcommingm=1;PWCcommintm=1;PWCcomcrit=pct;scoresel=current;scores=#cCrWi
If you want an anecdotal explanation for that, consider that of Cornell's 14 games against TUCs, only one was against a team in the top 10 of RPI or KRACH, while nine of Wisconsin's 18 games against TUCs were.  So it's hard to say Cornell winning 61% of their TUC games is better than Wisconsin winning 58% of theirs.

[Q]And Cornell does have a better record against common opponents.[/Q]

Based on four games by Cornell (2-0 vs Yale and 0-1-1 vs Michigan State) and two games by Wisconsin (0-0-1 vs Yale and 0-1 vs Michigan State).  Are those six games really as important as the teams' overall performance over the whole season?

[Q]Why model reality when it's there to see in the first place?  KRACH instead of RPI? Fine.  KRACH instead of PWR? Silly.[/q]

Except the other criteria in the PWR are handled even worse than RPI, which at least has some consideration of schedule strength built into it, even if it's a broken one.  If you really want to keep the hodgepodge of 1) overall performance 2) performance vs TUCs 3) performance vs common opponents and 4) each head-to-head win counts as much as any other criterion -- and I think #3 and #4 are giving too much weight to potentially just a few games -- then to do it right, you should 1) use KRACH instead of RPI and 2) define TUCs with KRACH instead of RPI and compare performance vs TUCs using the same strength-of-schedule method as KRACH.  (In principle, you should also consider strength of schedule in the common opponents criteria, since one team could play more games against stronger common opponents, but let's leave that and H2H as-is for simplicity.)  So program all those modifications into the DIY script (select c-HHWP to incorporate strength of schedule into any of the criteria that look at a subset of a team's games) and ...

http://slack.net/~whelan/tbrw/2005/cgi-bin/rankings.cgi?dispPWR=true;PWCdetails=true;PCTweight=25;OPPweight=50;OOPweight=25;topqual=15;homebon=.0010;neutbon=.0020;roadbon=.0030;rpifudge=playoff;PWCtb=RRWP;PWCtbwt=1;PWCh2hwt=1;PWCh2h=per%20game;PWCtucwt=1;TUCdefcrit=rrwp;TUCdefrel=ge;TUCdefcut=.500;PWCtuccrit=hhwp;PWCtucomit=true;PWClastwt=0;PWClastnum=16;PWClastcrit=pct;PWCcomwt=1;PWCcommingm=1;PWCcommintm=1;PWCcomcrit=pct;scoresel=current;scores=

Cornell is still #7.  In particular, we lose the comparison to Wisconsin because our KRACH is lower (although our RPI is higher) and our performance against TUCs is worse for the reasons explained above.

Similar recipes for how to improve PWR are spelled out in http://slack.net/~whelan/tbrw/tbrw.cgi?kpairwise and in fact the system we proposed then looks like this with the DIY script: http://slack.net/~whelan/tbrw/2005/cgi-bin/rankings.cgi?dispPWR=true;PWCdetails=true;PCTweight=25;OPPweight=50;OOPweight=25;topqual=15;homebon=.0010;neutbon=.0020;roadbon=.0030;rpifudge=playoff;PWCtb=RRWP;PWCtbwt=1;PWCh2hwt=1;PWCh2h=per%20game;PWCtucwt=1;TUCdefcrit=pct;TUCdefrel=ge;TUCdefcut=.500;PWCtuccrit=hhwp;PWCtucomit=true;PWClastwt=1;PWClastnum=16;PWClastcrit=hhwp;PWCcomwt=1;PWCcommingm=1;PWCcommintm=1;PWCcomcrit=hhwp;scoresel=current;scores=

jtwcornell91

Another reason why I think the NCAA ought to drop the pairwise comparison system: the committee no longer applies it as it was intended.  The pairwise comparison system tells you how team A relates to team B, so totalling up team A's comparisons with teams B, C, ... Z and team B's comparisons with teams A, C, ... Z is primarily useful for getting an overall feel for where the bubble is, but then you're supposed, as the system was originally conceived, to look at the comparisons among teams actually in the running for an at-large bid or a seed.  The Joe Marsh committees did this, and the old You Are The Committee scripts let you walk through the process: http://slack.net/~whelan/tbrw/tbrw.cgi?2002/tourney

But lately, it's clear that the committee just looks at the ordering of teams by PWR and uses those 1-16 seeds to rank the teams.  The clearest case was in 2003, when Ohio State and Harvard got 3 seeds and SCSU and Mankato got 4 seeds because they were 11, 12, 13, and 14 in the PWR, with 17, 16, 16, and 15 comparisons won, respectively.  (Harvard wins the comparison with SCSU and has a higher RPI, so whichever one is the "tiebreaker" they get it.)  But those overall PWR numbers include comparisons against teams that didn't make the field of 16.  If you actually use the comparisons among tournament teams to seed the field, the number of comparisons won is 4, 4, 4, and 2.  OSU, Harvard, and SCSU are in a rock-scissors-paper tie (OSU beats Harvard beats SCSU beats OSU), so we should go to the RPI to resolve it, and the order on that is Harvard, SCSU, OSU, which means OSU and Mankato should be 4 seeds and Harvard and SCSU should be 3 seeds.  (In effect, St. Cloud got a 4 seed because they lost the pairwise comparison with Dartmouth, a team not even in the field.)  So in fact that whole business with Cornell getting Mankato in the first round would have been avoided if the committee had used the PWCs as they were used in the past.  (See http://lists.maine.edu/cgi/wa?A2=ind0303&L=hockey-l&D=0&F=P&P=24721&F= for more.)
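
A quick Python sketch of that re-seeding, using only the numbers quoted above: comparisons won within the field (4, 4, 4, and 2) and the RPI order Harvard, SCSU, OSU. Two simplifications are assumptions of the sketch: ties in comparisons won go straight to RPI (reasonable here because the three-way tie is a cycle), and Mankato's RPI position is a placeholder, since it never comes into play.

```python
# Comparisons won against other tournament teams, as quoted above.
in_field_wins = {
    "Ohio State": 4,
    "Harvard": 4,
    "St. Cloud State": 4,
    "Minnesota State": 2,
}

# Relative RPI order among these teams (1 = best). Mankato's value is a placeholder.
rpi_order = {
    "Harvard": 1,
    "St. Cloud State": 2,
    "Ohio State": 3,
    "Minnesota State": 4,
}

def seed_order(wins, rpi):
    """Sort by comparisons won (descending), then by RPI order (ascending)."""
    return sorted(wins, key=lambda team: (-wins[team], rpi[team]))

order = seed_order(in_field_wins, rpi_order)
three_seeds, four_seeds = order[:2], order[2:]
print("3 seeds:", three_seeds)
print("4 seeds:", four_seeds)
```

This puts Harvard and SCSU on the 3 line and OSU and Mankato on the 4 line, matching the ordering argued for above.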

Having a pairwise comparison system is fine (as long as the criteria are sensible) but then it needs to get used as a pairwise comparison system.  If you want an overall ranking (which is what the current committee seems to want), use a system that gives you an overall ranking.

ugarte

[Q]Ken '70 Wrote: This transparency, alone, is enough to give PWR a significant advantage.[/q]Win% is transparent. That doesn't make it particularly explanatory when the schedule is unbalanced. Hockey fans (and, for that matter, the selection committee) have shown themselves to be particularly tolerant of statistical analysis to measure relative strength. If KRACH were explained to the community at large I think the neutrality of the methodology would be considered transparent even if the audience (like, say, me) couldn't figure it out themselves with a supercomputer and a truckload of slide rules.

[q]Instead of just simulation of how two teams relate because of schedules and their own results, PWR provides direct data in the form of COP. It creates a virtual "league of the best" in the form of TUC.  Most potently, it includes H2H which is the most direct evidence of two teams' relative strength.[/q]As John said, the sample sizes are so small that they are meaningless for anything except harassing a fan of the other team in a bar. All I can think of is Bentley beat Quinnipiac ... who beat Denver, so Bentley should be #1! The real heavy lifting is in John's response.

[q]Why model reality when it's there to see in the first place?  KRACH instead of RPI? Fine.  KRACH instead of PWR? Silly.[/q]Reality is in the eye of the beholder. If you think that one tie between Yale and Wisconsin is more representative of the relative strengths of those teams (and how we relate to both of them) than a recursive analysis of all games played by all teams, this isn't a debate. It is a clashing of realities.

The Rancor

PWR and KRACH are both good for determining seeding for the tournament but can't really show the intangible elements of a team. For example, after months of play and practice Cornell is a way better team than they were in October, by virtue of playing the season together, having a low occurrence of injury, etc. Also, they are clicking, winning, and 'on a roll'; mental momentum is a big factor. Could the February team kick Michigan State's ass? Yes, I think so, even if the October team couldn't. Also, State is a slightly different team at this point in the season, I'd say worse now than then. But that's what 'polls' measure, I guess, so that's why we need both.

KeithK

[q]But lately, it's clear that the committee just looks at the ordering of teams by PWR and uses those 1-16 seeds to rank the teams.[/q]I think this is a case of USCHO doing too good of a job publicizing the Pairwise.  As far as I know, the coaches didn't sit down and come up with the PWR table as it is presented on USCHO.  They did comparisons of pairs of teams the way John described.  Then USCHO presents them in tabular form and people get the mistaken impression that it's a ranking system.  The committee membership changes and the new guys start using it in the "improper" (that is, non-original) way because that is the general perception among fans and because that's more similar to the way most NCAA sports' selection processes work (RPI).

Ken '70

[Q]jtwcornell91 Wrote:

   If you really want to keep the hodgepodge of 1) overall performance 2) performance vs TUCs 3) performance vs common opponents and 4) each head-to-head win counts as much as any other criterion -- you should 1) use KRACH instead of RPI and 2) define TUCs with KRACH instead of RPI and compare performance vs TUCs using the same strength-of-schedule method as KRACH.  (In principle, you should also consider strength of schedule in the common opponents criteria, since one team could play more games against stronger common opponents, but let's leave that and H2H as-is for simplicity.)  So program all those modifications into the DIY script (select c-HHWP to incorporate strength of schedule into any of the criteria that look at a subset of a team's games) and ...



Cornell is still #7.  In particular, we lose the comparison to Wisconsin because our KRACH is lower (although our RPI is higher) and our performance against TUCs is worse for the reasons explained above.

[/q]

Great!  We can combine advanced, recursive formulae with reality and get...an even better simulation of reality than we had before.  Works for me.


Jim Hyla

[Q]Ken '70 Wrote: Great!  We can combine advanced, recursive formulae with reality and get...an even better simulation of reality than we had before.  Works for me.
[/q]Define your reality, please. ::twitch::
"Cornell Fans Made the Timbers Tremble", Boston Globe, March/1970
Cornell lawyers stopped the candy throwing. Jan/2005

Ken '70

[Q]Jim Hyla Wrote:

 [Q2]Ken '70 Wrote: Great!  We can combine advanced, recursive formulae with reality and get...an even better simulation of reality than we had before.  Works for me.
[/Q]
Define your reality, please.[/q]

MN beat WI 3 of 4 this year = reality
KRACH says WI is better than MN = unreality

Got it?