NCAA Lacrosse Bradley-Terry

Started by jtwcornell91, May 08, 2007, 04:50:44 PM

jtwcornell91

I finally got around to calculating Bradley-Terry ratings (known in the college hockey context as KRACH) for NCAA D1 lacrosse.  Columns below are the BT rating, the expected round-robin winning percentage (RRWP), the record, the win ratio (W/L), and the strength of schedule (SOS), which is just the BT rating divided by the win ratio.  As I'm sure most of us know, straight-up BT puts Cornell, as an undefeated team with a non-insular schedule, at the top with an infinite rating (there's a quick code sketch of the iteration below the table):

  # Team                BT    RRWP  W- L   W/L   SOS
  1 Cornell             infin 1.00 13- 0 infin  N/A
  2 Duke                22032 .961 14- 2 7.000  3147
  3 Virginia            6357  .918 12- 3 4.000  1589
  4 Georgetown          3714  .889 11- 2 5.500 675.3
  5 Johns Hopkins       3005  .876  9- 4 2.250  1336
  6 Maryland            2153  .852 10- 5 2.000  1076
  7 Albany              2016  .848 14- 2 7.000 288.1
  8 Navy                1769  .837 11- 3 3.667 482.5
  9 Princeton           1752  .837 10- 3 3.333 525.5
 10 North Carolina      1695  .834  9- 5 1.800 941.7
 11 Notre Dame          855.7 .774 11- 3 3.667 233.4
 12 Loyola              530.0 .725  7- 5 1.400 378.6
 13 UMBC                513.0 .721 10- 5 2.000 256.5
 14 Colgate             466.3 .711 11- 5 2.200 212.0
 15 Delaware            443.5 .705 11- 5 2.200 201.6
 16 Towson              403.4 .695  8- 6 1.333 302.5
 17 Drexel              293.1 .658 11- 5 2.200 133.2
 18 Bucknell            283.0 .654 11- 4 2.750 102.9
 19 Syracuse            272.4 .649  5- 8 .6250 435.9
 20 Ohio State          206.2 .616  9- 5 1.800 114.6
 21 Stony Brook         187.3 .604  8- 5 1.600 117.1
 22 Yale                161.7 .586  7- 6 1.167 138.6
 23 Rutgers             156.9 .582  6- 6 1.000 156.9
 24 Fairfield           140.7 .569  6- 6 1.000 140.7
 25 Massachusetts       137.6 .566  7- 7 1.000 137.6
 26 Pennsylvania        133.5 .563  6- 7 .8571 155.7
 27 Harvard             111.9 .541  5- 7 .7143 156.6
 28 Denver              108.7 .537  9- 7 1.286 84.51
 29 Brown               105.3 .534  7- 7 1.000 105.3
 30 Penn State          89.82 .514  5- 8 .6250 143.7
 31 Dartmouth           89.30 .514  5-10 .5000 178.6
 32 Hofstra             76.70 .495  6- 8 .7500 102.3
 33 Army                67.40 .480  6- 9 .6667 101.1
 34 Binghamton          44.18 .432  4- 9 .4444 99.40
 35 Hobart              35.63 .408  5- 9 .5556 64.13
 36 St. John's          32.04 .397  5- 8 .6250 51.27
 37 Villanova           30.68 .392  7- 7 1.000 30.68
 38 Lehigh              28.24 .383  4- 9 .4444 63.54
 39 Holy Cross          15.33 .323  6- 8 .7500 20.44
 40 Vermont             11.59 .297  4-10 .4000 28.97
 41 Quinnipiac          7.779 .262  6- 7 .8571 9.075
 42 Air Force           7.323 .257  2-10 .2000 36.61
 43 Bellarmine          6.846 .252  3-10 .3000 22.82
 44 Siena               6.792 .251  9- 6 1.500 4.528
 45 Sacred Heart        4.688 .221  4- 8 .5000 9.375
 46 Providence          3.681 .203  7- 9 .7778 4.733
 47 Manhattan           2.921 .186  6- 8 .7500 3.895
 48 Saint Joseph's      2.177 .167  6-12 .5000 4.354
 49 Marist              2.041 .163  6- 9 .6667 3.061
 50 Canisius            1.900 .158  6- 8 .7500 2.533
 51 Hartford            1.671 .151  2-13 .1538 10.86
 52 Mount St. Mary's    1.397 .141  4-10 .4000 3.493
 53 Robert Morris         0   .045  2- 9 .2222   0  
 54 Lafayette             0   .036  1-12 .0833   0  
 55 VMI                   0   .027  1-11 .0909   0  
 56 Wagner                0   .000  0-15   0    N/A

This is nicely illustrated with the following graph:

For "everybody else", see entries 2 to 52 in the table above.

Ronald '09

Well, the good news for the committee is that, by these rankings, only one team (Colgate) has a legitimate argument about being left out of the tournament.  The field includes the top 16 teams minus Colgate plus Providence.

Georgetown and Navy have bigger complaints than we do.  Georgetown would be a top 4 seed, although they would play Hopkins in the second round anyway.  And Navy would be a top 8 seed, but they get to play a team they beat by 11 anyway.

So other than our seeding, which is ridiculous but really doesn't matter, and those other couple of small issues, it does appear that the committee did a reasonable job choosing the field of 16.

jtwcornell91

Okay, so there is an argument for why the game results might not tell you that an undefeated team is automatically the best.  Before the season starts you are in a state of ignorance about the strengths of the various teams.  (Okay, so you're really not, but to be fair to everyone you should put aside whatever prior expectations you have.)  Every result gives you a little more information, and it could be that if Team A beats Team B and a bunch of cream puffs, you've gained only one game's worth of interesting results.  But then Team B goes out and plays a bunch of other teams which also have good records against a cross-section of opponents; say Team B loses to one other strong team but racks up six wins against tough opponents, plus some array of easy wins that don't tell you much.  Well, since wins over cream puffs don't tell you all that much, the information you have to work with is one win by Team A against a tough opponent, and six wins and two losses by Team B against similar competition.  It could be that those eight games tell you more about Team B than the one does about Team A.

Well, the good news is we can make all of this quantitative, since this is exactly what Bayesian statistics tell us to do: start with some prior expectation of the likelihood that unknown quantities (in this case teams' Bradley-Terry ratings) take on certain values, and modify those priors based on observational data (game results) to get a posterior probability distribution.
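
(In symbols, that's just Bayes' theorem: P(ratings | results) is proportional to P(results | ratings) x P(ratings), where the first factor on the right-hand side is the usual Bradley-Terry likelihood of the observed games and the second is whatever prior you chose.)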

It turns out, if you use what's known as a Jeffreys prior, a uniform probability distribution in the logarithm of each team's BT rating, the maximum of the posterior probability distribution will be the usual set of ratings predicted by KRACH or its equivalent.  But this is problematic, since it lets things run off to infinity, and basically represents the wrong kind of ignorance.  For instance, if we ask the question "what fraction of games do we expect a given team to win against a team with a fixed rating, say 100", the prior probability distribution has infinitely sharp peaks at 0 and 1; basically, there's an infinite amount of room for a team's rating to be arbitrarily higher or lower than 100.
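
(To spell that out: write p = R/(R+100) for the expected winning fraction against a rating-100 team.  A prior that is flat in x = log R transforms to a density in p proportional to dx/dp = 1/(p(1-p)), which blows up at both p = 0 and p = 1.)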

A more well-behaved prior is to pick some reference rating, like 100, and say that a team is a priori equally likely to have any expected head-to-head winning percentage against that team.  This gives a probability distribution in log(BT rating) which is peaked around log(100).  And, as it turns out, when you use this prior and construct the posterior probability distribution from the results of the games, there is always a single peak at nice finite values of all the ratings, and it's basically the equivalent of a KRACH rating with two "fictitious games" (one win and one loss) for each team against that hypothetical team with a rating of 100.  So we can calculate the maximum likelihood ratings with the usual software, and get the following (where "W/L" now means (W+1)/(L+1), and likewise SOS includes the fictitious games; there's a quick code sketch of the fictitious-games iteration below the table):

  # Team                BT    RRWP  W- L "W/L"  SOS
  1 Cornell             3384  .946 13- 0 14.00 241.7
  2 Duke                1889  .910 14- 2 5.000 377.8
  3 Virginia            992.5 .853 12- 3 3.250 305.4
  4 Albany              809.5 .830 14- 2 5.000 161.9
  5 Georgetown          804.7 .830 11- 2 4.000 201.2
  6 Johns Hopkins       661.5 .806  9- 4 2.000 330.8
  7 Princeton           513.8 .773 10- 3 2.750 186.8
  8 Navy                511.7 .773 11- 3 3.000 170.6
  9 Maryland            478.8 .763 10- 5 1.833 261.2
 10 North Carolina      430.9 .748  9- 5 1.667 258.5
 11 Notre Dame          427.3 .747 11- 3 3.000 142.4
 12 UMBC                286.8 .686 10- 5 1.833 156.4
 13 Colgate             273.4 .678 11- 5 2.000 136.7
 14 Delaware            269.8 .676 11- 5 2.000 134.9
 15 Loyola              255.5 .667  7- 5 1.333 191.7
 16 Bucknell            230.9 .650 11- 4 2.400 96.23
 17 Towson              218.0 .640  8- 6 1.286 169.5
 18 Drexel              215.1 .637 11- 5 2.000 107.5
 19 Stony Brook         177.8 .604  8- 5 1.500 118.6
 20 Ohio State          167.9 .594  9- 5 1.667 100.7
 21 Syracuse            150.9 .575  5- 8 .6667 226.4
 22 Yale                143.4 .566  7- 6 1.143 125.5
 23 Rutgers             128.6 .546  6- 6 1.000 128.6
 24 Denver              128.3 .546  9- 7 1.250 102.7
 25 Fairfield           124.8 .540  6- 6 1.000 124.8
 26 Massachusetts       123.6 .539  7- 7 1.000 123.6
 27 Pennsylvania        121.8 .536  6- 7 .8750 139.1
 28 Brown               112.1 .521  7- 7 1.000 112.1
 29 Harvard             106.6 .512  5- 7 .7500 142.1
 30 Dartmouth           91.79 .484  5-10 .5455 168.3
 31 Penn State          88.88 .478  5- 8 .6667 133.3
 32 Hofstra             84.20 .469  6- 8 .7778 108.3
 33 Army                81.36 .462  6- 9 .7000 116.2
 34 Villanova           68.51 .431  7- 7 1.000 68.51
 35 Binghamton          63.55 .418  4- 9 .5000 127.1
 36 St. John's          59.35 .405  5- 8 .6667 89.03
 37 Hobart              57.41 .400  5- 9 .6000 95.69
 38 Siena               51.16 .379  9- 6 1.429 35.81
 39 Lehigh              50.99 .379  4- 9 .5000 102.0
 40 Holy Cross          43.91 .353  6- 8 .7778 56.45
 41 Vermont             38.36 .330  4-10 .4545 84.40
 42 Quinnipiac          38.08 .329  6- 7 .8750 43.52
 43 Providence          32.08 .302  7- 9 .8000 40.11
 44 Sacred Heart        29.77 .290  4- 8 .5556 53.59
 45 Bellarmine          29.32 .288  3-10 .3636 80.62
 46 Manhattan           27.90 .280  6- 8 .7778 35.87
 47 Air Force           27.89 .280  2-10 .2727 102.3
 48 Canisius            27.04 .275  6- 8 .7778 34.76
 49 Saint Joseph's      22.17 .246  6-12 .5385 41.17
 50 Marist              21.41 .241  6- 9 .7000 30.59
 51 Mount St. Mary's    21.34 .241  4-10 .4545 46.94
 52 Hartford            12.50 .173  2-13 .2143 58.34
 53 Robert Morris       11.33 .163  2- 9 .3000 37.76
 54 Lafayette           7.304 .120  1-12 .1538 47.48
 55 VMI                 3.277 .065  1-11 .1667 19.66
 56 Wagner              1.241 .026  0-15 .0625 19.85
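
For the curious, here's a minimal sketch of what that two-fictitious-games recipe can look like in code, again on a made-up four-team schedule rather than the real one (and not necessarily the exact software used for the table); the reference opponent is pegged at rating 100 and never updated, which is what pins the scale and keeps every rating finite, including the undefeated team's:

[code]
# Same toy schedule and iteration as the earlier sketch, but every team now
# gets one fictitious win and one fictitious loss against a reference
# opponent pegged at rating 100, as described above.
from collections import defaultdict

games = [("A", "B"), ("A", "C"), ("A", "D"),
         ("B", "C"), ("C", "D"), ("D", "B")]

teams = sorted({t for g in games for t in g})
wins = defaultdict(float)
n = defaultdict(float)
for w, l in games:
    wins[w] += 1
    n[(w, l)] += 1
    n[(l, w)] += 1

REF = 100.0                        # fixed rating of the fictitious opponent
ratings = {t: REF for t in teams}
for _ in range(1000):
    new = {}
    for i in teams:
        denom = sum(n[(i, j)] / (ratings[i] + ratings[j])
                    for j in teams if j != i)
        denom += 2.0 / (ratings[i] + REF)   # the two fictitious games...
        new[i] = (wins[i] + 1.0) / denom    # ...one of which is a win
    ratings = new

print({t: round(r, 1) for t, r in sorted(ratings.items(), key=lambda kv: -kv[1])})
# Every rating is now finite, including the undefeated team's, and the
# reference rating of 100 fixes the overall scale.
[/code]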

One of the nice things about this method is that it also comes with a probability distribution for the ratings, and so you can estimate the uncertainties in each of them.  I've got the raw numbers for those, but I don't have time to put them into a nice form tonight.  But I have some ideas for cool graphs...
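
For anyone who wants to play along before then, here's a sketch of one way rough error bars could be approximated.  This is purely my own one-rating-at-a-time Laplace-type approximation (it ignores correlations between teams' ratings and reuses the ratings, n, and REF objects from the toy sketch above), not necessarily how the raw numbers were produced: the curvature of the log-posterior in log(R_i), with everyone else held fixed at the peak, gives an approximate standard deviation of 1/sqrt(sum_j n_ij p_ij (1 - p_ij)), where p_ij = R_i/(R_i + R_j) and the sum includes the two fictitious games.

[code]
import math

def approx_sigma_log_rating(team, ratings, n, ref=100.0):
    # Curvature (Fisher information) of the log-posterior in log(R_team),
    # holding every other rating fixed at its peak value.  Each game against
    # opponent j contributes p*(1-p), with p = R_team / (R_team + R_j); the
    # fictitious reference opponent contributes two such games.
    info = 0.0
    for other, r_other in list(ratings.items()) + [("(reference)", ref)]:
        if other == team:
            continue
        games_ij = 2.0 if other == "(reference)" else n[(team, other)]
        p = ratings[team] / (ratings[team] + r_other)
        info += games_ij * p * (1.0 - p)
    return 1.0 / math.sqrt(info)   # approximate std. dev. of log(R_team)

for t in sorted(ratings, key=ratings.get, reverse=True):
    s = approx_sigma_log_rating(t, ratings, n, REF)
    lo, hi = ratings[t] / math.exp(s), ratings[t] * math.exp(s)
    print(f"{t}: {ratings[t]:7.1f}   one-sigma range {lo:7.1f} .. {hi:7.1f}")
[/code]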

Swampy

[quote jtwcornell91]Well, the good news is we can make all of this quantitative, since this is exactly what Bayesian statistics tell us to do: start with some prior expectation of the likelihood that unknown quantities (in this case teams' Bradley-Terry ratings) take on certain values, and modify those priors based on observational data (game results) to get a posterior probability distribution.

...

One of the nice things about this method is that it also comes with a probability distribution for the ratings, and so you can estimate the uncertainties in each of them.  I've got the raw numbers for those, but I don't have time to put them into a nice form tonight.  But I have some ideas for cool graphs...[/quote]

Yeah, but look at where the selection committee went to school. They'd never understand this! :-)

jeh25

[quote Swampy]

Yeah, but look at where the selection committee went to school. They'd never understand this! :-)[/quote]

Hell, five years ago I demonstrated (in detail) on laxpower how the committee didn't even follow their own published selection criteria.  If memory serves, the committee took Hofstra over Yale despite Yale clearly coming out ahead.  Alas, the laxpower archives don't go back that far anymore.

Anyway, the take-home message is that hockey fans don't realize how lucky they were/are to have deterministic (if occasionally flawed) NCAA selection criteria instead of a smoky back room full of the old boys' club.  Consider that SU's and JHU's storied histories wouldn't be quite so bright without 30+ years of favorable seedings and/or outright selections.

Maybe Al can confirm but I seem to remember that back in the 4 team tourney days, Cornell, the defending champion, didn't even get a tourney bid because the committee felt Navy was more deserving in spite of Cornell having a better record.
Cornell '98 '00; Yale 01-03; UConn 03-07; Brown 07-09; Penn State faculty 09-
Work is no longer an excuse to live near an ECACHL team... :(

Al DeFlorio

[quote jeh25]Maybe Al can confirm but I seem to remember that back in the 4 team tourney days, Cornell, the defending champion, didn't even get a tourney bid because the committee felt Navy was more deserving in spite of Cornell having a better record.[/quote]
The only year Cornell wasn't invited to "defend" was 1972:  10-3 overall; 6-0 Ivy; but losses to Navy (12-9), Cortland (14-8), and Hobart (11-10).  The non-Ivy wins were Hofstra, Adelphi, Syracuse (a legitimate cupcake back then) and Fairleigh-Ridiculous.  I'm sure it was the old "strength of schedule" issue, but the losses to non-Ivys hurt (and helped Cortland make the tournament).  Cortland beat Navy in the quarters but was beaten by Virginia--the eventual champ--in the semis.  The 1973 team finished at 8-3, 5-1 Ivy, with losses to Navy, Hopkins, and Brown to open the season.  Non-Ivy wins were Hobart, Syracuse, and Cortland.  The 1974 through 1979 teams all made the tournament.

The tournament started with eight teams in 1971, and has since expanded to twelve and then, very recently, sixteen.
Al DeFlorio '65

jeh25

OK, so Navy had SOS and H2H (and a southern bias?) while Cornell had WinPct, meaning it isn't quite as egregious as I had remembered.  But still, to take a .5714 team over the .7692 team when the latter is the defending champ seems a little shady.  I mean, 9 losses isn't anything to write home about, "quality teams" or not.

Also, I've always thought most fans and many coaches are too quick to overweight the importance of H2H: even a blind squirrel finds a nut once in a while. But of course, that's just the statistician in me showing through - most people don't think about measurement error on a daily basis. ;)

As far as selecting Cortland over Cornell in '72, they split H2H and WinPct - would SOS have been comparable or would Cortland have used the flexibility of a non-Ivy schedule to fit in more "quality" southern teams?
Cornell '98 '00; Yale 01-03; UConn 03-07; Brown 07-09; Penn State faculty 09-
Work is no longer an excuse to live near an ECACHL team... :(

Al DeFlorio

[quote jeh25]I mean, 9 loses isn't anything to write home about, "quality teams" or not.
[/quote]
12-9 was the score of the Cornell-Navy game--not Navy's season record.  Playing 21 lacrosse games in a regular season would probably in itself earn an invitation. ;-)
Al DeFlorio '65

Beeeej

[quote jtwcornell91]It turns out, if you use what's known as a Jeffreys prior, a uniform probability distribution in the logarithm of each team's BT rating, the maximum of the posterior probability distribution will be the usual set of ratings predicted by KRACH or its equivalent.[/quote]

Henceforth, on this forum, we will refer to it as a Beeeej's Prior.
Beeeej, Esq.

"Cornell isn't an organization.  It's a loose affiliation of independent fiefdoms united by a common hockey team."
   - Steve Worona

ugarte

[quote Beeeej][quote jtwcornell91]It turns out, if you use what's known as a Jeffreys prior, a uniform probability distribution in the logarithm of each team's BT rating, the maximum of the posterior probability distribution will be the usual set of ratings predicted by KRACH or its equivalent.[/quote]

Henceforth, on this forum, we will refer to it as a Beeeej's Prior.[/quote]
Or Jeffffrey's Prior.

KeithK

[quote jeh25]But still, to take a .5714 team over the .7692 team when the latter is the defending champ seems a little shady.[/quote]
Why should being the defending champ have any impact on whether you make the tournament or not?  Applying any kind of carryover effect from previous seasons (even if restricted to championships) is exactly the kind of bias we are railing against here.

jeh25

Sorry, wasn't clear - yes, I agree that a name-branding effect is to be avoided.  But if you're gonna throw rational criteria out the window and go with gut feeling, as they did in the bad old days, and only use an ephemeral "quality program" standard, then not inviting the defending champion is a crock.

But anyway, you can ignore my post because 12-9 was the score of the game, not Navy's record. And this is my last post on the topic since my defense is in <4 weeks.

*john crawls back into his cave*
Cornell '98 '00; Yale 01-03; UConn 03-07; Brown 07-09; Penn State faculty 09-
Work is no longer an excuse to live near an ECACHL team... :(

Hillel Hoffmann

[quote jeh25]And this is my last post on the topic since my defense is in <4 weeks.[/quote]
Holy shit, what terrible timing. If Cornell makes it to... to... I can't say it, but if Cornell makes it far, I expect to see you there anyway.

KeithK

[quote Hillel Hoffmann][quote jeh25]And this is my last post on the topic since my defense is in <4 weeks.[/quote]
Holy shit, what terrible timing. If Cornell makes it to... to... I can't say it, but if Cornell makes it far, I expect to see you there anyway.[/quote]
It's not like defenses are that important anyway.  By the time you get there you know you're going to pass it.  Of course, if the draft isn't done yet it's a different story...

Rita

[quote KeithK][quote Hillel Hoffmann][quote jeh25]And this is my last post on the topic since my defense is in <4 weeks.[/quote]
Holy shit, what terrible timing. If Cornell makes it to... to... I can't say it, but if Cornell makes it far, I expect to see you there anyway.[/quote]
It's not like defenses are that important anyway.  By the time you get there you know you're going to pass it.  Of course, if the draft isn't done yet it's a different story...[/quote]

Yeah, they do not let you schedule the defense unless they are certain you will pass :). Here is some "unsolicited advice": for the draft that you give your committee, do not get too hung up on formatting issues.  They will most likely have changes and edits that they want you to make, and you will have to re-format it anyway.

I also think it is some sort of "badge of honor" to only get ~ 15 hr of sleep in the last week of writing your thesis. ::rock::