3/10 Polls

Started by rhovorka, March 10, 2003, 04:23:18 PM


gwm3

I agree that rampant subjectivity is certainly not something we want to inject into the system.  However, I think there are certain circumstances in which computers just can't comprehend all of the subtleties that go into determining what makes one team better than another.  Therefore, I think that the committee should be allowed to interpose some human judgment at the margins (I actually think they might already be able to do this to a certain extent, although the seeding criteria are not terribly clear).

I did not originally mean to advocate some sweeping system in which the only factor guiding seeding decisions was who was out with a tummy ache when.  I merely suggested that if we are going to allow some degree of subjectivity, a factor like injuries might be something to consider.  My more elaborate hypotheticals were not intended to advocate a fully discretionary system, but mainly to clarify and defend the internal logic of some of the statements I had made previously.

ugarte

There is ultimately no cognizable difference between your thought experiment and "who had a tummy ache when."  It can't be practically imposed without every other short-term flaw in a team being used as a back-room rationale for rank switching.

If you want to start with the premise that "we allow some subjectivity", then I agree that injuries can be part of the analysis.  I do not, however, think that it is the committee's role to try and predict the winners of the tournament.  It is to choose and reward teams that have had the best season, not the best March.   In any subjective analysis of the effect of injuries, I would propose that the  committee should not concern themselves with WHEN any losses due to injury occurred, but could give weight to THE FACT of losses due to an injury.


(How do you determine that a loss was "due to" a missing player anyway?  By way of example, we certainly didn't lose in Estero because LeNeveu was at the WJC.)


gwm3

[Q]In any subjective analysis of the effect of injuries, I would propose that the committee should not concern themselves with WHEN any losses due to injury occurred, but could give weight to THE FACT of losses due to an injury.[/Q]

This was exactly the point of including teams A and C in my hypo.  They would be treated the same regardless of at what point in the season their losses occurred.


In my defense, having now laid on the table what is admittedly a seriously flawed scheme, may I refer everyone back to my posts that started this mess.  These remain my points -- everything that followed was in defense of their internal consistency:


[Q]I do think that any ranking system should give an indication of how good a team is right now.

Consider the following scenario: Team A wins its first 25 games, then has two of its star players go down with season ending-injuries and loses its last 5 games. Team B loses its first 5 games while waiting for its star goalie to come back from an offseason injury, then goes on to win out its remaining 25 games. Which of these teams deserves a higher seed going in to the tournament? I would have to say it's team B, who is playing well now, and is going to be a much harder team to beat.[/Q]

[Q]The fact that star players were injured wasn't really central to my point. I included it to illustrate an extreme case where it is clear that a team is not as good going into the tournament as it was earlier in the season. Of course, I would apply the same logic to a team tanking down the stretch for any reason.

I do think, however, that seeding for the NCAA's should probably include some human discretion to account for known circumstances that the computer rankings don't capture (like injured players, etc.).[/Q]


In sum: (1) I think the end of the season should be given greater weight in the objective computer rankings (whether through last 16 or some new criteria), and (2) I think that a small degree of subjectivity might be acceptable at the margins because computers aren't perfect.


The point that the seedings should not reflect who is most likely to win the tournament is duly noted... I just don't happen to agree with it.



Post Edited (03-11-03 01:10)

jtwcornell91

BigRed Apple wrote:

> It is possible that the insular end-of-season conference
> schedules would not permit enough interconference play for
> KRACH to resolve itself or at least to make it a useful tool.

The way we dealt with this in KPWR was to use the KRACH ratings calculated with the full season's results as a measure of the strength of the opposition you played in your last 16 games.


jtwcornell91

DeltaOne81 '03 wrote:

> The first proposal would be to RPI-ify or KRACH-ify Last 16. In
> order words, how well you did, weighted for SoS, over your last
> 16 games. But that wouldn't be entirely fair considering
> everyone plays more conference games near the end. Is it really
> fair to hurt us bc we have to play ECAC teams at the end, even
> if we schedule some good opponents in the first stretch of
> games?

But if our opposition is easier in the last 16 games, we would be expected to win more of those games; the two ought to cancel each other out, allowing everyone to be judged fairly.

> The only thing I could think might be fair is some kinda
> Last 16 versus how you shoulda done against those teams. In
> other words, have you fallen down or risen up, and it should
> probably only be a tiebreaking criteria, but the math is beyond
> me at this time of night :).

Here's what we did in KPWR: given the KRACH ratings of your opponents in your last 16 games, what KRACH would you need to have to be expected to win exactly the number of games you actually won out of those 16?  That seems not entirely unlike what you're describing.  For more details, see http://slack.net/~whelan/tbrw/tbrw.cgi?kpairwise
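A sketch of that calculation, assuming Bradley-Terry win probabilities and a bisection solve (the function names are mine, not the actual KPWR code):

```python
def expected_wins(rating, opponent_ratings):
    # Bradley-Terry / KRACH: P(win) = r / (r + opp)
    return sum(rating / (rating + opp) for opp in opponent_ratings)

def l16_krach(actual_wins, opponent_ratings, lo=1e-6, hi=1e6, iters=100):
    # Find the rating whose expected wins over these 16 opponents
    # equals the wins actually earned (count a tie as half a win).
    # Expected wins is increasing in rating, so bisection works;
    # a 0-16 or 16-0 record has no finite solution.
    for _ in range(iters):
        mid = (lo * hi) ** 0.5  # geometric midpoint: ratings are ratio-scale
        if expected_wins(mid, opponent_ratings) < actual_wins:
            lo = mid
        else:
            hi = mid
    return (lo * hi) ** 0.5
```

For example, a team that went 8-8-0 against opponents all rated 100 gets an L16 KRACH of 100, as you'd expect.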


jtwcornell91

Graham Meli '02 wrote:

> I think there are
> certain circumstances in which computers just can't comprehend
> all of the subtleties that go into determining what makes one
> team better than another.

If I see the word "computer" thrown out one more time to disparage objective consideration of a team's results, I may  ::yark:: .  What these unthinking computers do that you seem to have a problem with is look just at the actual outcomes of the games played.  The last time I checked, the way that teams earned the right to play in the postseason was by winning games, not by having a talented team that would have won more games if not for inopportune injuries (or bad calls, or bad bounces, or any other reason you can come up with why a team shouldn't really have lost a particular game).  Should we somehow get credit for the games in Florida because we were playing without our star goaltender?  No, we lost those games, and that's how they go in the books.

The inherent problem with the human element/common sense/good judgement is that, even if the people involved are not overtly biased, the conclusions they come to will depend on the humans doing the judging.  Then a team's playoff fate is not determined by how they performed on the ice (which is the only thing the "computer rankings" are taking into consideration) but the makeup of the committee.  As far as I'm concerned, the less of that we have, the better.


Shorts

DeltaOne81 said
[Q]The only thing I could think might be fair is some kinda Last 16 versus how you shoulda done against those teams.[/Q]

In fact, such a system does exist.  The most widely used ratings system for chess (and some other games) is based more or less on (pardon my mathematical imprecision, and perhaps inaccuracy):

new rating = old rating + C (W - We)

Where C is a weighting factor corresponding to the importance of the game (if you think all games should be equally weighted, this number could be the same for each game).

W = {1 for a win, .5 for a tie, 0 for a loss}

We = the probability, using an established table, that you "should have" won the game, based on the difference between the ratings of you and your opponent at the beginning of the game.  
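A rough sketch of that update rule, using the standard logistic curve as a stand-in for the "established table" (the 400-point scale and C value here are the conventional chess choices, not anything specific to hockey):

```python
def expected_score(rating, opp_rating):
    # Logistic expected score: a 400-point rating gap means ~10:1 odds
    return 1 / (1 + 10 ** ((opp_rating - rating) / 400))

def elo_update(rating, opp_rating, result, c=32):
    # result: 1 for a win, 0.5 for a tie, 0 for a loss;
    # c is the per-game weighting factor described above
    return rating + c * (result - expected_score(rating, opp_rating))
```

So beating an equally rated opponent gains you C/2 points, while beating a much weaker one gains you almost nothing.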

Pros:  Like the KRACH system (and unlike RPI), this automatically takes into account the difficulty of the opponents you've played, based on their rating.  You never have to worry about your rating going down for beating a weak team.  Unlike KRACH, it takes more heavily into account games that have been played more recently, without having the arbitrary threshold of Last16.

Cons:  I don't think this system would be as good as the KRACH ratings at balancing out insular schedules (a persistent problem with RPI).  However, the weighting factor could be controlled to make interconference games more important.  Single-elimination tournaments would probably throw this system for a loop (although that also happened with L16).  A problem this system shares with KRACH is that it tends to predict that very good teams will almost always beat very bad teams.
While the average chess player could probably put up a good-faith effort against a grand master and lose hundreds of times in a row, I strongly doubt that (using KRACH's prediction), in a prolonged series against Rensselaer, Cornell would win roughly 17 games for each game it lost.  Or that, in a Colorado vs. Mercyhurst game (for example, in the first round of the NCAA tourney this year), Colorado would be 65 times as likely as Mercyhurst to win.  If Mercyhurst really stands only a 1.5% chance of winning such a game, then the current system of auto-bids is little more than a formality, or a scam to get MAAC fans to buy tournament tickets.  But I think that hockey has enough random factors (due in part to low scoring compared to, say, basketball), like injuries and penalties, that the probability of random, crazy stuff carrying Mercyhurst to victory would be at least 1%.  Going even further down, if Iona (which is only 1.5 games off of .500 in conference play) were to win their next 3 games (which KRACH suggests is not all that implausible), and face CC in a first round game, CC would be 215 times as likely to win.  Think back over the last seven seasons of play for Cornell (or any other single team)--certainly there's been more than one fluke game.
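The upset chances quoted above fall straight out of KRACH's odds interpretation; a one-line helper (mine, not part of KRACH) to check the arithmetic:

```python
def win_prob(ratio):
    # If team A's KRACH is `ratio` times team B's, Bradley-Terry says
    # P(A beats B) = ratio / (ratio + 1)
    return ratio / (ratio + 1)
```

A 65:1 rating ratio gives the favorite about a 98.5% chance, i.e. roughly the 1.5% upset chance cited for Mercyhurst; 215:1 leaves under half a percent.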

Sorry for getting into a rant over KRACH (I really do like that system), but the point is that the chess ratings system actually makes the same sorts of claims.  Obviously, I don't think that hockey should actually take up this ratings system.  I just think that it's an interesting thing to look at, and a fun way to pass some time after a bye weekend.


Adam

I'm in favor of accounting for the last several games of the season (not sure that 16 is the proper number, but that is a mere detail).

Time is just as powerful a variable as any other you might use to rank teams.  Over the course of a season, things happen.  Injuries have been mentioned so far.  But a laundry list of things can really CHANGE a team over the course of the season.  Coaches get fired, off-ice distractions creep in, pressure to win builds, etc etc.

It seems perfectly reasonable to add a weight to end of season and/or playoff games.  Just ask yourself who is more likely to win the tournament, the 25 win team that lost its last 8 games straight or the 20 win team that won its last 15 games.

In the scenario above, by NOT weighting late season games/playoff games, you'd likely give a lower seed to the 20 win team (assuming all other variables are equal for the sake of this analysis).  That's unfair, for one, to the OPPONENT of the 20 win team, because everyone knows the 20 win team is better than the 25 win team AT THAT POINT IN TIME.

The NCAA tournament should be seeded to reflect accurately how good each team is ON THE DAY THE GAME IS PLAYED.  This is the only way to ensure the equity of the match-ups.

President, Beef-N-Cheese Academic Society 1998-2001

jd212

Just what we need. Let the seeding committee use subjectivity.  I can already hear the comments: They didn't pick us because they don't like us. Or, they picked them b/c they know their coach. Or any amount of permutations from the above. Right, just what we need. Since when does the selection committee have the right to decide subjectively? You think humans are less fallible than computers? People will find an excuse to complain regardless. No matter what happens, the selection process will never be "perfect." Injuries have nothing to do with the quality of a team. Hence, that is why it is a team. As a matter of fact, the longer a team is out with a star player, the more they should be able to adapt without him. And if they only win with him, then they aren't a very good team anyway, and they probably won't win the championship.


gwm3

I promise this is the last thing I'll say about this, as the dead horse indicator on my desk has begun to flash:

Whether calculated by computers, PhD's with slide rules, or trained chimps, all purely "objective" ranking systems were originally created by humans who had to decide what factors to include in the formulas.  At some point some people made the subjective choice that OOper is relevant to deciding what teams are good and, say, a goalie's save percentage is not.  Therefore, any objective ranking system can be criticized as not properly capturing all of the correct factors.  Save percentage may be a ridiculous hypo, but there certainly has been some reasonable debate here about whether recent games ought to be treated differently than early season games.  That they are not is ultimately a human choice.

I have tried to argue, rather unsuccessfully, that there may be a whole range of factors that one might consider relevant in assessing the quality of a team that are not incorporated in the objective rankings.  Injuries were just one possible example I posited, but are by no means the only, or most important, one.  If we agree that a certain factor is relevant to determining how good a team is, there are two possibilities -- add it to the formula that computes rankings, or allow the committee to consider it "subjectively."  Due to the immense difficulty of the former, I don't think the latter is always inappropriate.  It might not be "fair," but some may argue that neither is a system that sometimes punishes teams simply for beating the weaker teams on their schedule.


jeh25

Graham Meli '02 wrote:

> I don't know if playoff games should necessarily be given
> greater weight, but I do think that any ranking system should
> give an indication of how good a team is right now.  

So it sounds to me like you would support a strength of schedule adjusted L16 factor in the PWR?

Personally, I really liked the L16 factor as I thought it did a pretty good job of giving credit to teams that got hot when it mattered.  Sure, the lack of strength of schedule adjustment was a problem, but to drop the whole term was throwing the baby out with the bathwater in my estimation.

Cornell '98 '00; Yale 01-03; UConn 03-07; Brown 07-09; Penn State faculty 09-
Work is no longer an excuse to live near an ECACHL team... :(

Greg Berge

Ultimately, the thing that makes any deterministic system better than any subjective system is the determinism itself.  Every team knows exactly what it needs to do to get in.  If it does that which is logically necessary, it gets in.  Don't underestimate the importance of that.  Of course we all will have our own pet ideas of what constitute "fair" or "improved" criteria (mine are below), but those are hopelessly parochial and a matter of arguing at the margins.  What does have significant meaning is knowing ahead of time that if you do what you need to you can't be screwed.  That immediately transfers the onus of outcome from the criteria to the performance, where it belongs.

Tournament games already matter *far* more than RS games.  They are only "ho-hum" if you lose.  Solution: win. ;-)

L16 tried to capture something of value though the method was flawed.  Bathwater problems: ignored s.o.s. and the 16 game cutoff was arbitrary.  Ways to save the baby: progressively discount games as they get farther "back" on the schedule; factor in s.o.s. along with recency.  Note that the current games vs TUC criterion has the same arbitrary cutoff problem.
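The "progressively discount" idea can be sketched with a geometric decay in place of the hard 16-game cutoff (the half-life parameter here is purely illustrative):

```python
def recency_weights(n_games, halflife=10.0):
    # Weight each game by how far back it sits on the schedule:
    # the most recent game gets weight 1, a game `halflife` games
    # older gets weight 0.5, and so on -- no arbitrary cutoff.
    return [0.5 ** (age / halflife) for age in range(n_games - 1, -1, -1)]
```

A recency-adjusted record is then just the weighted sum of results divided by the total weight, and the same weights could multiply each game's contribution to a strength-of-schedule term.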

The problem with ELO (the chess system mentioned above) is that it doesn't handle small numbers of results well at all.  Classic case: I started out with an ELO of about 1750 because I was playing all my games against 1200-1500 players.  Only when I got into the hundreds of results did my ELO fall into the real range for my skill, because by then I was playing opponents from all over the spectrum.  ELO itself recognizes this: one's early rating is determined by a completely different algorithm that is similar to RPI.  But this only works because at least the ratings of one's early opponents are usually "real" -- i.e., based on a significant number of outcomes.  This just isn't appropriate for the hockey schedule, where everybody starts from scratch every Fall.

Oh, ELO does do something interesting, however -- the weight of your result against a team never changes after it is assigned.   That means if you beat North Dakota early in the season when they were ranked high, you keep your mucho points for that even after North Dakota goes in the sewer later in the year.  If, OTOH, you beat them later in the year along with everybody else, you get much less for your pains.  There's something attractive in that, although of course the flaw can be seen from a mile away: what if North Dakota only looked good early because they had Canisius in a ball gag and leash?  What's workable in a world which compares opponents with hundreds of results would lead to weird asymmetries in hockey when everybody is bootstrapping early.



Post Edited (03-11-03 12:10)

jtwcornell91

Greg wrote:
> What's
> workable in a world which compares opponents with hundreds of
> results would lead to weird asymmetries in hockey when
> everybody is bootstrapping early.

Conversely, you'd have trouble applying straight Bradley-Terry to world chess rankings, since you'd have to recalculate everything every time a game was played.
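For the curious, that full recalculation is just an iterative fixed point; a minimal sketch of fitting Bradley-Terry (KRACH) ratings, my own code rather than anything of Ken Butler's:

```python
import math

def krach(wins, games, iters=200):
    # wins[i][j]: wins by team i over team j (count a tie as 0.5 each);
    # games[i][j]: games played between i and j.
    # Classic fixed point: r_i = W_i / sum_j games_ij / (r_i + r_j).
    # Assumes every team has played games and earned at least a tie.
    n = len(wins)
    r = [1.0] * n
    for _ in range(iters):
        r = [sum(wins[i]) /
             sum(games[i][j] / (r[i] + r[j]) for j in range(n) if games[i][j])
             for i in range(n)]
        # Ratings are only defined up to a common factor; pin the
        # geometric mean to 1 so they don't drift between iterations.
        gm = math.exp(sum(math.log(x) for x in r) / n)
        r = [x / gm for x in r]
    return r
```

Every new result changes `wins` and `games`, so the whole iteration reruns from scratch -- exactly the recalculation burden mentioned above.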

Should I summon Ken Butler to this discussion as well?
;-)


jtwcornell91

If you look at some of the KPWRs from past seasons, you'll also see that the L16 criterion KRACH is not terribly correlated with the overall KRACH, so you really can still reward teams that went on a tear late, even if they faced weaker opposition.  I.e., playing your last 16 games against ECAC teams doesn't hurt you that much if you go 14-1-1 in those games.



Post Edited (03-11-03 12:40)

jtwcornell91

Graham Meli '02 wrote:

> all purely "objective" ranking systems were
> originally created by humans who had to decide what factors to
> include in the formulas.

But they decided before the season, when it wasn't known which teams would directly benefit from those choices.  That's the main reason why many of us prefer objective criteria.

> It might not be "fair," but some may argue that
> neither is a system that sometimes punishes teams simply for
> beating the weaker teams on their schedule.

Which is why we're calling for RPI to be dropped in favor of a system that does a better job of accounting for strength of schedule.