THINKING, FAST AND SLOW, ABOUT BASKETBALL

I recently wrote a post over on the Wages of Wins Journal about how I believe the book Thinking, Fast and Slow is chock-full of descriptions of cognitive illusions that all basketball analysts (whether they are paid to do it or not) fall prey to. One of Kahneman's more famous illusions is the illusion of validity, the fact that people have a huge amount of confidence in their own judgment, even in the face of clear evidence that their judgment is wrong:

The confidence we experience as we make a judgment is not a reasoned evaluation of the probability that it is right. Confidence is a feeling, one determined mostly by the coherence of the story and by the ease with which it comes to mind, even when the evidence for the story is sparse and unreliable. The bias toward coherence favors overconfidence. An individual who expresses high confidence probably has a good story, which may or may not be true.

I coined the term “illusion of validity” because the confidence we had in judgments about individual soldiers was not affected by a statistical fact we knew to be true — that our predictions were unrelated to the truth. This is not an isolated observation. When a compelling impression of a particular event clashes with general knowledge, the impression commonly prevails. And this goes for you, too. The confidence you will experience in your future judgments will not be diminished by what you just read, even if you believe every word.

I encourage you to read the whole thing; it is full of examples of the illusion in action.

In basketball, one statistic that I believe illustrates the Illusion of Validity is the very popular plus/minus statistic. I'm convinced that +/- is popular precisely because everyone kind of intuitively knows that it is meaningless. Whenever a person is convinced that a performance was great (or terrible), a +/- number is dragged out and offered as evidence: "Look, this +/- number reinforces my belief!" When the +/- number doesn't conform to the story, it is (conveniently) dismissed as a statistic that is meaningless, or that "requires large samples" (quick, name a stat that doesn't require large samples to converge!).

Here's the dirty truth about +/-: it really is meaningless. Why is it meaningless? Because it is horribly inconsistent over time. Even its proponents seem to agree:

Returning to Davis, this is a crucial part of the discussion. To me, the most obvious explanation for Davis' relatively poor net defensive plus-minus is the small sample size. At the time SI.com's Luke Winn wrote about this statistic, Davis had been off the court for 202 possessions--less than three games' worth. That's just not nearly enough time to make meaningful declarations. Even the entire NBA regular season, nearly three times as long, is not sufficient for the noisiness in plus-minus to filter out. There are plenty of examples of players' net plus-minus ratings bouncing wildly from year to year.

This doesn't invalidate plus-minus statistics. It merely means they must be used with more caution than individual numbers. To be clear, Winn didn't use plus-minus to say anything negative about Davis. He merely made note of the plus-minus numbers in the process of pointing out how effective Eloy Vargas has been defensively as Davis' backup.
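The quoted point that 202 possessions is too small a sample can be put in rough numbers. The Python sketch below assumes, purely for illustration, that each possession contributes an independent point margin with a fixed standard deviation; the value used is a hypothetical placeholder, not a measured NBA figure. Under that assumption the noise in a per-100-possession margin shrinks with the square root of the sample, so quadrupling the possessions only halves it.

```python
import math


def margin_standard_error(possessions: int, sd_per_possession: float) -> float:
    """Approximate standard error of a point margin expressed per 100 possessions.

    Assumes each possession contributes an independent point margin with
    standard deviation `sd_per_possession` (a hypothetical input).
    """
    if possessions <= 0:
        raise ValueError("possessions must be positive")
    # The mean margin per possession has standard error sd / sqrt(n);
    # multiplying by 100 rescales it to the per-100-possession scale.
    return 100.0 * sd_per_possession / math.sqrt(possessions)


if __name__ == "__main__":
    HYPOTHETICAL_SD = 1.0  # placeholder; estimate from play-by-play data in practice
    for n in (202, 1_000, 5_000):
        se = margin_standard_error(n, HYPOTHETICAL_SD)
        print(f"{n:>5} possessions: +/- {se:.1f} points per 100 (one standard error)")
```

An on/off comparison subtracts two such estimates, so, assuming they are independent, its standard error is larger than either one alone.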

It surprises me that Mr. Pelton can look at a number that "bounces wildly from year to year" and, a few sentences later, use it to evaluate a player:

In terms of individual statistics, Collison doesn't impress. Because he uses so few possessions on offense and rarely blocks shots, Collison rates worse than replacement in WARP and little better in PER. Basketball-Reference.com's Win Shares provide a superior estimate of Collison's value but still put him barely better than average.

Meanwhile, Collison's net plus-minus of +11.1 last season ranked eighth in the league, per BasketballValue.com. Every player ahead of him was an All-Star. The year before, the Thunder was 9.4 points better per 100 possessions with Collison on the floor.

Again, this surprises me (but it shouldn't!). If a measurement is horribly inconsistent over time, there are two possibilities:

  1. Whatever you are measuring is itself wildly inconsistent over time.
  2. You are not measuring what you think you are measuring.

Plus/minus varies wildly over time. So we need to consider two possibilities:

  1. Basketball performance is very inconsistent and fluctuates wildly over time.
  2. Plus/minus is not measuring basketball performance.

I would postulate that option 1 is not possible. Almost all aspects of basketball performance at both the team and individual levels are pretty consistent over time. You can see the math in The Wages of Wins, but truthfully anyone with access to an Excel sheet and the internet can do the math themselves: raw box score stats in any given year correlate well with the same stats in the previous year. So, when we look at the choice above, any reasonable scientist should apply Occam's Razor and conclude that plus/minus isn't actually measuring basketball performance.
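For readers who would rather script the check than build it in Excel, here is a minimal Python sketch of the year-to-year comparison described above: pair up players who appear in both seasons and compute the Pearson correlation of a stat with itself. The player names and values are hypothetical placeholders, not real box score data; with real season numbers, a high r means the stat is consistent from one year to the next.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def year_to_year_r(season_a: dict[str, float], season_b: dict[str, float]) -> float:
    """Correlate a stat with itself across two seasons, using only players in both."""
    shared = sorted(season_a.keys() & season_b.keys())
    return pearson_r([season_a[p] for p in shared], [season_b[p] for p in shared])


if __name__ == "__main__":
    # Hypothetical per-player values of one box score stat; replace with real data.
    season_1 = {"Player A": 10.2, "Player B": 4.1, "Player C": 7.5, "Player D": 12.0}
    season_2 = {"Player A": 9.8, "Player B": 4.6, "Player C": 7.1,
                "Player D": 11.3, "Player E": 6.0}  # Player E has no prior season
    print(f"year-to-year r = {year_to_year_r(season_1, season_2):.2f}")
```

Dropping players who appear in only one season keeps the comparison to the same individuals, which is what "consistent over time" asks about.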

So apparently plus/minus only measures something useful when you want it to. It's the illusion of validity in action! As we've argued many times, the box score contains lots of information. In other words, the box score is fine; you're just doin' it wrong.

I was recently talking with Wages of Wins author David Berri about plus/minus:

Recently I was updating a model that I had presented at a meeting. The model was based on more than 1,000 observations, and I was adding another year's worth of data. In the process of adding the data, I miscoded a few observations (less than 10). When I re-estimated the model -- with the miscoded data -- the results I had seen earlier went from statistically significant to insignificant. Once I fixed the problem, the results became statistically significant.

This experience highlights a problem researchers often face. Small changes in a data set can dramatically impact a result. That is why we a) check our data, b) re-estimate our models with different independent variables and across different data sets (i.e. conduct robustness checks), and c) report our findings in the following fashion: "the data suggests the following..."

In other words, in the social sciences we avoid saying we "proved" something.

When I look at the adjusted plus-minus work, I fail to see these kinds of efforts. The specific model is not often reported. And we see no effort at any kind of robustness checks. Furthermore, the nature of the model -- regressing small segments of a game on essentially some dummy variables -- suggests that the results are never going to be definitive. This is because not all of the factors that can impact outcomes in these small segments are controlled for in the model.

All of this indicates that the results from this research are unreliable. What is interesting, though, is that even when people acknowledge the lack of reliability, they still quote the results (while noting they are unreliable).  And that leads one to wonder, how do you know when something that is unreliable can be relied upon?

That last paragraph is the Illusion of Validity in action. If it supports what I believe to be true, then it must be meaningful!
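Berri's description of adjusted plus-minus, "regressing small segments of a game on essentially some dummy variables," can be sketched concretely. The Python below assumes a common stint-based setup (one row per stretch of play with no substitutions, +1 for each home player on the floor, -1 for each away player, the home margin per 100 possessions as the target, rows weighted by possessions) and adds a ridge penalty, the kind of regularization usually meant by the "R" in the RAPM discussed in the comments. It is a hypothetical illustration with made-up data, not any published model, and it omits a home-court term.

```python
import numpy as np


def stint_design_matrix(stints, players):
    """One row per stint: +1 for home players on the floor, -1 for away, 0 otherwise."""
    col = {p: j for j, p in enumerate(players)}
    X = np.zeros((len(stints), len(players)))
    for i, (home, away) in enumerate(stints):
        for p in home:
            X[i, col[p]] = 1.0
        for p in away:
            X[i, col[p]] = -1.0
    return X


def ridge_plus_minus(X, margin_per_100, possessions, lam=1.0):
    """Possession-weighted ridge regression of stint margins on player indicators.

    Solves (X^T W X + lam * I) beta = X^T W y, with W = diag(possessions).
    lam > 0 shrinks every rating toward zero; lam = 0 is plain weighted
    least squares, which is singular whenever the columns are linearly dependent.
    """
    W = np.diag(np.asarray(possessions, dtype=float))
    A = X.T @ W @ X + lam * np.eye(X.shape[1])
    b = X.T @ W @ np.asarray(margin_per_100, dtype=float)
    return np.linalg.solve(A, b)


if __name__ == "__main__":
    # Hypothetical 2-on-2 stints (real lineups are 5-on-5); values are made up.
    players = ["A", "B", "C", "D"]
    stints = [(["A", "B"], ["C", "D"]),
              (["A", "C"], ["B", "D"]),
              (["A", "D"], ["B", "C"])]
    margins = [8.0, 2.0, -4.0]   # home margin per 100 possessions in each stint
    possessions = [20, 15, 10]   # possessions played in each stint
    X = stint_design_matrix(stints, players)
    ratings = ridge_plus_minus(X, margins, possessions, lam=10.0)
    for name, r in zip(players, ratings):
        print(f"{name}: {r:+.2f}")
```

In this toy data every row sums to zero, so the all-ones vector lies in the null space of X and lam = 0 would make the system singular; with lam > 0 the ratings come out summing to zero. How strongly the answer depends on lam, and on which handful of stints happen to be in the sample, is exactly the kind of thing a robustness check would probe.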

Andres Alvarez

Great post, Patrick! Oddly, one thing I see a lot when discussing sports with people who use stats but don't understand them is this: "I know X isn't a perfect stat, but you have to admit it's impressive that so-and-so has such a high/low number in it." It's akin to going to a casino and saying, "I know that all of these games are based on probability and when you win it's luck. Still, that guy over there keeps winning, so it must be skill!" The basic trick I see is that if there is a number that supports your claim, it is OK to use. It's just very funny that it is often juxtaposed with an admission that it doesn't mean anything.

491 days ago

EvanZ

Patrick, if there is a system (such as RAPM) that is a better predictor of game outcomes than WP, how is it that someone can call that system less "reliable"? Indeed, what is the definition of "reliable", if it does not concern the ability to predict game outcomes? Is it more reliable because it causes fewer computer crashes or something? It's odd that predicting future game outcomes is apparently not that important, but Berri makes a big deal out of "predicting" past game outcomes (95% correlation to winning...in the past!). Please help me understand.

490 days ago

James

http://en.wikipedia.org/wiki/Reliability_%28statistics%29

490 days ago

EvanZ

Is a stat "reliable" simply because it correlates with itself? Height correlates very well from year to year, but it's probably not a "reliable" metric for predicting wins.

489 days ago

Alex D.

If someone believes a +/- stat "is based on probability and when you win it's luck" or "kind of intuitively knows that it is meaningless" and then uses flukes to establish a player's skill level, that's pretty transparently intellectually dishonest. What I think happens more often, though, is that people believe (just like this network) that +/- is fraught with random noise. Yet few people believe it's a total crapshoot. So even if someone believes that +/- evens out over time, players who consistently overperform or underperform +/- expectations, or whose statistics look so unlike a crapshoot that they provide tangible evidence against pure chance and noise, should be examined more thoroughly (they may be exceptions, flukes, or evidence of a systematic flaw in box score measurements). The question is whether the center of their statistics is truly zero, give or take an error term, and whether their spread is within reason for a model built on pure chance. This is the idea behind two well-established methodologies in statistics: the likelihood function and Bayesian updating. A coin that comes up fluky for a run is not necessarily evidence of bias. However, some events defy plausibility in ways that can be made mathematically rigorous (a coin that consistently leans toward one extreme by more than a few standard deviations, or whose controlled conditions favor, say, one of the agents betting on it), and holding to the initial assumption of chance may be intellectually dishonest or statistically absurd. The example of a casino is instructive: if someone consistently beats the house at a game ostensibly about luck, the casino may conclude (and can argue quite rigorously) that the player has an advantage or is using skill, and ban that person as a matter of course. Then again, I don't really know stats, so I'm kind of just feeling these ideas out. Let me know if I'm wrong.

489 days ago
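Alex's coin example can be made concrete with two standard calculations: how many standard deviations an observed count of heads sits from what a fair coin would give (a normal approximation to the binomial), and the log-likelihood ratio comparing the best-fitting bias to a fair coin. This Python sketch treats every flip as independent with a fixed probability; applying the same logic to a player's +/- would require assumptions about the "expected" value that it does not make.

```python
import math


def binomial_z_score(successes: int, trials: int, p: float = 0.5) -> float:
    """Standard deviations between an observed count and its Binomial(trials, p) mean.

    Uses the normal approximation, so it is rough for small `trials`.
    """
    expected = trials * p
    sd = math.sqrt(trials * p * (1 - p))
    return (successes - expected) / sd


def log_likelihood_ratio(successes: int, trials: int, p0: float = 0.5) -> float:
    """log L(p_hat) - log L(p0) for binomial data, with p_hat = successes / trials.

    Values near 0 mean a coin with probability p0 fits about as well as the
    best-fitting coin; larger values favor a biased coin.
    """
    failures = trials - successes
    p_hat = successes / trials

    def log_lik(p: float) -> float:
        # The binomial coefficient cancels in the ratio, so it is left out.
        # Zero counts are skipped to avoid log(0) when p_hat is 0 or 1.
        total = 0.0
        if successes:
            total += successes * math.log(p)
        if failures:
            total += failures * math.log(1 - p)
        return total

    return log_lik(p_hat) - log_lik(p0)


if __name__ == "__main__":
    # Same 70% heads rate, two sample sizes.
    for heads, flips in ((7, 10), (70, 100)):
        z = binomial_z_score(heads, flips)
        llr = log_likelihood_ratio(heads, flips)
        print(f"{heads}/{flips} heads: z = {z:.2f}, log-likelihood ratio = {llr:.2f}")
```

The same 70% rate gives a z of about 1.26 over 10 flips but 4 over 100 flips: a sustained lean, rather than a short run, is what becomes hard to attribute to chance.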

Patrick Minton

Evan, height correlates well with itself year over year. This is why height is a good way to measure "tallness". RAPM does not correlate well with either itself OR with team wins, yet team wins correlates well with itself year over year. This is why RAPM is a terrible way to measure "wins".

489 days ago

Patrick Minton

Alex, what you say is correct in principle. Yet +/- and RAPM actually fail the very tests you are citing. Neither measure "consistently leans toward one side," and neither controls for external variables. If it is not consistent over time but the thing it purports to measure is, then what is it measuring?

489 days ago

EvanZ

Patrick, can you explain why RAPM is a better predictor of future wins and point differential than WP? I'm not catching that part of the explanation from you.

489 days ago

Jevan

Berri's claim that APM is unreliable rests on year-to-year data, when even APM proponents will say you need at least two years to get enough data. It's not that APM is unreliable. It IS RELIABLE over many years, or when past years are used as priors for the current year. Why has LeBron James consistently had an RAPM over +5 for the last six years if this stat is so unreliable? The idea that APM is unreliable is quite simply a lie.

476 days ago

Jevan

Top 10 players in 2011 according to RAPM: Dirk, Manu, KG, LBJ, Collison, Nash, Paul, Howard, Wade, Deng
Top 10 players in 2010 according to RAPM: LeBron, Wade, Nash, Kobe, Dwight, Collison, Bogut, KG, Bosh, Deron
Six of the top ten players were the same.

Top 10 players in 2011 according to Wins Produced: Howard, Paul, Love, LeBron, Nash, Wade, Gasol, Fields, Rondo, Allen
Top 10 players in 2010 according to Wins Produced: LBJ, Howard, Kidd, Wallace, Rondo, Camby, Wade, Durant, Nash, Gasol
That leaves six of the top ten the same.

It doesn't seem like RAPM is much less reliable than WP.

476 days ago
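The comparison in Jevan's comment is a set intersection, sketched below in Python with the lists as given in that comment. Because the comment refers to some players by different labels in different lists (LBJ, Dwight), the sketch maps those aliases to a single name; it also assumes that a repeated surname such as Gasol refers to the same player in both years, which the comment does not confirm.

```python
# Top-ten lists from Jevan's comment; "LBJ" and "Dwight" are the comment's own aliases.
RAPM_TOP10 = {
    2011: ["Dirk", "Manu", "KG", "LBJ", "Collison", "Nash", "Paul", "Howard", "Wade", "Deng"],
    2010: ["LeBron", "Wade", "Nash", "Kobe", "Dwight", "Collison", "Bogut", "KG", "Bosh", "Deron"],
}
WP_TOP10 = {
    2011: ["Howard", "Paul", "Love", "LeBron", "Nash", "Wade", "Gasol", "Fields", "Rondo", "Allen"],
    2010: ["LBJ", "Howard", "Kidd", "Wallace", "Rondo", "Camby", "Wade", "Durant", "Nash", "Gasol"],
}

# Map alternate labels for the same player to one name before comparing.
ALIASES = {"LBJ": "LeBron", "Dwight": "Howard"}


def canonical(names: list[str]) -> set[str]:
    """Replace known aliases so the same player is counted once."""
    return {ALIASES.get(name, name) for name in names}


def top_ten_overlap(by_year: dict[int, list[str]], year_a: int, year_b: int) -> set[str]:
    """Players who appear in the top-ten list for both years."""
    return canonical(by_year[year_a]) & canonical(by_year[year_b])


if __name__ == "__main__":
    for label, lists in (("RAPM", RAPM_TOP10), ("Wins Produced", WP_TOP10)):
        shared = sorted(top_ten_overlap(lists, 2010, 2011))
        print(f"{label}: {len(shared)} of 10 repeat ({', '.join(shared)})")
```

Top-ten overlap is a coarse check: it ignores everyone outside the top ten and says nothing about how far a player's rating moved, which a correlation across all players would capture.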

