I recently wrote a post over on the Wages of Wins Journal about how I believe the book Thinking, Fast and Slow is chock-full of descriptions of cognitive illusions that all basketball analysts (whether they are paid to do it or not) fall prey to. One of Kahneman's more famous illusions is the illusion of validity: the fact that people have a huge amount of confidence in their own judgment, even in the face of clear evidence that their judgment is wrong:
The confidence we experience as we make a judgment is not a reasoned evaluation of the probability that it is right. Confidence is a feeling, one determined mostly by the coherence of the story and by the ease with which it comes to mind, even when the evidence for the story is sparse and unreliable. The bias toward coherence favors overconfidence. An individual who expresses high confidence probably has a good story, which may or may not be true.
I coined the term “illusion of validity” because the confidence we had in judgments about individual soldiers was not affected by a statistical fact we knew to be true — that our predictions were unrelated to the truth. This is not an isolated observation. When a compelling impression of a particular event clashes with general knowledge, the impression commonly prevails. And this goes for you, too. The confidence you will experience in your future judgments will not be diminished by what you just read, even if you believe every word.
I encourage you to read the whole thing; it is full of examples of the illusion in action.
In basketball, one statistic that I believe illustrates the Illusion of Validity is the very popular plus/minus statistic. I'm convinced that +/- is so popular precisely because everyone kind of intuitively knows that it is meaningless. Whenever a person is convinced that a performance was great (terrible), a +/- number is dragged out and offered as evidence. "Look, this +/- number reinforces my belief!" When the +/- number doesn't conform to the story, it is (conveniently) dismissed as a statistic that is meaningless, or which "requires large samples" (quick, name a stat that doesn't require large samples to converge!).
Here's the dirty truth about +/-: it really is meaningless. Why is it meaningless? Because it is horribly inconsistent over time. Even its proponents seem to agree:
Returning to Davis, this is a crucial part of the discussion. To me, the most obvious explanation for Davis' relatively poor net defensive plus-minus is the small sample size. At the time SI.com's Luke Winn wrote about this statistic, Davis had been off the court for 202 possessions--less than three games' worth. That's just not nearly enough time to make meaningful declarations. Even the entire NBA regular season, nearly three times as long, is not sufficient for the noisiness in plus-minus to filter out. There are plenty of examples of players' net plus-minus ratings bouncing wildly from year to year.
This doesn't invalidate plus-minus statistics. It merely means they must be used with more caution than individual numbers. To be clear, Winn didn't use plus-minus to say anything negative about Davis. He merely made note of the plus-minus numbers in the process of pointing out how effective Eloy Vargas has been defensively as Davis' backup.
It surprises me that Mr. Pelton can look at a number that "bounces wildly from year to year" and, a few sentences later, use it to evaluate a player:
In terms of individual statistics, Collison doesn't impress. Because he uses so few possessions on offense and rarely blocks shots, Collison rates worse than replacement in WARP and little better in PER. Basketball-Reference.com's Win Shares provide a superior estimate of Collison's value but still put him barely better than average.
Meanwhile, Collison's net plus-minus of +11.1 last season ranked eighth in the league, per BasketballValue.com. Every player ahead of him was an All-Star. The year before, the Thunder was 9.4 points better per 100 possessions with Collison on the floor.
Again, this surprises me (but it shouldn't!). If a measurement is horribly inconsistent over time, there are two possibilities:
1. Whatever you are measuring is itself wildly inconsistent over time.
2. You are not measuring what you think you are measuring.
Plus/Minus varies wildly over time. So we need to consider two possibilities:
1. Basketball performance is very inconsistent and fluctuates wildly over time.
2. Plus/minus is not measuring basketball performance.
I would postulate that 1) is not possible. Almost all aspects of basketball performance at both the team and individual levels are pretty consistent over time. You can see the math in The Wages of Wins, but truthfully anyone with access to an Excel sheet and the internet can do the math themselves: raw box score stats in any given year correlate well with the same stats in the previous year. So, when we look at the choice above, any reasonable scientist should apply Occam's Razor and conclude that plus/minus isn't actually measuring basketball performance.
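The do-it-yourself check described above is just a correlation between two columns of numbers. Here is a minimal sketch of that computation; the player names and per-36-minute rebounding figures are invented purely for illustration, not real NBA data:

```python
import numpy as np

def year_to_year_correlation(stat_by_player):
    """Correlate each player's stat in one season with the same stat
    in the following season. A stat that measures something stable
    about the player will produce a high correlation; a noisy stat
    (like plus/minus) will not.

    `stat_by_player` maps player -> (year1_value, year2_value).
    """
    year1 = np.array([v[0] for v in stat_by_player.values()])
    year2 = np.array([v[1] for v in stat_by_player.values()])
    return np.corrcoef(year1, year2)[0, 1]

# Hypothetical per-36-minute rebounding numbers (illustration only).
rebounds = {
    "Player A": (11.2, 10.8),
    "Player B": (6.1, 6.5),
    "Player C": (9.4, 9.0),
    "Player D": (4.2, 4.6),
    "Player E": (12.5, 12.1),
}

# For a consistent box score stat like rebounding, this lands near 1.
print(round(year_to_year_correlation(rebounds), 3))
```

Running the same computation with real season totals pulled from a site like Basketball-Reference.com is exactly the spreadsheet exercise described above.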
So, apparently it only measures something useful when you want it to. It's the illusion of validity in action! As we've argued many times, the box-score contains lots of information. In other words, the box-score is fine, you're just doin' it wrong.
I was recently talking with the Wages of Wins author David Berri about plus/minus:
Recently I was updating a model that I had presented at a meeting. The model was based on more than 1,000 observations, and I was adding another year's worth of data. In the process of adding the data, I miscoded a few observations (less than 10). When I re-estimated the model -- with the miscoded data -- the results I had seen earlier went from statistically significant to insignificant. Once I fixed the problem, the results became statistically significant.
This experience highlights a problem researchers often face. Small changes in a data set can dramatically impact a result. That is why we a) check our data, b) re-estimate our models with different independent variables and across different data sets (i.e. conduct robustness checks), and c) report our findings in the following fashion: "the data suggests the following..."
In other words, in the social sciences we avoid saying we "proved" something.
When I look at the adjusted plus-minus work, I fail to see these kinds of efforts. The specific model is not often reported. And we see no effort at any kind of robustness checks. Furthermore, the nature of the model -- regressing small segments of a game on essentially some dummy variables -- suggests that the results are never going to be definitive. This is because all the factors that can impact outcomes in these small segments are not controlled for in the model.
All of this indicates that the results from this research are unreliable. What is interesting, though, is that even when people acknowledge the lack of reliability, they still quote the results (while noting they are unreliable). And that leads one to wonder, how do you know when something that is unreliable can be relied upon?
That last paragraph is the Illusion of Validity in action. If it supports what I believe to be true, then it must be meaningful!
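The regression Berri describes can be sketched schematically. The following is a simplified toy version of an adjusted plus-minus setup, not any published implementation: simulated stints, random on-court indicators, and an ordinary least squares fit, followed by a crude robustness check (refitting after dropping a handful of stints):

```python
import numpy as np

rng = np.random.default_rng(0)

# Schematic adjusted plus-minus: regress the point margin of short game
# segments ("stints") on dummy variables indicating which players were
# on the floor. All numbers below are simulated for illustration.
n_players, n_stints = 10, 200
true_impact = rng.normal(0, 2, n_players)             # each player's real effect
on_court = rng.integers(0, 2, (n_stints, n_players))  # 1 if player was in stint
noise = rng.normal(0, 12, n_stints)                   # everything the dummies miss
margin = on_court @ true_impact + noise

# Fit by ordinary least squares.
full_fit, *_ = np.linalg.lstsq(on_court, margin, rcond=None)

# Crude robustness check: drop 10 of 200 stints and re-estimate.
subset_fit, *_ = np.linalg.lstsq(on_court[10:], margin[10:], rcond=None)

# With short, noisy segments, even a small change in the data set moves
# the per-player estimates -- the instability Berri is pointing at.
print(np.max(np.abs(full_fit - subset_fit)))
```

The point of the sketch is the last line: because each stint carries so much unmodeled noise, the estimated player effects shift whenever the sample changes, which is exactly why robustness checks matter before quoting such numbers.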