Wine competitions, judging, and blind luck
That’s the conclusion of Robert Hodgson, a winemaker and statistician whose paper (written with SMU’s Jing Cao) is called “Criteria for Accrediting Expert Wine Judges” and appears in the current issue of The Journal of Wine Economics. It says that those of us who judge wine competitions, including some of the world’s best-known wine experts, are ordinary at best. And most of us aren’t ordinary.
… [M]any judges who fail the test have vast professional experience in the wine industry. This leads us to question the basic premise that experts are able to provide consistent evaluations in wine competitions and, hence, that wine competitions do not provide reliable recommendations of wine quality.
The report is the culmination of research started at the California State Fair wine competition at the end of the last decade. The competition’s organizers wanted to see if judging was consistent; that is, did the same wine receive the same medal from the same judge if the judge tasted it more than once during the event? The initial results, which showed that there was little consistency, were confirmed in the current study.
More than confirmed, actually. Just two of the 37 judges who worked the competition in 2010, 2011, and 2012 met the study’s criteria to be an expert; that is, they gave the same wine the same medal (within statistical variation) each time they tasted it. Even more amazing, 17 of the 37 were so inconsistent that their ratings were statistically meaningless. In other words, presented with Picasso’s Guernica, most of the judges would have given a masterpiece of 20th-century art three different medals if they saw it three different times.
“This is not a reflection on the judges as people, and I don’t mean that kind of criticism,” says Hodgson. “But the task assigned them as wine judges was beyond their capabilities.”
Which, given the nature of wine competitions, makes more sense than many doubters want to believe. Could the problem be with the system, and not the judges? Is it possible to be consistent when judges taste 100 wines a day? Or when they taste flight after flight of something like zinfandel, which is notoriously difficult to judge under the best circumstances?
When I asked him this, Hodgson agreed, but added: “But we don’t see an alternative. But it is an inherent problem. You just want to see the competitions give the judges sufficient time to do it.”
Perhaps. But my experience, after a decade of judging regularly, is that the results seem better (allowing for this un-mathematical approach) when I judge fewer wines. That means that the competition is smaller, or that the organizers have hired more judges. Maybe that’s where the next line of study should go: determining if judging fewer wines leads to more consistent results.