Untappd and The Wisdom of Crowds

Ask a brewer what he or she thinks of Untappd and the response you’ll get is a cocktail of groaning, sighing, head-shaking, eye-rolling, and compulsive twitching. For beer drinkers, it’s a magnificent tool to enhance and assist with their quest to enjoy beer. For brewers, it’s often a source of anxiety and an object of resentment. And by “brewers”, I mean me.

For those unfamiliar with the product who are reading a brewing blog (my parents, if they’re still reading), Untappd is a social media application that has become ubiquitous in the modern beer world. Drinking a beer (even sharing a taster glass with someone) entitles you to “check-in” the beer. The check-in includes any details you want to provide – a photo, a description, and a star rating from 0.25 to 5. Through these check-ins, you develop a profile that displays your conquests; and the beer develops a profile, showing who’s been drinking it and what they’ve thought of it.

The ubiquity of Untappd can hardly be overstated. Untappd reported over 37 million check-ins in 2017. Our little brewery alone has logged over 11,000 check-ins since we opened in November 2016. No doubt, the killer feature that has driven Untappd to this level is its badge system. Check in 5 different beers from the U.S. and get a “Land of the Free” badge. Check in 5 more and get “Level 2” of that badge. Check in 5 pale ales and get “Pale as the Moon”. Check in 5 beers with photos and get “Photogenic Brew”. And so on. In my short Untappd career of 71 check-ins, I earned 59 badges. (And I’m a chump – proper beer drinkers typically have over 5,000 check-ins and 2,000 badges.) The badge feature takes this otherwise boring cataloguing exercise and turns it into a hunting game, complete with the serotonin-boosting status accumulation that motivates people to collect Pokémon and participate in airline loyalty programs with unconditional, shameless devotion.

The feeling of improved status with every sip of beer is motivation enough to use the app, but the app also serves the very real purpose of helping the beer drinker make sense of an insanely huge number of beer options. Let’s say I’ve got a couple of hours to kill today and I want to go enjoy a beer. Within a 10-minute Uber ride, I have access to over 100 establishments to buy a beer. Among them, there are probably over 500 unique beers available.

Now hold on a second! Let’s stop and admire how freaking amazing it is that this is a reality. How many people in the history of man have ever had this much variety at their fingertips? And this is a cheap luxury – the cost of less than a couple hours’ work for most Americans. If that’s not breathtaking, then neither is the Grand Canyon.

So, anyway, how do we make sense of this sea of options? What’s available, what’s good, what are people drinking, what haven’t I had before? Untappd is an extremely useful resource to answer these questions.

But is it useful to a brewer?

It should be. There’s a rich amount of data within these check-ins that should serve as feedback to the brewer. Is the beer good? Is it popular? What do people like or dislike about it? Where is it being enjoyed?

On our opening night, we were anxious to see our Untappd reviews. What a neat way to get feedback on the product we’ve put our heart and soul into. As soon as we got home, we started scrolling through the reviews and were shocked. Lots of glowing reviews, but some lousy ones, too. 2- and 3-star reviews for our pilsner? But it’s so clean and delicious – what more could you ask for? “A parade of 2-star beers”? These beers are awesome! Brewers, beer buyers, and beer writers have told us so! Were they just lying to us to be polite?

It didn’t take long to talk ourselves down from the ledge. Browsing around other breweries’ reviews revealed that it’s not just us. My shock at our Dollar Pils Y’all’s rating of 3.5 dissipated when I saw Real Ale’s Hans Pils (one of my all-time favorite beers) is also 3.5.

Commiserating with other brewers was comforting. A veteran brewer shared his philosophy on feedback: “The only feedback I care about comes from the cash register.” This phrase stuck with me. At the register, people back their preferences with money. They have skin in the game. And, whether selling beer is our objective or a necessary condition for survival, it’s vitally important. Would we rather have a portfolio of 5-star beers that don’t sell or 3-star beers that do?

It turns out that, for us, Untappd ratings are not a good predictor of sales. We instinctively know this to be the case, but just to put some numbers to it, we took a recent week’s tap room sales and compared them to each beer’s Untappd rating. Below are the 12 beers we had on tap over a weekend in February, plotting their sales volume against their Untappd rating.

[Figure: Untappd rating vs. sales]

That R² value of 0.14% means that Untappd ratings explain only 0.14% of the variation in sales among our beers. The remaining 99.86% is a mystery! To be fair, that point on the top-left of the chart is our pils. In spite of its low rating, it’s consistently our best seller. If we remove it from the group, the R² jumps to about 50%. So I’ll concede that Untappd has some predictive power for sales, except for best-selling beers. I’m not sure that’s much use.
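For anyone who wants to try this on their own tap list, here is a minimal sketch of the calculation. The ratings and pint counts below are made up for illustration (they are not our sales data), and the function name `r_squared` is my own. It only shows how a single low-rated best seller can drag the R² of a straight-line fit toward zero.

```python
from statistics import mean

def r_squared(xs, ys):
    """R^2 of a least-squares straight-line fit of ys on xs.

    For a one-variable linear fit this equals the squared Pearson correlation.
    """
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

# HYPOTHETICAL data: Untappd rating and pints sold for six beers.
# The first beer plays the role of a low-rated pilsner that outsells everything.
ratings = [3.5, 3.6, 3.7, 3.8, 3.9, 4.0]
pints = [400, 100, 120, 150, 160, 190]

r2_all = r_squared(ratings, pints)
r2_without_pils = r_squared(ratings[1:], pints[1:])

print(f"R^2 with the pils:    {r2_all:.2f}")
print(f"R^2 without the pils: {r2_without_pils:.2f}")
```

With only a dozen or so points, one beer can swing the result a lot in either direction, so a number like this is better read as a rough gut check than as a verdict.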

Browse around Untappd and it doesn’t take long to understand that a beer’s rating is heavily style-dependent. High ABV beers, rare beers, barrel-aged beers, sour beers, double-quintuple hazy dry-hopped beers will tend to have significantly higher ratings than pilsners, hefeweizens, and ESBs, no matter how clean or technically sound any of those beers may be.

But, still, given that the styles are known, shouldn’t we be able to remove the style impact and decipher the quality of a beer from its Untappd rating? There’s a famous experiment by Francis Galton in England in 1907 where non-expert individuals were asked to estimate the weight of an ox by looking at it. 787 individual estimates were tallied, and their average was 1,197 pounds. The weight of the ox was… wait for it… 1,198 pounds, a single pound off. Untappd ratings may be a similar phenomenon: the individual ratings are meaningless, but a beer’s average rating, especially when comparing it to that of similar beers, should be an indication of its quality. For example, if I want to know whether my saison is good, I should be able to learn this from comparing its average rating to other saisons, right?
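To see why averaging can work at all, here is a small simulation sketch. Everything in it is an assumption for illustration: the round target weight, the size of each guesser’s error, and above all the idea that every guess is independent and unbiased. None of it is Galton’s actual data.

```python
import random
from statistics import mean, median

random.seed(0)  # fixed seed so the run is repeatable

TRUE_WEIGHT = 1200   # pounds; a round, illustrative number (assumed)
CROWD_SIZE = 787     # same number of guesses as in the ox story
GUESS_SPREAD = 75    # assumed standard deviation of one person's guess

# Each guess = truth + that person's own independent, unbiased error.
guesses = [random.gauss(TRUE_WEIGHT, GUESS_SPREAD) for _ in range(CROWD_SIZE)]

# How far off is the crowd's average, versus a typical individual?
crowd_error = abs(mean(guesses) - TRUE_WEIGHT)
typical_error = median([abs(g - TRUE_WEIGHT) for g in guesses])

print(f"Typical individual miss: {typical_error:.1f} lb")
print(f"Crowd average miss:      {crowd_error:.1f} lb")
```

The averaging trick only cancels errors that point in random directions. If every guesser leans the same way (say, toward hazy double IPAs), the average keeps that lean no matter how big the crowd gets.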

A little experiment of my own

One way to test whether Untappd ratings can indicate beer quality is to compare them to competition results. For example, The Great American Beer Festival uses a rigorous process to judge beer quality among its 8,000+ entries each year and award medals for the highest quality beers in each style category. If the Untappd rating is a strong indicator of quality, then a gold medal-winning beer ought to be among the highest rated beers in its category – analogous to the folks, collectively, correctly guessing the weight of that ox.

To test this, I took a random sample of 20 gold medal-winning beers from the 2017 Great American Beer Festival¹ and looked at their standing on Untappd. For each style of beer, Untappd generates a “Top Rated” list, showing the 50 best-rated beers of that style. For each of the 20 beers considered: Is it one of the top 50 rated beers in its style on Untappd?² It turns out that, for 19 of those 20 beers, the answer is, “No”.
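Twenty beers is a small sample, so it’s fair to ask how much the 1-in-20 result could move if all the gold medalists were checked. One rough way to put bounds on it is a Wilson score interval, sketched below. The function name `wilson_interval` is my own, and the calculation assumes the 20 beers behave like a random sample, which is only approximately true here.

```python
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width

# From the post: 1 of the 20 sampled gold medalists made its style's top 50.
low, high = wilson_interval(1, 20)
print(f"Plausible share of gold medalists in the top 50: {low:.1%} to {high:.1%}")
```

Even at the top of that range, most gold medalists would still be missing from the Top Rated lists, so the small sample doesn’t change the basic picture.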

[Figure: GABF vs. Untappd]

This probably doesn’t surprise you. We might hastily conclude that, in contrast to the weight of an ox, the quality of a beer is not objective. And this, I think, is incorrect. I’m a firm believer that the quality of a beer – whether a beer is “good” or “bad” – is objective. The difference between a good and a bad beer is a set of noticeable characteristics. A good beer has minimal off-flavors and expresses desirable flavor and appearance characteristics in a coherent way (and “desirable” is almost entirely objective). A bad beer does not. Like the weight of the ox, quality is exclusively dependent on the beer itself.

The Wisdom of Crowds by James Surowiecki was published in 2004 and used the ox experiment as its opening anecdote. The book popularized crowd-sourcing³ and also generated criticism. Jaron Lanier argued that the wisdom of crowds is only harnessed when it isn’t defining its own questions and when the goodness of an answer can be evaluated by a simple result. Neither of these conditions is met with Untappd ratings: there is no definition of, or even guidance on, the star ratings, and the user gets no feedback on whether his or her rating was accurate. The only feedback you get on Untappd is a badge for more check-ins, which motivates you to try more beers and give less thought to each.

There’s another theory that explains this discrepancy, which is that Untappd ratings are not meant to measure the quality of the liquid, but the enjoyment of the experience. Enjoyment, unlike quality, is not entirely dependent on the beer. It depends as much on the person experiencing the beer as on the beer itself. In other words, it’s subjective in nature. And the liquid is only a part of the experience. How it’s served, where it’s served, the packaging, the name, the story of the beer and the brewery that made it, what others think of it – these all contribute to the drinking experience, and there’s no reason why they can’t be considered in that rating.

The good news

While it’s not a good predictor of sales or indicator of the liquid’s quality (at least not with high resolution – I will concede that a 4.5-star IPA is probably higher quality liquid than a 2.5-star IPA), the ratings are not meaningless noise. They capture the enjoyment of the experience by the Untappd crowd. To the Untappd crowd, today, a great pilsner (Hans Pils) is not as enjoyable as an average IPA (Goose IPA). In a way, Untappd is an articulation of the craft beer zeitgeist. If we want to stay relevant – to stay a part of the conversation, to be on tap at key accounts, to keep influencers coming into the tap room – we ought not to ignore it.

And for the brewer, specifically, there is good information scattered among the reviews. “Too much spice in this wit for my taste,” prompts us to evaluate that question and perhaps make adjustments. “Tastes like garbage water,” provides a potential name for a new beer. There are specific users who are capable of and interested in providing honest and insightful feedback, and we take their reviews seriously. It’s the kind of feedback we should have to pay for. In this way, Untappd is an awesome medium for the dialog between brewer and patron.

So, while the data may easily mislead or frustrate us, we mustn’t let that distract from the valuable information that this platform provides when interpreted appropriately. And as for that anxiety: the very human fear of being judged, even unfairly, is something that we would all be better off for overcoming.

¹ Untappd is highly protective against data mining, so it would have taken too long to look up all 97 beers; instead, we took the first 20 listed in alphabetical order by category. Random enough.

² Consider that many of the ratings of these beers were probably levied with the knowledge that the beer won a GABF Gold – one would expect this to actually boost the beers’ Untappd ratings!

³ To its credit, the book detailed several caveats required for the wisdom of crowds to work. One necessary condition for success was that members of the crowd must not be able to influence each other, so that each rating would be independent. On Untappd, however, you see the average rating of a beer and the ratings your friends have given before you provide your input.

6 thoughts on “Untappd and The Wisdom of Crowds”

  1. Despite the fact that Untappd beers in general all fall within almost one point of each other and along the lines of what is stated above that the personal rating scale of most individuals is pretty far off base and is more about how much one likes a beer than how much a beer is to style, that chart similarly doesn’t mean much if it doesn’t confirm whether or not the 50 top beers in Untappd have even been entered in GABF.

    GABF isn’t a judging of all beers. It is a judging of beers entered by breweries. Further, breweries are limited to ~5 beers each per year and may have to pay for the privilege (not sure about this one).

    1. Hey Travis,

      Thanks for your opinion. I agree that Untappd is measuring something different than GABF. GABF is an accepted industry standard for awarding beer quality. It is the most competitive beer competition in the world. Last year, more than 1 in 3 operating U.S. breweries entered an average of 3.5 beers each — probably their best beers (to style), given that they could only choose 4 and, yes, they have to pay for the beers to be entered — they are betting on their best beers. This resulted in an average of 82 beers entered in each category, judged in a structured, deliberate process. The very best of each category is awarded a gold medal, but only if it is deemed a world-class beer.

      Given the selective, competitive process that a GABF gold medal-winning beer has endured, it is reasonable to expect that it is one of the top beers in that style. The fact that (95% of the time) Untappd’s top 50 U.S. beers did not include these is something I find very interesting. Either Untappd ratings are measuring something different than beer quality, or every top-rated Untappd beer (except 1) is a GABF gold-winning beer that never bothered to enter the competition.

      To be clear, I don’t think Untappd ratings are illegitimate or lack value — just that they are not a reliable measure of beer quality.

      -John

      1. You’re still ignoring the point that it can’t win if it isn’t entered. I’ve had more beers registered in Untappd than were entered in GABF last year. Those are both facts, not opinions. My opinion is that Untappd scoring in general is shit and no one should need that explained.

  2. No doubt, most beers don’t enter GABF. Many of the 900 Untappd top beers considered probably did not enter GABF. And there is no way to know how many did. But, all that is required for this chart to be meaningful is that some do. For example, even if only 1 of the top 50 beers in each category entered the competition (a very conservative estimate), that top-rated beer was deemed inferior to a non-top-rated beer 95% of the time.

    The alternative explanation to this is that all of the top rated beers are gold-medal worthy beers that simply didn’t bother to enter GABF, which I think we can agree is extremely unlikely.

  3. Funny. Our best seller, Pearl-Snap, is also a pilsner in the 3.5 range. So is Firestone’s Pivo, which won GABF Gold 3 TIMES IN A ROW.
    Seems for that style, in particular, a higher rating is due to factors other than quality.
