The best beer is where priests go to drink. For a quart of Ale is a dish for a king.

William Shakespeare

Shakespeare's poetic reference to beer demonstrates its esteemed position in society, even during the 17th century. His preference is a personal one, and enthusiasts would likely debate it, but beer has a longstanding cultural significance and is enjoyed across diverse regions with unique brewing traditions.

In today's globalised market, beer has become a shared experience with a wide range of preferences. While some beers have achieved universal acclaim, others have sparked debate due to differences in flavor, regional brewing methods, and cultural context. This dynamic interplay of preference and locality has turned some brews into cultural symbols and conversation starters.

The objective is to gain insight into the factors influencing the varying perceptions of beers. The analysis will focus on several categories: from « controversial » brews, which are those that receive polarised reviews, to « universal » beers, which receive more consistent reviews. The study analyses both quantitative ratings and textual reviews in order to examine how sensory attributes, such as aroma, appearance, palate and taste, influence consumer opinion. The study also examines regional differences and compares the input of novices and experts in order to gain insight into why some beers have broad appeal while others are the subject of intense debate.

Grab your favorite brew, and let's explore what the data says about the beers that stir up the strongest reactions.



Research questions

The following key questions define how this study is structured and which main topics we explore.

  • How can we define and classify beer controversiality?
  • Can we use the textual review to enhance our classification?
  • Are there features explaining controversiality?

Data presentation

To begin this analysis, we first uncovered an interesting pattern: some breweries, beers, and users were present in both RateBeer and BeerAdvocate. We also found instances where a single brewery in one dataset corresponded to multiple entries in the other. We merged the two datasets into a single unified one encompassing beers, breweries, users, and reviews, with consistent ID matching across all features. More details on how the datasets were merged can be found here.

With the data now organised into a more accessible format, we can begin examining its core content. We start by looking at some key numbers.

Dataset summary

This dataset provides a rich source of information for our analysis. Notably, approximately 90% of the beers have at least one review, ensuring that the research will be both meaningful and insightful. For this project, we will focus solely on these beers.

It is important to acknowledge, however, that the dataset is not fully representative. A significant portion of the reviews comes from American users, meaning our analysis primarily reflects their preferences and tastes. We decided not to rebalance the dataset, as doing so would have excluded many brews and reviews that are still relevant to our analysis. Below is a visualisation highlighting the 20 countries with the most reviews in our dataset.

Top countries by number of beer reviews

Reviews from both websites follow a similar structure: users rate the appearance, aroma, palate, and taste of each beer, as well as provide an overall grade. To standardize the data, we converted all ratings to a unified 1-to-5 scale, as this range is the most widely used. Please note that some of the scaled ratings have a different resolution.
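
As an illustration of the rescaling step, here is a minimal sketch of a linear mapping onto the 1-to-5 scale. The native ranges in `ASSUMED_RANGES`, the column names and the sample values are assumptions made for the example and should be checked against the actual exports.

```python
import pandas as pd

# Assumed native ranges per attribute (hypothetical, for illustration only).
ASSUMED_RANGES = {
    "appearance": (1, 5),
    "aroma": (1, 10),
    "palate": (1, 5),
    "taste": (1, 10),
    "overall": (1, 20),
}


def rescale_to_1_5(values: pd.Series, low: float, high: float) -> pd.Series:
    """Linearly map ratings from [low, high] onto [1, 5]."""
    return 1 + 4 * (values - low) / (high - low)


# Toy ratings on their assumed native scales.
ratings = pd.DataFrame({"aroma": [1, 5.5, 10], "overall": [1, 10.5, 20]})

scaled = ratings.copy()
for column in ratings.columns:
    low, high = ASSUMED_RANGES[column]
    scaled[column] = rescale_to_1_5(ratings[column], low, high)
```

Under these assumed ranges, an integer 1-to-5 attribute keeps steps of 1 after rescaling, while an integer 1-to-20 attribute ends up with steps of 4/19. This is one way the scaled ratings can end up with different resolutions.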

Exploring the definition of controversiality

To identify controversiality in our dataset, we first need to define what it means and how to measure it. Let’s start with a general definition:

Controversial:

"subject of intense public argument, disagreement, or disapproval" [1]

While this definition is broad and applies to various situations, our analysis narrows it to beers with the most polarised reviews: those that are most likely to spark disagreement, even with a Shakespeare fan in the room. We base our analysis on user ratings across appearance, aroma, palate, taste and overall features as well as textual reviews.

Preliminary considerations

The metric used for this analysis of user ratings is the variance, which describes the level of disagreement around the average score. Because all ratings share the same 1-to-5 scale, the variance provides a consistent measure across attributes and relies solely on the given ratings.

To ensure reliable results, we exclude beers with fewer than 30 ratings or reviews, as beers with few ratings may show inflated variance (e.g. only two extreme opinions). More details on this decision can be found here.
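
The per-beer variances and the minimum-ratings filter could be computed along these lines. This is a sketch only: the table layout and the column names (`beer_id` and the five attribute columns) are assumptions, and the toy example uses a lower threshold so that it stays small.

```python
import pandas as pd

ATTRIBUTES = ["appearance", "aroma", "palate", "taste", "overall"]
MIN_RATINGS = 30  # threshold stated in the text


def attribute_variances(ratings: pd.DataFrame, min_ratings: int = MIN_RATINGS) -> pd.DataFrame:
    """Return one row per beer holding the sample variance of each attribute.

    Beers with fewer than `min_ratings` ratings are dropped.
    """
    grouped = ratings.groupby("beer_id")[ATTRIBUTES]
    counts = grouped.size()
    variances = grouped.var(ddof=1)
    return variances[counts >= min_ratings]


# Toy data: beer "A" has four ratings, beer "B" only two extreme ones.
rows = [("A", 3.0)] * 2 + [("A", 4.0)] * 2 + [("B", 1.0), ("B", 5.0)]
demo = pd.DataFrame(rows, columns=["beer_id", "score"])
for attribute in ATTRIBUTES:
    demo[attribute] = demo["score"]

result = attribute_variances(demo, min_ratings=4)
```

In the toy data, beer "B" would show a very large variance from just two opposite opinions, and it is exactly this kind of entry that the threshold removes.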

First classification

With the data cleaned, the first step in classifying beers as controversial or not involves analysing the variances of each attribute. It may seem logical to use the variance of the overall grade, as it reflects a combination of each attribute. However, setting a threshold on this variance to classify beers presents challenges. A deeper analysis revealed that using an arbitrary threshold primarily highlighted taste as the most polarizing factor. We believe that relying solely on a single parameter oversimplifies controversiality, and that it is inherently multi-dimensional and influenced by the interaction of all attributes.

Clustering

Recognising the limitations of a one-dimensional approach, we turned to a clustering method, only using the grades, and keeping the text for later. We chose to use a Gaussian Mixture Model (GMM), factoring in the variances of all five attributes, to group together beers.

Using the elbow method on the negative log-likelihood plot, we determined 3 clusters to be optimal.
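
A minimal sketch of how such an elbow curve could be produced, assuming scikit-learn's `GaussianMixture` and `StandardScaler`. The synthetic table stands in for the per-beer variances, and the helper name `negative_log_likelihoods` is hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler


def negative_log_likelihoods(variances: np.ndarray, max_components: int = 6, seed: int = 0) -> dict:
    """Fit one GMM per component count and return the total negative log-likelihood."""
    standardised = StandardScaler().fit_transform(variances)
    results = {}
    for k in range(1, max_components + 1):
        model = GaussianMixture(n_components=k, random_state=seed).fit(standardised)
        # score() returns the mean log-likelihood per sample, so scale by the sample count.
        results[k] = -model.score(standardised) * len(standardised)
    return results


rng = np.random.default_rng(0)
# Synthetic stand-in: three groups of beers, five attribute variances each.
toy = np.vstack([rng.normal(loc, 0.05, size=(200, 5)) for loc in (0.2, 0.5, 0.9)])
nll = negative_log_likelihoods(toy)
```

Plotting `nll` against the number of components gives the curve on which the elbow is read. The negative log-likelihood usually keeps decreasing as components are added, so the point where the improvement flattens is chosen rather than the minimum.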


The clusters were computed using standardised variances, which serve as our key metric for controversiality: high variance indicates controversy or disagreement, while low variance suggests universal appeal. The plot shows standardised variances, where smaller values represent low variance and larger values indicate high variance. Based on this, we define the clusters as follows:

  • Cluster 0 : Controversial beers, representing 9.81% of the beers.
  • Cluster 1 : Universal beers, representing 37.13% of the beers.
  • Cluster 2 : Neutral beers, representing 53.06% of the beers.
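
GMM cluster ids are arbitrary, so names have to be attached after fitting. The sketch below shows one possible rule, assumed here for illustration: rank the three clusters by their mean standardised variance and name them from lowest to highest. The helper names and the toy arrays are hypothetical.

```python
import numpy as np

# Names ordered from lowest to highest mean variance (assumed naming rule).
CLASS_NAMES = ["universal", "neutral", "controversial"]


def name_clusters(standardised: np.ndarray, labels: np.ndarray) -> dict:
    """Map each cluster id to a class name based on its mean standardised variance."""
    cluster_ids = [int(c) for c in np.unique(labels)]
    mean_variance = {c: float(standardised[labels == c].mean()) for c in cluster_ids}
    ordered = sorted(cluster_ids, key=lambda c: mean_variance[c])
    return dict(zip(ordered, CLASS_NAMES))


def cluster_shares(labels: np.ndarray) -> dict:
    """Percentage of beers falling in each cluster."""
    ids, counts = np.unique(labels, return_counts=True)
    return {int(i): 100.0 * int(n) / len(labels) for i, n in zip(ids, counts)}


# Toy example: four beers, five standardised variances each.
toy_variances = np.array([[1.8] * 5, [-0.9] * 5, [-1.1] * 5, [0.2] * 5])
toy_labels = np.array([0, 1, 1, 2])

names = name_clusters(toy_variances, toy_labels)
shares = cluster_shares(toy_labels)
```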

Extracting insights from textual reviews

The Gaussian Mixture Model offers a more effective analysis by considering the interaction of all attributes. We are now going to extend our exploration by including insights from textual reviews to better define beer controversiality. Our first approach was to use a sentiment analysis model on every qualitative review, classifying each review as either Positive, Neutral, or Negative.

With this new information, we attempted the Gaussian Mixture Model again, this time incorporating sentiment classifications. Three models were constructed. Each sorts beers into three categories (Controversial, Universal and Neutral) but differs in the data used:

  • Classifier 1 uses all ratings, as before.
  • Classifier 2 uses only ratings that have textual reviews, accounting for the sentiment label.
  • Classifier 3 uses only ratings that have textual reviews, not accounting for the sentiment label.

Percentage of beers that have the same labels between models

Restricting the data to reviews with text results in a 14% loss in dataset size. Despite the additional information provided by sentiment analysis, the similarity between the models was high, suggesting that sentiment analysis did not significantly alter the classification. Considering the lack of improvement in classification, the loss of data and the uncertainty added by the sentiment analysis, we proceed without it.
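
The agreement between two classifiers can be measured as the share of beers that receive the same named label from both. The following is a sketch under assumptions: labels are stored as pandas Series indexed by beer id, they have already been mapped to class names (raw GMM cluster ids are not comparable across models), and only beers present in both models are compared.

```python
import pandas as pd


def label_agreement(labels_a: pd.Series, labels_b: pd.Series) -> float:
    """Percentage of beers, among those present in both series, with identical labels."""
    common = labels_a.index.intersection(labels_b.index)
    return 100 * (labels_a.loc[common] == labels_b.loc[common]).mean()


# Toy labels from two hypothetical classifiers.
classifier_1 = pd.Series(
    {"b1": "controversial", "b2": "neutral", "b3": "universal", "b4": "neutral"}
)
classifier_2 = pd.Series(
    {"b2": "neutral", "b3": "neutral", "b4": "neutral", "b5": "universal"}
)

agreement = label_agreement(classifier_1, classifier_2)
```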

Analysing the potential results of controversiality

Now that we have a satisfactory classification for the controversiality of the beers, we can start to explore the data to find the reasons behind it.

Main attributes

We begin by analysing the distribution of the variances of the main attributes for each class.

Appearance and palate exhibit the highest mean variance across all classes, suggesting they may contribute the most to the controversiality of a beer. However, it is important to note that these scaled attributes have a lower resolution in the RateBeer dataset, which amplifies their variance.

The plot also shows that there are no extreme outliers for the universal and neutral classes. This suggests that the GMM classification effectively separates beers with fewer controversial attributes and performs reliably.
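
The effect of resolution on variance can be illustrated with a small simulation. Everything here is hypothetical: the "true" opinions are drawn from a normal distribution, and the two step sizes (0.25 and 1.0) are example resolutions, not values taken from the datasets.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical continuous opinions on a 1-to-5 scale.
opinions = np.clip(rng.normal(3.5, 0.5, size=100_000), 1, 5)


def quantise(values: np.ndarray, step: float) -> np.ndarray:
    """Round each value to the nearest multiple of `step`."""
    return np.round(values / step) * step


# The same opinions recorded on a fine and on a coarse rating grid.
var_fine = quantise(opinions, 0.25).var()
var_coarse = quantise(opinions, 1.0).var()
```

In this simulation the coarse grid yields the larger variance even though the underlying opinions are identical, which is the amplification described above.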

Alcohol By Volume

Alcohol By Volume (ABV) is a key attribute of beer, influencing its taste. We found, as expected, that extreme values of ABV are related to the controversiality of a beer. For the next part, we chose not to plot beers beyond 20% ABV, as there are too few of them to be significant and they caused fluctuations.

Analysing how the share of each class evolves with alcohol by volume, we observe the following:

  • After a peak at around 5%, the number of beers decreases as alcohol by volume increases.
  • Most of the common beers (with an ABV between 2.5% and 7.5%) are neutral, and some are controversial.
  • Beers with a low ABV (<5%) tend to be controversial or neutral; beers with a very high ABV (>15%) show the same behaviour.
  • Interestingly, the stronger beers (between 6% and 16%) are more likely to be universal or neutral. This suggests that user ratings tend to converge on relatively strong beers.
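
The class proportions per ABV level can be obtained by binning beers and normalising the counts within each bin. This is a sketch only: the column names `abv` and `label`, the 2.5-point bin width and the toy values are assumptions. Only the 20% cut-off comes from the text.

```python
import pandas as pd


def class_share_by_abv(beers: pd.DataFrame, bin_width: float = 2.5, max_abv: float = 20.0) -> pd.DataFrame:
    """Share of each label per ABV bin (rows sum to 1), ignoring beers above `max_abv`."""
    kept = beers[beers["abv"] <= max_abv]
    # Each beer is assigned the lower edge of its bin, e.g. 4.8 -> 2.5 for a width of 2.5.
    abv_bin = ((kept["abv"] // bin_width) * bin_width).rename("abv_bin_start")
    return pd.crosstab(abv_bin, kept["label"], normalize="index")


# Toy beers; the 25% ABV entry is excluded by the cut-off.
toy_beers = pd.DataFrame(
    {
        "abv": [4.0, 4.5, 4.8, 9.0, 9.5, 25.0],
        "label": ["neutral", "controversial", "neutral", "universal", "universal", "controversial"],
    }
)

abv_shares = class_share_by_abv(toy_beers)
```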

Number of ratings per beer

We performed a similar analysis for the number of reviews per beer. It shows that the proportions of the clusters remain the same regardless of the number of reviews. Furthermore, the number of beers per number of ratings follows a heavy-tailed distribution. The impact of the number of ratings on our classification is thus low.

Level of the user

It is also important to consider who wrote the review, as a user's severity and controversiality might differ between their first and their last review. We separate users into 3 categories [4.4]: novices with fewer than 20 reviews, enthusiasts with fewer than 800 reviews, and connoisseurs with more than 800 reviews.
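
The three user levels can be derived from each user's review count with a simple binning. This is a minimal sketch: the function name is hypothetical, and placing a user with exactly 800 reviews among the connoisseurs is an assumption, since the text leaves that boundary open.

```python
import pandas as pd


def user_level(review_counts: pd.Series) -> pd.Series:
    """Assign novice / enthusiast / connoisseur from each user's number of reviews."""
    return pd.cut(
        review_counts,
        bins=[0, 20, 800, float("inf")],
        labels=["novice", "enthusiast", "connoisseur"],
        right=False,  # intervals are [0, 20), [20, 800), [800, inf)
    )


# Toy review counts around the two thresholds.
counts = pd.Series([1, 19, 20, 799, 800, 5000])
levels = user_level(counts)
```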

We can conclude that connoisseurs tend to drink and rate more universal beers, while novices tend to rate more controversial beers. We can also put forward some hypotheses:

  • Connoisseurs might know their taste better and, as a result, drink more niche beers that cater to them.
  • Novices might give more weight to a specific attribute, while connoisseurs might take several into account. As a result, connoisseur ratings might be fairer and thus closer to the mean.
  • The higher proportion of universal beers among connoisseurs might be due to the herding effect: if the first few ratings are made by connoisseurs, novice users might copy their opinions.

Country of the users

The following maps illustrate the proportion of labels (neutral, universal or controversial) assigned depending on the users' country of origin. It is important to consider that some countries are less represented than others, particularly those where alcohol consumption is not a cultural norm, leading to a very low number of reviews. By contrast, the USA still accounts for more than 50% of the reviews.


Location of the brewery

Similarly, we analysed the origin of the beers themselves. We expect the proportion of each category in each country to be similar to the overall division, especially for countries with many beers.

We can see that our hypothesis holds for the European countries. However, the US seems to have a higher percentage of universal beers than expected. Furthermore, South American, Asian and some African countries, as well as some smaller countries, tend to have a higher proportion of controversial beers. Because some countries have few beers, their proportions are less stable, which may explain why they deviate from our initial hypothesis.

Score of comment habit

Finally, we defined a score to measure comment habit. This score indicates whether users tend to give a category (controversial, universal or neutral) proportionally more often to local beers or to foreign ones. On the map below, the score approaches +1 if the proportion of universal labels is higher for home beers and -1 if it is higher for foreign beers.

Let \( s_x^{class} \) be the proportion of beers of a given category among those from the same country as the user, and let \( s_y^{class} \) be the proportion of that category among beers from a different country than the user. The score per country is then defined as: \[ Score^{class} = s_x^{class} - s_y^{class} \]
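
One possible reading of this score, sketched in code: for each user country, take the share of a label among ratings of home beers and subtract the share of the same label among ratings of foreign beers. The column names (`user_country`, `beer_country`, `label`) and the toy data are assumptions, and the exact aggregation used for the maps may differ.

```python
import pandas as pd


def comment_habit_score(ratings: pd.DataFrame, label: str) -> pd.Series:
    """Per user country: share of `label` among home beers minus share among foreign beers."""
    is_home = ratings["user_country"] == ratings["beer_country"]
    has_label = ratings["label"] == label
    home_share = has_label[is_home].groupby(ratings.loc[is_home, "user_country"]).mean()
    foreign_share = has_label[~is_home].groupby(ratings.loc[~is_home, "user_country"]).mean()
    # Countries lacking either home or foreign ratings end up as NaN.
    return (home_share - foreign_share).rename(f"score_{label}")


# Toy ratings: one row per (user, beer) pair with the beer's class label.
toy_ratings = pd.DataFrame(
    {
        "user_country": ["US", "US", "US", "US", "BR", "BR", "BR"],
        "beer_country": ["US", "US", "BE", "BE", "BR", "DE", "DE"],
        "label": [
            "universal", "universal", "controversial", "universal",
            "controversial", "universal", "universal",
        ],
    }
)

universal_score = comment_habit_score(toy_ratings, "universal")
```

In the toy data, US users rate universal beers more often at home (score 0.5), while BR users only rate universal beers from abroad (score -1).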
We can observe that:

  • South American and Asian countries rate foreign beers as universal more often than their own.
  • South American countries have controversial opinions about their own beers.
  • The USA is unique: beers from the country are more universal, and its users rate foreign beers as controversial.
  • Some Asian and African countries tend not to judge local and foreign beers differently.
  • The mean score for controversiality and neutrality is positive: most countries tend to rate beers from their own country as controversial or neutral more often.
  • Furthermore, the mean score for universality is negative, which argues against a hypothesis of chauvinism.

Note that the mean is not weighted by the number of beers per country: each country has the same influence on the mean score. The US produces the most beers and has the most users, who also tend to rate their own beers as universal. Given the weight of the US in the dataset, we must keep in mind its influence and how it might bias the results. Users from other countries also drink those universal beers, but for them these count as foreign beers.

Conclusion

Our initial exploration of the dataset did not reveal a clear separation between beers. Employing advanced clustering methods (GMM), we identified three primary clusters: universal, neutral, and controversial beers. Sentiment analysis was conducted, but its marginal benefits did not outweigh the cost of reducing the dataset size.
Further analysis revealed that the ABV plays a strong role, with the most controversial and neutral beers having the highest and lowest ABV, and a consensus on relatively strong beers. Interestingly, user expertise levels (novice, enthusiast, connoisseur) exhibited a strong association with the proportion of universal beers, with connoisseurs favoring them most. Geographical data highlighted the outsized influence of the USA due to its significant representation in the dataset, affecting the observed patterns in beer and user origins.
Just as Shakespeare's era recognized beer's cultural significance, our analysis reveals that the appreciation of beer continues to evolve, shaped by expertise, regional influences, and the complex interplay of brewing characteristics.


To go further

Our study revealed that our clusters were sensitive to the number of beers used. Increasing the number of beers would improve the robustness of the clusters. However, achieving this requires addressing the impact of variance fluctuations in cases of a low number of ratings. While we chose a threshold which marks the beginning of a plateau, we observed that the variance still slowly decreases over the number of ratings. Therefore, developing a method to effectively weight the number of ratings in our analysis would improve the robustness of our analysis, enabling us to have even more reliable results.

Another way to dive deeper into understanding why a beer might be controversial is by analysing qualitative reviews. While quantitative data analysis is already complex, the true insights behind a 7/10 rating, for instance, lie within the qualitative feedback. By using the various analysis methods detailed in the GitHub repository, we can extract relevant themes from the comments, such as taste, color, and other attributes. This approach not only allows us to determine whether the sentiment is positive or negative but also uncovers the underlying reasons behind these sentiments. Although this method of analysis is time-consuming, it offers the potential to uncover deeper insights into what makes a beer controversial.

Meet the team

Alan Wicht

Robotics

Gustave Lapierre

Mechanical Engineering

Jehan Piaget

Robotics

Mattéo Fiore

Robotics

Valentin Perret

Robotics