Why Some Beers Spark Debate:
A Short Walk Through Controversial Brews
Join us for an in-depth look at what makes some beers controversial.
The best beer is where priests go to drink. For a quart of Ale is a dish for a king.
William Shakespeare
Shakespeare's poetic reference to beer shows the esteem it held in society as early as the 17th century. His preference is a personal one, and enthusiasts would no doubt debate it, but beer has a longstanding cultural significance and is enjoyed across diverse regions with unique brewing traditions.
In the current globalised market, beer has become a shared experience with a wide range of preferences. While some beers have achieved universal acclaim, others have sparked debate due to differences in flavor, regional brewing methods, and cultural context. This dynamic interplay of preference and locality has turned some brews into cultural symbols and conversation starters.
The objective is to gain insight into the factors influencing the varying perceptions of beers. The analysis will focus on several categories: from "controversial" brews, which are those that receive polarised reviews, to "universal" beers, which receive more consistent reviews. The study analyses both quantitative ratings and textual reviews in order to examine how sensory attributes, such as aroma, appearance, palate and taste, influence consumer opinion. The study also examines regional differences and compares the input of novices and experts in order to gain insight into why some beers have broad appeal while others are the subject of intense debate.
Grab your favorite brew, and let's explore what the data says about the beers that stir up the strongest reactions.
The following key questions define how this study is structured and the main topics we will explore.
To begin this analysis, we first uncovered an interesting pattern: some breweries, beers, and users were present in both RateBeer and BeerAdvocate. We also found instances where a single brewery in one dataset corresponded to multiple entries in the other. We merged the two datasets into a single unified one encompassing beers, breweries, users, and reviews, with consistent ID matching across all features. More details on how the datasets were merged can be found here.
With the data now organised into a more accessible format, we can begin examining its core content. We start by looking at some key numbers.
This dataset provides a rich source of information for our analysis. Notably, approximately 90% of the beers have at least one review, ensuring that the research will be both meaningful and insightful. For this project, we will focus solely on these beers.
It is important to acknowledge, however, that the dataset is not fully representative. A significant portion of the reviews comes from American users, meaning our analysis primarily reflects their preferences and tastes. We decided not to rebalance the dataset, as doing so would have excluded many brews and reviews that are still relevant to our analysis. Below is a visualisation highlighting the 20 countries with the most reviews in our dataset.
Reviews from both websites follow a similar structure: users rate the appearance, aroma, palate, and taste of each beer, as well as provide an overall grade. To standardize the data, we converted all ratings to a unified 1-to-5 scale, as this range is the most widely used. Please note that some of the scaled ratings have a different resolution.
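As a minimal sketch of this standardisation, assuming a plain linear rescaling (the exact conversion used in the project is not given here), a rating on any source scale can be mapped onto the 1-to-5 range as follows. The function name and the example scales are hypothetical and chosen for illustration only.

```python
def rescale_to_1_5(value, src_min, src_max):
    """Linearly map a rating from [src_min, src_max] onto [1, 5]."""
    if src_max <= src_min:
        raise ValueError("src_max must be greater than src_min")
    return 1.0 + 4.0 * (value - src_min) / (src_max - src_min)


# Hypothetical source scales, for illustration only.
on_ten_point_scale = rescale_to_1_5(7, 1, 10)   # a 7 on a 1-to-10 scale
on_five_point_scale = rescale_to_1_5(4, 1, 5)   # already on the target scale

# An integer 1-to-5 scale can only produce five distinct values after
# rescaling, while an integer 1-to-10 scale produces ten: this is the
# difference in resolution mentioned above.
five_point_values = {rescale_to_1_5(v, 1, 5) for v in range(1, 6)}
ten_point_values = {rescale_to_1_5(v, 1, 10) for v in range(1, 11)}
```

Under this assumption, both scales share the same endpoints (1 and 5), but the coarser scale has larger steps between neighbouring grades.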
To identify controversiality in our dataset, we first need to define what it means and how to measure it. Let’s start with a general definition:
"subject of intense public argument, disagreement, or disapproval" [1]
While this definition is broad and applies to various situations, our analysis narrows it to beers with the most polarised reviews: those that are most likely to spark disagreement, even with a Shakespeare fan in the room. We base our analysis on user ratings across appearance, aroma, palate, taste and overall features as well as textual reviews.
The metric used for this analysis of user ratings is the variance, which describes the level of disagreement around the average score.
Because all ratings share the same 1-to-5 scale, variance provides a consistent measure across attributes and relies only on the given ratings.
To ensure reliable results, we exclude beers with fewer than 30 ratings or reviews.
More details on this decision can be found here, as beers with few ratings may show inflated variance (e.g. only two extreme opinions).
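The variance computation and the filter on the number of ratings can be sketched as below. This is an illustration, not the project's code: the input layout (pairs of beer ID and score) and the function name are assumptions, and population variance is used here although the text does not say whether population or sample variance was chosen.

```python
from collections import defaultdict
from statistics import pvariance

MIN_RATINGS = 30  # threshold stated in the text


def rating_variances(ratings, min_ratings=MIN_RATINGS):
    """Return {beer_id: variance} for beers with at least min_ratings ratings.

    `ratings` is an iterable of (beer_id, score) pairs on the 1-to-5 scale.
    Population variance is an assumption made for this sketch.
    """
    by_beer = defaultdict(list)
    for beer_id, score in ratings:
        by_beer[beer_id].append(score)
    return {
        beer_id: pvariance(scores)
        for beer_id, scores in by_beer.items()
        if len(scores) >= min_ratings
    }


# Two extreme opinions alone give a variance of 4, the maximum possible on
# a 1-to-5 scale, which is why sparsely rated beers are excluded.
two_extremes = pvariance([1, 5])
```

The same function would be applied once per attribute (appearance, aroma, palate, taste, overall) to obtain five variances per beer.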
With the data cleaned, the first step in classifying beers as controversial or not involves analysing the variances of each attribute. It may seem logical to use the variance of the overall grade, as it reflects a combination of each attribute. However, setting a threshold on this variance to classify beers presents challenges. A deeper analysis revealed that using an arbitrary threshold primarily highlighted taste as the most polarizing factor. We believe that relying solely on a single parameter oversimplifies controversiality, and that it is inherently multi-dimensional and influenced by the interaction of all attributes.
Recognising the limitations of a one-dimensional approach, we turned to a clustering method, only using the grades, and keeping the text for later.
We chose to use a Gaussian Mixture Model (GMM), factoring in the variances of all five attributes, to group beers together.
Using the elbow method on the negative log-likelihood plot, we determined 3 clusters to be optimal.
The clusters were computed using standardised variances.
They serve as our key metric for controversiality: high variance indicates controversy or disagreement, while low variance suggests a universal appeal.
The plot shows standardised variances, where smaller values represent low variance, and larger values indicate high variance.
Based on this, we define the clusters as follows:
The Gaussian Mixture Model offers a more effective analysis by considering the interaction of all attributes.
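The clustering step described above can be sketched with scikit-learn as follows. This is a hedged illustration rather than the project's implementation: the data is synthetic, the helper names are hypothetical, and the default full covariance type is an assumption. The second helper computes the negative log-likelihood for several component counts, which is the curve on which an elbow would be read.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler


def cluster_variances(variances, n_clusters=3, seed=0):
    """Fit a GMM on standardised per-attribute variances; return labels and model."""
    X = StandardScaler().fit_transform(variances)
    gmm = GaussianMixture(n_components=n_clusters, random_state=seed).fit(X)
    return gmm.predict(X), gmm


def negative_log_likelihoods(variances, max_clusters=6, seed=0):
    """Total negative log-likelihood for 1..max_clusters components."""
    X = StandardScaler().fit_transform(variances)
    nll = []
    for k in range(1, max_clusters + 1):
        gmm = GaussianMixture(n_components=k, random_state=seed).fit(X)
        # score() returns the mean log-likelihood per sample.
        nll.append(-gmm.score(X) * len(X))
    return nll


# Synthetic stand-in for the real table: one row per beer, one column per
# attribute variance (appearance, aroma, palate, taste, overall).
rng = np.random.default_rng(0)
fake_variances = np.vstack([
    rng.normal(0.2, 0.03, size=(100, 5)),
    rng.normal(0.4, 0.03, size=(100, 5)),
    rng.normal(0.7, 0.03, size=(100, 5)),
])
labels, model = cluster_variances(fake_variances)
nll_curve = negative_log_likelihoods(fake_variances)
```

Cluster indices returned by a GMM are arbitrary, so naming them still requires a rule. One option, assumed here, is to order the clusters by the mean of their fitted centres (`model.means_`), from lowest variance (universal) to highest (controversial).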
We are now going to extend our exploration by including insights from textual reviews to better define beer controversiality.
Our first approach was to use a sentiment analysis model on every qualitative review, classifying each review as either Positive, Neutral, or Negative.
With this new information, we attempted the Gaussian Mixture Model again, this time incorporating sentiment classifications.
Three models were constructed. Each categorises beers into three classes (Controversial, Universal and Neutral), but they differ in the data used:
Restricting the data to reviews with text results in a 14% loss in dataset size. Despite the additional information provided by sentiment analysis, the similarity between the models was high, suggesting that sentiment analysis did not significantly alter the classification. Considering the lack of improvement in the classification, the loss of data and the uncertainty added by the sentiment analysis, we proceed without it.
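The similarity measure used to compare the models is not specified here. As one simple possibility, assuming each model outputs a named class per beer, the share of beers that receive the same class from two models can be computed as below. The function name and the toy labels are hypothetical.

```python
def agreement_rate(labels_a, labels_b):
    """Fraction of beers given the same class by two models.

    Both arguments map beer_id -> class name; only beers present in both
    mappings are compared.
    """
    shared = labels_a.keys() & labels_b.keys()
    if not shared:
        raise ValueError("the two models have no beers in common")
    same = sum(labels_a[beer] == labels_b[beer] for beer in shared)
    return same / len(shared)


# Toy example: the second model covers fewer beers, mirroring the data
# lost when only reviews with text are kept.
ratings_only = {
    "b1": "Universal", "b2": "Controversial", "b3": "Neutral", "b4": "Neutral",
}
with_sentiment = {"b1": "Universal", "b2": "Controversial", "b3": "Universal"}
rate = agreement_rate(ratings_only, with_sentiment)
```

A rate close to 1 would indicate that adding sentiment information barely changes the classification.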
Now that we have a satisfying classification for the controversiality of the beers, we can start to explore the data to find the reasons behind it.
Appearance and palate exhibit the highest mean variance across all classes, suggesting they may contribute the most to the controversiality of a beer.
However, it is important to note that these scaled attributes have a lower resolution in the RateBeer dataset.
This coarser resolution amplifies the variance of these attributes.
The plot also shows that there are no extreme outliers for the universal and neutral classes. This suggests that the GMM classification effectively separates beers with fewer controversial attributes and performs reliably in the classification.
By analysing how the proportion of each class evolves with the level of alcohol by volume (ABV), we can extract the following:
We performed a similar analysis for the number of reviews per beer: regardless of the number of reviews, the proportions of the clusters remain the same. Furthermore, the number of beers per number of ratings follows a heavy-tailed distribution. The impact of the number of ratings on our classification is thus low.
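The per-bin proportions behind both analyses (ABV and number of reviews) can be sketched as follows. The bin edges, input layout, and function name are assumptions made for illustration; the project's actual binning is not given here.

```python
from bisect import bisect_right
from collections import Counter, defaultdict


def class_proportions_by_bin(beers, bin_edges):
    """Share of each class within each bin of a numeric feature.

    `beers` is an iterable of (value, class_name) pairs, and `bin_edges`
    is a sorted list of edges. Bin i covers [bin_edges[i], bin_edges[i+1]).
    Values outside the edges are ignored.
    """
    counts = defaultdict(Counter)
    for value, cls in beers:
        i = bisect_right(bin_edges, value) - 1
        if 0 <= i < len(bin_edges) - 1:
            counts[i][cls] += 1
    return {
        i: {cls: n / sum(c.values()) for cls, n in c.items()}
        for i, c in counts.items()
    }


# Toy data: (number of reviews, class) pairs with hypothetical bin edges.
toy_beers = [
    (30, "Universal"), (45, "Neutral"),
    (120, "Universal"), (150, "Controversial"),
    (400, "Universal"), (800, "Universal"),
]
proportions = class_proportions_by_bin(toy_beers, [30, 100, 300, 1000])
```

Plotting these shares against the bin index shows whether the class mix changes with the feature, or stays flat as reported for the number of reviews.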
We can conclude that connoisseurs tend to drink and rate more universal beers, while novices tend to rate more controversial beers. We can also state some hypotheses:
Our initial exploration of the dataset did not reveal a clear separation between beers.
Employing advanced clustering methods (GMM), we identified three primary clusters: universal, neutral, and controversial beers.
Sentiment analysis was conducted, but its marginal benefits did not outweigh the cost of reducing the dataset size.
Further analysis revealed that ABV plays a strong role: controversial and neutral beers are most common at the highest and lowest ABV levels, while relatively strong beers tend to attract consensus.
Interestingly, user expertise levels (novice, enthusiast, connoisseur) exhibited a strong association with the proportion of universal beers, with connoisseurs favoring them most.
Geographical data highlighted the outsized influence of the USA due to its significant representation in the dataset, affecting the observed patterns in beer and user origins.
Just as Shakespeare's era recognized beer's cultural significance, our analysis reveals that the appreciation of beer continues to evolve, shaped by expertise, regional influences, and the complex interplay of brewing characteristics.
Our study revealed that our clusters were sensitive to the number of beers used. Increasing the number of beers would improve the robustness of the clusters. However, achieving this requires addressing the variance fluctuations that occur when a beer has few ratings. While we chose a threshold that marks the beginning of a plateau, we observed that the variance still decreases slowly as the number of ratings grows. Therefore, developing a method to weight the number of ratings effectively would make our analysis more robust and our results more reliable.
Another way to dive deeper into understanding why a beer might be controversial is by analysing qualitative reviews. While quantitative data analysis is already complex, the true insights behind a 7/10 rating, for instance, lie within the qualitative feedback. By using the analysis methods detailed in the GitHub repository, we can extract relevant themes from the comments, such as taste, color, and other attributes. This approach not only allows us to determine whether the sentiment is positive or negative but also uncovers the underlying reasons behind these sentiments. Although this method of analysis is time-consuming, it offers the potential to uncover deeper insights into what makes a beer controversial.