How do on-screen emotions make money?
An analysis of how movie sentiment affects ratings and revenue over the years.
What is in this data story and why?
The first question is why we should analyze the movie industry at all. There is money. A lot of money. The flourishing film industry achieved a global box office of around $42.2 billion in 2019. One year earlier, in 2018, the market was valued at $136 billion when home entertainment is included. For comparison, in 2018, Denmark's GDP was $356.8 billion, and Switzerland's GDP was $735.5 billion. Single hit titles such as Top Gun: Maverick and Avatar brought in worldwide box office revenues of $1.49 billion and $2.92 billion, respectively. This shows how much is at stake for filmmakers when they draw up a recipe for an entertaining and gripping movie!
Comparison of Top Gun: Maverick and Avatar revenues with the 2019 global box office, the 2018 value of the global movie industry, and Denmark's 2018 GDP.
Throughout this data story, we will walk you through an analysis of popular movies' plots and extract their key sentiment ingredients. Humans are emotional beings and, let's be honest, they only watch movies that are close to their hearts. Therefore, besides assessing general sentiment, we will dive deeper and focus on what is closest to the human soul: emotions. This connection between movies and emotion is captured in the famous quote by movie director Rich Moore: "A good movie makes the audience feel like they've journeyed with the characters". Moreover, we will try to answer whether, and which, manifestations of violence influence movie popularity. All of these aspects will come together in an ultimate cookbook for any aspiring movie creator, containing the right emotion-related ingredients for a successful movie.
"A good movie makes the audience feel like they've journeyed with the characters."
~Rich Moore
Which data have we used?
To draw conclusions for the cookbook, on which future directors can base their movies, we need a lot of data. Our analysis is based on the CMU Movie Summary Corpus. However, this dataset turned out not to be sufficient for our purposes, as only a small fraction of its movies contains revenue data. Therefore, we decided to get more data from the internet and scraped additional (and also some of the same :D) movies from IMDB and TMDB. Unfortunately, only a small fraction of the scraped movies comes with revenue data either, but because the full set is much bigger, the number of movies with revenue data is bigger as well. The revenue data are what we use for the cookbook analysis. The following figure shows the composition of the data we used in our analysis and its distribution over the years.
We based our analysis on 7.5k movies with revenue data from CMU, 17k movies with revenue data in total, 42k movies from CMU, and over 400k films scraped from the internet.
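To make the idea of combining the sources concrete, here is a minimal sketch of how two movie sets could be merged and how the number of movies with revenue data could be counted before and after. It is an illustration, not the pipeline used for this story: the titles and revenues are made up, matching movies by a (title, year) key is an assumption, and the helper names `merge_sources` and `revenue_counts` are hypothetical.

```python
# Toy stand-ins for the two sources, keyed by (title, year).
# All titles, years, and revenues are made up for illustration;
# matching on (title, year) is an assumption, not the story's method.
cmu = {
    ("Movie A", 1999): 100_000_000,
    ("Movie B", 2005): None,  # revenue missing in this source
    ("Movie C", 2010): None,
}
scraped = {
    ("Movie B", 2005): 50_000_000,
    ("Movie C", 2010): None,
    ("Movie D", 2015): 200_000_000,
    ("Movie E", 2018): None,
}


def merge_sources(primary, secondary):
    """Union of both sources; keep the primary revenue when it is known,
    otherwise fall back to the secondary one."""
    merged = dict(secondary)
    for key, revenue in primary.items():
        if revenue is not None or key not in merged:
            merged[key] = revenue
    return merged


def revenue_counts(movies):
    """Return (movies with known revenue, total movies)."""
    known = sum(1 for revenue in movies.values() if revenue is not None)
    return known, len(movies)


merged = merge_sources(cmu, scraped)

for name, movies in [("CMU only", cmu), ("Merged", merged)]:
    known, total = revenue_counts(movies)
    print(f"{name}: {known} of {total} movies have revenue data")
```

In this toy example, "Movie B" has no revenue in the first source but gains one from the second, which is the effect the extra scraping is meant to achieve. With real data, titles are rarely unique and spellings differ between sites, so a shared identifier would be a more robust key than the title where one is available.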