Tidy Tuesday – Films Dataset

Hello, today we are going to be looking at this week's Tidy Tuesday dataset. This is just a quick EDA, as I got a bit carried away with the dataset. I initially set out to make just one interesting graph but kept finding more and more interesting insights. Below you can see the structure of the dataset:


As you can see, it has 3401 observations of 9 variables, with each observation a different film. First of all, let's look at how the production budgets are distributed with a histogram.


We can see that the vast majority of films in this dataset have a production budget of less than 25 million dollars. But how does that change for each genre in the dataset?
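A histogram is just a count of values per fixed-width bin, so the "less than 25 million" reading can be checked directly. This is a minimal sketch in plain Python, not the code behind the plots in this post: the budget values are made up, and the $25M bin width is an assumption chosen to match the threshold mentioned above.

```python
from collections import Counter

# Hypothetical budgets in US dollars; NOT values from the real dataset.
budgets = [5_000_000, 12_000_000, 18_000_000, 30_000_000, 22_000_000, 150_000_000]

BIN_WIDTH = 25_000_000  # assumed bin width of $25M


def bin_counts(values, width):
    """Count values per [k*width, (k+1)*width) bin, keyed by the bin's lower edge."""
    counts = Counter(int(v // width) for v in values)
    return {k * width: counts[k] for k in sorted(counts)}


def share_below(values, threshold):
    """Fraction of values strictly below the threshold."""
    return sum(v < threshold for v in values) / len(values)


print(bin_counts(budgets, BIN_WIDTH))
print(share_below(budgets, BIN_WIDTH))
```

Only non-empty bins appear in the result; a plotting library would also draw the empty bins in between.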


Now you can see some interesting insights. Comedy, drama and horror films have clear peaks at the lower end of the production budget range. Action and adventure films are much more spread out across all production budgets.


Now let's look at how the production budget relates to how much a film grosses. Action and adventure have the steepest slopes, so for these genres, the more money put into a film, the higher the return on average.
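The slope here is the extra gross per extra dollar of budget. As a minimal sketch, assuming the trend lines are straight-line (ordinary least-squares) fits of gross on production budget within each genre, the per-genre slopes can be computed as below. The rows are made up, and whether the plot used worldwide or domestic gross is an assumption.

```python
from collections import defaultdict

# Hypothetical (genre, production_budget, worldwide_gross) rows in $M; not real data.
films = [
    ("Action", 50, 150), ("Action", 100, 330), ("Action", 150, 480),
    ("Drama", 10, 30), ("Drama", 20, 45), ("Drama", 30, 55),
]


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def slopes_by_genre(rows):
    """Group rows by genre and fit one slope per genre."""
    groups = defaultdict(lambda: ([], []))
    for genre, budget, gross in rows:
        groups[genre][0].append(budget)
        groups[genre][1].append(gross)
    return {genre: ols_slope(xs, ys) for genre, (xs, ys) in groups.items()}


print(slopes_by_genre(films))
```

With these toy rows, action gains about 3.3 dollars of gross per extra budget dollar versus 1.25 for drama, which is what "steeper slope" means.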

[Figure: release date]

Above you can see when each film is released during the year. Notice the peak for horror in October, just in time for Halloween. Drama also seems to increase toward the end of the year, in time for Oscar season. Adventure has a peak in July for summer blockbusters and another in December, perhaps aiming for the holiday season.
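A chart like this comes down to counting releases per (genre, month) pair. This is a minimal sketch assuming release dates are stored as strings in month/day/year form; that date format, the rows, and the helper names are all hypothetical.

```python
from collections import Counter
from datetime import datetime

# Hypothetical (genre, release_date) rows; the "%m/%d/%Y" format is an assumption.
releases = [
    ("Horror", "10/13/2017"), ("Horror", "10/27/2017"), ("Horror", "3/2/2018"),
    ("Adventure", "7/4/2017"), ("Adventure", "12/15/2017"),
]


def releases_per_month(rows, date_format="%m/%d/%Y"):
    """Count films per (genre, month number) pair."""
    return Counter(
        (genre, datetime.strptime(date, date_format).month)
        for genre, date in rows
    )


def peak_month(counts, genre):
    """Month with the most releases for the given genre (first one wins on ties)."""
    months = {month: n for (g, month), n in counts.items() if g == genre}
    return max(months, key=months.get)


counts = releases_per_month(releases)
print(counts[("Horror", 10)], peak_month(counts, "Horror"))
```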


Finally, for now, let's look at the median profit percentage per month: can we get any idea of when it's best to release a particular genre? Action, adventure and comedy seem to have two peaks, one in the middle of the year and one towards the end. Horror generally seems to have the highest median profit percentage. This dataset is a simple one, but one from which insights are easy to come by. I could definitely write at least one more blog post with more of what I have found in this dataset. Another day, perhaps.
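The post doesn't define profit percentage, so this minimal sketch assumes it means (worldwide gross minus production budget) as a percentage of the budget, then takes the median within each (genre, month) group. The rows and helper names are hypothetical.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (genre, release_month, production_budget, worldwide_gross) rows; not real data.
films = [
    ("Horror", 10, 5, 40), ("Horror", 10, 10, 30), ("Horror", 10, 20, 20),
    ("Comedy", 6, 20, 50), ("Comedy", 6, 40, 60),
]


def profit_percent(budget, gross):
    """Assumed definition: profit as a percentage of the production budget."""
    return (gross - budget) / budget * 100


def median_profit_by_month(rows):
    """Median profit percentage for each (genre, month) group."""
    groups = defaultdict(list)
    for genre, month, budget, gross in rows:
        groups[(genre, month)].append(profit_percent(budget, gross))
    return {key: median(values) for key, values in groups.items()}


print(median_profit_by_month(films))
```

The median is a sensible choice here: one cheap film that earns many times its budget would drag a mean upward, while the median barely moves.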

