Hello, and welcome to today’s blog. It’s actually my entry into the monthly data vis battle on the Data Is Beautiful subreddit. If you want to join in, go check it out: this month the data set is ………………………. and it would be great to see lots of different takes on it. A few months ago the data set was a travel survey conducted on the Travel subreddit. Here’s my take on it:
Initial Exploratory Analysis
In the three graphs above you can see a basic breakdown of the dataset. The first thing to point out is that clearly more males than females filled out the survey. Also, the most common age bracket is 22-29. American is by far the most common nationality, which I think is more a reflection of Reddit users than of people who generally go travelling. At the start I said the idea was to find some groups and review the groups’ different tendencies. I am going to do that using the trips per year variable and a new one created from trips and spend: spend per trip.
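As a rough sketch of how a spend per trip variable could be derived, here is a small pandas example. The column names `trips_per_year` and `annual_spend` are my own placeholders, the rows are made up for illustration, and I am assuming the spend figure covers a whole year; the real survey columns may look different.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only; these are NOT the real survey responses.
# Column names are hypothetical placeholders.
survey = pd.DataFrame({
    "trips_per_year": [1, 2, 4, 10, 0],
    "annual_spend": [3000, 4000, 2000, 5000, 0],
})

# Swap 0 trips for NaN so the division gives a missing value instead of inf.
trips = survey["trips_per_year"].replace(0, np.nan)

# New variable: spend divided by number of trips.
survey["spend_per_trip"] = survey["annual_spend"] / trips

print(survey)
```

People who report zero trips end up with a missing spend per trip, so they can be dropped or handled separately before any plotting or clustering.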
Now the relationship is not entirely unexpected: the more trips a person takes, the less they spend per trip. Not many people are millionaires who can permanently be globetrotting in 5* resorts. Not the people who read the Reddit travel area, at least. We are now going to use k-means clustering to allocate the groups. First up, let’s see how many groups there need to be
K-Means Clustering
Reviewing the silhouette plot, either 3 or 5 clusters looks appropriate. We are going to go with 3 clusters, as that is the highest value on the plot
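For anyone who wants to try the same kind of check, here is a minimal sketch of picking the number of clusters by average silhouette score with scikit-learn. It runs on synthetic stand-in data rather than the survey, and the scaling step, the range of k values and the variable names are all my assumptions, not necessarily what was used for the plot in this post.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the survey): three blobs of
# (trips per year, spend per trip).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([2, 3000], [0.5, 300], size=(50, 2)),
    rng.normal([2, 600], [0.5, 150], size=(50, 2)),
    rng.normal([10, 500], [1.5, 150], size=(50, 2)),
])

# Assumption: put both variables on the same scale so the much larger
# spend values do not dominate the distance calculation.
X_scaled = StandardScaler().fit_transform(X)

# Average silhouette score for each candidate number of clusters.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)

# The k with the highest average silhouette score.
best_k = max(scores, key=scores.get)

for k, score in scores.items():
    print(k, round(score, 3))
print("best k:", best_k)
```

The silhouette score runs from -1 to 1, and higher values mean points sit closer to their own cluster than to the next nearest one, which is why the peak of the plot is a sensible choice.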
Above is the previous plot, now coloured by the cluster each point belongs to. There are three clear groups:
- Cluster 1 :- Travel little but spend a lot
- Cluster 2 :- Travel little and cheaply
- Cluster 3 :- Travel a lot and cheaply
We can now use the three groups we identified to look for different trends. The first area to look at is the types of accommodation the groups prefer:
Cluster 1 is probably the cluster that the average person would belong to, and it seems to have a fairly even spread of accommodation types. Cluster 2 has the majority of people in either Airbnb or hotels. Finally, cluster 3 seems to have the highest proportion of people in hotels, which is surprising. I would have thought people who travel a lot would use hostels more.
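Comparisons like this come down to the share of each accommodation type within each cluster. Here is a minimal sketch using `pandas.crosstab`; the rows are invented purely to show the mechanics and do not reflect the survey results, and the column names are placeholders.

```python
import pandas as pd

# Invented rows for illustration only; NOT the survey results.
respondents = pd.DataFrame({
    "cluster": [1, 1, 1, 2, 2, 2, 3, 3],
    "accommodation": ["Hotel", "Hostel", "Airbnb", "Hotel",
                      "Airbnb", "Airbnb", "Hotel", "Hotel"],
})

# normalize="index" turns the counts into proportions within each cluster,
# so clusters of different sizes can be compared fairly.
shares = pd.crosstab(
    respondents["cluster"],
    respondents["accommodation"],
    normalize="index",
)

print(shares.round(2))
```

Each row sums to 1, so the table can go straight into a stacked bar chart. The same pattern should work for any other categorical question in the survey.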
The first surprising thing is that in 2 of the 3 groups, less than a quarter of people use social media while travelling. That increases to about a third for people in cluster 3. Then, when we look at what it’s used for, cluster 3 has by far the most people who want to use social media to earn money. These must be the travel vloggers.
That’s it for today’s review. I’m sure there is loads more you can do with this dataset, and I’m sure there are loads of other insights to find.