This week’s Tidy Tuesday looks at house price increases across US states. The data is linked below:
There are three sets of data:
- State HPI – the house price index (HPI) for each state, alongside the US average, for every month from January 1975 to November 2018.
- Mortgage – the interest rates for various types of mortgage since 1975.
- Recessions – the dates of all major recessions in the USA.
To start with, I’m going to focus on the state HPI data, as I think that dataset will give the best insights. So let’s read the data into the script and see what it’s made of:
We can see the data frame contains the year and the month, the two-letter state code, the price index for that state and the US average price index.
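As a rough equivalent of that first look, here is a minimal sketch that parses the data with Python’s csv module and reports the row count, the states and the date range. It assumes the columns are named year, month, state, price_index and us_avg (matching the description above, but the exact names are an assumption), and it runs on a few invented rows rather than the real file.

```python
import csv
import io

# Invented rows in the assumed column layout; these numbers are NOT real HPI values.
SAMPLE_CSV = """year,month,state,price_index,us_avg
1975,1,FL,10.0,10.0
1975,1,GA,10.0,10.0
2018,11,FL,40.0,38.0
2018,11,GA,30.0,38.0
"""


def load_state_hpi(text):
    """Parse CSV text into a list of dicts, converting the numeric fields."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rows.append({
            "year": int(rec["year"]),
            "month": int(rec["month"]),
            "state": rec["state"],
            "price_index": float(rec["price_index"]),
            "us_avg": float(rec["us_avg"]),
        })
    return rows


rows = load_state_hpi(SAMPLE_CSV)
states = sorted({r["state"] for r in rows})
first = min((r["year"], r["month"]) for r in rows)  # earliest (year, month)
last = max((r["year"], r["month"]) for r in rows)   # latest (year, month)
print(len(rows), states, first, last)  # → 4 ['FL', 'GA'] (1975, 1) (2018, 11)
```

With a local copy of the data, something like `load_state_hpi(open("state_hpi.csv").read())` would do the same for the real file; that filename is hypothetical.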
Above you can see a comparison of the two states’ price indices from 1975 to 2018. Florida generally stays above the US average and appears to increase at a faster rate than Georgia. Up to the year 2000, Georgia is level with the national average, but since then it has fallen behind.
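One way to put numbers on the Georgia observation is to average the gap between a state’s index and the US average before and after a chosen year. This is a minimal sketch, assuming each row carries year, state, price_index and us_avg fields (names assumed); `gap_by_period` and its `split_year` parameter are hypothetical, and the rows are invented rather than taken from the data.

```python
from statistics import mean

# Invented rows for illustration only; not real Georgia figures.
rows = [
    {"year": 1990, "month": 1, "state": "GA", "price_index": 50.0, "us_avg": 50.0},
    {"year": 1995, "month": 1, "state": "GA", "price_index": 60.0, "us_avg": 61.0},
    {"year": 2005, "month": 1, "state": "GA", "price_index": 80.0, "us_avg": 95.0},
    {"year": 2015, "month": 1, "state": "GA", "price_index": 90.0, "us_avg": 110.0},
]


def gap_by_period(rows, state, split_year):
    """Mean of (state index - US average) before split_year and from split_year on.

    Raises statistics.StatisticsError if either period has no rows for the state.
    """
    before = [r["price_index"] - r["us_avg"]
              for r in rows if r["state"] == state and r["year"] < split_year]
    after = [r["price_index"] - r["us_avg"]
             for r in rows if r["state"] == state and r["year"] >= split_year]
    return mean(before), mean(after)


print(gap_by_period(rows, "GA", 2000))  # → (-0.5, -17.5)
```

A mean gap near zero before the split and clearly negative after it is the numeric version of “level with the national average, then falling behind”.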
I’m going to look for differences between all the states using k-means clustering. For the k-means, I will use two metrics: each state’s average difference from the US average, and each state’s difference between its November 2018 value and its January 1975 value. First, let’s plot each state by those metrics and see whether anything stands out before running k-means.
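The two metrics can be computed per state in a single pass over the rows. A minimal sketch, assuming rows are dicts with year, month, state, price_index and us_avg fields (names assumed) and that every state has a reading for both January 1975 and November 2018; the toy values are made up.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows: (year, month, state, price_index, us_avg). Not real HPI data.
RAW = [
    (1975, 1, "FL", 10.0, 10.0), (2000, 6, "FL", 30.0, 25.0), (2018, 11, "FL", 50.0, 40.0),
    (1975, 1, "GA", 10.0, 10.0), (2000, 6, "GA", 24.0, 25.0), (2018, 11, "GA", 35.0, 40.0),
]
KEYS = ("year", "month", "state", "price_index", "us_avg")
rows = [dict(zip(KEYS, t)) for t in RAW]


def state_metrics(rows, start=(1975, 1), end=(2018, 11)):
    """Return {state: (mean_diff_to_us, growth)}.

    mean_diff_to_us: average of price_index - us_avg over every month.
    growth: price_index at `end` minus price_index at `start`.
    """
    diffs = defaultdict(list)
    index_at = {}
    for r in rows:
        diffs[r["state"]].append(r["price_index"] - r["us_avg"])
        index_at[(r["state"], r["year"], r["month"])] = r["price_index"]
    return {
        s: (mean(d), index_at[(s, *end)] - index_at[(s, *start)])
        for s, d in diffs.items()
    }


print(state_metrics(rows))  # → {'FL': (5.0, 40.0), 'GA': (-2.0, 25.0)}
```

Each state becomes one point in two dimensions, which is exactly the shape the scatter plot and the k-means step need.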
Reviewing the graph, it looks like there are two states with significantly higher growth than the others. There are possibly three groups: states below average, states above average and states significantly above average. Let’s run k-means and see if it agrees.
Yes, the silhouette plot indicates 3 groups. Let’s go with 3 clusters and see what we have:
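To make the clustering step concrete, here is a minimal pure-Python sketch of Lloyd’s k-means plus the average silhouette width for each candidate k, assuming that is what the silhouette plot summarises. Two choices here are mine rather than the post’s: a deterministic farthest-first seeding in place of random starts, and made-up points standing in for the two per-state metrics.

```python
import math


def dist(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from every centre chosen so far."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(dist(p, c) for c in centres)))
    return centres


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns one cluster label (0..k-1) per point."""
    centres = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centre.
        new_labels = [min(range(k), key=lambda j: dist(p, centres[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties out
                centres[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette width; points in singleton clusters score 0."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Made-up points standing in for (average difference, growth): three clear groups.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (20, 0), (20, 1), (21, 0), (21, 1)]

widths = {k: silhouette(points, kmeans(points, k)) for k in range(2, 6)}
best_k = max(widths, key=widths.get)
print(best_k)  # → 3
```

If one metric spans a much wider range than the other, it will dominate the Euclidean distance, so standardising both columns before clustering is a common choice; whether that was done here is not stated above.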
My initial hunch seems to be correct. The states split into those under the national average, those above it and those significantly above it. Hawaii and the District of Columbia are significantly higher than all the other states, and it would be interesting to look into why their increase is so much larger than everyone else’s.
Above you can see each state coloured by the cluster it belongs to. There may be a slight east-west split, with most of the western states belonging to cluster 1.
That’s it for today’s blog. Thanks for reading, and let me know if you see something you don’t agree with or that I could do better.