Hello, and welcome to today’s blog, my second covering a Tidy Tuesday dataset. This week’s dataset gives life expectancy for every country in the world since 1950. I decided to run a cluster analysis on it, because once you have the clusters you can analyse them further to understand trends. We are going to use K-means clustering to group the countries, then look for trends and differences between the clusters. The dataset has three columns: country, year (1950 to 2015) and the life expectancy in that year. I wanted at least two measures per country to cluster on, so I created one: the change in life expectancy per year. The other measure is the life expectancy in 2015.
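As a rough illustration of those two measures, here is a minimal Python sketch. It assumes the data arrives as (country, year, life expectancy) rows and that "change per year" means the average yearly change between a country's first and last observation; both are my assumptions rather than the exact calculation behind the plots. The function name `build_features` and the sample numbers are made up for the example.

```python
from collections import defaultdict

def build_features(rows):
    """rows: iterable of (country, year, life_expectancy) tuples."""
    by_country = defaultdict(dict)
    for country, year, life_exp in rows:
        by_country[country][year] = life_exp

    features = {}
    for country, series in by_country.items():
        first_year, last_year = min(series), max(series)
        if last_year == first_year:
            continue  # a single observation gives no rate of change
        # Assumption: average yearly change between first and last observation.
        change_per_year = (series[last_year] - series[first_year]) / (last_year - first_year)
        # Second measure: life expectancy in the latest year (2015 in the dataset).
        features[country] = (change_per_year, series[last_year])
    return features

# Made-up numbers purely for illustration, not real life expectancy data.
sample = [
    ("A", 1950, 40.0), ("A", 2015, 66.0),
    ("B", 1950, 65.0), ("B", 2015, 78.0),
]
features = build_features(sample)
print(features)
```

Each country ends up as one point with two coordinates, which is the shape K-means needs.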
To find our value of K, I produced the silhouette plot below. You’re meant to use the value of K with the highest average silhouette width, which in this case would be 3. However, with so many different countries I feel that would group the countries too coarsely. There are further spikes at 6 and 10.
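For anyone curious what sits behind a silhouette plot, here is a minimal sketch in plain Python. It is not the code that produced the plot: `kmeans` is a bare-bones Lloyd's algorithm with a single random start, `mean_silhouette` is the average silhouette width, and the points are made up. It also assumes the two measures have already been put on comparable scales. A single random start can land on a poor solution, so a real analysis would normally use several restarts.

```python
import math
import random

def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels

def mean_silhouette(points, labels):
    """Average silhouette width over all points (higher is better)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton clusters score 0 by convention
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items() if other_lab != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

# Made-up points in three loose groups, for illustration only.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (5.0, 5.0), (5.2, 5.1), (5.1, 4.8),
          (9.0, 0.0), (9.2, 0.3), (8.9, 0.1)]
for k in range(2, 6):
    print(k, round(mean_silhouette(points, kmeans(points, k)), 3))
```

Plotting the printed score against k gives the silhouette plot; the peaks are the candidate values of K.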
I decided to plot the clusters for both k = 6 and k = 10. The plot for 10 is below.
10 seems like a good value: there are not too many clusters to deal with, but there is still good variation between them. We will take k equal to 10 for further analysis.
The comparison above looks at causes of death; I have grouped the data to get the mean of each cause for each cluster. Conclusions that can be made:
- Cancer is prevalent across all clusters; however, the higher the life expectancy, the more prevalent it is. This could be because you’re more likely to get cancer at older ages.
- Dementia is another cause that seems to increase with higher life expectancy.
- HIV is highest in the two lowest life expectancy clusters, and the same is true of neonatal deaths.
- Finally, road accidents are an interesting cause. By far the highest cluster is cluster 7, which seems to be the cluster with the largest increase in life expectancy over the last 65 years. Could this be because these are fast-developing nations that do not yet have safe road infrastructure in place?
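The grouping behind that comparison can be sketched in a few lines of Python. This assumes each row is a (cluster, cause, value) record per country, where the value is something like a share of deaths; the unit, the function name `mean_by_cluster_and_cause` and the numbers are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import mean

def mean_by_cluster_and_cause(rows):
    """rows: iterable of (cluster, cause, value) tuples, one per country."""
    grouped = defaultdict(list)
    for cluster, cause, value in rows:
        grouped[(cluster, cause)].append(value)
    # One mean per (cluster, cause) pair.
    return {key: mean(values) for key, values in grouped.items()}

# Made-up values, not real cause-of-death figures.
rows = [
    (1, "cancer", 0.25), (1, "cancer", 0.15),
    (1, "hiv", 0.01), (1, "hiv", 0.03),
    (2, "cancer", 0.05), (2, "hiv", 0.20),
]
cluster_means = mean_by_cluster_and_cause(rows)
print(cluster_means)
```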
That’s it for a little intro to reviewing the data this way. Let me know your thoughts and comments. There are lots of datasets on the World Health Organization’s website, as well as other datasets such as economic growth, that I can add to this analysis to develop it further.
Hello, and welcome to the preview of Group F in the World Cup. Thanks for all the support so far on these previews. I would love to hear people’s thoughts and predictions on the competition. Today we will be looking at Group F, which contains Germany, Mexico, South Korea and Sweden.
The first thing to look at is the age distribution of all 4 teams. Germany seems to have one of the younger squads in the tournament, with a relatively small spread between the youngest and oldest players. Mexico’s players range from their 20s all the way up to nearly 40. South Korea and Sweden have the same median age, but South Korea has more players clustered around the median and the fewest players above 30.
Mexico has what looks to be the most experienced squad, with most players having around 50 caps and some having up to 150. Germany has a lot of players with relatively few caps but also shows the trend we have seen with other squads of having a group of players with a lot of caps. I wonder if these players are a similar age and could therefore be a golden generation. Sweden possibly has the most inexperienced squad in the group, with a lot of players on fewer than 50 caps.
On the face of it, Germany seems to have a small number of attackers in the squad. However, they have more midfielders, and a few of them are creative attacking midfielders, so I don’t think they will struggle for goals. South Korea has the same number of attackers as Germany but seems to have picked more defenders. This could leave them struggling to score goals.
Last but not least, we look at each team’s chances using the probabilities implied by the bookmakers’ odds. No surprise really: Germany are big favourites to get out of the group. However, second place looks to be a realistic target for the other three teams. It looks particularly close between Mexico and Sweden. They play each other in the last game of the group stage, so it could be a straight shoot-out for second place. Also, South Korea play Germany in the last game; if Germany have already qualified by then and make changes, that could give South Korea an outside chance.
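For anyone who wants to reproduce the implied probabilities, here is a minimal sketch. It assumes decimal odds, where the raw implied probability is simply 1 divided by the odds, and it rescales the results so they sum to a chosen total, which removes the bookmaker's margin. The function name `implied_probabilities`, the `total` parameter and the odds are all made up for the example. For a market where two teams go through, setting `total=2.0` would be the natural choice.

```python
def implied_probabilities(decimal_odds, total=1.0):
    """Turn decimal odds into probabilities that sum to `total`."""
    # Raw implied probability of each outcome is 1 / decimal odds.
    raw = {team: 1.0 / odds for team, odds in decimal_odds.items()}
    # The raw values sum to more than 1 because of the bookmaker's margin.
    overround = sum(raw.values())
    return {team: total * p / overround for team, p in raw.items()}

# Hypothetical odds for four placeholder teams, not real bookmaker prices.
odds = {"Team A": 1.5, "Team B": 4.0, "Team C": 6.0, "Team D": 12.0}
probs = implied_probabilities(odds)
for team, p in probs.items():
    print(team, round(p, 3))
```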
That’s it for today’s look at Group F. Please let me know your thoughts; I would love to start a good debate. Also, check out the other blogs in the series.
In today’s World Cup preview we are going to be looking at Group E. This group contains Brazil, Serbia, Switzerland and Costa Rica. As mentioned previously, this is all part of my series previewing the World Cup, so please go and check the others out and let me know your thoughts. So let’s first look at the age distributions of the 4 squads.
First of all, I think this is the biggest difference in ages across a group we have seen so far. Brazil and Costa Rica have medians in the late 20s, whereas Serbia and Switzerland both have medians around 25. Brazil seems to have a good cluster of players in the late-20s age bracket, which is peak age. Is everything aligning to make Brazil the strongest team in the competition? Costa Rica look to have a group of players around the same age, all in their late thirties. Serbia has a squad towards the younger end of the scale, with a lot of players between 20 and 25.
The age profiles of all four squads are reflected in the caps distribution. Both Brazil and Costa Rica have players with at least 30 caps. It would be interesting to investigate whether the number of caps a team has affects its chances of winning the World Cup. Serbia, having the youngest squad of the group, also has the most players with a low number of caps. Switzerland has a fairly even spread across the cap levels.
All four teams seem to have similar squad compositions. The only slight difference is that Brazil has fewer midfielders and more attackers. Costa Rica looks relatively thin on the ground when it comes to attackers.
Let’s now take a look at how each team’s chances in the tournament compare. This has been done by working out the chances implied by the bookies’ odds. What’s not surprising is that Brazil has an excellent chance of getting through the group, seeing as they are the favourites for the whole competition. What’s good to see is that it looks to be a close contest for second place in the group, with Serbia and Switzerland each having around a 50/50 chance. It will be good to see how close it is when the games are played.
That’s it for today’s preview. Please check the others out and let me know your thoughts. I’ll be back with the next one tomorrow.
Hi there, and welcome to the next in my series of little previews ahead of the FIFA World Cup. Today we are dissecting the 4 teams in Group C: France, Peru, Denmark and Australia. Please do check out the other previews; further previews are coming at 6 pm every day ahead of the first game.
On the face of it, these look to be some of the youngest squads in the tournament. Australia seems to have players from both ends of the spectrum and a good grouping around peak age. France has probably the lowest median age of all the squads in the competition. Peru doesn’t have too many players between 20 and 25 but has a good grouping between 25 and 28.
Looking at the distribution of caps in each squad, it looks like all four teams have relatively inexperienced players. Denmark has the most players with around 25 caps. They also show the familiar trend of a spike higher up, showing a good number of experienced pros, which is vital in any squad make-up. Peru seems to have the most players with experience in their squad, which could stand them in good stead to get out of the group. The big question for France is whether their lack of experience will affect them later in the competition.
Finally, looking at squad composition, France and Australia seem to have the most attackers. France has done this by bringing fewer midfielders, Australia by bringing fewer defenders. Peru seems to have gone in a totally different direction to the rest of the group, with a squad overloaded with midfielders. Most are attacking midfielders, so they should still have goalscoring options.
Now we look at each team’s probability of getting out of the group and of winning the tournament. Finally, we have a group that, on the face of it, could be quite competitive, for second place at least. Denmark is the clear favourite for that spot, but both Peru and Australia seem to have good outside chances, at least according to the bookies. France has a decent chance of winning the whole tournament and is currently 4th favourite, so it will be interesting to see how they do with their young squad.
That’s it for today’s overview. Let me know your thoughts: how far do you think France will go, and who do you think will get out of the group?
Hello, and welcome to today’s blog. We are going to look at whether Home Secretary is the poisoned chalice it is made out to be in the media. Recently Amber Rudd was forced to resign from the job after being found to have misled Parliament. Many political commentators called it the hardest job in government and pointed to the apparently high turnover of occupants. I thought that, rather than take their word for it, this could be tested with readily available data. I created my own dataset covering the last 100 years or so, listing the holders of the 4 Great Offices of State and the number of days each served in the role. I didn’t include anyone who died in the job, as that has nothing to do with the difficulty of the job.
The first thing to look at is the number of holders of the 4 Great Offices of State since 1916. The “safest” job looks to be Prime Minister. I think this is because the Prime Minister is responsible for hiring and firing the holders of the other three jobs, and Prime Ministers will possibly often push incumbents out of those jobs in order to protect themselves. Returning to the main question we asked at the start of this blog, Home Secretary has had the most incumbents in the last 100 years, suggesting a higher turnover than in the other jobs. However, Chancellor and Foreign Secretary are not too far behind.
The plot above shows the distribution of days in office, with the mean plotted as a black dot. The Prime Minister has the highest mean number of days in office, but as you can see from the general spread, it is broadly similar to the other three jobs; the mean has been dragged up by the two outliers (Thatcher and Blair). The other three jobs have very similar means, although Home Secretary does have the lowest. Its general distribution, though, is similar to those of Foreign Secretary and Chancellor, so the small sample size could be affecting the result. Looking at this, I definitely don’t think it’s as clear-cut as the press make out.
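The days-in-office figures are easy to rebuild from start and end dates. Here is a minimal sketch using only four terms, so the means it prints are illustrative and not the ones in the plot. The layout of `terms` and the variable names are my own assumptions, not the structure of the dataset used above.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# (office, holder, start, end): a handful of terms for illustration only.
terms = [
    ("Prime Minister", "Margaret Thatcher", date(1979, 5, 4), date(1990, 11, 28)),
    ("Prime Minister", "Tony Blair", date(1997, 5, 2), date(2007, 6, 27)),
    ("Home Secretary", "Theresa May", date(2010, 5, 12), date(2016, 7, 13)),
    ("Home Secretary", "Amber Rudd", date(2016, 7, 13), date(2018, 4, 29)),
]

days_by_office = defaultdict(list)
for office, holder, start, end in terms:
    # Subtracting two dates gives a timedelta; .days is the length of the term.
    days_by_office[office].append((end - start).days)

holders = {office: len(days) for office, days in days_by_office.items()}
mean_days = {office: mean(days) for office, days in days_by_office.items()}
print(holders)
print(mean_days)
```

With only a couple of holders per office the means swing wildly, which is the same small-sample caveat that applies to the real comparison.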
Finally, we look at the general trend over the last 100 years for each of the 4 Great Offices of State. Overall, you can see that Prime Ministers’ and Chancellors’ times in office are generally increasing, possibly because in the last 20 years there have been two Prime Ministers who aligned themselves closely with their Chancellors. Foreign and Home Secretaries’ tenures, however, have not changed and have stayed around the same level over the last 100 years.
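One simple way to put a number on a trend like this is the slope of a straight line fitted through days in office against the year each term started. The sketch below uses ordinary least squares, which is my assumption and not necessarily the smoother used in the plot, and the data points are made up.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys against xs (change in y per unit of x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Made-up (start year, days in office) pairs for one office.
start_years = [1920, 1950, 1980, 2010]
days_served = [700, 900, 1100, 1300]
slope = ols_slope(start_years, days_served)
print(round(slope, 2), "extra days in office per year")
```

A slope near zero would match the flat picture for Foreign and Home Secretaries, while a clearly positive slope would match the rising one for Prime Minister and Chancellor.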
In conclusion, I don’t think it’s clear that Home Secretary is the worst job in government; however, Home Secretaries do seem to spend a generally shorter time in position than holders of the other 3 Great Offices of State. What’s surprising is that Foreign Secretary is pretty similar to Home Secretary, when it has a much smaller area to cover and a lot less that can go wrong. Maybe it’s easier to move the Foreign Secretary around in a reshuffle. Thanks for reading this blog. If you enjoyed it and want to see more, please let me know and give the blog a follow so you can see when I post a new one.