Hello, and welcome to today’s blog, my second covering the Tidy Tuesday dataset. This week’s dataset contains life expectancy for every country in the world since 1950. I decided to run a cluster analysis on it and then analyse the clusters further to understand trends. We are going to use K-means clustering to group the countries, then look for trends and differences between the clusters. The dataset has the country, the year (between 1950 and 2015) and the life expectancy for that year. I wanted at least two measures to cluster on, so I created one for the change in life expectancy per year. The other measure is the life expectancy in 2015.
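As a rough sketch of how those two measures could be built (this is not the code behind the plots; the function name, the row layout and the choice to define yearly change as the simple average between 1950 and 2015 are all my assumptions), in Python:

```python
from collections import defaultdict


def build_features(rows, first_year=1950, last_year=2015):
    """Turn (country, year, life_expectancy) rows into two measures per country.

    Returns {country: (average_change_per_year, life_expectancy_in_last_year)}.
    Assumption: "change per year" is the endpoint difference divided by the
    number of years, which may differ from the definition used for the plots.
    """
    by_country = defaultdict(dict)
    for country, year, life_exp in rows:
        by_country[country][year] = life_exp

    features = {}
    for country, series in by_country.items():
        # Skip countries that are missing either endpoint.
        if first_year not in series or last_year not in series:
            continue
        change = (series[last_year] - series[first_year]) / (last_year - first_year)
        features[country] = (change, series[last_year])
    return features
```

The two measures sit on very different scales (fractions of a year versus tens of years), so it is usually worth standardising them before running K-means, which relies on distances.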
To find our value of K, I produced the silhouette plot below. You are meant to use the value of K with the highest average silhouette width, which in this case would be 3. However, with so many different countries I feel that would group the countries together too coarsely. There are further spikes at 6 and 10.
I decided to produce the plot below for k values of both 6 and 10. The plot for 10 is below.
A k of 10 seems like a good value, as there are not too many clusters to deal with but there is still good variation between the clusters. We will take k equal to 10 for further analysis.
The comparison above looks at causes of death; I have grouped the data to get the mean for each cause in each cluster. Conclusions that can be made:
- Cancer is prevalent across all clusters; however, the higher the life expectancy, the more prevalent it is. This could be because you’re more likely to get cancer at older ages.
- Dementia is another cause that seems to increase with higher life expectancy.
- HIV is highest in the two lowest life expectancy clusters, and the same is true of neonatal deaths.
- Finally, road accidents are an interesting cause. By far the highest cluster is cluster 7, which seems to be the cluster with the largest increase in life expectancy over the last 65 years. Could this be because these are fast-developing nations that have not yet got safe road infrastructure in place?
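The grouping behind that comparison, a mean for each cause within each cluster, can be sketched as below. The record layout and the function name are hypothetical, and the numbers in the usage note are made up purely to show the shape of the output.

```python
from collections import defaultdict


def mean_by_cluster(records):
    """Average a value per (cluster, cause) pair.

    records: iterable of (cluster_id, cause, value) tuples, one per country
    and cause (hypothetical layout). Returns {cluster_id: {cause: mean}}.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for cluster, cause, value in records:
        sums[cluster][cause] += value
        counts[cluster][cause] += 1

    return {
        cluster: {cause: sums[cluster][cause] / counts[cluster][cause]
                  for cause in sums[cluster]}
        for cluster in sums
    }
```

For example, `mean_by_cluster([(1, "cancer", 10.0), (1, "cancer", 20.0)])` gives `{1: {"cancer": 15.0}}`.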
That’s it for a little intro to reviewing the data this way. Let me know your thoughts and comments. There are lots of datasets on the World Health Organisation’s website, as well as other datasets such as economic growth, that I can add to this analysis to develop it further.
Hello, and welcome to the second part of my mini-series using cluster analysis to categorise Formula 1 circuits. Please go and check out the first part; it outlines the basic data we are using to categorise the circuits and gives an overview of the method used for hierarchical clustering. Today we are going to go with K-means clustering.
For K-means clustering we have to set our own value for K. We are going to do that with two different types of analysis: an elbow plot and silhouette analysis.
The code below is what was used in order to generate the elbow plot. The elbow plot generated is below:
Reviewing the elbow plot, it looks like we are already seeing a slightly different number of clusters than we got when we conducted hierarchical clustering. The elbow of the plot looks to be at 3, but you could also argue for 4 as the value for k.
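For anyone who wants to reproduce the idea behind an elbow plot, here is a minimal pure-Python sketch. It is not the code used for the plot above: it is a plain Lloyd's-algorithm K-means with random starting points, and every function name in it is my own. For each k it records the within-cluster sum of squares, and the "elbow" is where that curve stops dropping sharply.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100, seed=0):
    """Basic Lloyd's algorithm on a list of tuples. Returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct data points as starting centroids
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster where it is
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


def within_cluster_ss(points, labels, centroids):
    """Total squared distance from each point to its cluster centroid."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))


def elbow_curve(points, k_values, restarts=10):
    """Best (lowest) within-cluster sum of squares found for each k."""
    return {
        k: min(within_cluster_ss(points, *kmeans(points, k, seed=s))
               for s in range(restarts))
        for k in k_values
    }
```

Plotting the returned values against k gives the elbow plot. The restarts are there because a single random start can settle in a poor local optimum.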
The other way to decide on a k value when conducting K-means clustering is to produce a silhouette graph. This takes every point in the analysis and rates how well it fits its own cluster compared with the nearest other cluster, with -1 meaning it doesn’t fit at all and 1 meaning it fits well. You then plot the average silhouette width for each value of k, and the highest point gives the value of k. I have put a picture of the code below, along with the silhouette graph produced.
Fascinatingly, there are two high points: one for a k of 9 and another for a k of 3. I am going to choose a k of 3, as this is closely aligned with what we saw in the elbow plot and 9 clusters are just too many to deal with.
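To make the silhouette width a bit more concrete, here is a small sketch of how it can be calculated from a set of points and their cluster labels. It uses the standard definition s = (b - a) / max(a, b), where a is a point's average distance to the rest of its own cluster and b is its average distance to the nearest other cluster. The function name is my own, and this is not the code pictured in the post.

```python
import math


def mean_silhouette_width(points, labels):
    """Average silhouette width for a labelled set of points (tuples)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    widths = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            widths.append(0.0)  # usual convention for a cluster of one
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's distance to itself is 0, hence the len - 1).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the closest other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / len(widths)
```

Running this once per candidate k, on the labels K-means produces for that k, gives the values that go on the silhouette graph.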
The above graph shows all the circuits on the calendar and where they sit for average straight length and average speed, coloured by the cluster they have been put in. I am a bit unsatisfied with this. I feel it doesn’t quite fit the different circuits on the calendar. For instance, Singapore is different to China and Germany. Therefore K-means is not going to be the clustering I use in the final blog to look at pace trends across the season. Look out for the final blog, in which we will look at the pace across all circuits so far for all the teams, along with some other metrics like overtakes and pit stops.
Hello, and welcome to the preview of Group F in the World Cup. Thanks for all the support so far on these previews. I would love to hear people’s thoughts and predictions on the competition. Today we will be looking at Group F, which contains Germany, Mexico, South Korea and Sweden.
The first thing to look at is the age distribution of all 4 teams. Germany seems to have one of the younger squads in the tournament, with a relatively small spread between the youngest and oldest players. Mexico has players ranging from their 20s all the way up to nearly 40. South Korea and Sweden have the same median age, but South Korea has more players clustered around the median and has the fewest players above 30.
Mexico has what looks to be the most experienced squad, with most players having around 50 caps and some having up to 150. Germany has a lot of players with a relatively low number of caps, but also shows the trend we have seen with other squads of having a group of players with a lot of caps. I wonder if these players are a similar age and could therefore be a golden generation. Sweden possibly has the most inexperienced squad in the group, with a lot of players on fewer than 50 caps.
On the face of it, Germany seems to have a small number of attackers in the squad. However, they have more midfielders, and a few of them are creative attacking midfielders, so I don’t think they will struggle for goals. South Korea has the same number of attackers as Germany but seems to have picked more defenders. This could leave them struggling to score goals.
Last but not least, we look at each team’s chances using the probability implied by the betting odds. It is no real surprise that Germany are big favourites to get out of the group. However, second place looks to be a realistic target for the other three teams. It looks particularly close between Mexico and Sweden. They play each other in the last game of the group stage, so it could be a straight shoot-out for second place. Also, South Korea play Germany, who may have already qualified and may therefore make changes, which could give South Korea an outside chance if it goes to the last game.
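For anyone unfamiliar with implied probability, the usual conversion is sketched below. I am assuming decimal odds and a market with exactly one winner (for example winning the tournament outright), and I rescale so the probabilities sum to 1, which removes the bookmaker's margin in the simplest proportional way. The function name and the odds in the usage note are made up for illustration and are not the figures behind the charts.

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds for a single-winner market into probabilities.

    decimal_odds: {outcome: decimal odds}, each greater than 1.0.
    The raw implied probability is 1 / odds. Raw values usually sum to more
    than 1 because of the bookmaker's margin, so they are rescaled to sum to 1.
    """
    for outcome, odds in decimal_odds.items():
        if odds <= 1.0:
            raise ValueError(f"decimal odds must be greater than 1.0: {outcome}")
    raw = {outcome: 1.0 / odds for outcome, odds in decimal_odds.items()}
    total = sum(raw.values())
    return {outcome: p / total for outcome, p in raw.items()}
```

With hypothetical odds of 1.6, 4.0 and 4.0, the raw values are 0.625, 0.25 and 0.25, which sum to 1.125, so the favourite's rescaled chance is 0.625 / 1.125, or roughly 56%. A "qualify from the group" market has two qualifiers, so there the probabilities should sum to 2 and this simple rescaling would need adjusting.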
That’s it for today’s look at Group F. Please let me know your thoughts; I would love to start a good debate. Also, check out the other blogs in the series.
Group D is next on the agenda in this series previewing the World Cup. This is part of a series looking at all the groups in the World Cup, so please take a look at the others and let me know your thoughts. Group D contains Argentina, Croatia, Iceland and Nigeria. So let’s take a look at the age make-up of the 4 squads.
Argentina has one of the oldest medians we have seen so far, at about 30. Does this mean it’s this squad’s last opportunity to win the World Cup? Lionel Messi will not be around forever and this is probably his last chance. Both Croatia and Iceland have similar medians, around where we have seen most medians so far in this preview series. Nigeria has a relatively young median age; interestingly, however, they have more players over 30 than Croatia and Iceland.
Nigeria’s relatively young squad seems to go hand in hand with it having the lowest number of caps. Croatia seems to have a relatively experienced squad, with most players having more than 25 caps; this should stand them in good stead in the tournament if experience is a key attribute of any good squad. Argentina’s caps seem to be evenly distributed across the whole range, and they also seem to have the most players above 100 caps.
Next, we review squad composition for the 4 teams in Group D. The teams in this group have varying numbers of players in each department. What’s surprising is that Argentina seem to have the fewest attackers; however, the attackers they do have are all world class and it’s going to be difficult to fit them all in the team. Croatia seems to have the most defenders, which could mean they are strong defensively. Iceland and Nigeria have a similar make-up in their squads, with only a slight difference in attackers and defenders.
Finally, we look at the chances of each team in the tournament based on the probability implied by the betting odds. As you can see, this group is expected to be pretty easy for Croatia and Argentina. Iceland and Nigeria look to be quite evenly matched but are not expected to have any impact on the group. Looking at the chances of winning the competition, Argentina are unsurprisingly one of the big favourites. However, Croatia are also seen as having a good outside chance, so it will be interesting to see how they do in the competition.
That’s it for today’s Group D overview. Please let me know your thoughts in the comments below and check out the other previews.
Today we are going to look at Group B in the World Cup. This is part of my series reviewing each squad in the World Cup in order to assess strengths and weaknesses and understand squad make-up. If you haven’t seen the other blogs, go and check them out; Group A went live yesterday and the other groups will follow over the coming days. Group B consists of Spain, Portugal, Iran and Morocco.
The first thing to look at is the age composition of the 4 squads. Interestingly, Iran seem to have the youngest squad in the group overall, with the lowest median. Spain seem to have the largest grouping around peak age, between 27 and 30. Morocco, despite having the highest median, have the youngest players in the group. Portugal have some young players but also some of the older players, with a lot of squad members above 30.
Looking at the experience of the players, Morocco looks clearly the least experienced squad. This could be because of the high number of younger players compared to the other teams. Spain and Portugal have similar caps profiles, with a group of inexperienced players complemented by a few experienced players.
The main thing that’s interesting about the squad composition of the 4 teams is that Portugal and Spain have the same composition. Is this a template that the bigger countries seem to be following? There is also an increase in attacking players in these 4 squads compared to Group A, which should mean they are all better balanced. In fact Morocco, Portugal and Spain have the same number of attackers. Iran have more midfielders than any other team, which could give them more options; however, a lack of attackers could harm them if they are chasing a game.
Finally, we look at each team’s chances of qualifying from the group and of winning the World Cup. Qualification from the group looks pretty much like a done deal. Portugal and Spain look to have by far the strongest chances of qualifying. This could make the group not too interesting for spectators. Portugal and Spain do play each other in the first game, which, if there is a loser, could add extra pressure when that team comes to play Morocco or Iran. I’m surprised by how low Portugal’s chance of winning the title is compared to Spain’s. Portugal are the reigning European champions and have the mercurial talent of Cristiano Ronaldo. Spain, however, look to be one of the big favourites, so it will be interesting to see how they do after the total failure of the last World Cup.
That’s it for the Group B overview. If you have any questions or comments, or any ideas for other things I should look at, let me know.
Hello, and welcome to the first of my blogs looking at each group in the World Cup. Over the next 8 blogs I hope to dissect each country’s squad and finally look at their chances of progressing and winning the cup. So today we start with Group A, which contains hosts Russia, Uruguay, Saudi Arabia and Egypt.
The first thing to look at is the age range of each squad in Group A. All 4 teams have a median around the same area. As you can see, Egypt have a 45-year-old player, one of their goalkeepers, who is the oldest player in any squad in the tournament. Uruguay and Egypt tend to have some younger players than Russia and Saudi Arabia. Saudi Arabia look to have one of the older squads in the tournament overall.
Next we look at the caps all the players in each squad have received. It’s clear Russia has the most players with very little international experience. Will they struggle to cope with the pressure of playing in front of a home crowd? Saudi Arabia, despite having the oldest squad of the 4 teams, seem to have the least experienced team overall. Uruguay, however, seem to have a good balance, with experience at all different levels.
Looking at each team’s squad composition, all clearly have the same percentage of goalkeepers, as 3 goalkeepers are stipulated in the rules. One thing that’s clear is that Egypt, Russia and Saudi Arabia have a low number of strikers in their squads. All three have just 3 recognised strikers; will this leave them all struggling to score goals? Egypt also seem to have the fewest midfielders in the group, with an increase in defenders. This will give Egypt lots of options in defence in case of injuries, but could leave them exposed if they need to make changes from the bench to try to win games.
Finally, we use the implied probability from the betting odds to look at each team’s chance of getting out of the group and winning the tournament. Overall, Group A seems to have no teams really capable of mounting a serious challenge for the World Cup, with Russia, even with home advantage, rated lower than Uruguay. It also doesn’t seem to be a particularly close group for qualification; the clear favourites are Uruguay and Russia. The wild card is Egypt: if Mo Salah is fit for the tournament, expect their chances compared to Russia’s to increase considerably. If he isn’t, expect this to be a pretty straightforward group.
That’s it for today’s look at Group A. Please let me know your thoughts in the comments: do you think any teams in this group can go far? Group B will follow tomorrow.
Hello, and welcome to the next post on this blog. If this is your first time here, please have a read of the other blogs and let me know your thoughts, anything I haven’t spotted or things you want looked at. Today we are going to look at the performances of the Middlesbrough first team throughout the season.
The data used for this is the rating each player gets on whoscored.com. Overall it was a season to forget for Middlesbrough. Ahead of the season the chairman had promised they would smash the league, and the £40 million spent in the transfer market seemed to suggest that could be possible. However, they ended up 25 points behind winners Wolves and were easily knocked out by Aston Villa in the play-off semi-final. The idea here is to look at performances over the full season and see whether the change of manager had an effect: did performances improve? Also, which areas of the team generally performed well and which didn’t? That might provide insight into where the team could be improved in the transfer market.
Above you can see box plots for each league game of the season, including the play-offs. Generally the team played better in the wins than in the defeats. How’s that for an earth-shattering conclusion! What is interesting is that the team had two managers during the season, and it does look like performances were more consistent under Tony Pulis. Let’s look at this in more detail…
Now let’s look at performances under both managers. The density plot above shows there really wasn’t much difference. The players generally performed at the same level under both managers; however, Pulis seemed to be able to get more performances rated above 8.
25 different players started a game for the team this season, with one clear outstanding performer: Adama Traore. However, Traore also has the largest spread of performances, showing he can be inconsistent. It also seems that attacking players are generally more consistent performers. What will be disappointing is that Ben Gibson seems to be the worst-performing defender in the team overall. In midfield it looks pretty close between Adam Clayton and Jonny Howson for the best median performance; however, Clayton looks to be much more consistent.
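If you want to put numbers on "best median" and "most consistent", one simple option is the median rating plus the interquartile range (the height of the box in a box plot) for each player. This is only a sketch: the function name is my own and the ratings in the usage note are invented placeholders, not real whoscored.com data.

```python
from statistics import median, quantiles


def rating_summary(ratings_by_player):
    """Median and interquartile range of match ratings for each player.

    ratings_by_player: {player: [rating, ...]}. Players with fewer than two
    ratings are skipped because a spread cannot be estimated for them.
    """
    summary = {}
    for player, ratings in ratings_by_player.items():
        if len(ratings) < 2:
            continue
        q1, _, q3 = quantiles(ratings, n=4)  # the three quartile cut points
        summary[player] = {"median": median(ratings), "iqr": q3 - q1}
    return summary
```

For two made-up players with ratings `[6.0, 9.0, 7.0, 8.5, 5.5]` and `[6.9, 7.0, 7.1, 7.0, 6.8]`, both medians are 7.0 but the second player's interquartile range is far smaller, which is the pattern described above for a consistent performer.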
Overall it’s interesting to review the players’ performances over the season. It could be interesting to stretch this further to look at other teams or at previous seasons for specific players. It could also be drilled down into home and away performances. Let me know your thoughts or if you have any questions; I would really like to hear from you.