Hello, and welcome to the first of my blogs looking at each group in the World Cup. Over the next 8 blogs I hope to dissect each country’s squad and finally look at their chances of progressing and winning the cup. Today we start with Group A, which contains hosts Russia, Uruguay, Saudi Arabia and Egypt.
The first thing to look at is the age range of each squad in Group A. All 4 teams have a median age in roughly the same area. As you can see, Egypt have a 45-year-old player, one of their goalkeepers, who is the oldest player in any squad at the tournament. Uruguay and Egypt tend to have more younger players than Russia and Saudi Arabia, and Saudi Arabia look to have one of the older squads in the tournament overall.
Next we look at the number of caps each player in the squad has received. Russia clearly have the most players with little international experience. Will they struggle to cope with the pressure of playing in front of a home crowd? Saudi Arabia, despite having the oldest squad of the 4 teams, seem to have the least experienced team overall. Uruguay, however, seem to have a good balance, with experience at every level.
Looking at each team’s squad composition, all have the same percentage of goalkeepers, as the rules stipulate 3 goalkeepers per squad. One thing that’s clear is that Egypt, Russia and Saudi Arabia have few strikers in their squads. All three have just 3 recognised strikers; will this leave them struggling to score goals? Egypt also seem to have the fewest midfielders in the group, with more defenders instead. This will give Egypt lots of options in defence in case of injuries, but could leave them short of attacking options if they need to make changes from the bench to try and win games.
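Here is a minimal sketch of how the composition percentages can be worked out: count each position label in a squad list and divide by the squad size. The position labels and the 3/9/8/3 split below are a hypothetical example, not any real team’s list. Because a 23-player World Cup squad must include 3 goalkeepers, the goalkeeper share comes out at 3/23, about 13%, for every team.

```python
from collections import Counter

def position_shares(positions):
    """Return each position's share of the squad as a percentage."""
    counts = Counter(positions)
    total = len(positions)
    return {pos: 100 * n / total for pos, n in counts.items()}

# Hypothetical 23-player squad: 3 GK, 9 DF, 8 MF, 3 FW (not a real team list)
squad = ["GK"] * 3 + ["DF"] * 9 + ["MF"] * 8 + ["FW"] * 3
shares = position_shares(squad)
for pos in ("GK", "DF", "MF", "FW"):
    print(f"{pos}: {shares[pos]:.1f}%")
```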
Finally, we use the implied probability from the betting odds to look at each team’s chance of getting out of the group and winning the tournament. Overall, Group A seems to have no team really capable of mounting a serious challenge for the World Cup, and Russia, despite home advantage, are rated lower than Uruguay. It also doesn’t seem to be a particularly close group for qualification: the clear favourites are Uruguay and Russia. The wild card is Egypt. If Mo Salah is fit for the tournament, expect their chances relative to Russia to increase considerably. If he isn’t, this could be a pretty straightforward group.
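The blog doesn’t say exactly how the odds were converted. This is a minimal sketch of one common approach, assuming decimal odds: take 1/odds for each team, then rescale proportionally to strip out the bookmaker’s margin. Proportional rescaling is only one of several ways to remove the margin. The team names and odds below are made up, and `implied_probabilities` and its `places` parameter are hypothetical names for illustration.

```python
def implied_probabilities(decimal_odds, places=1):
    """Convert decimal odds to implied probabilities.

    The raw values 1/odds usually add up to more than the number of
    places on offer because of the bookmaker's margin (the overround).
    Rescaling removes that margin so the probabilities sum to `places`:
    1 for "to win the group", 2 for "to qualify" from a four-team group.
    """
    raw = {team: 1 / odds for team, odds in decimal_odds.items()}
    total = sum(raw.values())
    overround = total / places
    probs = {team: places * p / total for team, p in raw.items()}
    return probs, overround

# Made-up "to win the group" odds, purely for illustration
odds = {"Team A": 2.0, "Team B": 2.5, "Team C": 5.0, "Team D": 10.0}
probs, overround = implied_probabilities(odds)
print(f"overround: {overround:.2f}")
for team, p in probs.items():
    print(f"{team}: {p:.1%}")
```

For a "to qualify" market, pass `places=2`, since two teams go through from each group.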
That’s it for today’s look at Group A. Do you think any team in this group can go far? Let me know your thoughts in the comments. Group B will follow tomorrow.
Hello, and welcome to the latest post on this blog. If this is your first time here, please have a read of the other posts and let me know your thoughts, whether that’s anything I haven’t spotted or anything you want looked at. Today we are going to look at the performances of the Middlesbrough first team throughout the season.
For the data, I have used the rating each player gets on whoscored.com. Overall it was a season to forget for Middlesbrough. Ahead of the season the chairman had promised they would smash the league, and £40 million spent in the transfer market seemed to suggest that could be possible. However, they ended up 25 points behind winners Wolves and were knocked out by Aston Villa in the play-off semi-final. The idea here is to look at performances over the full season and see whether the change of manager had an effect: did performances improve? Also, which areas of the team generally performed well and which didn’t? That might provide insight into where the team could be improved in the transfer market.
Above you can see box plots of the player ratings for each league game of the season, including the play-offs. Generally, the team played better in wins than in defeats. How’s that for an earth-shattering conclusion! What is interesting is that the team had two managers during the season, and it does look like performances were more consistent under Tony Pulis. Let’s look at this in more detail.
Now let’s look at performances under both managers. The density plot above shows there really wasn’t much difference: the players generally performed at the same level under both managers, although Pulis seemed to get more ratings above 8 out of them.
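One way to put a number on the upper tail of the density plot is to compare the share of ratings above 8 under each manager. This is a minimal sketch. The ratings are invented, not real WhoScored data, and `share_above` is a hypothetical helper. The real input would be the match ratings split by who was in charge.

```python
def share_above(ratings, threshold=8.0):
    """Fraction of match ratings strictly above `threshold`."""
    return sum(r > threshold for r in ratings) / len(ratings)

# Invented ratings for illustration only (not real WhoScored data)
ratings_by_manager = {
    "first manager": [6.5, 7.1, 6.8, 8.2, 6.9, 7.4, 6.2, 7.0],
    "Pulis": [6.7, 7.0, 8.4, 6.9, 8.1, 7.2, 6.6, 7.3],
}
for manager, ratings in ratings_by_manager.items():
    print(f"{manager}: {share_above(ratings):.1%} of ratings above 8")
```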
25 different players started a game for the team this season, with one clear outstanding performer: Adama Traore. However, Traore also has the largest spread of performances, showing he can be inconsistent. It also seems that attacking players are generally more consistent performers. What will be disappointing is that Ben Gibson seems to be the worst-performing defender in the team overall. In midfield, it looks pretty close between Adam Clayton and Jonny Howson for the best median performance, but Clayton looks to be much more consistent.
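The "spread" in a box plot is the interquartile range (IQR), which is the width of the box. This is a minimal sketch of summarising each player by median and IQR using Python’s standard `statistics` module. The two players and their ratings are invented. On small samples, different quantile methods give slightly different quartiles, so these figures won’t match every plotting tool exactly.

```python
import statistics

def summarise(ratings):
    """Median and interquartile range (IQR) of a player's match ratings.

    A wider IQR means the player's performances vary more from game to game.
    """
    median = statistics.median(ratings)
    q1, _, q3 = statistics.quantiles(ratings, n=4, method="inclusive")
    return median, q3 - q1

# Invented ratings for two hypothetical players
players = {
    "Player A": [6.0, 9.1, 6.4, 8.8, 7.0, 9.5, 5.9],  # good days and bad days
    "Player B": [7.0, 7.2, 6.9, 7.1, 7.3, 7.0, 7.2],  # steady performer
}
for name, ratings in players.items():
    med, iqr = summarise(ratings)
    print(f"{name}: median {med:.2f}, IQR {iqr:.2f}")
```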
Overall, it’s interesting to review the players’ performances over the season. It could be interesting to extend this to other teams, or to previous seasons for specific players. It could also be drilled down further into home and away performances. Let me know your thoughts or if you have any questions; I would really like to hear from you.
Hello, and welcome to today’s blog. We are going to look at whether Home Secretary is the poisoned chalice it is made out to be in the media. Amber Rudd was recently forced to resign from the job after misleading Parliament. Following this, many political commentators described it as the hardest job in government and pointed to its apparently high turnover of occupants. Rather than take their word for it, I thought it could be tested with readily available data. I created my own data set covering the last 100 years or so, listing the holders of the 4 Great Offices of State and the number of days each served in the role. I didn’t include anyone who died in office, as that has nothing to do with the difficulty of the job.
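Here is a minimal sketch of how a data set like this can be turned into days in office. It assumes each row stores the office, the holder, a start date, an end date and a died-in-office flag. The holders and dates below are placeholders, not real office holders, and `days_in_office` is a hypothetical helper. Whether the last day counts as a day served is a convention; here the figure is simply the difference between the two dates.

```python
from datetime import date

# Placeholder records: (office, holder, start, end, died_in_office)
records = [
    ("Home Secretary", "Holder A", date(1990, 1, 1), date(1991, 6, 30), False),
    ("Home Secretary", "Holder B", date(1991, 7, 1), date(1995, 7, 1), False),
    ("Chancellor", "Holder C", date(1990, 1, 1), date(1993, 1, 1), False),
    ("Chancellor", "Holder D", date(1993, 1, 2), date(1993, 3, 1), True),
]

def days_in_office(rows):
    """Days served per holder, skipping anyone who died in office."""
    return [
        (office, holder, (end - start).days)
        for office, holder, start, end, died in rows
        if not died
    ]

for office, holder, days in days_in_office(records):
    print(f"{office}, {holder}: {days} days")
```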
The first thing to look at is the number of holders of each of the 4 Great Offices of State since 1916. The “safest” job clearly looks to be Prime Minister. I think this is because the Prime Minister is responsible for hiring and firing the holders of the other three jobs, and Prime Ministers may often push incumbents out of those jobs in order to protect themselves. Returning to the main question we asked at the start of this blog, Home Secretary has had the most incumbents in the last 100 years, suggesting a higher turnover than the other jobs. However, Chancellor and Foreign Secretary are not too far behind.
The plot above shows the distribution of days in office for each job, with the mean plotted as a black dot. The Prime Minister has the highest mean number of days in office, but the general spread is broadly similar to the other three jobs; the mean has been dragged up by two outliers (Thatcher and Blair). The other three jobs have very similar means, although Home Secretary does have the lowest. Its general distribution, though, is similar to that of Foreign Secretary and Chancellor, so the small sample size could be affecting the result. Looking at this, I definitely don’t think it’s as clear-cut as the press make out.
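This small sketch shows how a couple of very long tenures can drag the mean up while leaving the median roughly where it was. The tenure lengths are invented and are not the real Prime Ministers’ figures.

```python
import statistics

# Invented tenures in days: six fairly similar, plus two very long ones
tenures = [1200, 1500, 900, 1100, 1300, 1000, 4000, 3700]
trimmed = sorted(tenures)[:-2]  # drop the two longest tenures

print("mean, all:", statistics.mean(tenures))
print("mean, without the two longest:", round(statistics.mean(trimmed), 1))
print("median, all:", statistics.median(tenures))
```

Removing just two values moves the mean by hundreds of days. The median barely reacts to them, which is why it is a useful companion to the black dot on the plot.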
Finally, we look at the general trend over the last 100 years for each of the 4 Great Offices of State. Overall, you can see that the time Prime Ministers and Chancellors spend in office is generally increasing. This is possibly because in the last 20 years there have been two Prime Ministers who aligned themselves closely with their Chancellors. Foreign and Home Secretaries, however, have not changed, and their tenures have stayed at around the same level over the last 100 years.
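One simple way to put a number on "increasing" is the slope of a straight-line least-squares fit of tenure length against start year. This sketch assumes a linear trend, which may not be what the trend lines in the plot use. The years and day counts are invented. `ols_slope` is a hypothetical helper written out by hand so the arithmetic is visible.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (change in y per unit of x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Invented (start year, days in office) pairs for one office
start_years = [1920, 1940, 1960, 1980, 2000]
days = [900, 1000, 1100, 1400, 1600]
slope = ols_slope(start_years, days)
print(f"{slope:.1f} extra days in office per year of start date")
```

A slope near zero would match the flat pattern described for the Foreign and Home Secretaries.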
In conclusion, I don’t think it’s clear that Home Secretary is the worst job in government, although holders do seem to spend a shorter time in the post than in the other 3 Great Offices of State. What’s surprising is that Foreign Secretary is pretty similar to Home Secretary, even though it covers a much smaller area and a lot less can go wrong. Maybe it’s easier to move the Foreign Secretary around in a reshuffle. Thanks for reading this blog. If you enjoyed it and want to see more, please let me know and give the blog a follow so you can see when I post a new one.
Hello, and welcome to this blog. Today we are going to look at something we haven’t looked at yet: Formula 1. I have watched F1 since 1997, and whenever teams say they have reviewed the data, I have often wondered what exactly that data is and what process they use to review it. Sadly, I don’t have access to anything like the data F1 teams have (one day maybe!), but one main piece of data is freely available: the qualifying time. I wanted to have a look at the competitive picture, and now that we are 4 races in, that’s a decent sample size.
To do this analysis, I took each driver’s fastest lap from each of the 4 qualifying sessions so far and added them up to get each driver’s total qualifying time. The result is plotted in the graph below:
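The totalling step can be sketched as below, assuming the fastest laps are stored in seconds for each driver. The drivers and times are hypothetical. The blog doesn’t say how a driver who set no time in a session was handled; this sketch simply leaves such drivers out, since their total wouldn’t be comparable.

```python
# Hypothetical fastest qualifying laps in seconds, one per session
laps = {
    "Driver A": [81.2, 90.5, 88.1, 92.0],
    "Driver B": [81.4, 90.4, 88.3, 92.1],
    "Driver C": [82.0, 91.1, None, 92.9],  # no time set in session 3
}

def total_qualifying_time(laps, sessions=4):
    """Sum each driver's fastest laps, fastest total first.

    Drivers missing a time in any session are skipped (an assumption).
    """
    totals = {}
    for driver, times in laps.items():
        if len(times) == sessions and None not in times:
            totals[driver] = sum(times)
    return dict(sorted(totals.items(), key=lambda item: item[1]))

for driver, total in total_qualifying_time(laps).items():
    print(f"{driver}: {total:.3f} s")
```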
So after 4 races, Vettel has the lowest total qualifying time, closely followed by Hamilton. What is clear is the large gap between the top 6 drivers, from the top 3 teams, and the rest. Also, apart from the Ferrari and Mercedes drivers being mixed up, every other team’s two drivers are ranked next to each other. This is surprising considering the small gaps between teams in the midfield. The next question I had was about the differences between teammates, as in Formula 1 the main rival you have to beat is always your teammate.
The graph above shows the difference between each team’s drivers, with points at the top right showing a smaller difference than those at the bottom left. The team with clearly the most closely matched drivers is Red Bull, with 0.07 seconds between them. This is good news for Ricciardo in particular, who can use it to increase his value in his contract talks. At the other end, there is big pressure on Stoffel Vandoorne and Kimi Raikkonen. Both have been over a second behind their teammates in total, which, if it carries on, could see them lose their seats.
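Following on from the totals, the teammate gap is just the difference between the two drivers’ totals in each team. This is a minimal sketch, assuming exactly two drivers per team. The teams, drivers and times are hypothetical, and `teammate_gaps` is a name chosen for illustration.

```python
# Hypothetical total qualifying times in seconds, keyed by team
totals = {
    "Team X": {"Driver A": 351.80, "Driver B": 351.87},
    "Team Y": {"Driver C": 355.10, "Driver D": 356.35},
}

def teammate_gaps(totals):
    """(faster driver, slower driver, gap in seconds) for each team.

    Assumes exactly two drivers per team; the unpacking fails otherwise.
    """
    gaps = {}
    for team, drivers in totals.items():
        (fast, t_fast), (slow, t_slow) = sorted(drivers.items(), key=lambda kv: kv[1])
        gaps[team] = (fast, slow, t_slow - t_fast)
    return gaps

for team, (fast, slow, gap) in teammate_gaps(totals).items():
    print(f"{team}: {slow} is {gap:.2f} s behind {fast}")
```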
I’m going to keep this dataset up to date as the season goes on, and I have similar information for total race time. I think there is more you can derive from this, such as who is developing their car the best. Please let me know your thoughts or if you have any questions; I like to hear feedback.
Hello, this is going to be a shorter blog than normal, as I just felt I had to share some early findings. It was inspired by the R4DS online learning community’s recent Tidy Tuesday, in which we looked at a dataset of salaries for various positions in the NFL. Reviewing it showed that while salaries for some positions were increasing at a high rate, others were not growing at all.
I decided to look at wages in the English Premier League. I got the data from the same website, which had the wages for all players for every year from 2013 to the current year. I took that data and plotted the graph below, which shows the wages of the top 50 players in each position.
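Filtering to the top 50 earners per position and year, then summarising each group, can be sketched as below. Summarising with the median is my own choice for illustration; the graph may show the full spread instead. The rows are a tiny invented sample, not the real wage data. `n=2` is used only so the cut-off is visible, and `top_n_median` is a hypothetical helper.

```python
import heapq
from collections import defaultdict
from statistics import median

def top_n_median(rows, n=50):
    """Median wage of the n best-paid players for each (year, position)."""
    groups = defaultdict(list)
    for year, position, wage in rows:
        groups[(year, position)].append(wage)
    return {key: median(heapq.nlargest(n, wages)) for key, wages in groups.items()}

# Tiny invented sample of (year, position, wage); units are arbitrary
rows = [
    (2013, "FW", 100), (2013, "FW", 80), (2013, "FW", 20),
    (2014, "FW", 105), (2014, "FW", 85), (2014, "FW", 30),
]
print(top_n_median(rows, n=2))
```

Note that, by design, anything paid to players outside the top n never reaches this summary.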
Yet despite all the money now going into the league with the latest increases in TV deal money, wages across all positions seem to be staying at the same level. This shocked me, and I can only think of a couple of reasons why:
- The increase in TV money has been spent on things other than wages, such as transfer fees, or has gone to the clubs’ owners
- If higher wages have been paid, they have gone to the less skilled, lower-paid players, who fall outside the top 50 in each position
Stay tuned: I have a few ideas for how we can review this further and test whether either of these is true.
Hello, and welcome to the start of a new series on my blog. The idea is that this will be a long-running series, for at least the next year and a half, documenting my journey from rstats novice to fully fledged freelance data scientist. The background is that since finishing university 7 or 8 years ago, I have been stuck in the corporate working environment, with all the restrictions that entails. It is a very comfortable environment, and I recently received a nice pay rise. However, since learning of the existence of R 3 years ago, I have always felt it’s something interesting and wondered what the possibilities are.
Over the intervening years I made numerous attempts to learn it and always dropped out. This year, 2018, I decided I’m actually going to apply myself and see how far I can get. I currently work as an analyst for a utility company that has numerous renewable generation sites. I really enjoy it, but I think I can be challenged more. Discussing it with my significant other, we came up with the title “freelance before 30”. So this is my journey over the next 19 or so months.
Today we are going to look at my current progress with the DataCamp course I have been working on since the turn of the year. I had struggled to find a way into R, and I think learning is best done in more than one way. So I use DataCamp, I’m a member of the R4DS online learning community, and I read a lot on the internet.
I’m following the data science with R track on DataCamp, which has 23 modules in total. So far I have completed 12, and the aim is to finish by the end of June. The graph above shows my percentage scores for all 12 modules I have completed. I worked out each score as the experience points I actually gained, as a percentage of those available for the exercises. I’m happy with the high scores for data visualisation in ggplot, as that’s the output everyone sees. I can see, though, that I need to do more work on the background coding aspects, with Intermediate R Practice my lowest score. There also seems to be a bit of a split: 7 modules where I scored in the high 70s and above, and the other 5 with scores around the 60s. It would be interesting to see whether anything links the lower scores, as that is clearly something to work on if I am to get better.
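The scoring described above is a simple ratio. This is a minimal sketch, assuming the experience points (XP) gained and available are recorded per module. The module names and figures are invented. The 70% cut-off for flagging weaker modules is my own illustrative choice, loosely following the split between the high 70s and the 60s.

```python
# Invented XP figures per module: (XP gained, XP available)
modules = {
    "Module 1": (2800, 3200),
    "Module 2": (1950, 3250),
    "Module 3": (3000, 3500),
}

def percentage_scores(modules):
    """Score each module as XP gained divided by XP available, in percent."""
    return {name: 100 * gained / available for name, (gained, available) in modules.items()}

scores = percentage_scores(modules)
# Flag modules below an illustrative 70% cut-off for revision
needs_work = [name for name, score in scores.items() if score < 70]
print(scores)
print("needs work:", needs_work)
```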
Again, the summary by chapter shows my weakness in the background programming, in particular loops. So if anyone has any ideas how I can get better at that, please let me know. This series will be updated every few weeks with my progress; obviously it’s not just about finishing the DataCamp course, as that doesn’t make you a data scientist. As always, please let me know any comments or thoughts, or anything you want to see on the blog. Please follow so you can see when I post new blogs.