Middlesbrough Performance Review

Hello and welcome to the next post on this blog. If this is your first time here, please have a read of the other posts and let me know your thoughts, anything I haven’t spotted, or things you want looked at. Today we are going to look at the performances of the Middlesbrough first team throughout the season.

The data for this comes from the rating each player gets on whoscored.com. Overall it was a season to forget for Middlesbrough. Ahead of the season the chairman had promised they would smash the league, and £40 million spent in the transfer market seemed to suggest that could be possible. However, they ended up 25 points behind winners Wolves and were comfortably knocked out by Aston Villa in the play-off semi-final. The idea here is to look at performances over the full season and see whether the change of manager had an effect: did performances improve? I also want to see which areas of the team generally performed well and which didn’t, which might provide insight into where the team could be improved in the transfer market.

[Figure: box plots of player ratings for each league game of the season]

Above you can see box plots of the player ratings for each league game of the season, including the play-offs. Generally the team played better in the wins than in the defeats. How’s that for an earth-shattering conclusion! What is interesting is that the team had two managers during the season, and it does look like performances were more consistent under Tony Pulis. Let’s look at this in more detail.

[Figure: density plot of player ratings under each manager]

Now let’s look at performances under both managers. The density plot above shows there really wasn’t much difference. The players generally performed at the same level under both managers, although Pulis seemed to get more ratings above 8 out of them.


Twenty-five different players started a game for the team this season, with one clear outstanding performer: Adama Traore. However, Traore also has the largest spread of ratings, showing he can be inconsistent. It also seems that attacking players are generally more consistent performers. What will be disappointing is that Ben Gibson seems to be the worst performing defender in the team overall. In midfield it looks pretty close between Adam Clayton and Jonny Howson for the best median performance, although Clayton looks to be much more consistent.
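The comparison above rests on two numbers per player: the median rating (how good) and the spread of ratings (how consistent). Here is a minimal Python sketch of that summary. It is not the code behind the plots, and the player names and ratings are made-up placeholders rather than WhoScored data. I have used the interquartile range as the spread measure, which is an assumption on my part.

```python
from statistics import median, quantiles

# Illustrative ratings only (NOT real WhoScored data): player -> match ratings.
ratings = {
    "Player A": [6.1, 7.9, 8.8, 6.4, 9.2, 6.0],
    "Player B": [6.8, 6.9, 7.0, 7.1, 6.7, 7.0],
}


def summarise(values):
    """Return the median and interquartile range (IQR) of a list of ratings."""
    q1, _, q3 = quantiles(values, n=4)  # three cut points: Q1, Q2, Q3
    return {"median": median(values), "iqr": q3 - q1}


summary = {player: summarise(vals) for player, vals in ratings.items()}

# Best median first; a small IQR means a more consistent performer.
for player, stats in sorted(
    summary.items(), key=lambda kv: kv[1]["median"], reverse=True
):
    print(f"{player}: median {stats['median']:.2f}, IQR {stats['iqr']:.2f}")
```

A player like the hypothetical Player A here has the higher median but the wider IQR, which is the same pattern described for Traore.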

Overall it’s interesting to review the players’ performances over the season. It could be interesting to stretch this further to look at other teams, or at previous seasons for specific players. It could also be drilled down into home and away performances. Let me know your thoughts or if you have any questions, I would really like to hear from you.


#TidyTuesday 1 – The best City for a Starbucks Crawl!

Hello and welcome to this blog looking at what I have learned from this week’s Tidy Tuesday data set. If you need some background, Tidy Tuesday is a community initiative from the R4DS online learning community. If you want to get involved, please look it up on Twitter and join in. This week’s data set had shop locations for three coffee chains: Starbucks, Dunkin Donuts and Tim Hortons. I decided to focus on Starbucks as it is more worldwide rather than USA/Canada centric.

[Image: the code written for this analysis]

Above you can see all the code I wrote for this analysis. I also added a small data set I created with each city’s population and area in km².

[Figure: number of Starbucks stores by country]

The first thing I looked at was the number of Starbucks by country. As you can see, Starbucks has by far the most stores in the US. This isn’t too surprising since the chain started there and it’s a big country. What is surprising is that three of the top five countries are in Asia. Great Britain has the most Starbucks shops in Europe.


For the top twenty countries by number of Starbucks I also looked at how the ownership type broke down. What surprised me was the low amount of franchise ownership (only seen in France and the UK). Joint ownership also seems to be mainly employed in East Asia.

[Figure: top twenty cities by number of Starbucks stores]

Next I looked at cities and found the top twenty by number of Starbucks shops. Note that this excludes the Chinese and Korean cities, as their names came up as symbols in the data set and I couldn’t work out which cities they were. New York holds the record for the most Starbucks, followed by London. After this I did wonder how the size and population of each city affects the count.

[Figure: Starbucks stores per km² and population per store, by city]

Finally we look at the density of Starbucks and the number of people per shop in each city. If you don’t want to walk far between coffees, go to Vancouver. It averages over 1.25 shops per km squared. The shops should not be too busy either, as it has the second lowest population per cafe. If you want to do a Starbucks crawl, go to Vancouver! Vancouver also looks to be the outlier when it comes to coffee shops per km squared, with most cities at less than 0.5 cafes per km squared. Is this something Starbucks aim for so the market isn’t saturated? If you have any comments or thoughts please let me know, I would love to hear your views on this.
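The two measures used above are simple ratios: shops divided by city area, and city population divided by shops. A small Python sketch of the arithmetic is below. The city names and figures are placeholders for illustration, not values from the Tidy Tuesday data set or from my population table.

```python
def shops_per_km2(shops, area_km2):
    """Shop density: how many shops there are per square kilometre."""
    return shops / area_km2


def people_per_shop(population, shops):
    """How many residents each shop serves on average."""
    return population / shops


# Placeholder figures for illustration only (not from the real data set).
cities = [
    {"city": "City A", "shops": 150, "area_km2": 120.0, "population": 600_000},
    {"city": "City B", "shops": 200, "area_km2": 800.0, "population": 8_000_000},
]

for c in cities:
    c["density"] = shops_per_km2(c["shops"], c["area_km2"])
    c["people_per_shop"] = people_per_shop(c["population"], c["shops"])

# The best city for a crawl is the one with the most shops per km squared.
best_for_crawl = max(cities, key=lambda c: c["density"])
print(best_for_crawl["city"], best_for_crawl["density"])
```

Note that the city with the most shops in total (City B here) is not necessarily the densest, which is why the raw counts and the per-km² figures can tell different stories.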

Home Secretary: the Poisoned Chalice?

Hello and welcome to today’s blog. We are going to see if Home Secretary is the poisoned chalice it is made out to be in the media. Recently Amber Rudd was forced to resign from the job after misleading Parliament. Many political commentators described it as the hardest job in government and pointed to the apparently high turnover of occupants. I thought that rather than take their word for it, this could be tested with readily available data. I created my own data set covering the last 100 years or so, listing the incumbents of the four great offices of state and the number of days each served in the role. I didn’t include anyone who died in the job, as that has nothing to do with the difficulty of the job.

[Figure: number of holders of each of the four great offices of state since 1916]

The first thing to look at is the number of holders of the four great offices of state since 1916. The “safest” job clearly looks to be Prime Minister. I think this is because the Prime Minister is responsible for hiring and firing the other three, and Prime Ministers will often push incumbents out of those jobs in order to protect themselves. Coming back to the main question of this blog, Home Secretary has had the most incumbents in the last 100 years, suggesting there is a higher turnover than in the other jobs. However, Chancellor and Foreign Secretary are not too far behind.


The plot above shows the distribution of days in office, with the mean plotted as a black dot. The Prime Minister clearly has the highest mean number of days in office, but as you can see from the general spread it is broadly similar to the other three jobs and has been dragged up by two outliers (Thatcher and Blair). The other three jobs have very similar means, although Home Secretary does have the lowest. Its general distribution, though, is similar to Foreign Secretary and Chancellor, so it could be the small sample size that is affecting the result. Looking at this, I definitely don’t think it’s as clear-cut as the press make out.
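The two summaries used so far, the number of holders per office and the mean days served, can be worked out from a table of start and end dates. Below is a rough Python sketch of that calculation. The offices, holders and dates are hypothetical placeholders and not my actual data set, and the tuple layout is just an assumption for the example.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical rows: (office, holder, start, end). NOT the real data set.
terms = [
    ("Office A", "Holder 1", date(2000, 1, 1), date(2003, 1, 1)),
    ("Office A", "Holder 2", date(2003, 1, 1), date(2004, 1, 1)),
    ("Office B", "Holder 3", date(2000, 1, 1), date(2004, 1, 1)),
]

# Collect the length of every term, in days, under its office.
days_by_office = defaultdict(list)
for office, _holder, start, end in terms:
    days_by_office[office].append((end - start).days)

# Number of holders and mean tenure for each office.
summary = {
    office: {"holders": len(days), "mean_days": mean(days)}
    for office, days in days_by_office.items()
}

for office, stats in summary.items():
    print(f"{office}: {stats['holders']} holders, mean {stats['mean_days']:.1f} days")
```

With so few holders per office, one or two very long terms can move the mean a lot, which is the outlier effect described above for Prime Minister.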


Finally we look at the general trend over the last 100 years for each of the four great offices of state. Overall you can see that time in office is generally increasing for Prime Ministers and Chancellors, possibly because in the last 20 years there have been two Prime Ministers who aligned themselves closely with their Chancellors. For Foreign and Home Secretaries, however, nothing has changed and their tenures have stayed at around the same level over the last 100 years.

In conclusion, I don’t think it’s clear that Home Secretary is the worst job in government, although holders do generally seem to spend less time in position than those in the other three great offices of state. What’s surprising is that Foreign Secretary is pretty similar to Home Secretary, when it’s a much smaller area to cover with a lot less that can go wrong. Maybe it’s just easy to move the Foreign Secretary around in a reshuffle. Thanks for reading this blog. If you enjoyed it and want to see more, please let me know and give the blog a follow so you can see when I post something new.

Formula 1 – The Competitive Picture

Hello and welcome to this blog. Today we are going to look at something we haven’t looked at yet on this blog: Formula 1. I have watched F1 since 1997 and have often wondered, whenever teams say they have “reviewed the data”, what exactly the data is and what process they use to review it. Sadly I don’t have access to anything like the data F1 teams have (one day maybe!), but the main piece of data is freely available: the qualifying time. I decided I wanted to have a look at the competitive picture, and now that we are four races in there’s a decent sample size.

To do this analysis I took each driver’s fastest lap from each of the four qualifying sessions so far. I then added them up to get each driver’s total qualifying time. The result is plotted in the graph below:


So after four races Vettel has the lowest total qualifying time, closely followed by Hamilton. What is clear from this is the large gap between the top six drivers from the top three teams and the rest. Also, apart from Ferrari and Mercedes being mixed up, every other team’s drivers sit two by two. This is surprising considering the small gaps between teams in the midfield. The next question I had was about the differences between team mates, as in Formula 1 your main rival to beat is always your team mate.
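The total qualifying time described above is just each driver’s best lap from every session, converted to seconds and summed. A minimal Python sketch of that step follows. The driver names and lap times are made-up placeholders, and I am assuming lap times arrive as `m:ss.sss` strings.

```python
def lap_to_seconds(lap):
    """Convert a lap time string such as '1:21.250' into seconds."""
    minutes, seconds = lap.split(":")
    return int(minutes) * 60 + float(seconds)


# Placeholder best laps from four qualifying sessions (NOT real timing data).
best_laps = {
    "Driver A": ["1:21.000", "1:28.500", "1:31.250", "1:41.500"],
    "Driver B": ["1:21.250", "1:28.250", "1:31.500", "1:41.750"],
}

# Total qualifying time per driver, in seconds.
totals = {
    driver: sum(lap_to_seconds(lap) for lap in laps)
    for driver, laps in best_laps.items()
}

# Fastest overall first.
ranking = sorted(totals, key=totals.get)
for driver in ranking:
    print(f"{driver}: {totals[driver]:.3f}s")
```

One design point to be aware of: summing only works if every driver set a time in every session, so a driver who missed a session would need handling separately.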


The graph above shows the difference between each team’s drivers, with points towards the top right showing a smaller difference than those towards the bottom left. The team with clearly the closest matched drivers is Red Bull, with 0.07 seconds between them. This is good news for Ricciardo in particular, who can use this information to increase his value in his contract talks. At the other end there is big pressure on Stoffel Vandoorne and Kimi Raikkonen. Both are over a second in total behind their team mates, which, if it carries on, could see them losing their seats.
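The team mate comparison is the absolute difference between the two drivers’ total qualifying times within each team. As a hedged Python sketch, with placeholder teams, drivers and totals rather than the real figures:

```python
# Placeholder total qualifying times in seconds (NOT real timing data).
totals = {
    "Driver A": 362.25,
    "Driver B": 362.75,
    "Driver C": 365.00,
    "Driver D": 366.50,
}

# Hypothetical team line-ups: team -> (first driver, second driver).
teams = {
    "Team X": ("Driver A", "Driver B"),
    "Team Y": ("Driver C", "Driver D"),
}

# Gap between team mates, in seconds, for each team.
gaps = {
    team: abs(totals[first] - totals[second])
    for team, (first, second) in teams.items()
}

closest = min(gaps, key=gaps.get)  # most evenly matched pairing
widest = max(gaps, key=gaps.get)   # pairing with the biggest gap

print(f"Closest: {closest} ({gaps[closest]:.2f}s)")
print(f"Widest: {widest} ({gaps[widest]:.2f}s)")
```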

I’m going to keep this data set up to date as the season goes on, and I have similar information for total race time. I think there’s more you can derive from this, such as who’s developing their car the best. Please let me know your thoughts or if you have any questions, I like to hear feedback.

Premier League Wages Stalling?

Hello, this is going to be a shorter blog than normal as I just felt I had to share the early findings. It was inspired by a recent Tidy Tuesday from the R4DS online learning community, in which we looked at a data set of wages for various positions in the NFL. Reviewing it showed that while salaries for some positions were increasing at a high rate, others were not growing at all.

I decided to look at wages in the English Premier League. I got the data from the same website, which had the wages for all players for every year from 2013 to the current year. I took that data and plotted the graph below, which shows the wages for the top 50 players in each position.


Despite all the money going into the league with the latest increases in TV deal money, the wages for all players seem to be staying at the same level. This shocked me, and I can only think of a couple of reasons why:

  • The increase in TV money has been spent on things other than wages, such as transfer fees, or has gone to the owners of the clubs
  • If higher wages have been paid, they have gone to the less skilled, lower paid players
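To check whether wages really are flat, the top-50-per-position selection described earlier can be reduced to one number per position per year. Here is a rough Python sketch of that step. The rows are placeholder values, not the real wage data. Using the mean of the top group is my assumption for the example (the graph itself just plots the wages), and `n` is set to 2 here only to keep the example small.

```python
from collections import defaultdict
from statistics import mean


def top_n_mean(rows, n=50):
    """Mean wage of the n highest paid players for each (year, position)."""
    groups = defaultdict(list)
    for year, position, wage in rows:
        groups[(year, position)].append(wage)
    return {
        key: mean(sorted(wages, reverse=True)[:n])
        for key, wages in groups.items()
    }


# Placeholder rows: (year, position, wage). NOT real Premier League wages.
rows = [
    (2013, "Forward", 100.0),
    (2013, "Forward", 80.0),
    (2013, "Forward", 20.0),
    (2014, "Forward", 110.0),
    (2014, "Forward", 90.0),
    (2014, "Forward", 30.0),
]

averages = top_n_mean(rows, n=2)
for (year, position), wage in sorted(averages.items()):
    print(year, position, wage)
```

Comparing the same position across years then shows directly whether the top end is growing or stalling.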

Stay tuned, I have a few ideas for how we can review this further and work out whether this is true.

Freelance Before 30 Blog 1

Hello and welcome to the start of a new series on my blog. The idea is that this is going to be a long running series, for at least the next year and a half, documenting going from an rstats novice to a fully fledged freelance data scientist. The background is that since finishing university seven or eight years ago I have been stuck in the corporate working environment with all the restrictions that entails. It’s a very comfortable environment, and recently I received a nice pay rise. However, since learning of the existence of R three years ago I have always felt it’s something interesting and wondered what the possibilities are.

Over the intervening years I made numerous attempts to start learning it and always dropped out. This year, 2018, I decided I’m actually going to apply myself and see where I can get to. I currently work as an analyst for a utility company which has numerous renewable generation sites. I really enjoy it, but I think I can be challenged more. I was discussing this with my significant other and we came up with the title Freelance Before 30. So this is my journey over the next 19 or so months.

Today we are going to look at my current progress with the Datacamp course I have been working on since the turn of the year. I had struggled to find ways to get into R, and I think learning is best done in more than one way. So I use Datacamp, I’m also a member of the R4DS online learning community, and I read a lot on the internet.

[Figure: percentage score for each completed Datacamp module]

I’m following the Data Science with R track on Datacamp, which has 23 modules in total. Up to now I have completed 12, and the aim is to complete the track by the end of June. The graph above shows my percentage scores for all 12 modules I have completed. I worked out each score as the experience I actually gained against what was available for the exercises. I’m happy with the high scores for data visualisation in ggplot, as that’s the output that everyone sees. I can see, though, that I need to do more work on the background coding aspects, with intermediate R practice my lowest score. There also seems to be a bit of a split: seven modules where I scored in the high 70s and above, and the other five with scores in the 60s. It would be interesting to see if there is anything that links the lower scores, as that is clearly something to work on if I am to get better.
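The score calculation is experience gained divided by experience available, expressed as a percentage. A quick Python sketch of that arithmetic is below. The module names and XP figures are placeholders, not my actual Datacamp numbers.

```python
def percent_score(xp_earned, xp_available):
    """Percentage of the available experience points actually earned."""
    if xp_available <= 0:
        raise ValueError("xp_available must be positive")
    return 100 * xp_earned / xp_available


# Placeholder figures: module -> (XP earned, XP available). Not real scores.
modules = {
    "Module A": (4200, 5000),
    "Module B": (3100, 5000),
}

scores = {
    name: percent_score(earned, available)
    for name, (earned, available) in modules.items()
}

# Lowest score first, since those are the modules to revisit.
for name in sorted(scores, key=scores.get):
    print(f"{name}: {scores[name]:.0f}%")
```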


Again, in the summary by chapter you can see my weakness in the background programming, particularly loops. So if anyone has any ideas for how I can get better at that, please let me know. This series will be updated every few weeks with my progress. Obviously it’s not just about finishing the Datacamp course, as that alone doesn’t make you a data scientist. As always, if you have any comments or thoughts, or anything you want to see on the blog, please let me know. Please follow so you can see when I post new blogs.


EFL Championship Win Odds

Hello and welcome to another blog, this time looking at win odds in the Championship so far this year and comparing each team’s odds. The aim is to review the data and see if there are any trends we can spot. To get the data I downloaded the raw CSV from the football data website. The CSV is available there for free and contains lots of other interesting information.


The summary above shows each team in the Sky Bet Championship with their home and away game odds plotted. The big thing to take away is the spread for some teams. If you look at Wolves, they were generally well fancied in both their home and away games. Burton, however, were less fancied than other teams at home even in their most fancied home games. Also, the better the team, the more the home and away odds overlap.


I have now updated the graphs to focus on just home and just away games. For home games, Wolves again generally have the lowest odds. The only team with odds close to Wolves is Aston Villa, who are obviously a well fancied home team. The biggest surprise for me is that despite Burton clearly having higher odds than any other team in the division, they don’t hold the least fancied odds for a home team. That accolade goes to Barnsley. A similar pattern is seen with the away odds, although obviously they are generally higher than the home odds. This data seems to suggest that the better teams have both lower odds and a tighter grouping of odds. This could also be a way to review how closely matched a league is: the more spread out the odds, the closer the teams are in terms of quality.

[Figure: home wins against average home win odds, by team]

The graph above compares the number of home wins for each team against their average odds to win at home. As expected, the lower the average home win odds, the more home wins a team has. However, there are some interesting outliers. The two big overachievers against the bookies’ odds are Cardiff and Bolton. On their odds, Cardiff look like they should have a similar number of wins to the teams in the play-off mix, and Bolton look like they should theoretically have the second fewest home wins in the league. The underachievers look to be Brentford and Norwich, though there seem to be more teams overachieving than underachieving.
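The comparison above needs two numbers per team: the average home win odds and the count of home wins. A minimal Python sketch follows. The rows are placeholders rather than real results, and the column names (`HomeTeam`, `FTR` for full-time result, `B365H` for one bookmaker’s home win odds) are my assumption about how the CSV is laid out. I have also added an expected wins figure using the standard implied probability of 1 / decimal odds, which ignores the bookmaker’s margin and is not something plotted in the graph.

```python
from collections import defaultdict
from statistics import mean

# Placeholder rows (NOT real results); column names are assumed.
matches = [
    {"HomeTeam": "Team A", "FTR": "H", "B365H": 1.5},
    {"HomeTeam": "Team A", "FTR": "D", "B365H": 2.0},
    {"HomeTeam": "Team B", "FTR": "H", "B365H": 4.0},
    {"HomeTeam": "Team B", "FTR": "H", "B365H": 5.0},
]


def home_summary(rows, odds_col="B365H"):
    """Average home odds, actual home wins and odds-implied wins per team."""
    odds = defaultdict(list)
    wins = defaultdict(int)
    for row in rows:
        team = row["HomeTeam"]
        odds[team].append(float(row[odds_col]))
        wins[team] += row["FTR"] == "H"  # True counts as 1, False as 0
    return {
        team: {
            "avg_odds": mean(team_odds),
            "home_wins": wins[team],
            # 1 / decimal odds is the implied win chance (margin ignored).
            "expected_wins": sum(1 / o for o in team_odds),
        }
        for team, team_odds in odds.items()
    }


summary = home_summary(matches)
for team, stats in summary.items():
    diff = stats["home_wins"] - stats["expected_wins"]
    print(f"{team}: avg odds {stats['avg_odds']:.2f}, wins vs expected {diff:+.2f}")
```

A positive difference marks an overachiever against the odds and a negative one an underachiever. Passing a different `odds_col` and swapping the team and result columns would give the away version.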

[Figure: away wins against average away win odds, by team]

Finally, away wins show the same trend, although this time there are clear groups of teams at the bottom and the top, showing how much harder it is to win away from home. The big overachiever away from home is Burton, which suggests they play well when teams underestimate them. A team which has underachieved is Middlesbrough, who look to have been expected to get more than 10 away wins this season but have only 7.