#TidyTuesday 1 – The best City for a Starbucks Crawl!

Hello, and welcome to this blog post on what I learned from this week’s Tidy Tuesday data set. For some background, Tidy Tuesday is a community initiative from the R4DS online learning community. If you want to get involved, please look it up on Twitter and join in. This week’s data set has shop locations for three coffee chains: Starbucks, Dunkin’ Donuts and Tim Hortons. I decided to focus on Starbucks, as it is more worldwide rather than USA/Canada-centric.

carbon (1)

Above you can see all the code I wrote for this analysis. I also added a small data set I created with each city’s population and size in km2.

starbucks by country

The first thing I looked at was the number of Starbucks by country. As you can see, Starbucks has by far the most stores in the US. This isn’t too surprising, since the chain started there and it’s a big country. What is surprising is that 3 of the top 5 countries are in Asia. Great Britain leads Europe with the most coffee shops.

distributionstar

For the top twenty countries by number of Starbucks, I also looked at how the ownership type broke down. What surprised me was the low amount of franchise ownership (only seen in France and the UK). Also, joint ownership seems to be employed in East Asia.

cafes by city

Next I looked at cities and found the top twenty cities by number of Starbucks shops. Note that these exclude the Chinese and Korean cities, as their names came up as symbols in the dataset and I couldn’t work out which cities they were. New York holds the record for the most Starbucks, followed by London. This made me wonder how the size and population of each city affect the count.

shops per kmpop

Finally we look at the density of Starbucks and the number of people per shop in each city. If you don’t want to walk far between coffees, go to Vancouver. It averages over 1.25 shops per km squared. The shops should not be too busy either, as it has the second lowest population per cafe. If you want to do a Starbucks crawl, go to Vancouver! Vancouver also looks to be the outlier for coffee shops per km squared, with most cities at less than 0.5 cafes per km squared. Is this something Starbucks aims for so the market isn’t saturated? If you have any comments or thoughts, please let me know; I would love to hear your views on this.
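The two measures above are simple ratios. Here is a minimal sketch of how they could be worked out, written in Python rather than the R used for the original analysis. The city names and numbers are made-up placeholders, not values from the data set, and the field names are my own assumptions.

```python
from dataclasses import dataclass


@dataclass
class City:
    name: str
    shops: int        # number of Starbucks shops counted in the city
    area_km2: float   # city size in square kilometres
    population: int   # city population


def shops_per_km2(city: City) -> float:
    """Shop density: how many shops there are per square kilometre."""
    return city.shops / city.area_km2


def people_per_shop(city: City) -> float:
    """Population per cafe: a rough guide to how busy each shop might be."""
    return city.population / city.shops


# Placeholder cities for illustration only (not real figures).
cities = [
    City("City A", shops=120, area_km2=100.0, population=600_000),
    City("City B", shops=200, area_km2=800.0, population=4_000_000),
]

# Rank cities from densest to least dense.
ranked = sorted(cities, key=shops_per_km2, reverse=True)

for city in ranked:
    print(
        f"{city.name}: {shops_per_km2(city):.2f} shops per km2, "
        f"{people_per_shop(city):,.0f} people per shop"
    )
```

A high density together with a low population per shop is what makes a city a good candidate for a crawl.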


Home Secretary the Poison Chalice?

Hello, and welcome to today’s blog. We are going to look at whether Home Secretary is the poison chalice job it is made out to be in the media. Recently Amber Rudd was forced to resign from the job after being found to have misled parliament. Many political commentators following the story described it as the hardest job in government and pointed to the apparently high turnover of occupants. I thought that, rather than take their word for it, this could be tested with readily available data. I created my own data set covering the last 100 years or so, listing the holders of the 4 great offices of state and the number of days each served in the role. I didn’t include anyone who died in the job, as that has nothing to do with the difficulty of the job.

home sec

The first thing to look at is the number of holders of the 4 great offices of state since 1916. The “safest” job looks to be Prime Minister. I think this is because the Prime Minister is responsible for hiring and firing the other three, and Prime Ministers may well push incumbents out of those jobs in order to protect themselves. Returning to the main question from the start of this blog, Home Secretary has had the most incumbents in the last 100 years, suggesting a higher turnover than the other jobs. However, Chancellor and Foreign Secretary are not too far behind.

distribution

The plot above shows the distribution of days in office, with the mean plotted as a black dot. The Prime Minister has the highest mean number of days in office, but as you can see from the general spread it is broadly similar to the other three jobs; the mean has been dragged up by the two outliers (Thatcher and Blair). The other three jobs have very similar means, although Home Secretary does have the lowest. Its general distribution, though, is similar to Foreign Secretary and Chancellor, so it could be the small sample size that is affecting the result. Looking at this, I definitely don’t think it’s as clear as the press make out.
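To show how a couple of long tenures can drag a mean upwards, here is a small sketch in Python (not the R used for the plots). The day counts are made-up placeholders rather than the real tenures, and comparing against the median is my own addition, not something done in the original analysis.

```python
from statistics import mean, median

# Placeholder days-in-office values for one role (not real data).
# The last value plays the part of an unusually long tenure.
days_in_office = [1000, 1200, 1100, 4000]

mean_days = mean(days_in_office)      # pulled upwards by the long tenure
median_days = median(days_in_office)  # much less sensitive to it

print(f"mean: {mean_days}, median: {median_days}")
```

When the mean sits well above the median like this, a few outliers are probably responsible rather than a general shift in the whole group.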

trend

Finally we look at the general trend over the last 100 years for each of the 4 great offices of state. Overall, time in office is generally increasing for Prime Ministers and Chancellors, possibly because in the last 20 years there have been two Prime Ministers who aligned themselves closely with their Chancellors. Foreign and Home Secretaries, however, have not changed: their tenures have stayed at around the same level over the last 100 years.

In conclusion, I don’t think it’s clear that Home Secretary is the worst job in government; however, it does seem that its holders generally spend less time in position than those in the other 3 great offices of state. What’s surprising is that Foreign Secretary is pretty similar to Home Secretary when it’s a much smaller area to cover and a lot less can go wrong. Maybe it’s easy to move the Foreign Secretary around in a reshuffle. Thanks for reading this blog. If you enjoyed it and want to see more, please let me know and give the blog a follow so you can see when I post a new blog.

Formula 1 – The Competitive Picture

Hello, and welcome to this blog. Today we are going to look at something we haven’t looked at yet on this blog: Formula 1. I have watched F1 since 1997 and have often wondered, whenever they say “we reviewed the data”, exactly what data they review and what process they use to review it. Sadly I don’t have access to anything like the data F1 teams have (one day maybe!); however, the main piece of data is freely available: the qualifying time. I decided I wanted to have a look at the competitive picture, and now that we are 4 races in there is a decent sample size.

To do this analysis I took each driver’s fastest lap from each of the 4 qualifying sessions so far. I then added them up to get each driver’s total qualifying time. The result is plotted in the graph below:
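The adding up described above can be sketched as follows, in Python rather than the R used for the original analysis. The driver names and lap times are made-up placeholders, and the layout of the records is an assumption on my part.

```python
from collections import defaultdict

# (driver, race, fastest qualifying lap in seconds); placeholder values only.
fastest_laps = [
    ("Driver A", "Race 1", 81.0),
    ("Driver A", "Race 2", 88.5),
    ("Driver B", "Race 1", 81.5),
    ("Driver B", "Race 2", 88.25),
]


def total_qualifying_time(laps):
    """Sum each driver's fastest qualifying lap across all races."""
    totals = defaultdict(float)
    for driver, _race, seconds in laps:
        totals[driver] += seconds
    return dict(totals)


totals = total_qualifying_time(fastest_laps)

# Lowest total time first, i.e. the quickest driver over the sessions so far.
ranking = sorted(totals, key=totals.get)

for driver in ranking:
    print(f"{driver}: {totals[driver]:.3f} s")
```

One thing to watch for with this approach is a driver who misses a session: their total would look artificially low unless they are left out or handled separately.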

F11

After 4 races Vettel has the lowest total qualifying time, closely followed by Hamilton. What is clear from this is the large gap between the top 6 drivers from the top 3 teams and the rest. Also, apart from Ferrari and Mercedes being mixed up, every other team’s drivers line up 2 by 2, next to each other in the order. This is surprising considering the small gaps between teams in the midfield. The next question I had was about the differences between team mates, as in Formula 1 your main rival to beat is always your team mate.

f12

The graph above shows the difference between each team’s drivers, with points at the top right showing a smaller difference than those at the bottom left. The team with clearly the most closely matched drivers is Red Bull, with 0.07 seconds between them. This is good news for Ricciardo in particular, who can use this information to increase his value in his contract talks. At the other end there is big pressure on Stoffel Vandoorne and Kimi Raikkonen. Both are over a second in total behind their team mates, which, if it carries on, could see them losing their seats.
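The team mate comparison comes down to the gap between two total times per team. A minimal sketch in Python (not the R used for the graphs), with placeholder teams, drivers and times rather than the real figures:

```python
# Total qualifying time per driver in seconds; placeholder values only.
total_times = {
    "Driver A": 340.0,
    "Driver B": 340.25,
    "Driver C": 345.0,
    "Driver D": 346.5,
}

# Which two drivers belong to which team (hypothetical pairings).
team_drivers = {
    "Team X": ("Driver A", "Driver B"),
    "Team Y": ("Driver C", "Driver D"),
}


def teammate_gaps(times, teams):
    """Absolute gap in total qualifying time between each team's two drivers."""
    return {
        team: abs(times[first] - times[second])
        for team, (first, second) in teams.items()
    }


gaps = teammate_gaps(total_times, team_drivers)
closest_team = min(gaps, key=gaps.get)

for team, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{team}: {gap:.2f} s between team mates")
```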

I’m going to keep this dataset up to date as the season goes on, and I have similar information for total race time. I think there’s more information you can derive from this, such as who’s developing their car the best. Please let me know your thoughts or if you have any questions; I like to hear feedback.

Premier League Wages Stalling?

Hello, this is going to be a shorter blog than normal; I just felt I had to share the early findings. It was inspired by the R4DS online learning community’s recent Tidy Tuesday, in which we looked at a dataset of wages for various positions in the NFL. Reviewing it showed that while salaries for some positions were increasing at a high rate, others were not growing at all.

I decided to look at wages in the English Premier League. I got the data from the same website, which had the wages for all players for every year from 2013 to the current year. I took that data and plotted the graph below, which shows the wages for the top 50 players in each position.

wage2

Now, despite all the money going into the league with the latest increases in TV deal money, the wages for all players seem to be staying at the same level. This shocked me, and I can only think of a couple of reasons why:

  • The increase in TV money has been spent on things other than wages – transfer fees, or it has gone to the owners of the clubs
  • If higher wages have been paid, they have gone to the less skilled, lower paid players

Stay tuned. I have a few ideas for how we can review this further and work out whether this is true.

Freelance Before 30 Blog 1

Hello and welcome to the start of a new series on my blog. The idea is that this will be a long-running series for at least the next year and a half, documenting my journey from rstats novice to fully fledged freelance data scientist. The background is that since finishing university 7 or 8 years ago I have been stuck in the corporate working environment, with all the restrictions that entails. It is a very comfortable environment, and I recently received a nice pay rise. However, since learning of the existence of R 3 years ago I have always felt it’s something interesting and wondered what the possibilities are.

Over the intervening years I made numerous attempts to start learning it and always dropped out. This year, 2018, I decided I’m actually going to apply myself and see where I can get to. I currently work as an Analyst for a utility company that has numerous renewable generation sites. I really enjoy it; however, I think I can be challenged more. I was discussing this with my significant other and we came up with the title Freelance Before 30. So this is my journey over the next 19 or so months.

Today we are going to look at my current progress with the DataCamp course I have been working on since the turn of the year. I had struggled to find ways to get into R, and I think learning is best done in more than 1 way. So I use DataCamp, I’m a member of the R4DS online learning community, and I read a lot on the internet.

course progress

I’m following the data science with R track on DataCamp, which has 23 modules in total. Up to now I have completed 12, and the aim is to complete the track by the end of June. The graph above shows my percentage scores for all the modules I have completed. I worked out each score as the experience I actually gained against what was available for the exercises. I’m happy with the high scores for data visualisation in ggplot, as that’s the output that everyone sees. I can see, though, that I need to do more work on the background coding aspects, with intermediate R practice my lowest score. There also seems to be a bit of a split: 7 modules where I scored in the high 70s and above, and the other 5 with scores around the 60s. It would be interesting to see if there is anything that links the lower scores, as that is clearly something to work on if I am to get better.
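The percentage score described above is just experience gained divided by experience available. A small sketch in Python rather than R, with placeholder module names and numbers (not my real DataCamp figures):

```python
def score_percent(earned: float, available: float) -> float:
    """Experience gained as a percentage of the experience available."""
    if available <= 0:
        raise ValueError("available experience must be positive")
    return 100.0 * earned / available


# module name -> (experience earned, experience available); placeholders only.
modules = {
    "Module A": (4500, 5000),
    "Module B": (3000, 5000),
}

scores = {
    name: score_percent(earned, available)
    for name, (earned, available) in modules.items()
}

# The lowest score points at the module most worth revisiting.
weakest = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.0f}%")
```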

modules

Again, in the summary by chapter you can see my weakness in background programming, particularly loops. So if anyone has any ideas on how I can get better at that, please let me know. This series will be updated every few weeks with my progress. Obviously it’s not just about finishing the DataCamp course, as that doesn’t make you a data scientist. As always, if you have any comments or thoughts, or anything you want to see on the blog, please let me know. Please follow so you can see when I post new blogs.