Hello and welcome to the blog. Today we are going to look at something we haven’t covered here yet: Formula 1. I have watched F1 since 1997 and have often wondered, whenever a team says “we reviewed the data”, what exactly that data is and what process they use to review it. Sadly I don’t have access to anything like the data F1 teams have (one day maybe!), but the main piece of data is freely available: the qualifying time. I wanted to have a look at the competitive picture, and now that we are 4 races in, that’s a decent sample size.
To do this analysis I took each driver’s fastest lap from each of the 4 qualifying sessions so far and added them up to get each driver’s total qualifying time. Plotting the result gave the graph below:
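For anyone who wants to try the same sum themselves, here is a minimal sketch in Python. It is not the code behind the graph: the function name `total_qualifying_time`, the record layout, and every lap time below are invented purely for illustration.

```python
from collections import defaultdict

# Hypothetical lap times in seconds, made up for illustration only.
# Each record is (race, driver, lap_time_seconds).
laps = [
    ("Race 1", "Driver A", 81.2),
    ("Race 1", "Driver A", 80.9),
    ("Race 1", "Driver B", 81.0),
    ("Race 1", "Driver B", 81.4),
    ("Race 2", "Driver A", 90.5),
    ("Race 2", "Driver B", 90.1),
    ("Race 2", "Driver B", 90.3),
]


def total_qualifying_time(laps):
    """Sum each driver's fastest qualifying lap across all races."""
    # Step 1: keep only the fastest lap per (driver, race).
    fastest = {}
    for race, driver, lap_time in laps:
        key = (driver, race)
        if key not in fastest or lap_time < fastest[key]:
            fastest[key] = lap_time

    # Step 2: add those fastest laps up per driver.
    totals = defaultdict(float)
    for (driver, _race), lap_time in fastest.items():
        totals[driver] += lap_time
    return dict(totals)


totals = total_qualifying_time(laps)
# Lowest total first, which is the order a ranking chart would use.
ranking = sorted(totals, key=totals.get)
```

One thing this sketch ignores is a driver who misses a session or sets no time. Their total would look artificially low, so they would need to be handled separately.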
After 4 races Vettel has the lowest total qualifying time, closely followed by Hamilton. What is clear is the large gap between the top 6 drivers, from the top 3 teams, and the rest. Also, apart from Ferrari and Mercedes being mixed up, every other team’s drivers sit next to each other, two by two. This is surprising considering the small gaps between teams in the midfield. The next question I had was about the differences between team mates, as in Formula 1 your main rival to beat is always your team mate.
The graph above shows the difference between each team’s drivers, with points at the top right having a smaller difference than those at the bottom left. The team with clearly the closest matched drivers is Red Bull, with 0.07 seconds between them. This is good news for Ricciardo in particular, who can use this information to increase his value in his contract talks. At the other end there is big pressure on Stoffel Vandoorne and Kimi Raikkonen. Both are over a second in total behind their team mates, which, if it carries on, could see them losing their seats.
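The team mate comparison is just a subtraction of the two totals within each team. A rough sketch in Python, assuming you already have a total qualifying time per driver: the team names, driver names, and times here are all hypothetical, and `teammate_gaps` is a name I made up for the example.

```python
# Hypothetical total qualifying times in seconds (illustrative values only).
totals = {
    "Driver A": 340.10,
    "Driver B": 340.25,
    "Driver C": 343.00,
    "Driver D": 344.50,
}

# Hypothetical team line-ups: each team maps to its two drivers.
teams = {
    "Team X": ("Driver A", "Driver B"),
    "Team Y": ("Driver C", "Driver D"),
}


def teammate_gaps(totals, teams):
    """Return (team, faster_driver, gap_seconds), closest pairing first."""
    gaps = []
    for team, drivers in teams.items():
        # Order the two drivers by total time so the gap is never negative.
        faster, slower = sorted(drivers, key=lambda d: totals[d])
        gaps.append((team, faster, totals[slower] - totals[faster]))
    # Smallest gap first: the most evenly matched team mates lead the list.
    return sorted(gaps, key=lambda g: g[2])


gaps = teammate_gaps(totals, teams)
```

Sorting by the gap puts the most evenly matched pairing first, which is the same story the graph tells from top right to bottom left.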
I’m going to keep this dataset up to date as the season goes on, and I have similar information for total race time. I think there’s more you can derive from this, such as who is developing their car the best. Please let me know your thoughts or if you have any questions. I like to hear feedback.
Hello, this is going to be a shorter blog than normal; I just felt I had to share the early findings. It was inspired by a recent Tidy Tuesday article from the R4DS online learning community, in which we looked at a dataset of the wages of various positions in the NFL. Reviewing it showed that while salaries for some positions were increasing at a high rate, others were not growing at all.
I decided to look at wages in the English Premier League. I got the data from the same website, which had the wages for all players for every year from 2013 to the current year. I took that data and plotted the graph below, which shows the wages for the top 50 players in each position.
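If you want to reproduce the “top 50 per position” step, one way to sketch it in Python is below. This is an illustration rather than the code used for the graph: the field names `year`, `position` and `wage`, the function `top_n_wages`, and all the numbers are assumptions made for the example.

```python
import heapq
from collections import defaultdict

# Hypothetical wage records (illustrative numbers, not real player data).
records = [
    {"year": 2013, "position": "Forward", "wage": 90},
    {"year": 2013, "position": "Forward", "wage": 70},
    {"year": 2013, "position": "Forward", "wage": 40},
    {"year": 2013, "position": "Defender", "wage": 60},
    {"year": 2014, "position": "Forward", "wage": 95},
    {"year": 2014, "position": "Forward", "wage": 50},
]


def top_n_wages(records, n=50):
    """Group wages by (year, position) and keep the n highest in each group."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["year"], rec["position"])].append(rec["wage"])
    # heapq.nlargest returns the n largest values in descending order;
    # groups with fewer than n players are returned in full.
    return {key: heapq.nlargest(n, wages) for key, wages in groups.items()}


# n=2 only because the toy dataset is tiny; the post uses the top 50.
top = top_n_wages(records, n=2)
```

From there each group can be plotted or summarised by year to see whether the top earners in a position are actually moving.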
Despite all the money going into the league with the latest increases in TV deal money, the wages for all players seem to be staying at the same level. This shocked me and I can only think of a couple of reasons why:
- The increase in TV money has been spent on things other than wages, such as transfer fees, or has gone to the owners of the clubs
- If higher wages have been paid, they have gone to the less skilled, lower paid players
Stay tuned. I have a few ideas for how we can review this further and work out whether this is true.
Hello and welcome to the start of a new series on my blog. The idea is that this will be a long-running series, for at least the next year and a half, documenting my journey from rstats novice to fully fledged freelance data scientist. The background is that since finishing university 7 or 8 years ago I have been stuck in the corporate working environment, with all the restrictions that entails. It is a very comfortable environment, and I recently received a nice pay rise. However, since learning of the existence of R 3 years ago I have always felt it’s something interesting and wondered what the possibilities are.
Over the intervening years I made numerous attempts to start learning it and always dropped out. This year, 2018, I decided I’m actually going to apply myself and see where I can get to. I currently work as an analyst for a utility company that has numerous renewable generation sites. I really enjoy it, but I think I can be challenged more. I was discussing this with my significant other and we came up with the title “freelance before 30”. So this is my journey over the next 19 or so months.
Today we are going to look at my current progress with the DataCamp course I have been working on since the turn of the year. I had struggled to find ways to get into R, and I think learning is best done in more than one way. So I use DataCamp, I’m also a member of the R4DS online learning community, and I read a lot on the internet.
I’m following the Data Science with R track on DataCamp, which has 23 modules in total. Up to now I have completed 12, and the aim is to complete the rest by the end of June. The graph above shows my percentage scores for all 12 modules I have completed. I worked out each score as the experience I actually gained against what was available for the exercises. I’m happy with the high scores for data visualisation in ggplot, as that’s the output that everyone sees. I can see, though, that I need to do more work on the background coding aspects, with intermediate R practice my lowest score. There also seems to be a bit of a split: 7 modules where I scored in the high 70s and above, and the other 5 with scores in the 60s. It would be interesting to see if there is anything that links the lower scores, as that is clearly something to work on if I am to get better.
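The score itself is a simple ratio of experience earned to experience available. Here is a minimal sketch of that calculation in Python, on the assumption that each module comes with an “earned” and an “available” points figure. The module names, the point values, and the `module_scores` function are all hypothetical.

```python
# Hypothetical (earned, available) experience points per module.
# The names and numbers are placeholders, not my real results.
modules = {
    "Module A": (4500, 5000),
    "Module B": (3100, 5000),
}


def module_scores(modules):
    """Return each module's score as a percentage of the points available."""
    scores = {}
    for name, (earned, available) in modules.items():
        scores[name] = round(100 * earned / available, 1)
    return scores


scores = module_scores(modules)
# The module with the lowest percentage is the one to revisit first.
weakest = min(scores, key=scores.get)
```

Losing points for using hints or viewing solutions is what would pull a score below 100%, so a low percentage is a fair flag for a topic that needed more help.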
Again, in the summary by chapter you can see my weakness in background programming, particularly loops. So if anyone has any ideas on how I can get better at that, please let me know. This series will be updated every few weeks with my progress. Obviously it’s not just about finishing the DataCamp course, as that alone doesn’t make you a data scientist. As always, if you have any comments or thoughts, or anything you want to see on the blog, please let me know. Please follow so you can see when I post new blogs.