Hello and welcome to the blog. Today we are going to look at something we haven’t covered here before: Formula 1. I have watched F1 since 1997 and have often wondered, whenever a team says “we reviewed the data”, exactly what data they review and what process they use to review it. Sadly I don’t have access to anything like the data F1 teams have (one day maybe!), but the main piece of data is freely available: the qualifying time. I wanted to look at the competitive picture, and now that we are 4 races in we have a decent sample size.
To do this analysis I took each driver’s fastest lap from each of the 4 qualifying sessions so far and added them up to get each driver’s total qualifying time. The result is plotted in the graph below:
After 4 races Vettel has the lowest total qualifying time, closely followed by Hamilton. What is clear is the large gap between the 6 drivers from the top 3 teams and the rest. Also, apart from Ferrari and Mercedes being mixed together, every other team’s drivers sit next to each other, two by two. This is surprising considering the small gaps between teams in the midfield. The next question I had was about the differences between team mates, as in Formula 1 the main rival you have to beat is always your team mate.
The graph above shows the difference between each team’s drivers, with points at the top right showing a smaller difference than those at the bottom left. The team with clearly the closest matched drivers is Red Bull, with 0.07 seconds between them. This is good news for Ricciardo in particular, who can use it to increase his value in his contract talks. At the other end there is big pressure on Stoffel Vandoorne and Kimi Raikkonen. Both are over a second behind their team mates in total, which if it carries on could see them lose their seats.
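For anyone who wants to try the same thing, here is a minimal sketch in Python of the two steps described above: adding up each driver’s fastest qualifying laps, then taking the gap between team mates. The lap times, driver names and team names are made up for illustration, and the function names are my own, so treat it as a rough outline rather than the exact method behind the graphs.

```python
from collections import defaultdict

# Made-up lap times in seconds, NOT real results.
# Each row: (driver, team, race, fastest qualifying lap)
laps = [
    ("Driver A", "Team X", "Race 1", 81.164),
    ("Driver A", "Team X", "Race 2", 87.958),
    ("Driver B", "Team X", "Race 1", 81.828),
    ("Driver B", "Team X", "Race 2", 88.101),
    ("Driver C", "Team Y", "Race 1", 82.152),
    ("Driver C", "Team Y", "Race 2", 88.670),
    ("Driver D", "Team Y", "Race 1", 82.187),
    ("Driver D", "Team Y", "Race 2", 88.702),
]


def total_qualifying_time(rows):
    """Sum each driver's fastest laps across all sessions."""
    totals = defaultdict(float)
    for driver, _team, _race, lap in rows:
        totals[driver] += lap
    return dict(totals)


def team_mate_gaps(rows):
    """Gap in total qualifying time between the drivers of each team."""
    totals = total_qualifying_time(rows)
    drivers_by_team = defaultdict(set)
    for driver, team, _race, _lap in rows:
        drivers_by_team[team].add(driver)
    gaps = {}
    for team, drivers in drivers_by_team.items():
        times = sorted(totals[d] for d in drivers)
        gaps[team] = times[-1] - times[0]  # slowest minus fastest
    return gaps


totals = total_qualifying_time(laps)
gaps = team_mate_gaps(laps)

# Fastest total first, then the closest matched team first
for driver, seconds in sorted(totals.items(), key=lambda kv: kv[1]):
    print(f"{driver}: {seconds:.3f}s")
for team, gap in sorted(gaps.items(), key=lambda kv: kv[1]):
    print(f"{team}: {gap:.3f}s between team mates")
```

One thing to watch with a total like this is that a driver who misses a session or sets no time would need handling separately, otherwise their total would look artificially low.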
I’m going to keep this dataset up to date as the season goes on, and I have similar information for total race time. I think there’s more you can derive from this, such as who is developing their car the best. Please let me know your thoughts or any questions you have; I like to hear feedback.
Hello, this is going to be a shorter blog than normal; I just felt I had to share the early findings. It was inspired by a recent Tidy Tuesday exercise from the R4DS online learning community, in which we looked at a dataset of wages for the various positions in the NFL. Reviewing it showed that while salaries for some positions were increasing at a high rate, others were not growing at all.
I decided to look at wages in the English Premier League. I got the data from the same website, which had the wages for all players for every year from 2013 to the current year. I took that data and plotted the graph below, which shows the wages for the top 50 players in each position.
Despite all the money going into the league with the latest increases in TV deal money, the wages for all players seem to be staying at the same level. This shocked me and I can only think of a couple of reasons why:
- The increase in TV money has been spent on things other than wages, such as transfer fees, or has gone to the owners of the clubs
- If higher wages have been paid, they have gone to the less skilled, lower paid players
Stay tuned. I have a few ideas for how we can review this further and work out whether either of these is true.
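The “top 50 players in each position” step mentioned earlier can be sketched roughly as below, in Python. The wage figures are invented and the row layout is hypothetical. Averaging the top wages is also my own assumption for summarising them, so this is only one way the comparison could be done.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows, NOT real wages: (season, position, weekly wage)
wages = [
    (2013, "Forward", 100),
    (2013, "Forward", 80),
    (2013, "Forward", 20),
    (2018, "Forward", 110),
    (2018, "Forward", 90),
    (2018, "Forward", 30),
]


def top_n_mean_wage(rows, n=50):
    """Average wage of the n best paid players per (season, position)."""
    groups = defaultdict(list)
    for season, position, wage in rows:
        groups[(season, position)].append(wage)
    return {
        key: mean(sorted(values, reverse=True)[:n])
        for key, values in groups.items()
    }


# n=2 here only because the toy dataset is tiny; the post uses the top 50
summary = top_n_mean_wage(wages, n=2)
for (season, position), avg in sorted(summary.items()):
    print(season, position, avg)
```

Restricting to the top earners keeps squad players from dragging the figures around, but it also hides exactly the lower paid group that the second reason above is about, so both views are probably worth plotting.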
Hello and welcome to the start of a new series on my blog. The idea is that this will be a long-running series, for at least the next year and a half, documenting my journey from rstats novice to fully fledged freelance data scientist. The background is that since finishing university 7 or 8 years ago I have been stuck in the corporate working environment with all the restrictions that entails. It is a very comfortable environment, and I recently received a nice pay rise. However, since learning of the existence of R 3 years ago I have always felt it’s something interesting and wondered what the possibilities are.
Over the intervening years I made numerous attempts to start learning it and always dropped out. This year, 2018, I decided I’m actually going to apply myself and see where I can get to. I currently work as an analyst for a utility company that has numerous renewable generation sites. I really enjoy it, but I think I can be challenged more. I was discussing this with my significant other and we came up with the title “freelance before 30”. So this is my journey over the next 19 or so months.
Today we are going to look at my current progress with the DataCamp course I have been working on since the turn of the year. I had struggled to find ways to get into R, and I think learning is best done in more than one way. So I use DataCamp, I’m a member of the R4DS online learning community, and I read a lot on the internet.
I’m following the data science with R track on DataCamp, which has 23 modules in total. Up to now I have completed 12, and the aim is to complete the rest by the end of June. The graph above shows my percentage scores for all 12 modules I have completed. I worked out each score as the experience I actually gained against what was available for the exercises. I’m happy with the high scores for data visualisation in ggplot, as that’s the output that everyone sees. I can see, though, that I need to do more work on the background coding aspects, with intermediate R practice my lowest score. There also seems to be a bit of a split: in 7 modules I scored in the high 70s and above, and in the other 5 I scored around the 60s. It would be interesting to see if there is anything that links the lower scores, as that’s clearly something to work on if I am to get better.
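The score calculation described above is just experience gained divided by experience available. Here is a small Python sketch of it. The module names and XP numbers are made up, and the 70% cut-off for “needs more work” is my own assumption based on the split I mentioned.

```python
def percentage_score(xp_earned, xp_available):
    """Experience actually gained as a percentage of what was on offer."""
    if xp_available <= 0:
        raise ValueError("xp_available must be positive")
    return 100 * xp_earned / xp_available


# Hypothetical modules: (XP earned, XP available)
modules = {
    "Module A": (3900, 5000),
    "Module B": (3050, 5000),
}

scores = {name: percentage_score(*xp) for name, xp in modules.items()}

# Assumed threshold separating the stronger modules from the weaker ones
needs_work = [name for name, score in scores.items() if score < 70]

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.1f}%")
print("Needs more work:", needs_work)
```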
Again, in the summary by chapter you can see my weakness in background programming, particularly loops. So if anyone has any ideas for how I can get better at that, please let me know. This series will be updated every few weeks with my progress. Obviously it’s not just about finishing the DataCamp course, as that doesn’t make you a data scientist. As always, if you have any comments or thoughts, or anything you want to see on the blog, please let me know. Please follow so you can see when I post new blogs.
Hello and welcome to another blog, this time looking at win odds in the Championship so far this year and comparing each team’s odds. The aim is to review the data and see if there are any trends we can spot. To get the data I downloaded the raw CSV from the football data website. The CSV is available there for free and contains lots of other interesting information.
The summary above shows each team in the Sky Bet Championship with their home and away game odds plotted. The big thing to take away is the spread for some teams. If you look at Wolves, they were generally well fancied in both their home and away games. Burton, however, were less fancied than other teams at home even in their most fancied home games. Also, the better the team, the more the home and away odds overlap.
I have now updated the graphs to focus on home and away games separately. For the home games, Wolves again generally have the lowest odds. The only team with odds close to Wolves’ is Aston Villa, who are obviously a well fancied home team. The biggest surprise for me is that, despite Burton clearly having higher odds than any other team in the division, they don’t hold the least fancied odds for a home team. That accolade goes to Barnsley. A similar pattern is seen with the away odds, which are obviously generally higher than the home odds. This data seems to suggest that the better teams have both lower odds and a tighter grouping of odds. This could also be a way to review how closely matched a league is: the more spread out the odds, the closer the teams are in terms of quality.
The graph above compares the number of home wins for a team against their average odds to win. As expected, the lower the average home win odds, the more home wins a team has. However, there are some interesting outliers. The two big overachievers against the bookies’ odds are Cardiff and Bolton. Cardiff look like they should have a similar number of wins to the teams in the playoff mix, and Bolton look like they should theoretically have the second lowest number of home wins in the league. Possible underachievers are Brentford and Norwich, though there seem to be more teams overachieving than underachieving.
Finally, away wins show the same trend, but this time there are teams clearly separated at the bottom and top, showing how much harder it is to win away from home. The big overachiever away from home is Burton, which suggests they play well when teams underestimate them. A team that has underachieved is Middlesbrough, who look to have been expected to get more than 10 away wins this season but have only 7.
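The comparison of wins against odds can be sketched as follows in Python. I am assuming decimal odds here, where 1 divided by the odds gives a rough implied chance of winning. The matches, team names and row layout are made up, and “expected wins” is my own label for the sum of those implied chances, not something taken from the CSV.

```python
from collections import defaultdict

# Made-up matches, NOT real results:
# (home team, decimal odds for a home win, result: "H", "D" or "A")
matches = [
    ("Team A", 1.50, "H"),
    ("Team A", 2.00, "H"),
    ("Team A", 1.60, "D"),
    ("Team B", 4.00, "H"),
    ("Team B", 5.00, "H"),
    ("Team B", 2.50, "A"),
]


def home_summary(rows):
    """Average home odds, actual home wins and odds-implied wins per team."""
    odds = defaultdict(list)
    wins = defaultdict(int)
    for team, home_odds, result in rows:
        odds[team].append(home_odds)
        if result == "H":
            wins[team] += 1
    summary = {}
    for team, team_odds in odds.items():
        # 1 / decimal odds is the bookmaker's implied win probability
        expected = sum(1 / o for o in team_odds)
        summary[team] = {
            "avg_odds": sum(team_odds) / len(team_odds),
            "wins": wins[team],
            "expected_wins": expected,
            "diff": wins[team] - expected,  # positive means overachieving
        }
    return summary


summary = home_summary(matches)
for team, row in sorted(summary.items(), key=lambda kv: kv[1]["diff"], reverse=True):
    print(f"{team}: {row['wins']} wins vs {row['expected_wins']:.2f} expected")
```

Bookmakers build a margin into their prices, so implied chances from raw odds add up to a little over 100% per match and slightly overstate each team’s chances. For ranking over and underachievers against each other that matters less than it would for judging any single team.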
Hello and welcome to the final one of my previews for this year’s IPL. Please do go and check out the others in the series and let me know your thoughts and predictions. Who do you think has the best squad in the league this year? Today we are focusing on the Sunrisers Hyderabad, who were founded in 2012. Since then they have won the tournament once, in 2016. They are now captained by Kane Williamson and coached by the Australian Tom Moody.
Sunrisers have the second biggest squad in the league, which should give them an advantage if any players lose form or get injured. It’s also a squad with a well balanced age profile, with not many young players. Finally, looking at the distribution of games played in the IPL, they have a number of key players with lots of experience. One of them, though, is David Warner, who is not playing this year due to the fallout from the ball-tampering scandal.
Looking at the treemap, it’s clear that the team spent most of its money on batsmen and fast bowlers, with relatively little investment in all rounders. Sunrisers have also invested big in the Afghanistan leg spinner Rashid Khan. He and Bhuvneshwar Kumar will be the key members of the bowling attack.
Overall the batsmen look pretty weak, as they are mostly on the left side of the graph. One hope is their relatively strong strike rates; the issue is that the batsmen may not stick around long enough to get going.
With the all rounders the news gets a little better. Despite the low outlay, a number of them appear to have good batting and bowling stats, with good economy rates. This looks like impressive work, getting good players for so little money.
The bowling department looks to be the team’s greatest strength. They will be hoping that two of the best bowlers in the competition, Kumar and Rashid, live up to their past performances. They will need to if the Sunrisers are to be successful.
Overall I think this is a well balanced squad, though I think the absence of Warner could be too big a void to fill.
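For anyone less familiar with the stats used in these previews, the batting average, strike rate and economy rate can be worked out as in the short Python sketch below. These are the standard cricket definitions. The function names and the example numbers are my own and not taken from any player’s record.

```python
def batting_average(runs, dismissals):
    """Runs scored per dismissal; undefined if never dismissed."""
    if dismissals == 0:
        return None
    return runs / dismissals


def strike_rate(runs, balls_faced):
    """Runs scored per 100 balls faced."""
    return 100 * runs / balls_faced


def economy_rate(runs_conceded, balls_bowled):
    """Runs conceded per over, with an over being 6 balls."""
    return 6 * runs_conceded / balls_bowled


# Hypothetical batsman: 300 runs from 200 balls, out 10 times
print(batting_average(300, 10))  # 30.0
print(strike_rate(300, 200))     # 150.0

# Hypothetical bowler: 168 runs conceded from 144 balls (24 overs)
print(economy_rate(168, 144))    # 7.0
```

A high strike rate with a low average is the pattern described for the Sunrisers batsmen: quick scoring, but not staying in for long.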
Hello and welcome to the 7th part of this series reviewing the rosters of all the IPL teams for the upcoming 2018 season. Please go and check out the other posts, as there are some interesting insights to be found, and let me know what you think. In today’s review we are looking at the Royal Challengers Bangalore, who could be termed the unlucky team. Three times they have made it to the final of the tournament, and they lost all three. Their current captain is the world class Virat Kohli, who has been with the team since its inception in 2008. They are coached by Daniel Vettori, who has been in the role since 2014 and has a reputation as an expert Twenty20 coach.
Overall Bangalore have a decent sized squad compared to the other teams. Age wise, most of their players are towards the higher age bracket, but they probably have the densest grouping around peak ages for a player. There is a good amount of experience throughout the squad compared to the other teams, but there are a few players towards the lower end, which could affect them.
Big money has been spent on Virat Kohli, as you would expect, but also on Chris Woakes. He is the most expensive all rounder in the team, likely due to his ability to bowl at the death and restrict runs. He will be a key component of the bowling attack. There is also a good selection of fast bowlers to back him up and good spin options in Moeen Ali and Chahal. Having both makes playing two spinners a lot easier due to Moeen’s ability with the bat.
In the treemap it looked like Bangalore had spent big money on batsmen, and it looks like that investment has paid off. The batting line up looks capable of threatening any team, with high strike rates and high averages indicating long innings. I think this is clearly the best batting line up of the teams we have seen so far.
With the all rounders, it looks clear that they have aimed for bowling all rounders with good economy rates. This looks to be a smart move when the batting line up is as strong as it is.
When you look at the bowling attack, it’s understandable why they have mostly gone for bowling all rounders. It looks weak, particularly when it comes to conceding runs. They will hope bowling all rounders such as Woakes and De Grandhomme will supplement the attack.
Overall Bangalore clearly have a super strong squad, with particular strength in the batting line up. If the all rounders can contribute as much as a specialist bowler would, then look out for Bangalore going far in this year’s tournament.
Hello, I’m back with the next preview for this year’s IPL. As I say at the start of all of these posts, it’s part of a series, so please go and check out the others and let me know what you think. It’s definitely going to be an exciting league this year. Today we are covering the Rajasthan Royals, a team with a chequered history. They won the league in its inaugural year, 2008, but since then it’s been poor performances on the wicket and controversy off it (Rajasthan were suspended for two seasons). Already this year they have lost their captain Steve Smith, so they are led by Ajinkya Rahane.
In terms of number of players, Rajasthan have the third smallest squad, but it’s pretty similar in size to some of the other squads, so I don’t think it’s an issue. With regards to ages, there is a good grouping around peak age as well as a few players at the top ages, which will be helpful. Experience is where it starts to go a bit downhill for the Royals, who are clearly the least experienced team in the competition.
The treemap shows that the Royals have spent big on a number of key players in all categories. One of those key players, Steve Smith, will not be playing, which leaves a big hole in the batting line up. They will be hoping that the big money spent on all-star all rounder Ben Stokes can help overcome his loss. It’s clear the Royals have spent big, then filled the rest of the roster with cheaper players to supplement the big names.
Unfortunately the batting, when viewed the same way as for all the other teams so far, looks pretty weak, especially when you remove Steve Smith from the line up. They have a couple of batsmen with decent strike rates but no one to play the long innings. This could mean they struggle to post big totals or chase big totals down.
Things look a bit better when you look at the all rounders. There is a good spread of bowling all rounders to supplement the bowling attack and batting all rounders to help the batting line up. This seems to be the most complete squad when it comes to all rounders.
The bowling attack could be the weakest area of the team. The whole attack is on the right hand side of the graph, which means they could be conceding a lot of runs. With the batting line up also missing Steve Smith, they could struggle to chase those runs down.
In summary, I think that despite their obvious strength with all rounders, the weak batting line up missing Steve Smith and the poor bowling attack could mean the Royals are in for a season of struggle.