Kaggle Playground Series – Tidymodels

Hello readers, we are entering another Kaggle playground competition, so get your Yorkshire tea ready and enjoy the process of joining. This month the competition I entered is this one: https://www.kaggle.com/competitions/playground-series-s3e7. It looks at cancellations from hotels and spoiler alert – I had a lot of fun with this dataset. EDA First, I […]

Replacing Nikita — 3

Hello, welcome to the third and final part of my replacing Nikita Parris series. If you haven’t caught the first 2 blogs, go check them out before this one, as they take you through the various parts of the process. We identified our 3 best candidates to replace Nikita Parris: Francesca Kirby, Millie Farrow and Vivianne […]

Replacing Nikita 1

Today I have a challenge to go through, done by FC Rstats. However, the submission was a few weeks ago, I did it in a rush, and I don’t think it was my best work. Therefore this is a rehash of my submission, so I could end up with different results. The challenge is simple […]

Twenty20 Adjusted Strike Rates

Introduction Hello, so today I am going to be diving deeper into my analysis of players in Twenty20 cricket. If you didn’t check out the previous blogs, go check them out. One of the key findings from that work was how much strike rates differ depending on the ball in the innings that was faced […]

Twenty20 Data Exploratory Data Analysis 1

Hello, welcome to today’s blog, in which I am going to do some exploratory data analysis on data in Twenty20 cricket. You may have seen my earlier blogs looking at Twenty20 batting metrics. Well, they were all calculated just as they are, but it’s likely these will be affected by state of the match and series […]

Tidy Tuesday Trains

Hello, welcome to today’s blog looking at a Tidy Tuesday dataset: train delays in France. The first thing to do is to read the data into RStudio and have a look at the overview of the data: I can see the data looks broken up by year and month. There are also different routes, the […]

The Next Chris Gayle

Hello, welcome to today’s data adventure where we are going to be scouting for the next Chris Gayle. I am going to be using K-means clustering in order to achieve this. The first question is what numbers am I going to use for this? Previously I detailed the creation of several metrics with which to […]

Tidy Tuesday – Films Dataset

Hello, today we are going to be looking at this week’s Tidy Tuesday dataset. This is just a quick EDA, as I got a bit carried away with just the dataset. I initially set out to do just one interesting graph but kept finding more and more interesting insights. So below you can see the structure […]

Biketown EDA – P2

Hello, welcome to the second part of this blog doing exploratory data analysis on the Biketown dataset. If you haven’t read the first one then go check it out. As an overview, we found that most of the records were either subscribers to the system or casual people who might just use it every […]