Hello and welcome to today’s data adventure. I am continuing the development of my model to predict the batting performance of young players in Twenty20 cricket. Please check out the first blog, which detailed the data I had and the initial processing of it.
The next step in processing the data is to normalise for the number of innings. Players in the data set have played differing numbers of innings, so it’s unfair to compare them directly. I can’t just assume a player will carry on at the same rate, because of regression to the mean. A player might only play 3 innings and have a batting average of over 60, which is way higher than the overall mean. It is likely that if that player played more games their average would fall back towards the mean. So let’s look at how the distributions of average and strike rate change with the number of innings.
For both batting average and strike rate you can clearly see how the variation reduces as the number of innings played increases. For batting average the variation looks to settle at around 10 innings, whereas strike rate looks to stabilise at around 4 or 5 innings. Therefore I will regress to the mean for average up to 10 innings and for strike rate up to 5 innings.
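As a rough illustration of the idea, here is a minimal sketch of one way to regress a figure to the mean. It assumes a simple linear blend: a player with fewer innings than the threshold is "padded out" to the threshold with the overall mean. The function name `regress_to_mean`, the blending rule and the example numbers are all assumptions for illustration, not necessarily the exact method used in this analysis.

```python
def regress_to_mean(observed, innings, population_mean, threshold):
    """Blend an observed figure with the overall mean (assumed linear blend).

    Players with at least `threshold` innings keep their observed figure.
    Players with fewer innings are weighted towards the overall mean in
    proportion to how many innings they are short of the threshold.
    """
    if innings <= 0:
        # No innings played: the overall mean is the only information available
        return population_mean
    if innings >= threshold:
        return observed
    weight = innings / threshold
    return weight * observed + (1 - weight) * population_mean


# Hypothetical numbers, for illustration only
adjusted_average = regress_to_mean(
    observed=60.0, innings=3, population_mean=20.0, threshold=10
)
adjusted_strike_rate = regress_to_mean(
    observed=150.0, innings=3, population_mean=110.0, threshold=5
)
print(round(adjusted_average, 2), round(adjusted_strike_rate, 2))
```

With a blend like this, the player with 3 innings and an average of 60 is pulled most of the way back towards the overall mean, while their strike rate moves less because the threshold for strike rate is only 5 innings.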
Above you can see the code used to implement the regression to the mean in order to make it a fair analysis. The next step is to check how age affects the batting average and strike rate.
I have limited the data to players aged 22 and younger, as I suspect that above that age there will be no relationship between age and the two metrics. In this analysis I am just going to look at players under the age of 21, so I can use the equations of both lines to adjust the average and strike rate as if the batsman were 21.
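To make the idea of "the equation of the line" concrete, here is a minimal sketch of an ordinary least-squares fit of a metric against age. The helper `fit_line` and the data points are made up for illustration; they are assumptions, not the real data set or the exact fitting code behind this analysis.

```python
def fit_line(ages, values):
    """Ordinary least-squares fit of values against ages.

    Returns (slope, intercept) for the line: value = slope * age + intercept.
    """
    n = len(ages)
    mean_age = sum(ages) / n
    mean_value = sum(values) / n
    # Sum of squared deviations in age, and the age/value cross-deviations
    sxx = sum((a - mean_age) ** 2 for a in ages)
    sxy = sum((a - mean_age) * (v - mean_value) for a, v in zip(ages, values))
    slope = sxy / sxx
    intercept = mean_value - slope * mean_age
    return slope, intercept


# Made-up example data: batting average by age (not from the real data set)
ages = [17, 18, 19, 20, 21]
averages = [14.0, 16.0, 18.0, 20.0, 22.0]

slope, intercept = fit_line(ages, averages)
print(slope, intercept)
```

The slope is the number of interest here: it estimates how much the metric changes for each extra year of age.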
Above you can see the basic linear formulas I am using to get the slope for how a batsman’s average increases with age. That slope can be used in the following code.
All I have done is find the difference between the player’s age and the forecast age, and then project what the average would be if they were 21.
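The projection described above can be sketched in a few lines. The function name `project_to_forecast_age`, the slope value and the example player are hypothetical, and leaving players at or above the forecast age unchanged is my assumption for this sketch rather than something taken from the original code.

```python
FORECAST_AGE = 21


def project_to_forecast_age(value, age, slope, forecast_age=FORECAST_AGE):
    """Project a metric forward to the forecast age using a fitted slope.

    Assumption: players already at or above the forecast age are not adjusted.
    """
    years_to_go = max(forecast_age - age, 0)
    return value + slope * years_to_go


# Hypothetical example: an 18-year-old averaging 20.0, with an assumed
# slope of 1.5 runs of average per year of age
projected_average = project_to_forecast_age(value=20.0, age=18, slope=1.5)
print(projected_average)
```

The same function can be reused for strike rate by passing in the strike rate and the slope from the strike rate line.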
Above you can see a summary of the 18-year-old players. Now we have processed all the data.
- The big limitation of this early method is that it does not normalise for the type of wicket each innings was played on. Wicket quality will vary from team to team, so some players could be over- or underrated because of that.
- Another limitation is that in some cases I am extrapolating from not a lot of data. Overall it’s a small sample size.
Now that I have calculated the final age-adjusted batting averages, there are a number of further areas for analysis:
- How many players go on to play for the first team? Are there levels at which players go on to play for the first team?
- For the players who go on to play in the first team, how do their first-team average and strike rate compare to the calculated ones?
- Are there other metrics to look at which influence things, such as their 4 or 6 hitting rate?
Thanks for reading. Sorry it’s a bit longer than normal, hopefully it wasn’t too boring. See you in the next adventure.