Twenty20 cricket first hit the world stage in 2003 and has since arguably become, in some parts of the world, the most popular form of the game. Data analysis may or may not come to play a key role in squad selection (it should). Today I am going to look at identifying a selection of batting KPIs. These KPIs may or may not influence whether a team wins or loses, but they will identify different styles of play for batsmen. Seeing how they influence a team's chances of victory is another piece of work.
To do this I am using the ball by ball dataset for the IPL from 2008 to 2018. This dataset is available on Kaggle here:
- Batting Average – A pretty standard KPI used in all forms of cricket. The key question it answers: does the batsman score big or not?
- Strike Rate – The average number of runs scored per 100 balls faced. Another standard KPI used in all forms of cricket, but more crucial in the shortest form of the game.
- Average Balls Faced per Innings – Does the batsman stick around and hold up an end, or do they play short innings? A batsman with low balls faced but a high strike rate may be useful coming in down the order to chase a total down.
- 6 Rate – Percentage of balls faced hit for six. Is the batsman a sniper just waiting for the ball to hit for the maximum?
- 4 Rate – Percentage of balls faced hit for four. Similar to the previous one, but it will show whether the batsman likes to time the ball through the field or take the aerial route.
- Boundary Percent – what percentage of their runs do they score from boundaries.
- Non-Boundary Strike Rate – How many runs are they scoring when not hitting boundaries? Are they rotating the strike, or just blocking while waiting for the next ball to hit for 6?
- Average balls before the first boundary – How quickly is the batsman able to go in and start scoring runs at an effective rate
- Caught Rate – What percentage of the batsman's dismissals are caught. Interesting to see if this is high for the 6 hitters or the strike rotators.
- Bowled Rate – What percentage of the batsman's dismissals are bowled. I suspect this will be high for the 6 hitters.
- Other Dismissals Rate – The percentage of dismissals due to other reasons: LBW, run out, etc.
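To make the definitions above concrete, here is a minimal sketch of how the 11 KPIs could be computed with pandas. It is not necessarily the code behind this post. The column names (`match_id`, `inning`, `batsman`, `wide_runs`, `batsman_runs`, `player_dismissed`, `dismissal_kind`) and the dismissal labels are assumptions about how the ball-by-ball file is laid out, and `batting_kpis` is a hypothetical helper name. A few definitions are also my own choices: wides are not counted as balls faced, "caught and bowled" is counted as caught, and the balls-before-first-boundary figure only averages over innings that contain a boundary.

```python
import pandas as pd


def batting_kpis(balls: pd.DataFrame, batsman: str) -> dict:
    """Compute the 11 batting KPIs for one batsman (column names are assumed)."""
    # Balls faced: every delivery to this batsman except wides.
    faced = balls[(balls["batsman"] == batsman) & (balls["wide_runs"] == 0)]
    runs = faced["batsman_runs"]
    is_four, is_six = runs == 4, runs == 6
    is_boundary = is_four | is_six

    # A batsman can be run out at the non-striker's end, so search all rows.
    kinds = balls.loc[balls["player_dismissed"] == batsman, "dismissal_kind"]
    n_out, n_balls, total = len(kinds), len(faced), runs.sum()

    # One group per innings; rows are assumed to be in delivery order.
    innings = faced.groupby(["match_id", "inning"])["batsman_runs"]
    # Balls faced before the first boundary, for innings that contain one.
    before_first = [
        int(r.isin([4, 6]).to_numpy().argmax())
        for _, r in innings
        if r.isin([4, 6]).any()
    ]

    def pct(part, whole):
        return 100 * part / whole if whole else float("nan")

    caught = kinds.isin(["caught", "caught and bowled"]).sum()
    bowled = (kinds == "bowled").sum()
    return {
        "batting_average": total / n_out if n_out else float("nan"),
        "strike_rate": pct(total, n_balls),
        "avg_balls_per_innings": n_balls / innings.ngroups if n_balls else float("nan"),
        "six_rate": pct(is_six.sum(), n_balls),
        "four_rate": pct(is_four.sum(), n_balls),
        "boundary_percent": pct(4 * is_four.sum() + 6 * is_six.sum(), total),
        "non_boundary_strike_rate": pct(runs[~is_boundary].sum(), (~is_boundary).sum()),
        "avg_balls_before_first_boundary": (
            sum(before_first) / len(before_first) if before_first else float("nan")
        ),
        "caught_rate": pct(caught, n_out),
        "bowled_rate": pct(bowled, n_out),
        "other_dismissal_rate": pct(n_out - caught - bowled, n_out),
    }
```

Assuming the ball-by-ball file has been loaded into a DataFrame called `deliveries`, calling `batting_kpis(deliveries, "V Kohli")` would return one dictionary of KPIs, provided the name matches how the dataset spells it.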
Now I have identified 11 KPIs, which is probably a lot; however, it leaves scope for further work stripping them back and finding the key ones.
Above you can see a summary of each of the 11 KPIs. I can now use the distribution of each KPI to set up a visualisation that will be able to summarise these stats in one visual.
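One way to use the distributions is to convert each raw KPI into a percentile rank across all batsmen, so that very different scales (an average of 35, a 6 rate of 5%) can sit on the same axis. The sketch below shows that idea only; it is an assumption on my part about how the scaling could be done, and `kpi_percentiles` is a hypothetical helper that expects a table with one row per player and one column per KPI.

```python
import pandas as pd


def kpi_percentiles(all_players: pd.DataFrame, player: str) -> pd.Series:
    """Percentile rank (0-100) of one player's KPIs among all players.

    all_players: index = player name, columns = KPI values (assumed layout).
    """
    # rank(pct=True) ranks each column and divides by the number of players,
    # so the highest value in a column maps to 1.0.
    return (all_players.rank(pct=True) * 100).loc[player]
```

The resulting series has one value per KPI between 0 and 100, which could then be plotted on a radar or bar chart.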
I have run the code and calculated the KPIs for two players: Virat Kohli and Chris Gayle, two vastly experienced Twenty20 players.
So we can see these are both high-quality Twenty20 cricketers; however, the KPIs show they go about it in totally different ways. First, let's look at batting average, which is broadly similar for both players: Chris Gayle's is higher, but only slightly. Next, we have our first big difference: Gayle's strike rate is significantly higher. The next difference is non-boundary strike rate, where Virat Kohli's is significantly higher than Gayle's, which suggests Kohli is much better at rotating the strike when he is not scoring boundaries. Kohli also seems to stick around longer and generally plays longer innings. They both score their first boundary after a similar number of balls on average. Next, we have further evidence of the difference between the two players: Gayle scores most of his runs from boundaries, whereas Kohli is nearly a 50:50 split. When we move on to the 4 and 6 rates, the 4 rate is similar for both players. However, the 6 rate is significantly higher for Gayle. Their wicket types are pretty similar overall, despite the different methods of making runs.
This is just the first part of this work. I am going to look to develop this further so that during this year's IPL I can regularly update the visualisation. Further areas of work:
- Plot multiple players, different seasons or different competitions on the same plot to allow much easier comparisons
- Investigate the influence of the KPIs on the result, identifying whether certain ones are more important than others
- Enhance visualisation.
This will be developed further as outlined above; however, if you wish to review it yourself or just experiment, the code is on my GitHub.
Thanks for reading!