Hello, and welcome to this exploratory data analysis of the PUBG data from Kaggle. You can find the competition at the link below:
In this first part we are going to do some exploratory data analysis on the dataset, as you can't train machine learning models without understanding the data. I'm not going to describe the columns here as there are too many to mention. Check them out at the link above.
Let's first look at kills and how they influence winning. If you have read the column descriptions you will know there is one column (winPlacePerc) which identifies where each player finished. Let's see how this value changes according to the number of kills:
Generally, the more kills you get, the more likely you are to win the game. However, a large number of players get no kills at all, right across the spectrum of finishing positions.
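For anyone who wants to check this relationship numerically rather than by eye, here is a minimal sketch using pandas. It is not the code behind the chart above: the small DataFrame is made-up stand-in data, and the column names `kills` and `winPlacePerc` are assumed to match the competition file.

```python
import pandas as pd

# Made-up stand-in for the competition data (assumed column names).
df = pd.DataFrame({
    "kills": [0, 0, 1, 1, 2, 5],
    "winPlacePerc": [0.0, 0.5, 0.25, 0.75, 0.5, 1.0],
})

# Mean finishing percentile and number of players for each kill count.
by_kills = df.groupby("kills")["winPlacePerc"].agg(["mean", "count"])

# Share of zero-kill players who still finished in the top half.
zero_kill_players = df[df["kills"] == 0]
zero_top_half = (zero_kill_players["winPlacePerc"] >= 0.5).mean()

print(by_kills)
print(zero_top_half)
```

The `count` column is worth keeping next to the mean, because high kill counts are rare and their averages rest on very few players.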
The histogram shows how many kills the winners generally make. Despite PUBG being a game that can be played with up to 100 players, it looks like most players who win kill fewer than 10% of the people in the game. We have data for 4 different game modes. Does the game mode make any difference to the kills?
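One way to look at this is to filter down to the winners and summarise their kills per game mode. The sketch below makes two assumptions that are not stated in this post: that a winner is a row with `winPlacePerc` equal to 1, and that the game mode is stored in a column called `matchType`. The data is made up for illustration.

```python
import pandas as pd

# Made-up stand-in data; `matchType` is an assumed column name.
df = pd.DataFrame({
    "matchType": ["solo", "solo", "solo-fpp", "solo-fpp", "duo", "squad"],
    "kills": [2, 0, 5, 6, 3, 1],
    "winPlacePerc": [1.0, 0.2, 1.0, 1.0, 1.0, 0.4],
})

# Assumption: a winner is a player whose finishing percentile is exactly 1.
winners = df[df["winPlacePerc"] == 1.0]

# How many winners finished with each kill count (the data behind a histogram).
kill_counts = winners["kills"].value_counts().sort_index()

# Typical number of kills for winners in each game mode.
median_by_mode = winners.groupby("matchType")["kills"].median()

print(kill_counts)
print(median_by_mode)
```

The median is used here rather than the mean so that a handful of very high-kill games do not dominate the comparison.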
Generally, more kills are required to win in the solo-fpp game mode, with the highest density at around 5 kills, than in the other game modes. A game of PUBG has a maximum of 100 players, and I would think the number of players affects the number of kills, so let's see how the composition of the games varies:
As you can see, games often have fewer than 100 players, and the number of players in a game ranges widely, from around 20 to 100. Generally, though, most games have between 80 and 100 players.
More kills do seem to be required when there are more players in the game. However, for some game sizes the sample is pretty small, so I am going to be cautious about this conclusion.
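To make the sample-size caveat concrete, the sketch below counts the players in each game and then reports the winners' average kills alongside the number of winners behind each average. It assumes each row is one player, that games are identified by a `matchId` column, and that a winner has `winPlacePerc` equal to 1. The `playersInMatch` column is a helper introduced here, and the data is made up.

```python
import pandas as pd

# Made-up stand-in data: one row per player, `matchId` is an assumed column name.
df = pd.DataFrame({
    "matchId": ["a", "a", "a", "b", "b", "c", "c", "c"],
    "kills": [3, 0, 1, 2, 0, 4, 1, 0],
    "winPlacePerc": [1.0, 0.0, 0.5, 1.0, 0.0, 1.0, 0.5, 0.0],
})

# Number of rows (players) that share each match identifier.
df["playersInMatch"] = df.groupby("matchId")["matchId"].transform("count")

# Assumption: a winner is a player whose finishing percentile is exactly 1.
winners = df[df["winPlacePerc"] == 1.0]

# Mean winner kills per game size, with the sample size next to it.
summary = winners.groupby("playersInMatch")["kills"].agg(["mean", "count"])

print(summary)
```

Any game size with a small `count` should be read with caution, or grouped into wider bins before drawing conclusions.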
In the dataset, there are 4 related columns: kills, headshot kills, assists and damage dealt. Can we create a new variable from these? I have valued a headshot kill at 1000 points, a non-headshot kill at 750, an assist at 500 and each point of damage dealt at 1 point.
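The scoring can be written as a small function. This is a sketch of the weighting described above, with one assumption made explicit: the kills column is taken to include headshot kills, so the non-headshot kills are kills minus headshot kills. The function name `kill_points` is my own.

```python
def kill_points(kills, headshot_kills, assists, damage_dealt):
    """Combine the four kill-related statistics into a single score."""
    # Assumption: `kills` already includes headshot kills.
    non_headshot_kills = kills - headshot_kills
    return (
        1000 * headshot_kills       # headshot kill: 1000 points
        + 750 * non_headshot_kills  # any other kill: 750 points
        + 500 * assists             # assist: 500 points
        + damage_dealt              # 1 point per point of damage
    )


# Example: 3 kills (1 of them a headshot), 2 assists and 250 damage.
example = kill_points(kills=3, headshot_kills=1, assists=2, damage_dealt=250.0)
print(example)
```

Because the function only uses arithmetic, it should also accept whole pandas columns in place of single numbers, which gives one score per player.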
The graph looks pretty similar to the previous one. If anything, the difference between the average winners and non-winners has increased. This could mean the new variable will make the machine learning model more accurate. One surprising thing is that there looks to be one person who has won the game without any kill points. Unbelievable really, and obviously a rare occurrence.
That's it for this first EDA focusing on the kill statistics. In the coming weeks, we will look at other variables such as walk distance and kills made. Then, in the third blog post, I will create the model.