PUBG – My First Kaggle Entry

Hello, and welcome to today’s blog, where I develop my first model to enter a Kaggle competition. We are entering the PUBG competition, so if you haven’t checked out my two previous exploratory data analysis posts, go and have a look at them first.

The idea here is that you have the whole dataset and have to predict winPlacePerc, which is essentially where the player finished in the game. Submissions are scored using Mean Absolute Error (MAE). The data is almost entirely numerical, apart from one column that identifies the game type, so we are going to start with a linear model.


The initial code imports the data and removes some variables that my EDA showed were not too key to predicting; dropping them should also make the model run quicker.
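A minimal sketch of that step in R (the file name and the exact columns dropped are illustrative rather than exact):

```r
library(readr)
library(dplyr)

# Load the competition training data
pubg <- read.csv("train_V2.csv", stringsAsFactors = FALSE)

# Drop variables the EDA suggested were not key to predicting
# (the exact columns removed here are an assumption)
pubg <- pubg %>%
  select(-Id, -groupId, -matchId, -killPoints, -rankPoints, -winPoints)
```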

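Next I need to create the training and testing datasets. A sketch of the split using caret’s createDataPartition (the 80/20 ratio and the seed are assumptions):

```r
library(caret)

set.seed(123)

# Hold out 20% of rows for testing (the split ratio is an assumption)
in_train <- createDataPartition(pubg$winPlacePerc, p = 0.8, list = FALSE)
train <- pubg[in_train, ]
test  <- pubg[-in_train, ]
```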

The code used to fit the initial linear model is shown below, along with a summary of the model.

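A minimal version of the fit (using every remaining column as a predictor is an assumption):

```r
# Fit a linear model with all remaining variables as predictors
lm_fit <- lm(winPlacePerc ~ ., data = train)
summary(lm_fit)

# Mean Absolute Error on the held-out test set
lm_pred <- predict(lm_fit, newdata = test)
mean(abs(lm_pred - test$winPlacePerc))
```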


This model has an MAE of 0.096, which, looking at the current leaderboard, wouldn’t get me very high at all. We are therefore going to need a more complicated model. I am going to use the caret package to build a random forest model; let’s see how that performs.

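Roughly what the caret call looks like with mtry fixed at 15 (the trainControl settings are assumptions):

```r
# Random forest via caret with mtry = 15 and no resampling yet
rf_fit <- train(
  winPlacePerc ~ .,
  data = train,
  method = "rf",
  tuneGrid = data.frame(mtry = 15),
  trControl = trainControl(method = "none")
)

# MAE on the held-out test set
rf_pred <- predict(rf_fit, newdata = test)
MAE(rf_pred, test$winPlacePerc)
```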

The starting code for the random forest model is outlined above, along with the resulting MAE. I am using an mtry of 15 in this first model. As you can see, the random forest offers a significant improvement in MAE over the linear model when predicting a player’s finishing position. This would get me a few places higher on the leaderboard, but not super high. We can use cross-validation on the training dataset to get a better fit.

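A sketch of the cross-validated fit; only “repeated twice” is stated, so the fold count here is an assumption:

```r
# Repeated k-fold cross-validation: 10 folds, repeated twice
# (the number of folds is an assumption)
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 2)

rf_cv_fit <- train(
  winPlacePerc ~ .,
  data = train,
  method = "rf",
  tuneGrid = data.frame(mtry = 15),
  trControl = ctrl
)

rf_cv_pred <- predict(rf_cv_fit, newdata = test)
MAE(rf_cv_pred, test$winPlacePerc)
```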


So I used cross-validation repeated twice; however, it hasn’t seemed to improve the model at all. This looks to be the best this method can achieve. I therefore submitted this model and ended up (at the time of writing) 463rd out of 591 entries, so it isn’t great.

Things I can look at to improve the result:

  • Use a different model: XGBoost is the model that has been used to win the most Kaggle competitions (a rough starting point is sketched after this list)
  • Experiment with the parameters of the model, such as the amount of cross-validation
  • Use feature engineering to develop new variables in the dataset in order to predict better
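For the first point, an untested sketch of what an XGBoost attempt via caret might look like; every tuning value here is a placeholder assumption, not a tested setting:

```r
# XGBoost via caret (untested sketch; all tuning values are placeholders)
xgb_grid <- expand.grid(
  nrounds = 100,
  max_depth = 6,
  eta = 0.3,
  gamma = 0,
  colsample_bytree = 1,
  min_child_weight = 1,
  subsample = 1
)

xgb_fit <- train(
  winPlacePerc ~ .,
  data = train,
  method = "xgbTree",
  tuneGrid = xgb_grid,
  trControl = trainControl(method = "cv", number = 5)
)

MAE(predict(xgb_fit, newdata = test), test$winPlacePerc)
```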

Overall I’m happy with my first attempt, and there are still two months of the competition left, so I will keep developing the model and hopefully move further up the leaderboard.
