Hello, and welcome to today’s blog, going through this week’s Tidy Tuesday data set. Find the data set here:
It’s all to do with bird collisions in Chicago. What I want to do today is go through it as I would any data set I have never seen before. This is going to be the Part Time Analyst, unedited.
First things first, reading in the data:
Here I load the tidyverse package. I don’t do anything in R without it, and about 90% of tasks can be done with it. I then look at the structure of the data set so I understand what it contains.
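As a rough sketch of that structure check, here is the same idea on a tiny made-up tibble. The column names and values are placeholders I have assumed for illustration (the real file has 8 columns), and I load dplyr on its own rather than the whole tidyverse to keep the example small.

```r
library(dplyr)

# Hypothetical stand-in for the collisions data, NOT the real file.
birds <- tibble(
  genus   = c("Melospiza", "Melospiza", "Zonotrichia"),
  species = c("melodia", "georgiana", "albicollis"),
  date    = c("1978-05-17", "2001-10-02", "2010-04-21"),
  family  = c("Passerellidae", "Passerellidae", "Passerellidae")
)

# glimpse() prints one line per column: its name, type and first few values.
glimpse(birds)

# dim() gives the number of observations (rows) and variables (columns).
dim_birds <- dim(birds)
```

On the real data the same two calls are where the counts of variables and observations come from.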
There are 8 variables and 69965 observations. Let’s first have a look at how many genera and species make up the data set.
Two things are apparent. First, I have made my first mistake: using col instead of fill to colour the columns (col only colours the outline of each bar). Second, there are a lot of different genera, definitely more than should be mapped to a colour aesthetic. I try to limit myself to 8 different colours, 12 at the maximum; beyond that, colours are hard to distinguish. However, the graphs do clearly show that some genera/species are a lot more common than others. First things first, let’s correct the graphs. I’m also going to filter down to species with more than 100 recordings.
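A minimal sketch of that correction, on toy data rather than the real collisions file. The genus and species values and their counts are invented for illustration, and I am assuming one row per recorded collision.

```r
library(dplyr)
library(ggplot2)

# Toy data (made up): one row per recorded collision.
birds <- tibble(
  genus   = rep(c("Melospiza", "Zonotrichia", "Rare"), times = c(250, 150, 5)),
  species = c(rep(c("melodia", "georgiana"), times = c(130, 120)),
              rep("albicollis", 150),
              rep("rarus", 5))
)

common <- birds %>%
  count(genus, species, name = "n") %>%  # recordings per genus/species pair
  filter(n > 100)                        # keep species with more than 100 recordings

# fill colours the inside of each bar; colour (col) only colours its outline.
p <- ggplot(common, aes(x = genus, y = n, fill = species)) +
  geom_col()
```

Printing p draws one stacked bar per genus, split by species, with the rare species filtered out.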
That’s much better. The colours show clearly now. We can see that Melospiza, the largest genus, is made up of a number of different species, but the second largest genus seems to be made up of just one. There’s also the bird family column, so let’s look at how that breaks down.
Well over half the data set is one particular family. I’m no bird expert, so I won’t go any deeper into that. A more interesting question might be how the number of bird collisions has changed over time.
I have used the separate function from tidyr to split the date column into 3 columns: year, month and day. I then grouped by year and totalled the collisions for each year.
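The split-and-total step can be sketched like this. I am assuming the dates are stored as YYYY-MM-DD text and that each row is one collision, so the yearly total is just a row count; the dates themselves are made up.

```r
library(dplyr)
library(tidyr)

# Toy dates (made up), assumed to be in YYYY-MM-DD format.
birds <- tibble(
  date = c("1978-05-17", "1978-09-30", "2001-10-02", "2001-10-03", "2001-04-21")
)

by_year <- birds %>%
  separate(date, into = c("year", "month", "day"), sep = "-") %>%  # one column per date part
  group_by(year) %>%
  summarise(collisions = n())  # each row is one collision, so count rows

by_year
```

Note that separate() returns the new columns as character by default, so year is text at this point.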
That’s not quite what I had in mind. What has happened is that each year is being treated as a category rather than a continuous variable (separate returns character columns by default). So I need to go back and code the year column as numeric, and then hopefully it will be plotted as a continuous variable.
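Here is a small sketch of that fix on made-up dates, again assuming YYYY-MM-DD text. An alternative I believe does the same job is passing convert = TRUE to separate(), which converts the new columns to suitable types for you.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Toy dates (made up), assumed to be in YYYY-MM-DD format.
birds <- tibble(
  date = c("1978-05-17", "1978-09-30", "2001-10-02", "2001-10-03", "2001-04-21")
)

by_year <- birds %>%
  separate(date, into = c("year", "month", "day"), sep = "-") %>%
  mutate(year = as.numeric(year)) %>%  # character -> numeric, so ggplot uses a continuous axis
  count(year, name = "collisions")

p <- ggplot(by_year, aes(x = year, y = collisions)) +
  geom_line()
```

With year numeric, the x axis is a continuous scale and gaps between years are drawn to scale.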
There we go: there are clearly more bird strikes as the years have gone on. I think this is a fairly poor visualisation for showing that, though. I wanted to try something different to a bar graph, so maybe if I try this:
OK, that is a little better, and you can clearly see the later years (the lighter colours). So the problem appears to be getting worse; however, the bird population could also be getting bigger, in which case you would expect more strikes. Next question: is there a particular time of year when the strikes happen?
As before, I have used separate to split the date column, and then unite to create a single column with just the month and day, so I can get the totals for each day of the year.
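A minimal sketch of the separate-then-unite step, using invented dates and assuming the YYYY-MM-DD format. The column name month_day is my own choice for illustration.

```r
library(dplyr)
library(tidyr)

# Toy dates (made up), assumed to be in YYYY-MM-DD format.
birds <- tibble(
  date = c("1978-05-17", "1999-05-17", "2001-10-02")
)

by_day <- birds %>%
  separate(date, into = c("year", "month", "day"), sep = "-") %>%
  unite("month_day", month, day, sep = "-") %>%  # glue month and day back together
  count(month_day, name = "collisions")          # total across all years

by_day
```

Because the year is dropped from the key, collisions on the same calendar day in different years are added together.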
The plot, however, isn’t very good: you can’t read the x axis. Sometimes you hit a dead end and find areas where you need to improve. This is one of them for me; I need to go back and review how I can make this plot better, which is one of the reasons I do this. There is also the stratum column, which denotes what area the bird occupies. Let’s see how the number of strikes has changed for each stratum over the years.
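One way to set that up, sketched on toy data. The stratum labels "Lower" and "Upper" and the dates are placeholders I have assumed, and convert = TRUE is used so that year comes out as a number straight away.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Toy data (made up): stratum labels are assumed placeholders.
birds <- tibble(
  date    = c("1979-05-01", "1979-05-02", "2005-10-01", "2005-10-02", "2005-10-03"),
  stratum = c("Lower", "Upper", "Upper", "Upper", "Lower")
)

by_stratum <- birds %>%
  separate(date, into = c("year", "month", "day"), sep = "-", convert = TRUE) %>%
  count(year, stratum, name = "collisions")  # strikes per year within each stratum

# One line per stratum, so the trends can be compared directly.
p <- ggplot(by_stratum, aes(x = year, y = collisions, colour = stratum)) +
  geom_line()
```

Here colour is the right aesthetic, because lines have no interior to fill.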
For birds that occupy both strata, strikes have increased since 1979. Birds that mostly live in the canopies of trees, the upper stratum, saw a big increase after 2000.
That’s it for this overview of my workflow when working through a new data set. It’s a lot rawer than I would normally publish, but hopefully you find it useful.