Data Analyst and Lover of Baseball and Beer
By Doug Duffy | 22/02/2016
This short “How To” section explains in plain terms how the data for the Guns, Crime and Suicide Series was acquired, organized and plotted. I will NOT walk through the project's code here, but for those interested the GitHub repository is here and should be fully reproducible.
Overall, this project was straightforward: obtain the desired data from multiple sources, clean it and merge it together. At the state level, crime data came from the FBI’s Uniform Crime Reporting Statistics database, age-adjusted suicide rates came from the CDC and gun ownership rates came from an article in the British Medical Journal. Both the crime and suicide rates are expressed per 100,000 residents, while the gun ownership rates are expressed as a percentage of the population.
For the country-level data, the UN Office on Drugs and Crime published reports on Global Homicide Rates in 2013 and on Firearms Ownership in 2015. Suicide rates were obtained from the World Health Organization (WHO) and, like the US data, are age-adjusted and for the year 2012. The data from the various sources was simply copied and pasted into Excel to prepare a file to be read into R.
Once the data was read into R, only minimal cleaning was needed to match the country names listed in the various sources. Most of these changes were something like “Côte d’Ivoire” to “Ivory Coast” or “Dem. Rep. Congo” to “Democratic Republic of the Congo”, but I also found out there’s a country called Timor-Leste. Ranks for each state or country statistic were also added to provide additional context to the numbers.
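For readers who think better in code, the name matching and ranking steps can be sketched in a few lines. This is only an illustration in Python, not the R code used for the project: the alias table holds just the two renames mentioned above, and the function names (canonical_name, merge_sources, add_ranks) and the choice that rank 1 means the highest value are assumptions made for the example.

```python
# Hypothetical alias table: spelling used by a source -> spelling used everywhere else.
NAME_ALIASES = {
    "Côte d’Ivoire": "Ivory Coast",
    "Dem. Rep. Congo": "Democratic Republic of the Congo",
}


def canonical_name(name):
    """Return the agreed spelling of a country name; unknown names pass through."""
    cleaned = name.strip()
    return NAME_ALIASES.get(cleaned, cleaned)


def merge_sources(*sources):
    """Merge several {name: {statistic: value}} tables on the canonical name."""
    merged = {}
    for source in sources:
        for name, stats in source.items():
            merged.setdefault(canonical_name(name), {}).update(stats)
    return merged


def add_ranks(table, stat):
    """Add '<stat>_rank' to every row that has the statistic.

    Assumption: rank 1 is the highest value, and tied values share a rank.
    """
    ordered = sorted((row[stat] for row in table.values() if stat in row), reverse=True)
    for row in table.values():
        if stat in row:
            row[stat + "_rank"] = ordered.index(row[stat]) + 1
    return table
```

Calling merge_sources on one table per source and then add_ranks once per statistic gives one row per place, with each number sitting next to its rank.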
Similar to the MLB Birth Map, these USA and Country Maps were produced mainly through the use of two software packages in R, leaflet and shiny. Shiny is a web application framework for R that makes all types of graphs and maps interactive; that is, they respond to user inputs. All drop-down menus, slider bars, buttons and check boxes are created by Shiny. In short, Shiny is the backbone on which all of the interactives on this site have been built.
Plotting the shaded map, called a choropleth, is simple with Leaflet. The somewhat challenging part was adding the data for each state or country to the existing JSON file. These JSON files are simply a listing of all the latitude and longitude points required to outline a state or country, and may also contain other information, such as the place name. Although manipulating JSON files in R is somewhat strange, the task itself is simple: we must add whatever statistics we have in R as properties of the JSON object by matching the state or country names. A hover effect was added to highlight whichever state or country is hovered over, and a customizable HTML box displays the statistics and ranks related to that place.
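The matching step described above can be sketched as follows. This is a Python illustration rather than the project's R code, and it assumes the file follows the GeoJSON FeatureCollection layout with the place name stored under a "name" property. The two places, their coordinates and the statistic value are placeholders, and attach_stats is a hypothetical helper name.

```python
import json


def attach_stats(geojson, stats, name_key="name"):
    """Copy each place's statistics into the matching feature's properties.

    Returns the names of features for which no statistics were found, so
    that mismatched spellings can be spotted and fixed.
    """
    unmatched = []
    for feature in geojson["features"]:
        props = feature.get("properties") or {}
        feature["properties"] = props
        name = props.get(name_key)
        if name in stats:
            props.update(stats[name])
        else:
            unmatched.append(name)
    return unmatched


# A tiny hand-written FeatureCollection with placeholder outlines.
geojson = json.loads("""
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {"name": "Alpha"},
   "geometry": {"type": "Polygon",
                "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}},
  {"type": "Feature", "properties": {"name": "Beta"},
   "geometry": {"type": "Polygon",
                "coordinates": [[[2, 2], [3, 2], [3, 3], [2, 2]]]}}
]}
""")

# Placeholder statistics keyed by place name (not real data).
stats = {"Alpha": {"gun_ownership": 10.0, "gun_ownership_rank": 1}}
missing = attach_stats(geojson, stats)
```

Returning the unmatched names is a design choice for the sketch: any place left in that list is usually a spelling mismatch between the map file and the data, which is exactly the kind of cleaning described earlier.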
Making the Graphs
Once the data has been cleaned and organized, making the interactive graphs is fairly trivial. This was done with the ggvis R package, again in conjunction with Shiny. The color of the points is determined by that place’s region, and both the size and y-axis variables are left as user inputs. Checkboxes allow the user to add or remove point labels and a trendline accompanied by a standard error region. For the country-level data, there is additionally an option to subset to the various groups of nations.
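To give a sense of what a trendline with a standard error region involves, here is a minimal Python sketch. It assumes an ordinary least-squares straight line, which may not be the model ggvis fitted for these graphs, and fit_trendline is a hypothetical helper name. The band is the standard error of the fitted line at a given x, s * sqrt(1/n + (x - mean_x)^2 / Sxx), where s is the residual standard deviation.

```python
import math


def fit_trendline(xs, ys):
    """Fit y = intercept + slope * x by least squares.

    Returns (slope, intercept, predict), where predict(x) gives the fitted
    value and the standard error of the fitted line at x.
    """
    n = len(xs)
    if n < 3 or n != len(ys):
        raise ValueError("need at least three (x, y) pairs of equal length")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residual standard deviation with n - 2 degrees of freedom.
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sigma = math.sqrt(rss / (n - 2))

    def predict(x):
        fitted = intercept + slope * x
        std_err = sigma * math.sqrt(1 / n + (x - x_mean) ** 2 / sxx)
        return fitted, std_err

    return slope, intercept, predict
```

Evaluating predict over a grid of x values and drawing the fitted value plus or minus a multiple of the standard error produces the shaded band around the line. The band is narrowest near the mean of x and widens toward the edges of the data.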