Spatial Statistics: Hot spot analysis

Hot spot analysis is a great tool for pinpointing where clustering and dispersion occur in our data. It is especially helpful when we are dealing with many incidents, such as crime data collected over time, where so many points overlap one another that it is difficult to see exactly where the “hot” and “cold” spots are. It is also useful for temporal analysis, helping us detect seasonal shifts in where incidents occur.

For this workshop, we will be using crime data downloaded from the Los Angeles Data Portal.

For the purposes of the workshop, the data has been cleaned up, divided into separate layers per year, and converted into a geodatabase. Download the following class data to your local drive.

  • Class geodatabase
    • la_neighborhoods
    • la_crime_2010
    • la_crime_2011
    • la_crime_2012
    • la_crime_2013
    • la_crime_2014
    • la_crime_2015
    • la_crime_2016
    • la_crime_2017

Is your data projected?

Geoprocessing should always be conducted with projected data, because distance-based tools like the ones in this workshop need distances in linear units (meters or feet) rather than degrees. If your data is not projected, i.e., it is in a geographic coordinate system (with coordinates in decimal degrees), make sure to project it first. The data in this tutorial was originally downloaded from the LA Open Data Portal, with crime incidents recorded as latitude and longitude in decimal degrees. It was then projected to UTM Zone 11N, a commonly preferred projection for data in Los Angeles. If you are using your own data, make sure to project it to a coordinate system suited to the region it covers.
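As a quick sanity check that zone 11 is the right UTM zone for Los Angeles: UTM zones are 6° bands of longitude, numbered 1 to 60 eastward from 180°W. The short Python sketch below computes the zone for a longitude and latitude. The function name utm_zone and the sample coordinates are illustrative assumptions, and the sketch ignores the irregular zones around Norway and Svalbard. It only identifies the zone; the actual projecting is still done in ArcGIS.

```python
import math


def utm_zone(lon_deg: float, lat_deg: float) -> str:
    """Return a UTM zone label such as '11N' for a WGS84 longitude/latitude.

    The trailing letter is the hemisphere (N or S), as in coordinate system
    names like 'UTM Zone 11N'; it is not an MGRS latitude band letter.
    Special-case zones around Norway and Svalbard are ignored.
    """
    if not -180.0 <= lon_deg <= 180.0:
        raise ValueError("longitude must be in [-180, 180]")
    # Zones are 6 degrees wide, numbered from 1 starting at 180 degrees west.
    zone = int(math.floor((lon_deg + 180.0) / 6.0)) + 1
    zone = min(zone, 60)  # longitude exactly 180 would otherwise give 61
    hemisphere = "N" if lat_deg >= 0 else "S"
    return f"{zone}{hemisphere}"


# Approximate location of downtown Los Angeles (illustrative values).
print(utm_zone(-118.24, 34.05))
```

For downtown Los Angeles this returns 11N, matching the projection used for the workshop data.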

Setting up your project

Over the course of this workshop, you will be creating many new data layers. It is good practice to set a default geodatabase in which to store them.

  1. Go to File, Map Document Properties…
  2. Change the default geodatabase by finding the path to the workshop geodatabase you just downloaded.
  3. Also click on the checkbox next to Store relative pathnames to data sources

We will also be performing various geoprocessing tasks. To make it easy to interpret each result as soon as a tool finishes, let’s disable background processing.

  1. Click Geoprocessing in the menu bar, and go to Geoprocessing Options
  2. Make sure that Background Processing is unchecked

Is there clustering?

In order to begin hot spot analysis, we must first determine whether, statistically speaking, there is evidence of clustering in our data. One way to do so is to run our data through the Spatial Autocorrelation (Global Moran’s I) tool. This tool helps us determine whether our data is randomly distributed. In other words, how likely is it that the pattern we see arose purely by chance? Or are similar incidents located closer to one another than chance alone would predict? And what might explain this clustering? Let’s find out whether the data we will use in this class (crime data in Los Angeles) shows evidence of clustering or dispersion.
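To make the statistic less of a black box, here is a minimal Python sketch of the Global Moran’s I index. Moran’s I looks at feature locations and a value attached to each feature together: it compares each feature’s deviation from the mean with its neighbors’ deviations, using a spatial weights matrix to say who neighbors whom. The sketch assumes simple binary weights for a hypothetical row of features (helper line_weights); the ArcGIS tool offers other conceptualizations of spatial relationships, such as inverse distance, and also reports a z-score and p-value, which this sketch does not compute.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and an n x n weights matrix.

    weights[i][j] is the spatial weight between features i and j
    (0 if they are not neighbors; the diagonal should be 0).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]  # deviation of each value from the mean
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    if s0 == 0:
        raise ValueError("weights matrix has no neighbors")
    # Sum of w_ij * dev_i * dev_j over all pairs of features.
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("all values are identical")
    return (n / s0) * (cross / denom)


def line_weights(n):
    """Hypothetical binary weights: n features in a row, each touching the next."""
    w = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = w[i + 1][i] = 1
    return w


w = line_weights(4)
print(morans_i([1, 2, 3, 4], w))  # similar values sit next to each other
print(morans_i([1, 4, 1, 4], w))  # high and low values alternate
print(-1 / (4 - 1))               # expected I if values were arranged randomly
```

Values that increase steadily along the row give a positive I (about 0.33, suggesting clustering), alternating high and low values give a negative I (about -1, suggesting dispersion), and values close to the expected index of -1/(n - 1) are consistent with a random pattern.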

Choose a neighborhood

We could perform our hot spot analysis on the entire dataset, but two reasons prevent us from doing so. First, the data is very big (hundreds of thousands of records), and running large-scale statistical analysis on this amount of data would be very time consuming. Second, the scale is too large: across the whole city, we would not see much variation at the local level. For the purposes of this tutorial, we will work at the neighborhood level, which allows us to see hot spots within individual neighborhoods. Let’s begin by loading the following layers into our map:

  • lacity_neighborhoods
  • la_crime_2010

Once the layers are loaded, turn off the crime layer for faster drawing and better visibility (it’s a huge dataset, so only turn it on when necessary). Next, select a neighborhood to analyze. For example, to choose Downtown:

  1. Click on the select tool
  2. Click on the Downtown polygon on the map

Select all the crime incidents that occurred within the downtown boundaries.

  1. Go to Selection, Select by Location, and enter the following information:
  2. Turn on the crime layer. You should see the crime incidents inside the downtown polygon highlighted as selected. In the table of contents, right-click on la_crime_2010 and choose Data, Export Data:
  3. Export the data into your geodatabase, naming it downtown_crime_2010 (the name used in the hot spot steps below):
  4. Click “yes” to add the data to the map
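For points and a single polygon, Select by Location with a “within” relationship comes down to a point-in-polygon test. The sketch below uses the classic even-odd ray-casting technique in plain Python to illustrate the idea; it is not how ArcGIS implements the tool. The square “downtown” polygon, the incident coordinates, and the function names are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices; the ring is closed implicitly.
    Points lying exactly on an edge may be classified either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be hit.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # the ray toward +x crosses this edge
                inside = not inside
    return inside


def select_within(points, polygon):
    """Keep only the points that fall inside the polygon."""
    return [p for p in points if point_in_polygon(p[0], p[1], polygon)]


# Hypothetical projected coordinates in meters.
downtown = [(0, 0), (1000, 0), (1000, 1000), (0, 1000)]
incidents = [(500, 500), (1500, 200), (999, 10)]
print(select_within(incidents, downtown))
```

Here the first and third incidents are kept and the second, which lies east of the square, is dropped.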

Count overlapping incidents

The LAPD records the location of each arrest at the intersection closest to where it occurred. As a result, many incidents that happen near one another are stacked on top of one another and appear as a single point. To count how many incidents overlap at each location, let’s run the Collect Events tool.

  1. ArcToolbox, Spatial Statistics Tools, Utilities, Collect Events
  2. Enter the following information:
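Conceptually, Collect Events groups incidents that share exactly the same coordinates and records how many fell at each unique location; ArcGIS stores that count in a field named ICOUNT. The plain-Python sketch below shows the same idea with made-up coordinates; the function name collect_events is hypothetical.

```python
from collections import Counter


def collect_events(points):
    """Collapse coincident points into unique locations with a count.

    Returns (x, y, count) tuples, one per unique location, in the order each
    location first appears. Coordinates must match exactly to be merged.
    """
    counts = Counter(points)  # Counter keeps first-insertion order
    return [(x, y, n) for (x, y), n in counts.items()]


incidents = [(100, 200), (100, 200), (350, 75), (100, 200), (350, 75), (10, 10)]
print(collect_events(incidents))
```

Six incidents collapse into three weighted points with counts 3, 2 and 1, which is the kind of per-location value that Spatial Autocorrelation can then analyze.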

Determine whether there is clustering

  1. Run the Spatial Autocorrelation tool on the selected geography (in this case, downtown Los Angeles)
    1. ArcToolbox, Spatial Statistics Tools, Analyzing Patterns, Spatial Autocorrelation
    2. Enter the following information:
    3. To see your results, go to Geoprocessing, Results

    4. Then expand Current Session, Spatial Autocorrelation, and double-click on the Report File
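The report gives Moran’s Index together with a z-score and a p-value. The p-value is the probability of seeing a z-score at least this extreme if the values were randomly distributed; a small p-value with a positive z-score points to clustering, and with a negative z-score to dispersion. Assuming a standard normal distribution, the conversion from a z-score to a two-tailed p-value can be sketched in Python as follows (the function name is illustrative):

```python
import math


def two_tailed_p(z):
    """Two-tailed p-value for a z-score under a standard normal distribution."""
    # P(|Z| >= |z|) = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


for z in (0.5, 1.65, 1.96, 2.58, 5.0):
    print(f"z = {z:>4}: p = {two_tailed_p(z):.4f}")
```

A z-score of about ±1.96 corresponds to p ≈ 0.05 (95% confidence), and about ±2.58 to p ≈ 0.01 (99% confidence).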

What do the results tell us?

Hot Spot Analysis

Now that we have determined that there is, indeed, statistically significant spatial clustering in our data, let’s find out where the hot spots and cold spots are. Hot spots are areas where high values cluster together more than we would expect by chance; cold spots are areas where low values cluster together. This is determined by looking at each feature within the context of its neighboring features. In other words, a single point with a high value isn’t necessarily a hot spot. It becomes a hot spot only when its neighbors also have high values.
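Optimized Hot Spot Analysis is built on the Getis-Ord Gi* statistic. For each feature, Gi* compares the weighted sum of values in its neighborhood (including the feature itself) with what we would expect if the values were spread randomly, and expresses the difference as a z-score. The sketch below implements the standard Gi* formula in plain Python for a hypothetical row of five grid cells with simple binary neighbor weights (helper line_weights_with_self); the tool chooses its own neighborhood distance, so these numbers are illustrative only.

```python
import math


def getis_ord_gi_star(values, weights):
    """Return the Getis-Ord Gi* z-score for every feature.

    values[j] is the value of feature j (e.g. the incident count in a cell).
    weights[i][j] is the spatial weight between features i and j; Gi* counts
    each feature as its own neighbor, so weights[i][i] should be nonzero.
    """
    n = len(values)
    mean = sum(values) / n
    # Global standard deviation S; max() guards against tiny negative rounding.
    s = math.sqrt(max(sum(v * v for v in values) / n - mean * mean, 0.0))
    if s == 0:
        raise ValueError("all values are identical")
    z_scores = []
    for i in range(n):
        w = weights[i]
        sum_w = sum(w)
        sum_w2 = sum(wij * wij for wij in w)
        # Weighted sum of the values in feature i's neighborhood.
        local_sum = sum(wij * v for wij, v in zip(w, values))
        numerator = local_sum - mean * sum_w
        spread = (n * sum_w2 - sum_w ** 2) / (n - 1)
        if spread <= 0:
            raise ValueError(f"feature {i}: neighborhood covers every feature")
        z_scores.append(numerator / (s * math.sqrt(spread)))
    return z_scores


def line_weights_with_self(n):
    """Hypothetical binary weights: cells in a row, each neighboring itself
    and the cells directly to its left and right."""
    w = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            if 0 <= j < n:
                w[i][j] = 1
    return w


counts = [10, 10, 10, 1, 1]  # hypothetical incident counts per cell
for count, z in zip(counts, getis_ord_gi_star(counts, line_weights_with_self(5))):
    print(count, round(z, 2))
```

The high-count cells get positive z-scores and the low-count cells negative ones. Note that the third cell has the same count (10) as the first two but a much smaller z-score, because its neighborhood includes a low-count cell: a high value alone does not make a hot spot.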

Let’s run the hot spot analysis tool on our downtown crime data. In ArcToolbox, under Spatial Statistics Tools, expand Mapping Clusters, and double-click on Optimized Hot Spot Analysis.

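In the pop-up window, select downtown_crime_2010 as the input features, and make sure that COUNT_INCIDENTS_WITHIN_FISHNET_POLYGONS is selected as the incident data aggregation method. This overlays a fishnet grid on the data and counts the crime incidents that fall in each cell, creating grid cells wherever incidents are present.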
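The fishnet aggregation can be pictured as dropping each incident into a square cell of a grid and counting how many land in each cell. The sketch below assumes a hypothetical grid origin and a 100 m cell size; Optimized Hot Spot Analysis chooses its own cell size and extent, so treat this as an illustration of the idea rather than the tool’s algorithm. Each count plays the role of the JOIN_COUNT value in the output.

```python
import math
from collections import Counter


def fishnet_counts(points, origin_x, origin_y, cell_size):
    """Count incidents per square grid cell.

    Returns a Counter mapping (col, row) cell indices to incident counts;
    cells that contain no incidents simply do not appear.
    """
    counts = Counter()
    for x, y in points:
        # floor() keeps points left of or below the origin in negative cells.
        col = math.floor((x - origin_x) / cell_size)
        row = math.floor((y - origin_y) / cell_size)
        counts[(col, row)] += 1
    return counts


# Hypothetical projected coordinates (meters) and a 100 m cell size.
incidents = [(12, 40), (80, 95), (150, 20), (160, 30), (170, 310)]
print(fishnet_counts(incidents, origin_x=0, origin_y=0, cell_size=100))
```

The five incidents fall into three cells: two in cell (0, 0), two in cell (1, 0), and one in cell (1, 3).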

Nice! We have now converted our overlapping incidents into color-coded grid cells.

Notice the legend for our results.

Also open the attribute table: right-click on downtown_crime_2010_HotSpot, and choose Open Attribute Table

Each row in the table represents one of the cells displayed on the map.

  • JOIN_COUNT tells us how many incidents fall within a cell
  • GiZScore gives the Gi* z-score (large positive values mean the cell and its neighbors have higher counts than expected given the dataset as a whole; large negative values mean lower counts)
  • GiPValue gives us the p-value associated with that z-score
  • NNeighbors is the number of neighbors (here, grid cells) included in the calculation for that cell
  • Gi_Bin gives us a number associated with the confidence level displayed on the map
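The Gi_Bin field groups cells by confidence level: ±3 for 99% confidence, ±2 for 95%, ±1 for 90%, and 0 for not statistically significant, with the sign following the z-score (positive for hot spots, negative for cold spots). The sketch below is a simplified version of that binning using plain p-value thresholds; the function name is hypothetical, and the tool may also adjust for multiple testing (a False Discovery Rate correction), which this sketch ignores.

```python
def gi_bin(z, p):
    """Map a Gi* z-score and p-value to a confidence bin from -3 to 3.

    +/-3: 99% confidence, +/-2: 95%, +/-1: 90%, 0: not significant.
    The sign follows the z-score (positive = hot spot, negative = cold spot).
    """
    if p < 0.01:
        level = 3
    elif p < 0.05:
        level = 2
    elif p < 0.10:
        level = 1
    else:
        return 0
    return level if z > 0 else -level


for z, p in [(2.7, 0.007), (-2.0, 0.045), (1.7, 0.09), (1.0, 0.32)]:
    print(f"z = {z:5.2f}, p = {p:.3f} -> Gi_Bin {gi_bin(z, p)}")
```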

Finally, let’s label the grid cells with their JOIN_COUNT values to get an idea of why certain areas are hot and others are cold.

  1. Right-click on downtown_crime_2010_HotSpot, click on Properties, and click on the Labels tab.
  2. Check the box to label the features, and choose JOIN_COUNT for the label field

What do these numbers tell us?

Resources

Data Sources