2.11

# Regression analysis exercise (Optional)

[Note: The exercise in this step requires Microsoft Excel. If you already know how to run a linear regression in a different program, you can skim the instructions and move on. If you do not know how to run a linear regression, it may be worth learning more about it from other online courses so that you can better appreciate the research findings presented in later weeks. If you do not have Microsoft Excel and cannot take part in the exercise, just keep in mind the main point of this step: when you have data on two variables (say, crime and unemployment rates), it is easy to visualize and quantify the relationship between them using a computer program. The exercise is optional.]

Let’s see how we can use Microsoft Excel to run a simple linear regression and obtain the best-fitting line for the equation below, where $\alpha_0$ is the intercept and $\alpha_1$ is the slope.

$\quad\quad\quad\quad\quad\quad$ $\text{larceny}=\alpha_0+\alpha_1 \times \text{unemployment}$

Open the attached data file. It contains unemployment and larceny rates for the 200 largest U.S. counties in the year 2000. The county FIPS code is a unique identifier for each U.S. county.

• Step 1) First, we want to visualize the relationship between unemployment and larceny. Draw a scatter plot using the built-in feature in Microsoft Excel. (Refer to the Microsoft Office Support page if you don’t know how.) Put the unemployment rate on the x-axis and the larceny rate on the y-axis. You should end up with a figure like the one below:
##### [Figure 1: Unemployment and Larceny, Large U.S. Counties in Year 2000]

• Step 2) From the graph, unemployment and larceny rates appear to be positively related, but we want to quantify this relationship. Again, we will use a built-in feature in Microsoft Excel, this time to draw a trend line and obtain the best-fitting line. Refer to the Microsoft Office Support page if necessary.

You should obtain the figure below and find that the best-fitting line for our data is:

$\quad\quad\quad\quad\quad$ $\text{larceny}=1778.1+15415 \times \text{unemployment}$

##### [Figure 2: Unemployment and Larceny, Large U.S. Counties in Year 2000]
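For readers who would like to see what the trend line is doing, here is a minimal sketch in Python of an ordinary least squares fit, which is the method Excel's linear trend line uses. The function name `fit_line` is made up for this illustration, and the data points are synthetic: they are not taken from the course data file but are generated to lie exactly on the line reported above, so the fit should simply recover the same two coefficients. The unemployment rate is assumed to be stored as a fraction (0.10 for 10%).

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a0 + a1 * x; returns (a0, a1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic illustration only: these are NOT the values in the course data file.
# Each point is placed exactly on the reported line, so the fit recovers it.
unemployment = [0.02, 0.04, 0.06, 0.08, 0.10]
larceny = [1778.1 + 15415 * u for u in unemployment]

a0, a1 = fit_line(unemployment, larceny)
print(f"larceny = {a0:.1f} + {a1:.0f} * unemployment")
```

With the real data file, the two lists would instead hold the unemployment and larceny columns, and the points would scatter around the line rather than sit on it.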

With our regression result (larceny = 1778.1 + 15415 × unemployment), we know that a 10 percentage point increase in the unemployment rate is associated with an increase in the larceny rate of 1541.5 (0.1 × 15415) in our data. In other words, if we compare two counties, one with a 10% unemployment rate and the other with a 20% unemployment rate, we would expect the latter to have 1541.5 more larcenies per 100,000 than the former.
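The comparison between the two counties can be checked with a few lines of Python. This is only a sketch of the arithmetic: the constant and function names are invented for the illustration, the coefficients are the ones reported above, and the unemployment rate is assumed to be entered as a fraction, as the 0.1 in the calculation above implies.

```python
# Coefficients of the fitted line reported in this step.
INTERCEPT = 1778.1
SLOPE = 15415.0


def predicted_larceny(unemployment_rate):
    """Fitted larceny rate for an unemployment rate given as a fraction (0.10 = 10%)."""
    return INTERCEPT + SLOPE * unemployment_rate


low = predicted_larceny(0.10)   # county with 10% unemployment
high = predicted_larceny(0.20)  # county with 20% unemployment
gap = high - low                # equals SLOPE * 0.10, up to rounding

print(f"Fitted value at 10% unemployment: {low:.1f}")
print(f"Fitted value at 20% unemployment: {high:.1f}")
print(f"Difference: {gap:.1f}")
```

The intercept cancels out when the two fitted values are subtracted, which is why only the slope matters for the comparison.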

However, this result should not be interpreted as the causal effect of unemployment on larceny. Places with high unemployment rates usually differ from places with low unemployment rates in many other ways, so we cannot attribute the difference in crime rates to unemployment alone.