Panel data and fixed effect regression exercise (Optional)
In this exercise, we will use data on crime rates and economic conditions in large U.S. counties in 1990 and 2000 to further investigate the relationship between crime and economic conditions.
Previously, we used data on crime and economic conditions from the year 2000 (“cross-sectional data”), but it turns out that we can improve our empirical analysis by using data on the same units of observation from multiple points in time (“panel data”).
Step 1: Open the attached data file.
Step 2: As in the previous exercise, we will first use Excel’s Data Analysis feature to run the following linear regression using data from 2000 only (cross-sectional data).
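Written in the same notation as the Step 3 equations, and assuming the same coefficient names \(α_0\), \(α_1\), and \(α_2\) (the names are our assumption; the variables are those defined in the next paragraph), the cross-sectional regression can be sketched as:
\(assault_{i,2000} = α_0 + α_1 poverty_{i,2000} + α_2 inequality_{i,2000}\)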
Here, \(assault_{i,2000}\) represents the aggravated assault rate (per 100,000) of county \(i\) in 2000, \(poverty_{i,2000}\) represents the poverty rate in county \(i\) in 2000, and \(inequality_{i,2000}\) represents the Gini coefficient of county \(i\) in 2000. Please refer to Step 2-13 if you have forgotten how to run a linear regression in Excel.
After running your linear regression, you should obtain the following coefficients.
The regression coefficients tell us that the aggravated assault rate is positively correlated with both the poverty rate and the level of inequality. If two counties have the same poverty rate, but one has a Gini coefficient of 0.2 and the other a Gini coefficient of 0.1, we expect the assault rate to be higher in the former by 74.2 per 100,000.
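As a quick check of the arithmetic: because the model is linear, the 74.2 figure is the estimated inequality coefficient multiplied by the 0.1 gap in Gini coefficients. This implies an inequality coefficient of roughly 742, written here as \(\hat{α}_2\) (the symbol and the rounded value are our shorthand, inferred from the example rather than read from the regression output):
\(\hat{α}_2 × (0.2 - 0.1) ≈ 742 × 0.1 = 74.2\)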
Step 3: We will now use data from both 1990 and 2000 and include a county fixed effect in our regression, as below:
\(assault_{i,1990} = α_0 + α_1 poverty_{i,1990} + α_2 inequality_{i,1990} + θ_i\)
\(assault_{i,2000} = α_0 + α_1 poverty_{i,2000} + α_2 inequality_{i,2000} + θ_i\)
\(θ_i\) is the county fixed effect, which represents time-invariant characteristics unique to county \(i\) that are not observable to researchers but are relevant to the county's crime rate. For example, we know there are important crime-relevant differences between Los Angeles County and New York County that cannot be explained by their differences in poverty rates and inequality levels alone. We can always try to collect more data, but some difference between the two counties will always remain that observable data cannot explain. Including county fixed effects accounts for this unobservable difference between counties.
But how can we draw the best-fitting lines for the two equations above when the county fixed effect \((θ_i)\) is unobservable? It turns out that we do not actually have to compute \(θ_i\) if we have multiple observations from the same counties. For example, if we have data on county \(i\) from 1990 and 2000, we can simply subtract the first equation from the second and focus on the within-county difference between 1990 and 2000, as below.
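Carrying out that subtraction term by term, \(α_0\) and \(θ_i\) appear in both equations and cancel, leaving:
\(assault_{i,2000} - assault_{i,1990} = α_1 (poverty_{i,2000} - poverty_{i,1990}) + α_2 (inequality_{i,2000} - inequality_{i,1990})\)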
Note how we eliminated the county fixed effect from the equation by taking the within-unit difference between 1990 and 2000. Now, to find the best-fitting \(α_1\) and \(α_2\) in the equation above, we should first create three new columns that correspond to the changes in the assault rate, the poverty rate, and the level of inequality within each county between 1990 and 2000. After regressing the change in assault rates on the changes in poverty rates and inequality levels, you should obtain the following result.
(Note: When you take the difference between the 1990 and 2000 data, the constant \(α_0\) is no longer present in the equation. Thus, when running the regression, we have to tell Excel that \(α_0\) must be equal to zero. We can do this by checking the “Constant is Zero” box when setting the X and Y ranges.)
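For readers who want to check the logic outside Excel, here is a minimal Python sketch of the same two regressions. It does not use the course data file: the counties, the "true" coefficients, and the link between the fixed effect and inequality are all hypothetical, chosen only to show how an unobserved \(θ_i\) can distort the cross-sectional estimate while the differenced regression (with no constant) recovers \(α_1\) and \(α_2\).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # hypothetical number of counties

# Hypothetical "true" coefficients, chosen only for illustration.
alpha0, alpha1, alpha2 = 50.0, 10.0, -300.0

# Unobserved, time-invariant county fixed effect (theta_i).
theta = rng.normal(0.0, 100.0, n)


def simulate_year():
    """Simulate one year of data for all counties (synthetic, noise-free)."""
    poverty = rng.uniform(5.0, 25.0, n)
    # Assumption: counties with a larger fixed effect also tend to be more unequal.
    inequality = 0.4 + 0.0005 * theta + rng.normal(0.0, 0.02, n)
    assault = alpha0 + alpha1 * poverty + alpha2 * inequality + theta
    return assault, poverty, inequality


a90, p90, g90 = simulate_year()
a00, p00, g00 = simulate_year()

# Cross-sectional regression on 2000 data only, with a constant term.
X_cs = np.column_stack([np.ones(n), p00, g00])
coef_cs, *_ = np.linalg.lstsq(X_cs, a00, rcond=None)

# First-difference regression: changes on changes, no constant
# (the counterpart of checking "Constant is Zero" in Excel).
X_fd = np.column_stack([p00 - p90, g00 - g90])
coef_fd, *_ = np.linalg.lstsq(X_fd, a00 - a90, rcond=None)

print("cross-section (constant, poverty, inequality):", coef_cs)
print("first difference (poverty, inequality):", coef_fd)
```

In this synthetic setup the cross-sectional inequality coefficient comes out positive even though the true \(α_2\) is negative, because inequality is correlated with the omitted \(θ_i\). The differenced regression is unaffected because \(θ_i\) cancels. This illustrates the mechanism only; it says nothing about the size of the effect in the actual county data.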
The regression coefficients now tell a different story about how the aggravated assault rate is related to poverty and inequality. Aggravated assault is still positively associated with poverty, so an increase in a county's poverty rate over time is likely to lead to a higher rate of aggravated assault. On the other hand, the relationship between the aggravated assault rate and inequality is now negative, and we would expect places with rising economic inequality to have a lower rate of aggravated assault.
This result may seem counter-intuitive, but in my research paper, I argue that this may be a more accurate description of the relationship between inequality and crime. In the U.S., we observe a lot more crimes taking place in poverty-concentrated neighborhoods (where the level of economic inequality is low) and fewer crimes in more mixed-income neighborhoods (where the level of economic inequality is high). Thus, we may expect that crime would decrease when poverty-concentrated neighborhoods attract more affluent residents and become more economically “unequal”.
But when looking at data at the national or state level, we may still find that crime and inequality are positively related. For example, if high inequality at the national level leads to more poverty concentration in a few disadvantaged neighborhoods, the overall crime rate may increase. Conversely, if high inequality at the national level leads to a larger number of mixed-income neighborhoods, the overall crime rate may decline.
- Kang, Songman. “Inequality and Crime Revisited: Effects of Local Inequality and Economic Segregation on Crime.” Journal of Population Economics 29.2 (2016): 593-626.
© Songman Kang, Hanyang University