5.10

## Hanyang University

Until now, we have talked about a number of empirical studies that explored potential determinants of crime. Their findings usually come from some sort of regression analysis, such as fixed effects linear regression, difference-in-differences, or regression discontinuity analysis. In this video, I want to make a few comments about the strengths and limitations of these regression analyses. The first point is that, given the availability of statistical software and high-quality data these days, running a regression analysis is actually pretty easy. Suppose I want to learn about the relationship between education and crime.

One thing I can do is to collect data on city-level education levels and crime rates across different years and different cities and run a linear regression using software like Stata. Unless your data contain some unusual problem, you will almost certainly find a regression coefficient that tells you how education and crime are correlated. In other words, your regression analysis will almost always give you a number that tells you how the two variables of interest are related. The second point is that, although your regression analysis will almost always give you the correlation between the two variables, it usually does not give you the causal relationship between them.

In other words, your regression will almost always give you correlation, but correlation does not imply causation. For example, when you regress crime on education levels across different cities and obtain a negative coefficient, it simply means that cities with low education tend to have more crime than cities with high education. But you should not take this as evidence that less education causes more crime. As we discussed before, the main problem is that cities with high and low education levels may be different in many other aspects as well, and you cannot simply attribute the difference in their crime rates to the difference in their education levels.
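To make this concrete, here is a minimal simulation sketch in Python. Everything in it is an assumption made up for illustration: the variable names (`jobs`, `education`, `crime`), the coefficients, and the noise levels are hypothetical, not estimates from any study. In this simulated world, education has no effect on crime at all. A third city characteristic, labeled `jobs` here, raises education and lowers crime. The simple regression of crime on education still returns a clearly negative slope, and the slope moves toward zero once `jobs` is held fixed.

```python
import random

def ols_slope(x, y):
    """Slope from a simple linear regression of y on x (with an intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residualize(y, z):
    """Return the part of y that is not linearly explained by z."""
    b = ols_slope(z, y)
    mz = sum(z) / len(z)
    my = sum(y) / len(y)
    return [yi - my - b * (zi - mz) for yi, zi in zip(y, z)]

def simulate_cities(n=5000, seed=0):
    """Hypothetical cities: jobs drive both education and crime."""
    rng = random.Random(seed)
    jobs = [rng.gauss(0, 1) for _ in range(n)]
    education = [0.8 * j + rng.gauss(0, 0.6) for j in jobs]
    # Assumption: crime depends on jobs only; education has NO causal effect.
    crime = [-1.0 * j + rng.gauss(0, 1) for j in jobs]
    return jobs, education, crime

jobs, education, crime = simulate_cities()

# Naive regression: crime on education only.
naive = ols_slope(education, crime)

# Holding jobs fixed: regress the residual of crime on the residual of
# education, which equals the education coefficient in a regression of
# crime on both education and jobs.
controlled = ols_slope(residualize(education, jobs), residualize(crime, jobs))

print("naive slope:", round(naive, 3))
print("slope holding jobs fixed:", round(controlled, 3))
```

Under these assumed parameters the naive slope should land near -0.8 even though the true effect of education is zero. In real data the other relevant city characteristics are often unobserved, so they cannot simply be controlled for like this.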

For example, cities with high education levels may have more high-paying jobs than cities with low education levels. In this case, it may be that the difference in crime rates between the two cities is not caused by the difference in their education levels, but is actually caused by the difference in the availability of high-paying jobs between the two cities. My last point is a bit more optimistic. If we carefully design our empirical study, we may be able to recover a causal relationship from the regression analysis. For example, suppose I run a simple experiment in which I recruit many subjects and randomly divide them into two groups.

I will offer the first group an opportunity to attend a high-quality job training program, and I will not offer anything to the second group. And I will compare the offending rates after a few years. In this case, when I regress the offending rates on the group assignment status, the regression coefficient should reflect a causal effect of the job training program on crime. And that’s because I am pretty sure there was very little difference between the two groups in the beginning except their group assignment status, and the difference in their crime rates most likely comes from the fact that only one group could attend the training program and the other could not.
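The experiment described here can be sketched as a small Python simulation. The sample size, the baseline offending probability, the unobserved `risk` term, and the assumed effect of the offer (`true_effect = -0.10`) are all hypothetical numbers chosen for illustration. The point is only that, when assignment is decided by a coin flip, regressing the offending indicator on the assignment indicator gives the difference in offending rates between the two groups, and that difference should be close to the effect built into the simulation.

```python
import random

def ols_slope(x, y):
    """Slope from a simple linear regression of y on x (with an intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def run_experiment(n=20000, true_effect=-0.10, seed=1):
    """Hypothetical experiment: a random half is offered job training."""
    rng = random.Random(seed)
    treated, offended = [], []
    for _ in range(n):
        risk = rng.uniform(-0.5, 0.5)        # unobserved individual risk
        t = 1 if rng.random() < 0.5 else 0   # coin flip, unrelated to risk
        p = 0.30 + 0.20 * risk + true_effect * t  # assumed offending probability
        treated.append(t)
        offended.append(1 if rng.random() < p else 0)
    return treated, offended

treated, offended = run_experiment()

# Regression of the offending indicator on the assignment indicator.
slope = ols_slope(treated, offended)

# Plain comparison of offending rates between the two groups.
n_treated = sum(treated)
n_control = len(treated) - n_treated
rate_treated = sum(o for t, o in zip(treated, offended) if t == 1) / n_treated
rate_control = sum(o for t, o in zip(treated, offended) if t == 0) / n_control
diff_in_rates = rate_treated - rate_control

print("regression coefficient:", round(slope, 4))
print("difference in offending rates:", round(diff_in_rates, 4))
```

With a binary regressor the regression coefficient and the difference in group means are the same quantity, so the two printed numbers should agree up to rounding. Note that the sketch measures the effect of being offered the program, not of actually attending it.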

Do we have to have a randomized experiment for the regression analysis to have a causal interpretation? The answer is no. For example, when we run a regression to compare the offending rates between individuals who are just above 18 and individuals who are just below 18, we can plausibly take the regression result as the causal effect of facing much more severe punishment on the criminal behavior of young adults. And that’s because individuals who are just above age 18 and individuals who are just below 18 should be highly comparable in most aspects, except that one group is subject to a more lenient juvenile court system and the other is not.

Here, the age cutoff for legal minority and majority creates a variation between otherwise comparable individuals, and this is what allows our regression analysis to identify a causal effect. And we call this an identifying variation. Whether we run a simple linear regression, difference-in-differences, regression discontinuity, or some other type of regression analysis, the strength of the research design often hinges upon the strength of the identifying variation used.
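The age-18 comparison can be sketched in Python as a simple regression discontinuity estimate. This is an illustration under made-up assumptions, not the method of any particular study: the simulated ages, the outcome `y` (a hypothetical offending index), the size of the jump at the cutoff (`jump = -0.05`), and the bandwidth of half a year are all invented. The sketch fits one straight line to observations just below 18 and another to observations just above 18, and takes the gap between the two lines at exactly 18 as the estimate.

```python
import random

def fit_line(x, y):
    """Return (intercept, slope) from a simple linear regression of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def rd_estimate(age, y, cutoff=18.0, bandwidth=0.5):
    """Gap at the cutoff between separate linear fits on each side."""
    below = [(a - cutoff, v) for a, v in zip(age, y)
             if cutoff - bandwidth <= a < cutoff]
    above = [(a - cutoff, v) for a, v in zip(age, y)
             if cutoff <= a <= cutoff + bandwidth]
    # Age is centered at the cutoff, so each intercept is the fitted
    # outcome at exactly age 18 approached from that side.
    below_at_cutoff, _ = fit_line(*zip(*below))
    above_at_cutoff, _ = fit_line(*zip(*above))
    return above_at_cutoff - below_at_cutoff

def simulate_offending(n=20000, jump=-0.05, seed=2):
    """Hypothetical data: offending varies smoothly with age, then jumps at 18."""
    rng = random.Random(seed)
    age = [rng.uniform(17, 19) for _ in range(n)]
    y = [0.30 + 0.02 * (a - 18) + (jump if a >= 18 else 0.0) + rng.gauss(0, 0.1)
         for a in age]
    return age, y

age, y = simulate_offending()
estimate = rd_estimate(age, y)
print("estimated jump in offending at age 18:", round(estimate, 4))
```

The estimate should come out close to the assumed jump because, in this simulated world, nothing else changes abruptly at age 18. That is the identifying variation at work. If something else also changed at the cutoff, the same calculation would pick up both changes together.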

# When does regression analysis work?

When applied to a suitable empirical setting, regression analysis (simple linear regression, fixed effects, difference-in-differences, and regression discontinuity) can provide researchers with valuable information on the causal relationship of interest.