
## Hanyang University

In the previous video, I said using panel data can help us improve our regression analyses. Let’s say I collect data on crime and inequality in cities in the United States, and regress crime rates on inequality. Linear regression will always give me the best-fitting line and I will get some value for α1, but I would not believe that α1 represents the causal effect of inequality on crime. One obvious problem is that there may be other factors that need to be included in the equation but were not included. As you can imagine, areas with high inequality can be very different from areas with low inequality in many other ways.

Demographics may be different, the quality of schools may be different, and the availability of jobs may be different between areas with high inequality and areas with low inequality. So when we observe that high-inequality areas have high crime rates, how can we be sure that it is high inequality that caused high crime rates? It is possible that differences in other factors contributed to the difference in crime rates. So when we run our linear regression, we usually regress our outcome variable on the main explanatory variable, as well as a set of other variables that we believe may be closely related to our outcome.
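To make the concern about left-out factors concrete, here is a minimal simulation sketch. It is not from the lecture: the variable `jobs`, the coefficients, and the sample size are all made-up assumptions, and the data are generated so that inequality has no causal effect on crime at all.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000  # hypothetical number of cities

# Hypothetical confounder (say, job availability) that drives both variables.
jobs = rng.normal(size=n)
inequality = -0.8 * jobs + rng.normal(size=n)
# By construction, inequality has NO causal effect on crime in this simulation.
crime = -1.0 * jobs + rng.normal(size=n)


def ols(y, *regressors):
    """Return OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


short = ols(crime, inequality)        # confounder left out
long = ols(crime, inequality, jobs)   # confounder included

print("slope on inequality, jobs omitted: ", round(short[1], 3))
print("slope on inequality, jobs included:", round(long[1], 3))
```

In this made-up setting, the regression without `jobs` reports a clearly positive slope on inequality even though the true effect is zero, while the regression that includes `jobs` reports a slope close to zero.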

For example, when trying to find out the effect of inequality on crime, I would regress crime rates on inequality and other factors. But the data we can find are often limited, and sometimes the data we want are not available at all. For example, suppose I want to regress crime rates on inequality using data from cities in the United States. I also want to include variables on demographics, socioeconomic conditions, and other key factors in the regression equation. But no matter how hard I try, some data I want to include may not be available.

For example, I may want to include data on illegal drug markets and gang membership, but such data are usually kept secret by drug dealers and gang members for obvious reasons, making the data very hard to obtain. And this lack of data can potentially damage the validity of my regression analysis. But if we have data on crime and inequality and other variables in American cities from multiple years, we can improve our analysis. Instead of comparing, let’s say, Boston, Chicago, and LA in 2000, we can compare Boston in 1990 with Boston in 2000, Chicago in 1990 with Chicago in 2000, and LA in 1990 with LA in 2000.

The main idea is to compare the variation within the same unit of observation over time, instead of making a comparison across different units at a single point in time. Just to recap, the main concern I raised was that there may be something special about Boston, Chicago, and LA that we cannot capture using available data. In that case, our regression analysis will not tell us the effect of inequality on crime. But assuming that these unobservable characteristics unique to each city stay constant between 1990 and 2000, we can eliminate this problem.

Consider this regression equation, which assumes that the rate of assault in city i in year t is a linear function of inequality, the shares of the youth population and the low-income population, and other characteristics that are unique to each city i but that we cannot observe. Let’s call these unobservable characteristics θ. If I have data on inequality and the shares of the youth and low-income populations in LA in 1990 and 2000, I can predict the rate of assault in LA in each year in the following way. Let’s take the difference between these two equations, and we will obtain this equation. There are two things to note about this equation.
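The equations shown on screen are not reproduced in this transcript. A reconstruction consistent with the spoken description might look like the following. Only α1 and θ are named in the talk; the intercept α0, the coefficients α2 and α3, the error term ε, and the variable names are assumptions made for illustration.

```latex
\begin{align*}
% Assumed general model for city i in year t
\text{Assault}_{it} &= \alpha_0 + \alpha_1\,\text{Inequality}_{it}
  + \alpha_2\,\text{Youth}_{it} + \alpha_3\,\text{LowIncome}_{it}
  + \theta_i + \varepsilon_{it} \\[6pt]
% LA in each of the two years
\text{Assault}_{LA,2000} &= \alpha_0 + \alpha_1\,\text{Inequality}_{LA,2000}
  + \alpha_2\,\text{Youth}_{LA,2000} + \alpha_3\,\text{LowIncome}_{LA,2000}
  + \theta_{LA} + \varepsilon_{LA,2000} \\
\text{Assault}_{LA,1990} &= \alpha_0 + \alpha_1\,\text{Inequality}_{LA,1990}
  + \alpha_2\,\text{Youth}_{LA,1990} + \alpha_3\,\text{LowIncome}_{LA,1990}
  + \theta_{LA} + \varepsilon_{LA,1990} \\[6pt]
% Subtracting the 1990 equation from the 2000 equation
\Delta\text{Assault}_{LA} &= \alpha_1\,\Delta\text{Inequality}_{LA}
  + \alpha_2\,\Delta\text{Youth}_{LA} + \alpha_3\,\Delta\text{LowIncome}_{LA}
  + \Delta\varepsilon_{LA}
\end{align*}
```

Here Δ denotes the change between 1990 and 2000. Because θ for LA enters both yearly equations with the same value, it cancels in the subtraction, and so does the assumed intercept α0.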

First, we are now comparing the change in the assault rate in LA between 1990 and 2000 with the changes in inequality, youth population, and low-income population in LA between 1990 and 2000. Second, the unobserved characteristic unique to LA, θLA, has now disappeared from the equation. This is great, because the problem we had earlier was that there may be some unique characteristics of each city that we cannot observe. But when we take this difference within the same city over time, we can eliminate θi from the equation and do not have to worry about it. Taking the within-unit difference can give us a very different picture. Let’s take a look at the figures.
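The differencing argument can be checked with a small simulation. This is a sketch under made-up assumptions, not the data used in the lecture: the number of cities, the true effect `true_alpha1`, and the way θ enters inequality and assault are all hypothetical. θ is held fixed across the two years, which is the key assumption of the method.

```python
import numpy as np

rng = np.random.default_rng(1)
n_cities = 2_000     # hypothetical number of cities
true_alpha1 = 0.5    # assumed true effect of inequality on the assault rate

# theta: unobserved city characteristic, constant over time by assumption.
theta = rng.normal(size=n_cities)


def simulate_year():
    """Draw one year of data; theta affects both inequality and assault."""
    inequality = 0.9 * theta + rng.normal(size=n_cities)
    assault = true_alpha1 * inequality + 2.0 * theta + rng.normal(size=n_cities)
    return inequality, assault


ineq_1990, assault_1990 = simulate_year()
ineq_2000, assault_2000 = simulate_year()


def slope(x, y):
    """OLS slope of y on x, with an intercept."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


# Across-city comparison in a single year: theta is left in the error term.
cross_section = slope(ineq_2000, assault_2000)
# Within-city comparison: theta cancels when we difference the two years.
first_diff = slope(ineq_2000 - ineq_1990, assault_2000 - assault_1990)

print("cross-section slope:   ", round(cross_section, 3))
print("first-difference slope:", round(first_diff, 3))
```

In this simulated setting the single-year slope is far above the assumed effect of 0.5, while the slope from the differenced data lands close to it. If θ changed over time, the differencing would no longer remove it.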

This figure shows us the relationship between the assault rate and the Gini coefficient in the 200 largest U.S. counties in 2000. I did not draw the best-fitting line, but it is clear that the relationship is positive here. In places with high inequality, we would expect assault rates to be high as well. However, when we look at the difference within each county, the picture looks very different. Here, the x-axis represents the change in the Gini coefficient within each county, and the y-axis represents the change in the assault rate within each county between 1990 and 2000. If higher inequality really causes more assault, we would expect areas that experience bigger increases in inequality to have bigger increases in the assault rate.

But that’s not what we see here. The change in the inequality level does not seem to be closely related to the change in the assault rate, and our prediction that higher inequality causes more crime does not seem to be supported by this picture.

# Panel data and fixed effect regression

Looking at a within-unit comparison instead of an across-unit comparison may be a better way to study a causal relationship of interest.

For example, if we compare the size of police forces across different cities, we would observe that cities with larger police forces also have higher crime rates. But we would not interpret this as more police causing more crime.

What do you think you will see if you compare the size of police forces and crime rates within each city over time? Would you be more comfortable interpreting the relationship as the causal effect of police on crime?