Thanks for the comments on my Walk Score model! Per a few reader requests, here are the full results. I should have thought to provide them initially but didn’t realize there would be interest. Also, I don’t know a good way to put STATA or Excel charts here, so apologies for the screenshots.
Here are the results from the OLS model. The 259 data points represent all cities with population greater than 100,000 for which there is Walk Score data, except for two or three for which I couldn’t find the MSA data. Unemployment is the 5-year January moving average as of 2010.
And here are the results of the IV regression, where the instrument is the year that the city was founded. First stage:
And the second stage:
awp says
Shouldn’t you leave in all your controls for the 2SLS regression?
I am actually quite shocked that the relationship between year founded and walkscore is negative. I would have expected Urban Form in older cities to be less car based and thus more walkable.
I would have also expected year founded to be a rather weak/noisy instrument. Year the MSA (using current boundaries) reached some percentage of maximum population would more readily explain the current urban form of a city. City of Houston was founded in 1836 but didn’t really exist until after the Galveston storm/Ship Channel and most of the current built form came after oil in Beaumont/Air Conditioning.
Run it again leaving out New York (as it is a pretty extreme outlier in the U.S.). That is the first response to any cross-city regression from the urban economists I have dealt with, and it often includes dropping L.A. and Chicago as well.
Emily Washington says
The negative relationship does mean that older cities are more walkable on average since an earlier year predicts a higher Walk Score. The year a city reached a certain level of population would definitely be a better instrument. Do you know if that dataset exists?
My understanding of using IV is that only the independent variable of interest and the instrumental variable are used. Is that not right?
Finally, dropping NY, LA, and Chicago does change the results. The coefficient on Walk Score stays about .03 but the F-stat drops to 8.1.
awp says
Sorry, I had age stuck in my mind.
I don’t know if the dataset exists, but I have seen similar ones used. McMillen and Smith, “Number of Subcenters in Large Urban Areas,” use the year when the central city reaches 25% of its 1990 population. Using the central city might be easier than trying to calculate it from the county datasets, but it gets back to the problem of changing boundaries and the fact that political boundaries are pretty meaningless in this type of analysis. It still seems like quite a bit of work, especially with all the MSAs you’re using. McMillen and Smith only look at ~60 urban areas.
You want to include all explanatory variables in your regression. I can’t remember my econometrics class well enough to explain why and will have to appeal to the authority of my hazy memory. The Stata syntax is
ivregress estimator depvar [varlist1] (varlist2 = varlist_iv) [if] [in] [weight] [, options]
which I think should look something like
ivregress 2sls housevalue unemp income (walkscore = year), first
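For anyone following along without Stata, here is a minimal sketch of what "leave the controls in" means mechanically. The first stage regresses the endogenous variable on the instrument plus every control, and the second stage regresses the outcome on the fitted values plus the same controls. This is plain Python on synthetic data, not the Walk Score data: the variable names, the coefficients (including the "true" effect of 2.0), and the sample size are all made up for illustration, and the helper functions are hypothetical, not part of any library.

```python
import random


def ols(columns, y):
    """Least-squares coefficients of y on the given columns.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    The caller must pass the constant term as one of the columns.
    """
    k = len(columns)
    # Augmented matrix [X'X | X'y]
    a = [
        [sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
        + [sum(p * q for p, q in zip(columns[i], y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting for numerical stability
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def two_stage_least_squares(y, endog, instrument, controls):
    """Point estimate of the coefficient on `endog`, controls in both stages."""
    n = len(y)
    exog = [[1.0] * n] + controls  # constant plus all exogenous controls
    # First stage: endogenous variable on the instrument AND the controls
    first_cols = [instrument] + exog
    gamma = ols(first_cols, endog)
    fitted = [
        sum(g * col[i] for g, col in zip(gamma, first_cols)) for i in range(n)
    ]
    # Second stage: outcome on the fitted values and the same controls
    return ols([fitted] + exog, y)[0]


def simulate(n=5000, seed=0):
    """Synthetic data with an unobserved confounder u (illustrative only)."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]  # instrument
    c = [rng.gauss(0, 1) for _ in range(n)]  # observed control
    u = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder
    x = [zi + 0.5 * ci + ui + rng.gauss(0, 1) for zi, ci, ui in zip(z, c, u)]
    y = [
        2.0 * xi + ci + 2.0 * ui + rng.gauss(0, 1)
        for xi, ci, ui in zip(x, c, u)
    ]
    return y, x, z, c


if __name__ == "__main__":
    y, x, z, c = simulate()
    naive = ols([x, [1.0] * len(y), c], y)[0]
    iv = two_stage_least_squares(y, x, z, [c])
    print("OLS estimate: ", round(naive, 3))
    print("2SLS estimate:", round(iv, 3))
```

The sketch only returns point estimates. Standard errors taken from a hand-run second stage are not the correct 2SLS standard errors, which is one reason to let ivregress do both stages in one call.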