On Mon, 2 Jun 2014, Nick Pretnar wrote:

Hello,

I am having a great amount of difficulty running a simple linear regression 
model with entity and time fixed effects and HAC standard errors. I have a data 
set with 3 million observations and 30 variables. My data is structured as 
follows:

NAME    STATE   YEAR    Y       X1      X2
1       1       2012    1       1       1
2       1       2012    1       2       7
3       1       2012    1       1       2
4       2       2012    2       4       5

etc. ... For every state in every year, there are about 10,000 row vectors corresponding to individual observations. This is not a longitudinal dataset: an individual surveyed in year 2000 in state 1 is never spoken to again. Nonetheless, I still wish to control for geographical and time fixed effects. To do so, I run the following:

If you don't have longitudinal or time series data, then I wonder why you want to consider HAC standard errors (which explicitly try to adjust for autocorrelation). I guess that it would be more natural to simply use clustered standard errors, some of which are also robust against certain types of autocorrelation.
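As a rough illustration of the clustered alternative, here is a minimal sketch on simulated data. It assumes a recent version of the "sandwich" package that provides vcovCL(), and it assumes clustering by state is what is wanted; the data frame `d` and its values are made up for the example and are not the poster's data.

```r
library(sandwich)
library(lmtest)

# Simulated stand-in data (hypothetical): repeated cross-sections,
# many individuals per state and year, no individual observed twice.
set.seed(1)
n <- 2000
d <- data.frame(
  STATE = sample(1:20, n, replace = TRUE),
  YEAR  = sample(2010:2012, n, replace = TRUE),
  X1    = rnorm(n),
  X2    = rnorm(n)
)
# Outcome with a state-level shock shared by all rows in a state
d$Y <- 1 + 0.5 * d$X1 - 0.3 * d$X2 + rnorm(20)[d$STATE] + rnorm(n)

# OLS with state and year dummies; passing data = d keeps the formula short
fit <- lm(Y ~ X1 + X2 + factor(STATE) + factor(YEAR), data = d)

# Cluster-robust covariance matrix, clustered by state (an assumption)
vc <- vcovCL(fit, cluster = ~ STATE)

# Coefficient tests for the regressors of interest using that covariance
coeftest(fit, vcov. = vc)[c("X1", "X2"), ]
```

Unlike a HAC estimator, this does not need any ordering of the observations, only a grouping variable.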

I would recommend that you use the "plm" package for your panel data. If you want to employ OLS estimation, you can use plm(..., model = "pooling") and add certain id or time effects. plm also has a number of vcov* functions for certain robust covariances: vcovBK, vcovHC, vcovSCC. See the corresponding manual pages for more details.
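A minimal sketch of that suggestion on simulated data might look as follows. The data frame `d` is made up, and two choices here are assumptions rather than part of the advice above: only STATE is passed as the index (so plm generates a time index within each state, since individuals are not followed over time), and the covariance is clustered by that group.

```r
library(sandwich)
library(plm)

# Simulated stand-in data (hypothetical), many rows per state and year
set.seed(1)
n <- 2000
d <- data.frame(
  STATE = sample(1:20, n, replace = TRUE),
  YEAR  = sample(2010:2012, n, replace = TRUE),
  X1    = rnorm(n),
  X2    = rnorm(n)
)
d$Y <- 1 + 0.5 * d$X1 - 0.3 * d$X2 + rnorm(20)[d$STATE] + rnorm(n)

# Pooled OLS with state and year dummies added by hand.
# index = "STATE" names only the grouping variable (assumption, see above).
fit_plm <- plm(Y ~ X1 + X2 + factor(STATE) + factor(YEAR),
               data = d, index = "STATE", model = "pooling")

# Robust covariance from plm's vcovHC method, clustered by group (state)
vc_plm <- vcovHC(fit_plm, method = "arellano", type = "HC0",
                 cluster = "group")

# Robust standard errors for the regressors of interest
sqrt(diag(vc_plm))[c("X1", "X2")]
```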

hth,
Z

load("data.frame.rda")
library(sandwich)
library(pcse)
model <- lm(data.frame$Y ~ data.frame$X1 + data.frame$X2 +
            as.factor(data.frame$state) + as.factor(data.frame$year))
vcovHAC(model, prewhite = FALSE, adjust = FALSE, sandwich = TRUE,
        ar.method = "ols")

R will not return any results, yet acts as if it is computing the results. This 
goes on for 4 hours or more.

I wanted to run the following:

library(pcse)
model <- lm(data.frame$Y ~ data.frame$X1 + data.frame$X2 +
            as.factor(data.frame$state) + as.factor(data.frame$year))
model.pcse <- pcse(model, groupN = data.frame$state, groupT = data.frame$year)

But I get the error:
Error in pcse(model, groupN = BRFSS_OBESEBALANCED$X_STATE, groupT = BRFSS_OBESEBALANCED$YEAR) :
 There cannot be more than nCS*nTS rows in the using data!
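
The message suggests that pcse() found more rows than (number of cross-sectional units) x (number of time periods). A small check along these lines, using the four example rows shown earlier, illustrates the condition; the object names are made up for the sketch.

```r
# The four example rows from the data layout above
d <- data.frame(
  STATE = c(1, 1, 1, 2),
  YEAR  = c(2012, 2012, 2012, 2012)
)

# Number of distinct units and time periods
n_units <- length(unique(d$STATE))
n_times <- length(unique(d$YEAR))

# Rows per state-year cell; any count above 1 means repeated cells
cells <- table(d$STATE, d$YEAR)

# TRUE when there are more rows than unit-by-time cells
too_many_rows <- nrow(d) > n_units * n_times
too_many_rows
max(cells)
```

With roughly 10,000 individuals per state and year, this condition holds by a wide margin.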

If there are any workarounds for this problem, I would greatly appreciate 
learning about them.

Thanks,

Nicholas Pretnar
University of Missouri, Economics
npret...@gmail.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

