You could get lucky here, but strictly speaking, this list is about R programming and statistical issues are typically off topic Someone might respond privately, though.
Cheers, Bert On Thu, Oct 6, 2022 at 4:24 AM Valerio Leone Sciabolazza < sciabola...@gmail.com> wrote: > Good morning, > I am trying to use R to estimate a fixed effects model (i.e., a panel > regression model controlling for unobserved time-invariant > heterogeneities across agents) using different estimation approaches > (e.g. replicating xtreg from Stata, see e.g. > > https://www.stata.com/support/faqs/statistics/intercept-in-fixed-effects-model/ > ). > I have already asked this question on different stacks exchange forums > and contacted package creators who dealt with this issue before, but I > wasn't able to obtain an answer to my doubts. > I hope to have better luck on this list. > > Let me introduce the problem, and note that I am using an unbalanced panel. > > The easiest way to estimate my fixed effect model is using the function lm. > > Example: > > # load packages > library(dplyr) > # set seed for replication purposes > set.seed(123) > # create toy dataset > x <- rnorm(4000) > x2 <- rnorm(length(x)) > id <- factor(sample(500,length(x),replace=TRUE)) > firm <- data.frame(id = id) %>% > group_by(id) %>% > mutate(firm = 1:n()) %>% > pull(firm) > id.eff <- rlnorm(nlevels(id)) > firm.eff <- rexp(length(unique(firm))) > y <- x + 0.25*x2 + id.eff[id] + firm.eff[firm] + rnorm(length(x)) > db = data.frame(y = y, x = x, id = id, firm = firm) > rm <- db %>% group_by(id) %>% summarise(firm = max(firm)) %>% > filter(firm == 1) %>% pull(id) > db = db[-which(db$id %in% rm), ] > # Run regression > test <- lm(y ~ x + id, data = db) > > Another approach is demeaning the variables included into the model > specification. > In this way, one can exclude the fixed effects from the model. Of > course, point estimates will be correct, while standard errors will be > not (because we are not accounting for the degrees of freedom used in > the demeaning). > > # demean data > dbm <- as_tibble(db) %>% > group_by(id) %>% > mutate(y = y - mean(y), > x = x - mean(x)) %>% > ungroup() > # run regression > test2 <- lm(y ~ x, data = dbm) > # compare results > summary(test)$coefficients[2,1] > > 0.9753364 > summary(test2)$coefficients[2,1] > > 0.9753364 > > Another way to do this is to demean the variables and add their grand > average (I believe that this is what xtreg from Stata does) > > # create data > n = length(unique(db$id)) > dbh <- dbm %>% > mutate(yh = y + (sum(db$y)/n), > xh = x + (sum(db$x)/n)) > # run regression > test3 <- lm(yh ~ xh, dbh) > # compare results > summary(test)$coefficients[2,1] > > 0.9753364 > summary(test2)$coefficients[2,1] > > 0.9753364 > summary(test3)$coefficients[2,1] > > 0.9753364 > > As one can see, the three approaches report the same point estimates > (again, standard errors will be different instead). > > When I include an additional set of fixed effects in the model > specification, the three approaches no longer return the same point > estimate. However, differences seem to be negligible and they could be > due to rounding. > > db$firm <- as.factor(db$firm) > dbm$firm <- as.factor(dbm$firm) > dbh$firm <- as.factor(dbh$firm) > testB <- lm(y ~ x + id + firm, data = db) > testB2 <- lm(y ~ x + firm, data = dbm) > testB3 <- lm(yh ~ xh + firm, data = dbh) > summary(testB)$coefficients[2,1] > > 0.9834414 > summary(testB2)$coefficients[2,1] > > 0.9833334 > summary(testB3)$coefficients[2,1] > > 0.9833334 > > A similar behavior occurs if I use a dummy variable rather than a > continous one. For the only purpose of the example, I show this by > transforming my target variable x from a continuous to a dummy > variable. > > # create data > x3 <- ifelse(db$x > 0, 1, 0) > db <- db %>% mutate(x3 = x3) > dbm <- dbm %>% > mutate(x3 = x3) %>% > group_by(id) %>% > mutate(x3 = x3 - mean(x3)) %>% > ungroup() > dbh <- dbh %>% mutate(x3 = dbm$x3) %>% > mutate(x3 = x3 + (sum(db$x3)/n)) %>% > ungroup() > # Run regressions > testC <- lm(y ~ x3 + id + firm, data = db) > testC2 <- lm(y ~ x3 + firm, data = dbm) > testC3 <- lm(yh ~ x3 + firm, data = dbh) > summary(testC)$coefficients[2, 1] > > 1.579883 > summary(testC2)$coefficients[2, 1] > > 1.579159 > summary(testC3)$coefficients[2, 1] > > 1.579159 > > Now, I want to estimate both the impact of x when this is higher than > 0 (i.e., x3) and when this is lower or equal to zero (call it x4). > Again, observe that x3 might as well be a real dummy variable, not a > transformation of a continuous variable. > > In order to do that, I exclude the intercept from my model. > Specifically, I do the following: > > # create data > x4 <- ifelse(db$x <= 0, 1, 0) > db <- db %>% mutate(x4 = x4) > dbm <- dbm %>% > mutate(x4 = x4) %>% > group_by(id) %>% > mutate(x4 = x4 - mean(x4)) %>% > ungroup() > dbh <- dbh %>% mutate(x4 = dbm$x4) %>% > mutate(x4 = x4 + (sum(db$x4)/n)) %>% > ungroup() > testD <- lm(y ~ x3 + x4 + id + firm - 1, data = db) > testD2 <- lm(y ~ x3 + x4 + firm - 1, data = dbm) > testD3 <- lm(yh ~ x3 + x4 + firm - 1, data = dbh) > summary(testD)$coefficients[1:2, ] > > 1.2372816 -0.3426011 > summary(testD2) > > Call: > lm(formula = y ~ x3 + x4 + firm - 1, data = dbm) > > Residuals: > Min 1Q Median 3Q Max > -3.8794 -0.7497 0.0010 0.7442 3.8486 > > Coefficients: (1 not defined because of singularities) > Estimate Std. Error t value Pr(>|t|) > x3 1.57916 0.03779 41.788 < 2e-16 *** > x4 NA NA NA NA > ... redacted > summary(testD3)$coefficients[1:2] > > 3.254654 1.675495 > > As you can see, the second approach is not able to estimate the impact > of x4 on y. At the same time, the first and the third approach return > very different point estimates. > > Is anyone able to explain me why I cannot obtain the same point > estimates for this last exercise? > > Is there anything wrong in the way I include the second set of fixed > effects? > Is there anything wrong in the way I include the variables x3 and x4? > Or this is simply a problem due to some internal functions in R? > > Any hint would be much appreciated. > > Best, > Valerio > > ______________________________________________ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.