I have a data set with ~250,000 observations summarized in ~1,000 rows that
I'm trying to analyze with mlogit.  Based on the discussion in
https://stat.ethz.ch/pipermail/r-help/2010-June/241161.html
I understand that using weights= does not (fully) do what I need.  I tried
expanding my data to one row per observation to sidestep this issue, but after
waiting several hours for mlogit to finish I decided this was not a feasible
strategy.  I therefore need to use weights= and make whatever adjustments
are necessary for the inferences.

My solution is the following:
1. Define W = sum(weights) / length(weights), i.e. the mean weight.
2. Multiply the log-likelihood by W.
3. Divide the Std. Errors by sqrt(W) (and therefore multiply the t-values by
   sqrt(W)).
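
For concreteness, the three steps above can be sketched as a small helper.
This is only an illustration of the proposed adjustment, not something
provided by mlogit: the function name adjust_weighted is hypothetical, the
column order (estimate, std. error, test statistic, p-value) is assumed, the
p-values are recomputed from a normal approximation, and the toy numbers are
made up.

```r
# Hypothetical helper (not part of mlogit): rescale the coefficient table and
# log-likelihood of a weighted fit, following the three steps listed above.
# Assumed column order: estimate, std. error, test statistic, p-value.
adjust_weighted <- function(coef_table, loglik, w) {
  W <- sum(w) / length(w)                  # step 1: mean weight
  out <- coef_table
  out[, 2] <- coef_table[, 2] / sqrt(W)    # step 3: shrink the standard errors
  out[, 3] <- coef_table[, 1] / out[, 2]   # test statistic grows by sqrt(W)
  out[, 4] <- 2 * pnorm(-abs(out[, 3]))    # two-sided normal p-value (assumption)
  list(W = W,
       logLik = as.numeric(loglik) * W,    # step 2: rescale the log-likelihood
       CoefTable = out)
}

# Toy numbers, made up purely for illustration
tab <- cbind(Estimate     = c(-0.006, -0.004),
             "Std. Error" = c(0.002, 0.001),
             "t-value"    = c(-3, -4),
             "Pr(>|t|)"   = 2 * pnorm(-abs(c(-3, -4))))
rownames(tab) <- c("ic", "oc")
w <- c(4, 4, 4, 4)                         # mean weight W = 4, so sqrt(W) = 2

adj <- adjust_weighted(tab, loglik = -100, w = w)
print(adj$logLik)
print(adj$CoefTable)
```

Applied to a weighted fit, the adjusted table would then be compared with the
table from the fit on the expanded data.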

Can anyone confirm that this is correct (at least as a large-N approximation)?

The code below provides a test case in which I compare duplicating rows with
using weights and adjusting the inferences (the original code is from Kenneth
Train's exercises using the mlogit package for R).  The last few lines printed
(Ratios: ...) show that the coefficients in the two cases agree to high
accuracy, and that the log-likelihood, Std. Errors and t-values have the
expected ratios to decent accuracy.  However, it would be good to know that
this approach is conceptually sound.

Thanks,
Ron

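library("mlogit")
data("Heating", package = "mlogit")

# Baseline fit on the original data (wide -> long format, no intercepts)
H <- mlogit.data(Heating, shape="wide", choice="depvar", varying=c(3:12))
m <- mlogit(depvar~ic+oc|0, H)
# print(summary(m))

set.seed(42) # make the random weights reproducible
w <- sample(1:200, nrow(Heating), replace=TRUE) # random weights
# index vector for duplicating rows according to the weights
i <- rep(1:nrow(Heating), times=w)

# Fit 2: expanded data, one row per (duplicated) observation
H2 <- mlogit.data(Heating[i,], shape="wide", choice="depvar", varying=c(3:12))
m2 <- mlogit(depvar~ic+oc|0, H2)
# print(summary(m2))

# Fit 3: original data with weights (long format has 5 rows per observation)
m3 <- mlogit(depvar~ic+oc|0, H, weights=rep(w,each=5))
# print(summary(m3))
print(all.equal(coef(m2),coef(m3)))

# Compare fitted values: keep the last copy of each original row from fit 2
f2 <- fitted(m2)[cumsum(w)]
f3 <- fitted(m3)
names(f2) <- names(f3)
print(all.equal(f2,f3))

# Observed log-likelihood ratio, followed by W, sqrt(W) and 1/sqrt(W)
cat("\nRatios:", m2$logLik/m3$logLik, sum(w)/length(w), sqrt(sum(w)/length(w)),
sqrt(length(w)/sum(w)), "\n\n")

s2 <- summary(m2)
s3 <- summary(m3)

# Element-wise ratios of the two coefficient tables
print(s2$CoefTable / s3$CoefTable)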


