>>>>> "HaroldD" == Doran, Harold <[EMAIL PROTECTED]> >>>>> on Mon, 21 Jul 2008 19:15:37 -0400 writes:
    HaroldD> Well, yes and no. In R there really isn't a need to create the
    HaroldD> model matrix because this is done in R from the factors. But, to
    HaroldD> implement this computational trick Alan is asking about, it
    HaroldD> requires that he first create the full, dense model matrix and
    HaroldD> then do the time-demeaning on that matrix.

    HaroldD> If lm() could go straight from a factor to a sparse
    HaroldD> model matrix, time-demeaning would not be necessary.

Well, lm() is in "stats" and would only work with dense matrices anyway.
But you are right in what you *meant*:
We'd need versions of model.frame() and model.matrix() which from a
formula produce a sparse model matrix (aka "X matrix") or its transpose.
Doug Bates showed you how to do the latter manually, equivalently to
model.matrix(~ 0 + f1 + f2) when f1 and f2 are factors.

I'm sure that longer-term we'd want versions of
model.matrix() / model.frame() that work with sparse matrices.

    HaroldD> Doing work as Doug suggests in the other
    HaroldD> post is what would be best for now, methinks.

Yes.
BTW, you mentioned SparseM's "OLS with sparse matrices".
The problem there is the same as with 'Matrix':
You must somehow get your sparse X matrix, and the best current tools
to do that, AFAIK, are the ones in 'Matrix' that Doug Bates mentioned
(and wrote!).

Martin Maechler

    HaroldD> -----Original Message-----
    HaroldD> From: Bert Gunter [mailto:[EMAIL PROTECTED]
    HaroldD> Sent: Mon 7/21/2008 6:45 PM
    HaroldD> To: Doran, Harold; [EMAIL PROTECTED]; r-help@r-project.org
    HaroldD> Subject: RE: [R] Large number of dummy variables

    HaroldD> Unless I'm way off base, dummy variables are never needed (nor are
    HaroldD> they desirable) in R; they should be modelled as factors instead.
    HaroldD> AN INTRO TO R might, and certainly V&R's MASS and others will,
    HaroldD> explain this in more detail.

    HaroldD> -- Bert Gunter
    HaroldD> Genentech, Inc.
    HaroldD> -----Original Message-----
    HaroldD> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
    HaroldD> Behalf Of Doran, Harold
    HaroldD> Sent: Monday, July 21, 2008 3:16 PM
    HaroldD> To: [EMAIL PROTECTED]; r-help@r-project.org
    HaroldD> Cc: Douglas Bates
    HaroldD> Subject: Re: [R] Large number of dummy variables

    HaroldD> Well, at the risk of entering a debate I really don't have time for (I'm
    HaroldD> doing it anyway), why not consider a random coefficient model? If your
    HaroldD> response is anything like, "well, random effects and fixed effects are
    HaroldD> correlated and so the estimates are biased but OLS is consistent and
    HaroldD> unbiased via an appeal to Gauss-Markov" then I will probably make time
    HaroldD> for this discussion :)

    HaroldD> I have experienced this problem, though. In what you're doing, you are
    HaroldD> first creating the model matrix and then doing the demeaning, correct? I
    HaroldD> do recall Doug Bates was, at one point, doing some work where the model
    HaroldD> matrix for the fixed effects was immediately created as a sparse matrix
    HaroldD> for OLS models. I think doing the work on the sparse matrix is a better
    HaroldD> analytical method than time-demeaning. I don't remember where that work
    HaroldD> is, though.

    HaroldD> There is a package called SparseM which had functions for doing OLS with
    HaroldD> sparse matrices. I don't know its status, but vaguely recall the author
    HaroldD> of SparseM at one point noting that the work of Bates and Maechler would
    HaroldD> be the go-to package for work with large, sparse model matrices.

    >> -----Original Message-----
    >> From: [EMAIL PROTECTED]
    >> [mailto:[EMAIL PROTECTED] On Behalf Of Alan Spearot
    >> Sent: Monday, July 21, 2008 5:59 PM
    >> To: r-help@r-project.org
    >> Subject: [R] Large number of dummy variables
    >>
    >> Hello,
    >>
    >> I'm trying to run a regression predicting trade flows between
    >> importers and exporters. I wish to include both
    >> year-importer dummies and year-exporter dummies. The former
    >> includes 1378 levels, and the latter includes 1390 levels. I
    >> have roughly 100,000 total observations.
    >>
    >> When I'm using lm() to run a simple regression, it gives me a
    >> "cannot allocate ___" error. I've been able to get around this by
    >> time-demeaning over one large group, but since I have two, it
    >> doesn't work in the correct way. Is there a more efficient
    >> way of handling a model matrix this large in R?
    >>
    >> Thanks for your help.
    >>
    >> Alan Spearot
    >>
    >> --
    >> Alan Spearot
    >> Assistant Professor - International Economics
    >> University of California - Santa Cruz
    >> 1156 High Street
    >> 453 Engineering 2
    >> Santa Cruz, CA 95064
    >> Office: (831) 459-1530
    >> [EMAIL PROTECTED]
    >> http://people.ucsc.edu/~aspearot
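
For scale: a dense double-precision model matrix with 100,000 rows and 1378 + 1390 = 2768 indicator columns needs about 100000 * 2768 * 8 bytes, roughly 2.2 GB, before lm() makes any working copies. The following is a minimal sketch of the sparse route discussed in this thread, not the code Doug Bates posted. It assumes a version of 'Matrix' that provides sparse.model.matrix() (as far as I know, that function was added after this thread), and it uses small simulated data; the names f1, f2, x and y are made up for illustration.

```r
library(Matrix)

set.seed(1)
n  <- 2000
f1 <- factor(sample(50, n, replace = TRUE))  # hypothetical stand-in for year-importer
f2 <- factor(sample(60, n, replace = TRUE))  # hypothetical stand-in for year-exporter
x  <- rnorm(n)

# Simulated response: slope 1.5 on x plus one effect per level of each factor
a1 <- rnorm(nlevels(f1))
a2 <- rnorm(nlevels(f2))
y  <- 1.5 * x + a1[as.integer(f1)] + a2[as.integer(f2)] + rnorm(n)

# Sparse analogue of model.matrix(~ 0 + x + f1 + f2):
# f1 gets all of its indicator columns, f2 gets treatment contrasts.
X <- sparse.model.matrix(~ 0 + x + f1 + f2)

# Solve the normal equations X'X b = X'y; crossprod(X) stays sparse.
beta   <- solve(crossprod(X), crossprod(X, y))
beta_x <- as.vector(beta)[1]  # coefficient on x
```

On data this small the result can be checked against coef(lm(y ~ 0 + x + f1 + f2)). Solving the normal equations directly is the simplest thing to show, but it assumes X has full column rank; with two large crossed factors that should be verified rather than taken for granted.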