Hello, I am running:
R version 2.10.0 (2009-10-26)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

on a RedHat Linux box with 48 GB of memory. I am trying to create a model.matrix for a big model on a moderately large data set. There seems to be a size limitation on this model.matrix.

> dim(coll.train)
[1] 677236    128
> coll.1st.model.mat <- model.matrix(coll.1st.formula, data = coll.train)
> dim(coll.1st.model.mat)
[1] 581618    169

Once I saw that the resulting model.matrix had fewer rows than the original data.frame, I played with the number of input variables in the model:

> ttt <- model.matrix(~kmpleasure + vehage + age + gender + marital.status +
+     license.category + minor.conviction + driver.training.certificate +
+     admhybrid + anpol + anveh + cie + dblct + faq13c + faq20 + faq27 + faq43 +
+     faq5a + fra2 + frb2 + frb3 + kmaff + kmannuel + kmtravai + lima + maison +
+     nacp + nap + nbcond + nbcondpo + nbvt + rabmlt06 + rabmtve + rabperprg +
+     rabretrai + statnuit + tarcl06 + utilusa + sexeocc + ageocc + napocc,
+     data = coll.train)
> dim(ttt)
[1] 677236    109

## OK so far, but if I add one more variable there will be missing rows.

> ttt <- model.matrix(~kmpleasure + vehage + age + gender + marital.status +
+     license.category + minor.conviction + driver.training.certificate +
+     admhybrid + anpol + anveh + cie + dblct + faq13c + faq20 + faq27 + faq43 +
+     faq5a + fra2 + frb2 + frb3 + kmaff + kmannuel + kmtravai + lima + maison +
+     nacp + nap + nbcond + nbcondpo + nbvt + rabmlt06 + rabmtve + rabperprg +
+     rabretrai + statnuit + tarcl06 + utilusa + sexeocc + ageocc + napocc +
+     prof.b2, data = coll.train)
> dim(ttt)
[1] 676379    110

Is there a limit to the size of a matrix and of a data.frame? I know the limit for the length of a vector is 2^31 - 1 elements, but we are very far from that here (677236 x 110 is about 7.4e7 elements). Am I missing something?
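For anyone who wants to reproduce the symptom on a small scale, here is a minimal sketch on toy data. It rests on an assumption I have not verified for coll.train: that the lost rows are ones with missing values in the added variable. The data frame `toy` and its columns `a` and `b` are made up for illustration. With the default `options(na.action = "na.omit")`, model.matrix() builds its model frame through model.frame(), which drops incomplete rows.

```r
# Toy data (hypothetical, not coll.train): 'b' has two missing values.
toy <- data.frame(a = c(1, 2, 3, 4, 5),
                  b = factor(c("x", NA, "y", "x", NA)))

# Assumes the default options(na.action = "na.omit").
mm.a  <- model.matrix(~ a, data = toy)      # 'a' is complete: all rows kept
mm.ab <- model.matrix(~ a + b, data = toy)  # rows with NA in 'b' are dropped
nrow(mm.a)    # 5
nrow(mm.ab)   # 3

# Count the rows that are incomplete for the variables in the formula.
n.incomplete <- sum(!complete.cases(toy[, c("a", "b")]))
n.incomplete  # 2

# To keep every row, build the model frame with na.pass first;
# the affected rows then carry NA in the design matrix.
mf <- model.frame(~ a + b, data = toy, na.action = na.pass)
mm.all <- model.matrix(~ a + b, data = mf)
nrow(mm.all)  # 5
```

On the real data, the equivalent check would be something like sum(is.na(coll.train$prof.b2)), to be compared with the 677236 - 676379 = 857 rows that disappear.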
Thanks for any support,

Gérald Jean
Senior Statistical Advisor, VP Actuarial and Insurance Solutions
Desjardins Groupe d'Assurances Générales