Mike Prager wrote:
> Peter--
>
> Thank you. Am I correct in understanding, then, that,
>
> (1) The syntax I asked about is a special case, and the parser
> and/or dget() somehow recognize it as such, and
>
> (2) The syntax 1:15 (where 15 is the number of rows) should
> work just as well as c(NA, 15)?
>
> I ask, again, because I want to ensure the widest possible
> compatibility for the way For2R is writing data in emulation of
> dput().
>
Essentially yes, but

(1) it is not so much about syntax as about internal representation, and

(2) yes, it gives the same result: 1:15 is recognized as a vector that
can be optimized to c(NA, 15). Having the code check for this case is,
of course, somewhat wasteful.

To wit:

> dd <- structure(list(x = c(1.19894055844457, -0.476584995973736,
+ 1.90525643132169, -0.726616166810353, 0.590506316214127)),
+ .Names = "x", row.names = 1:5, class = "data.frame")
> dput(dd, control = "all")
structure(list(x = c(1.19894055844457, -0.476584995973736, 1.90525643132169,
-0.726616166810353, 0.590506316214127)), .Names = "x",
row.names = as.integer(c(NA, 5)), class = "data.frame")

> --Mike
>
>
> Peter Dalgaard <[EMAIL PROTECTED]> wrote:
>
>> Mike Prager wrote:
>>
>>> I am trying to understand why syntax used by dput() to write
>>> rownames is valid (say, when read by dget()). I ask this
>>> because I desire to emulate its actions *reliably* in my For2R
>>> routines, and I won't be comfortable until I understand what R
>>> is doing.
>>>
>>> Given data set "fred":
>>>
>>>> fred
>>>     id      var1
>>> 1 1991 0.4388587
>>> 2 1992 0.8772471
>>> 3 1993 0.6230486
>>> 4 1994 0.2340929
>>> 5 1995 0.5005605
>>>
>>> we can try this--
>>>
>>>> dput(fred, control = "all")
>>> structure(list(id = c(1991, 1992, 1993, 1994, 1995), var1 =
>>> c(0.4388587, 0.8772471, 0.6230486, 0.2340929, 0.5005605)),
>>> .Names = c("id", "var1"), row.names = as.integer(c(NA, 5)),
>>> class = "data.frame")
>>>
>>> In the above result, why is the following part valid?
>>>
>>>     row.names = as.integer(c(NA, 5))
>>>
>>> given that the length of the RHS expression is 2, while the
>>> needed length is 5?
>>>
>>> Moreover, the following doesn't work:
>>>
>>>> row.names(fred) <- as.integer(c(NA,5))
>>> Error in `row.names<-.data.frame`(`*tmp*`, value = c(NA, 5)) :
>>>   invalid 'row.names' length
>>>
>>> Is there any reason why the expression
>>>
>>>     c(NA,5)
>>>
>>> is better here than the more natural
>>>
>>>     1:5
>>>
>>> ?
>>>
>> It's mainly a space-saving device. Originally, row.names was a character
>> vector, but storage of character vectors is quite inefficient, so we now
>> allow integer names and also a very short form where 1:n is stored just
>> using the single value n. To distinguish the latter two, we use the
>> c(NA, n) form, because row names are not allowed to be missing.
>>
>> Consider the following and notice how the string row names take up
>> roughly 36 bytes per record where the actual data are only 8 bytes per
>> record.
>>
>> > d<-data.frame(x=rnorm(1000))
>> > object.size(d)
>> [1] 8392
>> > row.names(d)<-as.character(1:1000)
>> > object.size(d)
>> [1] 44384
>> > row.names(d)<-1000:1
>> > object.size(d)
>> [1] 12384
>> > row.names(d)<-NULL
>> > object.size(d)
>> [1] 8392
>>
>>> I will appreciate help from anyone with time to reply.
>>>
>>> MHP
>>>
>> ______________________________________________
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
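A minimal sketch of point (2) for anyone writing dput()-style text from another program: the R snippet below writes the structure() text from Mike's example twice, once with the compact row.names form and once with 1:5, reads both back with dget(), and compares them. The temporary files and the names txt_compact, txt_seq, fred1 and fred2 are illustrative only. That the two results come out identical relies on R's internal attribute setter compacting an integer 1:n into the c(NA, n) form, which is how I understand the R sources to behave rather than something stated in this thread. (In recent R versions, dput() of a data frame with automatic row names typically shows row.names = c(NA, -5L), so the exact text differs by version.)

```r
## Sketch: the same 5-row data frame written two ways, then read back with dget().
## Only the row.names part of the text differs between the two variants.
txt_compact <- 'structure(list(id = c(1991, 1992, 1993, 1994, 1995),
    var1 = c(0.4388587, 0.8772471, 0.6230486, 0.2340929, 0.5005605)),
    .Names = c("id", "var1"), row.names = as.integer(c(NA, 5)),
    class = "data.frame")'
txt_seq <- sub("as.integer(c(NA, 5))", "1:5", txt_compact, fixed = TRUE)

## Write each variant to a temporary file, as an external writer would
f_compact <- tempfile(fileext = ".R")
f_seq <- tempfile(fileext = ".R")
writeLines(txt_compact, f_compact)
writeLines(txt_seq, f_seq)

fred1 <- dget(f_compact)  # row.names given in the compact c(NA, n) form
fred2 <- dget(f_seq)      # row.names given as the plain sequence 1:5

print(row.names(fred1))         # "1" "2" "3" "4" "5"
print(identical(fred1, fred2))  # TRUE when both were stored compactly

unlink(c(f_compact, f_seq))
```

Either spelling therefore seems safe for an external writer; 1:n is the easier one to generate and read, at the cost of the check mentioned above.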
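On Mike's second puzzle, why row.names(fred) <- as.integer(c(NA, 5)) fails: my reading, not stated in the thread, is that structure() and attr<- hand the value straight to the internal attribute setter, which understands the compact form, whereas the row.names<- replacement function first checks that there is one name per row (and that none is missing). A small R sketch of the difference, rebuilding fred from the values printed above; fred_attr, fred_try and msg are hypothetical names used only here:

```r
fred <- data.frame(id = c(1991, 1992, 1993, 1994, 1995),
                   var1 = c(0.4388587, 0.8772471, 0.6230486, 0.2340929, 0.5005605))

## Low-level route: attr<- accepts the two-element compact form as "rows 1..5"
fred_attr <- fred
attr(fred_attr, "row.names") <- c(NA_integer_, 5L)
print(dim(fred_attr))  # 5 2

## User-level route: row.names<- validates the length against nrow() first
msg <- tryCatch({
  fred_try <- fred
  row.names(fred_try) <- as.integer(c(NA, 5))
  "accepted"
}, error = function(e) conditionMessage(e))
print(msg)  # expected: an error about the 'row.names' length
```

So the c(NA, n) form is only meaningful as an already-stored attribute value, not as a user-supplied vector of names.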
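Peter's object.size() session can also be rerun as a script. The sketch below builds the same variants (automatic compact row names, character row names, a non-sequential integer vector, and a reset back to automatic) and only compares the sizes, since exact byte counts depend on the R version and platform and need not match the figures quoted above. The object names d_auto, d_chr, d_int, d_reset and sizes are illustrative.

```r
set.seed(1)  # values do not affect the sizes; this only makes the run repeatable
d_auto <- data.frame(x = rnorm(1000))     # automatic row names, stored compactly

d_chr <- d_auto
row.names(d_chr) <- as.character(1:1000)  # one character string per row

d_int <- d_auto
row.names(d_int) <- 1000:1                # integer, but not 1:n, so not compacted

d_reset <- d_int
row.names(d_reset) <- NULL                # back to automatic (compact) row names

sizes <- sapply(list(auto = d_auto, character = d_chr,
                     integer = d_int, reset = d_reset),
                function(d) as.numeric(object.size(d)))
print(sizes)
```

The ordering should mirror Peter's numbers: character row names cost the most, a full integer vector adds roughly 4 bytes per row, and the compact form adds essentially nothing.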