Hin-Tak Leung <[EMAIL PROTECTED]> writes:

> Prof Brian Ripley wrote:
> > Data frames have unique row names *by definition* (White Book p.57).
>
> Yes - I happened to have the White Book on my desk (not mine...)
> - indeed, the first sentence on page 57 is (quoted verbatim; the
> "never" is in italics in the book, which I have marked with "*"
> before and after):
>
>     If all else fails, the row names are just the row numbers. They
>     are *never* null and must be unique.
>
> So patching data.frame.R is quite wrong. However, the rowname/colname
> overhead is definitely an issue for processing of large data sets,
> both for speed and amount of memory consumed. So it is probably best
> to extend the data.frame class and call it something else instead,
> for those who need to go that route.

Exactly. I recall from the Insightful people at the DSC in Seattle
that something is going to happen with the rownames in S-PLUS, or has
already happened in the latest release, but I don't remember exactly
how they did it, or whether and how it was related to their "big
dataframe" code. We might want R to follow suit in this respect.

Other options might include doing something about the string storage
of rownames, which is quite wasteful in R (every string is an R
object; a string vector is really a list of CHARSXP objects). Either
one could improve the internal storage format, or one could allow
rownames to be integers with semantics like "virtual strings", so
that x["123",] still works.

> (What I am doing is already called a different name so it isn't
> affected by this argument).
>
> Hin-Tak

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])                  FAX: (+45) 35327907

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
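
The row-name points raised in this thread can be seen in a few lines of R. This is only a minimal sketch: the names `df`, `dup` and `n` are made up for illustration and do not come from the thread, and the size comparison shows the relative cost of string versus integer storage, not how R or S-PLUS actually stores row names internally.

```r
# Illustrative sketch; `df`, `dup` and `n` are hypothetical names.
df <- data.frame(x = c(10, 20, 30))

# With no row names supplied, they are just the row numbers,
# presented as strings.
rownames(df)

# Indexing by a row name given as a string works, as in x["123", ].
df["2", "x"]

# Row names must be unique: assigning duplicates is an error.
dup <- tryCatch({
  rownames(df) <- c("a", "a", "b")
  "accepted"
}, error = function(e) "rejected")
dup

# Storage cost: the row numbers held as a character vector take more
# memory than the same numbers held as integers.
n <- 100000L
object.size(as.character(seq_len(n))) > object.size(seq_len(n))
```

The last comparison is the motivation for the "virtual strings" idea: if default row names were kept as integers and converted to strings only on demand, string-based lookups could keep working without one string object per row.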