Re: [Rd] Documentation examples for lm and glm
Dear All, do you think that use of a data argument is best practice in the example below? regards, Heinz

### trivial example
plotwithline <- function(x, y) {
  plot(x, y)
  abline(lm(y ~ x))  ## data argument?
}
set.seed(25)
df0 <- data.frame(x = rnorm(20), y = rnorm(20))
plotwithline(df0[['x']], df0[['y']])

Fox, John wrote/hat geschrieben on/am 17.12.2018 15:21:

Dear Martin, I think that everyone agrees that it's generally preferable to use the data argument to lm(), and I have nothing significant to add to the substance of the discussion, but I think that it's a mistake not to add to the current examples, for the following reasons:

(1) Relegating examples using the data argument to "see also" doesn't suggest that using the argument is a best practice. Most users won't bother to click the links.

(2) In my opinion, a new initial example using the data argument would more clearly suggest that this is normally the best option.

(3) I think that it would also be desirable to add a remark to the explanation of the data argument, something like, "Although the argument is optional, it's generally preferable to specify it explicitly." And similarly on the help page for glm().

My two (or three) cents.

John
--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
Web: http://socserv.mcmaster.ca/jfox

On Dec 17, 2018, at 3:05 AM, Martin Maechler wrote:

David Hugh-Jones on Sat, 15 Dec 2018 08:47:28 +0100 writes: I would argue examples should encourage good practice. Beginners ought to learn to keep data in data frames and not to overuse attach().

Note there's no attach() there in any of these examples! otherwise at their own risk, but they have less need of explicit examples. The glm examples are nice insofar as they show both uses. I agree the lm() example(s) are "didactically misleading" by not using data frames at all. I disagree that only data frame examples should be shown.
If lm() is one of the first R functions a beginneR must use -- because they are in a basic stats class, say -- it may be *better* didactically to focus on lm() in the very first example, and use data frames in a next one ... and instead of next one, we have the pretty clear comment

### less simple examples

in "See Also" above. I'm not convinced (but you can try more) we should change those examples or add more there.

Martin

On Fri, 14 Dec 2018 at 14:51, S Ellison wrote:

FWIW, before all the examples are changed to data frame variants, I think there's fairly good reason to have at least _one_ example that does _not_ place variables in a data frame. The data argument in lm() is optional. And there is more than one way to manage data in a project. I personally don't much like lots of stray variables lurking about, but if those are the only variables out there and we can be sure they aren't affected by other code, it's hardly essential to create a data frame to hold something you already have.

Also, attach() is still part of R, for those folk who have a data frame but want to reference the contents across a wider range of functions without using with() a lot. lm() can reasonably omit the data argument there, too. So while there are good reasons to use data frames, there are also good reasons to provide examples that don't.

Steve Ellison

-----Original Message-----
> From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of Ben Bolker
> Sent: 13 December 2018 20:36
> To: r-devel@r-project.org
> Subject: Re: [Rd] Documentation examples for lm and glm

Agree. Or just create the data frame with those variables in it directly ...

On 2018-12-13 3:26 p.m., Thomas Yee wrote:
> > Hello, something that has been on my mind for a decade or two has been the examples for lm() and glm(). They encourage poor style because of mismanagement of data frames. Also, having the variables in a data frame means that predict() is more likely to work properly.
For lm(), the variables should be put into a data frame. As 2 vectors are assigned first in the general workspace, they should be deleted afterwards. For the glm(), the data frame d.AD is constructed but not used. Also, its 3 components were assigned first in the general workspace, so they float around dangerously afterwards, like in the lm() example.

Rather than attach improved .Rd files here, they are put at www.stat.auckland.ac.nz/~yee/Rdfiles -- you are welcome to use them!

Best, Thomas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
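[Editorial sketch] The style the thread converges on can be illustrated with a few lines of base R (the data and variable names here are invented for illustration, not taken from the actual help pages):

```r
## Keep the variables together in a data frame and pass it via `data`;
## predict() can then find everything it needs for new observations,
## and no stray vectors are left in the workspace.
set.seed(1)
dat <- data.frame(x = rnorm(20))
dat$y <- 2 * dat$x + rnorm(20)

fit <- lm(y ~ x, data = dat)              # explicit data argument
newdat <- data.frame(x = c(-1, 0, 1))
pred <- predict(fit, newdata = newdat)    # one prediction per new row
```

By contrast, fitting with free-standing vectors (`lm(y ~ x)`) works, but applying `predict()` to new data is then easier to get wrong, which is the point Thomas Yee raises above.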
Re: [Rd] Documentation examples for lm and glm
Dear John, fully agreed! In the global environment I always keep my "data-variables" in a data.frame. However, if I look in help I like examples that start with the particular aspects of a function. It is important to know whether a function offers a data argument, but I don't need, in the very first line, an example of the use of a data argument each time I look in help. best, Heinz

Fox, John wrote/hat geschrieben on/am 17.12.2018 16:23:

Dear Heinz,

On Dec 17, 2018, at 10:19 AM, Heinz Tuechler wrote:
> Dear All, do you think that use of a data argument is best practice in the example below?

No, but it is *normally* or *usually* the best option, in my opinion.

Best, John

> regards, Heinz
> ### trivial example
> plotwithline <- function(x, y) {
>   plot(x, y)
>   abline(lm(y ~ x))  ## data argument?
> }
> set.seed(25)
> df0 <- data.frame(x = rnorm(20), y = rnorm(20))
> plotwithline(df0[['x']], df0[['y']])

Fox, John wrote/hat geschrieben on/am 17.12.2018 15:21:

Dear Martin, I think that everyone agrees that it's generally preferable to use the data argument to lm(), and I have nothing significant to add to the substance of the discussion, but I think that it's a mistake not to add to the current examples, for the following reasons:

(1) Relegating examples using the data argument to "see also" doesn't suggest that using the argument is a best practice. Most users won't bother to click the links.

(2) In my opinion, a new initial example using the data argument would more clearly suggest that this is normally the best option.

(3) I think that it would also be desirable to add a remark to the explanation of the data argument, something like, "Although the argument is optional, it's generally preferable to specify it explicitly." And similarly on the help page for glm().

My two (or three) cents.
John
--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
Web: http://socserv.mcmaster.ca/jfox

On Dec 17, 2018, at 3:05 AM, Martin Maechler wrote:

David Hugh-Jones on Sat, 15 Dec 2018 08:47:28 +0100 writes: I would argue examples should encourage good practice. Beginners ought to learn to keep data in data frames and not to overuse attach().

Note there's no attach() there in any of these examples! otherwise at their own risk, but they have less need of explicit examples. The glm examples are nice insofar as they show both uses. I agree the lm() example(s) are "didactically misleading" by not using data frames at all. I disagree that only data frame examples should be shown.

If lm() is one of the first R functions a beginneR must use -- because they are in a basic stats class, say -- it may be *better* didactically to focus on lm() in the very first example, and use data frames in a next one ... and instead of next one, we have the pretty clear comment

### less simple examples

in "See Also" above. I'm not convinced (but you can try more) we should change those examples or add more there.

Martin

On Fri, 14 Dec 2018 at 14:51, S Ellison wrote:

FWIW, before all the examples are changed to data frame variants, I think there's fairly good reason to have at least _one_ example that does _not_ place variables in a data frame. The data argument in lm() is optional. And there is more than one way to manage data in a project. I personally don't much like lots of stray variables lurking about, but if those are the only variables out there and we can be sure they aren't affected by other code, it's hardly essential to create a data frame to hold something you already have.

Also, attach() is still part of R, for those folk who have a data frame but want to reference the contents across a wider range of functions without using with() a lot. lm() can reasonably omit the data argument there, too.
So while there are good reasons to use data frames, there are also good reasons to provide examples that don't.

Steve Ellison

-----Original Message-----
> From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of Ben Bolker
> Sent: 13 December 2018 20:36
> To: r-devel@r-project.org
> Subject: Re: [Rd] Documentation examples for lm and glm

Agree. Or just create the data frame with those variables in it directly ...

On 2018-12-13 3:26 p.m., Thomas Yee wrote:
> > Hello, something that has been on my mind for a decade or two has been the examples for lm() and glm(). They encourage poor style because of mismanagement of data frames. Also, having the variables in a data frame means that predict() is more likely to work properly. For lm(), the variables should be put into a data frame. As 2 vectors are assigned first in the general workspace, they should be deleted
Re: [Rd] rbind on data.frame that contains a column that is also a data.frame
Also Surv objects are matrices, and they share the same problem when rbind-ing data.frames. If contained in a data.frame, Surv objects lose their class after rbind and therefore no longer represent Surv objects afterwards. Using rbind with Surv objects outside of data.frames shows a similar problem, though not with the same column names. In conclusion, yes, matrices are common in data.frames, but not without problems.

Heinz

## example
library(survival)
## create example data
starttime <- rep(0, 5)
stoptime <- 1:5
event <- c(1, 0, 1, 1, 1)
group <- c(1, 1, 1, 2, 2)
## build Surv object
survobj <- Surv(starttime, stoptime, event)
## build data.frame with Surv object
df.test <- data.frame(survobj, group)
df.test
## rbind data.frames
rbind(df.test, df.test)
## rbind Surv objects
rbind(survobj, survobj)

At 06.08.2010 09:34 -0700, William Dunlap wrote:

> -----Original Message-----
> From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On Behalf Of Nicholas L Crookston
> Sent: Friday, August 06, 2010 8:35 AM
> To: Michael Lachmann
> Cc: r-devel-boun...@r-project.org; r-devel@r-project.org
> Subject: Re: [Rd] rbind on data.frame that contains a column that is also a data.frame
>
> OK...I'll put in my 2 cents worth.
>
> It seems to me that the problem is with this line:
>
> b$a=a, where "a" is something other than a vector with length equal to nrow(b).
>
> I had no idea that a dataframe could hold a dataframe. It is not just rbind(b,b) that fails; apply(b,1,sum) fails and so does plot(b). I'll bet other R commands fail as well.
>
> My point of view is that a dataframe is a list of vectors of equal length and various types (this is not exactly what the help page says, but it is what it suggests to me).
>
> Hum, I wonder how much code is based on the idea that a dataframe can hold a dataframe.
I used to think that non-vectors in data.frames were pretty rare things, but when I started looking into the details of the modelling code I discovered that matrices in data.frames are common. E.g.,

> library(splines)
> sapply(model.frame(data=mtcars, mpg~ns(hp)+poly(disp,2)), class)
$mpg
[1] "numeric"
$`ns(hp)`
[1] "ns"     "basis"  "matrix"
$`poly(disp, 2)`
[1] "poly"   "matrix"

You may not see these things because you don't call model.frame() directly, but most modelling functions (e.g., lm() and glm()) do call it and use the grouping provided by the matrices to encode how the columns of the design matrix are related to one another. If matrices are allowed, shouldn't data.frames be allowed as well?

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> 15 years of using R just isn't enough! But, I can say that not one line of code I've written expects a dataframe to hold a dataframe.

> > Hi,
> > The following was already a topic on r-help, but after understanding what is going on, I think it fits better in r-devel.
> > The problem is this: When a data.frame has another data.frame in it, rbind doesn't work well. Here is an example:
> > --
> > > a=data.frame(x=1:10,y=1:10)
> > > b=data.frame(z=1:10)
> > > b$a=a
> > > b
> >     z a.x a.y
> > 1   1   1   1
> > 2   2   2   2
> > 3   3   3   3
> > 4   4   4   4
> > 5   5   5   5
> > 6   6   6   6
> > 7   7   7   7
> > 8   8   8   8
> > 9   9   9   9
> > 10 10  10  10
> > > rbind(b,b)
> > Error in `row.names<-.data.frame`(`*tmp*`, value = c("1", "2", "3", "4", ... :
> >   duplicate 'row.names' are not allowed
> > In addition: Warning message:
> > non-unique values when setting 'row.names': '1', '10', '2', '3', '4', '5', '6', '7', '8', '9'
> > --
> > Looking at the code of rbind.data.frame, the error comes from the lines:
> > --
> > xij <- xi[[j]]
> > if (has.dim[jj]) {
> >     value[[jj]][ri, ] <- xij
> >     rownames(value[[jj]])[ri] <- rownames(xij) # <-- problem is here
> > }
> > --
> > if the rownames() line is dropped, all works well.
What this line tries to do is to join the rownames of internal elements of the data.frames I try to rbind. So the result, in my case, should have a column 'a', whose rownames are the rownames of the original column 'a'. It isn't totally clear to me why this is needed. When would a data.frame have different rownames on the inside vs. the outside?

Notice also that rbind takes into account whether the rownames of the data.frames to be joined are simply 1:n, or they are something else. If they are 1:n, then the result will have rownames 1:(n+m). If not, then the rownames might be kept.

I think, more consistent would be to replace the lines above with something like:

if (has.dim[jj]) {
    value[[jj]][ri, ] <- xij
    rnj = rownames(value[[jj]])
    rnj[ri] = rownames(xij)
    rnj = make.unique(as.character(unlist(rnj)), sep = "")
    rownames(value[[jj]]) <- rnj
}

In this case, the rownames of inside elements will also be joined, but in case they overlap, they will
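[Editorial sketch] A pragmatic workaround for the rbind() failure discussed above -- not the patch proposed in the thread, just one way around it -- is to flatten the nested data.frame column into ordinary vector columns first:

```r
## Reproduce the nested structure from the example, then flatten it.
a <- data.frame(x = 1:3, y = 1:3)
b <- data.frame(z = 1:3)
b$a <- a                         # a data.frame held inside a data.frame

flat <- do.call(data.frame, b)   # expands `a` into columns a.x and a.y
stacked <- rbind(flat, flat)     # rbind() now works without complaint
```

This loses the nesting, of course; it trades the grouping information that model.frame() relies on for plain columns that the rest of R handles without surprises.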
Re: [Rd] Easily switchable factor levels
To me this is a common situation, especially to switch between two languages. I solve it by separating the coding of values and their labels. Values are coded numerically or as character, and their labels are attached by a value.label attribute. When needed, a modified factor function transforms these variables into a factor using the value.labels as labels for the factor. It is, however, not nice code, and a drawback is that the value.label attribute has to be copied on subsetting. best regards, Heinz

At 23.02.2011 22:23, Barry Rowlingson wrote:

I've recently been working with some California county-level data. The counties can be referred to as either FIPS codes, e.g. F060102, friendly names such as "Del Norte County", names without 'County' on the end, or names with 'CA' on the end ("Del Norte County, CA"). Different data sets use slightly different forms, and putting them all together is a pain. So I was wondering about ways to attach multiple sets of level codes to a factor. It would work something like this:

> foo = multifactor(sample(letters,5), levels=letters, levelname="lower")
> foo
[1] m u i z b
Levels: a b c d ... y z
> levels(foo,"upper") = LETTERS
> uselevels(foo,"upper")
> foo
[1] M U I Z B
Levels: A B C D ... Y Z
> uselevels(foo,"lower")
> foo
[1] m u i z b
Levels: a b c d ... y z

In this way you could easily switch your levels from M and F to Male and Female, or Hommes et Dames, without having to do levels(foo) = something and hope to get the ordering right every time. Just do it once, and keep the multiple sets of level labels in the object. I'd even throw in a function to print out all the level codes:

> levels(foo, all=TRUE)
    upper lower
[1] A     a
[2] B     b
etc

I can see assorted problems coding this up to cope with dropping levels when making subsets... and possibly problems when code does character matching of levels and expects them to be unchanged... Has anyone bothered to write anything like this yet? Or is the application a bit too rare to be worth it?
Barry
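[Editorial sketch] As far as I know there is no standard implementation of Barry's multifactor() idea; a minimal sketch of the behaviour he describes (all function and attribute names here are invented) could look like:

```r
## Store several label sets alongside the integer codes and switch
## which one supplies the factor's levels.
multifactor <- function(x, levels, labels, which = 1L) {
  f <- factor(x, levels = levels, labels = labels[[which]])
  attr(f, "labelsets") <- labels   # keep all label sets on the object
  f
}

uselevels <- function(f, which) {
  labels <- attr(f, "labelsets")
  g <- f
  levels(g) <- labels[[which]]     # relabel in the stored order
  attr(g, "labelsets") <- labels
  g
}

sex <- multifactor(c(1, 2, 1), levels = 1:2,
                   labels = list(en = c("Male", "Female"),
                                 fr = c("Hommes", "Dames")))
as.character(sex)                   # "Male" "Female" "Male"
as.character(uselevels(sex, "fr"))  # "Hommes" "Dames" "Hommes"
```

Because every label set is stored once, in order, switching can never scramble the code-to-label mapping -- which is exactly the "hope to get the ordering right" problem Barry describes. The dropped-levels and character-matching problems he anticipates are real and not handled here.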
Re: [Rd] Suggestion: Dimension-sensitive attributes
At 10:01 09.07.2009, SIES 73 wrote:

I've also had several use cases where I needed "cell-like" attributes, that is, attributes that have the same dimensions as the original array and are subsetted in the same way -- along all its dimensions. So we're talking about a way to add metadata to matrices/arrays at 3 possible levels:

1) at the "whole object" level: attributes that are not dropped on subsetting
2) at the "dimension" level: attributes that behave like "dimnames", i.e. subsetted along each dimension
3) at the "cell" level: attributes that are subsetted in the same way as the original array

My proposal would be simpler than Tony's suggestion: like "dimnames", just have reserved attribute names for each case, say "objdata", "dimdata", and "celldata" (or "objattr", "dimattr" and "cellattr").

If "objattr", "dimattr" and "cellattr" are lists, they would offer safe places for all attributes that should be kept on subsetting. In my view this would be very useful, because that way a general solution for data description, like variable names, variable labels, units, ... could be reached.

On the other hand, Tony's pattern would allow as many attributes of each type as necessary (some multiplicity is already possible with the simpler design, as dimdata or celldata could be lists of lists), at the cost of a more complex scheme of attributes that needs to be "parsed" each time.

On Tony's suggestion, "attr.keep.on.subset" and "attr.dimname.like" (and possibly "attr.cell.like") could be kept in a single list with 3 elements, something like:

> attr(x, "attr.subset.with") <- list(object=..., dims=..., cells=...)

Would something like this make sense for R-core -- either for standard arrays or as a new class -- or would it be better implemented in a package?
Enrique

-----Original Message-----
From: Tony Plate [mailto:tpl...@acm.org]
Sent: Wednesday, 8 July 2009 18:01
To: r-devel@r-project.org
Cc: Bengoechea Bartolomé Enrique (SIES 73); Henrik Bengtsson
Subject: Re: [Rd] Suggestion: Dimension-sensitive attributes

There have been times when I've thought this could be useful too. One way to go about it could be to introduce a special attribute that controls how attributes are dealt with in subsetting, e.g., "attr.dimname.like". The contents of this would be character data; on subsetting, any attribute whose name appeared in this vector would be treated as a dimension. At the same time, it might be nice to also introduce "attr.keep.on.subset", which would specify which attributes should be kept on the result of a subsetting operation (could be useful for attributes that specify units).

This of course could be a way of implementing Henrik's suggestion: dimattr(x, "misc") <- value would add "misc" to the "attr.dimname.like" attribute and also set the attribute "misc". The tricky part would be modifying the "[" methods. However, the most useful would probably be the one for ordinary matrices and arrays, and others could be modified when and if their maintainers see the need.

-- Tony Plate

Bengoechea Bartolomé Enrique (SIES 73) wrote:
> Hi,
>
> I agree with Henrik that his suggestion to have "dimension vector attributes" working like dimnames (see below) would be an extremely useful infrastructure addition to R.
>
> If this is not considered for R-core, I am happy to try to implement this in a package, as a new class. And possibly do the same thing for data frames. Should you have any comments, ideas or suggestions about it, please share!
> Best,
>
> Enrique
>
> ---
> Subject:
> From: Henrik Bengtsson
> Date: Sun, 07 Jun 2009 14:42:08 -0700
>
> Hi,
>
> maybe this has been suggested before, but would it be possible, without breaking too much existing code, to add other "dimension vector attributes" in addition to 'dimnames'? These attributes would then be subsetted just like dimnames.
>
> Something like this:
>
> >> x <- array(1:30, dim=c(2,3,5))
> >> dimnames(x) <- list(c("a", "b"), c("a1", "a2", "a3"), NULL);
> >> dimattr(x, "misc") <- list(1:2, list(x=1:5, y=letters[1:8], z=NA), letters[1:5]);
> >>
> >> y <- x[,1:2,2:3]
> >> str(dimnames(y))
> List of 3
>  $ : chr [1:2] "a" "b"
>  $ : chr [1:2] "a1" "a2"
>  $ : NULL
>
> >> str(dimattr(y, "misc"))
> List of 3
>  $ : int [1:2] 1 2
>  $ :List of 2
>   ..$ x: int [1:5] 1 2 3 4 5
>   ..$ y: chr [1:8] "a" "b" "c" "d" ...
>  $ : chr [1:2] "b" "c"
>
> I can imagine this needs to be added in several places, and functions such as is.vector() need to be updated etc. It is not a quick migration, but is it something worth considering for the future?
>
> /Henrik
Re: [Rd] Suggestion: Dimension-sensitive attributes
At 11:14 09.07.2009, SIES 73 wrote:

> If "objattr", "dimattr" and "cellattr" are lists, they would offer safe places for all attributes that should be kept on subsetting.

My proposed design would be that:

* "objattr" would be a list of attributes (just preserved on subsetting)
* "dimattr" would be a list with as many elements as array dimensions. Each element can be any object whose length matches the corresponding array dimension's length and that can itself be subsetted with "[": so it could be a vector, a list, a data frame...
* "cellattr" would be any object whose dimensions match the array dimensions: another array, a data frame...

> In my view this would be very useful, because that way a general solution for data description, like variable names, variable labels, units, ... could be reached.

Indeed, that's the objective: attaching to the actual data user-defined metadata that is automatically synchronized with subsetting operations. I've had dozens of use cases in my own R programs that needed this type of pattern, and seen it implemented in different ways in several classes (xts, timeSeries, AnnotatedDataFrame, etc.). As you point out, this could offer a unified design for a common need.

Enrique

For my personal use it was sufficient to create a class called "documented" with a corresponding subsetting method and one attribute, also called "documented". This attribute may contain 'varlabel', 'varname', 'value.labels', 'missing.values', 'code.ordered', 'comment', ... It is copied on subsetting. I think attributes concerning e.g. dimensions, i.e. parts of an object, should stay in this object-related attribute and be extracted on subsetting. Since subsetting an object leads to a new object, this could then have its own, new persisting attribute. The more difficult part may be the binding of objects.
Heinz

-----Original Message-----
From: Heinz Tuechler [mailto:tuech...@gmx.at]
Sent: Thursday, 9 July 2009 10:56
To: Bengoechea Bartolomé Enrique (SIES 73); Tony Plate; r-devel@r-project.org
Cc: Henrik Bengtsson
Subject: Re: [Rd] Suggestion: Dimension-sensitive attributes

At 10:01 09.07.2009, SIES 73 wrote:
> I've also had several use cases where I needed "cell-like" attributes, that is, attributes that have the same dimensions as the original array and are subsetted in the same way -- along all its dimensions.
>
> So we're talking about a way to add metadata to matrices/arrays at 3 possible levels:
>
> 1) at the "whole object" level: attributes that are not dropped on subsetting
> 2) at the "dimension" level: attributes that behave like "dimnames", i.e. subsetted along each dimension
> 3) at the "cell" level: attributes that are subsetted in the same way as the original array
>
> My proposal would be simpler than Tony's suggestion: like "dimnames", just have reserved attribute names for each case, say "objdata", "dimdata", and "celldata" (or "objattr", "dimattr" and "cellattr").

If "objattr", "dimattr" and "cellattr" are lists, they would offer safe places for all attributes that should be kept on subsetting. In my view this would be very useful, because that way a general solution for data description, like variable names, variable labels, units, ... could be reached.

> On the other hand, Tony's pattern would allow as many attributes of each type as necessary (some multiplicity is already possible with the simpler design, as dimdata or celldata could be lists of lists), at the cost of a more complex scheme of attributes that needs to be "parsed" each time.
>
> On Tony's suggestion, "attr.keep.on.subset" and "attr.dimname.like" (and possibly "attr.cell.like") could be kept in a single list with 3 elements, something like:
>
> > attr(x, "attr.subset.with") <- list(object=..., dims=..., cells=...)
> Would something like this make sense for R-core -- either for standard arrays or as a new class -- or would it be better implemented in a package?
>
> Enrique
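[Editorial sketch] The "dimattr" idea in this thread can be sketched without touching the `[` method itself, by wrapping subsetting in a helper (all names here are invented for illustration):

```r
## Per-dimension metadata stored like dimnames: one list element per
## dimension, each subsetted with the same indices as that dimension.
subset_with_dimattr <- function(x, i, j) {
  meta <- attr(x, "dimattr")
  y <- x[i, j, drop = FALSE]                 # plain subsetting drops
  attr(y, "dimattr") <- list(meta[[1]][i],   # ...the attribute, so we
                             meta[[2]][j])   # restore it, subsetted
  y
}

m <- matrix(1:6, nrow = 2, ncol = 3)
attr(m, "dimattr") <- list(c("rowA", "rowB"), c("u1", "u2", "u3"))
y <- subset_with_dimattr(m, 1, 2:3)
attr(y, "dimattr")[[2]]  # "u2" "u3"
```

A full implementation would override `[` itself (the "tricky part" Tony mentions) and handle missing indices, `drop = TRUE`, and higher-dimensional arrays.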
Re: [Rd] Unexpected result of as.character() and unlist() appliedto a data frame
At 17:25 27.03.2007 +0200, Martin Maechler wrote:

> >>>>> "Herve" == Herve Pages <[EMAIL PROTECTED]>
> >>>>>     on Mon, 26 Mar 2007 20:48:33 -0700 writes:
>
> Herve> Hi,
> >>> dd <- data.frame(A=c("b","c","a"), B=3:1)
> >>> dd
> Herve>   A B
> Herve> 1 b 3
> Herve> 2 c 2
> Herve> 3 a 1
> >>> unlist(dd)
> Herve> A1 A2 A3 B1 B2 B3
> Herve>  2  3  1  3  2  1
>
> Herve> Someone else might get something different. It all depends on the values of its 'stringsAsFactors' option:
>
> yes, and I don't like that (last) fact either. IMO, an option should never be allowed to influence such a basic function as data.frame().
>
> I know I would have had time earlier to start discussing this, but for some (probably good) reasons, I didn't get to it at the time.
>
> As Andy comments, everything is behaving as it should / is documented, including the 'stringsAsFactors' option; but personally, I really would want to consider changing the default for data.frame()'s stringsAsFactors back (as pre-R-2.4.0) to 'TRUE' instead of default.stringsAsFactors(), which is a smart version of getOption("stringsAsFactors"). I find it ok ("acceptable") if it's influencing read.table() but feel differently for data.frame().
>
> Martin

Martin! I see the problem with options influencing "such a basic function as data.frame()", but in my view the difficulty starts earlier. In my understanding, data.frame() is _the_ basic way to store empirical source data in R, and I found the earlier default behaviour, to change character variables to factors, problematic. If changing character variables to factors were only an internal process, not visible to the user, I would not mind; but to include a character variable in a data frame and get a factor out of it is somewhat disturbing. A naive user like me was especially confused by the fact that I could read an SPSS file with spss.get (default: charfactor=FALSE) and get a character variable in a data.frame as a character variable, but then putting it in a different data.frame it changed to factor.
I would wish for a data.frame() function that behaves as a "data container" with the idea of rows (=cases) and columns (=variables), but without changing the mode/class of the objects.

Heinz

> >>> dd2 <- data.frame(A=c("b","c","a"), B=3:1, stringsAsFactors=FALSE)
> >>> dd2
> Herve>   A B
> Herve> 1 b 3
> Herve> 2 c 2
> Herve> 3 a 1
> >>> unlist(dd2)
> Herve>  A1  A2  A3  B1  B2  B3
> Herve> "b" "c" "a" "3" "2" "1"
>
> Herve> Same thing with as.character:
>
> >>> as.character(dd)
> Herve> [1] "c(2, 3, 1)" "c(3, 2, 1)"
> >>> as.character(dd2)
> Herve> [1] "c(\"b\", \"c\", \"a\")" "c(3, 2, 1)"
>
> Herve> Bug or "feature"?
>
> Herve> Note that as.character applied directly on dd$A doesn't have this "feature":
>
> >>> as.character(dd$A)
> Herve> [1] "b" "c" "a"
> >>> as.character(dd2$A)
> Herve> [1] "b" "c" "a"
>
> Herve> Cheers, H.
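[Editorial sketch] The behaviour discussed in this thread can be reproduced in a few lines; stringsAsFactors is set explicitly here because its default has changed across R versions:

```r
## With stringsAsFactors = TRUE (the old default), a character column
## silently becomes a factor; with FALSE it keeps its original mode.
dd  <- data.frame(A = c("b", "c", "a"), B = 3:1, stringsAsFactors = TRUE)
dd2 <- data.frame(A = c("b", "c", "a"), B = 3:1, stringsAsFactors = FALSE)

is.factor(dd$A)      # TRUE: the character vector was converted
is.character(dd2$A)  # TRUE: the column stayed character
unlist(dd)           # integer codes: 2 3 1 3 2 1
unlist(dd2)          # values: "b" "c" "a" "3" "2" "1"
```

Wrapping the column in `I()` (`data.frame(A = I(c("b","c","a")))`) is another way to protect it from conversion, at the cost of an extra "AsIs" class on the column.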
[Rd] raw data documentation
Dear Developers,

after several discussions on r-help I got the impression that the "standard" R distribution, including the recommended packages, does not offer much to document raw data imported into R. Hmisc has some functionality in this respect, and others like Richard Heiberger have solved some other aspects, but I think there could be a more unified approach, which, of course, needs the support of the core developers to become "standard".

In particular, what I am looking for is a possibility to label variables, label values, and add other information. Of course, all this is possible by adding attributes, but most attributes are lost when indexing/subsetting. From many helpful suggestions of others I learned that this can be resolved by defining a class and corresponding methods, as is done for variable labels in Hmisc. For now I have drafted something more general for my personal use, but before continuing on this I want to know whether there is some intention in the core developer team to work on the question of raw data documentation.

Greetings,
Heinz Tüchler
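[Editorial sketch] The class-plus-methods approach Heinz mentions can be drafted in a few lines; the class name "documented" follows his own naming in the thread above, the rest is illustrative:

```r
## A vector class whose `[` method re-attaches the documentation
## attribute that ordinary subsetting would drop.
as.documented <- function(x, doc) {
  attr(x, "documented") <- doc
  class(x) <- c("documented", class(x))
  x
}

`[.documented` <- function(x, ...) {
  doc <- attr(x, "documented")
  y <- NextMethod("[")           # plain subsetting: drops attributes
  attr(y, "documented") <- doc   # restore the metadata
  class(y) <- c("documented", class(y))
  y
}

age <- as.documented(c(35L, 62L, 47L),
                     doc = list(varlabel = "age in years"))
attr(age[2:3], "documented")$varlabel  # still "age in years"
```

This is essentially the pattern Hmisc uses for variable labels; a unified "standard" version would additionally need methods for printing, binding, and data.frame columns, which is where the hard questions raised in this thread begin.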