Re: [Rd] xy.coords(MATRIX) bug in code or documentation (PR#8937)
> "FrPi" == François Pinard <[EMAIL PROTECTED]>
>     on Sun, 4 Jun 2006 06:27:53 +0200 (CEST) writes:

    FrPi> Hi, people.

    FrPi> xy.coords() does not behave like its documentation says, when
    FrPi> given some matrices.  ?xy.coords says:

    FrPi>     If 'y' is 'NULL' and 'x' is a [...] formula [...] list [...]
    FrPi>     time series [...] matrix with two columns [...]

    FrPi>     In any other case, the 'x' argument is coerced to a vector and
    FrPi>     returned as *y* component [...]

    FrPi> Now, consider this short transcript:

    FrPi> ==>
    >> as.vector(rbind(1, 2, 3))
    FrPi> [1] 1 2 3
    >> as.vector(cbind(1, 2, 3))
    FrPi> [1] 1 2 3
    >> xy.coords(rbind(1, 2, 3))
    FrPi> $x
    FrPi> [1] 1 2 3
    FrPi> $y
    FrPi> [1] 1 2 3
    FrPi> $xlab
    FrPi> [1] "Index"
    FrPi> $ylab
    FrPi> NULL
    >> xy.coords(cbind(1, 2, 3))
    FrPi> $x
    FrPi> [1] 1
    FrPi> $y
    FrPi> [1] 2
    FrPi> $xlab
    FrPi> [1] "[,1]"
    FrPi> $ylab
    FrPi> [1] "[,2]"
    FrPi> ==<

    FrPi> A 3 x 1 matrix and a 1 x 3 matrix both fall in the "In any
    FrPi> other case" category, but it seems that only the 3 x 1 is
    FrPi> really "coerced to a vector".

Yes.  So you are right: there's a bug.

    FrPi> The R code for xy.coords() suggests that the documentation
    FrPi> should read "matrix with at least two columns" instead of
    FrPi> "matrix with two columns".

    FrPi> As a user, I was really expecting the coercion to a vector to
    FrPi> happen.  What triggered me into exploring this problem is the
    FrPi> fact that plot() showed a single point where I was expecting
    FrPi> many.  If you decide that the code is right and the
    FrPi> documentation is wrong, then I would suggest that the code be
    FrPi> a bit more friendly, by at least issuing some warning if more
    FrPi> than two columns are given to a matrix.

I agree.

I'm not sure what the change should be -- and am asking for useR
feedback here:

 1) give an error in the case of a matrix (or data.frame) with '> 2' columns
 2) give a warning, and use the first 2 columns -- as it happens now
 3) silently coerce to vector -- as the current documentation claims.

The cleanest would be "1)", but given backward compatibility, etc.,
my tendency would go in the direction of "2)".

    FrPi> Another problem in the same area: the documentation lies
    FrPi> about how the function acts when given a data.frame.  From
    FrPi> the code, a data.frame is processed as if it was a matrix.
    FrPi> From the documentation, while the data.frame is not mentioned
    FrPi> explicitly, it is implied in the paragraph explaining how a
    FrPi> list is processed (because a data.frame is a list).  Some
    FrPi> reconciliation is needed here as well.

Yes; in this case, I propose to just amend the documentation,
explaining that data.frames are treated "as matrices".

Thanks a lot, Francois, for your careful reading and
careful report!
   [ Though I do slightly mind the word "lies" since
     I do value the 9th commandment..
     Not telling the truth *accidentally* is not "lying" ]

Martin Maechler, ETH Zurich

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
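For readers who want to reproduce the report, here is a minimal sketch. It assumes the xy.coords() behaviour described in this thread (for a matrix with two or more columns, the first two columns become x and y); the variable names m_col and m_row are illustrative only, and other R versions may behave differently.

```r
# A 3 x 1 and a 1 x 3 matrix flatten to the same plain vector ...
m_col <- rbind(1, 2, 3)   # 3 rows, 1 column
m_row <- cbind(1, 2, 3)   # 1 row, 3 columns
stopifnot(identical(as.vector(m_col), as.vector(m_row)))

# ... but xy.coords() does not treat them alike.
xy_col <- xy.coords(m_col)   # one column: values become y, x is the index
xy_row <- xy.coords(m_row)   # three columns: only columns 1 and 2 are used

print(lengths(xy_col[c("x", "y")]))   # the thread reports 3 points here
print(lengths(xy_row[c("x", "y")]))   # ... and a single point here
```

The single point in the second case is what made plot() show one point where many were expected.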
Re: [Rd] more on bug 7924
I see you have found the sexptype listing in Rinternals.h.  I believe
it was in one of R's FAQs about R's garbage collector: it doesn't do
proper reference-counted garbage collection as you suggested, but does
a sort of poor man's garbage collection, by classifying entities into
*only* 3 categories: not-in-use, in-use-by-one, and
in-use-by-more-than-one.

Kevin B. Hendricks wrote:
> Hi,
>
> Okay I threw together a quick dump_object routine and found something
> that I don't think is correct when call2 is created.
>
>  > call2 <- Quote(f(arg[[1]]))[c(1,2,2,2)]
>  > get("call2")
>
> I use the do_get break to find the SEXP value I want
>
> Breakpoint 1, do_get (call=0xc2d530, op=0x52bd30, args=0x9e83a8,
>     rho=Variable "rho" is not available.
> ) at ../../../r-devel/r-devel/R/src/main/envir.c:1668
> 1668        if (PRIMVAL(op)) { /* have get(.) */
>
> (gdb) print *rval
> $2 = {sxpinfo = {type = 6, obj = 0, named = 1, gp = 0, mark = 0,
>     debug = 0, trace = 0, fin = 0, gcgen = 0, gccls = 0},
>   attrib = 0x508818, gengc_next_node = 0x9e7d50,
>   gengc_prev_node = 0x9e7ce0, u = {primsxp = {offset = 10663048},
>     symsxp = {pname = 0xa2b488, value = 0x9e7ce0, internal = 0x508818},
>     listsxp = {carval = 0xa2b488, cdrval = 0x9e7ce0, tagval = 0x508818},
>     envsxp = {frame = 0xa2b488, enclos = 0x9e7ce0, hashtab = 0x508818},
>     closxp = {formals = 0xa2b488, body = 0x9e7ce0, env = 0x508818},
>     promsxp = {value = 0xa2b488, expr = 0x9e7ce0, env = 0x508818}}}
>
> Now I invoke my own dump routine which keeps track of recursion level
> and will dump the named and other things inside the newly created
> object, the format of the output is
>
> recursion level: SEXP X TYPEOF(X) and then some object specific values
>
> (gdb) call dump_object(rval, 0)
>
> 0: 0x9e7d18 LANGSXP Object with length 1, named 1
> f(arg[[1]], arg[[1]], arg[[1]])
> 1: 0xa2b488 SYMSXP name at 0xa29408, value at 0x5087e0, named 0
> f
> 1: 0x9e9880 LANGSXP Object with length 1, named 0
> arg[[1]]
> 2: 0x508738 SYMSXP name at 0x51c788, value at 0x527690, named 0
> `[[`
> 2: 0xc37cc8 SYMSXP name at 0xc376e8, value at 0x5087e0, named 0
> arg
> 2: 0xf94cb8 REALSXP Object, length 1, starting at 0xf94ce0, named 0
> 1
> 1: 0x9e9880 LANGSXP Object with length 1, named 0
> arg[[1]]
> 2: 0x508738 SYMSXP name at 0x51c788, value at 0x527690, named 0
> `[[`
> 2: 0xc37cc8 SYMSXP name at 0xc376e8, value at 0x5087e0, named 0
> arg
> 2: 0xf94cb8 REALSXP Object, length 1, starting at 0xf94ce0, named 0
> 1
> 1: 0x9e9880 LANGSXP Object with length 1, named 0
> arg[[1]]
> 2: 0x508738 SYMSXP name at 0x51c788, value at 0x527690, named 0
> `[[`
> 2: 0xc37cc8 SYMSXP name at 0xc376e8, value at 0x5087e0, named 0
> arg
> 2: 0xf94cb8 REALSXP Object, length 1, starting at 0xf94ce0, named 0
> 1
>
> Notice how each LANGSXP subobject reuses the exact same objects/
> addresses (notice the addresses are the same) 3 times (one for each
> entry) but the named value is always 0 for all of them (even though
> that address is being re-used (effectively "named") each time).
>
> 1: 0x9e9880 LANGSXP Object with length 1, named 0
> arg[[1]]
> 2: 0x508738 SYMSXP name at 0x51c788, value at 0x527690, named 0
> `[[`
> 2: 0xc37cc8 SYMSXP name at 0xc376e8, value at 0x5087e0, named 0
> arg
> 2: 0xf94cb8 REALSXP Object, length 1, starting at 0xf94ce0, named 0
> 1
>
> Shouldn't all 3 copies have named set to 1 and not zero since they
> are all pointing to the same pieces of memory?  And shouldn't that
> force the top level LANGSXP object to have named of 2 in this case
> and not its current value of 1?
>
> How should any assignment to any of those 3 places in the LANGSXP
> list ever know they must be duplicated first when all of the named
> values are 0 even though they all point to the same block of memory?
>
> I truly do not understand how named is being used in this case.  Why
> don't we simply refcount all allocated objects so we know what the
> true value of named must be?  How else can we get that information?
>
> Hints welcome especially to reading material that explains more on
> this stuff.
>
> Thanks,
>
> Kevin
Re: [Rd] Patch: context stack size in gram.y
Hmm, I think you can "flatten" the for-loop with something like this,
without modifying R:

    for (ParamAll in 0:(length01*length02*length03*length04*... - 1)) {
        idx1 <- as.integer(ParamAll / (length02*length03*...))
        Param01 <- Param01Set[idx1 + 1]
        idx2 <- as.integer((ParamAll - idx1*(length02*length03*...))
                           / (length03*length04*...))
        Param02 <- Param02Set[idx2 + 1]
        ...
        central code
        ...
    }

It is the same idea as generalizing the addressing of a matrix
element[i,j] to element[i*length_j + j], etc.  Then you won't have
over 50 nested for-loops.

If you have something that deeply nested, I would also be writing
those idx1 <- as.integer(...) in C for speed (or use % properly, but
it is too early in the morning and my head is a bit woozy at the
moment...), or even the Param01, etc., e.g.

    Param01 <- .Call("my_element_selector",
                     c(Param01Set, Param02Set ...), ParamAll)

HTL

Thomas Dreibholz wrote:
> On Wednesday 31 May 2006 15:26, Prof Brian Ripley wrote:
>> On Wed, 31 May 2006, Thomas Dreibholz wrote:
>>> Hi!
>>>
>>> Attached to this mail, you find a patch for gram.y setting a #define
>>> CONTEXT_STACK_SIZE for the context stack size and replacing the following
>>> constants 50 and 49 by CONTEXT_STACK_SIZE and CONTEXT_STACK_SIZE-1.  The
>>> new #define makes setting the stack size much more easy; I also have
>>> increased it to 500, because 50 is too small (we use R to iterate through
>>> sets of simulation parameters, which requires a context stack size of
>>> around 100).
>> I think you will have to explain in detail why you need this, when for a
>> decade R users have not reported a need for it.  It is not related to
>> iteration in R, rather to the depth of recursion needed to parse R code.
>
> We use R to create input files for OMNeT++ simulations.  The simulation
> parameters are defined like this:
> param01Set <- c(...)
> param02Set <- c(...)
> ...
> paramXYSet <- c(...)
> Most of these sets only contain a single element.
>
> The input file generation, which should be usable for all simulations, works
> as follows:
> for(param01 in param01Set) {
>    for(param02 in param02Set) {
>       ...
>          for(paramXY in paramXYSet) {
>             Generate input file for these parameter settings
>          }
>       ...
>    }
> }
>
> The simulation has more than 50 different parameters, so a "contextstack
> overflow" error will be the result.  Increasing the context stack size in
> gram.y solves this problem.  (Clearly, only using "for" iterations for sets
> consisting of more than one element would solve the problem - but this
> requires a special version of the parameter generation function for every
> simulation.)
>
> Best regards
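The flattening idea from the reply above can be sketched as a mixed-radix decode of a single loop counter. This is only an illustration under stated assumptions: param_sets and decode_index are hypothetical names that do not come from the thread, and the parameter values are made up. One loop replaces the nest, so the parser's context stack depth no longer grows with the number of parameters.

```r
# Hypothetical parameter sets; as in the thread, most hold a single element.
param_sets <- list(
  param01 = c(1, 2),
  param02 = "a",
  param03 = c(10, 20, 30)
)

lens  <- lengths(param_sets)   # number of values per parameter
total <- prod(lens)            # number of combinations

# Decode a 0-based flat index k into one 1-based index per parameter set,
# treating the last set as the fastest-varying "digit".
decode_index <- function(k, lens) {
  idx <- integer(length(lens))
  for (j in rev(seq_along(lens))) {
    idx[j] <- k %% lens[j] + 1L
    k <- k %/% lens[j]
  }
  idx
}

for (k in seq_len(total) - 1L) {
  idx <- decode_index(k, lens)
  current <- Map(function(set, i) set[[i]], param_sets, idx)
  # ... generate the input file for the settings in `current` here ...
}
```

Base R's expand.grid() builds the same set of combinations as a data frame, which may be simpler when all combinations fit in memory.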
Re: [Rd] xy.coords(MATRIX) bug in code or documentation (PR#8937)
On 6/5/2006 5:30 AM, [EMAIL PROTECTED] wrote:
>> "FrPi" == François Pinard <[EMAIL PROTECTED]>
>>     on Sun, 4 Jun 2006 06:27:53 +0200 (CEST) writes:
>
>     FrPi> Hi, people.
>     FrPi> xy.coords() does not behave like its documentation says, when
>     FrPi> given some matrices.  ?xy.coords says:
>
>     FrPi>     If 'y' is 'NULL' and 'x' is a [...] formula [...] list [...]
>     FrPi>     time series [...] matrix with two columns [...]
>
>     FrPi>     In any other case, the 'x' argument is coerced to a vector and
>     FrPi>     returned as *y* component [...]
>
>     FrPi> Now, consider this short transcript:
>
>     FrPi> ==>
>     >> as.vector(rbind(1, 2, 3))
>     FrPi> [1] 1 2 3
>     >> as.vector(cbind(1, 2, 3))
>     FrPi> [1] 1 2 3
>     >> xy.coords(rbind(1, 2, 3))
>     FrPi> $x
>     FrPi> [1] 1 2 3
>
>     FrPi> $y
>     FrPi> [1] 1 2 3
>
>     FrPi> $xlab
>     FrPi> [1] "Index"
>
>     FrPi> $ylab
>     FrPi> NULL
>
>     >> xy.coords(cbind(1, 2, 3))
>     FrPi> $x
>     FrPi> [1] 1
>
>     FrPi> $y
>     FrPi> [1] 2
>
>     FrPi> $xlab
>     FrPi> [1] "[,1]"
>
>     FrPi> $ylab
>     FrPi> [1] "[,2]"
>
>     FrPi> ==<
>
>     FrPi> A 3 x 1 matrix and a 1 x 3 matrix both fall in the "In
>     FrPi> any other case" category, but it seems that only the 3 x 1
>     FrPi> is really "coerced to a vector".
>
> yes.  So you are right: There's a bug
>
>     FrPi> The R code for xy.coords() suggests that the documentation
>     FrPi> should read "matrix with at least two columns" instead of
>     FrPi> "matrix with two columns".
>
>     FrPi> As a user, I was really expecting the coercion to a
>     FrPi> vector to happen.  What triggered me into exploring
>     FrPi> this problem is the fact that plot() showed a single
>     FrPi> point where I was expecting many.  If you decide that
>     FrPi> the code is right and the documentation is wrong, then
>     FrPi> I would suggest that the code be a bit more friendly,
>     FrPi> by at least issuing some warning if more than two
>     FrPi> columns are given to a matrix.
>
> I agree.
>
> I'm not sure what the change should be -- and am asking for useR
> feedback here :
>
> 1) give an error in the case of a matrix (or data.frame) with '> 2' columns
> 2) give a warning, and use the first 2 columns -- as it happens now
> 3) silently coerce to vector -- as the current documentation claims.
>
> The most clean would be "1)", but given back compatibility, etc,
> my tendency would go into the direction of "2)".

I think the current behaviour is reasonable, and shouldn't lead to
warnings when executed.  If you meant a warning in the man page, that
would be fine.

I'm not so sure about some undocumented behaviour for formulas:

    x <- 1:10
    y <- 11:20
    z <- 21:30
    xy.coords(y ~ x+z)

will set the x component to the sum x+z.  That's not the usual way
formulas are handled.  I'd be happier with picking out one column, or
generating an error, instead.

I think the error message might have been the intention, because
there's a test

    if (inherits(x, "formula") && length(x) == 3)

but length(y ~ x+z) is 3.  I think the test should be

    if (inherits(x, "formula") && length(x) == 3
        && length(x[[2]]) == 1 && length(x[[3]]) == 1)

Duncan Murdoch

>     FrPi> Another problem in the same area: the documentation lies
>     FrPi> about how the function acts when given a data.frame.  From
>     FrPi> the code, a data.frame is processed as if it was a matrix.
>     FrPi> From the documentation, while the data.frame is not mentioned
>     FrPi> explicitly, it is implied in the paragraph explaining how a
>     FrPi> list is processed (because a data.frame is a list).
>     FrPi> Some reconciliation is needed here as well.
>
> Yes; in this case, I propose to just amend the documentation
> explaining that data.frames are treated "as matrices".
>
> Thanks a lot, Francois, for your careful reading and
> careful report!
>    [ Though I do slightly mind the word "lies" since
>      I do value the 9th commandment..
>      Not telling the truth *accidentally* is not "lying" ]
>
> Martin Maechler, ETH Zurich
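The length argument in the reply above can be checked directly. A formula is a call object, so length() counts the operator plus its operands. The helper name is_simple_formula is hypothetical; it only wraps the stricter test proposed in the message and is not code from xy.coords().

```r
f <- y ~ x + z

length(f)        # `~`, left-hand side, right-hand side
length(f[[2]])   # the symbol y
length(f[[3]])   # the call x + z: `+`, x, z

# The stricter test suggested in the message, wrapped in a helper.
is_simple_formula <- function(x) {
  inherits(x, "formula") && length(x) == 3 &&
    length(x[[2]]) == 1 && length(x[[3]]) == 1
}

is_simple_formula(y ~ x)       # both sides are single symbols
is_simple_formula(y ~ x + z)   # right-hand side is a compound call
```

Because `&&` short-circuits, a one-sided formula such as `~ x` (length 2) is rejected before `x[[3]]` is ever evaluated.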
Re: [Rd] more on bug 7924
Hin-Tak Leung <[EMAIL PROTECTED]> writes:

> I see you have found the sexptype listing in Rinternals.h . I believe
> it was in one of R's FAQ's about R's garbage collector - it doesn't do
> proper reference-counted garbage collection as you suggested, but does
> a sort of poor man's garbage collection, by classifying entities in
> *only* 3 catergories - not-in-use, in-used-by-one, and in-used
> by-more-than-one.

Not quite: more like freshly-made-not-assigned,
assigned-but-only-once, assigned-maybe-more-than-once.

It's also not so much about GC as about modifiability: In the first
case, modify at will.  In the 2nd case you can modify in an assignment
function.  In the 3rd case, you must duplicate the object first.

Consider

    f <- function(x){x[3]<-10; x}
    f(rnorm(10))
    b <- rnorm(10)
    f(b)

In the first case, rnorm() returns an unnamed object.  (Well, it
could.  I'm not too sure it actually does.)  When the object is passed
to f(), it gets named "x", but it is the only copy and the
modification to x[3] can proceed safely.

In the second case you first assign to b then pass b to f inside of
which it is named "x".  This proceeds without duplication, so the same
object is now assigned twice.  Modifying x at this point would cause b
to change as well, which would violate pass-by-value semantics.
Hence, we need to create a duplicate of x which we can safely change.

Unlike Java and Tcl, R doesn't use its refcounts for garbage
collection.  Partly it is because it is not a true count that you can
decrement and use to throw away the object when the count goes to
zero.  However, it is also problematic to implement in R because we
can have reference loops: Consider

    g <- function(){...whatever...; e <- environment(); ...}

Now when g() is called it creates an environment to hold its local
variables, and when g finishes, the environment can be destroyed,
provided that there are no references to it from other objects.  In
the above case, we do have a reference to the environment, but it
comes from an object that is inside the environment and would be
destroyed along with it.  A strict refcounting system would leave such
environments hanging around forever.

-- 
   O__   Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])                  FAX: (+45) 35327907
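Both points in the message above can be observed from the R prompt without looking at the NAMED field itself. The sketch below only checks the user-visible consequences (the caller's object is untouched, and a function frame can hold a reference to itself); the variable names b_before, res and env are illustrative.

```r
f <- function(x) { x[3] <- 10; x }

b <- rnorm(10)
b_before <- b          # a second name for the same values
res <- f(b)            # inside f, the vector is modified via a duplicate

# Pass-by-value semantics: the caller's b is unchanged.
stopifnot(identical(b, b_before))
stopifnot(res[3] == 10)

# A reference loop: the frame of g() contains a variable e that
# refers back to that very frame.
g <- function() {
  e <- environment()
  e
}
env <- g()
stopifnot(identical(get("e", envir = env), env))
```

Once `env` is removed, nothing outside the environment refers to it, yet the environment still refers to itself; a collector that traces reachability can reclaim it, while a pure reference count would never reach zero.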
Re: [Rd] xy.coords(MATRIX) bug in code or documentation (PR#8937)
[Martin Maechler]

> Thanks a lot, Francois, for your careful reading and careful report!

Thanks for being receptive! :-)

>FrPi> Another problem in the same area: the documentation lies
>FrPi> about how the function acts when given a data.frame.  From
>FrPi> the code, a data.frame is processed as if it was a matrix.
>FrPi> From the documentation, while the data.frame is not mentioned
>FrPi> explicitly, it is implied in the paragraph explaining how
>FrPi> a list is processed (because a data.frame is a list).  Some
>FrPi> reconciliation is needed here as well.

> [ Though I do slightly mind the word "lies" since
>   I do value the 9th commandment..
>   Not telling the truth *accidentally* is not "lying" ]

Of course.  You know, I merely forgot a smiley, there.  You are right
in that we should try a bit to spare the extreme susceptibility of some
people!  On the other hand, there should be limits to the feeling that
we are always walking on eggs while writing to R-help or R-devel; some
comfort and happiness is needed, after all. :-)

>Yes; in this case, I propose to just amend the documentation
>explaining that data.frames are treated "as matrices".

Let me add a small comment about data.frames.  It would be a bit
awkward if a data.frame had two columns "y" and "x" (in that order) and
if they were interpreted differently after matrix coercion.  I guess
the problem would not exist if data.frames were really interpreted as
lists; the "x" and "y" columns could even appear anywhere (untested).

-- 
François Pinard   http://pinard.progiciels-bpi.ca
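The "untested" remark above is easy to try. The sketch assumes the behaviour discussed in this thread: a data.frame is handled by column position, like a matrix, while a plain list is matched by the names "x" and "y". The objects df and lst are made-up examples, and results may differ in R versions where xy.coords() has changed.

```r
# Columns deliberately given in the order y, x.
df  <- data.frame(y = c(10, 20, 30), x = c(1, 2, 3))
lst <- list(y = c(10, 20, 30), x = c(1, 2, 3))

xy_df  <- xy.coords(df)    # by position: column 1 becomes x, column 2 becomes y
xy_lst <- xy.coords(lst)   # by name: component "x" becomes x

print(xy_df$x)    # values of the column that was named "y"
print(xy_lst$x)   # values of the component named "x"
```

So the same data can end up with its axes swapped depending on whether it arrives as a data.frame or as a list, which is the awkwardness the message points at.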
Re: [Rd] more on bug 7924
Hi,

On Jun 5, 2006, at 8:02 AM, Peter Dalgaard wrote:

> Not quite: more like freshly-made-not-assigned,
> assigned-but-only-once, assigned-maybe-more-than-once.

So for my particular case ...

> call2 <- Quote(f(arg[[1]]))[c(1,2,2,2)]

> 0: 0x9e7d18 LANGSXP Object with length 1, named 1
> f(arg[[1]], arg[[1]], arg[[1]])
> 1: 0xa2b488 SYMSXP name at 0xa29408, value at 0x5087e0, named 0
> f
> 1: 0x9e9880 LANGSXP Object with length 1, named 0
> arg[[1]]
> 2: 0x508738 SYMSXP name at 0x51c788, value at 0x527690, named 0
> `[[`
> 2: 0xc37cc8 SYMSXP name at 0xc376e8, value at 0x5087e0, named 0
> arg
> 2: 0xf94cb8 REALSXP Object, length 1, starting at 0xf94ce0, named 0
> 1
> 1: 0x9e9880 LANGSXP Object with length 1, named 0
> arg[[1]]
> 2: 0x508738 SYMSXP name at 0x51c788, value at 0x527690, named 0
> `[[`
> 2: 0xc37cc8 SYMSXP name at 0xc376e8, value at 0x5087e0, named 0
> arg
> 2: 0xf94cb8 REALSXP Object, length 1, starting at 0xf94ce0, named 0
> 1
> 1: 0x9e9880 LANGSXP Object with length 1, named 0
> arg[[1]]
> 2: 0x508738 SYMSXP name at 0x51c788, value at 0x527690, named 0
> `[[`
> 2: 0xc37cc8 SYMSXP name at 0xc376e8, value at 0x5087e0, named 0
> arg
> 2: 0xf94cb8 REALSXP Object, length 1, starting at 0xf94ce0, named 0
> 1

The highest level LANGSXP list object has been named (1) but the sub
LANGSXP object stored at 0x9e9880 is assigned to 3 places in the same
top level LANGSXP list, and yet the named value of that subobject is 0
(in all cases).

According to your descriptions above, I would consider this an "error"
in setting named when the object is created.  Is my interpretation
correct?

If so, for this particular case, I think that reused subobject should
have had named = 2 since it is used in 3 places in the list?

What would be the proper setting for the named value for all of the
sub-sub-objects of that 0x9e9880 object?

Also what are the rules about having a subobject with named = 2 inside
a higher level object?  Should that force the higher level object's
named value to be 2 or can it stay 1?

Any help in understanding this would be greatly appreciated since I
cannot track down a bug when I am not sure what the "correct" values/
answers really should be, and nothing in the R-lang.pdf or R-exts.pdf
seems to explain this concept in any detail, especially for compound
objects (it is much simpler to understand for objects that are just
vectors of reals, integers, or strings, since there really is only one
"object" that has a data area which stores all of the values (and AFAIK
none of those ints, reals, or strings stored inside the vector object
has a named property of its own)).

So would someone please explain what the "proper" values for all of
the named values for all of the objects in this "call2" object should
be immediately after it is created?

Thanks,

Kevin
Re: [Rd] more on bug 7924
On Mon, 5 Jun 2006, Hin-Tak Leung wrote:

> I see you have found the sexptype listing in Rinternals.h . I believe
> it was in one of R's FAQ's about R's garbage collector - it doesn't do
> proper reference-counted garbage collection as you suggested, but does
> a sort of poor man's garbage collection, by classifying entities in
> *only* 3 catergories - not-in-use, in-used-by-one, and in-used
> by-more-than-one.

AFAIK the NAMED field is not used at all by the garbage collector and
that certainly isn't what it's there for.  The garbage collector is a
generational mark-and-sweep collector, not reference counted at all.

NAMED is about preserving the "call-by-value illusion" -- an object
with NAMED=0 or 1 can be modified without copying it -- which seems to
be exactly the problem in PR#7924.

    -thomas

> Kevin B. Hendricks wrote:
>> Hi,
>>
>> Okay I threw together a quick dump_object routine and found something
>> that I don't think is correct when call2 is created.
>>
>>  > call2 <- Quote(f(arg[[1]]))[c(1,2,2,2)]
>>  > get("call2")
>>
>> I use the do_get break to find the SEXP value I want
>>
>> Breakpoint 1, do_get (call=0xc2d530, op=0x52bd30, args=0x9e83a8,
>>     rho=Variable "rho" is not available.
>> ) at ../../../r-devel/r-devel/R/src/main/envir.c:1668
>> 1668        if (PRIMVAL(op)) { /* have get(.) */
>>
>> (gdb) print *rval
>> $2 = {sxpinfo = {type = 6, obj = 0, named = 1, gp = 0, mark = 0,
>>     debug = 0, trace = 0, fin = 0, gcgen = 0, gccls = 0},
>>   attrib = 0x508818, gengc_next_node = 0x9e7d50,
>>   gengc_prev_node = 0x9e7ce0, u = {primsxp = {offset = 10663048},
>>     symsxp = {pname = 0xa2b488, value = 0x9e7ce0, internal = 0x508818},
>>     listsxp = {carval = 0xa2b488, cdrval = 0x9e7ce0, tagval = 0x508818},
>>     envsxp = {frame = 0xa2b488, enclos = 0x9e7ce0, hashtab = 0x508818},
>>     closxp = {formals = 0xa2b488, body = 0x9e7ce0, env = 0x508818},
>>     promsxp = {value = 0xa2b488, expr = 0x9e7ce0, env = 0x508818}}}
>>
>> Now I invoke my own dump routine which keeps track of recursion level
>> and will dump the named and other things inside the newly created
>> object, the format of the output is
>>
>> recursion level: SEXP X TYPEOF(X) and then some object specific values
>>
>> (gdb) call dump_object(rval, 0)
>>
>> 0: 0x9e7d18 LANGSXP Object with length 1, named 1
>> f(arg[[1]], arg[[1]], arg[[1]])
>> 1: 0xa2b488 SYMSXP name at 0xa29408, value at 0x5087e0, named 0
>> f
>> 1: 0x9e9880 LANGSXP Object with length 1, named 0
>> arg[[1]]
>> 2: 0x508738 SYMSXP name at 0x51c788, value at 0x527690, named 0
>> `[[`
>> 2: 0xc37cc8 SYMSXP name at 0xc376e8, value at 0x5087e0, named 0
>> arg
>> 2: 0xf94cb8 REALSXP Object, length 1, starting at 0xf94ce0, named 0
>> 1
>> 1: 0x9e9880 LANGSXP Object with length 1, named 0
>> arg[[1]]
>> 2: 0x508738 SYMSXP name at 0x51c788, value at 0x527690, named 0
>> `[[`
>> 2: 0xc37cc8 SYMSXP name at 0xc376e8, value at 0x5087e0, named 0
>> arg
>> 2: 0xf94cb8 REALSXP Object, length 1, starting at 0xf94ce0, named 0
>> 1
>> 1: 0x9e9880 LANGSXP Object with length 1, named 0
>> arg[[1]]
>> 2: 0x508738 SYMSXP name at 0x51c788, value at 0x527690, named 0
>> `[[`
>> 2: 0xc37cc8 SYMSXP name at 0xc376e8, value at 0x5087e0, named 0
>> arg
>> 2: 0xf94cb8 REALSXP Object, length 1, starting at 0xf94ce0, named 0
>> 1
>>
>> Notice how each LANGSXP subobject reuses the exact same objects/
>> addresses (notice the address are the same) 3 times (one for each
>> entry) but the named value is always 0 for all of them (even though
>> that address is being re-used (effectively "named") each time.
>>
>> 1: 0x9e9880 LANGSXP Object with length 1, named 0
>> arg[[1]]
>> 2: 0x508738 SYMSXP name at 0x51c788, value at 0x527690, named 0
>> `[[`
>> 2: 0xc37cc8 SYMSXP name at 0xc376e8, value at 0x5087e0, named 0
>> arg
>> 2: 0xf94cb8 REALSXP Object, length 1, starting at 0xf94ce0, named 0
>> 1
>>
>> Shouldn't all 3 copies have named set to 1 and not zero since they
>> are all pointing to the same pieces of memory?  And shouldn't that
>> force the top level LANGSXP object to have named of 2 in this case
>> and not its current value of 1.
>>
>> How should any assignment to any of those 3 places in the LANGSXP
>> list ever know they must be duplicated first when all of the named
>> values are 0 even though they all point to the same block of memory?
>>
>> I truly do not understand how named is being used in this case.  Why
>> don't we simply refcount all allocated objects so we know what the
>> true value of named must be?  How else can we get that information?
>>
>> Hints welcome especially to reading material that explains more on
>> this stuff.
>>
>> Thanks,
>>
>> Kevin
[Rd] grep() and factors
Hi all,

Based upon an offlist communication this morning, I am somewhat confused
(more than I usually am on most Monday mornings...) about the use of
grep() with factors as the 'x' argument.

The argument guidance in ?grep indicates:

  x, text    a character vector where matches are sought.  Coerced to
             character if possible.

and in the Details section:

  Arguments which should be character strings or character vectors are
  coerced to character if possible.

The wording of both would seem to reasonably lead to the conclusion that
a factor could be coerced to a character vector by the use of
as.character(FACTOR).

In tracing through the C code in character.c for do_grep(), which in
turn calls coerceVector() in coerce.c, unless I am mis-reading the code
(always possible), I don't see an indication that a factor would be
coerced to a character vector.

Since a factor -> character coercion would seem, at face value, the
most logical coercion to take place when using grep(), I am curious if
I am missing something, or if perhaps ?grep needs to be more clear
about the coercions that will or might take place.  Perhaps even the
consideration of an error message if a factor is passed as the 'x'
argument, if indeed the coercion would not take place.

Perhaps the easiest example here might be:

# On R Version 2.3.1 (2006-06-01) on FC5

> grep("[a-z]", letters)
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
[23] 23 24 25 26

> grep("[a-z]", factor(letters))
numeric(0)

Thanks for any comments or any virtual rotten tomatoes coming my way at
high speed.  :-)

Marc Schwartz
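Whatever grep() does with a factor in a given R version, the explicit coercion mentioned above is unambiguous. The short sketch below shows why the distinction matters: a factor is stored as integer codes plus a levels attribute, so "coerced to character" could mean the labels or the codes. The variable names are illustrative.

```r
f <- factor(letters)

typeof(f)               # the underlying storage is integer codes
head(as.integer(f))     # the codes
head(as.character(f))   # the labels, which is what a user expects to match

# Explicit coercion gives the same answer as grepping the plain vector.
hits <- grep("[a-z]", as.character(f))
stopifnot(identical(hits, grep("[a-z]", letters)))
```

Writing `as.character()` at the call site sidesteps the question of which coercion grep() performs internally.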
Re: [Rd] xy.coords(MATRIX) bug in code or documentation (PR#8937)
> "FrPi" == François Pinard <[EMAIL PROTECTED]>
>     on Mon, 5 Jun 2006 08:11:20 -0400 writes:

    FrPi> [Martin Maechler]
    >> Thanks a lot, Francois, for your careful reading and
    >> careful report!

    FrPi> Thanks for being receptive! :-)

    FrPi> Another problem in the same area: the documentation
    FrPi> lies about how the function acts when given a
    FrPi> data.frame.  From the code, a data.frame is processed
    FrPi> as if it was a matrix.  From the documentation, while
    FrPi> the data.frame is not mentioned explicitly, it is
    FrPi> implied in the paragraph explaining how a list is
    FrPi> processed (because a data.frame is a list).  Some
    FrPi> reconciliation is needed here as well.

    >> [ Though I do slightly mind the word "lies" since I do
    >> value the 9th commandment..  Not telling the truth
    >> *accidentally* is not "lying" ]

    FrPi> Of course.  You know, I merely forgot a smiley, there.
    FrPi> You are right in that we should try a bit to spare the
    FrPi> extreme susceptibility of some people!  On the other
    FrPi> hand, there should be limits to the feeling that we
    FrPi> are always walking on eggs while writing to R-help or
    FrPi> R-devel, some comfort and happiness is needed, after
    FrPi> all. :-)

    >> Yes; in this case, I propose to just amend the
    >> documentation explaining that data.frames are treated
    >> "as matrices".

    FrPi> Let me add a small comment about data.frames.  It
    FrPi> would be a bit awkward if a data.frame had two columns
    FrPi> "y" and "x" (in that order) and if they were
    FrPi> interpreted differently after matrix coercion.  I
    FrPi> guess the problem would not exist if data.frames were
    FrPi> really interpreted as lists, the "x" and "y" columns
    FrPi> could even appear anywhere (untested).

You are right, but, as I have now checked, in S xy.coords() also
behaves as it does now in R, BTW, in both respects
{data.frames and ">2-column" matrices and d.f.s}.

Hence -- for backward compatibility reasons -- I now tend to agree
with Duncan Murdoch, and would not change xy.coords() behavior
at all, but simply amend the documentation.

Martin
Re: [Rd] grep() and factors
On Mon, 5 Jun 2006, Marc Schwartz (via MN) wrote:

> Based upon an offlist communication this morning, I am somewhat confused
> (more than I usually am on most Monday mornings...) about the use of
> grep() with factors as the 'x' argument.
> ...
> > grep("[a-z]", letters)
>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
> [23] 23 24 25 26
>
> > grep("[a-z]", factor(letters))
> numeric(0)

I was recently surprised by this also.  In addition, if
R's grep did support factors in this way, what sort of
object (factor or character) should it return when value=T?
I recently changed Splus's grep to return a character vector in
that case.

   Splus> grep("[def]", letters[26:1])
   [1] 21 22 23
   Splus> grep("[def]", factor(letters[26:1], levels=letters[26:1]))
   [1] 21 22 23
   Splus> grep("[def]", letters[26:1], value=T)
   [1] "f" "e" "d"
   Splus> grep("[def]", factor(letters[26:1], levels=letters[26:1]), value=T)
   [1] "f" "e" "d"
   Splus> class(.Last.value)
   [1] "character"

R does this when grepping an integer vector.

   R> grep("1", 0:11, value=T)
   [1] "1"  "10" "11"

help(grep) says it returns "the matching elements themselves", but
doesn't say if "themselves" means before or after the conversion to
character.

Bill Dunlap
Insightful Corporation
bill at insightful dot com
360-428-8146

 "All statements in this message represent the opinions of the author and do
 not necessarily reflect Insightful Corporation policy or position."
Re: [Rd] xy.coords(MATRIX) bug in code or documentation (PR#8937)
[Martin Maechler]

>Hence -- for back compatibility reasons -- I now tend to agree
>with Duncan Murdoch, and would not change xy.coords() behavior
>at all, but simply amend the documentation.

It's fine, thanks a lot.

-- 
François Pinard   http://pinard.progiciels-bpi.ca
Re: [Rd] grep() and factors
On Mon, 2006-06-05 at 13:45 -0700, Bill Dunlap wrote:
> On Mon, 5 Jun 2006, Marc Schwartz (via MN) wrote:
>
> > Based upon an offlist communication this morning, I am somewhat confused
> > (more than I usually am on most Monday mornings...) about the use of
> > grep() with factors as the 'x' argument.
> > ...
> > > grep("[a-z]", letters)
> >  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
> > [23] 23 24 25 26
> >
> > > grep("[a-z]", factor(letters))
> > numeric(0)
>
> I was recently surprised by this also.  In addition, if
> R's grep did support factors in this way, what sort of
> object (factor or character) should it return when value=T?
> I recently changed Splus's grep to return a character vector in
> that case.
>
>    Splus> grep("[def]", letters[26:1])
>    [1] 21 22 23
>    Splus> grep("[def]", factor(letters[26:1], levels=letters[26:1]))
>    [1] 21 22 23
>    Splus> grep("[def]", letters[26:1], value=T)
>    [1] "f" "e" "d"
>    Splus> grep("[def]", factor(letters[26:1], levels=letters[26:1]), value=T)
>    [1] "f" "e" "d"
>    Splus> class(.Last.value)
>    [1] "character"
>
> R does this when grepping an integer vector.
>    R> grep("1", 0:11, value=T)
>    [1] "1"  "10" "11"
> help(grep) says it returns "the matching elements themselves", but
> doesn't say if "themselves" means before or after the conversion to
> character.

Bill,

My first inclination for the return value when used on a factor would be
the indexed factor elements where grep() would otherwise simply return
the indices.  This would also maintain the factor levels from the
original source factor since "[".factor would normally retain these when
drop = FALSE.

For example:

# Return the indexed values as would otherwise be done
# in grep() if the factor to character coercion takes place:
# Use the same indices 21:23 as above

> factor(letters[26:1], levels = letters[26:1])[21:23]
[1] f e d
Levels: z y x w v u t s r q p o n m l k j i h g f e d c b a

From my read of the C code in do_grep() in character.c (again, if
correct), when 'value = TRUE', the C code appears to first get the
indices and then build the returned vector from the indexed values from
the source vector in a for() loop.  So this should not be a problem
philosophically.

However, given your example of the coercion of integers, perhaps with
grep() at least, consistent behavior would dictate that return values
are always character vectors.  These could then be coerced manually back
to a factor, using the original levels, as may be required:

> factor.letters <- factor(letters[26:1], levels=letters[26:1])

> factor.letters
 [1] z y x w v u t s r q p o n m l k j i h g f e d c b a
Levels: z y x w v u t s r q p o n m l k j i h g f e d c b a

> grep("[def]", as.character(factor.letters))
[1] 21 22 23

> res <- grep("[def]", as.character(factor.letters), value = TRUE)

> res
[1] "f" "e" "d"

> factor(res, levels = levels(factor.letters))
[1] f e d
Levels: z y x w v u t s r q p o n m l k j i h g f e d c b a

Which of course is the same result I proposed initially above.

I could be convinced either way.  The concern, of course, is that (given
the offlist replies I have received today) even experienced users are
getting bitten by the current behavior versus their intuitive
expectations, which are at least loosely supported by the documentation.

HTH,

Marc Schwartz
Re: [Rd] grep() and factors
Marc Schwartz (via MN) wrote:
> On Mon, 2006-06-05 at 13:45 -0700, Bill Dunlap wrote:
>
> [...]
>
> I could be convinced either way. The concern of course being that (given
> the offlist replies I have received today) even experienced users are
> getting bitten by the current behavior versus their intuitive
> expectations, which are at least loosely supported by the documentation.

I'll chime in on-list to say that I have had the same experience with
expecting grep to coerce to text.
Despite the question of return values, I think of grep (not equivalent
to the Unix command, I understand, but it does have the same name) as
operating on "text", not the factor levels themselves. Not a big deal,
but it does sometimes lead to hard-to-track bugs if one is not careful
to put in as.character all the time.

Sean
Re: [Rd] grep() and factors
On Mon, 5 Jun 2006, Marc Schwartz (via MN) wrote:

> > > > grep("[a-z]", factor(letters))
> > > numeric(0)
> >
> > I was recently surprised by this also.  In addition, if
> > R's grep did support factors in this way, what sort of
> > object (factor or character) should it return when value=T?
> > I recently changed Splus's grep to return a character vector in
> > that case.
> >
> >    Splus> grep("[def]", letters[26:1])
> >    [1] 21 22 23
> >    Splus> grep("[def]", factor(letters[26:1], levels=letters[26:1]))
> >    [1] 21 22 23
> >    Splus> grep("[def]", letters[26:1], value=T)
> >    [1] "f" "e" "d"
> >    Splus> grep("[def]", factor(letters[26:1], levels=letters[26:1]),
> >         value=T)
> >    [1] "f" "e" "d"
> >    Splus> class(.Last.value)
> >    [1] "character"
> >
> > R does this when grepping an integer vector.
> >    R> grep("1", 0:11, value=T)
> >    [1] "1"  "10" "11"
> > help(grep) says it returns "the matching elements themselves", but
> > doesn't say if "themselves" means before or after the conversion to
> > character.
>
> Bill,
>
> My first inclination for the return value when used on a factor would be
> the indexed factor elements where grep() would otherwise simply return
> the indices. This would also maintain the factor levels from the
> original source factor since "[".factor would normally retain these when
> drop = FALSE.

That would be my first inclination also.  I would have expected the
output of
   grep(pattern, text, value=TRUE)
to be identical to that of
   text[grep(pattern, text, value=FALSE)]
no matter what class text has.

No end users have seen this in Splus so we can change it to anything,
but we want to keep it the same as R's.

> I could be convinced either way. The concern of course being that (given
> the offlist replies I have received today) even experienced users are
> getting bitten by the current behavior versus their intuitive
> expectations, which are at least loosely supported by the documentation.
>
> HTH,
>
> Marc Schwartz

Bill Dunlap
Insightful Corporation
bill at insightful dot com
360-428-8146

 "All statements in this message represent the opinions of the author and do
 not necessarily reflect Insightful Corporation policy or position."
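The identity described in the message above, that grep(pattern, text, value=TRUE) should match text[grep(pattern, text, value=FALSE)] whatever the class of text, can be checked directly. The following is a minimal sketch added for illustration (the variable names are made up); it uses only the character and integer cases shown in the thread's transcripts and makes no claim about factors.

```r
# Character input: the two forms agree
x_chr <- letters[26:1]
by_value <- grep("[def]", x_chr, value = TRUE)
by_index <- x_chr[grep("[def]", x_chr, value = FALSE)]
agree_chr <- identical(by_value, by_index)

# Integer input: value = TRUE returns character strings,
# while indexing the original vector returns integers
x_int <- 0:11
int_by_value <- grep("1", x_int, value = TRUE)
int_by_index <- x_int[grep("1", x_int)]
agree_int <- identical(int_by_value, int_by_index)

print(c(character = agree_chr, integer = agree_int))
```

The integer case is where the two forms part ways: the matched values are the same, but their types differ.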
Re: [Rd] grep() and factors
On 6/5/06, Bill Dunlap <[EMAIL PROTECTED]> wrote:
> On Mon, 5 Jun 2006, Marc Schwartz (via MN) wrote:
>
> [...]
>
> > Bill,
> >
> > My first inclination for the return value when used on a factor would be
> > the indexed factor elements where grep() would otherwise simply return
> > the indices. This would also maintain the factor levels from the
> > original source factor since "[".factor would normally retain these when
> > drop = FALSE.
>
> That would be my first inclination also.  I would have expected the output of
>    grep(pattern, text, value=TRUE)
> to be identical to that of
>    text[grep(pattern, text, value=FALSE)]
> no matter what class text has.
>
> No end users have seen this in Splus so we can change it to anything,
> but we want to keep it the same as R's.
>
> > I could be convinced either way.
> > The concern of course being that (given
> > the offlist replies I have received today) even experienced users are
> > getting bitten by the current behavior versus their intuitive
> > expectations, which are at least loosely supported by the documentation.

If non-character text arguments are accepted, I would have expected
that they be coerced to character, so that

   grep(pattern, text, ...)

would return the same result as

   grep(pattern, as.character(text), ...)
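The equivalence proposed above can be obtained in user code by making the coercion explicit, whatever a given version of grep() does with factors. This is a minimal sketch added for illustration: `grep_chr` is a hypothetical helper name, not an existing R function, and the factor is the one used earlier in the thread.

```r
# Hypothetical helper: always match against the character form of `text`
grep_chr <- function(pattern, text, ...) {
  grep(pattern, as.character(text), ...)
}

f <- factor(letters[26:1], levels = letters[26:1])

idx <- grep_chr("[def]", f)                 # positions of the matches
val <- grep_chr("[def]", f, value = TRUE)   # matches as character strings

# Restore a factor with the original levels, if one is wanted
val_factor <- factor(val, levels = levels(f))

print(idx)
print(val)
```

Keeping the coercion in one wrapper avoids having to remember as.character() at every call site.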
[Rd] FYI: R-2.3.1pat-win32.exe not on CRAN
FYI, the download link on CRAN for R-2.3.1pat-win32.exe and related
files seems to have been broken since at least yesterday, e.g.
http://cran.at.r-project.org/bin/windows/base/R-2.3.1pat-win32.exe.

/Henrik
Re: [Rd] FYI: R-2.3.1pat-win32.exe not on CRAN
On 6/5/2006 10:04 PM, Henrik Bengtsson wrote:
> FYI, the download link on CRAN for R-2.3.1pat-win32.exe and related
> files seems to be broken at least since yesterday, e.g.
> http://cran.at.r-project.org/bin/windows/base/R-2.3.1pat-win32.exe.

Thanks, it was a typo in the upload script. Fixed now, so the files
should be available within a day or so.

Duncan Murdoch