Re: [Rd] xy.coords(MATRIX) bug in code or documentation (PR#8937)

2006-06-05 Thread maechler
> "FrPi" == François Pinard <[EMAIL PROTECTED]>
> on Sun,  4 Jun 2006 06:27:53 +0200 (CEST) writes:

FrPi> Hi, people.
FrPi> xy.coords() does not behave like its documentation says, when given some
FrPi> matrices.   ?xy.coords says:

FrPi> If 'y' is 'NULL' and 'x' is a [...] formula [...] list [...]
FrPi> time series [...] matrix with two columns [...]

FrPi> In any other case, the 'x' argument is coerced to a vector and
FrPi> returned as *y* component [...]

FrPi> Now, consider this short transcript:

FrPi> ==>
FrPi> > as.vector(rbind(1, 2, 3))
FrPi> [1] 1 2 3
FrPi> > as.vector(cbind(1, 2, 3))
FrPi> [1] 1 2 3
FrPi> > xy.coords(rbind(1, 2, 3))
FrPi> $x
FrPi> [1] 1 2 3

FrPi> $y
FrPi> [1] 1 2 3

FrPi> $xlab
FrPi> [1] "Index"

FrPi> $ylab
FrPi> NULL

FrPi> > xy.coords(cbind(1, 2, 3))
FrPi> $x
FrPi> [1] 1

FrPi> $y
FrPi> [1] 2

FrPi> $xlab
FrPi> [1] "[,1]"

FrPi> $ylab
FrPi> [1] "[,2]"

FrPi> ==<

FrPi> A 3 x 1 matrix and a 1 x 3 matrix both fall in the "In
FrPi> any other case" category, but it seems that only the 3 x 1
FrPi> is really "coerced to a vector".

Yes, so you are right: there is a bug.

FrPi> The R code for xy.coords() suggests that the documentation should read
FrPi> "matrix with at least two columns" instead of "matrix with two columns".

FrPi> As a user, I was really expecting the coercion to a
FrPi> vector to happen.  What triggered me into exploring
FrPi> this problem is the fact that plot() showed a single
FrPi> point where I was expecting many.  If you decide that
FrPi> the code is right and the documentation is wrong, then
FrPi> I would suggest that the code be a bit more friendly,
FrPi> by at least issuing some warning if a matrix with more
FrPi> than two columns is given.

I agree.  

I'm not sure what the change should be -- and am asking for useR
feedback here :

1) give an error in the case of a matrix (or data.frame) with '> 2' columns
2) give a warning, and use the first 2 columns -- as it happens now
3) silently coerce to vector -- as the current documentation claims.

The cleanest would be "1)", but given backward compatibility, etc.,
my tendency would be towards "2)".
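
A minimal sketch of what option 2) could look like from the user's side, written as a wrapper rather than a change to R itself. The name xy_coords_warn is hypothetical and not part of R; the sketch assumes xy.coords() keeps using the first two columns, as in the transcript above.

```r
# Hypothetical wrapper (not part of R): warn when columns beyond the
# second would be silently ignored, then defer to xy.coords().
xy_coords_warn <- function(x, ...) {
  if ((is.matrix(x) || is.data.frame(x)) && ncol(x) > 2) {
    warning("only the first two of ", ncol(x), " columns are used")
  }
  xy.coords(x, ...)
}

m <- cbind(1, 2, 3)                        # the 1 x 3 matrix from the report

res <- suppressWarnings(xy_coords_warn(m)) # first two columns are used
vec <- xy.coords(as.vector(m))             # explicit coercion: three points
```

Coercing explicitly with as.vector() before the call gives the "many points" result the report expected, whatever is decided for the matrix case.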


FrPi> Another problem in the same area: the documentation lies about how the
FrPi> function acts when given a data.frame.  From the code, a data.frame is
FrPi> processed as if it was a matrix.  From the documentation, while the
FrPi> data.frame is not mentioned explicitly, it is implied in the paragraph
FrPi> explaining how a list is processed (because a data.frame is a list).
FrPi> Some reconciliation is needed here as well.

Yes; in this case, I propose to just amend the documentation,
explaining that data.frames are treated "as matrices".

Thanks a lot, Francois, for your careful reading and
careful report!
   [ Though I do slightly mind the word "lies" since
 I do value the 9th commandment..
 Not telling the truth *accidentally* is not "lying" ] 

Martin Maechler, ETH Zurich

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] more on bug 7924

2006-06-05 Thread Hin-Tak Leung
I see you have found the sexptype listing in Rinternals.h. I believe
it was in one of R's FAQs about R's garbage collector: it doesn't do
proper reference-counted garbage collection as you suggested, but does
a sort of poor man's garbage collection, by classifying entities in
*only* 3 categories: not-in-use, in-use-by-one, and
in-use-by-more-than-one.

Kevin B. Hendricks wrote:
> Hi,
> 
> Okay I threw together a quick dump_object routine and found something  
> that I don't think is correct when call2 is created.
> 
>  > call2 <- Quote(f(arg[[1]]))[c(1,2,2,2)]
>  > get("call2")
> 
> I use the do_get break to find the SEXP value I want
> 
> Breakpoint 1, do_get (call=0xc2d530, op=0x52bd30, args=0x9e83a8,  
> rho=Variable "rho" is not available.
> ) at ../../../r-devel/r-devel/R/src/main/envir.c:1668
> 1668        if (PRIMVAL(op)) { /* have get(.) */
> 
> 
> (gdb) print *rval
> $2 = {sxpinfo = {type = 6, obj = 0, named = 1, gp = 0, mark = 0,  
> debug = 0, trace = 0, fin = 0, gcgen = 0, gccls = 0}, attrib =  
> 0x508818, gengc_next_node = 0x9e7d50,
>gengc_prev_node = 0x9e7ce0, u = {primsxp = {offset = 10663048},  
> symsxp = {pname = 0xa2b488, value = 0x9e7ce0, internal = 0x508818},  
> listsxp = {carval = 0xa2b488,
>cdrval = 0x9e7ce0, tagval = 0x508818}, envsxp = {frame =  
> 0xa2b488, enclos = 0x9e7ce0, hashtab = 0x508818}, closxp = {formals =  
> 0xa2b488, body = 0x9e7ce0,
>env = 0x508818}, promsxp = {value = 0xa2b488, expr = 0x9e7ce0,  
> env = 0x508818}}}
> 
> 
> Now I invoke my own dump routine which keeps track of recursion level  
> and will dump the named and other things inside the newly created  
> object, the format of the output is
> 
> recursion level: SEXP X TYPEOF(X) and then some object specific values
> 
> 
> (gdb) call dump_object(rval, 0)
> 
> 
> 0: 0x9e7d18 LANGSXP Object with length 1, named 1
>  f(arg[[1]], arg[[1]], arg[[1]])
> 1: 0xa2b488 SYMSXP  name at 0xa29408, value at 0x5087e0, named 0
>  f
> 1: 0x9e9880 LANGSXP Object with length 1, named 0
>  arg[[1]]
> 2: 0x508738 SYMSXP  name at 0x51c788, value at 0x527690, named 0
>  `[[`
> 2: 0xc37cc8 SYMSXP  name at 0xc376e8, value at 0x5087e0, named 0
>  arg
> 2: 0xf94cb8 REALSXP Object, length 1, starting at 0xf94ce0, named 0
>  1
> 1: 0x9e9880 LANGSXP Object with length 1, named 0
>  arg[[1]]
> 2: 0x508738 SYMSXP  name at 0x51c788, value at 0x527690, named 0
>  `[[`
> 2: 0xc37cc8 SYMSXP  name at 0xc376e8, value at 0x5087e0, named 0
>  arg
> 2: 0xf94cb8 REALSXP Object, length 1, starting at 0xf94ce0, named 0
>  1
> 1: 0x9e9880 LANGSXP Object with length 1, named 0
>  arg[[1]]
> 2: 0x508738 SYMSXP  name at 0x51c788, value at 0x527690, named 0
>  `[[`
> 2: 0xc37cc8 SYMSXP  name at 0xc376e8, value at 0x5087e0, named 0
>  arg
> 2: 0xf94cb8 REALSXP Object, length 1, starting at 0xf94ce0, named 0
>  1
> 
> 
> 
> Notice how each LANGSXP subobject reuses the exact same objects
> (the addresses are the same) 3 times (once for each entry), but the
> named value is always 0 for all of them, even though that address is
> being re-used (effectively "named") each time.
> 
> 1: 0x9e9880 LANGSXP Object with length 1, named 0
>  arg[[1]]
> 2: 0x508738 SYMSXP  name at 0x51c788, value at 0x527690, named 0
>  `[[`
> 2: 0xc37cc8 SYMSXP  name at 0xc376e8, value at 0x5087e0, named 0
>  arg
> 2: 0xf94cb8 REALSXP Object, length 1, starting at 0xf94ce0, named 0
>  1
> 
> 
> Shouldn't all 3 copies have named set to 1 and not zero since they
> are all pointing to the same pieces of memory?  And shouldn't that
> force the top level LANGSXP object to have named of 2 in this case
> and not its current value of 1?
> 
> 
> How should any assignment to any of those 3 places in the LANGSXP  
> list ever know they must be duplicated first when all of the named  
> values are 0 even though they all  point to the same block of memory?
> 
> I truly do not understand how named is being used in this case.  Why  
> don't we simply refcount all allocated objects so we know what the  
> true value of named must be?  How else can we get that information?
> 
> Hints welcome especially to reading material that explains more on  
> this stuff.
> 
> Thanks,
> 
> Kevin
> 


Re: [Rd] Patch: context stack size in gram.y

2006-06-05 Thread Hin-Tak Leung
Hmm, I think you can "flatten" the for-loop with something like this, 
without modifying R:

nAll <- length01 * length02 * length03   # * length04 * ...
for (ParamAll in seq_len(nAll)) {

    k <- ParamAll - 1                      # zero-based flat index
    idx1 <- k %/% (length02 * length03)    # * length04 * ...; slowest-varying
    Param01 <- Param01Set[idx1 + 1]
    idx2 <- (k %/% length03) %% length02   # divisor: product of the later lengths
    Param02 <- Param02Set[idx2 + 1]
    ...
    central code
    ...
}

This is the same idea as addressing matrix element [i,j] as
element[i*length_j + j], generalized to more dimensions. Then you won't
have over 50 nested for-loops. With something that deeply nested, I would
also compute those idx values in C for speed (or use %/% and %% properly;
it is too early in the morning and my head is a bit fuzzy at the
moment...), or even select Param01, etc. there, e.g.

Param01 <- .Call("my_element_selector", c(Param01Set, Param02Set ...), 
ParamAll)
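
A small worked sketch of the flattening idea in plain R, checked against nested loops. The three parameter sets and the helper decode_index are made up for illustration; the first set varies slowest, as it would in nested for-loops.

```r
# Hypothetical parameter sets, for illustration only.
Param01Set <- c("a", "b")
Param02Set <- c(10, 20, 30)
Param03Set <- c(TRUE, FALSE)
lens <- c(length(Param01Set), length(Param02Set), length(Param03Set))

# Decode a one-based flat index into one-based per-set indices
# (mixed-radix decoding, last set varying fastest).
decode_index <- function(flat, lens) {
  k <- flat - 1L                       # zero-based flat index
  idx <- integer(length(lens))
  for (j in rev(seq_along(lens))) {
    idx[j] <- k %% lens[j] + 1L        # remainder picks the index in set j
    k <- k %/% lens[j]                 # quotient carries on to the earlier sets
  }
  idx
}

# One row of indices per combination, from a single flat loop.
flat_order <- t(sapply(seq_len(prod(lens)), decode_index, lens = lens))

# Reference: the same combinations enumerated by nested loops.
nested_order <- NULL
for (i in seq_along(Param01Set))
  for (j in seq_along(Param02Set))
    for (l in seq_along(Param03Set))
      nested_order <- rbind(nested_order, c(i, j, l))
```

Inside the flat loop, Param01Set[idx[1]], Param02Set[idx[2]], and so on select the current values, so the parser only ever sees one level of for.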

HTL

Thomas Dreibholz wrote:
> On Wednesday 31 May 2006 15:26, Prof Brian Ripley wrote:
>> On Wed, 31 May 2006, Thomas Dreibholz wrote:
>>> Hi!
>>>
>>> Attached to this mail, you find a patch for gram.y setting a #define
>>> CONTEXT_STACK_SIZE for the context stack size and replacing the following
>>> constants 50 and 49 by CONTEXT_STACK_SIZE and CONTEXT_STACK_SIZE-1. The
>>> new #define makes setting the stack size much easier; I also have
>>> increased it to 500, because 50 is too small (we use R to iterate through
>>> sets of simulation parameters, which requires a context stack size of
>>> around 100).
>> I think you will have to explain in detail why you need this, when for a
>> decade R users have not reported a need for it.  It is not related to
>> iteration in R, rather to the depth of recursion needed to parse R code.
> 
> We use R to create input files for OMNeT++ simulations. The simulation 
> parameters are defined like this:
> param01Set <- c(...)
> param02Set <- c(...)
> ...
> paramXYSet <- c(...)
> Most of these sets only contain a single element.
> 
> The input file generation, which should be usable for all simulations, works 
> as follows:
> for(param01 in param01Set) {
>   for(param02 in param02Set) {
>     ...
>       for(paramXY in paramXYSet) {
>         Generate input file for these parameter settings
>       }
>     ...
>   }
> }
> 
> The simulation has more than 50 different parameters, so a "contextstack 
> overflow" error will be the result. Increasing the context stack size in 
> gram.y solves this problem. (Clearly, only using "for" iterations for sets 
> consisting of more than one element would solve the problem - but this 
> requires a special version of the parameter generation function for every 
> simulation.)
> 
> 
> Best regards
> 
> 
> 
> 


Re: [Rd] xy.coords(MATRIX) bug in code or documentation (PR#8937)

2006-06-05 Thread Duncan Murdoch
On 6/5/2006 5:30 AM, [EMAIL PROTECTED] wrote:
>> "FrPi" == François Pinard <[EMAIL PROTECTED]>
>> on Sun,  4 Jun 2006 06:27:53 +0200 (CEST) writes:
> 
> FrPi> Hi, people.
> FrPi> xy.coords() does not behave like its documentation says, when given some
> FrPi> matrices.   ?xy.coords says:
> 
> FrPi> If 'y' is 'NULL' and 'x' is a [...] formula [...] list [...]
> FrPi> time series [...] matrix with two columns [...]
> 
> FrPi> In any other case, the 'x' argument is coerced to a vector and
> FrPi> returned as *y* component [...]
> 
> FrPi> Now, consider this short transcript:
> 
> FrPi> ==>
> FrPi> > as.vector(rbind(1, 2, 3))
> FrPi> [1] 1 2 3
> FrPi> > as.vector(cbind(1, 2, 3))
> FrPi> [1] 1 2 3
> FrPi> > xy.coords(rbind(1, 2, 3))
> FrPi> $x
> FrPi> [1] 1 2 3
> 
> FrPi> $y
> FrPi> [1] 1 2 3
> 
> FrPi> $xlab
> FrPi> [1] "Index"
> 
> FrPi> $ylab
> FrPi> NULL
> 
> FrPi> > xy.coords(cbind(1, 2, 3))
> FrPi> $x
> FrPi> [1] 1
> 
> FrPi> $y
> FrPi> [1] 2
> 
> FrPi> $xlab
> FrPi> [1] "[,1]"
> 
> FrPi> $ylab
> FrPi> [1] "[,2]"
> 
> FrPi> ==<
> 
> FrPi> A 3 x 1 matrix and a 1 x 3 matrix both fall in the "In
> FrPi> any other case" category, but it seems that only the 3 x 1
> FrPi> is really "coerced to a vector".
> 
> yes. So you are right: There's a bug
> 
> FrPi> The R code for xy.coords() suggests that the documentation should read
> FrPi> "matrix with at least two columns" instead of "matrix with two columns".
> 
> FrPi> As a user, I was really expecting the coercion to a
> FrPi> vector to happen.  What triggered me into exploring
> FrPi> this problem is the fact that plot() showed a single
> FrPi> point where I was expecting many.  If you decide that
> FrPi> the code is right and the documentation is wrong, then
> FrPi> I would suggest that the code be a bit more friendly,
> FrPi> by at least issuing some warning if more than two
> FrPi> columns are given to a matrix.
> 
> I agree.  
> 
> I'm not sure what the change should be -- and am asking for useR
> feedback here :
> 
> 1) give an error in the case of a matrix (or data.frame) with '> 2' columns
> 2) give a warning, and use the first 2 columns -- as it happens now
> 3) silently coerce to vector -- as the current documentation claims.
> 
> The most clean would be "1)", but given back compatibility, etc,
> my tendency would go into the direction of "2)".

I think the current behaviour is reasonable, and shouldn't lead to 
warnings when executed.  If you meant a warning in the man page, that 
would be fine.  

I'm not so sure about some undocumented behaviour for formulas:

x <- 1:10
y <- 11:20
z <- 21:30
xy.coords(y ~ x+z)

will set the x column to the sum of x+z.  That's not the usual way 
formulas are handled.  I'd be happier with picking out one column, or 
generating an error, instead.

I think the error message might have been the intention, because there's 
a test

  if (inherits(x, "formula") && length(x) == 3)

but length(y ~ x+z) is 3.  I think the test should be

  if (inherits(x, "formula") && length(x) == 3 && length(x[[2]]) == 1 && 
length(x[[3]]) == 1)
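
A short sketch of why the existing length test lets y ~ x+z through, and what the stricter test accepts. The helper is_simple is a hypothetical name wrapping the condition proposed above; the xy.coords() call assumes the behaviour described in this message.

```r
x <- 1:10
y <- 11:20
z <- 21:30

fm <- y ~ x + z
n_top <- length(fm)                   # 3: `~`, left-hand side, right-hand side
n_rhs <- length(fm[[3]])              # 3: `+`, x, z (a call, not a single name)
n_rhs_simple <- length((y ~ x)[[3]])  # 1: a single name

res <- xy.coords(fm)                  # the x component is the evaluated x + z

# Hypothetical helper wrapping the stricter test proposed above.
is_simple <- function(f) {
  inherits(f, "formula") && length(f) == 3 &&
    length(f[[2]]) == 1 && length(f[[3]]) == 1
}
```

With this test is_simple(y ~ x) holds and is_simple(y ~ x + z) does not, so the latter could be routed to an error.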

Duncan Murdoch
> 
> 
> FrPi> Another problem in the same area: the documentation lies about how the
> FrPi> function acts when given a data.frame.  From the code, a data.frame is
> FrPi> processed as if it was a matrix.  From the documentation, while the
> FrPi> data.frame is not mentioned explicitly, it is implied in the paragraph
> FrPi> explaining how a list is processed (because a data.frame is a list).
> FrPi> Some reconciliation is needed here as well.
> 
> Yes; in this case, I propose to just amend the documentation,
> explaining that data.frames are treated "as matrices".
> 
> Thanks a lot, Francois, for your careful reading and
> careful report!
>[ Though I do slightly mind the word "lies" since
>  I do value the 9th commandment..
>  Not telling the truth *accidentally* is not "lying" ] 
> 
> Martin Maechler, ETH Zurich
> 


Re: [Rd] more on bug 7924

2006-06-05 Thread Peter Dalgaard
Hin-Tak Leung <[EMAIL PROTECTED]> writes:

> I see you have found the sexptype listing in Rinternals.h . I believe
> it was in one of R's FAQ's about R's garbage collector - it doesn't do
> proper reference-counted garbage collection as you suggested, but does
> a sort of poor man's garbage collection, by classifying entities in
> *only* 3 categories - not-in-use, in-use-by-one, and
> in-use-by-more-than-one.

Not quite: more like freshly-made-not-assigned,
assigned-but-only-once, assigned-maybe-more-than-once. 

It's also not so much about GC as about modifiability: In the first
case, modify at will. In the 2nd case you can modify in an assignment
function. In the 3rd case, you must duplicate the object first.

Consider

f <- function(x){x[3]<-10; x}

f(rnorm(10))

b <- rnorm(10)
f(b)

In the first case, rnorm() returns an unnamed object. (Well, it could.
I'm not too sure it actually does.) When the object is passed to f(),
it gets named "x", but it is the only copy and the modification to
x[3] can proceed safely.

In the second case you first assign to b then pass b to f inside of
which it is named "x". This proceeds without duplication, so the same
object is now assigned twice. Modifying x at this point would cause b
to change as well, which would violate pass-by-value semantics. Hence,
we need to create a duplicate of x which we can safely change.
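
The visible effect of that rule can be checked from R code without looking at the internals. A minimal sketch, using a fixed vector instead of rnorm(10) so that the values are predictable:

```r
f <- function(x) { x[3] <- 10; x }

b <- c(1, 2, 3, 4, 5)
res <- f(b)     # x inside f() starts out as the same object as b

print(b)        # b is unchanged in the caller
print(res)      # only the value returned by f() carries the modification
```

Whether a copy is actually made inside f() is an implementation detail; what is guaranteed is that the caller's b keeps its value.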

Unlike Java and Tcl, R doesn't use its refcounts for garbage
collection. Partly it is because it is not a true count that you can
decrement and use to throw away the object when the count goes to
zero. However, it is also problematic to implement in R because we can
have reference loops: Consider

g <- function(){...whatever...; e <- environment(); ...}

Now when g() is called it creates an environment to hold its local
variables, and when g finishes, the environment can be destroyed,
provided that there are no references to it from other objects. In the
above case, we do have a reference to the environment,  but it comes
from an object that is inside the environment and would be destroyed
along with it. A strict refcounting system would leave such
environments hanging around forever.
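
A minimal sketch of such a self-referencing environment. Here g() returns the environment only so that the loop can be inspected from outside; in the case described above nothing outside would refer to it.

```r
g <- function() {
  e <- environment()   # a reference to the local environment, stored inside it
  e                    # returned here only so that it can be inspected
}

env <- g()
self_ref <- identical(env$e, env)   # the variable e points back at its own frame
```

A pure reference count for env could never drop to zero while e exists, and e exists for as long as env does.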

-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907



Re: [Rd] xy.coords(MATRIX) bug in code or documentation (PR#8937)

2006-06-05 Thread François Pinard
[Martin Maechler]

> Thanks a lot, Francois, for your careful reading and careful report!

Thanks for being receptive! :-)

>FrPi> Another problem in the same area: the documentation lies
>FrPi> about how the function acts when given a data.frame.  From
>FrPi> the code, a data.frame is processed as if it was a matrix.
>FrPi> From the documentation, while the data.frame is not mentioned
>FrPi> explicitly, it is implied in the paragraph explaining how
>FrPi> a list is processed (because a data.frame is a list).  Some
>FrPi> reconciliation is needed here as well.

>   [ Though I do slightly mind the word "lies" since
> I do value the 9th commandment..
> Not telling the truth *accidentally* is not "lying" ]

Of course.  You know, I merely forgot a smiley, there.  You are right in 
that we should try a bit to spare the extreme susceptibility of some 
people!  On the other hand, there should be limits to the feeling that 
we are always walking on eggs while writing to R-help or R-devel, some 
comfort and happiness is needed, after all. :-)

>Yes; in this case, I propose to just amend the documentation,
>explaining that data.frames are treated "as matrices".

Let me add a small comment about data.frames.  It would be a bit awkward 
if a data.frame had two columns "y" and "x" (in that order) and if they 
were interpreted differently after matrix coercion.  I guess the problem 
would not exist if data.frames were really interpreted as lists, the "x" 
and "y" columns could even appear anywhere (untested).
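
A quick sketch of the difference, assuming xy.coords() treats a data.frame positionally (as a matrix) and a plain list by component name, as discussed in this thread. The data frame d is made up for illustration.

```r
# Columns deliberately in the order "y", "x".
d <- data.frame(y = c(10, 20, 30), x = c(1, 2, 3))

as_df   <- xy.coords(d)            # positional: first column becomes x
as_list <- xy.coords(as.list(d))   # by name: component "x" becomes x
```

With the data.frame, the column named "y" ends up as the x coordinate; with the list, the names decide.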

-- 
François Pinard   http://pinard.progiciels-bpi.ca



Re: [Rd] more on bug 7924

2006-06-05 Thread Kevin B. Hendricks
Hi,

On Jun 5, 2006, at 8:02 AM, Peter Dalgaard wrote:

> Not quite: more like freshly-made-not-assigned,
> assigned-but-only-once, assigned-maybe-more-than-once.


So for my particular case ...

> call2 <- Quote(f(arg[[1]]))[c(1,2,2,2)]


> 0: 0x9e7d18 LANGSXP Object with length 1, named 1
>  f(arg[[1]], arg[[1]], arg[[1]])
> 1: 0xa2b488 SYMSXP  name at 0xa29408, value at 0x5087e0, named 0
>  f

> 1: 0x9e9880 LANGSXP Object with length 1, named 0
>  arg[[1]]
> 2: 0x508738 SYMSXP  name at 0x51c788, value at 0x527690, named 0
>  `[[`
> 2: 0xc37cc8 SYMSXP  name at 0xc376e8, value at 0x5087e0, named 0
>  arg
> 2: 0xf94cb8 REALSXP Object, length 1, starting at 0xf94ce0, named 0
>  1

> 1: 0x9e9880 LANGSXP Object with length 1, named 0
>  arg[[1]]
> 2: 0x508738 SYMSXP  name at 0x51c788, value at 0x527690, named 0
>  `[[`
> 2: 0xc37cc8 SYMSXP  name at 0xc376e8, value at 0x5087e0, named 0
>  arg
> 2: 0xf94cb8 REALSXP Object, length 1, starting at 0xf94ce0, named 0
>  1

> 1: 0x9e9880 LANGSXP Object with length 1, named 0
>  arg[[1]]
> 2: 0x508738 SYMSXP  name at 0x51c788, value at 0x527690, named 0
>  `[[`
> 2: 0xc37cc8 SYMSXP  name at 0xc376e8, value at 0x5087e0, named 0
>  arg
> 2: 0xf94cb8 REALSXP Object, length 1, starting at 0xf94ce0, named 0
>  1
>


The highest level LANGSXP list object has been named (1) but the sub
LANGSXP object stored at 0x9e9880 is assigned to 3 places in the same
top level LANGSXP list, and yet the named value of that subobject is
0 (in all cases).

According to your descriptions above, I would consider this an  
"error" in setting named when the object is created?  Is my  
interpretation correct?

If so, for this particular case, I think that reused subobject should  
have had named = 2 since it is used in 3 places in the list?  What  
would be the proper setting for the named value for all of the  sub- 
sub- objects of that 0x9e9880 object?

Also what are the rules about having subobject with named = 2 inside  
a higher level object?  Should that force the higher level object  
named value to be 2 or can it stay 1?

Any help in understanding this would be greatly appreciated, since I
cannot track down a bug when I am not sure what the "correct"
values/answers really should be, and nothing in R-lang.pdf or R-exts.pdf
seems to explain this concept in any detail, especially for compound
objects.  (It is much simpler to understand for objects that are just
vectors of reals, integers, or strings, since there really is only
one "object" that has a data area which stores all of the values, and
AFAIK none of those ints, reals, or strings stored inside the
vector object has a named property itself.)

So would someone please explain what the "proper" named values for
all of the objects in this "call2" object should be immediately
after it is created?

Thanks,


Kevin



Re: [Rd] more on bug 7924

2006-06-05 Thread Thomas Lumley
On Mon, 5 Jun 2006, Hin-Tak Leung wrote:

> I see you have found the sexptype listing in Rinternals.h . I believe
> it was in one of R's FAQ's about R's garbage collector - it doesn't do
> proper reference-counted garbage collection as you suggested, but does
> a sort of poor man's garbage collection, by classifying entities in
> *only* 3 categories - not-in-use, in-use-by-one, and
> in-use-by-more-than-one.

AFAIK the NAMED field is not used at all by the garbage collector and that 
certainly isn't what it's there for.  The garbage collector is a 
generational mark-and-sweep collector, not reference counted at all.

NAMED is about preserving the "call-by-value illusion" -- an object with 
NAMED=0 or 1 can be modified without copying it -- which seems to be 
exactly the problem in PR#7924.

-thomas


[Rd] grep() and factors

2006-06-05 Thread Marc Schwartz (via MN)
Hi all,

Based upon an offlist communication this morning, I am somewhat confused
(more than I usually am on most Monday mornings...) about the use of
grep() with factors as the 'x' argument.

The argument guidance in ?grep indicates:

x, text a character vector where matches are sought. Coerced to
character if possible.

and in the Details section:

Arguments which should be character strings or character vectors are
coerced to character if possible.


The wording of both would seem to reasonably lead to the conclusion that
a factor could be coerced to a character vector by the use of
as.character(FACTOR).

In tracing through the C code in character.c for do_grep(), which in
turn calls coerceVector() in coerce.c, unless I am mis-reading the code
(always possible), I don't see an indication that a factor would be
coerced to a character vector.

Since a factor -> character coercion would seem at face value, the most
logical coercion to take place when using grep(), I am curious if I am
missing something, or if perhaps ?grep needs to be more clear in the
coercions that will or might take place. Perhaps even the consideration
of an error message if a factor is passed as the 'x' argument, if indeed
the coercion would not take place.

Perhaps the easiest example here might be:

# On R Version 2.3.1 (2006-06-01) on FC5

> grep("[a-z]", letters)
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
[23] 23 24 25 26

> grep("[a-z]", factor(letters))
numeric(0)
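
Whatever grep() does with a factor in a given R version, coercing explicitly avoids the question. A minimal sketch:

```r
f <- factor(letters)

# A factor stores integer codes plus a levels attribute; as.character()
# maps the codes back to the labels before matching.
idx <- grep("[a-z]", as.character(f))
codes <- as.integer(f)   # the underlying codes, for comparison
```

Here idx indexes all 26 elements, which is the result one would expect from the wording in ?grep.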


Thanks for any comments or any virtual rotten tomatoes coming my way at
high speed.  :-)

Marc Schwartz



Re: [Rd] xy.coords(MATRIX) bug in code or documentation (PR#8937)

2006-06-05 Thread maechler
> "FrPi" == François Pinard <[EMAIL PROTECTED]>
> on Mon, 5 Jun 2006 08:11:20 -0400 writes:

FrPi> [Martin Maechler]
>> Thanks a lot, Francois, for your careful reading and
>> careful report!

FrPi> Thanks for being receptive! :-)

FrPi> Another problem in the same area: the documentation
FrPi> lies about how the function acts when given a
FrPi> data.frame.  From the code, a data.frame is processed
FrPi> as if it was a matrix.  From the documentation, while
FrPi> the data.frame is not mentioned explicitly, it is
FrPi> implied in the paragraph explaining how a list is
FrPi> processed (because a data.frame is a list).  Some
FrPi> reconciliation is needed here as well.

>> [ Though I do slightly mind the word "lies" since I do
>> value the 9th commandment..  Not telling the truth
>> *accidentally* is not "lying" ]

FrPi> Of course.  You know, I merely forgot a smiley, there.
FrPi> You are right in that we should try a bit to spare the
FrPi> extreme susceptibility of some people!  On the other
FrPi> hand, there should be limits to the feeling that we
FrPi> are always walking on eggs while writing to R-help or
FrPi> R-devel, some comfort and happiness is needed, after
FrPi> all. :-)

>> Yes; in this case, I propose to just amend the
>> documentation, explaining that data.frames are treated
>> "as matrices".

FrPi> Let me add a small comment about data.frames.  It
FrPi> would be a bit awkward if a data.frame had two columns
FrPi> "y" and "x" (in that order) and if they were
FrPi> interpreted differently after matrix coercion.  I
FrPi> guess the problem would not exist if data.frames were
FrPi> really interpreted as lists, the "x" and "y" columns
FrPi> could even appear anywhere (untested).

You are right, but, as I have now checked, xy.coords() in S also
behaves as it does now in R, in both respects
{data.frames and ">2-column" matrices and d.f.s}.

Hence --- for back compatibility reasons -- I now tend to agree
with Duncan Murdoch, and would not change xy.coords() behavior
at all, but simply amend the documentation.

Martin



Re: [Rd] grep() and factors

2006-06-05 Thread Bill Dunlap
On Mon, 5 Jun 2006, Marc Schwartz (via MN) wrote:

> Based upon an offlist communication this morning, I am somewhat confused
> (more than I usually am on most Monday mornings...) about the use of
> grep() with factors as the 'x' argument.
>  ...
> > grep("[a-z]", letters)
>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
> [23] 23 24 25 26
>
> > grep("[a-z]", factor(letters))
> numeric(0)

I was recently surprised by this also.  In addition, if
R's grep did support factors in this way, what sort of
object (factor or character) should it return when value=T?
I recently changed Splus's grep to return a character vector in
that case.

   Splus> grep("[def]", letters[26:1])
   [1] 21 22 23
   Splus>  grep("[def]", factor(letters[26:1], levels=letters[26:1]))
   [1] 21 22 23
   Splus> grep("[def]", letters[26:1], value=T)
   [1] "f" "e" "d"
   Splus> grep("[def]", factor(letters[26:1], levels=letters[26:1]), value=T)
   [1] "f" "e" "d"
   Splus> class(.Last.value)
   [1] "character"

R does this when grepping an integer vector.
   R> grep("1", 0:11, value=T)
   [1] "1"  "10" "11"
help(grep) says it returns "the matching elements themselves", but
doesn't say if "themselves" means before or after the conversion to
character.
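
The ambiguity can be made concrete with a small sketch that compares the values grep() returns with the elements obtained by subsetting the original vector. The variable names are illustrative only.

```r
# grep() matches against the character form of each element.
x    <- 0:11
idx  <- grep("1", x)                  # positions of the matching elements
vals <- grep("1", x, value = TRUE)    # matching elements after conversion

# Subsetting the original vector by position keeps the original type.
by_index <- x[idx]

typeof(vals)       # "character"
typeof(by_index)   # "integer"
```

So "themselves" here means the elements after conversion to character, not the original integers.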


Bill Dunlap
Insightful Corporation
bill at insightful dot com
360-428-8146

 "All statements in this message represent the opinions of the author and do
 not necessarily reflect Insightful Corporation policy or position."



Re: [Rd] xy.coords(MATRIX) bug in code or documentation (PR#8937)

2006-06-05 Thread pinard
[Martin Maechler]

>Hence --- for back compatibility reasons -- I now tend to agree
>with Duncan Murdoch, and would not change xy.coords() behavior
>at all, but simply amend the documentation.

It's fine, thanks a lot.

-- 
François Pinard   http://pinard.progiciels-bpi.ca



Re: [Rd] grep() and factors

2006-06-05 Thread Marc Schwartz (via MN)
On Mon, 2006-06-05 at 13:45 -0700, Bill Dunlap wrote:
> On Mon, 5 Jun 2006, Marc Schwartz (via MN) wrote:
> 
> > Based upon an offlist communication this morning, I am somewhat confused
> > (more than I usually am on most Monday mornings...) about the use of
> > grep() with factors as the 'x' argument.
> >  ...
> > > grep("[a-z]", letters)
> >  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
> > [23] 23 24 25 26
> >
> > > grep("[a-z]", factor(letters))
> > numeric(0)
> 
> I was recently surprised by this also.  In addition, if
> R's grep did support factors in this way, what sort of
> object (factor or character) should it return when value=T?
> I recently changed Splus's grep to return a character vector in
> that case.
> 
>Splus> grep("[def]", letters[26:1])
>[1] 21 22 23
>Splus>  grep("[def]", factor(letters[26:1], levels=letters[26:1]))
>[1] 21 22 23
>Splus> grep("[def]", letters[26:1], value=T)
>[1] "f" "e" "d"
>Splus> grep("[def]", factor(letters[26:1], levels=letters[26:1]), value=T)
>[1] "f" "e" "d"
>Splus> class(.Last.value)
>[1] "character"
> 
> R does this when grepping an integer vector.
>R> grep("1", 0:11, value=T)
>[1] "1"  "10" "11"
> help(grep) says it returns "the matching elements themselves", but
> doesn't say if "themselves" means before or after the conversion to
> character.

Bill,

My first inclination for the return value when used on a factor would be
the indexed factor elements where grep() would otherwise simply return
the indices. This would also maintain the factor levels from the
original source factor since "[".factor would normally retain these when
drop = FALSE.

For example:

# Return the indexed values as would otherwise be done
# in grep() if the factor to character coercion takes place:
# Use the same indices 21:23 as above

> factor(letters[26:1], levels = letters[26:1])[21:23]
[1] f e d
Levels: z y x w v u t s r q p o n m l k j i h g f e d c b a



From my read of the C code in do_grep() in character.c (again, if
correct), when 'value = TRUE', the C code appears to first get the
indices and then build the returned vector from the indexed values from
the source vector in a for() loop. So this should not be a problem
philosophically.

However, given your example of the coercion of integers, perhaps with
grep() at least, consistent behavior would dictate that return values
are always character vectors. These could then be coerced manually back
to a factor, using the original levels, as may be required:

> factor.letters <- factor(letters[26:1], levels=letters[26:1])
> factor.letters
 [1] z y x w v u t s r q p o n m l k j i h g f e d c b a
Levels: z y x w v u t s r q p o n m l k j i h g f e d c b a

> grep("[def]", as.character(factor.letters))
[1] 21 22 23

> res <- grep("[def]", as.character(factor.letters), value = TRUE)
> res
[1] "f" "e" "d"

> factor(res, levels = levels(factor.letters))
[1] f e d
Levels: z y x w v u t s r q p o n m l k j i h g f e d c b a

Which of course is the same result I proposed initially above.

I could be convinced either way. The concern of course being that (given
the offlist replies I have received today) even experienced users are
getting bitten by the current behavior versus their intuitive
expectations, which are at least loosely supported by the documentation.

HTH,

Marc Schwartz



Re: [Rd] grep() and factors

2006-06-05 Thread Sean Davis
Marc Schwartz (via MN) wrote:
> On Mon, 2006-06-05 at 13:45 -0700, Bill Dunlap wrote:
> 
>>On Mon, 5 Jun 2006, Marc Schwartz (via MN) wrote:
>>
>>
>>>Based upon an offlist communication this morning, I am somewhat confused
>>>(more than I usually am on most Monday mornings...) about the use of
>>>grep() with factors as the 'x' argument.
>>> ...
>>>
>>>> grep("[a-z]", letters)
>>> [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
>>>[23] 23 24 25 26
>>>
>>>> grep("[a-z]", factor(letters))
>>>numeric(0)
>>
>>I was recently surprised by this also.  In addition, if
>>R's grep did support factors in this way, what sort of
>>object (factor or character) should it return when value=T?
>>I recently changed Splus's grep to return a character vector in
>>that case.
>>
>>   Splus> grep("[def]", letters[26:1])
>>   [1] 21 22 23
>>   Splus>  grep("[def]", factor(letters[26:1], levels=letters[26:1]))
>>   [1] 21 22 23
>>   Splus> grep("[def]", letters[26:1], value=T)
>>   [1] "f" "e" "d"
>>   Splus> grep("[def]", factor(letters[26:1], levels=letters[26:1]), value=T)
>>   [1] "f" "e" "d"
>>   Splus> class(.Last.value)
>>   [1] "character"
>>
>>R does this when grepping an integer vector.
>>   R> grep("1", 0:11, value=T)
>>   [1] "1"  "10" "11"
>>help(grep) says it returns "the matching elements themselves", but
>>doesn't say if "themselves" means before or after the conversion to
>>character.
> 
> 
> Bill,
> 
> My first inclination for the return value when used on a factor would be
> the indexed factor elements where grep() would otherwise simply return
> the indices. This would also maintain the factor levels from the
> original source factor since "[".factor would normally retain these when
> drop = FALSE.
> 
> For example:
> 
> # Return the indexed values as would otherwise be done
> # in grep() if the factor to character coercion takes place:
> # Use the same indices 21:23 as above
> 
> 
>>factor(letters[26:1], levels = letters[26:1])[21:23]
> 
> [1] f e d
> Levels: z y x w v u t s r q p o n m l k j i h g f e d c b a
> 
> 
> 
> From my read of the C code in do_grep() in character.c (again, if
> correct), when 'value = TRUE', the C code appears to first get the
> indices and then build the returned vector from the indexed values from
> the source vector in a for() loop. So this should not be a problem
> philosophically.
> 
> However, given your example of the coercion of integers, perhaps with
> grep() at least, consistent behavior would dictate that return values
> are always character vectors. These could then be coerced manually back
> to a factor, using the original levels, as may be required:
> 
> 
>>factor.letters <- factor(letters[26:1], levels=letters[26:1])
>>factor.letters
> 
>  [1] z y x w v u t s r q p o n m l k j i h g f e d c b a
> Levels: z y x w v u t s r q p o n m l k j i h g f e d c b a
> 
> 
>>grep("[def]", as.character(factor.letters))
> 
> [1] 21 22 23
> 
> 
>>res <- grep("[def]", as.character(factor.letters), value = TRUE)
>>res
> 
> [1] "f" "e" "d"
> 
> 
>>factor(res, levels = levels(factor.letters))
> 
> [1] f e d
> Levels: z y x w v u t s r q p o n m l k j i h g f e d c b a
> 
> Which of course is the same result I proposed initially above.
> 
> I could be convinced either way. The concern of course being that (given
> the offlist replies I have received today) even experienced users are
> getting bitten by the current behavior versus their intuitive
> expectations, which are at least loosely supported by the documentation.

I'll chime in on-list to say that I have had the same experience of
expecting grep to coerce to text.  Regardless of the question of return
values, I think of grep (not equivalent to the Unix command, I
understand, but it does have the same name) as operating on "text", not
the factor levels themselves.  Not a big deal, but it does lead to
bugs that are sometimes hard to track if one is not careful to put in
as.character all the time.

Sean



Re: [Rd] grep() and factors

2006-06-05 Thread Bill Dunlap
On Mon, 5 Jun 2006, Marc Schwartz (via MN) wrote:

> > > > grep("[a-z]", factor(letters))
> > > numeric(0)
> >
> > I was recently surprised by this also.  In addition, if
> > R's grep did support factors in this way, what sort of
> > object (factor or character) should it return when value=T?
> > I recently changed Splus's grep to return a character vector in
> > that case.
> >
> >Splus> grep("[def]", letters[26:1])
> >[1] 21 22 23
> >Splus>  grep("[def]", factor(letters[26:1], levels=letters[26:1]))
> >[1] 21 22 23
> >Splus> grep("[def]", letters[26:1], value=T)
> >[1] "f" "e" "d"
> >Splus> grep("[def]", factor(letters[26:1], levels=letters[26:1]), 
> > value=T)
> >[1] "f" "e" "d"
> >Splus> class(.Last.value)
> >[1] "character"
> >
> > R does this when grepping an integer vector.
> >R> grep("1", 0:11, value=T)
> >[1] "1"  "10" "11"
> > help(grep) says it returns "the matching elements themselves", but
> > doesn't say if "themselves" means before or after the conversion to
> > character.
>
> Bill,
>
> My first inclination for the return value when used on a factor would be
> the indexed factor elements where grep() would otherwise simply return
> the indices. This would also maintain the factor levels from the
> original source factor since "[".factor would normally retain these when
> drop = FALSE.

That would be my first inclination also.  I would have expected the output of
   grep(pattern, text, value=TRUE)
to be identical to that of
   text[grep(pattern, text, value=FALSE)]
no matter what class text has.
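
A minimal sketch of that expectation, written as a user-level helper. The name grep_subset is hypothetical; it matches on the character form and then subsets the original object, so the class of the input is kept.

```r
# Hypothetical helper: match on the character form of 'text', then
# subset the original object so that its class is preserved.
grep_subset <- function(pattern, text) {
  text[grep(pattern, as.character(text), value = FALSE)]
}

ints <- grep_subset("1", 0:11)       # stays an integer vector

f  <- factor(letters[26:1], levels = letters[26:1])
fs <- grep_subset("[def]", f)        # stays a factor, all levels kept
```

The explicit as.character() call is there so that the sketch does not depend on how a given version of grep() treats factors.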

No end users have seen this in Splus so we can change it to anything,
but we want to keep it the same as R's.

> I could be convinced either way. The concern of course being that (given
> the offlist replies I have received today) even experienced users are
> getting bitten by the current behavior versus their intuitive
> expectations, which are at least loosely supported by the documentation.
>
> HTH,
>
> Marc Schwartz


Bill Dunlap
Insightful Corporation
bill at insightful dot com
360-428-8146

 "All statements in this message represent the opinions of the author and do
 not necessarily reflect Insightful Corporation policy or position."



Re: [Rd] grep() and factors

2006-06-05 Thread Gabor Grothendieck
On 6/5/06, Bill Dunlap <[EMAIL PROTECTED]> wrote:
> On Mon, 5 Jun 2006, Marc Schwartz (via MN) wrote:
>
> > > > > grep("[a-z]", factor(letters))
> > > > numeric(0)
> > >
> > > I was recently surprised by this also.  In addition, if
> > > R's grep did support factors in this way, what sort of
> > > object (factor or character) should it return when value=T?
> > > I recently changed Splus's grep to return a character vector in
> > > that case.
> > >
> > >Splus> grep("[def]", letters[26:1])
> > >[1] 21 22 23
> > >Splus>  grep("[def]", factor(letters[26:1], levels=letters[26:1]))
> > >[1] 21 22 23
> > >Splus> grep("[def]", letters[26:1], value=T)
> > >[1] "f" "e" "d"
> > >Splus> grep("[def]", factor(letters[26:1], levels=letters[26:1]), 
> > > value=T)
> > >[1] "f" "e" "d"
> > >Splus> class(.Last.value)
> > >[1] "character"
> > >
> > > R does this when grepping an integer vector.
> > >R> grep("1", 0:11, value=T)
> > >[1] "1"  "10" "11"
> > > help(grep) says it returns "the matching elements themselves", but
> > > doesn't say if "themselves" means before or after the conversion to
> > > character.
> >
> > Bill,
> >
> > My first inclination for the return value when used on a factor would be
> > the indexed factor elements where grep() would otherwise simply return
> > the indices. This would also maintain the factor levels from the
> > original source factor since "[".factor would normally retain these when
> > drop = FALSE.
>
> That would be my first inclination also.  I would have expected the output of
>   grep(pattern, text, value=TRUE)
> to be identical to that of
>   text[grep(pattern, text, value=FALSE)]
> no matter what class text has.
>
> No end users have seen this in Splus so we can change it to anything,
> but we want to keep it the same as R's.
>
> > I could be convinced either way. The concern of course being that (given
> > the offlist replies I have received today) even experienced users are
> > getting bitten by the current behavior versus their intuitive
> > expectations, which are at least loosely supported by the documentation.
> >

If non-character text arguments are accepted, I would have expected
them to be coerced to character, so that
grep(pattern, text, ...) would return the same result as
grep(pattern, as.character(text), ...)
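
Until grep() itself behaves that way, the equivalence can be sketched as a thin wrapper. The name grep_chr is hypothetical, not an existing function.

```r
# Hypothetical wrapper: always match against the character form of 'text',
# passing any further arguments (e.g. value = TRUE) on to grep().
grep_chr <- function(pattern, text, ...) {
  grep(pattern, as.character(text), ...)
}

f    <- factor(letters)
idx  <- grep_chr("[a-z]", f)                 # positions 1 to 26
vals <- grep_chr("[xyz]", f, value = TRUE)   # character vector "x" "y" "z"
```

With this wrapper, value = TRUE always yields a character vector, whatever the class of the input.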



[Rd] FYI: R-2.3.1pat-win32.exe not on CRAN

2006-06-05 Thread Henrik Bengtsson
FYI, the download link on CRAN for R-2.3.1pat-win32.exe and related
files seems to be broken at least since yesterday, e.g.
http://cran.at.r-project.org/bin/windows/base/R-2.3.1pat-win32.exe.

/Henrik



Re: [Rd] FYI: R-2.3.1pat-win32.exe not on CRAN

2006-06-05 Thread Duncan Murdoch
On 6/5/2006 10:04 PM, Henrik Bengtsson wrote:
> FYI, the download link on CRAN for R-2.3.1pat-win32.exe and related
> files seems to be broken at least since yesterday, e.g.
> http://cran.at.r-project.org/bin/windows/base/R-2.3.1pat-win32.exe.

Thanks, it was a typo in the upload script.  Fixed now, so the files 
should be available within a day or so.

Duncan Murdoch
