Re: [Rd] Why does the lexical analyzer drop comments ?

2009-03-23 Thread Romain Francois

Duncan Murdoch wrote:

On 22/03/2009 4:50 PM, Romain Francois wrote:

Romain Francois wrote:

Peter Dalgaard wrote:

Duncan Murdoch wrote:

On 3/20/2009 2:56 PM, romain.franc...@dbmail.com wrote:

It happens in the token function in gram.c:
    c = SkipSpace();
    if (c == '#') c = SkipComment();

and then SkipComment goes like that:
static int SkipComment(void)
{
    int c;
    while ((c = xxgetc()) != '\n' && c != R_EOF) ;
    if (c == R_EOF) EndOfFile = 2;
    return c;
}

which effectively drops comments.

Would it be possible to keep the information somewhere ?
The source code says this:
 *  The function yylex() scans the input, breaking it into
 *  tokens which are then passed to the parser.  The lexical
 *  analyser maintains a symbol table (in a very messy fashion).

so my question is could we use this symbol table to keep track 
of, say, COMMENT tokens.

Why would I even care about that ? I'm writing a package that will
perform syntax highlighting of R source code based on the output of the
parser, and it seems a waste to drop the comments.
And also, when you print a function to the R console, you don't 
get the comments, and some of them might be useful to the user.


Am I mad if I contemplate looking into this ? 
Comments are syntactically the same as whitespace.  You don't want 
them to affect the parsing.

Well, you might, but there is quite some madness lying that way.

Back in the bronze age, we did actually try to keep comments 
attached to (AFAIR) the preceding token. One problem is that the 
elements of the parse tree typically involve multiple tokens, and 
if comments after different tokens get stored in the same place 
something is not going back where it came from when deparsing. So 
we had problems with comments moving from one end of a loop to the 
other and the like.

Ouch. That helps picturing the kind of madness ...

Another way could be to record comments separately (similarly to 
srcfile attribute for example) instead of dropping them entirely, 
but I guess this is the same as Duncan's idea, which is easier to 
set up.


You could try extending the scheme by encoding which part of a 
syntactic structure the comment belongs to, but consider for 
instance how many places in a function call you can stick in a 
comment.


f #here
( #here
a #here (possibly)
= #here
1 #this one belongs to the argument, though
) #but here as well

Coming back on this. I actually get two expressions:

 > p <- parse( "/tmp/parsing.R")
 > str( p )
length 2 expression(f, (a = 1))
 - attr(*, "srcref")=List of 2
  ..$ :Class 'srcref'  atomic [1:6] 1 1 1 1 1 1
  .. .. ..- attr(*, "srcfile")=Class 'srcfile' 
  ..$ :Class 'srcref'  atomic [1:6] 2 1 6 1 1 1
  .. .. ..- attr(*, "srcfile")=Class 'srcfile' 
 - attr(*, "srcfile")=Class 'srcfile' 

But anyway, if I drop the first comment, then I get one expression 
with some srcref information:


 > p <- parse( "/tmp/parsing.R")
 > str( p )
length 1 expression(f(a = 1))
 - attr(*, "srcref")=List of 1
  ..$ :Class 'srcref'  atomic [1:6] 1 1 5 1 1 1
  .. .. ..- attr(*, "srcfile")=Class 'srcfile' 
 - attr(*, "srcfile")=Class 'srcfile' 

but as far as i can see, there is only srcref information for that 
expression as a whole, it does not go beyond, so I am not sure I can 
implement Duncan's proposal without more detailed information from 
the parser, since I will only have the chance to check if a 
whitespace is actually a comment if it is between two expressions 
with a srcref.


Currently srcrefs are only attached to whole statements.  Since your 
source only included one or two statements, you only get one or two 
srcrefs.  It would not be hard to attach a srcref to every 
subexpression; there hasn't been a need for that before, so I didn't 
do it just for the sake of efficiency.


I understand that. I wanted to make sure I did not miss something.

However, it might make sense for you to have your own parser, based on 
the grammar in R's parser, but handling white space differently. 
Certainly it would make sense to do that before making changes to the 
base R one.  The whole source is in src/main/gram.y; if you're not 
familiar with Bison, I can give you a hand.


Thank you, I appreciate your help. Having my own parser is the option I 
am slowly converging to.
I'll start with reading bison documentation. Besides bison documents, is 
there R specific documentation on how the R parser was written ?




Duncan Murdoch



Would it be sensible then to retain the comments and their srcref 
information, but separate from the tokens used for the actual 
parsing, in some other attribute of the output of parse ?
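
A minimal sketch of what "comments kept separately, with their positions" can look like. It assumes a later R release in which utils::getParseData() is available; that function was not part of R at the time of this thread. The sample source lines and variable names are invented for illustration.

```r
# Sketch only: assumes a recent R where utils::getParseData() exists.
# The sample source lines below are invented for illustration.
code <- c("f( # call f",
          "  a = 1 # the argument",
          ")")

# keep.source = TRUE asks the parser to retain source references.
p <- parse(text = code, keep.source = TRUE)

# One row per token; comments show up with token type "COMMENT".
pd <- utils::getParseData(p)
comments <- pd[pd$token == "COMMENT", c("line1", "col1", "text")]
print(comments)
```

The parsed expression itself is unaffected; the comments live only in the token table, which is the separation asked about above.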


Romain

If you're doing syntax highlighting, you can determine the 
whitespace by
looking at the srcref records, and then parse that to determine 
what isn't being counted as tokens.  (I think you'll find a few 
things there besides whitespace, but it is a fairly limited set, 
so shouldn't be too hard to recognize.)


The Rd parser is differ

[Rd] all.equal is hard to use

2009-03-23 Thread hong shen

Hi,

I have extensive programming experience (Windows, Unix, scripting, compiled 
languages, you name it) but am new to R.

I found that it is quite hard to interpret the results returned by all.equal 
(base). The main problem is that when attributes are compared, they are sorted 
in attr.all.equal but in the result, the index of diff component is from the 
sorted list not the original list. I think that adding the component name to 
the printout may make users' life a little bit easier like

function (target, current, check.attributes = TRUE, ...)
{
    msg <- if (check.attributes)
        # If called from attr.all.equal(), target and current are the lists
        # returned by attributes() on the original objects, so their own
        # attributes contain only "names".
        attr.all.equal(target, current, ...)
    iseq <- if (length(target) == length(current)) {
        # Equal lengths: iseq is 1, 2, ..., length(target).
        seq_along(target)
    }
    else {
        if (!is.null(msg))
            # Remove the old message about "Lengths".
            msg <- msg[-grep("\\bLengths\\b", msg)]
        nc <- min(length(target), length(current))
        msg <- c(msg, paste("Length mismatch: comparison on first",
                            nc, "components"))
        # iseq is 1, 2, ..., length of the shorter list.
        seq_len(nc)
    }
    for (i in iseq) {
        # Compare each pair of elements with all.equal().
        mi <- all.equal(target[[i]], current[[i]],
                        check.attributes = check.attributes, ...)
        if (is.character(mi)) {
            # Print the component name(s) where available.
            if (!is.null(names(target)[i]) && !is.null(names(current)[i]))
                msg <- c(msg, paste("Component ", i, ": ", mi,
                                    " with target name: ", names(target)[i],
                                    ", current name: ", names(current)[i],
                                    sep = ""))
            else if (!is.null(names(target)[i]))
                msg <- c(msg, paste("Component ", i, ": ", mi,
                                    " with target name: ", names(target)[i],
                                    sep = ""))
            else if (!is.null(names(current)[i]))
                msg <- c(msg, paste("Component ", i, ": ", mi,
                                    " with current name: ", names(current)[i],
                                    sep = ""))
            else
                msg <- c(msg, paste("Component ", i, ": ", mi, sep = ""))
        }
    }
    if (is.null(msg))
        TRUE
    else msg
}

Hong Shen

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] gsub('(.).(.)(.)', '\\3\\2\\1', 'gsub') (PR#13617)

2009-03-23 Thread waku
Full_Name: Wacek Kusnierczyk
Version: 2.10.0 r48181
OS: Ubuntu 8.04 Linux 32bit
Submission from: (NULL) (129.241.199.135)


there seems to be something wrong with r's regexing.  consider the following
example:

gregexpr('a*|b', 'ab')
# positions: 1 2
# lengths: 1 1

gsub('a*|b', '.', 'ab')
# ..

where the pattern matches any number of 'a's or one b, and replaces the match
with a dot, globally.  the answer is correct (assuming a dfa engine).  however,

gregexpr('a*|b', 'ab', perl=TRUE)
# positions: 1 2
# lengths: 1 0

gsub('a*|b', '.', 'ab', perl=TRUE)
# .b.

where the pattern is identical, but the result is wrong.  perl uses an nfa (if
it used a dfa, the result would still be wrong), and in the above example it
should find *four* matches, collectively including *all* letters in the input,
thus producing *four* dots (and *only* dots) in the output:

perl -le '
   $input = qq|ab|;
   print qq|match: "$_"| foreach $input =~ /a*|b/g;
   $input =~ s/a*|b/./g;
   print qq|output: "$input"|;'
# match: "a"
# match: ""
# match: "b"
# match: ""
# output: "...."

since with perl=TRUE both gregexpr and gsub seem to use pcre, i've checked the
example with pcretest, and also with a trivial c program (available on demand)
using the pcre api;  there were four matches, exactly as in the perl bit above.

the results above are surprising, and suggest a bug in r's use of pcre rather
than in pcre itself.  possibly, the issue is that when an empty string is matched
(with a*, for example), the next attempt is not trying to match a non-empty
string at the same position, but rather an empty string again at the next
position.  for example,

gsub('a|b|c', '.', 'abc', perl=TRUE)
# "...", correct

gsub('a*|b|c', '.', 'abc', perl=TRUE)
# ".b.c.", wrong

gsub('a|b*|c', '.', 'abc', perl=TRUE)
# "..c.", wrong (but now only 'c' remains)

gsub('a|b*|c', '.', 'aba', perl=TRUE)
# "...", incidentally correct


without detailed analysis of the code, i guess the bug is located somewhere in
src/main/pcre.c, and is distributed among the do_p* functions, so that multiple
fixes may be needed.
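
the retry rule described above (after an empty match, first try for a non-empty match at the same position, and only then move on) can be sketched at r level.  this is a hedged illustration, not the code in src/main/pcre.c:  global_matches is a hypothetical helper, and the (?!^) lookahead applied to the remaining substring is just one way to emulate "non-empty match anchored here" without access to the pcre flags.

```r
# Sketch only: global_matches() is a hypothetical helper, not part of R.
# It works on the remaining substring, so patterns that rely on lookbehind
# or on their own anchors are outside its scope.
global_matches <- function(pattern, x) {
  n <- nchar(x)
  pos <- 1L                # 1-based position where the next attempt starts
  after_empty <- FALSE     # was the previous match empty?
  out <- character(0)
  # Anchored at the start of the remainder; (?!^) rejects a match that
  # ends where it began, i.e. an empty one.
  nonempty <- paste0("^(?:", pattern, ")(?!^)")
  while (pos <= n + 1L) {
    rest <- substring(x, pos)
    if (after_empty) {
      # Retry at the same position, insisting on a non-empty match.
      m <- regexpr(nonempty, rest, perl = TRUE)
      after_empty <- FALSE
      if (m == -1L) {
        pos <- pos + 1L    # nothing non-empty here: advance one character
        next
      }
    } else {
      # Ordinary search from the current position onwards.
      m <- regexpr(pattern, rest, perl = TRUE)
      if (m == -1L) break
    }
    start <- pos + m - 1L
    len <- attr(m, "match.length")
    out <- c(out, substr(x, start, start + len - 1L))
    pos <- start + len
    after_empty <- (len == 0L)
  }
  out
}

print(global_matches("a*|b", "ab"))
print(global_matches("a|b|c", "abc"))
```

the printed matches for 'a*|b' on 'ab' can be compared with the perl output listed earlier in this message.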



[Rd] incoherent treatment of NULL

2009-03-23 Thread Wacek Kusnierczyk
somewhat related to a previous discussion [1] on how 'names<-' would
sometimes modify its argument in place, and sometimes produce a modified
copy without changing the original, here's another example of how it
becomes visible to the user when r makes or doesn't make a copy of an
object:

x = NULL
dput(x)
# NULL
class(x) = 'integer'
# error: invalid (NULL) left side of assignment

x = c()
dput(x)
# NULL
class(x) = 'integer'
dput(x)
# integer(0)

in both cases, x ends up with the value NULL (the no-value object).  in
both cases, dput explains that x is NULL.  in both cases, an attempt is
made to make x be an empty integer vector.  the first fails, because it
tries to modify NULL itself, the latter apparently does not and succeeds.

however, the following has a different pattern:

x = NULL
dput(x)
# NULL
names(x) = character(0)
# error: attempt to set an attribute on NULL

x = c()
dput(x)
# NULL
names(x) = character(0)
# error: attempt to set an attribute on NULL

and also:

x = c()
class(x) = 'integer'
# fine
class(x) = 'foo'
# error: attempt to set an attribute on NULL

how come?  the behaviour can obviously be explained by looking at the
source code (hardly surprisingly, because it is as it is because the
source is as it is), and referring to the NAMED property (i.e., the
sxpinfo.named field of a SEXPREC struct).  but can the *design* be
justified?  can the apparent incoherences visible above the interface be
defended? 

why should the first example above be unable to produce an empty integer
vector? 

why is it possible to set a class attribute, but not a names attribute,
on c()? 

why is it possible to set the class attribute in c() to 'integer', but
not to 'foo'? 

why are there different error messages for apparently the same problem?


vQ


[1] search the rd archives for 'surprising behaviour of names<-'



Re: [Rd] incoherent treatment of NULL

2009-03-23 Thread Martin Maechler
> "WK" == Wacek Kusnierczyk 
> on Mon, 23 Mar 2009 09:52:19 +0100 writes:

WK> somewhat related to a previous discussion [1] on how 'names<-' would
WK> sometimes modify its argument in place, and sometimes produce a modified
WK> copy without changing the original, here's another example of how it
WK> becomes visible to the user when r makes or doesn't make a copy of an
WK> object:

WK> x = NULL
WK> dput(x)
WK> # NULL
WK> class(x) = 'integer'
WK> # error: invalid (NULL) left side of assignment

does not happen for me in R-2.8.1,  R-patched or newer

So you must be using your own patched version of  R ?




WK> x = c()
WK> dput(x)
WK> # NULL
WK> class(x) = 'integer'
WK> dput(x)
WK> # integer(0)

WK> in both cases, x ends up with the value NULL (the no-value object).  in
WK> both cases, dput explains that x is NULL.  in both cases, an attempt is
WK> made to make x be an empty integer vector.  the first fails, because it
WK> tries to modify NULL itself, the latter apparently does not and succeeds.

WK> however, the following has a different pattern:

WK> x = NULL
WK> dput(x)
WK> # NULL
WK> names(x) = character(0)
WK> # error: attempt to set an attribute on NULL

WK> x = c()
WK> dput(x)
WK> # NULL
WK> names(x) = character(0)
WK> # error: attempt to set an attribute on NULL

WK> and also:

WK> x = c()
WK> class(x) = 'integer'
WK> # fine
WK> class(x) = 'foo'
WK> # error: attempt to set an attribute on NULL

WK> how come?  the behaviour can obviously be explained by looking at the
WK> source code (hardly surprisingly, because it is as it is because the
WK> source is as it is), and referring to the NAMED property (i.e., the
WK> sxpinfo.named field of a SEXPREC struct).  but can the *design* be
WK> justified?  can the apparent incoherences visible above the interface be
WK> defended? 

WK> why should the first example above be unable to produce an empty integer
WK> vector? 

WK> why is it possible to set a class attribute, but not a names attribute,
WK> on c()? 

WK> why is it possible to set the class attribute in c() to 'integer', but
WK> not to 'foo'? 

WK> why are there different error messages for apparently the same problem?


WK> vQ


WK> [1] search the rd archives for 'surprising behaviour of names<-'




Re: [Rd] incoherent treatment of NULL

2009-03-23 Thread Wacek Kusnierczyk
Martin Maechler wrote:
>> "WK" == Wacek Kusnierczyk 
>>
>> 
> WK> somewhat related to a previous discussion [1] on how 'names<-' would
> WK> sometimes modify its argument in place, and sometimes produce a modified
> WK> copy without changing the original, here's another example of how it
> WK> becomes visible to the user when r makes or doesn't make a copy of an
> WK> object:
>
> WK> x = NULL
> WK> dput(x)
> WK> # NULL
> WK> class(x) = 'integer'
> WK> # error: invalid (NULL) left side of assignment
>
> does not happen for me in R-2.8.1,  R-patched or newer
>
> So you must be using your own patched version of  R ?
>   

oops, i meant to use 2.8.1 or devel for testing.  you're right, in this
example there is no error reported in > 2.8.0, but see below.

>
> WK> x = c()
> WK> dput(x)
> WK> # NULL
> WK> class(x) = 'integer'
> WK> dput(x)
> WK> # integer(0)
>
> WK> in both cases, x ends up with the value NULL (the no-value object).  in
> WK> both cases, dput explains that x is NULL.  in both cases, an attempt is
> WK> made to make x be an empty integer vector.  the first fails, because it
> WK> tries to modify NULL itself, the latter apparently does not and succeeds.
>
> WK> however, the following has a different pattern:
>
> WK> x = NULL
> WK> dput(x)
> WK> # NULL
> WK> names(x) = character(0)
> WK> # error: attempt to set an attribute on NULL
>   

i get the error in devel.


> WK> x = c()
> WK> dput(x)
> WK> # NULL
> WK> names(x) = character(0)
> WK> # error: attempt to set an attribute on NULL
>   

i get the error in devel.

> WK> and also:
>
> WK> x = c()
> WK> class(x) = 'integer'
> WK> # fine
> WK> class(x) = 'foo'
> WK> # error: attempt to set an attribute on NULL
>   

i get the error in devel.

it doesn't seem coherent to me:  why can i set the class, but not names
attribute on both NULL and c()?  why can i set the class attribute to
'integer', but not to 'foo', as i could on a non-empty vector:

x = 1
class(x) = 'foo'
# just fine

i'd naively expect to be able to create an empty vector classed 'foo',
displayed perhaps as

# speculation
x = NULL
class(x) = 'foo'
x
# foo(0)

or maybe as

x
# NULL
# attr(, "class")
# [1] "foo"

vQ



Re: [Rd] Why does the lexical analyzer drop comments ?

2009-03-23 Thread Duncan Murdoch

On 23/03/2009 3:10 AM, Romain Francois wrote:

Duncan Murdoch wrote:

 
However, it might make sense for you to have your own parser, based on 
the grammar in R's parser, but handling white space differently. 
Certainly it would make sense to do that before making changes to the 
base R one.  The whole source is in src/main/gram.y; if you're not 
familiar with Bison, I can give you a hand.


Thank you, I appreciate your help. Having my own parser is the option I 
am slowly converging to.
I'll start with reading bison documentation. Besides bison documents, is 
there R specific documentation on how the R parser was written ?


I don't think so.

Duncan Murdoch



Re: [Rd] incoherent treatment of NULL

2009-03-23 Thread Martin Maechler
> "WK" == Wacek Kusnierczyk 
> on Mon, 23 Mar 2009 10:56:37 +0100 writes:

WK> Martin Maechler wrote:
>>> "WK" == Wacek Kusnierczyk 
>>> 
>>> 
WK> somewhat related to a previous discussion [1] on how 'names<-' would
WK> sometimes modify its argument in place, and sometimes produce a modified
WK> copy without changing the original, here's another example of how it
WK> becomes visible to the user when r makes or doesn't make a copy of an
WK> object:
>> 
WK> x = NULL
WK> dput(x)
WK> # NULL
WK> class(x) = 'integer'
WK> # error: invalid (NULL) left side of assignment
>> 
>> does not happen for me in R-2.8.1,  R-patched or newer
>> 
>> So you must be using your own patched version of  R ?
>> 

WK> oops, i meant to use 2.8.1 or devel for testing.  you're right, in this
WK> example there is no error reported in > 2.8.0, but see below.

ok

 [.. omitted part no longer relevant ]

WK> however, the following has a different pattern:
>> 
WK> x = NULL
WK> dput(x)
WK> # NULL
WK> names(x) = character(0)
WK> # error: attempt to set an attribute on NULL
>> 

WK> i get the error in devel.

Yes,  NULL is NULL is NULL !   Do read  ?NULL !   [ ;-) ]

more verbosely,  all NULL objects in R are identical, or as the
help page says, there's only ``*The* NULL Object'' in R,
i.e., NULL cannot get any attributes.

WK> x = c()
WK> dput(x)
WK> # NULL
WK> names(x) = character(0)
WK> # error: attempt to set an attribute on NULL
>> 

WK> i get the error in devel.

of course!  
   [I think *you* should have noticed that  NULL and c()  *are* identical]

WK> and also:
>> 
WK> x = c()
WK> class(x) = 'integer'
WK> # fine
"fine" yes; 
here, the convention has been to change NULL into integer(0);
and no, this won't change, if you find it inconsistent.


WK> class(x) = 'foo'
WK> # error: attempt to set an attribute on NULL
>> 

WK> i get the error in devel.

No, not if you evaluate the statements above (where 'x' has
become  'integer(0)' in the mean time).

But yes, you get it in something like

x <- c();  class(x) <- "foo"

and I do agree that there's a buglet : 
The error message should be slightly more precise,
--- improvement proposals are welcome ---
but an error nonetheless

WK> it doesn't seem coherent to me:  why can i set the class, 

you cannot set it, you can *change* it.

WK> but not names
WK> attribute on both NULL and c()?  why can i set the class attribute to
WK> 'integer', but not to 'foo', as i could on a non-empty vector:

WK> x = 1
WK> class(x) = 'foo'
WK> # just fine

mainly because 'NULL is NULL is NULL' 
(NULL cannot have attributes)

WK> i'd naively expect to be able to create an empty vector classed 'foo',

yes, but that expectation is wrong

WK> displayed perhaps as

WK> # speculation
WK> x = NULL
WK> class(x) = 'foo'
WK> x
WK> # foo(0)

WK> or maybe as

WK> x
WK> # NULL
WK> # attr(, "class")
WK> # [1] "foo"

WK> vQ




Re: [Rd] incoherent treatment of NULL

2009-03-23 Thread Wacek Kusnierczyk
Martin Maechler wrote:
>
>  [.. omitted part no longer relevant ]
>
> WK> however, the following has a different pattern:
> >> 
> WK> x = NULL
> WK> dput(x)
> WK> # NULL
> WK> names(x) = character(0)
> WK> # error: attempt to set an attribute on NULL
> >> 
>
> WK> i get the error in devel.
>
> Yes,  NULL is NULL is NULL !   Do read  ?NULL !   [ ;-) ]
>
> more verbosely,  all NULL objects in R are identical, or as the
> help page says, there's only ``*The* NULL Object'' in R,
> i.e., NULL cannot get any attributes.
>   

yes, but that's not the issue.  the issue is that names(x)<- seems to
try to attach an attribute to NULL, while it could, in principle, do the
same as class(x)<-, i.e., coerce x to some type (and hence attach the
name attribute not to NULL, but to the coerced-to object).

but, as someone else explained to me behind the scenes, the matters are
a little bit, so to speak, untidy:

x = NULL
class(x) = 'integer'
# just fine

x = NULL
attr(x, 'class') = 'integer'
# no go

where class()<-, but not attr(,'class')<-, will try to coerce x to an
object of the storage *mode* 'integer', hence the former succeeds
(because it sets, roughly, the 'integer' class on an empty integer
vector), while the latter fails (because it tries to set the 'integer'
class on NULL itself).
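
a small sketch of the contrast, for readers who want to try it.  only the class()<- result is asserted;  what the other replacement forms do on NULL has differed between r versions, so the hypothetical helper outcome() merely records whether an error is raised instead of assuming one.

```r
# 'class<-' with a basic type name coerces NULL rather than attaching
# an attribute to it.
x <- NULL
class(x) <- "integer"
print(identical(x, integer(0)))

# outcome() is a hypothetical helper: it evaluates an expression and
# reports either "ok" or the error message, without assuming which occurs.
outcome <- function(expr) {
  tryCatch({ force(expr); "ok" },
           error = function(e) paste("error:", conditionMessage(e)))
}

# Version-dependent cases: run them and look, rather than trust a comment.
print(outcome({ y <- NULL; attr(y, "class") <- "integer" }))
print(outcome({ y <- NULL; names(y) <- character(0) }))
```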

what was not clear to me is not why setting a class on NULL fails here,
but why it is setting on NULL in the first place.  after all,

x = 1
names(x) = 'foo'

is setting names on a *copy* of 1, not on *the* 1, so why could not
class()<- create a 'copy' of NULL, i.e., an empty vector of some type
(perhaps raw, as the lowest in the hierarchy).


> WK> x = c()
> WK> dput(x)
> WK> # NULL
> WK> names(x) = character(0)
> WK> # error: attempt to set an attribute on NULL
> >> 
>
> WK> i get the error in devel.
>
> of course!  
>[I think *you* should have noticed that  NULL and c()  *are* identical]
>
> WK> and also:
> >> 
> WK> x = c()
> WK> class(x) = 'integer'
> WK> # fine
> "fine" yes; 
> here, the convention has been to change NULL into integer(0);
> and no, this won't change, if you find it inconsistent.
>   

that's ok, this is what i'd expect in the other cases, too (modulo the
actual storage mode).


>
> WK> class(x) = 'foo'
> WK> # error: attempt to set an attribute on NULL
> >> 
>
> WK> i get the error in devel.
>
> No, not if you evaluate the statements above (where 'x' has
> become  'integer(0)' in the mean time).
>
> But yes, you get it in something like
>
> x <- c();  class(x) <- "foo"
>   

that's what i meant, must have forgotten the x = c().

> and I do agree that there's a buglet : 
> The error message should be slightly more precise,
> --- improvement proposals are welcome ---
> but an error nonetheless
>
> WK> it doesn't seem coherent to me:  why can i set the class, 
>
> you cannot set it, you can *change* it.
>   

terminological wars? 

btw. the class of NULL is "NULL";  why can't i nullify an object by
setting its class to 'NULL'?

x = 1
class(x) = 'NULL'
x
# *not* NULL

and one more interesting example:

x = 1:2
class(x) = 'NULL'
x
# [1] 1 2
# attr(,"class") "NULL"
x[1]
# 1
x[2]
# 2
is.vector(x)
# FALSE

hurray!!! apparently, i've alchemized a non-vector vector...  (you can
do it in r-devel, for that matter).



> WK> but not names
> WK> attribute on both NULL and c()?  why can i set the class attribute to
> WK> 'integer', but not to 'foo', as i could on a non-empty vector:
>
> WK> x = 1
> WK> class(x) = 'foo'
> WK> # just fine
>
> mainly because 'NULL is NULL is NULL' 
> (NULL cannot have attributes)
>   

yes yes yes;  the question was, once again:  why is x still NULL?

> WK> i'd naively expect to be able to create an empty vector classed 'foo',
>
> yes, but that expectation is wrong
>   

wrt. the actual state of matters, not necessarily wrt. the ideal state
of matters ;)  (i don't insist)

vQ



Re: [Rd] incoherent treatment of NULL

2009-03-23 Thread Martin Maechler
> "WK" == Wacek Kusnierczyk 
> on Mon, 23 Mar 2009 16:11:04 +0100 writes:

WK> Martin Maechler wrote:
>> 
>> [.. omitted part no longer relevant ]
>> 
WK> however, the following has a different pattern:
>> >> 
WK> x = NULL
WK> dput(x)
WK> # NULL
WK> names(x) = character(0)
WK> # error: attempt to set an attribute on NULL
>> >> 
>> 
WK> i get the error in devel.
>> 
>> Yes,  NULL is NULL is NULL !   Do read  ?NULL !   [ ;-) ]
>> 
>> more verbosely,  all NULL objects in R are identical, or as the
>> help page says, there's only ``*The* NULL Object'' in R,
>> i.e., NULL cannot get any attributes.
>> 

WK> yes, but that's not the issue.  the issue is that names(x)<- seems to
WK> try to attach an attribute to NULL, while it could, in principle, do the
WK> same as class(x)<-, i.e., coerce x to some type (and hence attach the
WK> name attribute not to NULL, but to the coerced-to object).

yes, it could;  but really, the  fact that  'class<-' works is
the exception.  The other variants (with the error message) are
the rule.

WK> but, as someone else explained to me behind the scenes, the matters are
WK> a little bit, so to speak, untidy:

WK> x = NULL
WK> class(x) = 'integer'
WK> # just fine

WK> x = NULL
WK> attr(x, 'class') = 'integer'
WK> # no go

WK> where class()<-, but not attr(,'class')<-, will try to coerce x to an
WK> object of the storage *mode* 'integer', hence the former succeeds
WK> (because it sets, roughly, the 'integer' class on an empty integer
WK> vector), while the latter fails (because it tries to set the 'integer'
WK> class on NULL itself).

WK> what was not clear to me is not why setting a class on NULL fails here,
WK> but why it is setting on NULL in the first place.  after all,

WK> x = 1
WK> names(x) = 'foo'

WK> is setting names on a *copy* of 1, not on *the* 1, so why could not
WK> class()<- create a 'copy' of NULL, i.e., an empty vector of some type
WK> (perhaps raw, as the lowest in the hierarchy).

yes, it could.  I personally don't think this would add any
value to R's behavior;  rather, for most useRs I'd think it
rather helps to get an error in such a case, than a  raw(0)
object.

Also, note (here and further below),
that using   "class(.) <-  "
is an S3 idiom   and S3 classes  ``don't really exist'', 
the "class" attribute being a useful hack,
and many of us would rather like to work and improve working
with S4 classes (& generics & methods) than to fiddle with  'class<-'.

In S4, you'd  use  setClass(.), new(.) and  setAs(.),
typically, for defining and changing classes of objects.

But maybe I have now led you into a direction I will later
regret, when you start telling us about the perceived inconsistencies of
S4 classes, methods, etc.
BTW: If you go there, please do use  R 2.9.0 (or newer)
 exclusively.

WK> x = c()
WK> dput(x)
WK> # NULL
WK> names(x) = character(0)
WK> # error: attempt to set an attribute on NULL
>> >> 
>> 
WK> i get the error in devel.
>> 
>> of course!  
>> [I think *you* should have noticed that  NULL and c()  *are* identical]
>> 
WK> and also:
>> >> 
WK> x = c()
WK> class(x) = 'integer'
WK> # fine
>> "fine" yes; 
>> here, the convention has been to change NULL into integer(0);
>> and no, this won't change, if you find it inconsistent.
>> 

WK> that's ok, this is what i'd expect in the other cases, too (modulo the
WK> actual storage mode).


>> 
WK> class(x) = 'foo'
WK> # error: attempt to set an attribute on NULL
>> >> 
>> 
WK> i get the error in devel.
>> 
>> No, not if you evaluate the statements above (where 'x' has
>> become  'integer(0)' in the mean time).
>> 
>> But yes, you get in something like
>> 
>> x <- c();  class(x) <- "foo"
>> 

WK> that's what i meant, must have forgotten the x = c().

>> and I do agree that there's a buglet : 
>> The error message should be slightly more precise,
>> --- improvement proposals are welcome ---
>> but an error nonetheless
>> 
WK> it doesn't seem coherent to me:  why can i set the class, 
>> 
>> you cannot set it, you can *change* it.
>> 

WK> terminological wars? 

WK> btw. the class of NULL is "NULL";  why can't i nullify an object by
WK> setting its class to 'NULL'?

WK> x = 1
WK> class(x) = 'NULL'
WK> x
WK> # *not* NULL

see above {S4 / S3 / ...}; 
If you want to  "nullify", rather use
more (S-language) idiomatic calls like

as(x, "NULL")
or  
as.null(x)

both of which do work.
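
A minimal sketch of the first idiom, with made-up variable names. Note that as.null() returns NULL rather than modifying its argument, so the result has to be assigned. The as(x, "NULL") form is taken from the text above and is not exercised here.

```r
x <- 1:2

# as.null() ignores its argument's contents and returns NULL;
# x itself is left untouched.
y <- as.null(x)
print(is.null(y))
print(x)

# To "nullify" x, assign the result back.
x <- as.null(x)
print(is.null(x))
```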

Regards,
Martin


WK> and one more interesting example:

WK> x = 1:2
WK> class(x) = 'NULL'
WK> x
WK> # [1] 1 2
WK> # attr(,"class") "NULL"
WK> x[1]
WK> # 1
WK> x[2]
   

Re: [Rd] incoherent treatment of NULL

2009-03-23 Thread Wacek Kusnierczyk
Martin Maechler wrote:
>
> >> more verbosely,  all NULL objects in R are identical, or as the
> >> help page says, there's only ``*The* NULL Object'' in R,
> >> i.e., NULL cannot get any attributes.
> >> 
>
> WK> yes, but that's not the issue.  the issue is that names(x)<- seems to
> WK> try to attach an attribute to NULL, while it could, in principle, do the
> WK> same as class(x)<-, i.e., coerce x to some type (and hence attach the
> WK> name attribute not to NULL, but to the coerced-to object).
>
> yes, it could;  but really, the  fact that  'class<-' works is
> the exception.  The other variants (with the error message) are
> the rule.
>   

ok.

> Also, note (here and further below),
> that using   "class(.) <-  "
> is an S3 idiom   and S3 classes  ``don't really exist'', 
> the "class" attribute being a useful hack,
> and many of us would rather like to work and improve working
> with S4 classes (& generics & methods) than to fiddle with  'class<-'.
>
> In S4, you'd  use  setClass(.), new(.) and  setAs(.),
> typically, for defining and changing classes of objects.
>
> But maybe I have now led you into a direction I will later
> regret, when you start telling us about the perceived inconsistencies of
> S4 classes, methods, etc.
> BTW: If you go there, please do use  R 2.9.0 (or newer)
>   

using latest r-devel for the most part.

i think you will probably not regret your words;  from what i've seen
already, s4 classes are the last thing i'd ever try to learn in r.  but
yes, there would certainly be lots of issues to complain about.  i'll
rather wait for s5.

regards,
vQ



[Rd] Error in Package Description (PR#13618)

2009-03-23 Thread bonner . reed
In the Installer for R.8.1 for Mac OSX Tiger or higher, the  
description of the GNU Fortran package in the customize option writes  
Fortran as "Fotran."  Just a minor error, but should be fixed if  
revisited.

-Bonner Reed
Yale Univ.



[Rd] matplot and lend="butt"

2009-03-23 Thread Christophe Genolini

Hi the list,

I am using matplot with the option lend="butt", but only the first line 
(the black) is printed correctly  :


> matplot(matrix(1:9,3),type="c",lwd=10,lty=1,lend="butt")

Is it a bug ?
I am using R2.8.1 under windows XP pack3.

Christophe



Re: [Rd] matplot and lend="butt"

2009-03-23 Thread Gabor Grothendieck
It looks to be a bug.  Here is the code and notice that ... is passed to
plot (which plots the first series) but not to lines (which plots the rest):

if (!add) {
    ii <- ii[-1]
    plot(x[, 1], y[, 1], type = type[1], xlab = xlab, ylab = ylab,
        xlim = xlim, ylim = ylim, lty = lty[1], lwd = lwd[1],
        pch = pch[1], col = col[1], cex = cex[1], bg = bg[1],
        ...)
}
for (i in ii) {
    lines(x[, i], y[, i], type = type[i], lty = lty[i], lwd = lwd[i],
        pch = pch[i], col = col[i], cex = cex[i], bg = bg[i])
}

This is from 2.8.1 patched but I noticed the same thing in
"R version 2.9.0 Under development (unstable) (2009-03-02 r48041)"

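To illustrate the point about "...", here is a minimal sketch of a simplified stand-in for matplot. The name matplot_sketch and its argument list are hypothetical and are not the real matplot source; the only thing it is meant to show is that forwarding ... to lines() as well as to plot() lets lend= reach every series.

```r
# Hypothetical, simplified stand-in for matplot(); not the actual source.
# `...` is forwarded to BOTH plot() and lines(), so graphical parameters
# such as lend= apply to the first series and to all the others.
matplot_sketch <- function(y, lwd = 10, lty = 1, ...) {
    x <- seq_len(nrow(y))
    # first series: sets up the plot region
    plot(x, y[, 1], type = "l", ylim = range(y), lwd = lwd, lty = lty,
         col = 1, ...)
    # remaining series: same extra arguments as the first one
    for (i in seq_len(ncol(y))[-1]) {
        lines(x, y[, i], lwd = lwd, lty = lty, col = i, ...)
    }
    invisible(ncol(y))
}
```

With this sketch, matplot_sketch(matrix(1:9,3), lend="butt") should draw all three lines with butt ends.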



[Rd] matplot does not consider the parameter lend (PR#13619)

2009-03-23 Thread cgenolin
Full_Name: Christophe Genolini
Version: 2.8.1, but also 2.9
OS: Windows XP
Submission from: (NULL) (82.225.59.146)


I am using matplot with the option lend="butt", but only the first line (the
black) is printed correctly  :

> matplot(matrix(1:9,3),type="c",lwd=10,lty=1,lend="butt")

Gabor Grothendieck found the problem in the matplot code:
the ... is passed to plot (which plots the first series) but not to lines (which
plots the rest):

    if (!add) {
        ii <- ii[-1]
        plot(x[, 1], y[, 1], type = type[1], xlab = xlab, ylab = ylab,
            xlim = xlim, ylim = ylim, lty = lty[1], lwd = lwd[1],
            pch = pch[1], col = col[1], cex = cex[1], bg = bg[1],
            ...)
    }
    for (i in ii) {
        lines(x[, i], y[, i], type = type[i], lty = lty[i], lwd = lwd[i],
            pch = pch[i], col = col[i], cex = cex[i], bg = bg[i])
    }



[Rd] savePlot export "strange" eps (PR#13620)

2009-03-23 Thread cgenolin
Full_Name: Christophe Genolini
Version: 2.8.1
OS: Windows XP
Submission from: (NULL) (82.225.59.146)


savePlot exports "eps" graphs that seem to be incorrect.

When I try to include them in a LaTeX file, I get:
++
Cannot determine size of graphics in foo.eps (no BoundingBox)
--

Trying to open them with GSview, I get :
++
GSview 4.9 2007-11-18
AFPL Ghostscript 8.54 (2006-05-17)
Copyright (C) 2005 artofcode LLC, Benicia, CA.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Displaying non DSC file C:/Documents and Settings/Christophe/Mes
documents/Recherche/Trajectoires/kmeal/trajectories/testsDev/toti.eps
Error: /undefined in 
Operand stack:

Execution stack:
   %interp_exit   .runexec2   --nostringval--   --nostringval--  
--nostringval--   2   %stopped_push   --nostringval--   --nostringval--   false 
 1   %stopped_push   1   3   %oparray_pop   1   3   %oparray_pop   1   3  
%oparray_pop   1   3   %oparray_pop   .runexec2   --nostringval--  
--nostringval--   --nostringval--   2   %stopped_push   --nostringval--
Dictionary stack:
   --dict:1130/1686(ro)(G)--   --dict:0/20(G)--   --dict:74/200(L)--
Current allocation mode is local
Last OS error: No such file or directory

--- Begin offending input ---
[binary data omitted; the readable fragments include "EMF" and "G r a p h A p p"]
--- End offending input ---
file offset = 1024
gsapi_run_string_continue returns -101



Re: [Rd] matplot does not consider the parameter lend (PR#13619)

2009-03-23 Thread Duncan Murdoch

On 23/03/2009 7:25 PM, cgeno...@u-paris10.fr wrote:

> Full_Name: Christophe Genolini
> Version: 2.8.1, but also 2.9
> OS: Windows XP
> Submission from: (NULL) (82.225.59.146)
>
> I am using matplot with the option lend="butt", but only the first line (the
> black) is printed correctly  :
>
> matplot(matrix(1:9,3),type="c",lwd=10,lty=1,lend="butt")


I'd call this another case where it is performing as documented, but 
should probably be changed (but not by me).  In the meantime, there's 
the simple workaround:


save <- par(lend="butt")
matplot(matrix(1:9,3),type="c",lwd=10,lty=1)
par(save)
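
The same workaround can be wrapped so that the old setting is restored even if the plotting call fails. This is only a sketch: the wrapper name matplot_butt is hypothetical, and it assumes the par()/on.exit() idiom is acceptable here.

```r
# Hypothetical wrapper around the workaround above (not part of R).
# par() returns the previous settings; on.exit() restores them when the
# function returns, including when matplot() stops with an error.
matplot_butt <- function(...) {
    old <- par(lend = "butt")
    on.exit(par(old))
    matplot(...)
}
```

It would be called as matplot_butt(matrix(1:9,3), type="c", lwd=10, lty=1).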

Duncan Murdoch





Re: [Rd] dput(as.list(function...)...) bug

2009-03-23 Thread Wacek Kusnierczyk
Stavros Macrakis wrote:
> Tested in R 2.8.1 Windows
>
>   
>> ff <- formals(function(x)1)
>> ff1 <- as.list(function(x)1)[1]
>> 
> # ff1 acts the same as ff in the examples below, but is a list rather
> than a pairlist
>
>   
>> dput( ff , control=c("warnIncomplete"))
>> 
> list(x = )
>
> This string is not parsable, but dput does not give a warning as specified.
>
>   

same in 2.10.0 r48200, ubuntu 8.04 linux 32 bit


>> dput( ff , control=c("all","warnIncomplete"))
>> 
> list(x = quote())
>   

likewise.

> This string is parseable, but quote() is not evaluable, and again dput
> does not give a warning as specified.
>
> In fact, I don't know how to write out ff$x.  It appears to be the
> zero-length name:
>
> is.name(ff$x) => TRUE
> as.character(ff$x) => ""
>
> but there is no obvious way to create such an object:
>
> as.name("") => execution error
> quote(``) => parse error
>
> The above examples should either produce a parseable and evaluable
> output (preferable), or give a warning.
>   

interestingly,

quote(NULL)
# NULL

as.name(NULL)
# Error in as.name(NULL) :
#  invalid type/length (symbol/0) in vector allocation

yuck.

vQ

> -s
>
> PS As a matter of comparative linguistics, many versions of Lisp allow
> zero-length symbols/names.  But R coerces strings to symbols/names in
> a way that Lisp does not, so that might be an invitation to obscure
> bugs in R where it is rarely problematic in Lisp.
>
> PPS dput(pairlist(23),control="all") also gives the same output as
> dput(list(23),control="all"), but as I understand it, pairlists will
> become non-user-visible at some point.
>
>   


-- 
---
Wacek Kusnierczyk, MD PhD

Email: w...@idi.ntnu.no
Phone: +47 73591875, +47 72574609

Department of Computer and Information Science (IDI)
Faculty of Information Technology, Mathematics and Electrical Engineering (IME)
Norwegian University of Science and Technology (NTNU)
Sem Saelands vei 7, 7491 Trondheim, Norway
Room itv303

Bioinformatics & Gene Regulation Group
Department of Cancer Research and Molecular Medicine (IKM)
Faculty of Medicine (DMF)
Norwegian University of Science and Technology (NTNU)
Laboratory Center, Erling Skjalgsons gt. 1, 7030 Trondheim, Norway
Room 231.05.060



[Rd] dput(as.list(function...)...) bug

2009-03-23 Thread Stavros Macrakis
Tested in R 2.8.1 Windows

> ff <- formals(function(x)1)
> ff1 <- as.list(function(x)1)[1]
# ff1 acts the same as ff in the examples below, but is a list rather
than a pairlist

> dput( ff , control=c("warnIncomplete"))
list(x = )

This string is not parsable, but dput does not give a warning as specified.

> dput( ff , control=c("all","warnIncomplete"))
list(x = quote())

This string is parseable, but quote() is not evaluable, and again dput
does not give a warning as specified.

In fact, I don't know how to write out ff$x.  It appears to be the
zero-length name:

is.name(ff$x) => TRUE
as.character(ff$x) => ""

but there is no obvious way to create such an object:

as.name("") => execution error
quote(``) => parse error

The above examples should either produce a parseable and evaluable
output (preferable), or give a warning.

-s

PS As a matter of comparative linguistics, many versions of Lisp allow
zero-length symbols/names.  But R coerces strings to symbols/names in
a way that Lisp does not, so that might be an invitation to obscure
bugs in R where it is rarely problematic in Lisp.

PPS dput(pairlist(23),control="all") also gives the same output as
dput(list(23),control="all"), but as I understand it, pairlists will
become non-user-visible at some point.



Re: [Rd] [R] variance/mean

2009-03-23 Thread Wacek Kusnierczyk

(this post suggests a patch to the sources, so i allow myself to divert
it to r-devel)

Bert Gunter wrote:
> x a numeric vector, matrix or data frame. 
> y NULL (default) or a vector, matrix or data frame with compatible
> dimensions to x. The default is equivalent to y = x (but more efficient). 
>
>   
bert points to an interesting fragment of ?var:  it suggests that
computing var(x) is more efficient than computing var(x,x), for any x
valid as input to var.  indeed:

set.seed(0)
x = matrix(rnorm(10000), 100, 100)

library(rbenchmark)
benchmark(replications=1000, columns=c('test', 'elapsed'),
   var(x),
   var(x, x))
#        test elapsed
# 1    var(x)   1.091
# 2 var(x, x)   2.051

that's of course, so to speak, unreasonable:  for what var(x) does is
actually computing the covariance of x and x, which should be the same
as var(x,x). 
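
for readers without rbenchmark, the same comparison can be sketched in base R.  this assumes the 100 x 100 matrix is filled with 10000 normal draws; the replication count of 20 is arbitrary, and the timings will differ by machine and R version, so none are shown.

```r
set.seed(0)
# assumption: 10000 draws, enough to fill the 100 x 100 matrix
x <- matrix(rnorm(10000), 100, 100)

# the two calls should agree numerically
same_values <- isTRUE(all.equal(var(x), var(x, x)))

# rough timings with base R only; 20 replications is an arbitrary choice
t_one <- system.time(for (i in 1:20) var(x))[["elapsed"]]
t_two <- system.time(for (i in 1:20) var(x, x))[["elapsed"]]
```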

the catch is that, when y is given, there's an overhead of memory allocation
for *both* x and y, as seen in src/main/cov.c:720+.
incidentally, it seems that the problem can be solved with a trivial fix
(see the attached patch), so that

set.seed(0)
x = matrix(rnorm(10000), 100, 100)

library(rbenchmark)
benchmark(replications=1000, columns=c('test', 'elapsed'),
   var(x),
   var(x, x))
#        test elapsed
# 1    var(x)   1.121
# 2 var(x, x)   1.107

with the quick checks

all.equal(var(x), var(x, x))
# TRUE
   
all(var(x) == var(x, x))
# TRUE

and for cor it seems to make cor(x,x) slightly faster than cor(x), while
originally it was twice as slow:

# original
benchmark(replications=1000, columns=c('test', 'elapsed'),
   cor(x),
   cor(x, x))
#        test elapsed
# 1    cor(x)   1.196
# 2 cor(x, x)   2.253
   
# patched
benchmark(replications=1000, columns=c('test', 'elapsed'),
   cor(x),
   cor(x, x))
#        test elapsed
# 1    cor(x)   1.207
# 2 cor(x, x)   1.204

(there is a visible penalty due to an additional pointer test, but it's
10ms on 1000 replications with 10000 data points, which i think is
negligible.)

> This is as clear as I would know how to state. 

i believe bert is right.

however, with the above fix, this can now be rewritten as:

"
x: a numeric vector, matrix or data frame. 
y: a vector, matrix or data frame with dimensions compatible to those of x. 
By default, y = x. 
"

which, to my simple mind, is even more clear than what bert would know
how to state, and less likely to cause the sort of confusion that
originated this thread.

the attached patch suggests modifications to src/main/cov.c and
src/library/stats/man/cor.Rd.
it has been prepared and checked as follows:

svn co https://svn.r-project.org/R/trunk trunk
cd trunk
# edited the sources
svn diff > cov.diff
svn revert -R src
patch -p0 < cov.diff

tools/rsync-recommended
./configure
make
make check
bin/R
# subsequent testing within R

if you happen to consider this patch for a commit, please be sure to
examine and test it carefully first.

vQ
Index: src/library/stats/man/cor.Rd
===
--- src/library/stats/man/cor.Rd	(revision 48200)
+++ src/library/stats/man/cor.Rd	(working copy)
@@ -6,9 +6,9 @@
 \name{cor}
 \title{Correlation, Variance and Covariance (Matrices)}
 \usage{
-var(x, y = NULL, na.rm = FALSE, use)
+var(x, y = x, na.rm = FALSE, use)
 
-cov(x, y = NULL, use = "everything",
+cov(x, y = x, use = "everything",
 method = c("pearson", "kendall", "spearman"))
 
 cor(x, y = NULL, use = "everything",
@@ -32,9 +32,7 @@
 }
 \arguments{
   \item{x}{a numeric vector, matrix or data frame.}
-  \item{y}{\code{NULL} (default) or a vector, matrix or data frame with
-compatible dimensions to \code{x}.   The default is equivalent to
-\code{y = x} (but more efficient).}
+  \item{y}{a vector, matrix or data frame with dimensions compatible to those of \code{x}. By default, y = x.}
   \item{na.rm}{logical. Should missing values be removed?}
   \item{use}{an optional character string giving a
 method for computing covariances in the presence
Index: src/main/cov.c
===
--- src/main/cov.c	(revision 48200)
+++ src/main/cov.c	(working copy)
@@ -689,7 +689,7 @@
 if (ansmat) PROTECT(ans = allocMatrix(REALSXP, ncx, ncy));
 else PROTECT(ans = allocVector(REALSXP, ncx * ncy));
 sd_0 = FALSE;
-if (isNull(y)) {
+if (isNull(y) || (DATAPTR(x) == DATAPTR(y))) {
 	if (everything) { /* NA's are propagated */
 	PROTECT(xm = allocVector(REALSXP, ncx));
 	PROTECT(ind = allocVector(LGLSXP, ncx));


Re: [Rd] dput(as.list(function...)...) bug

2009-03-23 Thread Duncan Murdoch

On 23/03/2009 7:37 PM, Stavros Macrakis wrote:

> Tested in R 2.8.1 Windows
>
>> ff <- formals(function(x)1)
>> ff1 <- as.list(function(x)1)[1]
> # ff1 acts the same as ff in the examples below, but is a list rather
> than a pairlist
>
>> dput( ff , control=c("warnIncomplete"))
> list(x = )
>
> This string is not parsable, but dput does not give a warning as specified.


That's not what "warnIncomplete" is documented to do.  The docs (in 
?.deparseOpts) say


 'warnIncomplete' Some exotic objects such as environments,
  external pointers, etc. can not be deparsed properly.  This
  option causes a warning to be issued if any of those may give
  problems.

  Also, the parser in R < 2.7.0 would only accept strings of up
  to 8192 bytes, and this option gives a warning for longer
  strings.

As far as I can see, none of those conditions apply here:  ff is not one 
of those exotic objects or a very long string.  The really relevant 
comment is in the dput documentation:


"Deparsing an object is difficult, and not always possible."

Yes, it would be nice if deparsing and parsing were mutual inverses, but 
they're not, and are documented not to be.




>> dput( ff , control=c("all","warnIncomplete"))
> list(x = quote())
>
> This string is parseable, but quote() is not evaluable, and again dput
> does not give a warning as specified.
>
> In fact, I don't know how to write out ff$x.


I don't know of any input that will parse to it.


> It appears to be the zero-length name:
>
> is.name(ff$x) => TRUE
> as.character(ff$x) => ""


This may give you a hint:

> y <- ff$x
> y
Error: argument "y" is missing, with no default

It's a special internal thing that triggers the missing value error when 
evaluated.  It probably shouldn't be user visible at all.
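
A small sketch of how this object can be inspected without triggering the error. It assumes that quote(expr = ), a commonly used idiom, yields the same empty symbol; the variable names are only illustrative.

```r
ff <- formals(function(x) 1)

# The formal without a default is a symbol whose name is the empty string.
is_symbol     <- is.name(ff$x)
name_is_empty <- identical(as.character(ff$x), "")

# Assumption: quote(expr = ) produces the same "empty symbol".
# It is compared inline on purpose: binding it to a variable and then
# evaluating that variable raises the "argument is missing" error.
same_as_quote <- identical(ff$x, quote(expr = ))
```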


Duncan Murdoch





Re: [Rd] dput(as.list(function...)...) bug

2009-03-23 Thread William Dunlap
> -Original Message-
> From: r-devel-boun...@r-project.org 
> [mailto:r-devel-boun...@r-project.org] On Behalf Of Duncan Murdoch
> Sent: Monday, March 23, 2009 5:28 PM
> To: Stavros Macrakis
> Cc: r-devel@r-project.org
> Subject: Re: [Rd] dput(as.list(function...)...) bug
> 
> On 23/03/2009 7:37 PM, Stavros Macrakis wrote:
> > Tested in R 2.8.1 Windows
> > 
> >> ff <- formals(function(x)1)
> >> ff1 <- as.list(function(x)1)[1]
> > # ff1 acts the same as ff in the examples below, but is a 
> list rather
> > than a pairlist
> > 
> >> dput( ff , control=c("warnIncomplete"))
> > list(x = )
> > 
> > This string is not parsable, but dput does not give a 
> warning as specified.

The string "list(x = )" is parsable:
  z <- parse(text="list(x = )")
Evaluating the resulting expression results in a run-time error:
  eval(z)
  Error in eval(expr, envir, enclos) :
element 1 is empty;
 the part of the args list of 'list' being evaluated was:
 (x = )
That is the same sort of error you get from running list(,):
list wants all of its arguments to be present.
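
The distinction between parsing and evaluating can be checked without stopping the session. This is a minimal sketch using tryCatch; the variable names are illustrative, and the exact error message may differ between R versions, so only the fact that an error occurs is recorded.

```r
# Parsing succeeds and yields an expression vector.
z <- parse(text = "list(x = )")
parsed_ok <- is.expression(z)

# Evaluation fails at run time; capture the failure instead of stopping.
eval_result <- tryCatch(eval(z), error = function(e) "error")
```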

With other functions such a construct will run in R, although its result
does not match that of S+ (or SV4):

  > f <- function(x,y,z) c(x=if(missing(x))"" else x,
                           y=if(missing(y))"" else y,
                           z=if(missing(z))"" else z)
  R> f(x=,2,3)
    x   y   z
  "2" "3"  ""
  S+> f(x=,2,3)
    x   y   z
   "" "2" "3"
or
  R> f(y=,1,3)
    x   y   z
  "1" "3"  ""
  S+> f(y=,1,3)
    x   y   z
  "1"  "" "3"

R and S+ act the same if you skip an argument by position
  > f(1,,3)
    x   y   z
  "1"  "" "3"
but differ if you use name=: in S+ it skips an argument by name
and in R it is ignored by ordinary functions (where
typeof(func)=="closure").

I wouldn't say this is recommended or often used or the point
of the original post.
 
Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com  



Re: [Rd] [R] variance/mean

2009-03-23 Thread William Dunlap
Doesn't Fortran still require that the arguments to
a function not alias each other (in whole or in part)?
I could imagine that var() might call into Fortran code
(BLAS or LAPACK).  Would you want to chance erroneous
results at a high optimization level to save a bit of
time in an unusual situation?

(I could also imagine someone changing the R interpreter
so that x and x[-length(x)] could share the same memory
block and that could cause Fortran aliasing problems as
well.)

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com  



Re: [Rd] [R] variance/mean

2009-03-23 Thread William Dunlap
Oops, I was thinking backwards.  This sort of
hack could avoid the Fortran aliasing rules, not
run afoul of them.

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com  
