Re: [Rd] Why does the lexical analyzer drop comments ?
Duncan Murdoch wrote: On 22/03/2009 4:50 PM, Romain Francois wrote: Romain Francois wrote: Peter Dalgaard wrote: Duncan Murdoch wrote: On 3/20/2009 2:56 PM, romain.franc...@dbmail.com wrote: It happens in the token function in gram.c:    c = SkipSpace();    if (c == '#') c = SkipComment(); and then SkipComment goes like that: static int SkipComment(void) {    int c;    while ((c = xxgetc()) != '\n' && c != R_EOF) ;    if (c == R_EOF) EndOfFile = 2;    return c; } which effectively drops comments. Would it be possible to keep the information somewhere ? The source code says this:  * The function yylex() scans the input, breaking it into  * tokens which are then passed to the parser. The lexical  * analyser maintains a symbol table (in a very messy fashion). so my question is could we use this symbol table to keep track of, say, COMMENT tokens. Why would I even care about that ? I'm writing a package that will perform syntax highlighting of R source code based on the output of the parser, and it seems a waste to drop the comments. An also, when you print a function to the R console, you don't get the comments, and some of them might be useful to the user. Am I mad if I contemplate looking into this ? Comments are syntactically the same as whitespace. You don't want them to affect the parsing. Well, you might, but there is quite some madness lying that way. Back in the bronze age, we did actually try to keep comments attached to (AFAIR) the preceding token. One problem is that the elements of the parse tree typically involve multiple tokens, and if comments after different tokens get stored in the same place something is not going back where it came from when deparsing. So we had problems with comments moving from one end of a loop the other and the like. Ouch. That helps picturing the kind of madness ... 
Another way could be to record comments separately (similarly to srcfile attribute for example) instead of dropping them entirely, but I guess this is the same as Duncan's idea, which is easier to set up. You could try extending the scheme by encoding which part of a syntactic structure the comment belongs to, but consider for instance how many places in a function call you can stick in a comment. f #here ( #here a #here (possibly) = #here 1 #this one belongs to the argument, though ) #but here as well Coming back on this. I actually get two expressions: > p <- parse( "/tmp/parsing.R") > str( p ) length 2 expression(f, (a = 1)) - attr(*, "srcref")=List of 2 ..$ :Class 'srcref' atomic [1:6] 1 1 1 1 1 1 .. .. ..- attr(*, "srcfile")=Class 'srcfile' ..$ :Class 'srcref' atomic [1:6] 2 1 6 1 1 1 .. .. ..- attr(*, "srcfile")=Class 'srcfile' - attr(*, "srcfile")=Class 'srcfile' But anyway, if I drop the first comment, then I get one expression with some srcref information: > p <- parse( "/tmp/parsing.R") > str( p ) length 1 expression(f(a = 1)) - attr(*, "srcref")=List of 1 ..$ :Class 'srcref' atomic [1:6] 1 1 5 1 1 1 .. .. ..- attr(*, "srcfile")=Class 'srcfile' - attr(*, "srcfile")=Class 'srcfile' but as far as i can see, there is only srcref information for that expression as a whole, it does not go beyond, so I am not sure I can implement Duncan's proposal without more detailed information from the parser, since I will only have the chance to check if a whitespace is actually a comment if it is between two expressions with a srcref. Currently srcrefs are only attached to whole statements. Since your source only included one or two statements, you only get one or two srcrefs. It would not be hard to attach a srcref to every subexpression; there hasn't been a need for that before, so I didn't do it just for the sake of efficiency. I understand that. I wanted to make sure I did not miss something. 
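The point about srcrefs being attached only to whole statements can be checked with a small sketch. It assumes a reasonably recent R where `parse(text = , keep.source = TRUE)` is available; the source string and the names `src`, `exprs`, `refs` and `first_last` are made up for illustration. It also shows that the text a srcref points at still contains the comment, even though the parsed expression does not.

```r
# Two top-level statements; the first one carries a comment inside the call.
src <- "f( # a comment\n  a = 1\n)\ng(2)\n"

exprs <- parse(text = src, keep.source = TRUE)
refs  <- attr(exprs, "srcref")      # one srcref per top-level statement

# First and last source line of each statement (elements 1 and 3 of a srcref).
first_last <- t(sapply(refs, function(r) as.integer(r)[c(1, 3)]))
print(first_last)

# The parsed call has lost the comment ...
print(deparse(exprs[[1]]))
# ... but the source lines the srcref refers to still contain it.
has_comment <- any(grepl("# a comment", as.character(refs[[1]]), fixed = TRUE))
print(has_comment)
```

There is no srcref for `a = 1` on its own here, which is the limitation discussed above: a comment can only be located relative to a whole statement.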
However, it might make sense for you to have your own parser, based on the grammar in R's parser, but handling white space differently. Certainly it would make sense to do that before making changes to the base R one. The whole source is in src/main/gram.y; if you're not familiar with Bison, I can give you a hand. Thank you, I appreciate your help. Having my own parser is the option I am slowly converging to. I'll start with reading bison documentation. Besides bison documents, is there R specific documentation on how the R parser was written ? Duncan Murdoch Would it be sensible then to retain the comments and their srcref information, but separate from the tokens used for the actual parsing, in some other attribute of the output of parse ? Romain If you're doing syntax highlighting, you can determine the whitespace by looking at the srcref records, and then parse that to determine what isn't being counted as tokens. (I think you'll find a few things there besides whitespace, but it is a fairly limited set, so shouldn't be too hard to recognize.) The Rd parser is differ
[Rd] all.equal is hard to use
Hi,

I have extensive programming experience (Windows, Unix, scripting, compiled languages, you name it) but am new to R. I found that it is quite hard to interpret the results returned by all.equal (base). The main problem is that when attributes are compared, they are sorted in attr.all.equal, but in the result the index of the differing component refers to the sorted list, not the original list. I think that adding the component name to the printout may make users' lives a little easier, like this:

    function (target, current, check.attributes = TRUE, ...)
    {
        msg <- if (check.attributes)
            # if it is called by attr.all.equal(), target and current
            # are lists returned from attributes(original target | current).
            # So attributes of target and current are the attributes of attributes,
            # which contain only "names".
            attr.all.equal(target, current, ...)
        iseq <- if (length(target) == length(current)) {
            # if the lengths are equal, iseq will be (1, 2, ..., length)
            seq_along(target)
        } else {
            if (!is.null(msg))
                # remove old msg about "Lengths"
                msg <- msg[-grep("\\bLengths\\b", msg)]
            nc <- min(length(target), length(current))
            msg <- c(msg, paste("Length mismatch: comparison on first",
                                nc, "components"))
            # iseq is (1, 2, ..., shorter of the two lengths)
            seq_len(nc)
        }
        for (i in iseq) {
            # compare each element in the list with all.equal.
            mi <- all.equal(target[[i]], current[[i]],
                            check.attributes = check.attributes, ...)
            if (is.character(mi)) {
                # print out name if possible
                if (!is.null(names(target)[i]) && !is.null(names(current)[i]))
                    msg <- c(msg, paste("Component ", i, ": ", mi,
                                        " with target name: ", names(target)[i],
                                        ", current name: ", names(current)[i],
                                        sep = ""))
                else if (!is.null(names(target)[i]))
                    msg <- c(msg, paste("Component ", i, ": ", mi,
                                        " with target name: ", names(target)[i],
                                        sep = ""))
                else if (!is.null(names(current)[i]))
                    msg <- c(msg, paste("Component ", i, ": ", mi,
                                        " with current name: ", names(current)[i],
                                        sep = ""))
                else
                    msg <- c(msg, paste("Component ", i, ": ", mi, sep = ""))
            }
        }
        if (is.null(msg)) TRUE else msg
    }

Hong Shen

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
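To make the proposal concrete, here is a minimal standalone sketch of the same idea. It is not the base R implementation and not the poster's patch: `all_equal_named` is a hypothetical helper that walks two lists component by component and adds the target's component name to each message. Attribute checks and length-mismatch messages are deliberately left out.

```r
# Hypothetical helper (not part of base R): report differing list components
# by index *and* by name, so the message can be traced back to the original
# object.
all_equal_named <- function(target, current, ...) {
  n <- min(length(target), length(current))
  msg <- character(0)
  for (i in seq_len(n)) {
    mi <- all.equal(target[[i]], current[[i]], ...)
    if (is.character(mi)) {
      tn <- names(target)[i]            # NULL when target has no names
      label <- if (!is.null(tn) && !is.na(tn) && nzchar(tn)) {
        paste0(" (", tn, ")")
      } else {
        ""
      }
      msg <- c(msg, paste0("Component ", i, label, ": ", mi))
    }
  }
  if (length(msg)) msg else TRUE
}

res <- all_equal_named(list(a = 1, b = 2), list(a = 1, b = 3))
print(res)
```

Keeping the numeric index next to the name means the message stays usable for unnamed or partially named lists.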
[Rd] gsub('(.).(.)(.)', '\\3\\2\\1', 'gsub') (PR#13617)
Full_Name: Wacek Kusnierczyk
Version: 2.10.0 r48181
OS: Ubuntu 8.04 Linux 32bit
Submission from: (NULL) (129.241.199.135)

there seems to be something wrong with r's regexing. consider the following example:

    gregexpr('a*|b', 'ab')
    # positions: 1 2
    # lengths: 1 1
    gsub('a*|b', '.', 'ab')
    # ..

where the pattern matches any number of 'a's or one b, and replaces the match with a dot, globally. the answer is correct (assuming a dfa engine). however,

    gregexpr('a*|b', 'ab', perl=TRUE)
    # positions: 1 2
    # lengths: 1 0
    gsub('a*|b', '.', 'ab', perl=TRUE)
    # .b.

where the pattern is identical, but the result is wrong. perl uses an nfa (if it used a dfa, the result would still be wrong), and in the above example it should find *four* matches, collectively including *all* letters in the input, thus producing *four* dots (and *only* dots) in the output:

    perl -le '
        $input = qq|ab|;
        print qq|match: "$_"| foreach $input =~ /a*|b/g;
        $input =~ s/a*|b/./g;
        print qq|output: "$input"|;'
    # match: "a"
    # match: ""
    # match: "b"
    # match: ""
    # output: ""

since with perl=TRUE both gregexpr and gsub seem to use pcre, i've checked the example with pcretest, and also with a trivial c program (available on demand) using the pcre api; there were four matches, exactly as in the perl bit above.

the results above are surprising, and suggest a bug in r's use of pcre rather than in pcre itself. possibly, the issue is that when an empty string is matched (with a*, for example), the next attempt is not trying to match a non-empty string at the same position, but rather an empty string again at the next position. for example,

    gsub('a|b|c', '.', 'abc', perl=TRUE)
    # "...", correct
    gsub('a*|b|c', '.', 'abc', perl=TRUE)
    # ".b.c.", wrong
    gsub('a|b*|c', '.', 'abc', perl=TRUE)
    # "..c.", wrong (but now only 'c' remains)
    gsub('a|b*|c', '.', 'aba', perl=TRUE)
    # "...", incidentally correct

without detailed analysis of the code, i guess the bug is located somewhere in src/main/pcre.c, and is distributed among the do_p* functions, so that multiple fixes may be needed.
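A small harness makes it easy to re-run the cases from this report on whatever R build is at hand. This is only a sketch: the names `patterns` and `results` are illustrative, and no output is claimed for the patterns that can match the empty string, since that is exactly what the report says varies between engines. The only pattern whose result is beyond doubt is the one with no empty alternative.

```r
# Patterns from the report; only "plain" cannot match the empty string.
patterns <- c(plain = "a|b|c", star_first = "a*|b|c", star_mid = "a|b*|c")

# Run each pattern through both engines on the same input.
results <- sapply(patterns, function(p) {
  c(default = gsub(p, ".", "abc"),
    perl    = gsub(p, ".", "abc", perl = TRUE))
})

# Rows are engines, columns are patterns; compare the two rows by eye.
print(results)
```

If the two rows disagree for `star_first` or `star_mid`, the build shows the behaviour described above.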
[Rd] incoherent treatment of NULL
somewhat related to a previous discussion [1] on how 'names<-' would sometimes modify its argument in place, and sometimes produce a modified copy without changing the original, here's another example of how it becomes visible to the user when r makes or doesn't make a copy of an object:

    x = NULL
    dput(x)
    # NULL
    class(x) = 'integer'
    # error: invalid (NULL) left side of assignment

    x = c()
    dput(x)
    # NULL
    class(x) = 'integer'
    dput(x)
    # integer(0)

in both cases, x ends up with the value NULL (the no-value object). in both cases, dput explains that x is NULL. in both cases, an attempt is made to make x be an empty integer vector. the first fails, because it tries to modify NULL itself, the latter apparently does not and succeeds.

however, the following has a different pattern:

    x = NULL
    dput(x)
    # NULL
    names(x) = character(0)
    # error: attempt to set an attribute on NULL

    x = c()
    dput(x)
    # NULL
    names(x) = character(0)
    # error: attempt to set an attribute on NULL

and also:

    x = c()
    class(x) = 'integer'
    # fine
    class(x) = 'foo'
    # error: attempt to set an attribute on NULL

how come? the behaviour can obviously be explained by looking at the source code (hardly surprisingly, because it is as it is because the source is as it is), and referring to the NAMED property (i.e., the sxpinfo.named field of a SEXPREC struct). but can the *design* be justified? can the apparent incoherences visible above the interface be defended?

why should the first example above be unable to produce an empty integer vector?

why is it possible to set a class attribute, but not a names attribute, on c()?

why is it possible to set the class attribute in c() to 'integer', but not to 'foo'?

why are there different error messages for apparently the same problem?

vQ

[1] search the rd archives for 'surprising behaviour of names<-'
Re: [Rd] incoherent treatment of NULL
> "WK" == Wacek Kusnierczyk > on Mon, 23 Mar 2009 09:52:19 +0100 writes: WK> somewhat related to a previous discussion [1] on how 'names<-' would WK> sometimes modify its argument in place, and sometimes produce a modified WK> copy without changing the original, here's another example of how it WK> becomes visible to the user when r makes or doesn't make a copy of an WK> object: WK> x = NULL WK> dput(x) WK> # NULL WK> class(x) = 'integer' WK> # error: invalid (NULL) left side of assignment does not happen for me in R-2.8.1, R-patched or newer So you must be using your own patched version of R ? WK> x = c() WK> dput(x) WK> # NULL WK> class(x) = 'integer' WK> dput(x) WK> # integer(0) WK> in both cases, x ends up with the value NULL (the no-value object). in WK> both cases, dput explains that x is NULL. in both cases, an attempt is WK> made to make x be an empty integer vector. the first fails, because it WK> tries to modify NULL itself, the latter apparently does not and succeeds. WK> however, the following has a different pattern: WK> x = NULL WK> dput(x) WK> # NULL WK> names(x) = character(0) WK> # error: attempt to set an attribute on NULL WK> x = c() WK> dput(x) WK> # NULL WK> names(x) = character(0) WK> # error: attempt to set an attribute on NULL WK> and also: WK> x = c() WK> class(x) = 'integer' WK> # fine WK> class(x) = 'foo' WK> # error: attempt to set an attribute on NULL WK> how come? the behaviour can obviously be explained by looking at the WK> source code (hardly surprisingly, because it is as it is because the WK> source is as it is), and referring to the NAMED property (i.e., the WK> sxpinfo.named field of a SEXPREC struct). but can the *design* be WK> justified? can the apparent incoherences visible above the interface be WK> defended? WK> why should the first example above be unable to produce an empty integer WK> vector? WK> why is it possible to set a class attribute, but not a names attribute, WK> on c()? 
WK> why is it possible to set the class attribute in c() to 'integer', but WK> not to 'foo'? WK> why are there different error messages for apparently the same problem? WK> vQ WK> [1] search the rd archives for 'surprising behaviour of names<-' WK> __ WK> R-devel@r-project.org mailing list WK> https://stat.ethz.ch/mailman/listinfo/r-devel __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] incoherent treatment of NULL
Martin Maechler wrote: >> "WK" == Wacek Kusnierczyk >> >> > WK> somewhat related to a previous discussion [1] on how 'names<-' would > WK> sometimes modify its argument in place, and sometimes produce a > modified > WK> copy without changing the original, here's another example of how it > WK> becomes visible to the user when r makes or doesn't make a copy of an > WK> object: > > WK> x = NULL > WK> dput(x) > WK> # NULL > WK> class(x) = 'integer' > WK> # error: invalid (NULL) left side of assignment > > does not happen for me in R-2.8.1, R-patched or newer > > So you must be using your own patched version of R ? > oops, i meant to use 2.8.1 or devel for testing. you're right, in this example there is no error reported in > 2.8.0, but see below. > > WK> x = c() > WK> dput(x) > WK> # NULL > WK> class(x) = 'integer' > WK> dput(x) > WK> # integer(0) > > WK> in both cases, x ends up with the value NULL (the no-value object). > in > WK> both cases, dput explains that x is NULL. in both cases, an attempt > is > WK> made to make x be an empty integer vector. the first fails, because > it > WK> tries to modify NULL itself, the latter apparently does not and > succeeds. > > WK> however, the following has a different pattern: > > WK> x = NULL > WK> dput(x) > WK> # NULL > WK> names(x) = character(0) > WK> # error: attempt to set an attribute on NULL > i get the error in devel. > WK> x = c() > WK> dput(x) > WK> # NULL > WK> names(x) = character(0) > WK> # error: attempt to set an attribute on NULL > i get the error in devel. > WK> and also: > > WK> x = c() > WK> class(x) = 'integer' > WK> # fine > WK> class(x) = 'foo' > WK> # error: attempt to set an attribute on NULL > i get the error in devel. it doesn't seem coherent to me: why can i set the class, but not names attribute on both NULL and c()? 
why can i set the class attribute to 'integer', but not to 'foo', as i could on a non-empty vector:

    x = 1
    class(x) = 'foo'
    # just fine

i'd naively expect to be able to create an empty vector classed 'foo', displayed perhaps as

    # speculation
    x = NULL
    class(x) = 'foo'
    x
    # foo(0)

or maybe as

    x
    # NULL
    # attr(, "class")
    # [1] "foo"

vQ
Re: [Rd] Why does the lexical analyzer drop comments ?
On 23/03/2009 3:10 AM, Romain Francois wrote:
> Duncan Murdoch wrote:
>> However, it might make sense for you to have your own parser, based on the
>> grammar in R's parser, but handling white space differently. Certainly it
>> would make sense to do that before making changes to the base R one. The
>> whole source is in src/main/gram.y; if you're not familiar with Bison, I
>> can give you a hand.
>
> Thank you, I appreciate your help. Having my own parser is the option I am
> slowly converging to. I'll start with reading bison documentation. Besides
> bison documents, is there R specific documentation on how the R parser was
> written ?

I don't think so.

Duncan Murdoch
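For readers coming to this thread later: the sketch below assumes a newer R than the one under discussion (R 3.0.0 or later), where `utils::getParseData()` exposes the parser's token table and comments appear in it as `COMMENT` tokens kept apart from the expressions. Whether that facility grew out of this discussion is not something the thread itself states. The input string and variable names are illustrative.

```r
code <- "x <- 1 # set x\n# standalone comment\ny <- x + 1\n"

# keep.source = TRUE is needed for the token table to be recorded.
exprs  <- parse(text = code, keep.source = TRUE)
tokens <- utils::getParseData(exprs)

# Comments are ordinary rows of the table, with positions, and do not
# appear in the parsed expressions themselves.
comments <- tokens[tokens$token == "COMMENT", c("line1", "col1", "text")]
print(comments)
print(length(exprs))
```

This is the "record comments separately" design raised earlier in the thread: the expressions are unchanged, and a highlighter can read comment positions from the side table.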
Re: [Rd] incoherent treatment of NULL
> "WK" == Wacek Kusnierczyk > on Mon, 23 Mar 2009 10:56:37 +0100 writes: WK> Martin Maechler wrote: >>> "WK" == Wacek Kusnierczyk >>> >>> WK> somewhat related to a previous discussion [1] on how 'names<-' would WK> sometimes modify its argument in place, and sometimes produce a modified WK> copy without changing the original, here's another example of how it WK> becomes visible to the user when r makes or doesn't make a copy of an WK> object: >> WK> x = NULL WK> dput(x) WK> # NULL WK> class(x) = 'integer' WK> # error: invalid (NULL) left side of assignment >> >> does not happen for me in R-2.8.1, R-patched or newer >> >> So you must be using your own patched version of R ? >> WK> oops, i meant to use 2.8.1 or devel for testing. you're right, in this WK> example there is no error reported in > 2.8.0, but see below. ok [.. omitted part no longer relevant ] WK> however, the following has a different pattern: >> WK> x = NULL WK> dput(x) WK> # NULL WK> names(x) = character(0) WK> # error: attempt to set an attribute on NULL >> WK> i get the error in devel. Yes, NULL is NULL is NULL ! Do read ?NULL ! [ ;-) ] more verbously, all NULL objects in R are identical, or as the help page says, there's only ``*The* NULL Object'' in R, i.e., NULL cannot get any attributes. WK> x = c() WK> dput(x) WK> # NULL WK> names(x) = character(0) WK> # error: attempt to set an attribute on NULL >> WK> i get the error in devel. of course! [I think *you* should have noticed that NULL and c() *are* identical] WK> and also: >> WK> x = c() WK> class(x) = 'integer' WK> # fine "fine" yes; here, the convention has been to change NULL into integer(0); and no, this won't change, if you find it inconsistent. WK> class(x) = 'foo' WK> # error: attempt to set an attribute on NULL >> WK> i get the error in devel. No, not if you evaluate the statements above (where 'x' has become 'integer(0)' in the mean time). 
But yes, you get in something like x <- c(); class(x) <- "foo" and I do agree that there's a buglet : The error message should be slightly more precise, --- improvement proposals are welcome --- but an error nontheless WK> it doesn't seem coherent to me: why can i set the class, you cannot set it, you can *change* it. WK> but not names WK> attribute on both NULL and c()? why can i set the class attribute to WK> 'integer', but not to 'foo', as i could on a non-empty vector: WK> x = 1 WK> class(x) = 'foo' WK> # just fine mainly because 'NULL is NULL is NULL' (NULL cannot have attributes) WK> i'd naively expect to be able to create an empty vector classed 'foo', yes, but that expectation is wrong WK> displayed perhaps as WK> # speculation WK> x = NULL WK> class(x) = 'foo' WK> x WK> # foo(0) WK> or maybe as WK> x WK> # NULL WK> # attr(, "class") WK> # [1] "foo" WK> vQ WK> __ WK> R-devel@r-project.org mailing list WK> https://stat.ethz.ch/mailman/listinfo/r-devel __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
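The "NULL is NULL is NULL" rule can be probed with a short sketch. `try_on_null` is a hypothetical helper that applies a modification to a fresh NULL and reports either "ok" or the error message. The outcome of the `class<-` cases is printed rather than asserted, since the thread itself shows it differing between R versions.

```r
# Hypothetical helper: apply f to NULL and report what happened.
try_on_null <- function(f) {
  x <- NULL
  tryCatch({ f(x); "ok" }, error = function(e) conditionMessage(e))
}

res <- c(
  attr_class = try_on_null(function(x) { attr(x, "class") <- "foo"; x }),
  class_foo  = try_on_null(function(x) { class(x) <- "foo"; x }),
  class_int  = try_on_null(function(x) { class(x) <- "integer"; x })
)
print(res)

# c() with no arguments is the very same object as NULL.
print(identical(c(), NULL))
```

On a build that follows the convention described above, only the `class_int` entry would read "ok", because that assignment coerces NULL to an empty integer vector instead of attaching an attribute to NULL.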
Re: [Rd] incoherent treatment of NULL
Martin Maechler wrote: > > [.. omitted part no longer relevant ] > > WK> however, the following has a different pattern: > >> > WK> x = NULL > WK> dput(x) > WK> # NULL > WK> names(x) = character(0) > WK> # error: attempt to set an attribute on NULL > >> > > WK> i get the error in devel. > > Yes, NULL is NULL is NULL ! Do read ?NULL ! [ ;-) ] > > more verbously, all NULL objects in R are identical, or as the > help page says, there's only ``*The* NULL Object'' in R, > i.e., NULL cannot get any attributes. > yes, but that's not the issue. the issue is that names(x)<- seems to try to attach an attribute to NULL, while it could, in principle, do the same as class(x)<-, i.e., coerce x to some type (and hence attach the name attribute not to NULL, but to the coerced-to object). but, as someone else explained to me behind the scenes, the matters are a little bit, so to speak, untidy: x = NULL class(x) = 'integer' # just fine x = NULL attr(x, 'class') = 'integer' # no go where class()<-, but not attr(,'class')<-, will try to coerce x to an object of the storage *mode* 'integer', hence the former succeeds (because it sets, roughly, the 'integer' class on an empty integer vector), while the latter fails (because it tries to set the 'integer' class on NULL itself). what was not clear to me is not why setting a class on NULL fails here, but why it is setting on NULL in the first place. after all, x = 1 names(x) = 'foo' is setting names on a *copy* of 1, not on *the* 1, so why could not class()<- create a 'copy' of NULL, i.e., an empty vector of some type (perhaps raw, as the lowest in the hierarchy). > WK> x = c() > WK> dput(x) > WK> # NULL > WK> names(x) = character(0) > WK> # error: attempt to set an attribute on NULL > >> > > WK> i get the error in devel. > > of course! 
>[I think *you* should have noticed that NULL and c() *are* identical] > > WK> and also: > >> > WK> x = c() > WK> class(x) = 'integer' > WK> # fine > "fine" yes; > here, the convention has been to change NULL into integer(0); > and no, this won't change, if you find it inconsistent. > that's ok, this is what i'd expect in the other cases, too (modulo the actual storage mode). > > WK> class(x) = 'foo' > WK> # error: attempt to set an attribute on NULL > >> > > WK> i get the error in devel. > > No, not if you evaluate the statements above (where 'x' has > become 'integer(0)' in the mean time). > > But yes, you get in something like > > x <- c(); class(x) <- "foo" > that's what i meant, must have forgotten the x = c(). > and I do agree that there's a buglet : > The error message should be slightly more precise, > --- improvement proposals are welcome --- > but an error nontheless > > WK> it doesn't seem coherent to me: why can i set the class, > > you cannot set it, you can *change* it. > terminological wars? btw. the class of NULL is "NULL"; why can't nullify an object by setting its class to 'NULL'? x = 1 class(x) = 'NULL' x # *not* NULL and one more interesting example: x = 1:2 class(x) = 'NULL' x # [1] 1 2 # attr(,"class") "NULL" x[1] # 1 x[2] # 2 is.vector(x) # FALSE hurray!!! apparently, i've alchemized a non-vector vector... (you can do it in r-devel, for that matter). > WK> but not names > WK> attribute on both NULL and c()? why can i set the class attribute to > WK> 'integer', but not to 'foo', as i could on a non-empty vector: > > WK> x = 1 > WK> class(x) = 'foo' > WK> # just fine > > mainly because 'NULL is NULL is NULL' > (NULL cannot have attributes) > yes yes yes; the question was, once again: why is x still NULL? > WK> i'd naively expect to be able to create an empty vector classed 'foo', > > yes, but that expectation is wrong > wrt. the actual state of matters, not necessarily wrt. 
the ideal state of matters ;) (i don't insist)

vQ
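The "non-vector vector" is probably less alchemy than it looks. `is.vector()` is documented to return FALSE for any object carrying attributes other than names, and a class attribute is such an attribute. The sketch below uses the class name 'foo' rather than 'NULL', an assumption made to keep the example independent of how a given R version treats `class(x) <- 'NULL'`.

```r
x <- 1:2
attr(x, "class") <- "foo"     # any non-names attribute will do

print(is.vector(x))           # FALSE: x now carries a "class" attribute
print(is.vector(unclass(x)))  # TRUE once the attribute is removed
print(unclass(x)[2])          # element access is unaffected

# Names are the one attribute is.vector() tolerates.
y <- c(a = 1L, b = 2L)
print(is.vector(y))
```

So `is.vector()` answers "is this a bare vector?", not "is this stored as a vector?"; `is.atomic(x)` stays TRUE throughout.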
Re: [Rd] incoherent treatment of NULL
> "WK" == Wacek Kusnierczyk > on Mon, 23 Mar 2009 16:11:04 +0100 writes: WK> Martin Maechler wrote: >> >> [.. omitted part no longer relevant ] >> WK> however, the following has a different pattern: >> >> WK> x = NULL WK> dput(x) WK> # NULL WK> names(x) = character(0) WK> # error: attempt to set an attribute on NULL >> >> >> WK> i get the error in devel. >> >> Yes, NULL is NULL is NULL ! Do read ?NULL ! [ ;-) ] >> >> more verbously, all NULL objects in R are identical, or as the >> help page says, there's only ``*The* NULL Object'' in R, >> i.e., NULL cannot get any attributes. >> WK> yes, but that's not the issue. the issue is that names(x)<- seems to WK> try to attach an attribute to NULL, while it could, in principle, do the WK> same as class(x)<-, i.e., coerce x to some type (and hence attach the WK> name attribute not to NULL, but to the coerced-to object). yes, it could; but really, the fact that 'class<-' works is the exception. The other variants (with the error message) are the rule. WK> but, as someone else explained to me behind the scenes, the matters are WK> a little bit, so to speak, untidy: WK> x = NULL WK> class(x) = 'integer' WK> # just fine WK> x = NULL WK> attr(x, 'class') = 'integer' WK> # no go WK> where class()<-, but not attr(,'class')<-, will try to coerce x to an WK> object of the storage *mode* 'integer', hence the former succeeds WK> (because it sets, roughly, the 'integer' class on an empty integer WK> vector), while the latter fails (because it tries to set the 'integer' WK> class on NULL itself). WK> what was not clear to me is not why setting a class on NULL fails here, WK> but why it is setting on NULL in the first place. after all, WK> x = 1 WK> names(x) = 'foo' WK> is setting names on a *copy* of 1, not on *the* 1, so why could not WK> class()<- create a 'copy' of NULL, i.e., an empty vector of some type WK> (perhaps raw, as the lowest in the hierarchy). yes, it could. 
I personally don't think this would add any value to R's behavior; rather, for most useRs I'd think it rather helps to get an error in such a case, than a raw(0) object. Also, note (here and further below), that Using "class(.) <- " is an S3 idiom and S3 classes ``don't really exist'', the "class" attribute being a useful hack, and many of us would rather like to work and improve working with S4 classes (& generics & methods) than to fiddle with 'class<-'. In S4, you'd use setClass(.), new(.) and setAs(.), typically, for defining and changing classes of objects. But maybe I have now lead you into a direction I will later regret, when you start telling us about the perceived inconsistencies of S4 classes, methods, etc. BTW: If you go there, please do use R 2.9.0 (or newer) exclusively. WK> x = c() WK> dput(x) WK> # NULL WK> names(x) = character(0) WK> # error: attempt to set an attribute on NULL >> >> >> WK> i get the error in devel. >> >> of course! >> [I think *you* should have noticed that NULL and c() *are* identical] >> WK> and also: >> >> WK> x = c() WK> class(x) = 'integer' WK> # fine >> "fine" yes; >> here, the convention has been to change NULL into integer(0); >> and no, this won't change, if you find it inconsistent. >> WK> that's ok, this is what i'd expect in the other cases, too (modulo the WK> actual storage mode). >> WK> class(x) = 'foo' WK> # error: attempt to set an attribute on NULL >> >> >> WK> i get the error in devel. >> >> No, not if you evaluate the statements above (where 'x' has >> become 'integer(0)' in the mean time). >> >> But yes, you get in something like >> >> x <- c(); class(x) <- "foo" >> WK> that's what i meant, must have forgotten the x = c(). >> and I do agree that there's a buglet : >> The error message should be slightly more precise, >> --- improvement proposals are welcome --- >> but an error nontheless >> WK> it doesn't seem coherent to me: why can i set the class, >> >> you cannot set it, you can *change* it. 
>> WK> terminological wars? WK> btw. the class of NULL is "NULL"; why can't nullify an object by WK> setting its class to 'NULL'? WK> x = 1 WK> class(x) = 'NULL' WK> x WK> # *not* NULL see above {S4 / S3 / ...}; If you want to "nullify", rather use more (S-language) idiomatic calls like as(x, "NULL") or as.null(x) both of which do work. Regards, Martin WK> and one more interesting example: WK> x = 1:2 WK> class(x) = 'NULL' WK> x WK> # [1] 1 2 WK> # attr(,"class") "NULL" WK> x[1] WK> # 1 WK> x[2]
Re: [Rd] incoherent treatment of NULL
Martin Maechler wrote: > > >> more verbously, all NULL objects in R are identical, or as the > >> help page says, there's only ``*The* NULL Object'' in R, > >> i.e., NULL cannot get any attributes. > >> > > WK> yes, but that's not the issue. the issue is that names(x)<- seems to > WK> try to attach an attribute to NULL, while it could, in principle, do > the > WK> same as class(x)<-, i.e., coerce x to some type (and hence attach the > WK> name attribute not to NULL, but to the coerced-to object). > > yes, it could; but really, the fact that 'class<-' works is > the exception. The other variants (with the error message) are > the rule. > ok. > Also, note (here and further below), > that Using "class(.) <- " > is an S3 idiom and S3 classes ``don't really exist'', > the "class" attribute being a useful hack, > and many of us would rather like to work and improve working > with S4 classes (& generics & methods) than to fiddle with 'class<-'. > > In S4, you'd use setClass(.), new(.) and setAs(.), > typically, for defining and changing classes of objects. > > But maybe I have now lead you into a direction I will later > regret, > > when you start telling us about the perceived inconsistencies of > S4 classes, methods, etc. > BTW: If you go there, please do use R 2.9.0 (or newer) > using latest r-devel for the most part. i think you will probably not regret your words; from what i've seen already, s4 classes are the last thing i'd ever try to learn in r. but yes, there would certainly be lots of issues to complain about. i'll rather wait for s5. regards, vQ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
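For comparison, here is roughly what the S4 route mentioned above looks like. The class names `Celsius` and `Fahrenheit` and the conversion are invented for illustration; only `setClass`, `new`, `setAs` and `as` come from the methods package.

```r
library(methods)

# Two toy classes with a single numeric slot each (hypothetical names).
setClass("Celsius",    representation(value = "numeric"))
setClass("Fahrenheit", representation(value = "numeric"))

# Changing class is an explicit, registered coercion, not an attribute edit.
setAs("Celsius", "Fahrenheit", function(from) {
  new("Fahrenheit", value = from@value * 9 / 5 + 32)
})

boiling <- new("Celsius", value = 100)
f <- as(boiling, "Fahrenheit")
print(f@value)
print(is(f, "Fahrenheit"))
```

Unlike `class(x) <- "foo"`, the coercion fails unless a conversion has been defined, which is the kind of checking the S3 class attribute does not provide.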
[Rd] Error in Package Description (PR#13618)
In the Installer for R.8.1 for Mac OSX Tiger or higher, the description of the GNU Fortran package in the customize option writes Fortran as "Fotran." Just a minor error, but should be fixed if revisited.

-Bonner Reed
Yale Univ.
[Rd] matplot and lend="butt"
Hi list,

I am using matplot with the option lend="butt", but only the first line (the black one) is drawn correctly:

> matplot(matrix(1:9,3),type="c",lwd=10,lty=1,lend="butt")

Is it a bug? I am using R 2.8.1 under Windows XP Service Pack 3.

Christophe
Re: [Rd] matplot and lend="butt"
It looks to be a bug. Here is the code; notice that ... is passed to plot (which plots the first series) but not to lines (which plots the rest):

    if (!add) {
        ii <- ii[-1]
        plot(x[, 1], y[, 1], type = type[1], xlab = xlab,
             ylab = ylab, xlim = xlim, ylim = ylim, lty = lty[1],
             lwd = lwd[1], pch = pch[1], col = col[1], cex = cex[1],
             bg = bg[1], ...)
    }
    for (i in ii) {
        lines(x[, i], y[, i], type = type[i], lty = lty[i],
              lwd = lwd[i], pch = pch[i], col = col[i], cex = cex[i],
              bg = bg[i])
    }

This is from 2.8.1 patched, but I noticed the same thing in "R version 2.9.0 Under development (unstable) (2009-03-02 r48041)".

On Mon, Mar 23, 2009 at 6:25 PM, Christophe Genolini wrote:
> Hi the list,
>
> I am using matplot with the option lend="butt", but only the first line (the
> black) is printed correctly :
>
>> matplot(matrix(1:9,3),type="c",lwd=10,lty=1,lend="butt")
>
> Is it a bug ?
> I am using R2.8.1 under windows XP pack3.
>
> Christophe
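The asymmetry described above is easy to see in a stripped-down sketch. `plot_series` is a hypothetical stand-in for matplot, not its actual code; the one thing it does differently is forward `...` to `lines()` as well as to `plot()`, so a parameter such as `lend` reaches every series. `pdf(NULL)` is used only so the example runs without creating a file.

```r
# Hypothetical mini-matplot: one line per column of y, with graphical
# parameters in ... forwarded to *both* plot() and lines().
plot_series <- function(y, lwd = 10, ...) {
  x <- seq_len(nrow(y))
  plot(x, y[, 1], type = "l", ylim = range(y), lwd = lwd, col = 1, ...)
  for (i in seq_len(ncol(y))[-1]) {
    lines(x, y[, i], lwd = lwd, col = i, ...)
  }
  invisible(ncol(y))
}

pdf(NULL)                                   # null device, nothing is written
n <- plot_series(matrix(1:9, 3), lend = "butt")
invisible(dev.off())
print(n)
```

Whether matplot itself should forward `...` this way is a design decision for its maintainers; the sketch only shows that doing so is mechanically simple.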
[Rd] matplot does not considere the parametre lend (PR#13619)
Full_Name: Christophe Genolini
Version: 2.8.1, but also 2.9
OS: Windows XP
Submission from: (NULL) (82.225.59.146)

I am using matplot with the option lend="butt", but only the first line (the black one) is drawn correctly:

> matplot(matrix(1:9,3),type="c",lwd=10,lty=1,lend="butt")

Gabor Grothendieck found the problem in the matplot code: the ... is passed to plot (which plots the first series) but not to lines (which plots the rest):

    if (!add) {
        ii <- ii[-1]
        plot(x[, 1], y[, 1], type = type[1], xlab = xlab,
             ylab = ylab, xlim = xlim, ylim = ylim, lty = lty[1],
             lwd = lwd[1], pch = pch[1], col = col[1], cex = cex[1],
             bg = bg[1], ...)
    }
    for (i in ii) {
        lines(x[, i], y[, i], type = type[i], lty = lty[i],
              lwd = lwd[i], pch = pch[i], col = col[i], cex = cex[i],
              bg = bg[i])
    }
[Rd] savePlot export "strange" eps (PR#13620)
Full_Name: Christophe Genolini
Version: 2.8.1
OS: Windows XP
Submission from: (NULL) (82.225.59.146)

savePlot exports "eps" graphs that seem to be incorrect.

Trying to incorporate them in a LaTeX file, I get :
++
Cannot determine size of graphics in foo.eps (no BoundingBox)
--

Trying to open them with GSview, I get :
++
GSview 4.9 2007-11-18
AFPL Ghostscript 8.54 (2006-05-17)
Copyright (C) 2005 artofcode LLC, Benicia, CA. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Displaying non DSC file C:/Documents and Settings/Christophe/Mes documents/Recherche/Trajectoires/kmeal/trajectories/testsDev/toti.eps
Error: /undefined in
Operand stack:

Execution stack:
   %interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- false 1 %stopped_push 1 3 %oparray_pop 1 3 %oparray_pop 1 3 %oparray_pop 1 3 %oparray_pop .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval--
Dictionary stack:
   --dict:1130/1686(ro)(G)-- --dict:0/20(G)-- --dict:74/200(L)--
Current allocation mode is local
Last OS error: No such file or directory

--- Begin offending input ---
L z f C fC EMF $6 7 l ° ° G r a p h A p p % % % % % % % % % % % % K @ 0 N N y @ N N y @ % % : _ 8 8 8 % ; l * 6 Z õ < @ f ï ` 0 % ( % % K @ 0 N N y @ N N y @ % % : _ 8 8 8 % ; m ñ 6 Z » < @ g µ ` ÷ % ( % % K @ 0 ¡ ¡ ¡ ¡ % % : _ 8 8 8
--- End offending input ---
file offset = 1024
gsapi_run_string_continue returns -101
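For comparison, here is a hedged sketch of one way to obtain a file that does carry a BoundingBox: draw directly on the postscript() device rather than saving the screen device. The file name and the 6 x 4 inch size are arbitrary choices for illustration, and the sketch does not diagnose what savePlot itself wrote in the report above.

```r
# Write a single-page, EPS-style file with the postscript() device.
# 'eps_file' and the 6 x 4 inch size are illustrative assumptions.
eps_file <- file.path(tempdir(), "foo.eps")
postscript(eps_file, onefile = FALSE, horizontal = FALSE,
           paper = "special", width = 6, height = 4)
plot(1:10)
invisible(dev.off())

# LaTeX looks for a %%BoundingBox comment in the file header.
header   <- readLines(eps_file)
has_bbox <- any(grepl("^%%BoundingBox", header))
print(has_bbox)
```

The check at the end is the same test LaTeX effectively performs before it can size the graphic.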
Re: [Rd] matplot does not considere the parametre lend (PR#13619)
On 23/03/2009 7:25 PM, cgeno...@u-paris10.fr wrote:
> Full_Name: Christophe Genolini
> Version: 2.8.1, but also 2.9
> OS: Windows XP
> Submission from: (NULL) (82.225.59.146)
>
> I am using matplot with the option lend="butt", but only the first line (the
> black) is printed correctly :
>
> matplot(matrix(1:9,3),type="c",lwd=10,lty=1,lend="butt")

I'd call this another case where it is performing as documented, but should probably be changed (but not by me). In the meantime, there's the simple workaround:

    save <- par(lend="butt")
    matplot(matrix(1:9,3),type="c",lwd=10,lty=1)
    par(save)

Duncan Murdoch
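The par() workaround can be wrapped so that the old setting is restored even if the plotting call fails. This is only a sketch: matplot_lend is a hypothetical helper name, and the null PDF device is used here just so the example runs without opening a window.

```r
# Hypothetical wrapper: set 'lend' for the duration of one matplot() call
# and restore the previous value on exit, even after an error.
matplot_lend <- function(..., lend = "butt") {
  old <- par(lend = lend)   # par() returns the previous settings
  on.exit(par(old))
  matplot(...)
}

pdf(NULL)                   # null PDF device: no file is created
before <- par("lend")
matplot_lend(matrix(1:9, 3), type = "c", lwd = 10, lty = 1)
after <- par("lend")
invisible(dev.off())
```

Because par(lend = ) is a device-wide setting, it applies to every series that matplot draws, not just the first one.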
Re: [Rd] dput(as.list(function...)...) bug
Stavros Macrakis wrote:
> Tested in R 2.8.1 Windows
>
>> ff <- formals(function(x)1)
>> ff1 <- as.list(function(x)1)[1]
> # ff1 acts the same as ff in the examples below, but is a list rather
> than a pairlist
>
>> dput( ff , control=c("warnIncomplete"))
> list(x = )
>
> This string is not parsable, but dput does not give a warning as specified.

same in 2.10.0 r48200, ubuntu 8.04 linux 32 bit

>> dput( ff , control=c("all","warnIncomplete"))
> list(x = quote())

likewise.

> This string is parseable, but quote() is not evaluable, and again dput
> does not give a warning as specified.
>
> In fact, I don't know how to write out ff$x. It appears to be the
> zero-length name:
>
> is.name(ff$x) => TRUE
> as.character(ff$x) => ""
>
> but there is no obvious way to create such an object:
>
> as.name("") => execution error
> quote(``) => parse error
>
> The above examples should either produce a parseable and evaluable
> output (preferable), or give a warning.

interestingly,

    quote(NULL)
    # NULL

    as.name(NULL)
    # Error in as.name(NULL) :
    #  invalid type/length (symbol/0) in vector allocation

æsj.

vQ

--
Wacek Kusnierczyk, MD PhD
Email: w...@idi.ntnu.no
Phone: +47 73591875, +47 72574609

Department of Computer and Information Science (IDI)
Faculty of Information Technology, Mathematics and Electrical Engineering (IME)
Norwegian University of Science and Technology (NTNU)
Sem Saelands vei 7, 7491 Trondheim, Norway
Room itv303

Bioinformatics & Gene Regulation Group
Department of Cancer Research and Molecular Medicine (IKM)
Faculty of Medicine (DMF)
Norwegian University of Science and Technology (NTNU)
Laboratory Center, Erling Skjalgsons gt. 1, 7030 Trondheim, Norway
Room 231.05.060
[Rd] dput(as.list(function...)...) bug
Tested in R 2.8.1 Windows

> ff <- formals(function(x)1)
> ff1 <- as.list(function(x)1)[1]
# ff1 acts the same as ff in the examples below, but is a list rather than a pairlist

> dput( ff , control=c("warnIncomplete"))
list(x = )

This string is not parsable, but dput does not give a warning as specified.

> dput( ff , control=c("all","warnIncomplete"))
list(x = quote())

This string is parseable, but quote() is not evaluable, and again dput does not give a warning as specified.

In fact, I don't know how to write out ff$x. It appears to be the zero-length name:

    is.name(ff$x) => TRUE
    as.character(ff$x) => ""

but there is no obvious way to create such an object:

    as.name("") => execution error
    quote(``) => parse error

The above examples should either produce a parseable and evaluable output (preferable), or give a warning.

            -s

PS As a matter of comparative linguistics, many versions of Lisp allow zero-length symbols/names. But R coerces strings to symbols/names in a way that Lisp does not, so that might be an invitation to obscure bugs in R where it is rarely problematic in Lisp.

PPS dput(pairlist(23),control="all") also gives the same output as dput(list(23),control="all"), but as I understand it, pairlists will become non-user-visible at some point.
Re: [Rd] [R] variance/mean
(this post suggests a patch to the sources, so i allow myself to divert it to r-devel)

Bert Gunter wrote:
> x a numeric vector, matrix or data frame.
> y NULL (default) or a vector, matrix or data frame with compatible
> dimensions to x. The default is equivalent to y = x (but more efficient).

bert points to an interesting fragment of ?var: it suggests that computing var(x) is more efficient than computing var(x,x), for any x valid as input to var. indeed:

    set.seed(0)
    x = matrix(rnorm(10000), 100, 100)

    library(rbenchmark)
    benchmark(replications=1000, columns=c('test', 'elapsed'),
       var(x),
       var(x, x))
    #        test elapsed
    # 1    var(x)   1.091
    # 2 var(x, x)   2.051

that's of course, so to speak, unreasonable: for what var(x) does is actually computing the covariance of x and x, which should be the same as var(x,x).

the catch is that if y is given, there's an overhead of memory allocation for *both* x and y, as seen in src/main/cov.c:720+. incidentally, it seems that the problem can be solved with a trivial fix (see the attached patch), so that

    set.seed(0)
    x = matrix(rnorm(10000), 100, 100)

    library(rbenchmark)
    benchmark(replications=1000, columns=c('test', 'elapsed'),
       var(x),
       var(x, x))
    #        test elapsed
    # 1    var(x)   1.121
    # 2 var(x, x)   1.107

with the quick checks

    all.equal(var(x), var(x, x))
    # TRUE

    all(var(x) == var(x, x))
    # TRUE

and for cor it seems to make cor(x,x) slightly faster than cor(x), while originally it was twice as slow:

    # original
    benchmark(replications=1000, columns=c('test', 'elapsed'),
       cor(x),
       cor(x, x))
    #        test elapsed
    # 1    cor(x)   1.196
    # 2 cor(x, x)   2.253

    # patched
    benchmark(replications=1000, columns=c('test', 'elapsed'),
       cor(x),
       cor(x, x))
    #        test elapsed
    # 1    cor(x)   1.207
    # 2 cor(x, x)   1.204

(there is a visible penalty due to an additional pointer test, but it's 10ms on 1000 replications with 10000 data points, which i think is negligible.)

> This is as clear as I would know how to state.

i believe bert is right.

however, with the above fix, this can now be rewritten as:

"
x: a numeric vector, matrix or data frame.
y: a vector, matrix or data frame with dimensions compatible to those of x.
By default, y = x.
"

which, to my simple mind, is even more clear than what bert would know how to state, and less likely to cause the sort of confusion that originated this thread.

the attached patch suggests modifications to src/main/cov.c and src/library/stats/man/cor.Rd. it has been prepared and checked as follows:

    svn co https://svn.r-project.org/R/trunk trunk
    cd trunk
    # edited the sources
    svn diff > cov.diff
    svn revert -R src
    patch -p0 < cov.diff

    tools/rsync-recommended
    ./configure
    make
    make check
    bin/R
    # subsequent testing within R

if you happen to consider this patch for a commit, please be sure to examine and test it carefully first.

vQ

Index: src/library/stats/man/cor.Rd
===================================================================
--- src/library/stats/man/cor.Rd (revision 48200)
+++ src/library/stats/man/cor.Rd (working copy)
@@ -6,9 +6,9 @@
 \name{cor}
 \title{Correlation, Variance and Covariance (Matrices)}
 \usage{
-var(x, y = NULL, na.rm = FALSE, use)
+var(x, y = x, na.rm = FALSE, use)
 
-cov(x, y = NULL, use = "everything",
+cov(x, y = x, use = "everything",
     method = c("pearson", "kendall", "spearman"))
 
 cor(x, y = NULL, use = "everything",
@@ -32,9 +32,7 @@
 }
 \arguments{
   \item{x}{a numeric vector, matrix or data frame.}
-  \item{y}{\code{NULL} (default) or a vector, matrix or data frame with
-    compatible dimensions to \code{x}. The default is equivalent to
-    \code{y = x} (but more efficient).}
+  \item{y}{a vector, matrix or data frame with dimensions compatible to those of \code{x}. By default, y = x.}
   \item{na.rm}{logical. Should missing values be removed?}
   \item{use}{an optional character string giving a method for computing
     covariances in the presence
Index: src/main/cov.c
===================================================================
--- src/main/cov.c (revision 48200)
+++ src/main/cov.c (working copy)
@@ -689,7 +689,7 @@
     if (ansmat) PROTECT(ans = allocMatrix(REALSXP, ncx, ncy));
     else PROTECT(ans = allocVector(REALSXP, ncx * ncy));
     sd_0 = FALSE;
-    if (isNull(y)) {
+    if (isNull(y) || (DATAPTR(x) == DATAPTR(y))) {
         if (everything) { /* NA's are propagated */
             PROTECT(xm = allocVector(REALSXP, ncx));
             PROTECT(ind = allocVector(LGLSXP, ncx));
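For readers without the rbenchmark package, the comparison in this message can be approximated with base R alone. The sketch below is an illustration, not the author's script: it assumes a 100 x 100 matrix of 10000 normal draws, the repetition count of 200 is arbitrary, and the timings will vary by machine and R version, so none are quoted here.

```r
set.seed(0)
x <- matrix(rnorm(10000), 100, 100)

# Time 200 repetitions of each call using only base R.
time_var1 <- system.time(for (i in 1:200) var(x))["elapsed"]
time_var2 <- system.time(for (i in 1:200) var(x, x))["elapsed"]

# Whatever the timings turn out to be, both calls should agree numerically.
same <- isTRUE(all.equal(var(x), var(x, x)))

print(c(var_x = unname(time_var1), var_xx = unname(time_var2)))
print(same)
```

The equality check is the part that matters for the proposed documentation change; the timing lines only show how the overhead could be measured.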
Re: [Rd] dput(as.list(function...)...) bug
On 23/03/2009 7:37 PM, Stavros Macrakis wrote:
> Tested in R 2.8.1 Windows
>
>> ff <- formals(function(x)1)
>> ff1 <- as.list(function(x)1)[1]
> # ff1 acts the same as ff in the examples below, but is a list rather
> than a pairlist
>
>> dput( ff , control=c("warnIncomplete"))
> list(x = )
>
> This string is not parsable, but dput does not give a warning as specified.

That's not what "warnIncomplete" is documented to do. The docs (in ?.deparseOpts) say

    'warnIncomplete' Some exotic objects such as environments,
         external pointers, etc. can not be deparsed properly. This
         option causes a warning to be issued if any of those may give
         problems.

         Also, the parser in R < 2.7.0 would only accept strings of up
         to 8192 bytes, and this option gives a warning for longer
         strings.

As far as I can see, none of those conditions apply here: ff is not one of those exotic objects or a very long string. The really relevant comment is in the dput documentation:

"Deparsing an object is difficult, and not always possible."

Yes, it would be nice if deparsing and parsing were mutual inverses, but they're not, and are documented not to be.

>> dput( ff , control=c("all","warnIncomplete"))
> list(x = quote())
>
> This string is parseable, but quote() is not evaluable, and again dput
> does not give a warning as specified.
>
> In fact, I don't know how to write out ff$x.

I don't know of any input that will parse to it.

> It appears to be the
> zero-length name:
>
> is.name(ff$x) => TRUE
> as.character(ff$x) => ""

This may give you a hint:

    > y <- ff$x
    > y
    Error: argument "y" is missing, with no default

It's a special internal thing that triggers the missing value error when evaluated. It probably shouldn't be user visible at all.

Duncan Murdoch
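The behaviour discussed in this message can be poked at directly. The following is a small sketch, offered as an illustration rather than a recommendation: it relies on the idiom quote(expr = ), which yields the same empty symbol, and on alist(), which accepts empty arguments. The symbol is never bound to a variable that is later evaluated, since that is exactly what triggers the "missing" error shown above.

```r
ff <- formals(function(x) 1)

# The default of 'x' is the empty symbol. quote(expr = ) produces the same
# object, so the two can be compared inline.
is_sym   <- is.name(ff$x)
chr      <- as.character(ff$x)
is_empty <- identical(ff$x, quote(expr = ))

# alist() accepts empty arguments, so this text both parses and evaluates.
rebuilt    <- eval(parse(text = "alist(x = )"))
round_trip <- identical(rebuilt$x, quote(expr = ))

print(c(is_sym, is_empty, round_trip))
```

So text of the form alist(x = ) is one parseable and evaluable way to write such an object down, although it is not what dput emits in the examples quoted in this thread.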
Re: [Rd] dput(as.list(function...)...) bug
> -Original Message-
> From: r-devel-boun...@r-project.org
> [mailto:r-devel-boun...@r-project.org] On Behalf Of Duncan Murdoch
> Sent: Monday, March 23, 2009 5:28 PM
> To: Stavros Macrakis
> Cc: r-devel@r-project.org
> Subject: Re: [Rd] dput(as.list(function...)...) bug
>
> On 23/03/2009 7:37 PM, Stavros Macrakis wrote:
> > Tested in R 2.8.1 Windows
> >
> >> ff <- formals(function(x)1)
> >> ff1 <- as.list(function(x)1)[1]
> > # ff1 acts the same as ff in the examples below, but is a list rather
> > than a pairlist
> >
> >> dput( ff , control=c("warnIncomplete"))
> > list(x = )
> >
> > This string is not parsable, but dput does not give a warning as specified.

The string "list(x = )" is parsable:

    z <- parse(text="list(x = )")

Evaluating the resulting expression results in a run-time error:

    eval(z)
    Error in eval(expr, envir, enclos) : element 1 is empty;
       the part of the args list of 'list' being evaluated was:
       (x = )

That is the same sort of error you get from running list(,): list wants all of its arguments to be present. With other functions such a construct will run in R, although its result does not match that of S+ (or SV4):

    > f<-function(x,y,z)c(x=if(missing(x))""else x, y=if(missing(y))"" else y, z=if(missing(z))"" else z)
    R> f(x=,2,3)
      x   y   z
    "2" "3"  ""
    S+> f(x=,2,3)
      x   y   z
     "" "2" "3"

or

    R> f(y=,1,3)
      x   y   z
    "1" "3"  ""
    S+> f(y=,1,3)
      x   y   z
    "1"  "" "3"

R and S+ act the same if you skip an argument by position

    > f(1,,3)
      x   y   z
    "1"  "" "3"

but differ if you use name=: in S+ it skips an argument by name and in R it is ignored by ordinary functions (where typeof(func)=="closure").

I wouldn't say this is recommended or often used or the point of the original post.

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com
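The distinction drawn in this message between parsing and evaluating can be checked in a few lines. This is a sketch only: the name expr is chosen here for illustration, the exact wording of the error differs between R versions so it is not reproduced, and the S+ behaviour described above is not something this snippet can test.

```r
# Parsing succeeds: the text is syntactically valid.
expr <- parse(text = "list(x = )")
parsed_ok <- is.expression(expr)

# Evaluation fails, because list() wants every argument to be present.
res <- tryCatch(eval(expr), error = function(e) e)
eval_failed <- inherits(res, "error")

# Skipping an argument by position in an ordinary closure.
f <- function(x, y, z) c(x = if (missing(x)) "" else x,
                         y = if (missing(y)) "" else y,
                         z = if (missing(z)) "" else z)
by_position <- f(1, , 3)

print(c(parsed_ok, eval_failed))
print(by_position)
```

Catching the error with tryCatch keeps the script running, which is convenient when the same check is repeated across R versions.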
Re: [Rd] [R] variance/mean
Doesn't Fortran still require that the arguments to a function not alias each other (in whole or in part)? I could imagine that var() might call into Fortran code (BLAS or LAPACK). Would you want to chance erroneous results at a high optimization level to save a bit of time in an unusual situation?

(I could also imagine someone changing the R interpreter so that x and x[-length(x)] could share the same memory block and that could cause Fortran aliasing problems as well.)

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com
Re: [Rd] [R] variance/mean
Oops, I was thinking backwards. This sort of hack could avoid the Fortran aliasing rules, not run afoul of them.

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com