[Rd] all.equal() improvements (PR#8191)
The attached patch against R 2.2.0 makes the following improvements to
the all.equal() function:

1. Check names!  Stock R all.equal() (unlike S-Plus) ignores names
   completely on some objects.  I consider this wrong - if the names
   are different, the object is NOT "the same".

2. When a difference is detected, return better output to help the
   user understand just WHAT is different.

Further details are included in the code comments, but in particular,
all.equal.list() is much enhanced.  By default it still checks list
values by position rather than name, as that behavior is more strict
and thus more correct.  But when using the by.name="auto" and
by.pos=TRUE options (which are the defaults), in addition to
by-position differences, all.equal.list() now also reports by-name
differences in those places (and only those places) where doing so
should be helpful to the user.

Also, optionally, using by.name=TRUE and by.pos=FALSE will give
behavior like S-Plus.

The attached patch is also available here:

  http://www.piskorski.com/R/patches/all-equal-patch-20051009.txt

If you want to see the entire file rather than a patch against R
2.2.0, that is also available here:

  http://www.piskorski.com/R/patches/all.equal.R

--
Andrew Piskorski <[EMAIL PROTECTED]>
http://www.piskorski.com/

Attachment: all-equal-patch-20051009.txt

Index: all.equal.R
===================================================================
RCS file: /home/cvsroot/dtk/Splus/patches/all.equal.R,v
retrieving revision 1.1.1.1
retrieving revision 1.4
diff -u -r1.1.1.1 -r1.4
--- all.equal.R 1 Oct 2005 06:51:06 - 1.1.1.1
+++ all.equal.R 1 Oct 2005 13:10:25 - 1.4
@@ -1,4 +1,75 @@
-all.equal <- function(target, current, ...) UseMethod("all.equal")
+#
+# This is a copy of "src/library/base/R/all.equal.R" from
+# "R-beta_2005-09-24_r35666.tar.gz", plus our modifications.  (The
+# all.equal.R in that tarball seems to be unchanged at least as far
+# back as R 2.1.0.)
+#
+# Further detail is in the comments in each function, but basically,
+# all modifications here involve either of two sorts of improvements:
+#
+# 1. Check names!  Stock R all.equal() (unlike S-Plus) ignores names
+#    completely on some objects.  I consider this bogus, if the names
+#    are different, the object is NOT "the same".
+#
+# 2. When the object is different, return more output to help the user
+#    understand just WHAT is different.
+#
+# Note: Here in our patches package, we purposely CVS import and than
+# override ALL the base all.equal() methods, NOT just the ones we're
+# actually modifying.  At first I tried only overriding some of them,
+# but in that case, even though package:patches was earlier on the
+# search path than package:base, base methods appeared to
+# preferentially call the original base versions, rather than the
+# patches versions that I wanted.  So, big hammer it, override
+# everything - which will probably make it easier to contribute these
+# improvements back to stock R anyway.
+#
+# [EMAIL PROTECTED], 2005/10/01 02:29 EDT
+#
+# $Id: all.equal.R,v 1.4 2005/10/01 13:10:25 andy Exp $
+
+
+# In S-Plus, all.equal() prefers to index objects by name, while in
+# stock R, it prefers to index by position.  IMO, *NEITHER* of those
+# behaviors are fully correct.  What we really want is to compare
+# things BOTH by name and by position.
+#
+# Here's ONE example of the effect of these R patches:
+#
+## S-Plus 6.2, no patches to all.equal():
+#> all.equal(list(a=2,2,x=3,zap=1,foo=42,"NA"=T) ,list(b=1,2,y=4,foo=7,zap=1,"NA"=F))
+#[1] "Names: 4 string mismatches"
+#[2] "Components not in target: b, y"
+#[3] "Components not in current: a, x"
+#[4] "Component foo: Mean relative difference: 0.833"
+#[5] "Component NA: Mean relative difference: 1"
+#
+## R 2.1.0, no patches to all.equal():
+#> all.equal(list(a=2,2,x=3,zap=1,foo=42,"NA"=T) ,list(b=1,2,y=4,foo=7,zap=1,"NA"=F))
+#[1] "Names: 4 string mismatches"
+#[2] "Component 1: Mean relative difference: 0.5"
+#[3] "Component 3: Mean relative difference: 0.333"
+#[4] "Component 4: Mean relative difference: 6"
+#[5] "Component 5: Mean relative difference: 0.9761905"
+#[6] "Component 6: Mean relative difference: 1"
+#
+## R 2.1.0 with our patches here:
+#> all.equal(list(a=2,2,x=3,zap=1,foo=42,"NA"=T) ,list(b=1,2,y=4,foo=7,zap=1,"NA"=F))
+# [1] "Names: 4 string mismatches"
+# [2] "Components not in target: b, y"
+# [3] "Components not in current: a, x"
+# [4] "Component foo: Mean relative difference: 0.833"
+# [5] "Component NA: Mean relative difference: 1"
+# [6] "Component 1: Mean relative dif
[Rd] [ subscripting sometimes loses names (PR#8192)
d with this function.
         } else {
            if (!missing(drop) && drop && length(nrow(result)) > 0 && nrow(result)==1) {
               #replicate documented behavior of [.data.frame: drop=T acts
               #differently than missing drop arg for this case!
               result <- as.list(result)
            }
         }
      } else {
         result <- data.frame.original.fcn(x, ..., drop=F)
      }
      result
   }

} else {
   # For R:

   # First make sure that if you run this twice, you still get the
   # real original function:

   # Also remove the obnoxious "drop argument will be ignored" warning
   # entirely from the function.  I would like to regsub out the whole
   # warning() call, but I can't seem to get that to work.  So, just
   # replace the first warning() call with a call to our dtk.null()
   # function which does nothing.  Fortunately, the warning() call we
   # want to get rid of is indeed the first (actually the only) one:
   # [EMAIL PROTECTED], 2005/07/01 17:53 EDT

   brace.original.fcn <- get("[",pos="package:base")
   data.frame.original.fcn.0 <- get("[.data.frame",pos="package:base")
   data.frame.original.fcn <- data.frame.original.fcn.0
   body(data.frame.original.fcn) <-
      parse(text=sub('warning(..?drop argument will be ignored..?)'
                     ,'dtk.null()'
                     ,deparse(body(data.frame.original.fcn.0))
                     ,ignore.case=T))

   # For R (at least version 2.1.0) we need to override BOTH the
   # "[.data.frame" and "[" functions.
   # [EMAIL PROTECTED], 2005/07/01 10:11 EDT

   "[.data.frame" <- function(x ,i ,j ,... ,drop=T) {
      # The stock R default value for the drop arg is:
      #   drop=(if(missing(i)) TRUE else length(names(x)) == 1)
      # However, that DOES cause certain differences from S-Plus, so
      # we do NOT use it.  [EMAIL PROTECTED], 2005/07/01 13:18 EDT

      # TODO: Does above S-Plus problem with lm() also apply here?
      # [EMAIL PROTECTED], 2005/07/01 10:51 EDT

      class.x <- class(x)
      caller <- sys.call(sys.parent())[[2]]
      if (length(class.x)==1 && class.x=="data.frame"
          && (mode(caller) != "name" || (caller != "value"))) {
         # If caller is a name and it is "value", then it is the
         # lhs case that we just want the original fcn to handle:

         code <- 'data.frame.original.fcn(x,'
         if (!missing(i)) code <- paste(code ,'i' ,sep="")
         if (length(dim(x)) > 1 && (missing(i) || length(dim(i)) <= 1))
            code <- paste(code ,',' ,sep="")
         if (!missing(j)) code <- paste(code ,'j' ,sep="")
         if (!missing(...)) code <- paste(code ,',...' ,sep="")
         code <- paste(code ,',drop=F' ,sep="")
         code <- paste(code ,')' ,sep="")
         #cat("Debug: code to eval: ") ; print(code)
         result <- eval(parse(text=code))

         if (drop && length(ncol(result) > 0) && ncol(result)==1) {
            save.names <- dimnames(result)[[1]]
            #this approach works for factors too
            result <- result[[1]]
            names(result) <- save.names
            # TODO: Unfortunately still broken for objects with new
            # style classes, since it does not distinguish among
            # methods that have or do not have a getnames method.
            # library(missing) is an example: The multiple imputations
            # on an object get lost if subscripted with this function.
         } else {
            if (!missing(drop) && drop && length(nrow(result)) > 0 && nrow(result)==1) {
               #replicate documented behavior of [.data.frame: drop=T acts
               #differently than missing drop arg for this case!
               result <- as.list(result)
            }
         }
      } else {
         if (missing(i))
            result <- data.frame.original.fcn(x ,... ,drop=F)
         else if (missing(j))
            result <- data.frame.original.fcn(x ,i ,... ,drop=F)
         else
            result <- data.frame.original.fcn(x ,i ,j ,... ,drop=F)
      }
      result
   }

   # R has this problem with NA names:
   #
   # # S-Plus 6.2.1:
   # > foo <- c("a"=1,"b"=2,"c"=3)
   # > foo[c("a","c","atp")]
   #   a   c atp
   #   1   3  NA
   #
   # # R 2.0.0, or 2.1.0:
   # > foo <- c("a"=1,"b"=2,"c"=3)
   # > foo[c("a","c","atp")]
   #    a    c <NA>
   #    1    3   NA
   #
   # This is very very bad, it causes sof
Re: [Rd] [ subscripting sometimes loses names (PR#8192)
On Wed, Oct 19, 2005 at 02:33:50PM +0200, Martin Maechler wrote:

> Proper R bug reports provide short "cut & paste" executable
> example code {i.e. no prompt, no output} or at least the
> transcript of such code {transcript : input (+ prompt) + output}.

My patch includes the function dtk.test.brace.names() which
demonstrates the problem.  If you source just that function into a
completely stock R, you can see the losing names problem by running:

  dtk.test.brace.names(return.results.p=T ,only="all")

To make it easier to see just what the problem is, I'll send example
output in my next email.

> Also your script is for R and S-plus and at least in some places
> it seems you think R has a bug because it behaves differently
> than S or S-plus.

No, I don't think that.  If comments in my code give that impression
then that's a bug in my comments, it was not my intention.

My coworkers and I originally fixed the name losing problem in S-Plus,
then later did so in R, so in some places I might have sloppily said,
"R is different than S-Plus" when what I REALLY meant was, "Stock R is
different than our fixed/patched S-Plus where we've already solved
these name-losing problems."

Stock S-Plus and R both suffer from losing names when they shouldn't.
Since I use both dialects, I've included (ugly) fixes for both.  Of
course you probably only care about the R part, but I didn't think it
would hurt to include both.

> Now I'm sure you know from the R-FAQ that there are quite a few
> intentional differences between the two dialects of S,

Yes, I'm aware of that FAQ.  I also just finished porting a large body
of code from S-Plus to R a few months ago, so I have a very concrete
appreciation of the MANY little S-Plus vs. R differences, many more
than are mentioned in that FAQ.

Some of those differences are simply arbitrary or accidental, but
others are places where S-Plus was basically doing something dumb and
the R behavior is better.  I have no complaints about this.  :)

(The converse, where R's behavior is definitely inferior to that of
S-Plus, seems to be a lot less common, and such cases are usually more
minor.)

--
Andrew Piskorski <[EMAIL PROTECTED]>
http://www.piskorski.com/

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [ subscripting sometimes loses names (PR#8192)
Here is an example of the losing names problem in stock R 2.2.0.  Note
that below, only stock R packages are loaded, and then I manually
source in just my dtk.test.brace.names() testing function, nothing
else.

Since the list-of-lists output of dtk.test.brace.names() is very
lengthy, I've manually cut-and-pasted it into a tabular format to save
space and make inspection easier.  As you can see, out of its 15 test
cases, stock R 2.2.0 fails 4 of them while the other 11 are Ok.

To see what these simple subscripting tests actually DO, please refer
to the body of dtk.test.brace.names() from my previous emails above.

R : Copyright 2005, The R Foundation for Statistical Computing
Version 2.2.0 (2005-10-06 r35749)

> search()
[1] ".GlobalEnv"        "package:methods"   "package:graphics"
[4] "package:grDevices" "package:datasets"  "package:utils"
[7] "package:stats"     "Autoloads"         "package:base"

> dtk.test.brace.names(return.results.p=T ,only="all")

          Ok?  Actual Result         Desired Result
--------  ---  --------------------  --------------------
$vec.1    BAD  $vec.1[[1]]           $vec.1[[2]]
                  a    c <NA>         a  c no
                  1    3   NA         1  3 NA

$diag.1   Ok   $diag.1[[1]]          $diag.1[[2]]
               [1]  1  7 13 19 25    [1]  1  7 13 19 25

$diag.2   Ok   $diag.2[[1]]          $diag.2[[2]]
               [1]  1  7 13 19 25    [1]  1  7 13 19 25

$df.a.1   Ok   $df.a.1[[1]]          $df.a.1[[2]]
               a b                   a b
               4 5                   4 5

$df.b.1   BAD  $df.b.1[[1]]          $df.b.1[[2]]
               [1] 4 5               a b
                                     4 5

$df.a.2   Ok   $df.a.2[[1]]          $df.a.2[[2]]
               c b a                 c b a
               6 5 4                 6 5 4

$df.b.2   BAD  $df.b.2[[1]]          $df.b.2[[2]]
               [1] 6 5 4             c b a
                                     6 5 4

$df.a.3   Ok   $df.a.3[[1]]          $df.a.3[[2]]
               a b                   a b
               3 4                   3 4

$df.b.3   BAD  $df.b.3[[1]]          $df.b.3[[2]]
               [1] 3 4               a b
                                     3 4

$df.a.4   Ok   $df.a.4[[1]]          $df.a.4[[2]]
               col1 col2             col1 col2
                  2    4                2    4

$df.b.4   Ok   $df.b.4[[1]]          $df.b.4[[2]]
                 col1 col2             col1 col2
               b    2    4           b    2    4

$df.a.5   Ok   $df.a.5[[1]]          $df.a.5[[2]]
               col1 col2             col1 col2
                  2    4                2    4

$df.b.5   Ok   $df.b.5[[1]]          $df.b.5[[2]]
               $df.b.5[[1]]$col1     $df.b.5[[2]]$col1
               [1] 2                 [1] 2

               $df.b.5[[1]]$col2     $df.b.5[[2]]$col2
               [1] 4                 [1] 4

$df.a.6   Ok   $df.a.6[[1]]          $df.a.6[[2]]
                 col1 col2             col1 col2
               b    2    4           b    2    4

$df.b.6   Ok   $df.b.6[[1]]          $df.b.6[[2]]
                 col1 col2             col1 col2
               b    2    4           b    2    4

--
Andrew Piskorski <[EMAIL PROTECTED]>
http://www.piskorski.com/
[Rd] R segfault in fgets from do_system under high memory use (PR#14008)
Full_Name: Andrew Piskorski
Version: R 2.9.2 (Patched), 2009-09-24, svn.rev 49930, x86_64-unknown-linux-gnu
OS: Linux, Ubuntu 8.04.3 LTS
Submission from: (NULL) (66.31.65.247)


I have a large memory test case which segfaults R every time in an
fgets call from R's do_system (see below).  This appears to be because
R does not check the return value of the system popen, and I have a
simple patch to src/main/sysutils.c and src/unix/sys-unix.c which
fixes the problem.  I will attempt to attach the patch after
submitting this initial bug report.

This is on Linux, Ubuntu 8.04.3 LTS with:
  R 2.9.2 (Patched), 2009-09-24, svn.rev 49930, x86_64-unknown-linux-gnu

Below is some further detail on the problem, from BEFORE applying my
patch:

Valgrind doesn't seem to find anything unusual until an "Invalid read
of size 4" in fgets right before it segfaults.  Valgrind is also
reporting an "Address 0x0 is not stack'd" message there, which I think
means that do_system is passing a 0 address to fgets, which is then
causing the segfault.

Looking at the fgets call in src/unix/sys-unix.c, the buf argument is
statically allocated so I don't see how it could be 0.  fp, the 3rd
argument to fgets, is set by R_popen().  So, I think the system
popen() call is failing to fork or allocate memory or whatever it's
trying to do, and is returning a NULL.

gdb and Valgrind output from the failure follow:

Program received signal SIGSEGV, Segmentation fault.
(gdb) bt
#0  0x7f735647f4fd in fgets () from /lib/libc.so.6
#1  0x7f7356b39fe3 in do_system (call=, op=, args=, rho=)
    at ../../../src/unix/sys-unix.c:273
#2  0x7f7356aa1c09 in do_internal (call=, op=, args=0xace1d220, env=0xa90e3820)
    at ../../../src/main/names.c:1150
#3  0x7f7356a6ec11 in Rf_eval (e=0x872638, rho=0xa90e3820)
    at ../../../src/main/eval.c:461
[...]
#90 0x7f7356a92950 in run_Rmainloop () at ../../../src/main/main.c:966
#91 0x0040088b in main (ac=, av=) at ../../../src/main/Rmain.c:33
#92 0x7f735643a1c4 in __libc_start_main () from /lib/libc.so.6
#93 0x004007a9 in _start ()
(gdb) q

==27499== Invalid read of size 4
==27499==    at 0x55E84FD: fgets (in /lib/libc-2.7.so)
==27499==    by 0x4FB6FB2: do_system (sys-unix.c:273)
==27499==    by 0x4F1EBD8: do_internal (names.c:1150)
==27499==    by 0x4EEBBE0: Rf_eval (eval.c:461)
==27499==    by 0x4EEC9D1: do_begin (eval.c:1191)
==27499==    by 0x4EEBBE0: Rf_eval (eval.c:461)
==27499==    by 0x4EEE34E: Rf_applyClosure (eval.c:667)
==27499==    by 0x4EEBAFB: Rf_eval (eval.c:505)
==27499==    by 0x4EEC9D1: do_begin (eval.c:1191)
==27499==    by 0x4EEBBE0: Rf_eval (eval.c:461)
==27499==    by 0x4EEBBE0: Rf_eval (eval.c:461)
==27499==    by 0x4EEC9D1: do_begin (eval.c:1191)
==27499==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
/home/andy/t/vg-R.sh: line 34: 27499 Segmentation fault
[Rd] bugs.r-project.org, Submit Changes does nothing (PR#14009)
Full_Name: Andrew Piskorski
Version: not applicable, web-based R bug tracker
OS: not applicable, web-based R bug tracker
Submission from: (NULL) (66.31.65.247)


Once I have submitted a bug via the bugs.r-project.org web interface,
I can find the bug and view it.  E.g.:

  http://bugs.r-project.org/cgi-bin/R/incoming?id=14008

That shows me an empty "Notes:" box and a button to "Submit Changes".
I entered some text (a patch for the above id=14008 segfault) and
clicked Submit, but my entry seems to be silently discarded.  There is
no error message, but the new data I attempted to enter never shows up
on the bug report.
Re: [Rd] PR#14008, R segfault in fgets from do_system under high memory use
On this issue:

  http://bugs.r-project.org/cgi-bin/R/incoming?id=14008

Here's a small patch which fixes the problem:

$ svn diff src/main/sysutils.c src/unix/sys-unix.c
Index: src/main/sysutils.c
===================================================================
--- src/main/sysutils.c (revision 49961)
+++ src/main/sysutils.c (working copy)
@@ -260,6 +260,9 @@
 #else
     fp = popen(command, type);
 #endif
+    if (NULL == fp) {
+        error(_("popen failed with errno %i: %s"), errno, strerror(errno));
+    }
     return fp;
 }
 #endif /* HAVE_POPEN */
Index: src/unix/sys-unix.c
===================================================================
--- src/unix/sys-unix.c (revision 49961)
+++ src/unix/sys-unix.c (working copy)
@@ -270,6 +270,11 @@
     PROTECT(tlist);
     fp = R_popen(translateChar(STRING_ELT(CAR(args), 0)), x);
+    if (NULL == fp) {
+        UNPROTECT(1);
+        errorcall(call, _("R_popen returned NULL."));
+        return R_NilValue;
+    }
     for (i = 0; fgets(buf, INTERN_BUFSIZE, fp); i++) {
         read = strlen(buf);
         if(read >= INTERN_BUFSIZE - 1)

--
Andrew Piskorski
http://www.piskorski.com/
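
For readers who want the pattern outside of R's sources, here is a
minimal standalone sketch in C of the check the patch adds: test the
FILE pointer returned by popen() before handing it to fgets().  The
helper name count_output_lines() and its -1 return convention are
assumptions made for this illustration; they are not part of R or of
the patch above.

```c
#define _POSIX_C_SOURCE 200809L  /* expose popen()/pclose() in <stdio.h> */
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of R): run `command` through popen()
 * and count the newline-terminated lines it writes to stdout.
 * Returns -1 if popen() itself fails.
 */
long count_output_lines(const char *command)
{
    char buf[256];
    long lines = 0;
    FILE *fp;

    errno = 0;
    fp = popen(command, "r");
    if (fp == NULL) {
        /* popen() returns NULL when it cannot create the pipe or the
         * child process.  Passing that NULL on to fgets() is the
         * failure described in the bug report, so stop here. */
        fprintf(stderr, "popen failed with errno %d: %s\n",
                errno, strerror(errno));
        return -1;
    }

    while (fgets(buf, sizeof buf, fp) != NULL) {
        size_t len = strlen(buf);
        /* A long line may arrive in several chunks; count it once,
         * when the chunk carrying its newline is read. */
        if (len > 0 && buf[len - 1] == '\n')
            lines++;
    }

    pclose(fp);
    return lines;
}
```

Inside R the failure branch would raise an R-level error rather than
return -1, which is what the patch does with error() and errorcall().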
[Rd] R trunk (2.7) build fails with -fpic, needs -fPIC (PR#10372)
On Linux x86-64 (Ubuntu 6.06), the latest R sources from the
Subversion trunk fail to build with the following "recompile with
-fPIC" error:

$ ./configure --with-x=yes --prefix=$inst_dir --enable-R-shlib --with-tcltk=/usr/lib/tcl8.4 --with-tcl-config=/usr/lib/tcl8.4/tclConfig.sh
$ make
/usr/bin/ld: ../appl/approx.o: relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC
../appl/approx.o: could not read symbols: Bad value

This is easy to fix by changing 4 lines in the configure script from
"-fpic" to "-fPIC", as shown in the patch below.

I saw this failure on an Intel x86-64 server, running Ubuntu 6.06:

$ uname -srvm
Linux 2.6.20.4 #1 SMP PREEMPT Sat Mar 31 07:46:01 EDT 2007 x86_64
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=6.06
DISTRIB_CODENAME=dapper
DISTRIB_DESCRIPTION="Ubuntu 6.06.1 LTS"
$ grep name /proc/cpuinfo
model name : Intel(R) Xeon(TM) CPU 3.80GHz
model name : Intel(R) Xeon(TM) CPU 3.80GHz
$ dpkg -l libc6
||/ Name   Version            Description
+++-======-==================-==================================================
ii  libc6  2.3.6-0ubuntu20.4  GNU C Library: Shared libraries and Timezone data
$ apt-cache show libc6 | grep Architecture | uniq
Architecture: amd64

Here's a patch which fixes the problem:

$ svn diff configure
Index: configure
===================================================================
--- configure (revision 43265)
+++ configure (working copy)
@@ -32806,7 +32806,7 @@
         cpicflags="-fPIC"
         ;;
       *)
-        cpicflags="-fpic"
+        cpicflags="-fPIC"
         ;;
     esac
     shlib_ldflags="-shared"
@@ -32817,7 +32817,7 @@
         fpicflags="-fPIC"
         ;;
       *)
-        fpicflags="-fpic"
+        fpicflags="-fPIC"
         ;;
     esac
   fi
@@ -32827,7 +32827,7 @@
         cxxpicflags="-fPIC"
         ;;
       *)
-        cxxpicflags="-fpic"
+        cxxpicflags="-fPIC"
         ;;
     esac
     shlib_cxxldflags="-shared"
@@ -47768,7 +47768,7 @@
         fcpicflags="-fPIC"
         ;;
       *)
-        fcpicflags="-fpic"
+        fcpicflags="-fPIC"
         ;;
     esac
   fi

--
Andrew Piskorski <[EMAIL PROTECTED]>
http://www.piskorski.com/