[Rd] all.equal() improvements (PR#8191)

2005-10-09 Thread atp

The attached patch against R 2.2.0 makes the following improvements to
the all.equal() function:

1. Check names!  Stock R all.equal() (unlike S-Plus) ignores names
   completely on some objects.  I consider this wrong - if the names
   are different, the object is NOT "the same".

2. When a difference is detected, return better output to help the
   user understand just WHAT is different.

Further details are included in the code comments, but in particular,
all.equal.list() is much enhanced.  By default it still checks list
values by position rather than name, as that behavior is more strict
and thus more correct.

But when using the by.name="auto" and by.pos=TRUE options (which are
the defaults), in addition to by-position differences,
all.equal.list() now also reports by-name differences in those places
(and only those places) where doing so should be helpful to the user.

Also, optionally, using by.name=TRUE and by.pos=FALSE will give
behavior like S-Plus.
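
To illustrate what "by-name differences" means here, the following is a minimal sketch, not the code from the patch. The helper name compare_by_name() is hypothetical; it relies only on base R's names(), setdiff(), intersect() and all.equal(), and it skips unnamed components. The exact wording of the per-component messages comes from all.equal() and may differ between R versions.

```r
# Hypothetical helper (NOT from the attached patch): report differences
# between two lists by matching components on their names.
compare_by_name <- function(target, current) {
  nt <- names(target)
  nc <- names(current)
  if (is.null(nt)) nt <- character(0)
  if (is.null(nc)) nc <- character(0)
  # Unnamed components have the name "" and are ignored here.
  nt <- nt[nzchar(nt)]
  nc <- nc[nzchar(nc)]

  msg <- character(0)
  only.current <- setdiff(nc, nt)
  only.target  <- setdiff(nt, nc)
  if (length(only.current) > 0)
    msg <- c(msg, paste("Components not in target:",
                        paste(only.current, collapse = ", ")))
  if (length(only.target) > 0)
    msg <- c(msg, paste("Components not in current:",
                        paste(only.target, collapse = ", ")))

  # Compare the components that both lists have, looked up by name.
  for (nm in intersect(nt, nc)) {
    res <- all.equal(target[[nm]], current[[nm]])
    if (!isTRUE(res))
      msg <- c(msg, paste0("Component ", nm, ": ", res))
  }
  if (length(msg) > 0) msg else TRUE
}

target  <- list(a = 2, 2, x = 3, zap = 1, foo = 42, "NA" = TRUE)
current <- list(b = 1, 2, y = 4, foo = 7, zap = 1, "NA" = FALSE)
by.name <- compare_by_name(target, current)
print(by.name)
```

Two lists holding the same named components in a different order compare as TRUE under this sketch, which is the S-Plus-like behavior described above; a purely positional comparison would flag them.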

The attached patch is also available here:

  http://www.piskorski.com/R/patches/all-equal-patch-20051009.txt

If you want to see the entire file rather than a patch against R
2.2.0, that is also available here:

  http://www.piskorski.com/R/patches/all.equal.R

-- 
Andrew Piskorski <[EMAIL PROTECTED]>
http://www.piskorski.com/

--k1lZvvs/B4yU6o8G
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="all-equal-patch-20051009.txt"

Index: all.equal.R
===
RCS file: /home/cvsroot/dtk/Splus/patches/all.equal.R,v
retrieving revision 1.1.1.1
retrieving revision 1.4
diff -u -r1.1.1.1 -r1.4
--- all.equal.R 1 Oct 2005 06:51:06 -   1.1.1.1
+++ all.equal.R 1 Oct 2005 13:10:25 -   1.4
@@ -1,4 +1,75 @@
-all.equal <- function(target, current, ...) UseMethod("all.equal")
+#
+# This is a copy of "src/library/base/R/all.equal.R" from
+# "R-beta_2005-09-24_r35666.tar.gz", plus our modifications.  (The
+# all.equal.R in that tarball seems to be unchanged at least as far
+# back as R 2.1.0.)
+#
+# Further detail is in the comments in each function, but basically,
+# all modifications here involve either of two sorts of improvements:
+#
+# 1. Check names!  Stock R all.equal() (unlike S-Plus) ignores names
+#completely on some objects.  I consider this bogus, if the names
+#are different, the object is NOT "the same".
+#
+# 2. When the object is different, return more output to help the user
+#understand just WHAT is different.
+#
+# Note: Here in our patches package, we purposely CVS import and then
+# override ALL the base all.equal() methods, NOT just the ones we're
+# actually modifying.  At first I tried only overriding some of them,
+# but in that case, even though package:patches was earlier on the
+# search path than package:base, base methods appeared to
+# preferentially call the original base versions, rather than the
+# patches versions that I wanted.  So, big hammer it, override
+# everything - which will probably make it easier to contribute these
+# improvements back to stock R anyway.
+#
+# [EMAIL PROTECTED], 2005/10/01 02:29 EDT
+#
+# $Id: all.equal.R,v 1.4 2005/10/01 13:10:25 andy Exp $
+
+
+# In S-Plus, all.equal() prefers to index objects by name, while in
+# stock R, it prefers to index by position.  IMO, *NEITHER* of those
+# behaviors are fully correct.  What we really want is to compare
+# things BOTH by name and by position.
+#
+# Here's ONE example of the effect of these R patches:
+#
+## S-Plus 6.2, no patches to all.equal():
+#> all.equal(list(a=2,2,x=3,zap=1,foo=42,"NA"=T) ,list(b=1,2,y=4,foo=7,zap=1,"NA"=F))
+#[1] "Names: 4 string mismatches"
+#[2] "Components not in target: b, y"
+#[3] "Components not in current: a, x"   
+#[4] "Component foo: Mean relative difference: 0.833"
+#[5] "Component NA: Mean relative difference: 1" 
+#
+## R 2.1.0, no patches to all.equal():
+#> all.equal(list(a=2,2,x=3,zap=1,foo=42,"NA"=T) ,list(b=1,2,y=4,foo=7,zap=1,"NA"=F))
+#[1] "Names: 4 string mismatches"   
+#[2] "Component 1: Mean relative  difference: 0.5"  
+#[3] "Component 3: Mean relative  difference: 0.333"
+#[4] "Component 4: Mean relative  difference: 6"
+#[5] "Component 5: Mean relative  difference: 0.9761905"
+#[6] "Component 6: Mean relative  difference: 1"
+#
+## R 2.1.0 with our patches here:
+#> all.equal(list(a=2,2,x=3,zap=1,foo=42,"NA"=T) ,list(b=1,2,y=4,foo=7,zap=1,"NA"=F))
+# [1] "Names: 4 string mismatches" 
+# [2] "Components not in target: b, y" 
+# [3] "Components not in current: a, x"
+# [4] "Component foo: Mean relative  difference: 0.833"
+# [5] "Component NA: Mean relative  difference: 1" 
+# [6] "Component 1: Mean relative  dif

[Rd] [ subscripting sometimes loses names (PR#8192)

2005-10-09 Thread atp
d with this function.
 } else {
if (!missing(drop) && drop && length(nrow(result)) > 0 && nrow(result)==1) {
   #replicate documented behavior of [.data.frame: drop=T acts
   #differently than missing drop arg for this case!
   result <- as.list(result)
} 
 }
  } else {
 result <- data.frame.original.fcn(x, ..., drop=F) 
  }
  result
   }

} else { # For R:
   # First make sure that if you run this twice, you still get the
   # real original function:

   # Also remove the obnoxious "drop argument will be ignored" warning
   # entirely from the function.  I would like to regsub out the whole
   # warning() call, but I can't seem to get that to work.  So, just
   # replace the first warning() call with a call to our dtk.null()
   # function which does nothing.  Fortunately, the warning() call we
   # want to get rid of is indeed the first (actually the only) one:
   # [EMAIL PROTECTED], 2005/07/01 17:53 EDT

   brace.original.fcn <- get("[",pos="package:base")
   data.frame.original.fcn.0 <- get("[.data.frame",pos="package:base")
   data.frame.original.fcn <- data.frame.original.fcn.0
   body(data.frame.original.fcn) <-
  parse(text=sub('warning(..?drop argument will be ignored..?)'  
,'dtk.null()'
  ,deparse(body(data.frame.original.fcn.0)) ,ignore.case=T))

   # For R (at least version 2.1.0) we need to override BOTH the
   # "[.data.frame" and "[" functions.
   # [EMAIL PROTECTED], 2005/07/01 10:11 EDT

   "[.data.frame" <- function(x ,i ,j ,... ,drop=T) {

  # The stock R default value for the drop arg is:
  #   drop=(if(missing(i)) TRUE else length(names(x)) == 1)
  # However, that DOES cause certain differences from S-Plus, so
  # we do NOT use it.  [EMAIL PROTECTED], 2005/07/01 13:18 EDT

  # TODO: Does above S-Plus problem with lm() also apply here?
  # [EMAIL PROTECTED], 2005/07/01 10:51 EDT

  class.x <- class(x)
  caller <- sys.call(sys.parent())[[2]]
  if (length(class.x)==1 && class.x=="data.frame" &&
  (mode(caller) != "name" || (caller != "value"))) {
 # If caller is a name and it is "value", then it is the
 # lhs case that we just want the original fcn to handle:

 code <- 'data.frame.original.fcn(x,'
 if (!missing(i))code <- paste(code ,'i' ,sep="")
 if (length(dim(x)) > 1 && (missing(i) || length(dim(i)) <= 1))
code <- paste(code ,',' ,sep="")
 if (!missing(j))code <- paste(code ,'j' ,sep="")
 if (!missing(...))  code <- paste(code ,',...' ,sep="")
 code <- paste(code ,',drop=F' ,sep="")
 code <- paste(code ,')' ,sep="")
 #cat("Debug: code to eval:  ") ; print(code)
 result <- eval(parse(text=code))

 if (drop && length(ncol(result)) > 0 && ncol(result)==1) {
save.names <- dimnames(result)[[1]]
#this approach works for factors too
result <- result[[1]]
names(result) <- save.names

# TODO: Unfortunately still broken for objects with new
# style classes, since it does not distinguish among
# methods that have or do not have a getnames method.
# library(missing) is an example: The multiple imputations
# on an object get lost if subscripted with this function.
 } else {
if (!missing(drop) && drop && length(nrow(result)) > 0 && nrow(result)==1) {
   #replicate documented behavior of [.data.frame: drop=T acts
   #differently than missing drop arg for this case!
   result <- as.list(result)
} 
 }
  } else {
 if (missing(i))
result <- data.frame.original.fcn(x ,... ,drop=F)
 else if (missing(j))
result <- data.frame.original.fcn(x ,i ,... ,drop=F)
 else
result <- data.frame.original.fcn(x ,i ,j ,... ,drop=F)
  }
  result
   }

   # R has this problem with NA names:
   # 
   #   # S-Plus 6.2.1:
   #   > foo <- c("a"=1,"b"=2,"c"=3)
   #   > foo[c("a","c","atp")]
   #     a c atp 
   #     1 3  NA
   # 
   #   # R 2.0.0, or 2.1.0:
   #   > foo <- c("a"=1,"b"=2,"c"=3)
   #   > foo[c("a","c","atp")]
   #      a    c <NA> 
   #      1    3   NA 
   #
   # This is very very bad, it causes sof

Re: [Rd] [ subscripting sometimes loses names (PR#8192)

2005-10-19 Thread atp
On Wed, Oct 19, 2005 at 02:33:50PM +0200, Martin Maechler wrote:

> Proper R bug reports provide short "cut & paste" executable
> example code {i.e. no prompt, no output} or at least the 
> transcript of such code {transcript : input (+ prompt) + output}.

My patch includes the function dtk.test.brace.names() which
demonstrates the problem.  If you source just that function into a
completely stock R, you can see the losing names problem by running:

  dtk.test.brace.names(return.results.p=T ,only="all")

To make it easier to see just what the problem is, I'll send example
output in my next email.

> Also your script is for R and S-plus and at least in some places 
> it seems you think R has a bug because it behaves differently
> than S or S-plus.   

No, I don't think that.  If comments in my code give that impression
then that's a bug in my comments, it was not my intention.

My coworkers and I originally fixed the name losing problem in S-Plus,
then later did so in R, so in some places I might have sloppily said,
"R is different than S-Plus" when what I REALLY meant was, "Stock R is
different than our fixed/patched S-Plus where we've already solved
these name-losing problems."

Stock S-Plus and R both suffer from losing names when they shouldn't.
Since I use both dialects, I've included (ugly) fixes for both.  Of
course you probably only care about the R part, but I didn't think it
would hurt to include both.

> Now I'm sure you know from the R-FAQ that there are quite a few
> intentional differences between the two dialects of S,

Yes, I'm aware of that FAQ.  I also just finished porting a large body
of code from S-Plus to R a few months ago, so I have a very concrete
appreciation of the MANY little S-Plus vs. R differences, many more
than are mentioned in that FAQ.

Some of those differences are simply arbitrary or accidental, but
others are places where S-Plus was basically doing something dumb and
the R behavior is better.  I have no complaints about this.  :)

(The converse cases, where R's behavior is definitely inferior to that
of S-Plus, seem to be a lot less common, and are usually more minor.)

-- 
Andrew Piskorski <[EMAIL PROTECTED]>
http://www.piskorski.com/

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [ subscripting sometimes loses names (PR#8192)

2005-10-19 Thread atp

Here is an example of the losing names problem in stock R 2.2.0.  Note
that below, only stock R packages are loaded, and then I manually
source in just my dtk.test.brace.names() testing function, nothing
else.

Since the list-of-lists output of dtk.test.brace.names() is very
lengthy, I've manually cut-and-pasted it into a tabular format to save
space and make inspection easier.  As you can see, out of its 15 test
cases, stock R 2.2.0 fails 4 of them while the other 11 are Ok.

To see what these simple subscripting tests actually DO, please refer
to the body of dtk.test.brace.names() from my previous emails above.
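
For readers without the full test function at hand, the vector case can be sketched on its own. This is a minimal illustration under stated assumptions, not the code from the patch: the helper subscript_keep_names() is a hypothetical name, and it handles only character subscripts on a named vector.

```r
foo <- c(a = 1, b = 2, c = 3)
idx <- c("a", "c", "atp")

# Stock subscripting: the element for the unmatched name "atp" is NA,
# and the name that was asked for does not survive in the result.
stock <- foo[idx]
print(names(stock))

# Hypothetical helper (NOT from the patch): subscript by name and keep
# the requested names, so an unmatched name still labels its NA value.
subscript_keep_names <- function(x, idx) {
  stopifnot(is.character(idx))
  out <- x[idx]
  names(out) <- idx
  out
}

kept <- subscript_keep_names(foo, idx)
print(kept)
```

Keeping the requested name means later code can still find the missing value by name and see that it is NA.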


R : Copyright 2005, The R Foundation for Statistical Computing
Version 2.2.0  (2005-10-06 r35749)
> search()
[1] ".GlobalEnv"        "package:methods"   "package:graphics" 
[4] "package:grDevices" "package:datasets"  "package:utils"    
[7] "package:stats"     "Autoloads"         "package:base"     

> dtk.test.brace.names(return.results.p=T ,only="all")

Ok?  Actual Result           Desired Result
---  -------------           --------------
     $vec.1
BAD  $vec.1[[1]]             $vec.1[[2]]
        a    c <NA>           a  c no
        1    3   NA           1  3 NA

     $diag.1
Ok   $diag.1[[1]]            $diag.1[[2]]
     [1]  1  7 13 19 25      [1]  1  7 13 19 25

     $diag.2
Ok   $diag.2[[1]]            $diag.2[[2]]
     [1]  1  7 13 19 25      [1]  1  7 13 19 25

     $df.a.1
Ok   $df.a.1[[1]]            $df.a.1[[2]]
     a b                     a b
     4 5                     4 5

     $df.b.1
BAD  $df.b.1[[1]]            $df.b.1[[2]]
     [1] 4 5                 a b
                             4 5

     $df.a.2
Ok   $df.a.2[[1]]            $df.a.2[[2]]
     c b a                   c b a
     6 5 4                   6 5 4

     $df.b.2
BAD  $df.b.2[[1]]            $df.b.2[[2]]
     [1] 6 5 4               c b a
                             6 5 4

     $df.a.3
Ok   $df.a.3[[1]]            $df.a.3[[2]]
     a b                     a b
     3 4                     3 4

     $df.b.3
BAD  $df.b.3[[1]]            $df.b.3[[2]]
     [1] 3 4                 a b
                             3 4

     $df.a.4
Ok   $df.a.4[[1]]            $df.a.4[[2]]
     col1 col2               col1 col2
        2    4                  2    4

     $df.b.4
Ok   $df.b.4[[1]]            $df.b.4[[2]]
       col1 col2               col1 col2
     b    2    4             b    2    4

     $df.a.5
Ok   $df.a.5[[1]]            $df.a.5[[2]]
     col1 col2               col1 col2
        2    4                  2    4

     $df.b.5
Ok   $df.b.5[[1]]            $df.b.5[[2]]
     $df.b.5[[1]]$col1       $df.b.5[[2]]$col1
     [1] 2                   [1] 2
     $df.b.5[[1]]$col2       $df.b.5[[2]]$col2
     [1] 4                   [1] 4

     $df.a.6
Ok   $df.a.6[[1]]            $df.a.6[[2]]
       col1 col2               col1 col2
     b    2    4             b    2    4

     $df.b.6
Ok   $df.b.6[[1]]            $df.b.6[[2]]
       col1 col2               col1 col2
     b    2    4             b    2    4

-- 
Andrew Piskorski <[EMAIL PROTECTED]>
http://www.piskorski.com/



[Rd] R segfault in fgets from do_system under high memory use (PR#14008)

2009-10-15 Thread atp
Full_Name: Andrew Piskorski
Version: R 2.9.2 (Patched), 2009-09-24, svn.rev 49930, x86_64-unknown-linux-gnu
OS: Linux, Ubuntu 8.04.3 LTS
Submission from: (NULL) (66.31.65.247)


I have a large memory test case which segfaults R every time in an fgets
call from R's do_system (see below).

This appears to be because R does not check the return value of the
system popen, and I have a simple patch to src/main/sysutils.c and
src/unix/sys-unix.c which fixes the problem.  I will attempt to attach
the patch after submitting this initial bug report.

This is on Linux, Ubuntu 8.04.3 LTS with:

R 2.9.2 (Patched), 2009-09-24, svn.rev 49930, x86_64-unknown-linux-gnu


Below is some further detail on the problem, from BEFORE applying my
patch:


Valgrind doesn't seem to find anything unusual until an "Invalid read
of size 4" in fgets right before it segfaults.  Valgrind is also
reporting an "Address 0x0 is not stack'd" message there, which I think
means that do_system is passing a 0 address to fgets, which is then
causing the segfault.

Looking at the fgets call in src/unix/sys-unix.c, the buf argument is
statically allocated so I don't see how it could be 0.  fp, the 3rd
argument to fgets, is set by R_popen().  So, I think the system
popen() call is failing to fork or allocate memory or whatever it's
trying to do, and is returning a NULL.

gdb and Valgrind output from the failure follow:


Program received signal SIGSEGV, Segmentation fault.
(gdb) bt
#0  0x7f735647f4fd in fgets () from /lib/libc.so.6
#1  0x7f7356b39fe3 in do_system (call=,
op=, args=,
rho=) at ../../../src/unix/sys-unix.c:273
#2  0x7f7356aa1c09 in do_internal (call=,
op=, args=0xace1d220, env=0xa90e3820)
at ../../../src/main/names.c:1150
#3  0x7f7356a6ec11 in Rf_eval (e=0x872638, rho=0xa90e3820)
at ../../../src/main/eval.c:461
[...]
#90 0x7f7356a92950 in run_Rmainloop () at ../../../src/main/main.c:966
#91 0x0040088b in main (ac=,
av=) at ../../../src/main/Rmain.c:33
#92 0x7f735643a1c4 in __libc_start_main () from /lib/libc.so.6
#93 0x004007a9 in _start ()
(gdb) q


==27499== Invalid read of size 4
==27499==at 0x55E84FD: fgets (in /lib/libc-2.7.so)
==27499==by 0x4FB6FB2: do_system (sys-unix.c:273)
==27499==by 0x4F1EBD8: do_internal (names.c:1150)
==27499==by 0x4EEBBE0: Rf_eval (eval.c:461)
==27499==by 0x4EEC9D1: do_begin (eval.c:1191)
==27499==by 0x4EEBBE0: Rf_eval (eval.c:461)
==27499==by 0x4EEE34E: Rf_applyClosure (eval.c:667)
==27499==by 0x4EEBAFB: Rf_eval (eval.c:505)
==27499==by 0x4EEC9D1: do_begin (eval.c:1191)
==27499==by 0x4EEBBE0: Rf_eval (eval.c:461)
==27499==by 0x4EEBBE0: Rf_eval (eval.c:461)
==27499==by 0x4EEC9D1: do_begin (eval.c:1191)
==27499==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
/home/andy/t/vg-R.sh: line 34: 27499 Segmentation fault



[Rd] bugs.r-project.org, Submit Changes does nothing (PR#14009)

2009-10-15 Thread atp
Full_Name: Andrew Piskorski
Version: not applicable, web-based R bug tracker
OS: not applicable, web-based R bug tracker
Submission from: (NULL) (66.31.65.247)


Once I have submitted a bug via the bugs.r-project.org web interface,
I can find the bug and view it.  E.g.:

  http://bugs.r-project.org/cgi-bin/R/incoming?id=14008

That shows me an empty "Notes:" box and a button to "Submit Changes".

I entered some text (a patch for the above id=14008 segfault) and
clicked Submit, but my entry seems to be silently discarded.  There is
no error message, but the new data I attempted to enter never shows up
on the bug report.



Re: [Rd] PR#14008, R segfault in fgets from do_system under high memory use

2009-10-15 Thread atp
On this issue:

  http://bugs.r-project.org/cgi-bin/R/incoming?id=14008

Here's a small patch which fixes the problem:


$ svn diff src/main/sysutils.c src/unix/sys-unix.c
Index: src/main/sysutils.c
===
--- src/main/sysutils.c (revision 49961)
+++ src/main/sysutils.c (working copy)
@@ -260,6 +260,9 @@
 #else
 fp = popen(command, type);
 #endif
+if (NULL == fp) {
+   error(_("popen failed with errno %i: %s"), errno, strerror(errno));
+}
 return fp;
 }
 #endif /* HAVE_POPEN */
Index: src/unix/sys-unix.c
===
--- src/unix/sys-unix.c (revision 49961)
+++ src/unix/sys-unix.c (working copy)
@@ -270,6 +270,11 @@
 
PROTECT(tlist);
fp = R_popen(translateChar(STRING_ELT(CAR(args), 0)), x);
+if (NULL == fp) {
+UNPROTECT(1);
+errorcall(call, _("R_popen returned NULL."));
+return R_NilValue;
+}
 for (i = 0; fgets(buf, INTERN_BUFSIZE, fp); i++) {
 read = strlen(buf);
 if(read >= INTERN_BUFSIZE - 1)
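
With a check like the one above in place, a failed popen would presumably reach R code as an ordinary error rather than a segfault, and calling code could then trap it. The following is a minimal sketch under that assumption. The wrapper run_command() and its runner argument are hypothetical names, and the errno text in the simulated failure is illustrative only; the failure is simulated so the error path can be shown without exhausting memory.

```r
# Hypothetical wrapper (NOT part of the patch): run a command and turn
# any R-level error into a value the caller can inspect.
# `runner` defaults to system(..., intern = TRUE) and is injectable so
# the failure path can be exercised without a real popen failure.
run_command <- function(cmd,
                        runner = function(c) system(c, intern = TRUE)) {
  tryCatch(
    list(ok = TRUE, output = runner(cmd), message = NULL),
    error = function(e)
      list(ok = FALSE, output = character(0),
           message = conditionMessage(e))
  )
}

# Simulated failure; the message format follows the patch, while the
# errno value and text are made up for illustration.
failing.runner <- function(c)
  stop("popen failed with errno 12: Cannot allocate memory")

failed <- run_command("ls", runner = failing.runner)
worked <- run_command("ls", runner = function(c) "hi")
print(failed$message)
```

Returning a list instead of re-raising the error lets a long-running job log the failure and retry later, once memory pressure has eased.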

-- 
Andrew Piskorski 
http://www.piskorski.com/



[Rd] R trunk (2.7) build fails with -fpic, needs -fPIC (PR#10372)

2007-10-24 Thread atp
On Linux x86-64 (Ubuntu 6.06), the latest R sources from the
Subversion trunk fail to build with the following "recompile with
-fPIC" error:

  $ ./configure --with-x=yes --prefix=$inst_dir --enable-R-shlib \
      --with-tcltk=/usr/lib/tcl8.4 --with-tcl-config=/usr/lib/tcl8.4/tclConfig.sh
  $ make

  /usr/bin/ld: ../appl/approx.o: relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC
  ../appl/approx.o: could not read symbols: Bad value

This is easy to fix by changing 4 lines in the configure script from
"-fpic" to "-fPIC", as shown in the patch below.

I saw this failure on an Intel x86-64 server, running Ubuntu 6.06:

  $ uname -srvm
  Linux 2.6.20.4 #1 SMP PREEMPT Sat Mar 31 07:46:01 EDT 2007 x86_64

  $ cat /etc/lsb-release
  DISTRIB_ID=Ubuntu
  DISTRIB_RELEASE=6.06
  DISTRIB_CODENAME=dapper
  DISTRIB_DESCRIPTION="Ubuntu 6.06.1 LTS"

  $ grep name /proc/cpuinfo
  model name  :   Intel(R) Xeon(TM) CPU 3.80GHz
  model name  :   Intel(R) Xeon(TM) CPU 3.80GHz

  $ dpkg -l libc6
  ||/ Name    Version            Description
  +++-=======-==================-==================================================
  ii  libc6   2.3.6-0ubuntu20.4  GNU C Library: Shared libraries and Timezone data

  $ apt-cache show libc6 | grep Architecture | uniq
  Architecture: amd64

Here's a patch which fixes the problem:


$ svn diff configure
Index: configure
===
--- configure   (revision 43265)
+++ configure   (working copy)
@@ -32806,7 +32806,7 @@
   cpicflags="-fPIC"
   ;;
 *)
-  cpicflags="-fpic"
+  cpicflags="-fPIC"
   ;;
   esac
   shlib_ldflags="-shared"
@@ -32817,7 +32817,7 @@
   fpicflags="-fPIC"
   ;;
 *)
-  fpicflags="-fpic"
+  fpicflags="-fPIC"
   ;;
   esac
 fi
@@ -32827,7 +32827,7 @@
   cxxpicflags="-fPIC"
   ;;
 *)
-  cxxpicflags="-fpic"
+  cxxpicflags="-fPIC"
   ;;
   esac
   shlib_cxxldflags="-shared"
@@ -47768,7 +47768,7 @@
   fcpicflags="-fPIC"
   ;;
 *)
-  fcpicflags="-fpic"
+  fcpicflags="-fPIC"
   ;;
   esac
 fi

-- 
Andrew Piskorski <[EMAIL PROTECTED]>
http://www.piskorski.com/
