[Rd] bug and patch: strptime first-of-month error in (possibly unsupported use of) "%j" format (PR#9577)

2007-03-21 Thread jbrzusto
Full_Name: John Brzustowski
Version: R-devel-trunk
OS: linux (problem under Windows too)
Submission from: (NULL) (74.101.124.238)


(This bug was discovered by Phil Taylor, Acadia University.)
I'm not sure from reading the documentation whether strptime(x, "%j") is meant
to be supported, but if so, there is a bug which prevents it from working on the
first day of months after January:

> strptime(31:33, "%j")
[1] "2007-01-31" NA   "2007-02-02"

# the full extent of R's taunting
strptime(1 + cumsum(c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30)), "%j")
 [1] NA NA NA NA NA NA NA NA NA NA NA
>

The problem is an edge-case comparison in datetime.c:glibc_fix().  This
generates a date like "Jan 32", which validate_tm() catches and NAs.
(Values of field tm->tm_yday start at 0, not 1.)
=
PATCH:

--- R/src/main/datetime.c   (revision 40860)
+++ R/src/main/datetime.c   (working copy)
@@ -796,7 +796,7 @@
 if(tm->tm_yday != NA_INTEGER) {
/* since we have yday, let that take precedence over mon/mday */
int yday = tm->tm_yday, mon = 0;
-   while(yday > (tmp = days_in_month[mon] +
+   while(yday >= (tmp = days_in_month[mon] +
  ((mon==1 && isleap(1900+tm->tm_year))? 1 : 0))) {
yday -= tmp;
mon++;

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] fix for broken largefile seek() on 32-bit linux (PR#9883)

2007-08-27 Thread jbrzusto
Full_Name: John Brzustowski
Version: R-devel-trunk, R-2.4.0
OS: linux
Submission from: (NULL) (206.248.132.197)


DESCRIPTION

seek() on files larger than 2 gigabytes fails for large values of "where" on
i386 linux 2.6.13 (and presumably other 32-bit unix-like platforms).

e.g.:

> f<-file("3gigabytefile.dat", "rb")
> seek(f, 3e9, "start", "r")
[1] 0  ## correct
> seek(f, NA, "start", "r")
[1] 0  ## should be 3e+09

DIAGNOSIS

Typo: the compile-time tests for large file support use "HAVE_SEEKO" instead of
"HAVE_FSEEKO", and so fail.  

The same typo appears in one of the extra/zlib files, so I'm fixing
it in the patch below, but I haven't tested whether that actually
produces a bug.

PATCH
Index: src/extra/zlib/gzio.c
===
--- src/extra/zlib/gzio.c   (revision 42664)
+++ src/extra/zlib/gzio.c   (working copy)
@@ -25,7 +25,7 @@
 #include "zutil.h"
 
 /* R ADDITION */
-#if defined(HAVE_OFF_T) && defined(HAVE_SEEKO)
+#if defined(HAVE_OFF_T) && defined(HAVE_FSEEKO)
 #define f_seek fseeko
 #define f_tell ftello
 #else
Index: src/include/Rconnections.h
===
--- src/include/Rconnections.h  (revision 42664)
+++ src/include/Rconnections.h  (working copy)
@@ -63,7 +63,7 @@
 
 typedef struct fileconn {
 FILE *fp;
-#if defined(HAVE_OFF_T) && defined(HAVE_SEEKO)
+#if defined(HAVE_OFF_T) && defined(HAVE_FSEEKO)
 off_t rpos, wpos;
 #else
 #ifdef Win32
Index: src/main/connections.c
===
--- src/main/connections.c  (revision 42664)
+++ src/main/connections.c  (working copy)
@@ -446,7 +446,7 @@
 
 /* --- file connections - */
 
-#if defined(HAVE_OFF_T) && defined(HAVE_SEEKO)
+#if defined(HAVE_OFF_T) && defined(HAVE_FSEEKO)
 #define f_seek fseeko
 #define f_tell ftello
 #else
@@ -570,7 +570,7 @@
 {
 Rfileconn this = con->private;
 FILE *fp = this->fp;
-#if defined(HAVE_OFF_T) && defined(HAVE_SEEKO)
+#if defined(HAVE_OFF_T) && defined(HAVE_FSEEKO)
 off_t pos;
 #else
 #ifdef Win32

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] "bug" and patch: quadratic running time for strsplit(..., fixed=TRUE) (PR#9902)

2007-09-07 Thread jbrzusto
Full_Name: John Brzustowski
Version: R-devel-trunk, R-2.4.0
OS: linux, gcc 4.0.3
Submission from: (NULL) (206.248.157.184)


This isn't a bug, but an easily-remedied performance issue.

SYMPTOM

> for (i in 1000 * (1:20)) {
   y <- paste(rep("asdf", times=i), collapse=" ")
   t <- system.time(strsplit(y, " ", fixed=TRUE))
   cat(sprintf("i=%5d time=%5d msec\n",i, round(1000*t[1])))
}

i= 1000 time=2 msec
i= 2000 time=9 msec
i= 3000 time=   20 msec
i= 4000 time=   34 msec
i= 5000 time=   57 msec
i= 6000 time=   77 msec
i= 7000 time=  107 msec
i= 8000 time=  136 msec
i= 9000 time=  177 msec
i=1 time=  230 msec
i=11000 time=  275 msec
i=12000 time=  308 msec
i=13000 time=  371 msec
i=14000 time=  446 msec
i=15000 time=  544 msec
i=16000 time=  639 msec
i=17000 time=  726 msec
i=18000 time=  864 msec
i=19000 time=  944 msec
i=2 time= 1106 msec

DIAGNOSIS

strsplit() uses strlen() in the bounds check clause of a for(;;)
statement, which forces a full scan of the source string for each
character in the source string.  Unlike R's LENGTH() macro, strlen for
C strings is an expensive operation, and in this case (at least),
gcc 4.0.3's -O2 level optimizer is not able to recognize the call as a loop
invariant, despite the declaration "const char *buf".

REMEDIED BEHAVIOUR

i= 1000 time=0 msec
i= 2000 time=1 msec
i= 3000 time=1 msec
i= 4000 time=0 msec
i= 5000 time=1 msec
i= 6000 time=1 msec
i= 7000 time=1 msec
i= 8000 time=2 msec
i= 9000 time=2 msec
i=1 time=2 msec
i=11000 time=2 msec
i=12000 time=2 msec
i=13000 time=2 msec
i=14000 time=2 msec
i=15000 time=4 msec
i=16000 time=3 msec
i=17000 time=3 msec
i=18000 time=4 msec
i=19000 time=3 msec
i=2 time=4 msec

RELATED ISSUES

A simple search turns up other instances of this usage in R's source.
For completeness, I'm submitting patches for all of them, but have not
tested whether they in fact cause a detectable performance problem.
In the case of modules/X11/dataentry.c, the patch also fixes a presumably
ineffectual "bug".

$ grep -nR "for *([^;]*;[^;]*strlen *(" *

main/rlocale.c:137: for (i = 0; i < strlen(lc_str) && i < 
sizeof(lc_str); i++)
main/printutils.c:486:  for(j = 0; j < strlen(buf); j++) *q++ = buf[j];
main/sysutils.c:493:for(j = 0; j < strlen(sub); j++) *outbuf++ = 
sub[j];
modules/X11/rotated.c:608:  for(i=0; i 1 && strncmp(bufp, split, slen))) continue;
ntok++;
@@ -480,7 +481,7 @@
/* This is UTF-8 safe since it compares whole strings,
   but it would be more efficient to skip along by chars.
 */
-   for(; bufp - buf < strlen(buf); bufp++) {
+   for(; bufp < ebuf; bufp++) {
if((slen == 1 && *bufp != *split) ||
   (slen > 1 && strncmp(bufp, split, slen))) continue;
if(slen) {
Index: src/main/rlocale.c
===
--- src/main/rlocale.c  (revision 42792)
+++ src/main/rlocale.c  (working copy)
@@ -127,14 +127,14 @@
 int Ri18n_wcwidth(wchar_t c)
 {
 char lc_str[128];
-unsigned int i;
+unsigned int i, j;
 
 static char *lc_cache = "";
 static int lc = 0;
 
 if (0 != strcmp(setlocale(LC_CTYPE, NULL), lc_cache)) {
strncpy(lc_str, setlocale(LC_CTYPE, NULL), sizeof(lc_str));
-   for (i = 0; i < strlen(lc_str) && i < sizeof(lc_str); i++)
+   for (i = 0, j = strlen(lc_str); i < j && i < sizeof(lc_str); i++)
lc_str[i] = toupper(lc_str[i]);
for (i = 0; i < (sizeof(cjk_locale_name)/sizeof(cjk_locale_name_t)); 
 i++) {
Index: src/main/printutils.c
===
--- src/main/printutils.c   (revision 42792)
+++ src/main/printutils.c   (working copy)
@@ -483,7 +483,8 @@
else
 #endif
snprintf(buf, 11, "\\u%04x", k);
-   for(j = 0; j < strlen(buf); j++) *q++ = buf[j];
+   memcpy(q, buf, j = strlen(buf));
+   q += j;
p += res;
}
i += (res - 1);
Index: src/main/sysutils.c
===
--- src/main/sysutils.c (revision 42792)
+++ src/main/sysutils.c (working copy)
@@ -490,8 +490,9 @@
R_AllocStringBuffer(2*cbuff.bufsize, &cbuff);
goto top_of_loop;
}
-   for(j = 0; j < strlen(sub); j++) *outbuf++ = sub[j];
-   outb -= strlen(sub);
+   memcpy(outbuf, sub, j = strlen(sub)); 
+   outbuf += j;
+   outb -= j;
}
inbuf++; inb--;
goto next_char;
Index: src/modules/

[Rd] minor bug and patch: attr(x, "names")<-y when y is a pairlist (PR#10807)

2008-02-20 Thread jbrzusto
Full_Name: John Brzustowski
Version: R-devel trunk and R 2.4.0
OS: linux
Submission from: (NULL) (76.10.152.79)


# Bug: 
> x<-1:3
> attr(x, "names")<-pairlist("a", "b", "c")
> x
a a a 
1 2 3 
# Note that the simpler alternative does work:
> names(x)<-pairlist("a", "b", "c")
> x
a b c 
1 2 3 

# After applying the patch:
> x<-1:3
> attr(x, "names")<-pairlist("a", "b", "c")
> x
a b c 
1 2 3 

# The problem is in src/main/attrib.c: namesgets(), where
# the pointer to the pairlist of names is not advanced.

# Here is a simple patch against R-devel trunk.  A cleaner
# but more complicated approach would be 
# to refactor namesgets() and do_namesgets().
#
# Note that the name list need not be as long as the LHS,
# because subsequent code pads the name list with NAs, if needed.

 Index: trunk/src/main/attrib.c
===
--- trunk/src/main/attrib.c (revision 44547)
+++ trunk/src/main/attrib.c (working copy)
@@ -681,7 +681,7 @@
 SEXP namesgets(SEXP vec, SEXP val)
 {
 int i;
-SEXP s, rval;
+SEXP s, rval, tval;
 
 PROTECT(vec);
 PROTECT(val);
@@ -695,8 +695,8 @@
else {
rval = allocVector(STRSXP, length(vec));
PROTECT(rval);
-   for (i = 0; i < length(vec); i++) {
-   s = coerceVector(CAR(val), STRSXP);
+   for (i = 0, tval=val; i < length(vec) && tval != R_NilValue; i++,
tval=CDR(tval)) {
+   s = coerceVector(CAR(tval), STRSXP);
SET_STRING_ELT(rval, i, STRING_ELT(s, 0));
}
UNPROTECT(1);

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] patch: two minor debugging-related pointer protection stack issues (PR#10832)

2008-02-23 Thread jbrzusto
Full_Name: John Brzustowski
Version: R-devel trunk and R-2.4.0
OS: linux
Submission from: (NULL) (76.10.152.79)


Here are two minor potential issues in pointer protection stack (PPS) code which
could arise in debugging R with valgrind.  Only the second could possibly cause
a problem in normal use of R, and only under laughably implausible
circumstances.

(1) valgrind is given a wrong address when it is told to mark the reserved "red
zone" portion of the PPS as "no access".   This might generate spurious access
errors while debugging under valgrind.

(2) unprotect_ptr always does one more copy than necessary when collapsing the
slot of the pointer it is unprotecting.  This is but a trivial performance
penalty unless the PPS is full, in which case it causes a read of the first word
in the PPS red zone. If issue (1) were corrected, that would trigger a spurious
valgrind error.

If all of the following were to occur:

  a) the PPS overflowed 
  b) the cleanup code filled the PPS red zone
  c) unprotect_ptr was called from the cleanup code while the PPS red zone was
full
and 
  d) the memory malloc'd for the PPS occupied the very end of a readable
segment
 with fewer than sizeof(SEXP) trailing guard bytes

then a segfault would result.  (You are laughing at this point, right?)

In any case, here is a patch against R-devel trunk:

Index: trunk/src/main/memory.c
===
--- trunk/src/main/memory.c (revision 44589)
+++ trunk/src/main/memory.c (working copy)
@@ -1557,7 +1557,7 @@
R_Suicide("couldn't allocate memory for pointer stack");
 R_PPStackTop = 0;
 #if VALGRIND_LEVEL > 1
-VALGRIND_MAKE_NOACCESS(R_PPStackTop+R_PPStackSize,PP_REDZONE_SIZE);
+VALGRIND_MAKE_NOACCESS(R_PPStack+R_PPStackSize,PP_REDZONE_SIZE);
 #endif
 vsfac = sizeof(VECREC);
 R_VSize = (((R_VSize + 1)/ vsfac));
@@ -2329,11 +2329,11 @@
 } while ( R_PPStack[--i] != s );
 
 /* OK, got it, and  i  is indexing its location */
-/* Now drop stack above it */
+/* Now drop stack above it, if any */
 
-do {
-   R_PPStack[i] = R_PPStack[i + 1];
-} while ( i++ < R_PPStackTop );
+while (++i < R_PPStackTop) {
+   R_PPStack[i - 1] = R_PPStack[i];
+}
 
 R_PPStackTop--;
 }

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] bug/suggestion: debugger should respect option "deparse.max.lines" when printing the call (PR#13647)

2009-04-09 Thread jbrzusto
Full_Name: John Brzustowski
Version: 2.8.1
OS: linux
Submission from: (NULL) (67.71.250.146)


When entering a debug()'ed function, the call printout is not limited by
options()$deparse.max.lines as it is when one uses browser() or trace().

Should it be?  If so, here's a patch:

diff -cr R-2.8.1/src/library/base/man/options.Rd
R-2.8.1-patched/src/library/base/man/options.Rd
*** R-2.8.1/src/library/base/man/options.Rd 2008-11-12 04:23:36.0
-0500
--- R-2.8.1-patched/src/library/base/man/options.Rd 2009-04-09
10:12:31.0 -0400
***
*** 77,84 
initialized (see \code{\link{Startup}}).}
  
  \item{\code{deparse.max.lines}:}{controls the number of lines used
!   when deparsing in \code{\link{traceback}} and
!   \code{\link{browser}}.  Initially unset, and only used if set to
a positive integer.}
  
  \item{\code{digits}:}{controls the number of digits to print when
--- 77,85 
initialized (see \code{\link{Startup}}).}
  
  \item{\code{deparse.max.lines}:}{controls the number of lines used
!   when deparsing in \code{\link{traceback}}, \code{\link{browser}},
!   and upon entry to a function whose debugging flag is set.
!   Initially unset, and only used if set to
a positive integer.}
  
  \item{\code{digits}:}{controls the number of digits to print when
diff -cr R-2.8.1/src/main/eval.c R-2.8.1-patched/src/main/eval.c
*** R-2.8.1/src/main/eval.c 2008-10-05 18:05:02.0 -0400
--- R-2.8.1-patched/src/main/eval.c 2009-04-09 09:45:03.0 -0400
***
*** 526,531 
--- 526,532 
  volatile  SEXP newrho;
  SEXP f, a, tmp;
  RCNTXT cntxt;
+ int itmp;
  
  /* formals = list of formal parameters */
  /* actuals = values to be bound to formals */
***
*** 609,615 
--- 610,621 
  SET_DEBUG(newrho, DEBUG(op));
  if (DEBUG(op)) {
Rprintf("debugging in: ");
+ 
+   itmp = asInteger(GetOption(install("deparse.max.lines"), R_BaseEnv));
+   if(itmp != NA_INTEGER && tmp > 0) R_BrowseLines = itmp;
PrintValueRec(call,rho);
+   R_BrowseLines = 0;
+ 
/* Is the body a bare symbol (PR#6804) */
if (!isSymbol(body) & !isVectorAtomic(body)){
/* Find out if the body is function with only one statement. */

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] corrected patch: debugger should respect deparse.max.lines (PR#13648)

2009-04-09 Thread jbrzusto
Full_Name: John Brzustowski
Version: 2.8.1
OS: linux
Submission from: (NULL) (67.71.250.146)


My apologies - there was a bug in the patch I submitted on this topic earlier
today.

diff -cr R-2.8.1/src/library/base/man/options.Rd
R-2.8.1-patched/src/library/base/man/options.Rd
*** R-2.8.1/src/library/base/man/options.Rd 2008-11-12 04:23:36.0
-0500
--- R-2.8.1-patched/src/library/base/man/options.Rd 2009-04-09
10:12:31.0 -0400
***
*** 77,84 
initialized (see \code{\link{Startup}}).}
  
  \item{\code{deparse.max.lines}:}{controls the number of lines used
!   when deparsing in \code{\link{traceback}} and
!   \code{\link{browser}}.  Initially unset, and only used if set to
a positive integer.}
  
  \item{\code{digits}:}{controls the number of digits to print when
--- 77,85 
initialized (see \code{\link{Startup}}).}
  
  \item{\code{deparse.max.lines}:}{controls the number of lines used
!   when deparsing in \code{\link{traceback}}, \code{\link{browser}},
!   and upon entry to a function whose debugging flag is set.
!   Initially unset, and only used if set to
a positive integer.}
  
  \item{\code{digits}:}{controls the number of digits to print when
diff -cr R-2.8.1/src/main/eval.c R-2.8.1-patched/src/main/eval.c
*** R-2.8.1/src/main/eval.c 2008-10-05 18:05:02.0 -0400
--- R-2.8.1-patched/src/main/eval.c 2009-04-09 09:45:03.0 -0400
***
*** 526,531 
--- 526,532 
  volatile  SEXP newrho;
  SEXP f, a, tmp;
  RCNTXT cntxt;
+ int itmp;
  
  /* formals = list of formal parameters */
  /* actuals = values to be bound to formals */
***
*** 609,615 
--- 610,621 
  SET_DEBUG(newrho, DEBUG(op));
  if (DEBUG(op)) {
Rprintf("debugging in: ");
+ 
+   itmp = asInteger(GetOption(install("deparse.max.lines"), R_BaseEnv));
+   if(itmp != NA_INTEGER && itmp > 0) R_BrowseLines = itmp;
PrintValueRec(call,rho);
+   R_BrowseLines = 0;
+ 
/* Is the body a bare symbol (PR#6804) */
if (!isSymbol(body) & !isVectorAtomic(body)){
/* Find out if the body is function with only one statement. */

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] 2.3.1: interacting bugs in load() and gzfile() (PR#9271)

2006-10-02 Thread jbrzusto

Hello,

If repeated calls are made to save() using the same pre-opened gzfile
connection to a file, and then the connection is closed, the objects
saved by the second and subsequent calls are not correctly restored by
repeated calls to load() with a new gzfile connection to the same file.

What follows are a session exposing the bugs, analysis (see ANALYSIS),
patches (see PATCHES), and a session showing correct behaviour after
patching (see FIXED).  The problematic code is still present in the
trunk of the svn repository.

This is really due to two minor bugs, but because they interact,
I'm submitting this as one report.

Regards,

John Brzustowski


THE BUGS (with comments inserted)

> sessionInfo()
Version 2.3.1 (2006-06-01) 
i686-pc-linux-gnu 

attached base packages:
[1] "methods"   "stats" "graphics"  "grDevices" "utils" "datasets" 
[7] "base" 

## create two objects and save them to a gzfile:

> x <- 1:10
> y <- x^2
> f <- gzfile("tmp.dat", "wb")
> save(x, file=f)

## notice where the second object will be written:

> seek(f, NA)
[1] 88
> save(y, file=f)
> close(f)

## open the file and try to load the two objects, one at a time

> f <- gzfile("tmp.dat", "rb")
> e <- new.env()
> load(f,e)
> ls(e)
[1] "x"

## x was loaded correctly, but the connection has been closed:

> f
 descriptionclass mode text 
"gzcon(tmp.dat)"  "gzcon" "rb" "binary" 
  opened can readcan write 
"closed"  "yes"   "no" 

##  (see ANALYSIS)
## Try again:

> load(f,e)
Warning message:
this is already a gzcon connection in: gzcon(con, level, allowNonCompressed) 
> ls(e)
[1] "x"

## why the warning?
## and why was nothing apparently read this time?

> f
 descriptionclass mode text 
"gzcon(tmp.dat)"  "gzcon" "rb" "binary" 
  opened can readcan write 
"opened""yes" "no" 

> load(f,e)
Warning message:
this is already a gzcon connection in: gzcon(con, level, allowNonCompressed) 
> ls(e)
[1] "x" "y"

## this time, y was read correctly, but again why the warning?

> f
 descriptionclass mode text 
"gzcon(tmp.dat)"  "gzcon" "rb" "binary" 
  opened can readcan write 
"closed""yes" "no" 

## 

## try again, but first seek on the connection to the location of the second 
object

> close(f)
> f <- gzfile("tmp.dat", "rb")
> e <- new.env()
> seek(f,88)
[1] 0
> load(f,e)
> ls(e)
[1] "x"

## this should have loaded the object "y", which according to the earlier call 
to seek()
## begins at file offset 88.


ANALYSIS

There are two minor bugs here:

1.  The internal function loadFromConnection() incorrectly closes a
connection that was already open when it was called, instead of
closing a connection that was *not* already open.

2.  gzfile() sets a class of its return value to "file", rather than
"gzfile":

> f <- gzfile("whatever.gz", "wb")
> class(f)
[1] "file"  "connection"

In contrast, bzfile() exhibits correct behaviour:

> f2<-bzfile("more.bz2", "wb")
> class(f2)
[1] "bzfile" "connection"

This prevents load() from noticing that the connection passed to it is already
a gzfile, since it tests for inheritance from class "gzfile".  Therefore,
load() always re-opens the gzfile connection with gzcon(), which causes
reading to proceed from the beginning of the file, regardless of any seek() on 
the original gzfile connection.  This reopening of a gzfile with gzcon() causes
the warning, and prevents the seek() before the load() from having any effect.

Interaction: because (with bug 2) load() always reopens a gzfile
connection, bug 1's effect is to not close that reopened connection.
Ironically, this allows (with warnings) objects from multiple calls to save() 
to be
restored by multiple calls to load(), as demonstrated by the
 section in the session above, provided load() is 
being
called with a filename or closed connection rather than an open gzfile
connection.


PATCHES:

for bug 1:

diff -u R-2.3.1.original/src/main/saveload.c R-2.3.1/src/main/saveload.c
--- R-2.3.1.original/src/main/saveload.c2006-04-09 18:19:51.0 
-0400
+++ R-2.3.1/src/main/saveload.c 2006-10-02 13:10:21.0 -0400
@@ -2301,6 +2301,7 @@
PROTECT(res = RestoreToEnv(R_Unserialize(&in), aenv));
if (wasopen) {
endcontext(&cntxt);
+} else {
con->close(con);
}
UNPROTECT(1);

for bug 2:

diff -u  R-2.3.1.original/src/main/connections.c R-2.3.1/src/main/connections