[Rd] "bug" and patch: quadratic running time for strsplit(..., fixed=TRUE) (PR#9902)

2007-09-07 Thread jbrzusto
Full_Name: John Brzustowski
Version: R-devel-trunk, R-2.4.0
OS: linux, gcc 4.0.3
Submission from: (NULL) (206.248.157.184)


This isn't a bug, but an easily-remedied performance issue.

SYMPTOM

> for (i in 1000 * (1:20)) {
   y <- paste(rep("asdf", times=i), collapse=" ")
   t <- system.time(strsplit(y, " ", fixed=TRUE))
   cat(sprintf("i=%5d time=%5d msec\n",i, round(1000*t[1])))
}

i= 1000 time=    2 msec
i= 2000 time=    9 msec
i= 3000 time=   20 msec
i= 4000 time=   34 msec
i= 5000 time=   57 msec
i= 6000 time=   77 msec
i= 7000 time=  107 msec
i= 8000 time=  136 msec
i= 9000 time=  177 msec
i=10000 time=  230 msec
i=11000 time=  275 msec
i=12000 time=  308 msec
i=13000 time=  371 msec
i=14000 time=  446 msec
i=15000 time=  544 msec
i=16000 time=  639 msec
i=17000 time=  726 msec
i=18000 time=  864 msec
i=19000 time=  944 msec
i=20000 time= 1106 msec

DIAGNOSIS

strsplit() calls strlen() in the bounds-check clause of a for(;;)
statement, which forces a full scan of the source string for each
character in the source string, making the loop quadratic in the
string's length.  Unlike R's LENGTH() macro, strlen() on a C string
is a linear-time operation, and in this case (at least) gcc 4.0.3's
-O2 optimizer does not recognize the call as loop-invariant, despite
the declaration "const char *buf".
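
To illustrate the pattern outside of R, here is a minimal sketch in C.
The two function names are hypothetical and this is not the code from
src/main; it only contrasts a loop that re-evaluates strlen() in its
condition with one that computes an end pointer once, in the spirit of
the "bufp < ebuf" form used in the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not R's actual code: counts occurrences of the
   single-byte separator `sep` in `buf`.  strlen() sits in the loop
   condition, so every iteration may rescan the whole string; the loop
   is O(n^2) unless the compiler manages to hoist the call. */
size_t count_sep_slow(const char *buf, char sep)
{
    size_t n = 0;
    const char *p;
    for (p = buf; (size_t)(p - buf) < strlen(buf); p++)
        if (*p == sep) n++;
    return n;
}

/* Same loop with the end of the string computed once up front, so the
   bounds check is a plain pointer comparison and the loop is O(n). */
size_t count_sep_fast(const char *buf, char sep)
{
    size_t n = 0;
    const char *p, *ebuf = buf + strlen(buf);
    for (p = buf; p < ebuf; p++)
        if (*p == sep) n++;
    return n;
}
```

Both functions return the same count for any NUL-terminated input; only
the cost of the bounds check differs.  Whether a given compiler hoists
the strlen() call in the first version depends on the compiler and its
optimization level, so the second form is the safer one to write.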

REMEDIED BEHAVIOUR

i= 1000 time=    0 msec
i= 2000 time=    1 msec
i= 3000 time=    1 msec
i= 4000 time=    0 msec
i= 5000 time=    1 msec
i= 6000 time=    1 msec
i= 7000 time=    1 msec
i= 8000 time=    2 msec
i= 9000 time=    2 msec
i=10000 time=    2 msec
i=11000 time=    2 msec
i=12000 time=    2 msec
i=13000 time=    2 msec
i=14000 time=    2 msec
i=15000 time=    4 msec
i=16000 time=    3 msec
i=17000 time=    3 msec
i=18000 time=    4 msec
i=19000 time=    3 msec
i=20000 time=    4 msec

RELATED ISSUES

A simple search turns up other instances of this usage in R's source.
For completeness, I'm submitting patches for all of them, but have not
tested whether they in fact cause a detectable performance problem.
In the case of modules/X11/dataentry.c, the patch also fixes a presumably
ineffectual "bug".

$ grep -nR "for *([^;]*;[^;]*strlen *(" *

main/rlocale.c:137: for (i = 0; i < strlen(lc_str) && i < sizeof(lc_str); i++)
main/printutils.c:486:  for(j = 0; j < strlen(buf); j++) *q++ = buf[j];
main/sysutils.c:493:for(j = 0; j < strlen(sub); j++) *outbuf++ = sub[j];
modules/X11/rotated.c:608:  for(i=0; i 1 && strncmp(bufp, split, slen))) continue;
ntok++;
@@ -480,7 +481,7 @@
/* This is UTF-8 safe since it compares whole strings,
   but it would be more efficient to skip along by chars.
 */
-   for(; bufp - buf < strlen(buf); bufp++) {
+   for(; bufp < ebuf; bufp++) {
if((slen == 1 && *bufp != *split) ||
   (slen > 1 && strncmp(bufp, split, slen))) continue;
if(slen) {
Index: src/main/rlocale.c
===
--- src/main/rlocale.c  (revision 42792)
+++ src/main/rlocale.c  (working copy)
@@ -127,14 +127,14 @@
 int Ri18n_wcwidth(wchar_t c)
 {
 char lc_str[128];
-unsigned int i;
+unsigned int i, j;
 
 static char *lc_cache = "";
 static int lc = 0;
 
 if (0 != strcmp(setlocale(LC_CTYPE, NULL), lc_cache)) {
strncpy(lc_str, setlocale(LC_CTYPE, NULL), sizeof(lc_str));
-   for (i = 0; i < strlen(lc_str) && i < sizeof(lc_str); i++)
+   for (i = 0, j = strlen(lc_str); i < j && i < sizeof(lc_str); i++)
lc_str[i] = toupper(lc_str[i]);
for (i = 0; i < (sizeof(cjk_locale_name)/sizeof(cjk_locale_name_t)); 
 i++) {
Index: src/main/printutils.c
===
--- src/main/printutils.c   (revision 42792)
+++ src/main/printutils.c   (working copy)
@@ -483,7 +483,8 @@
else
 #endif
snprintf(buf, 11, "\\u%04x", k);
-   for(j = 0; j < strlen(buf); j++) *q++ = buf[j];
+   memcpy(q, buf, j = strlen(buf));
+   q += j;
p += res;
}
i += (res - 1);
Index: src/main/sysutils.c
===
--- src/main/sysutils.c (revision 42792)
+++ src/main/sysutils.c (working copy)
@@ -490,8 +490,9 @@
R_AllocStringBuffer(2*cbuff.bufsize, &cbuff);
goto top_of_loop;
}
-   for(j = 0; j < strlen(sub); j++) *outbuf++ = sub[j];
-   outb -= strlen(sub);
+   memcpy(outbuf, sub, j = strlen(sub)); 
+   outbuf += j;
+   outb -= j;
}
inbuf++; inb--;
goto next_char;
Index: src/modules/

Re: [Rd] orthographic mistake in ?citation (PR#9901)

2007-09-07 Thread Duncan Murdoch
On 07/09/2007 3:31 AM, [EMAIL PROTECTED] wrote:
> from ?citation
> 
> 
> Details:
> 
>  The R core development team and the very active community of
>  package authors have invested a lot of time and effort in creating
>  R as it is today. Please give credit where credit is due and cite
>  R and R packages when you use them for data anlysis.

Fixed, thanks.  BTW, no need to report spelling errors to the bug list: 
  reporting them on R-devel is simpler.

Duncan Murdoch
>  ^^^
> 
> 
> --> change "anlysis" to "analysis".
> 
> 
> Bye  
>
>  
> 
> 
> 
> 
> --please do not edit the information below--
> 
> Version:
>  platform = i486-pc-linux-gnu
>  arch = i486
>  os = linux-gnu
>  system = i486, linux-gnu
>  status = 
>  major = 2
>  minor = 5.1
>  year = 2007
>  month = 06
>  day = 27
>  svn rev = 42083
>  language = R
>  version.string = R version 2.5.1 (2007-06-27)
> 
> Locale:
> LC_CTYPE=it_IT.UTF-8;LC_NUMERIC=C;LC_TIME=it_IT.UTF-8;LC_COLLATE=it_IT.UTF-8;LC_MONETARY=it_IT.UTF-8;LC_MESSAGES=it_IT.UTF-8;LC_PAPER=it_IT.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=it_IT.UTF-8;LC_IDENTIFICATION=C
>  
> 
> Search Path:
>  .GlobalEnv, package:datasets, package:rcompgen, package:grDevices,
> package:graphics, package:utils, package:stats, package:methods, Autoloads,
> package:base
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



[Rd] orthographic mistake in ?citation (PR#9901)

2007-09-07 Thread lbraglia

from ?citation


Details:

 The R core development team and the very active community of
 package authors have invested a lot of time and effort in creating
 R as it is today. Please give credit where credit is due and cite
 R and R packages when you use them for data anlysis.
 ^^^


--> change "anlysis" to "analysis".


Bye  
   
 




--please do not edit the information below--

Version:
 platform = i486-pc-linux-gnu
 arch = i486
 os = linux-gnu
 system = i486, linux-gnu
 status = 
 major = 2
 minor = 5.1
 year = 2007
 month = 06
 day = 27
 svn rev = 42083
 language = R
 version.string = R version 2.5.1 (2007-06-27)

Locale:
LC_CTYPE=it_IT.UTF-8;LC_NUMERIC=C;LC_TIME=it_IT.UTF-8;LC_COLLATE=it_IT.UTF-8;LC_MONETARY=it_IT.UTF-8;LC_MESSAGES=it_IT.UTF-8;LC_PAPER=it_IT.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=it_IT.UTF-8;LC_IDENTIFICATION=C
 

Search Path:
 .GlobalEnv, package:datasets, package:rcompgen, package:grDevices,
package:graphics, package:utils, package:stats, package:methods, Autoloads,
package:base



[Rd] Rcmd install on Vista

2007-09-07 Thread Gabor Grothendieck
Has anyone successfully used Rcmd install or Rcmd check on
Windows Vista?  I have been running R itself successfully,
just not Rcmd install and Rcmd check.

Rcmd check fails: the Ryacas.Rcheck directory it creates is
read-only, and I have to reset the permissions on it before I
can delete it.

With Rcmd install I get the output below, which looks like a
permissions problem; the same errors, of course, also show up
in Rcmd check.

I had installed R into C:\Program Files\R\R-2.6.0 and my library
location is in C:/Users/... as seen below.

I found I had to set tmpdir and R_LIBS appropriately to run Rcmd
check and install to even get this far.

--- Making package Ryacas 
   adding build stamp to DESCRIPTION
   installing R files
   installing demos
   installing inst files
find: C:/Users/ggroth/Documents/R/win-library/2.6/Ryacas/doc: Permission denied
find: C:/Users/ggroth/Documents/R/win-library/2.6/Ryacas/yacdir:
Permission denied
...
make[2]: *** [C:/Users/ggroth/Documents/R/win-library/2.6/Ryacas/inst] Error 1
make[1]: *** [all] Error 2
make: *** [pkg-Ryacas] Error 2
*** installation of Ryacas failed ***

Removing 'C:/Users/ggroth/Documents/R/win-library/2.6/Ryacas'
Can't read C:/Users/ggroth/Documents/R/win-library/2.6/Ryacas/doc:
Invalid argument at C:\PROGRA~1\R\R-26~1.0/bin/install line 434
Can't remove directory
C:/Users/ggroth/Documents/R/win-library/2.6/Ryacas/doc: Directory not
empty at C:\PROGRA~1\R\R-26~1.0/bin/install line 434
... and additional similar lines ...
